[WORKFLOW] Geospatial Health Informatics Using Census Microdata #2

@TheCedarPrince

Issue Description

Difficulty: Intermediate
Time: 24 hours

Description:
This issue is to write a blog post introducing geospatial health informatics using census microdata, focusing on workflows enabled by Julia packages such as GeoMakie.jl and IPUMS.jl. The post will explain how to integrate geospatial data (e.g., administrative boundaries) with census microdata to analyze population-level patterns, visualize distributions, and derive insights such as educational attainment across regions.

The goal is to demonstrate a practical, reproducible workflow for combining geospatial and survey/census data in Julia, highlighting data acquisition, preprocessing, mapping, and visualization techniques for health informatics applications.

NOTE: The examples here use Poland, but they rely on confidential IPUMS data, so we will most likely have to adapt the visualizations to publicly available IPUMS data; further discussion to be had.

Requirements

  • Introduce geospatial health informatics and its relevance to public health research.
  • Explain the role of census microdata (e.g., IPUMS) and geospatial datasets in understanding population-level trends.
  • Introduce GeoMakie.jl and related packages (CairoMakie.jl, GeoInterfaceMakie.jl, GeoDataFrames.jl) for geospatial visualization in Julia.
  • Introduce IPUMS.jl for loading and processing census microdata.
  • Demonstrate data acquisition:
    • Load census microdata and metadata via IPUMS.jl.
  • Show preprocessing steps:
    • Filter and clean geospatial and census data.
    • Assign categorical labels (e.g., educational attainment levels) to microdata.
    • Normalize counts across regions for comparative visualization.
  • Demonstrate integration of geospatial and census data:
    • Merge microdata aggregates with spatial geometries.
  • Visualize results:
    • Create choropleth maps of census-derived metrics (e.g., education levels).
    • Use multiple color schemes and add legends/colorbars for clarity.
    • Optimize map aesthetics (titles, labels, and figure layout).
  • Provide reproducible Julia code with clear documentation for each step.
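
One way to read the "normalize counts across regions" requirement is min-max scaling, which is what the example snippet further down in this issue uses. The following is a minimal sketch in plain Julia, not a fixed design: `minmax_normalize` is a hypothetical helper name, the counts are made up, and the handling of `missing` values and of the case where every region has the same count is an assumption about what the post would want.

```julia
# Hypothetical helper (not part of IPUMS.jl or GeoMakie.jl):
# rescale counts to the interval [0, 1], passing `missing` through unchanged.
function minmax_normalize(counts)
    present = collect(skipmissing(counts))
    # Nothing to scale if every entry is missing
    isempty(present) && return fill(missing, length(counts))
    lo, hi = extrema(present)
    # If all regions share one count, the range is zero; map them all to 0.0
    hi == lo && return [ismissing(c) ? missing : 0.0 for c in counts]
    return [ismissing(c) ? missing : (c - lo) / (hi - lo) for c in counts]
end

scaled = minmax_normalize([120, 300, 210])   # made-up regional counts
```

Min-max scaling of raw counts still reflects how populous each region is, so the post may want to discuss whether shares within a region are a better basis for comparison.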

Expected Outcomes

  1. A blog post draft introducing geospatial health informatics using census microdata.
  2. Practical Julia code examples showing data acquisition, preprocessing, integration, and visualization.
  3. Reproducible workflow demonstrating how to map population-level indicators across regions.
  4. Clear guidance on using GeoMakie.jl and IPUMS.jl together for public health research.
  5. Visualizations including normalized educational attainment across Polish voivodeships as an example.
  6. Recommendations for generalizing the workflow to other countries, datasets, or health indicators.

Notes

Reference Materials

Example Outputs

For a full worked example, see the attached PDF, which includes code snippets and visualizations using this combination of data.

geospatial_poland_example.pdf

Example Code Snippets for Implementers

Load Required Packages

using CairoMakie, GeoMakie, GeoInterfaceMakie, GeoDataFrames, DataFrames, StatsBase
import IPUMS: load_ipums_extract, load_ipums_nhgis, parse_ddi

Load IPUMS Census Data

ddi = parse_ddi("poland_data/ipumsi_00001.xml")
df = load_ipums_extract(ddi, "poland_data/ipumsi_00001.dat")

Load and Filter Shapefile for Poland

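# Load the regional boundary shapefile and keep only the Polish regions
geodf = load_ipums_nhgis("shapefiles/ENUTS2_2013.shp").geodataframe
filter!(x -> x.CNTRY_NAME == "Poland", geodf)
# Drop rows without a geometry, then parse region codes as integers
# so they can be joined to the census data later
dropmissing!(geodf, :geometry)
geodf[!, :ENUTS2] = parse.(Int64, geodf[!, :ENUTS2])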

Aggregate Educational Attainment by Region

using Chain

# Count respondents per region and education code
edu_df = @chain df begin
  groupby([:ENUTS2_2013, :EDUCPL])
  combine(nrow => :Count)
end

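# Raw EDUCPL codes grouped into three broad attainment levels
primary = [12, 20]
secondary = [40, 41, 42, 43, 50]
university = [70, 71, 72, 73]

# Widen the column type so it can hold both integer codes and string labels
edu_df.EDUCPL = convert(Vector{Any}, edu_df.EDUCPL)
replace!(x -> in(x, primary) ? "PRIMARY" : x, edu_df.EDUCPL)
replace!(x -> in(x, secondary) ? "SECONDARY" : x, edu_df.EDUCPL)
replace!(x -> in(x, university) ? "UNIVERSITY" : x, edu_df.EDUCPL)

edu_counts = @chain edu_df begin
  # Drop rows whose code was not recoded (still numeric); filter! mutates edu_df
  filter!(row -> !isa(row.EDUCPL, Real), _)
  # Sum counts per region and attainment level, then group by level for plotting
  groupby([:ENUTS2_2013, :EDUCPL])
  combine(:Count => sum => :Count)
  groupby([:EDUCPL])
end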
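
The recoding above can also be written as a single lookup function, which avoids widening the column to `Vector{Any}`. This is only a sketch of an alternative, not the approach used in the attached PDF: `attainment_level` is a hypothetical helper name, the code groupings are copied from the snippet above, and the input codes are made up.

```julia
# Hypothetical helper: map a raw EDUCPL code to a broad attainment label.
# Codes outside the three groups return `missing` so they can be dropped later.
function attainment_level(code)
    code in (12, 20) && return "PRIMARY"
    code in (40, 41, 42, 43, 50) && return "SECONDARY"
    code in (70, 71, 72, 73) && return "UNIVERSITY"
    return missing
end

codes = [12, 41, 73, 99]            # made-up input; 99 belongs to no group
levels = attainment_level.(codes)   # broadcast the lookup over all codes
```

With DataFrames.jl, the same function could be applied as `transform(edu_df, :EDUCPL => ByRow(attainment_level) => :LEVEL)` followed by `dropmissing`, which keeps the original codes in place.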

Visualize Educational Attainment Across Regions

fig = Figure(size = (1200, 1400), fontsize = 20)
axs = [Axis(fig[x, y]) for x in 1:2 for y in 1:2]

poly!(axs[1], geodf.geometry, color = :white, strokecolor = :black, strokewidth = 2)
axs[1].title = "Voivodeships of Poland"
hidedecorations!(axs[1])
Label(fig[:, :, Top()], "Normalized Educational Attainment across Poland", fontsize = 50, padding = (0,0,30,0))

for (idx, counts) in enumerate(edu_counts)
    # Join this attainment level's counts to the region geometries
    geo_counts = outerjoin(counts, geodf; on = [:ENUTS2_2013 => :ENUTS2])
    # Drop unmatched rows before normalizing so colors and geometries stay aligned
    dropmissing!(geo_counts, [:geometry, :Count])

    # Min-max normalize the counts to [0, 1]
    lo, hi = extrema(geo_counts.Count)
    norm_counts = (geo_counts.Count .- lo) ./ (hi - lo)

    cmap = cgrad(:Wistia, norm_counts)

    ax = axs[idx + 1]  # axs[1] holds the outline map
    poly!(ax, geo_counts.geometry, color = cmap[norm_counts], strokecolor = :black, strokewidth = 2)
    ax.title = "$(counts.EDUCPL |> first)"
    hidedecorations!(ax)
end

Colorbar(fig[:, 3], limits = (0, 1), colormap = :Wistia)
fig
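
Because the loop above joins census aggregates to geometries on a region code, it can help to check which codes appear in only one of the two tables before plotting. The following is a small sketch in plain Julia: `unmatched_keys` is a hypothetical helper name, and the code vectors are made-up stand-ins for `edu_df.ENUTS2_2013` and `geodf.ENUTS2`.

```julia
# Hypothetical helper: report join keys that appear on only one side.
function unmatched_keys(left_keys, right_keys)
    return (only_left = sort(setdiff(left_keys, right_keys)),
            only_right = sort(setdiff(right_keys, left_keys)))
end

census_codes = [1, 2, 3, 99]   # made-up stand-in for edu_df.ENUTS2_2013
shape_codes = [1, 2, 3, 4]     # made-up stand-in for geodf.ENUTS2
report = unmatched_keys(census_codes, shape_codes)
```

A non-empty `only_left` would point to census codes with no matching geometry, and a non-empty `only_right` to regions that would be drawn without data.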
