Issue Description
Difficulty: Intermediate
Time: 24 hours
Description:
This issue is to write a blog post introducing geospatial health informatics using census microdata, focusing on workflows enabled by Julia packages such as GeoMakie.jl and IPUMS.jl. The post will explain how to integrate geospatial data (e.g., administrative boundaries) with census microdata to analyze population-level patterns, visualize distributions, and derive insights such as regional differences in educational attainment.
The goal is to demonstrate a practical, reproducible workflow for combining geospatial and survey/census data in Julia, highlighting data acquisition, preprocessing, mapping, and visualization techniques for health informatics applications.
NOTE: The examples here use Poland and rely on confidential IPUMS data, so we will most likely have to adapt the visualizations to publicly available IPUMS data; further discussion to be had.
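Requirements
- GeoMakie.jl and related packages (CairoMakie.jl, GeoInterfaceMakie.jl, GeoDataFrames.jl) for geospatial visualization in Julia.
- IPUMS.jl for loading and processing census microdata.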
Expected Outcomes
- A blog post draft introducing geospatial health informatics using census microdata.
- Practical Julia code examples showing data acquisition, preprocessing, integration, and visualization.
- A reproducible workflow demonstrating how to map population-level indicators across regions.
- Clear guidance on using GeoMakie.jl and IPUMS.jl together for public health research.
- Visualizations including normalized educational attainment across Polish voivodeships as an example.
- Recommendations for generalizing the workflow to other countries, datasets, or health indicators.
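One way to make the workflow reusable across countries is to collect every dataset-specific choice in a single configuration object, so that adapting it means writing a new instance rather than editing the pipeline. The sketch below is hypothetical: the `RegionConfig` type, the `category_lookup` helper, and their field names are illustrative, while the Poland values (shapefile path, `"Poland"` filter, `ENUTS2`/`ENUTS2_2013` join columns, `EDUCPL` codes) are taken from the example snippets.

```julia
# Hypothetical configuration type bundling the dataset-specific choices
# from the Poland example; another country only needs a new instance.
struct RegionConfig
    shapefile::String              # path to the boundary shapefile
    country_name::String           # value matched against the CNTRY_NAME column
    shape_key::Symbol              # region ID column in the shapefile
    census_key::Symbol             # region ID column in the census extract
    indicator::Symbol              # census variable to map
    categories::Dict{Int,String}   # raw indicator code => display category
end

# Hypothetical helper: build a code => category lookup from grouped code lists
function category_lookup(groups::Pair{String,Vector{Int}}...)
    lookup = Dict{Int,String}()
    for (label, codes) in groups
        for c in codes
            lookup[c] = label
        end
    end
    return lookup
end

# Values copied from the Poland example snippets
poland = RegionConfig(
    "shapefiles/ENUTS2_2013.shp",
    "Poland",
    :ENUTS2,
    :ENUTS2_2013,
    :EDUCPL,
    category_lookup("PRIMARY" => [12, 20],
                    "SECONDARY" => [40, 41, 42, 43, 50],
                    "UNIVERSITY" => [70, 71, 72, 73]),
)
```

Swapping in another country would mean changing the shapefile, the country filter, the key columns, and the code lists for that country's education variable.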
Notes
Reference Materials
Example Outputs
For a full worked example, see the attached PDF, which includes code snippets and visualizations using this combination of data.
geospatial_poland_example.pdf
Example Code Snippets for Implementers
Load Required Packages
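```julia
# Plotting (CairoMakie backend, GeoMakie/GeoInterfaceMakie for geometries),
# shapefile and table handling, and summary statistics
using CairoMakie, GeoMakie, GeoInterfaceMakie, GeoDataFrames, DataFrames, StatsBase
# IPUMS readers for microdata extracts and NHGIS shapefiles
import IPUMS: load_ipums_extract, load_ipums_nhgis, parse_ddi
```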
Load IPUMS Census Data
```julia
# Parse the DDI codebook (variable metadata), then use it to read the data file
ddi = parse_ddi("poland_data/ipumsi_00001.xml")
df = load_ipums_extract(ddi, "poland_data/ipumsi_00001.dat")
```
Load and Filter Shapefile for Poland
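```julia
# Load ENUTS2 (2013) boundaries and keep only the Polish regions
geodf = load_ipums_nhgis("shapefiles/ENUTS2_2013.shp").geodataframe
filter!(x -> x.CNTRY_NAME == "Poland", geodf)
# Drop rows without a polygon, since they cannot be plotted
dropmissing!(geodf, :geometry)
# Convert region codes from strings to Int64 so they match the census join column
geodf[!, :ENUTS2] = parse.(Int64, geodf[!, :ENUTS2])
```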
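Because the census extract and the shapefile are later joined on region codes, it can help to check key coverage first: any code present in only one table drops out of an inner join without warning. The same check exposes type mismatches, because if one side still holds strings, no codes match at all. The helper below is a hypothetical Base-Julia sketch that works on plain vectors of codes (for example `df.ENUTS2_2013` and `geodf.ENUTS2` after the `parse` step); the sample codes are made up.

```julia
# Hypothetical helper: report region codes that appear in only one of the
# two tables, so mismatches are visible before joining.
function key_coverage(census_keys, shape_keys)
    census = Set(census_keys)
    shapes = Set(shape_keys)
    return (
        no_geometry = sort!(collect(setdiff(census, shapes))),  # census regions without a polygon
        no_data = sort!(collect(setdiff(shapes, census))),      # polygons without census records
    )
end

# Illustrative codes only, not real NUTS identifiers
report = key_coverage([11, 12, 12, 21, 99], [11, 12, 21, 22])
println(report.no_geometry)  # [99]
println(report.no_data)      # [22]
```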
Aggregate Educational Attainment by Region
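```julia
using Chain

# Count records for each (region, detailed education code) pair
edu_df = @chain df begin
    groupby(_[:, [:ENUTS2_2013, :EDUCPL]], [:ENUTS2_2013, :EDUCPL])
    combine(nrow => :Count)
end

# EDUCPL codes grouped into three coarse levels
primary = [12, 20]
secondary = [40, 41, 42, 43, 50]
university = [70, 71, 72, 73]

# Widen the column type so numeric codes can be replaced by string labels
edu_df.EDUCPL = convert(Vector{Any}, edu_df.EDUCPL)
replace!(x -> in(x, primary) ? "PRIMARY" : x, edu_df.EDUCPL)
replace!(x -> in(x, secondary) ? "SECONDARY" : x, edu_df.EDUCPL)
replace!(x -> in(x, university) ? "UNIVERSITY" : x, edu_df.EDUCPL)

# Drop codes outside the three levels (they are still numeric), sum per
# region and level, then group by level so each group feeds one map panel
edu_counts = @chain edu_df begin
    filter(row -> !isa(row.EDUCPL, Real), _)
    groupby([:ENUTS2_2013, :EDUCPL])
    combine(:Count => sum => :Count)
    groupby([:EDUCPL])
end
```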
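The three `replace!` calls and the second `groupby`/`combine` amount to a lookup from raw `EDUCPL` codes to a coarse level, followed by a sum per (region, level). The sketch below expresses the same logic in Base Julia with a `Dict` lookup. It is an illustration, not part of IPUMS.jl or DataFrames.jl: the input is a hypothetical vector of `(region, code, count)` tuples standing in for `edu_df`, and codes outside the three lists are skipped, mirroring the filter step.

```julia
# Same code lists as in the DataFrames example
const EDU_GROUPS = Dict(
    "PRIMARY" => [12, 20],
    "SECONDARY" => [40, 41, 42, 43, 50],
    "UNIVERSITY" => [70, 71, 72, 73],
)

# Invert to a code => level lookup
const CODE_TO_GROUP = Dict(code => group for (group, codes) in EDU_GROUPS for code in codes)

# Sum counts per (region, level); codes not in any group are skipped
function aggregate_education(rows)
    totals = Dict{Tuple{Int,String},Int}()
    for (region, code, count) in rows
        group = get(CODE_TO_GROUP, code, nothing)
        group === nothing && continue
        key = (region, group)
        totals[key] = get(totals, key, 0) + count
    end
    return totals
end

# Hypothetical pre-aggregated rows: (region, EDUCPL code, count)
rows = [(1, 12, 5), (1, 20, 3), (1, 41, 7), (2, 71, 4), (2, 99, 10)]
totals = aggregate_education(rows)
```

A single lookup table keeps every code-to-level decision in one place, which makes it easier to audit against the IPUMS codebook than a chain of `replace!` calls.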
Visualize Educational Attainment Across Regions
```julia
# 2x2 grid: panel 1 shows the boundaries, panels 2 to 4 show one education level each
fig = Figure(size = (1200, 1400), fontsize = 20)
axs = [Axis(fig[x, y]) for x in 1:2 for y in 1:2]

# Reference map of the regions
poly!(axs[1], geodf.geometry, color = :white, strokecolor = :black, strokewidth = 2)
axs[1].title = "Voivodeships of Poland"
hidedecorations!(axs[1])

Label(fig[:, :, Top()], "Normalized Educational Attainment across Poland", fontsize = 50, padding = (0, 0, 30, 0))

for (idx, counts) in enumerate(edu_counts)
    # Inner join keeps only regions present in both tables, so every count has a geometry
    geo_counts = innerjoin(counts, geodf; on = :ENUTS2_2013 => :ENUTS2)
    # Min-max normalize counts to [0, 1] within this education level
    lo, hi = extrema(geo_counts.Count)
    norm_counts = (geo_counts.Count .- lo) ./ (hi - lo)
    ax = axs[idx + 1]
    # Color each polygon by its normalized count on the shared :Wistia scale
    poly!(ax, geo_counts.geometry, color = norm_counts, colormap = :Wistia,
          colorrange = (0, 1), strokecolor = :black, strokewidth = 2)
    ax.title = string(first(counts.EDUCPL))
    hidedecorations!(ax)
end

# One shared colorbar for the three normalized panels
Colorbar(fig[:, 3], limits = (0, 1), colormap = :Wistia)
fig
```
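Within each education level, the map colors show min-max normalized counts, (c_r - min c) / (max c - min c), so the region with the fewest people at that level maps to 0 and the one with the most maps to 1. Because the scaling is done per panel, colors compare regions within one level, not across levels. The standalone sketch below (the function name `minmax_normalize` is hypothetical) adds a guard for the degenerate case where every region has the same count, which would otherwise divide by zero and produce `NaN`.

```julia
# Min-max normalization to [0, 1]; returns zeros when all values are equal
# instead of dividing by zero.
function minmax_normalize(counts::AbstractVector{<:Real})
    lo, hi = extrema(counts)
    hi == lo && return zeros(Float64, length(counts))
    return (counts .- lo) ./ (hi - lo)
end

# Illustrative per-region counts for one education level
minmax_normalize([120, 300, 210])  # [0.0, 1.0, 0.5]
```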