Transcript Lecture

Geographic Aggregation
GIS & Public Health Class
Thomas Talbot
Chief, Environmental Health Surveillance Section
NYS Department of Health
April 18, 2013
• State health departments and federal
agencies such as NCI and CDC provide
county level health indicators.
• Stakeholders want the data at a finer
geographic scale.
Health data can be shown at
different geographic scales
• Residential address
• Census blocks and tracts
• ZIP codes
• Towns
Concerns about release
of small area data
• Risk of disclosure of confidential
information.
• Rates of disease are unreliable due to
small numbers.
Rate maps with small numbers
provide very little information.
Rates are suppressed due to confidentiality or are unstable.
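A minimal sketch of why small numbers make rates unstable. It assumes the case count follows a Poisson distribution and uses a rough normal approximation for the interval; the two areas and their numbers are hypothetical and are not from the lecture data.

```python
import math

def rate_with_interval(cases, population, per=1000):
    """Return (rate, lower, upper, rse) for a crude rate.

    Assumes the case count is Poisson, so its standard error is sqrt(cases),
    and uses a rough normal approximation for the 95% interval.
    """
    if cases <= 0 or population <= 0:
        raise ValueError("cases and population must be positive")
    rate = cases / population * per
    rse = 1 / math.sqrt(cases)            # relative standard error of the rate
    half_width = 1.96 * rate * rse
    return rate, max(0.0, rate - half_width), rate + half_width, rse

# Hypothetical areas with the same crude rate but very different case counts.
small_area = rate_with_interval(cases=4, population=50)
large_area = rate_with_interval(cases=400, population=5000)

for label, (rate, low, high, rse) in [("small", small_area), ("large", large_area)]:
    print("%s area: rate %.1f, 95%% CI %.1f to %.1f, RSE %.0f%%"
          % (label, rate, low, high, rse * 100))
```

Both areas have the same rate, but the interval for the small area is far wider, which is the instability the slide refers to.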
Disclosure of confidential information
Census
Blocks
Geographic Smoothed or Aggregated
Count & Rate Maps
• Protect Confidentiality so data can be
shared.
• Reduce random fluctuations in rates due
to small numbers.
Smoothed Rate Maps
• Borrow data from neighboring areas to
provide more stable rates of disease.
– Shareware tools available
– Empirical or Hierarchical Bayesian approaches
– Adaptive Spatial Filters
– Headbanging
– etc.
from Talbot et al., Statistics in Medicine, 2000
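To illustrate the idea of borrowing data from neighboring areas, here is a deliberately simple sketch that pools each area with its adjacent areas before computing a rate. This is not one of the methods listed above (Bayesian, adaptive filters, headbanging); it is only an assumed stand-in to show the principle. The area names, counts and neighbor lists are hypothetical.

```python
def pooled_rates(cases, population, neighbors, per=1000):
    """Rate for each area computed from the area plus its adjacent areas."""
    smoothed = {}
    for area in cases:
        group = [area] + list(neighbors.get(area, []))
        total_cases = sum(cases[a] for a in group)
        total_pop = sum(population[a] for a in group)
        # An area group with no population has no defined rate.
        smoothed[area] = total_cases / total_pop * per if total_pop else float("nan")
    return smoothed

# Hypothetical data: area "a" has zero cases in a tiny population.
cases = {"a": 0, "b": 3, "c": 10}
population = {"a": 40, "b": 60, "c": 900}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

smoothed = pooled_rates(cases, population, neighbors)
for area in sorted(smoothed):
    raw = cases[area] / population[area] * 1000
    print("%s: raw %.1f, pooled %.1f per 1,000" % (area, raw, smoothed[area]))
```

Note that the pooled value no longer belongs to a single defined area, which is one of the problems with smoothing raised on the next slide.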
Problems with smoothing
• Does not provide counts & rates for
defined geographic areas.
• Not clear how to link risk factor data with
smoothed health data.
• Methods are sometimes difficult to
understand - “black boxes”
• May not meet requirements of some
policies or legislation.
Environmental Facilities &
Cancer Incidence Map Law, 2008
§ 3-0317
• Plot cancer cases by census block, except
in cases where such plotting could make it
possible to identify any cancer patient.
• Census blocks shall be aggregated to
protect confidentiality.
Environmental Justice & Permitting
NYSDEC Commissioner Policy 29
• Incorporate existing human health data
into the environmental review process.
• Data will be made available at a fine
geographic scale.
Public Health Support for Brownfield/Land Reuse
in the Areas of Concern for the Great Lakes
CDC-RFA-TS10-1003
• Identification of community health status
indicators for areas of concern
– Environmental data
– Community health concerns
– Public health data
– Pre and post development
Aggregation
• Consider geographic scale
• Consider zone
• In the following example I randomly placed
points on a map with on average 10 points
for each grid cell.
• The observed number of points vs. the
expected number of points changes as we
move the grid or if we change the scale by
combining grids.
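The grid example can be sketched as a small simulation. The study area size, random seed and function names below are assumptions for illustration; only the average of 10 points per grid cell comes from the slide.

```python
import random
from collections import Counter

def grid_counts(points, cell_size, offset=0.0):
    """Count points per square cell; `offset` shifts the grid origin."""
    counts = Counter()
    for x, y in points:
        col = int((x - offset) // cell_size)
        row = int((y - offset) // cell_size)
        counts[(col, row)] += 1
    return counts

def ratio_range(counts, expected):
    """Smallest and largest observed/expected ratio among non-empty cells."""
    ratios = [n / expected for n in counts.values()]
    return min(ratios), max(ratios)

SIDE = 6        # assumed study area: 6 x 6 unit cells
EXPECTED = 10   # average points per unit cell, as in the slide
rng = random.Random(42)
points = [(rng.uniform(0, SIDE), rng.uniform(0, SIDE))
          for _ in range(SIDE * SIDE * EXPECTED)]

original = grid_counts(points, cell_size=1.0)
shifted = grid_counts(points, cell_size=1.0, offset=0.5)    # move the grid
# Keep only shifted cells that lie fully inside the study area.
interior = {cell: n for cell, n in shifted.items()
            if 0 <= cell[0] < SIDE - 1 and 0 <= cell[1] < SIDE - 1}
coarser = grid_counts(points, cell_size=2.0)                # combine 2 x 2 cells

print("original O/E range:", ratio_range(original, EXPECTED))
print("shifted  O/E range:", ratio_range(interior, EXPECTED))
print("coarser  O/E range:", ratio_range(coarser, 4 * EXPECTED))
```

The same points give different observed/expected ratios depending on where the grid sits (zone) and how large the cells are (scale).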
Talbot
Aggregated Count or Rate Maps
• Merge small areas with neighboring areas to
provide more stable rates of disease and/or
protect confidentiality.
– Aggregation can be done manually.
– Existing automated tools were difficult to use.
Original ZIP Codes
3 Years Low Birth Weight Incidence Ratios
Aggregated to 250 Births per ZIP Code Group
NYSDOH Geographic Aggregation Tool
Goal
• Aggregate small areas into larger ones.
• User decides how much aggregation is needed.
– Based on cases and/or underlying population
– Example: at least 250 births and at least 3 low birth weight births
• Works with various levels of geography.
– Census blocks, tracts, towns, ZIP codes etc.
– Can nest one level of geography in another
• Uses open source free software.
• Can output results for use in mapping programs.
Aggregation Tool
Original Block Data †
Block          Cases
122300/2004    2
122300/2005    11
014500/3005    2
014500/3007    3
014500/3008    8
014500/3009    3
014500/3010    4
103202/2001    9
103202/2002    6

Regions
Block          Cases   Region
122300/2004    2       A
122300/2005    11      A
014500/3005    2       B
014500/3007    3       B
014500/3008    8       B
014500/3009    3       B
014500/3010    4       B
103202/2001    9       C
103202/2002    6       C

SAS or R Tool output
Cases   Region
13      A
20      B
15      C
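Once each block has a region label, the final step is a grouped sum of cases by region. A minimal sketch in Python, using the block values from the table above (the actual tool is described as SAS or R; the function name here is hypothetical):

```python
from collections import defaultdict

# (block, cases, region) rows copied from the simulated block table.
block_rows = [
    ("122300/2004", 2, "A"),
    ("122300/2005", 11, "A"),
    ("014500/3005", 2, "B"),
    ("014500/3007", 3, "B"),
    ("014500/3008", 8, "B"),
    ("014500/3009", 3, "B"),
    ("014500/3010", 4, "B"),
    ("103202/2001", 9, "C"),
    ("103202/2002", 6, "C"),
]

def sum_cases_by_region(rows):
    """Add up block-level case counts within each region label."""
    totals = defaultdict(int)
    for _block, cases, region in rows:
        totals[region] += cases
    return dict(totals)

region_totals = sum_cases_by_region(block_rows)
for region in sorted(region_totals):
    print(region, region_totals[region])
```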
The Aggregation Process: goals
• Should form a large
number of areas
• The areas should be
reasonably compact
• The areas have
minimum values as
defined by the user.
The Aggregation Process: method
• Pair-wise merges
• Merge until the areas
have minimum values.
– Cases and/or population
– Expected numbers.
The Pairwise Merge: 1st area
• Select those areas which
require merging to meet
minimum values.
Example: 3 low birth
weight babies, 250 births
• Of those, merge first the area
whose value is the highest
percentage of the minimum value.
– 20 > 3 and 8 > 3: the LBW counts
already meet the minimum, so these
numbers are not used
– 244/250 > 85/250: the area with
244 births is merged first
Map labels: LBW = low birth weight counts; births = total births
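The rule for choosing the first area can be sketched as follows. The function and area names are hypothetical, and the handling of an area that falls short on more than one variable (taking the highest ratio) is an assumption, since the slide only shows the single-variable case.

```python
def pick_first_area(areas, minimums):
    """Return the name of the area to merge first, or None if none need merging.

    areas:    {name: {variable: value}}
    minimums: {variable: user-defined minimum}
    """
    candidates = {}
    for name, values in areas.items():
        # Only variables still below their minimum are considered;
        # values that already meet the minimum are not used.
        ratios = [values[var] / minimum
                  for var, minimum in minimums.items()
                  if values[var] < minimum]
        if ratios:
            candidates[name] = max(ratios)   # assumption: use the highest ratio
    if not candidates:
        return None
    # The area closest to meeting its minimum is merged first.
    return max(candidates, key=candidates.get)

minimums = {"lbw": 3, "births": 250}
areas = {
    "X": {"lbw": 20, "births": 244},   # 244/250 of the minimum
    "Y": {"lbw": 8, "births": 85},     # 85/250 of the minimum
    "Z": {"lbw": 30, "births": 400},   # already meets both minimums
}
first = pick_first_area(areas, minimums)
print(first)
```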
The Pairwise merge: 2nd area
• Find the adjacent
neighbors of the
selected area
• If a boundary variable
is used, keep only those
neighbors that fall within
the same boundary as
the selected area
The Pairwise Merge
• If there are no
adjacent neighbors,
choose the closest
area (according to
distance between
centroids)
Water
The Pairwise merge: two methods
to choose 2nd area
• Choose the area
whose centroid is
closest to the first
area
• Choose the area
which has the
smallest ratio of the
aggregation variable
to the minimum value.
– e.g. 85/250
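The pairwise merge steps can be put together in a short sketch. This is an illustration under stated assumptions, not the GAT code: it uses a single aggregation variable (births), ignores the boundary variable, and approximates the merged centroid as a births-weighted average. All names and the four example units are hypothetical.

```python
import math

def distance(a, b):
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])

def merge(a, b):
    """Combine two areas (assumed births-weighted centroid)."""
    births = a["births"] + b["births"]
    w = a["births"] / births if births else 0.5
    members = a["members"] | b["members"]
    return {"members": members, "births": births,
            "x": w * a["x"] + (1 - w) * b["x"],
            "y": w * a["y"] + (1 - w) * b["y"],
            "adjacent": (a["adjacent"] | b["adjacent"]) - members}

def aggregate(units, adjacency, minimum=250, method="closest"):
    """Merge pairwise until every area has at least `minimum` births.

    method="closest": pick the neighbor with the nearest centroid.
    method="least":   pick the neighbor with the smallest births/minimum ratio.
    """
    areas = [{"members": {uid}, "births": u["births"], "x": u["x"], "y": u["y"],
              "adjacent": set(adjacency.get(uid, ()))}
             for uid, u in units.items()]
    while len(areas) > 1:
        below = [a for a in areas if a["births"] < minimum]
        if not below:
            break
        # 1st area: the one closest to meeting the minimum.
        first = max(below, key=lambda a: a["births"] / minimum)
        others = [a for a in areas if a is not first]
        neighbors = [a for a in others if a["members"] & first["adjacent"]]
        if not neighbors or method == "closest":
            # With no adjacent neighbors, fall back to the closest centroid overall.
            pool = neighbors or others
            second = min(pool, key=lambda a: distance(first, a))
        else:
            second = min(neighbors, key=lambda a: a["births"] / minimum)
        areas = [a for a in others if a is not second] + [merge(first, second)]
    return areas

# Hypothetical units laid out along a line: u1 - u2 - u3 - u4.
units = {"u1": {"births": 244, "x": 0, "y": 0}, "u2": {"births": 85, "x": 1, "y": 0},
         "u3": {"births": 300, "x": 2, "y": 0}, "u4": {"births": 120, "x": 3, "y": 0}}
adjacency = {"u1": ["u2"], "u2": ["u1", "u3"], "u3": ["u2", "u4"], "u4": ["u3"]}

result = aggregate(units, adjacency)
for area in result:
    print(sorted(area["members"]), area["births"])
```

A real implementation would compute adjacency and centroids from the polygon geometry and would also check the case minimum; the loop structure is the part this sketch is meant to show.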
† Simulated
data
9
Cases
98
Population
New York State
Descriptive Statistics
Year 2000 populated census blocks
Statistic (calculated using    Original Census   New Regions: Level of Aggregation
populated regions only)        Blocks            6 cases    12 cases   24 cases
Number of regions              225,167           39,748     21,525     11,381
Median population              39                385        770        1,467
Median number of cases         1                 10         20         38
Median number of blocks        1                 4          7          14
NYS number of cases (5 yrs): 470,000
NYS population 2000: 18,976,457
Note: The range in the census block populations is 0 - 23,373 Persons
Performance Measures
• Compactness
• Homogeneity with respect to demographic
factors (measured as index of dissimilarity)
• Similar population sizes.
• Number of aggregated areas.
• Aggregated zones are completely contained
within larger areas.
– e.g. block aggregation areas contained within tracts
Index of dissimilarity
The percentage of one group that would have to move to a
different area in order to have an even distribution (Wikipedia).
D = \frac{1}{2} \sum_{i} \left| \frac{b_i}{B} - \frac{w_i}{W} \right|
b_i = the minority population of the ith area, e.g. census tract
B = the total minority population of the large geographic entity for
which the index of dissimilarity is being calculated
w_i = the non-minority population of the ith area
W = the total non-minority population of the large geographic entity for
which the index of dissimilarity is being calculated
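A short sketch of the calculation, using the standard formula D = 1/2 × Σ |b_i/B − w_i/W| with the symbols defined above. The function name and the example populations are hypothetical.

```python
def index_of_dissimilarity(minority, non_minority):
    """Index of dissimilarity for paired lists of area populations.

    minority[i] is b_i and non_minority[i] is w_i for the ith area.
    Returns a value between 0 (even distribution) and 1 (complete separation).
    """
    if len(minority) != len(non_minority):
        raise ValueError("both lists must cover the same areas")
    total_b = sum(minority)        # B
    total_w = sum(non_minority)    # W
    if total_b == 0 or total_w == 0:
        raise ValueError("both groups need a non-zero total population")
    return 0.5 * sum(abs(b / total_b - w / total_w)
                     for b, w in zip(minority, non_minority))

# Hypothetical examples with two areas each.
even = index_of_dissimilarity([10, 20], [100, 200])
separated = index_of_dissimilarity([50, 0], [0, 80])
mixed = index_of_dissimilarity([30, 10], [100, 100])
print(even, separated, mixed)
```

Multiplying the result by 100 gives the percentage described in the definition.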
Follow-up Issues
The scale and method of aggregation will
affect both maps and correlation coefficients.
Modifiable areal unit problem
Aggregation
Areas
Counties
ZIP Codes
Compactness
GAT Outputs KML Files
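For readers unfamiliar with KML, the sketch below writes a minimal KML document with one polygon placemark per region, using only the Python standard library. It is not GAT's output code, and the region name and coordinates are made up for illustration.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def tag(name):
    return "{%s}%s" % (KML_NS, name)

def regions_to_kml(regions):
    """Build a KML string from {name: [(lon, lat), ...]}.

    Each ring is expected to be closed (first point repeated at the end).
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    for name, ring in regions.items():
        placemark = ET.SubElement(document, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = name
        polygon = ET.SubElement(placemark, tag("Polygon"))
        outer = ET.SubElement(polygon, tag("outerBoundaryIs"))
        linear_ring = ET.SubElement(outer, tag("LinearRing"))
        coords = ET.SubElement(linear_ring, tag("coordinates"))
        # KML coordinates are "lon,lat,altitude" tuples separated by spaces.
        coords.text = " ".join("%.6f,%.6f,0" % (lon, lat) for lon, lat in ring)
    return ET.tostring(kml, encoding="unicode")

# Hypothetical square region; the coordinates are not real GAT output.
regions = {"Region A": [(-73.80, 42.60), (-73.70, 42.60), (-73.70, 42.70),
                        (-73.80, 42.70), (-73.80, 42.60)]}
kml_text = regions_to_kml(regions)
print(kml_text)
```

Saving `kml_text` to a file with a `.kml` extension should let it be opened in Google Earth.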
What is R?
• A programming language
• A software environment
• Similar to S or S-plus
• Can do statistical computing
• Has graphics capabilities
Why R?
• It’s free
• Widely used and accepted
• Works on Windows, MacOS, and
Unix platforms
• Many user-developed packages
that add functionality
• Can run script files
Viewing the Results
• ArcGIS
• MapInfo
• Google Maps
• Google Earth
Lab Exercise
We will be trying out a beta version of GAT in the lab today.
5 years of simulated low birth weight data, NY State.
2003 ZIP Code Scale.
Socio-economic variables for race, poverty and education.
Detailed instructions for running the GAT program are provided
in the “GAT v12 Manual”, which is in the GAT R directory.
You will run the “GAT vR12” batch program to
aggregate ZIP Codes into larger regions (see next slide).
Geographic Aggregation Tool
is available on the Talbot web site
Spatial Aggregation
Homework Assignment
Use the Geographic Aggregation Tool to aggregate ZIP Codes from the testdata ZIP
Code Shape File. Each aggregated area should have a minimum of 250 births
(simbir0105) and 3 low birth weight births (Simlbw0105).
1. Make a thematic map of percent of low birth weight births for the original
unaggregated ZIP codes. Make a second thematic map of percent of low birth
weight births for your new aggregated boundaries. Each map should have at least
5 classes (categories). Use the same class breaks and colors for both maps. Make
sure you include a legend and title on the map. Use ArcGIS or Indie Mapper to
make the thematic maps. Attach a copy of your aggregation log file to your lab
write-up along with the two thematic maps.
2. Open the aggregated boundary in Google Earth. Use the print screen feature in
Windows to show that the file successfully opened in Google Earth. Add the
screenshot to your write-up.
3. Provide at least one suggestion on how the GAT-R program could be made more
user friendly.
4. Provide at least one suggestion of how the User Guide could be made more useful
or easier to understand.
5. Provide at least one suggestion of additional features that could be added to the
program.
Lab due May 2, 2013
GeoMasking Tool
Randomly Moves Points within a user defined area
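A minimal sketch of the geomasking idea. It assumes the "user defined area" is a circle of a chosen radius around each point and that coordinates are in a projected system (for example, meters); the actual tool may define the area differently. The function name and sample point are hypothetical.

```python
import math
import random

def geomask(x, y, max_distance, rng=random):
    """Move a point to a random location within `max_distance` of its origin."""
    angle = rng.uniform(0, 2 * math.pi)
    # Taking the square root spreads points evenly over the disc
    # instead of clustering them near the center.
    radius = max_distance * math.sqrt(rng.random())
    return x + radius * math.cos(angle), y + radius * math.sin(angle)

rng = random.Random(2013)
original_point = (500000.0, 4700000.0)   # hypothetical projected coordinates
masked_points = [geomask(original_point[0], original_point[1], 250.0, rng)
                 for _ in range(5)]
for mx, my in masked_points:
    moved = math.hypot(mx - original_point[0], my - original_point[1])
    print("masked to (%.1f, %.1f), moved %.1f" % (mx, my, moved))
```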
The End