Transcript aggregation
Health Datasets in Spatial
Analyses: The General Overview
Lukáš MAREK
[email protected]
Department of Geoinformatics, Faculty of Science, Palacky
University in Olomouc, Czech Republic
www.geoinformatics.upol.cz
INTRODUCTION
• Advanced methods for spatial analyses
• Exploration of spatial pattern
• Spatial statistics
• Visualization and presentation for non-geographers (doctors, specialists)
SPATIAL EPIDEMIOLOGY
• Disease mapping
– Visual description of spatial variability of the disease incidence
– Maps of incidence risk, identification of areas with high risk
• Analyses of spatial pattern
– Exploration of spatial and spatio-temporal patterns in data
– Disease clusters, randomness, …
• Geographic correlation studies
– Analysis of associations between disease incidence and environmental
factors
HEALTH AND MEDICAL DATA
• require specific procedures because of their
confidentiality
– management, presentation and operations
• aggregated, anonymized or incomplete data
sets
• call for suitable analytical procedures that take
the uncertainty and inaccuracy of the data
into account
DATA PROVIDERS
• International organizations
– WHO, EUROSTAT, OECD
• INSPIRE directive
– Theme Human health and safety (Annex III)
• Institute of Health Information and Statistics
of the Czech Republic
• Czech Statistical Office
• National Institute of Public Health
DATA TYPES
• Case-event data
– locations of individual cases of a disease, or of individual
members of a suitable control group, or covariates.
• Irregular lattice data
– measures aggregated/averaged to the level of census
tracts or other type of administrative district.
• Regular lattice data
– measures aggregated/averaged to a regular grid (typically
arising from remote sensing).
• Geostatistical data
– measurements sampled at point locations.
DATA PRIVACY
• Health and medical data = private,
confidential and sensitive data
• Public health reporting systems and medical
registries are committed to protecting
the privacy of the individual
• Usefulness of local-scale analysis vs. privacy
protection
• Availability, accessibility and restrictions
SCALE OF THE DATA
• Crucial methodological aspect
• Addresses or coordinates are the most important
information for spatial analyses
– But they can easily be misused to violate privacy
• Exploring relations at the individual level is unlikely
to be possible (and is not necessary)
• Mapping to relatively arbitrary administrative areas
– Scale-sensitive information, the modifiable areal unit problem (MAUP)
– Different interpretation of findings
ANONYMIZATION
1) spatial and temporal aggregation,
2) adding geographic or etiologic context variables to
original unmasked data and then removing the
geographic identifiers,
3) random small-scale relocation of individual records,
4) limiting access to potentially identifiable data
through a user- and/or function-restricted
computer environment
RECORD BASED ANONYMIZATION
• Keeps all available records but prevents re-identification
• Weak anonymization
– Locations are preserved but other attributes are restricted so
that re-identifying an individual is difficult
– Rarely used, outputs for the internal purposes
• Randomization
– Case records are preserved but their true positions are
displaced by a certain distance and/or angle
– General picture of the spatial data distribution without
allowing the identification of individuals
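
The randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the case study: the function name `displace`, the distance bounds and the sample coordinates are hypothetical, and coordinates are assumed to be in a projected system with metre units.

```python
import math
import random

def displace(x, y, min_dist, max_dist, rng):
    """Move a point by a random distance in [min_dist, max_dist] at a random angle."""
    angle = rng.uniform(0.0, 2.0 * math.pi)   # random direction in radians
    dist = rng.uniform(min_dist, max_dist)    # random displacement length
    return x + dist * math.cos(angle), y + dist * math.sin(angle)

# Hypothetical case locations in metres (not real data).
rng = random.Random(42)
cases = [(1000.0, 2000.0), (1500.0, 2500.0), (1200.0, 2100.0)]
masked = [displace(x, y, 50.0, 200.0, rng) for x, y in cases]
```

A minimum distance keeps masked points from landing on the true address; the maximum distance bounds how much the overall spatial pattern is distorted.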
SCALE BASED ANONYMIZATION
• Aggregation
• Most surveillance data are published as summary
statistics at an administrative level
• Areal aggregation vs. Point aggregation
• Matching the level of administrative aggregation with
the spatial resolution of data
• Results obtained from aggregated data should not be
used for making assumptions about the nature of an
association at the individual level
CASE STUDY
• Czech Epidemiological Database – EPIDAT
– mandatory reporting, recording and analysis of infectious
diseases in the Czech Republic
• Occurrence of salmonella cases in the Olomouc Region
in 2002 – 2011
• Aggregation of 11 000
records (in space and/or time)
CHOROPLETH MAPS
• One of the most common
types of map
• Added demographic context
and irregular lattice
aggregation
• The data are aggregated to
cadastral units and the
frequency of occurrence is
recalculated relative to the population
• Visual tool for the analysis of
spatial distribution of
phenomenon
• Relative values
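
A minimal sketch of the relative values behind such a choropleth map, assuming each case record already carries the identifier of its cadastral unit. The unit identifiers, the populations and the per-100,000 scaling are illustrative assumptions; the slides do not state which multiplier was used.

```python
from collections import Counter

def incidence_per_100k(case_units, population):
    """Aggregate case records to areal units and relate the counts to population."""
    counts = Counter(case_units)  # number of cases per unit
    return {
        unit: 100000.0 * counts.get(unit, 0) / pop
        for unit, pop in population.items()
        if pop > 0  # skip unpopulated units to avoid division by zero
    }

# Hypothetical unit identifiers and populations (not real data).
case_units = ["A", "A", "B", "A"]
population = {"A": 1500, "B": 500, "C": 2000}
rates = incidence_per_100k(case_units, population)
```

Units with very small populations produce unstable rates, which is one reason the choice of aggregation level matters.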
• regular hexagonal grid
with the area of average
cadastral unit
• two kinds of information
– the number of
salmonella cases per
population is expressed
by the size of the
hexagon,
– population in the unit is
expressed by the colour
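
As a rough illustration of aggregation to a regular hexagonal grid, the sketch below derives the hexagon side length from a target cell area (for example, the area of an average cadastral unit) and assigns points to cells. It assumes pointy-top hexagons addressed by axial coordinates with the grid origin at (0, 0); the function names are hypothetical and this is not the tool used for the case study.

```python
import math
from collections import Counter

def hex_side_for_area(area):
    """Side length of a regular hexagon with the given area (A = 3*sqrt(3)/2 * s^2)."""
    return math.sqrt(2.0 * area / (3.0 * math.sqrt(3.0)))

def point_to_hex(x, y, side):
    """Axial (q, r) index of the pointy-top hexagon containing point (x, y)."""
    q = (math.sqrt(3.0) / 3.0 * x - y / 3.0) / side
    r = (2.0 / 3.0 * y) / side
    s = -q - r  # third cube coordinate; q + r + s == 0
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Re-derive the coordinate with the largest rounding error from the other two.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

# Hypothetical points; count the cases falling into each hexagonal cell.
side = hex_side_for_area(3.0 * math.sqrt(3.0) / 2.0)  # a cell with side length 1
points = [(0.1, 0.1), (-0.1, 0.2), (math.sqrt(3.0), 0.0)]
cell_counts = Counter(point_to_hex(x, y, side) for x, y in points)
```

The per-cell counts could then be related to the population in each cell to drive hexagon size and colour as described above.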
QUADTREE MAPS
• Quadtree is a recursive algorithm that
partitions an area into four initial quadrants
and continues to divide each quadrant into
four smaller quadrants in a hierarchical way
until relatively homogeneous subareas are
obtained
• Used for the data storage, data aggregation
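
The recursive subdivision can be sketched as follows. The stopping rule is an assumption made for illustration: a quadrant is treated as sufficiently homogeneous once it holds at most `max_points` cases or a maximum depth is reached; the slides do not specify the criterion used in the case study.

```python
def quadtree(points, bounds, max_points=2, max_depth=8, depth=0):
    """Return a list of (bounds, count) leaf cells covering the given bounds."""
    xmin, ymin, xmax, ymax = bounds
    if len(points) <= max_points or depth >= max_depth:
        return [(bounds, len(points))]  # stop: cell is small enough
    xmid, ymid = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    quadrants = [
        (xmin, ymin, xmid, ymid),  # 0: south-west
        (xmid, ymin, xmax, ymid),  # 1: south-east
        (xmin, ymid, xmid, ymax),  # 2: north-west
        (xmid, ymid, xmax, ymax),  # 3: north-east
    ]
    buckets = [[], [], [], []]
    for x, y in points:
        # Each point goes to exactly one quadrant.
        idx = (1 if x >= xmid else 0) + (2 if y >= ymid else 0)
        buckets[idx].append((x, y))
    leaves = []
    for quad, bucket in zip(quadrants, buckets):
        leaves.extend(quadtree(bucket, quad, max_points, max_depth, depth + 1))
    return leaves

# Hypothetical case locations inside a 10 x 10 study area.
points = [(1, 1), (2, 2), (9, 9), (9, 1), (1, 9)]
leaves = quadtree(points, (0, 0, 10, 10), max_points=2)
```

Dense areas end up with small cells and sparse areas with large ones, so the published counts per cell stay within a chosen bound.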
DOT DENSITY MAPS
• Usually used for the visualization of any point
phenomena
• Useful for depicting the spatial pattern and spatial
distribution in the case of aggregated data sets
• The dot pattern gives a better visual depiction of the
phenomenon in space
• Whether data are combined with regular or irregular
polygon units, the dot density map allows individual cases to be re-identified, at least at a certain scale
• Dots are usually plotted randomly within boundaries of
the areal unit.
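
Random dot placement inside an areal unit can be sketched with rejection sampling: draw candidates in the unit's bounding box and keep those that fall inside the polygon. The polygon, the number of dots and the function names are illustrative assumptions; production mapping software may place dots differently.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_dots(polygon, n_dots, rng):
    """Place n_dots uniformly at random inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    dots = []
    while len(dots) < n_dots:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots

# Hypothetical triangular unit with five aggregated cases.
unit = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
dots = random_dots(unit, 5, random.Random(1))
```

Because the dots are placed at random, their positions carry no information about true case locations beyond the unit they belong to.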
CONCLUSIONS
• The statement about the lack of high-quality health
and medical data sets is not fully true
• The question should not be only about the existence
of data, but also about their availability and
accessibility, as well as about restrictions on
their usability and the usefulness of the resulting
outputs
• Results obtained from aggregate data should not be
used for making assumptions about the nature of an
association at the individual level
ACKNOWLEDGEMENT
The author gratefully acknowledges
the support by the Operational
Program Education for
Competitiveness - European Social
Fund (project CZ.1.07/2.3.00/20.0170
of the Ministry of Education, Youth
and Sports of the Czech Republic)
THANK YOU FOR YOUR ATTENTION