Geographical Data Mining
Stan Openshaw
Centre for Computational Geography
University of Leeds
Ian Turton, CCG, Leeds University
For the latest on Stan
http://www.geog.leeds.ac.uk/staff/s.openshaw/latest.html
Why would we want to do this?
• Geographical Data Explosion
• Public imperative
• Lack of geographically aware tools
Mountains of Data
Swamps of Data
We know what you spend...
…where you spend it...
…who you talk to...
…where you live...
LS2 9JT
What your neighbours are like
...Crime data and...
• crime type
• crime location
• insurance data
...Health data
• environmental data
• socio-economic data
• admissions data
Geographical Hyperspace
• Geography
– x,y co-ordinates, postcodes
• Time
– days, hours, months
• Attributes
– place: pollution sources, soil type, distance to motorway
– cases: type of disease, age, sex
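In data-structure terms, each observation in this hyperspace carries all three kinds of information at once, so a single query can cut across geography, time and attributes. The sketch below is a minimal illustration that assumes planar x,y coordinates in metres and a free-form attribute dictionary. The `Case` class, the `select` helper and the coordinates are hypothetical and are not part of any of the tools discussed here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List


@dataclass
class Case:
    """One record in geographical hyperspace (hypothetical structure)."""
    x: float                       # easting in metres (assumed planar grid)
    y: float                       # northing in metres
    when: datetime                 # time of the event
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. crime type, age, sex


def select(cases: List[Case], where: Callable[[Case], bool]) -> List[Case]:
    """Return the cases satisfying a predicate that may mix space, time and attributes."""
    return [c for c in cases if where(c)]


# Made-up records for illustration only.
cases = [
    Case(430000.0, 434000.0, datetime(1997, 3, 1, 22), {"type": "burglary"}),
    Case(431500.0, 433200.0, datetime(1997, 3, 2, 14), {"type": "assault"}),
    Case(430200.0, 434100.0, datetime(1997, 7, 9, 23), {"type": "burglary"}),
]

# Night-time burglaries within 1 km of a point: one query spanning all three dimensions.
night_burglaries = select(
    cases,
    lambda c: c.attributes.get("type") == "burglary"
    and (c.x - 430000.0) ** 2 + (c.y - 434000.0) ** 2 <= 1000.0 ** 2
    and (c.when.hour >= 22 or c.when.hour < 6),
)
print(len(night_burglaries))  # 2
```

Research designs that fix one of these dimensions in advance (a single time period, say, or a single crime type) are, in effect, hard-coding part of this predicate before the analysis starts.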
Data Mining
Turning data into knowledge
• How do these data sets fit together?
• Is there anything important hidden in here?
• Does geography make a difference?
Datatype: Nature of Data Interaction
1. spatial data
2. time data
3. multiple attribute data
4. geography and time data
5. time and multiple attribute data
6. geography and multiple attribute data
7. geography, time, and multiple attribute data
HISTORICALLY these effects have been hidden by research design
BUT the result is often data strangulation:
the patterns are being destroyed or damaged by the research design
What is needed is a geographic data mining technology that works
How can we do this?
• Developing new
smarter methods
• Testing them
– HPC is vital to this process
• Disseminating them
– Internet
– Java
Being SMART is not just a matter of methodology but also involves access, usability, relevancy, and the communication of results
The complete novice should be able to perform some sophisticated geographical analysis and get some useful and understandable results on the same day the work starts
User Friendly Spatial Analysis
• provides analysis that users need
• simple to perform
• highly automated, making it fast and efficient
• readily understood
• results are self-evident and can be communicated to non-experts
• safe and trustworthy
What we did in this study
• Comparison of techniques on the same data
• Multiple techniques
– GAM/K
– GAM/K-T
– MAPEX
– GDM1/2
– FLOCK
– Proprietary Data Mining Tools
Proprietary Data Mining Tools
Study Area (maps: Stan’s cases; Chris’ cases)
How to search the geographic space
• Exhaustively
– GAM, GEM
• Smartly
– Genetic algorithm
• mapex, gdm
– Flocking
• boids
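The exhaustive strategy can be sketched as a GAM-style circle search: lay a regular grid of circle centres over the study area, count the cases and the population at risk inside each circle for several radii, and flag circles whose case count is improbably high under a Poisson model. This is a minimal sketch under stated assumptions: planar coordinates, population supplied as area centroids with counts, a grid spacing of 0.2 times the radius, a 0.002 significance threshold and a minimum of 3 cases per circle. `gam_search` and its parameters are hypothetical names, not the interface of the GAM/K software.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Area = Tuple[float, float, float]          # (x, y, population at risk)
Hit = Tuple[float, float, float, int, float]


def poisson_upper_tail(k: int, mu: float) -> float:
    """P(X >= k) for X ~ Poisson(mu), built term by term to avoid huge factorials."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)                   # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i                     # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def gam_search(cases: List[Point], areas: List[Area],
               bbox: Tuple[float, float, float, float], radii: List[float],
               alpha: float = 0.002, min_cases: int = 3) -> List[Hit]:
    """Exhaustive circle search returning (x, y, radius, observed, expected) for unusual circles."""
    rate = len(cases) / sum(p for _, _, p in areas)   # overall cases per head
    xmin, ymin, xmax, ymax = bbox
    hits: List[Hit] = []
    for r in radii:
        step = 0.2 * r                     # assumed spacing: neighbouring circles overlap heavily
        r2 = r * r
        for i in range(int((xmax - xmin) / step) + 1):
            cx = xmin + i * step
            for j in range(int((ymax - ymin) / step) + 1):
                cy = ymin + j * step
                observed = sum(1 for x, y in cases if (x - cx) ** 2 + (y - cy) ** 2 <= r2)
                if observed < min_cases:
                    continue               # too few cases to be worth testing
                pop = sum(p for x, y, p in areas if (x - cx) ** 2 + (y - cy) ** 2 <= r2)
                expected = pop * rate
                if expected > 0 and poisson_upper_tail(observed, expected) < alpha:
                    hits.append((cx, cy, r, observed, expected))
    return hits


# Toy data: a uniform 10 x 10 grid of areas, 20 cases stacked at one spot, 10 scattered.
areas = [(i + 0.5, j + 0.5, 100.0) for i in range(10) for j in range(10)]
scattered = [(7.5, 7.5), (8.5, 1.5), (1.5, 8.5), (5.5, 5.5), (9.5, 9.5),
             (0.5, 5.0), (6.0, 0.5), (4.0, 9.0), (9.0, 4.5), (3.0, 6.5)]
cases = [(2.5, 2.5)] * 20 + scattered
hits = gam_search(cases, areas, (0.0, 0.0, 10.0, 10.0), radii=[1.0, 2.0])
print(len(hits), "significant circles")
```

Scanning every point for every circle keeps the sketch short but scales badly; a practical version would index the points spatially so that each circle only inspects nearby cases and areas.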
GAM & GEM
Mapex & GDM
FLOCK
And the Attributes...
• Exhaustively
– GAM, GEM
• Smartly
– Genetic algorithm
• mapex, gdm, boids
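For the attribute space, a genetic algorithm can stand in for exhaustive enumeration. The sketch below is a generic GA, not the MAPEX or GDM encoding, which is not described here. It assumes each candidate is a set of k class indices drawn from 8 attributes with 10 classes each (the sizes used in the Combinations of Attributes example), truncation selection with elitism, and a hypothetical fitness that simply counts the cases carrying every class in the combination. All names, parameter values and the planted toy data are assumptions for illustration.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

N_ATTRS, N_CLASSES = 8, 10             # 8 attributes x 10 classes = 80 classes
N_GENES = N_ATTRS * N_CLASSES          # class index g belongs to attribute g // N_CLASSES
Combo = FrozenSet[int]                 # a candidate combination of class indices


def crossover(rng: random.Random, a: Combo, b: Combo, k: int) -> Combo:
    """Child takes k distinct classes drawn from the union of its two parents."""
    return frozenset(rng.sample(sorted(a | b), k))


def mutate(rng: random.Random, c: Combo, rate: float) -> Combo:
    """With probability `rate`, swap one class for a class not already in the combination."""
    if rng.random() >= rate:
        return c
    kept = set(c)
    kept.remove(rng.choice(sorted(kept)))
    kept.add(rng.choice([g for g in range(N_GENES) if g not in c]))
    return frozenset(kept)


def ga_search(fitness: Callable[[Combo], float], k: int = 2, pop_size: int = 40,
              generations: int = 60, mutation_rate: float = 0.3,
              seed: int = 1) -> Tuple[Combo, float]:
    """Evolve k-class combinations, keeping the fitter half as parents each generation."""
    rng = random.Random(seed)
    population = [frozenset(rng.sample(range(N_GENES), k)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # truncation selection
        children = [ranked[0]]                      # elitism: never lose the best so far
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(rng, crossover(rng, a, b, k), mutation_rate))
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


# Toy data: 260 cases, each holding one class per attribute; 60 of them share a planted
# pairing (attribute 0, class 3) with (attribute 4, class 7), i.e. class indices 3 and 47.
data_rng = random.Random(42)
cases: List[FrozenSet[int]] = []
for n in range(260):
    values = [data_rng.randrange(N_CLASSES) for _ in range(N_ATTRS)]
    if n < 60:
        values[0], values[4] = 3, 7
    cases.append(frozenset(a * N_CLASSES + v for a, v in enumerate(values)))


def count_matches(combo: Combo) -> float:
    """Hypothetical fitness: how many cases carry every class in the combination."""
    return float(sum(1 for case in cases if combo <= case))


best, score = ga_search(count_matches)
print(sorted(best), score)
```

With the planted pairing, the search would be expected to drift towards classes 3 and 47, though a GA gives no guarantee of finding the optimum. Raw match counts also favour common classes; a real search would score each combination against the count expected by chance.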
GAM & GEM with time
Geology Map (figure: areas of Rock A, Rock B, Rock C and Rock D)
Combined Geology and Railway Buffer Map (figure: the rock areas overlaid with a 2 km buffer polygon around the railway)
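Building the combined map amounts to tagging each case with two "place" attributes: the rock polygon it falls in and whether it lies within 2 km of the railway. This is a minimal sketch, assuming planar coordinates in metres, polygons stored as vertex lists and the railway as a polyline. The geometric tests (even-odd ray casting and point-to-segment distance) are standard, while `overlay` and the toy squares are hypothetical. A GIS package would normally build the buffer polygon once rather than measure distances case by case.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:                              # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, clamping to the segment's end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_buffer(p: Point, line: List[Point], width: float) -> bool:
    """True if p is within `width` of any segment of the polyline."""
    return any(point_segment_distance(p, a, b) <= width for a, b in zip(line, line[1:]))


def in_polygon(p: Point, poly: List[Point]) -> bool:
    """Even-odd ray casting; points lying exactly on an edge may fall either way."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):                     # edge straddles the horizontal ray
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def overlay(p: Point, rocks: Dict[str, List[Point]], railway: List[Point],
            width: float = 2000.0) -> Tuple[str, bool]:
    """Tag a case location with its rock type and whether it lies in the railway buffer."""
    rock = next((name for name, poly in rocks.items() if in_polygon(p, poly)), "unknown")
    return rock, in_buffer(p, railway, width)


# Toy geology: four 5 km squares (coordinates in metres) and an east-west railway.
rocks = {
    "Rock A": [(0, 0), (5000, 0), (5000, 5000), (0, 5000)],
    "Rock B": [(5000, 0), (10000, 0), (10000, 5000), (5000, 5000)],
    "Rock C": [(0, 5000), (5000, 5000), (5000, 10000), (0, 10000)],
    "Rock D": [(5000, 5000), (10000, 5000), (10000, 10000), (5000, 10000)],
}
railway = [(0.0, 5000.0), (10000.0, 5000.0)]
print(overlay((1000.0, 1000.0), rocks, railway))   # ('Rock A', False)
print(overlay((6000.0, 6000.0), rocks, railway))   # ('Rock D', True)
```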
Combinations of Attributes
• If we have 8 attributes with 10 classes each (80 classes in all)
• There are 3,160 combinations of 2 classes from 80, compared with 24,040,016 if any 5 are used
• Smart searches are essential
– use a GA to generate possible combinations of interest
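The two counts on this slide are binomial coefficients, C(80, 2) and C(80, 5), so they count unordered combinations of classes rather than permutations. A quick check in Python (3.8 or later, for `math.comb` and `math.perm`), with the ordered count for two classes shown for contrast:

```python
from math import comb, perm

n_classes = 8 * 10                  # 8 attributes with 10 classes each
pairs = comb(n_classes, 2)          # unordered choices of 2 classes from 80
fives = comb(n_classes, 5)          # unordered choices of 5 classes from 80
ordered_pairs = perm(n_classes, 2)  # ordered choices of 2, for contrast
print(pairs, fives, ordered_pairs)  # 3160 24040016 6320
```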
Proprietary Data Miners
Results
How to visualise them?
Results
• GAM/K
– did very well
– was not put off by time or attributes
• GAM/K-T
– worked well
– time clusters found
• MAPEX / GDM/1
– worked well
Results continued
• FLOCK
– worked very well
• Data mining
– didn’t work at all well out of the box
– could have built a GAM inside them
What next?
• Build a harder data set for more tests
• Re-run the analysis
• Put it all on the web
Thanks to
• European Research Office of the US Army
• ESRC grant R237260 for paying Ian’s salary.
• ESRC/JISC for the Census data purchase.
• OS for the bits of the maps they own.
To find out more
• Web-based Multi-engine spatial analysis tools, James Macgill, Openshaw and Turton
– Session 1A - 14.00 Sunday
• Smart Crime Pattern Analysis using GAM, Ian Turton, Openshaw and Macgill
– Session 7A - 10.40 Tuesday
Contacts
Email ian,stan,[email protected]
check out smart pattern analysis on the web
http://www.ccg.leeds.ac.uk/smart
http://www.ccg.leeds.ac.uk/smart/hyper.doc