Geographic Data Mining
Paradigms for Spatial and Spatiotemporal Data Mining
About …
• Mining from spatial and spatio-temporal
data
• Meta-mining as a discovery process
paradigm
• Processes for theory/hypothesis
management
2
Mining from spatial and
spatio-temporal data
• Rule types
• Spatial vs. spatio-temporal data
• Handling second-hand data
3
Rule types
• Spatio-temporal associations
• Spatio-temporal generalization
• Spatio-temporal clustering
• Evolution rules
• Meta-rules
4
Spatio-temporal
associations
• X -> Y (c%, s%)
• Require the use of spatial and temporal
predicates
• For temporal association rules, the
emphasis moves from the data itself to
changes in the data
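
As an illustration of the X -> Y (c%, s%) notation, the following minimal sketch computes confidence and support for one rule whose antecedent combines a spatial and a temporal predicate. The record fields (`region`, `season`, `rain_mm`), the sample observations and the dryness threshold are assumptions made up for this example; they do not come from the slides.

```python
def rule_stats(records, antecedent, consequent):
    """Return (confidence, support) of the rule antecedent -> consequent."""
    records = list(records)
    total = len(records)
    n_x = sum(1 for r in records if antecedent(r))
    n_xy = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_xy / total if total else 0.0   # share of all records matching X and Y
    confidence = n_xy / n_x if n_x else 0.0    # share of X records that also match Y
    return confidence, support


# Hypothetical observations, invented for illustration only.
observations = [
    {"region": "desert_fringe", "season": "summer", "rain_mm": 2},
    {"region": "desert_fringe", "season": "summer", "rain_mm": 0},
    {"region": "desert_fringe", "season": "winter", "rain_mm": 30},
    {"region": "coast", "season": "summer", "rain_mm": 45},
]


def near_desert_in_summer(r):
    # Spatial predicate (region) combined with a temporal predicate (season).
    return r["region"] == "desert_fringe" and r["season"] == "summer"


def dry(r):
    # Assumed threshold for "dry"; purely illustrative.
    return r["rain_mm"] < 5


confidence, support = rule_stats(observations, near_desert_in_summer, dry)
print(f"confidence={confidence:.0%}, support={support:.0%}")
```

For a temporal rule in the sense of the last bullet, the same function could be applied to records that describe changes between two dates rather than raw values.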
5
Spatio-temporal
generalization
• Concept hierarchies are used to
aggregate data
• Spatial-data-dominant
– ‘South Australian summers are commonly
hot and dry’
• Nonspatial-data-dominant
– ‘Hot dry summers are often experienced by
areas close to large desert systems’
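
A small sketch of how a concept hierarchy might be used to aggregate data into a spatial-data-dominant generalization. The place-to-region mapping, the 30 °C boundary for "hot" and the sample records are all hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical spatial concept hierarchy: place -> region.
PLACE_HIERARCHY = {
    "Adelaide": "South Australia",
    "Port Augusta": "South Australia",
    "Hobart": "Tasmania",
}


def temperature_level(celsius):
    # Hypothetical nonspatial hierarchy: raw temperature -> coarse label.
    return "hot" if celsius >= 30 else "mild"


def generalize(records):
    """Roll (place, season, temperature) records up to (region, season) level."""
    summary = defaultdict(Counter)
    for place, season, celsius in records:
        region = PLACE_HIERARCHY[place]
        summary[(region, season)][temperature_level(celsius)] += 1
    return summary


# Invented sample data.
records = [
    ("Adelaide", "summer", 36),
    ("Port Augusta", "summer", 39),
    ("Adelaide", "summer", 28),
    ("Hobart", "summer", 22),
]

summary = generalize(records)
# Most frequent temperature label per (region, season) group.
dominant = {key: counts.most_common(1)[0][0] for key, counts in summary.items()}
print(dominant)
```

Grouping by the temperature label first and collecting regions second would give the nonspatial-data-dominant form.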
6
Spatio-temporal clustering
• Similar to normal clustering
• Far more complex
• Characteristic features of objects in a
spatio-temporal region OR spatio-temporal
characteristics of a set of objects are sought
7
Evolution rules
• Explicit temporal and spatial context
• Describes the manner in which spatial
entities change over time
• Exponential number of rules can be
generated
– Example predicates
8
Example predicates
• Follows
– One cluster of objects traces the same (or
similar) spatial route as another cluster at a
later time (spatial coordinates are fixed)
9
Example predicates
• Follows
• Coincides
– One cluster of objects traces the same (or
similar) spatial path whenever a second
cluster undergoes specified activity
(temporal coordinates are fixed)
10
Example predicates
• Follows
• Coincides
• Parallels
– One cluster of objects traces the same (or a
similar) spatial pattern but offset in space
(temporal coordinates are fixed)
11
Example predicates
• Follows
• Coincides
• Parallels
• Mutates
– One cluster of objects transforms itself into
a second cluster
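
Two of the predicates above can be sketched in code, under strong simplifying assumptions: each cluster is reduced to a track of centroid positions indexed by integer time step, "similar" means within a fixed distance tolerance, and the lag for Follows is supplied by the caller. None of these choices come from the slides.

```python
from math import hypot

# A track maps an integer time step to the (x, y) centroid of a cluster.


def follows(leader, follower, lag, tol=1.0):
    """True if follower retraces leader's route `lag` steps later (same places)."""
    if lag <= 0:
        return False
    return all(
        t + lag in follower
        and hypot(follower[t + lag][0] - x, follower[t + lag][1] - y) <= tol
        for t, (x, y) in leader.items()
    )


def parallels(a, b, tol=1.0):
    """True if b traces a's pattern at the same times, shifted by a constant offset."""
    times = sorted(a)
    if not times or set(a) != set(b):
        return False
    t0 = times[0]
    dx, dy = b[t0][0] - a[t0][0], b[t0][1] - a[t0][1]
    if hypot(dx, dy) <= tol:
        return False  # no real spatial offset: the tracks simply overlap
    return all(
        hypot(b[t][0] - a[t][0] - dx, b[t][1] - a[t][1] - dy) <= tol for t in times
    )


# Invented tracks for illustration.
herd_a = {0: (0.0, 0.0), 1: (1.0, 1.0), 2: (2.0, 1.0)}
herd_b = {3: (0.0, 0.2), 4: (1.0, 1.1), 5: (2.0, 0.9)}   # same route, 3 steps later
herd_c = {0: (0.0, 5.0), 1: (1.0, 6.0), 2: (2.0, 6.0)}   # same times, shifted north

print(follows(herd_a, herd_b, lag=3), parallels(herd_a, herd_c))
```

Testing every pair of clusters against every candidate lag and offset is one way the number of evolution rules grows so quickly.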
12
Meta-rules
• Created when rule sets rather than
datasets are inspected for trends and
coincidental behaviour
• Describe observations discovered
amongst sets of rules
– The support for suggestion X is increasing
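
A meta-rule such as "the support for suggestion X is increasing" could be derived by inspecting rule sets from successive mining runs rather than the data itself. The sketch below assumes each run is stored as a mapping from a rule label to its support; the labels and values are invented.

```python
def support_trend(rule, rule_sets):
    """Classify how the support of `rule` changes across ordered rule sets."""
    series = [rules[rule] for rules in rule_sets if rule in rules]
    if len(series) < 2:
        return "insufficient data"
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    return "no clear trend"


# Hypothetical rule sets from three successive mining runs (rule -> support).
runs = [
    {"X": 0.10, "Y": 0.30},
    {"X": 0.14, "Y": 0.25},
    {"X": 0.21, "Y": 0.31},
]

print(support_trend("X", runs))  # support of X rises in every run
print(support_trend("Y", runs))  # support of Y falls, then rises
```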
13
Spatial vs. Spatiotemporal data
• Dimensioning-up
• Time: uni-directional and linear
– Relational concepts (before, during, etc.)
are easily understood, communicated and
accommodated
• Space: bi-directional and nonlinear
14
Spatial vs. Spatiotemporal data
• Time & space: both continuous
phenomena
– Time: discrete and isomorphic with integers
• Larger granularity often selected (days,
years, etc.)
– Space: isomorphic with real numbers
• Granularity generally smaller (relative to
the domain)
15
Spatial vs. Spatiotemporal data
• Dimensioning-up strategies work poorly
• Are accepted data mining procedures
flawed?
• No: Time scale differences between
data types generally match
characteristics we wish to include in
most analyses of land-cover
16
Spatial vs. Spatiotemporal data
• For example
– Spectral time slice provides discrimination
between vegetation types
– Environmental data provide long-term
conditions which match germination, growth
and development of largest plants
• Very often, too little consideration is
given to the appropriate temporal
scales necessary
17
Spatial vs. Spatiotemporal data
• Example
– Monitoring of wetlands in dry tropics
– The extent of these land-cover elements
varies considerably through time
– Inter-annual variability in extent is greater
than the average annual variability
– A spectral image without annual and seasonal
context, and without monthly rainfall and
evaporation figures, is meaningless
18
Spatial vs. Spatiotemporal data
• Temporal scales used in conjunction
with spatial data are often inconsistent ->
they need to be chosen more carefully
• Considerably better results will be
achieved by a considered re-coding of
the temporal data
– Palaeo-climate reconstruction
demonstrates: Time can be associated with
the relative positioning of the Earth, the
Sun and the major planets in space
19
Spatial vs. Spatiotemporal data
• Time is a spatial phenomenon
• A point on the Earth's surface (latitude,
longitude, elevation) is not static in
space, but moving through a complex
energy environment
• This movement, together with the dynamics
of the energy environment, is ‘time’
20
Spatial vs. Spatiotemporal data
• Three main components to the
environmental energy field
– Gravity
– Radiation
– Magnetism
• Feedback relationships: Time
21
Spatial vs. Spatiotemporal data
• Most important relationships
– Relative positions of a point on the surface
of the Earth and the Sun (Diurnal cycle)
– Orbit of the Moon around the Earth
– Orbit of the Earth around the Sun
• These relationships are strongly
connected with our natural, cultural
and economic environments
22
Spatial vs. Spatiotemporal data
• Other relationships
– Solar day: This sweeps a pattern of four
solar magnetic sectors past the Earth in
about 27 days. This correlates with a
fluctuation in the generation of low-pressure systems
23
Spatial vs. Spatiotemporal data
• Other relationships
– Solar day: 27 days
– Lunar cycle: This is a 27.3-day period in the
declination of the moon during which it
moves north for 13.65 days and south for
13.65 days. This correlates with certain
movements of pressure systems on Earth
24
Spatial vs. Spatiotemporal data
• Other relationships
– Solar day: 27 days
– Lunar cycle: 27.3-day period
– Solar year: The orbit of the sun around the
center of the solar system. This cycle
correlates with long-term variation in a
large number of natural, cultural and
economic indices
25
Spatial vs. Spatiotemporal data
• These relate to both the Earth’s energy
environment and the sorts of scales we
are most concerned with in data mining
• Recoding the time stamp on data to a
relevant continuous variable (e.g. time of
the Solar year) gives most
‘intelligent’ data mining software a
considerably better chance of
identifying important relationships in
spatio-temporal data
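
One possible way to recode a time stamp as a continuous variable is to express it as a phase within a cycle of known period, then map the phase onto sine and cosine so that the end of one cycle sits next to the start of the next. The sketch below uses the 27.3-day lunar period mentioned earlier; the reference epoch is an arbitrary assumption, so phase zero has no astronomical meaning here.

```python
from datetime import datetime, timedelta, timezone
from math import cos, pi, sin

# Arbitrary reference epoch (an assumption, not an astronomical event).
EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def cycle_phase(timestamp, period_days, epoch=EPOCH):
    """Position of `timestamp` within a repeating cycle, as a value in [0, 1)."""
    elapsed_days = (timestamp - epoch).total_seconds() / 86400.0
    return (elapsed_days % period_days) / period_days


def cyclic_features(timestamp, period_days, epoch=EPOCH):
    """Encode the phase as (sin, cos) so the cycle has no artificial break."""
    angle = 2 * pi * cycle_phase(timestamp, period_days, epoch)
    return sin(angle), cos(angle)


LUNAR_PERIOD_DAYS = 27.3  # period quoted in the slides

# A time stamp half a lunar period after the assumed epoch.
sample = EPOCH + timedelta(days=LUNAR_PERIOD_DAYS / 2)
phase = cycle_phase(sample, LUNAR_PERIOD_DAYS)
features = cyclic_features(sample, LUNAR_PERIOD_DAYS)
print(round(phase, 3), tuple(round(v, 3) for v in features))
```

The same functions could be reused with any other period of interest; only `period_days` and a meaningful epoch would need to change.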
26
Handling second-hand
data
• The need to re-use data collected for
other purposes
• Few data collection methods take into
account the non-deterministic nature of
data mining
• Results in heterogeneous data
sources being brought together
27
Possible errors
• The rules reflect the heterogeneity of
the data sets rather than any
differences in the observed
phenomenon.
• The data sets being temporally
incompatible
• The collection methods being
incompatible
28
Meta-mining as a
discovery process
paradigm
• Target of mining: traditionally data itself
• Increase in data & polynomial
complexity of many mining algorithms
– Extraction of useful rules becomes difficult
• A solution: mine from either summaries
of the data or from results of previous
mining exercises
30
Meta-mining as a
discovery process
paradigm
• For each rule generated some
‘irrelevant’ data is removed
– Support and confidence ratings must be
taken into account
– Clusters may use criteria that may mask
important outlying facts
32
Processes for
theory/hypothesis
management
• Analysis of geographic, geo-social,
socio-political and environmental issues
requires a more formal, strongly ethically
driven approach
– Environmental science uses a formal
scientific experimentation process requiring
the formulation and refutation of a credible
null hypothesis
34
Processes for
theory/hypothesis
management
• Data mining over the past few years
– Largely oriented towards the discovery of
previously unknown but potentially useful
rules
– Some useful rules can be mined
– Potential for either logical or statistical error
is extremely high
– Result of much data mining is at best a set
of suggested topics for further investigation
35
The process of scientific
induction
• Two distinct forms of knowledge
discovery
– Process modeling approach: Real world is
modeled in a mathematical manner
– Pattern matching approach: Prediction is
made on past experience
• Data mining is the latter
36
The process of scientific
induction
• Another view of scientific induction
– Given an infinitely large hypothesis space
– Rules extracted from data are used to
constrain the hypothesis space
• Very complex (search space is
exponential)
– Less than useful answers or high
computational overhead
38
Using data mining to
support scientific induction
• Develop hypotheses that will constrain
the search space by defining areas
within which the search is to take place
– Starting point: user supplied conceptual
model
– Hypothesis supported: weight is added to
confidence of conceptual model
– Hypothesis not supported: change to
conceptual model or need for external input
is indicated
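
The loop described above might look roughly like the following sketch. It assumes a conceptual model is a named collection of hypotheses, each an antecedent/consequent pair of predicates, and that "supported" means the rule's confidence on the data reaches a chosen threshold. The class, the threshold, the wetland records and both models are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualModel:
    name: str
    hypotheses: dict                      # label -> (antecedent, consequent)
    confidence: float = 0.0               # weight accumulated from supported hypotheses
    needs_revision: list = field(default_factory=list)


def evaluate(model, records, min_confidence=0.7):
    """Test only the model's own hypotheses, so the search stays directed."""
    for label, (antecedent, consequent) in model.hypotheses.items():
        matching = [r for r in records if antecedent(r)]
        hits = sum(1 for r in matching if consequent(r))
        rule_confidence = hits / len(matching) if matching else 0.0
        if rule_confidence >= min_confidence:
            model.confidence += rule_confidence   # hypothesis supported
        else:
            model.needs_revision.append(label)    # change or external input needed
    return model


# Invented monthly wetland observations.
records = [
    {"rain_mm": 120, "extent": "large"},
    {"rain_mm": 150, "extent": "large"},
    {"rain_mm": 130, "extent": "large"},
    {"rain_mm": 10, "extent": "small"},
    {"rain_mm": 5, "extent": "large"},
]

model_a = ConceptualModel(
    "rain-driven",
    {"wet_implies_large": (lambda r: r["rain_mm"] > 100, lambda r: r["extent"] == "large")},
)
model_b = ConceptualModel(
    "rain-independent",
    {"dry_implies_large": (lambda r: r["rain_mm"] < 20, lambda r: r["extent"] == "large")},
)

ranked = sorted(
    (evaluate(m, records) for m in (model_a, model_b)),
    key=lambda m: m.confidence,
    reverse=True,
)
print([(m.name, m.confidence, m.needs_revision) for m in ranked])
```

Sorting candidate models by accumulated weight is only one way to compare them; a real system would also need a rule for how a flagged hypothesis feeds back into the model.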
39
Using data mining to
support scientific induction
• Three aspects of interest
– Able to accept alternative conceptual
models and provide a ranking. Also allows for
modification to a conceptual model
– Hypothesis generation component may yield
new unexplored insights into accepted
conceptual models
– Reasonably efficient because of directed
mining algorithms
41