Co-location pattern mining (for CSCI 5715) Charandeep Parisineti
Co-location pattern mining
(for CSCI 5715)
Charandeep Parisineti,
Bhavtosh Rath
Chapter 7: Spatial Data Mining
[1] Yan Huang, Shashi Shekhar, Hui Xiong. Discovering Co-location Patterns from Spatial Datasets: A General Approach. IEEE Transactions on Knowledge and Data Engineering, 2004.
[2] Sajib Barua, Jörg Sander. Mining Statistically Significant Co-location and Segregation Patterns. IEEE Transactions on Knowledge and Data Engineering, 26(5), 2014.
[3] Shashi Shekhar, et al. Trends in Spatial Data Mining.
Where do we find co-location?
In ecology
Symbiotic species: oxpecker and giraffe
Public health
To determine possible causes of a disease outbreak
(e.g., the 1854 London cholera outbreak)
In cities
{‘auto dealers’, ‘auto repair shops’}
{‘department stores’, ‘gift stores’}
Other domains: Earth science, public safety, transportation, tourism, etc.
What are association rules?
Association rules vs Co-location rules
Class Exercise:
Co-location rule mining approaches
(b) Reference feature centric approach
(c) Data partition approach
(d) Event centric model
• Co-location {A, B} will not be found in (b), since it does not involve the reference feature.
• The data partition approach admits many distinct ways of partitioning the data, each yielding a different set of transactions and hence a different support.
In (c), the support of {A, B} differs across partitions.
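The sensitivity of the data partition approach to the choice of partition can be seen in a minimal Python sketch. It assumes, hypothetically, that each cell of a square grid forms one transaction; the helper names `grid_transactions` and `support` and the toy points are illustrative and not taken from the source.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[str, float, float]  # (feature type, x, y)


def grid_transactions(points: List[Point], cell: float,
                      offset: float = 0.0) -> List[Set[str]]:
    """Partition space into square cells of side `cell`, shifted by `offset`.

    Each non-empty cell becomes one transaction: the set of feature types
    whose instances fall inside it (hypothetical partitioning scheme).
    """
    cells: Dict[Tuple[int, int], Set[str]] = {}
    for feature, x, y in points:
        key = (int((x - offset) // cell), int((y - offset) // cell))
        cells.setdefault(key, set()).add(feature)
    return list(cells.values())


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions containing every feature in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Hypothetical data: two A instances, each with a B instance 0.2 units away.
points = [("A", 0.9, 0.5), ("B", 1.1, 0.5),
          ("A", 2.9, 2.5), ("B", 3.1, 2.5)]

for offset in (0.0, 1.0):
    txns = grid_transactions(points, cell=2.0, offset=offset)
    print(offset, support(txns, {"A", "B"}))
# offset 0.0: each A-B pair shares a cell -> support 1.0
# offset 1.0: cell boundaries at x = 1 and x = 3 split both pairs -> support 0.0
```

The points and their neighbor structure are identical in both runs; only the grid shifts, yet the support of {A, B} goes from 1.0 to 0.0.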
The event centric model finds the subsets of spatial features likely to occur in a neighborhood around instances of given subsets of event types.
Event Centric model
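The event centric model is usually quantified with the participation ratio and participation index (PI) of [1]: a row instance of a co-location is a set of instances, one per feature, that are all neighbors; the participation ratio of a feature is the fraction of its instances that appear in some row instance; and the PI is the minimum participation ratio over the pattern's features. A minimal Python sketch for a two-feature pattern follows, assuming a Euclidean distance threshold `d` as the neighbor relation; the function name and the data are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def participation_index(a_pts: List[Point], b_pts: List[Point],
                        d: float) -> Tuple[float, float, float]:
    """Return (pr_A, pr_B, PI) for the size-2 co-location {A, B}.

    A row instance of {A, B} is a pair (a, b) with dist(a, b) <= d.
    pr_A is the fraction of A instances appearing in at least one row
    instance; pr_B is defined the same way; PI is their minimum.
    """
    a_hit, b_hit = set(), set()
    for i, a in enumerate(a_pts):
        for j, b in enumerate(b_pts):
            if math.dist(a, b) <= d:  # assumed Euclidean neighbor relation
                a_hit.add(i)
                b_hit.add(j)
    pr_a = len(a_hit) / len(a_pts)
    pr_b = len(b_hit) / len(b_pts)
    return pr_a, pr_b, min(pr_a, pr_b)


# Hypothetical toy data: only the first A has a B neighbor.
A = [(0.0, 0.0), (5.0, 5.0)]
B = [(0.5, 0.0), (10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
print(participation_index(A, B, d=1.0))  # (0.5, 0.25, 0.25)
```

Because a participation ratio can only shrink when more features are added to a pattern, taking the minimum keeps the PI from increasing as patterns grow, which is what lets a PI threshold prune larger candidates.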
Drawbacks of event centric model
• In (a), there are only a few instances of A, but B is abundant.
Because many instances of B have no A nearby, B’s participation ratio is small; since the participation index is the minimum of the participation ratios, the participation index of {A, B} is low.
• In (b), A and B are abundant in a spatial area but randomly distributed.
We may see enough instances of {A, B} to pass the threshold even without any true spatial dependency.
• In (c), both features A and B are spatially auto-correlated, and a cluster of A and a cluster of B happen to overlap by chance.
The resulting instances of {A, B} are enough to falsely report a spatial co-location.
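Drawback (a) can be made concrete with hypothetical counts (not from the source): suppose there are 5 instances of A, each with a B neighbor, and 100 instances of B, of which only 5 take part in a row instance of {A, B}. Writing pr for the participation ratio and PI for the participation index:

```latex
\[
\mathrm{pr}(\{A,B\}, A) = \frac{5}{5} = 1, \qquad
\mathrm{pr}(\{A,B\}, B) = \frac{5}{100} = 0.05,
\]
\[
\mathrm{PI}(\{A,B\}) = \min\bigl(\mathrm{pr}(\{A,B\}, A),\ \mathrm{pr}(\{A,B\}, B)\bigr) = 0.05 .
\]
```

Every A has a B nearby, yet any global PI threshold above 0.05 would reject {A, B}.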
Statistical approaches
In association rule mining, similar problems are handled with interest measures such as the phi coefficient, computed from a 2×2 contingency table:
          Coffee   ~Coffee
Tea         15        5
~Tea        75        5
~Coffee denotes the transactions that do not contain coffee (and ~Tea those that do not contain tea).
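As a worked check of the tea/coffee table, here is a minimal Python sketch of the phi coefficient and the odds ratio, using the standard 2×2 formulas; the function names are illustrative. The rule Tea → Coffee has confidence 15/20 = 0.75, which looks strong, but 90 of the 100 transactions contain coffee, so tea buyers are actually less likely than average to buy coffee; the negative phi and an odds ratio below 1 both expose this.

```python
import math


def phi_coefficient(f11: int, f10: int, f01: int, f00: int) -> float:
    """Phi coefficient of a 2x2 contingency table.

    f11 = both items, f10 = first item only,
    f01 = second item only, f00 = neither item.
    """
    row1, row0 = f11 + f10, f01 + f00  # first item present / absent
    col1, col0 = f11 + f01, f10 + f00  # second item present / absent
    return (f11 * f00 - f10 * f01) / math.sqrt(row1 * row0 * col1 * col0)


def odds_ratio(f11: int, f10: int, f01: int, f00: int) -> float:
    """Odds ratio (f11 * f00) / (f10 * f01); undefined if f10 or f01 is 0."""
    return (f11 * f00) / (f10 * f01)


# Tea/coffee table: first item = Tea, second item = Coffee.
print(phi_coefficient(15, 5, 75, 5))  # -0.25
print(odds_ratio(15, 5, 75, 5))       # 0.2
```

Both measures depend on f00 and f01, the counts of transactions lacking an item, which is exactly what has no natural counterpart in continuous space.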
The absence of an item does not make sense in spatial data, because Boolean spatial features are embedded in continuous space rather than in a fixed set of transactions.
Hence the phi coefficient, odds ratio, and similar measures used in traditional data mining do not work for spatial data.
The basic idea of the spatial statistical approach is to compare the observed PI value with the PI value expected under no spatial relationship, instead of with a global threshold.
That expected value is estimated with Monte Carlo simulation: the process is repeated over many simulated datasets generated under complete spatial randomness (CSR).
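A minimal Python sketch of this Monte Carlo test for a pattern {A, B} follows. It assumes a rectangular study area, a Euclidean distance threshold `d` as the neighbor relation, and a null model of complete spatial randomness (CSR) in which A and B are placed independently and uniformly at random; all names and data are hypothetical. It uses the standard Monte Carlo p-value (number of runs with PI ≥ observed, plus 1) / (runs + 1).

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def participation_index(a_pts: List[Point], b_pts: List[Point],
                        d: float) -> float:
    """PI of {A, B}: min of the two participation ratios (brute force)."""
    a_hit, b_hit = set(), set()
    for i, a in enumerate(a_pts):
        for j, b in enumerate(b_pts):
            if math.dist(a, b) <= d:  # assumed Euclidean neighbor relation
                a_hit.add(i)
                b_hit.add(j)
    return min(len(a_hit) / len(a_pts), len(b_hit) / len(b_pts))


def csr_points(n: int, width: float, height: float,
               rng: random.Random) -> List[Point]:
    """n points uniformly at random in [0, width] x [0, height] (CSR)."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def pi_p_value(a_pts: List[Point], b_pts: List[Point], d: float,
               width: float, height: float,
               runs: int = 99, seed: int = 0) -> float:
    """Monte Carlo p-value of the observed PI under CSR.

    Each run places as many A's and B's as observed, independently and
    uniformly at random, and recomputes the PI.
    """
    rng = random.Random(seed)
    observed = participation_index(a_pts, b_pts, d)
    at_least = 0
    for _ in range(runs):
        sim_a = csr_points(len(a_pts), width, height, rng)
        sim_b = csr_points(len(b_pts), width, height, rng)
        if participation_index(sim_a, sim_b, d) >= observed:
            at_least += 1
    return (at_least + 1) / (runs + 1)


# Hypothetical data: every B sits 0.1 units to the right of an A.
gen = random.Random(42)
A = [(gen.uniform(0.5, 9.5), gen.uniform(0.5, 9.5)) for _ in range(20)]
B = [(x + 0.1, y) for x, y in A]
print(participation_index(A, B, d=0.2))                   # 1.0
print(pi_p_value(A, B, d=0.2, width=10.0, height=10.0))   # 0.01
```

No CSR run comes close to PI = 1.0 here, so the p-value is the smallest possible with 99 runs. Plain CSR ignores spatial auto-correlation within a single feature, so on its own this null model does not address drawback (c).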
Evaluating the cross-K function for all possible co-location patterns can be computationally expensive, since the number of candidate patterns grows exponentially with the number of features.
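Part of the cost shows up already for a single pair of features. For features A and B, the cross-K function K_AB(h) is the expected number of B instances within distance h of a typical A instance, divided by the density of B; under CSR with independent features it equals πh². The Python sketch below is a naive estimator that assumes a known study-area size `area` and applies no edge correction (a real analysis would add one); `cross_k` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cross_k(a_pts: List[Point], b_pts: List[Point],
            h: float, area: float) -> float:
    """Naive cross-K estimate for features A and B at distance h.

    No edge correction. Every (a, b) pair is checked, so one call costs
    O(len(a_pts) * len(b_pts)) distance computations.
    """
    pairs = sum(1 for a in a_pts for b in b_pts if math.dist(a, b) <= h)
    return area * pairs / (len(a_pts) * len(b_pts))


a = [(0.0, 0.0)]
b = [(0.5, 0.0), (2.0, 0.0)]
print(cross_k(a, b, h=1.0, area=4.0))  # 2.0
print(math.pi * 1.0 ** 2)              # CSR reference value pi * h^2
```

Each such quadratic pass is repeated for every distance h of interest, every candidate pattern, and, when significance is assessed by simulation, every simulated dataset.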
Questions?