Oracle Spatial 8.2 Projects Database Functions


Spatial-enabled Mining in Oracle
Ravi Kothuri
Spatial Technologies
Oracle USA
Dagstuhl 2006
Copyright Oracle Corporation
Oracle Spatial: Store, Analyze and Visualize Spatial Data
[Diagram: Oracle10g Spatial]
• Spatial Data Types: Vector (feature/topological), Raster, Network types
• Capabilities: Spatial Relationships, Route Computation, Raster Manipulation, Versioning
• Visualization: Mapviewer
Scalability & Seamless Integration for Spatial Data
Oracle Spatial: Future Projects
• 3-D
– Extensions to SDO_GEOMETRY
• Composite Surface and Composite/Multi-Solid
• Support different operators: Anyinteract, Filter, NN,
Within_distance
– Scalable Storage and Management of PointCloud Data:
Partitioning and Visibility Query (LOD)
• TIN generation: need to experiment with variety
of approaches
• Intelligent Map Caching, WFS,…
Oracle Data Mining
• Preprocessing, data clean up: number of
transformations, normalization functions
– Binning, Spatial Binning,…
• Data Mining Functions:
– Classification: Decision Trees, Adaptive Bayes,…
– Clustering: KMeans, KModes, Oracle-specific
• Spatial: BIRCH+Agglomerative Clustering
– Association Rules: Apriori
– Regression:
• SVM with linear kernel and more…
Robust Framework for Mining Data in Oracle
Spatial Data Mining
• Where result patterns have a spatial component
– Clustering
– Colocation of data items
• Spatial-enabled: Include Spatial Info in Data Mining
– Information is implicit (not materialized)
– What information to materialize?
• Spatial correlation with target data (e.g., habitats of birds)
• Spatial auto-correlation in Regression
– Target variable: Y = a X + p W Y
– where p is the spatial autocorrelation coefficient and W is the neighborhood matrix
• First step: materialize target variable estimates
– How to incorporate spatial auto-correlation
• Materialize spatial information, estimates as additional attributes
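The spatial term W Y in the regression above can be materialized as one extra attribute per row. The following is a minimal Python sketch of that idea, not Oracle's implementation; the function names, the choice of a row-normalized 0/1 neighborhood matrix, and the toy values are all assumptions made for illustration.

```python
def row_normalize(adjacency):
    """Scale each row of a 0/1 adjacency matrix so it sums to 1.

    Rows with no neighbors are left as all zeros.
    """
    normalized = []
    for row in adjacency:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def spatial_lag(W, y):
    """Return W*y: for each location, the weighted average of its neighbors' y."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


# Toy data (hypothetical): location 0 neighbors 1 and 2; 1 and 2 neighbor only 0.
adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
y = [10.0, 20.0, 40.0]

W = row_normalize(adjacency)
lag = spatial_lag(W, y)  # one extra attribute per location
print(lag)
```

Each value in `lag` can then be stored as an additional column next to the original attributes before the table is handed to a mining function.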
Materializing Neighborhood Influence
• Compute a weighted-sum of interesting
information (target variable, other attributes)
from neighbors
– E.g., if you are estimating CRIME for a region/point T, take a “distance-based” weighted sum of the crime of its neighbors.
– Additionally, you can also estimate population in a 10-mile radius (based on race), etc.
C(T) = [ C(A)/d(A,T) + C(B)/d(B,T) ] / [ 1/d(A,T) + 1/d(B,T) ]
(A and B are neighbors of T; d is distance)
– Oracle Spatial provides specific functions to
compute such neighborhood-based estimates
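The weighted sum C(T) above is an inverse-distance weighted average. A minimal Python sketch follows, assuming planar coordinates and Euclidean distance; `idw_estimate` and the sample values are hypothetical names and data, and this does not show the Oracle Spatial functions the slide refers to.

```python
import math


def idw_estimate(target, neighbors):
    """Inverse-distance weighted estimate at `target`.

    neighbors: list of ((x, y), value) pairs, e.g. crime values at A and B.
    Computes sum(value/d) / sum(1/d), matching the C(T) formula.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in neighbors:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            # A neighbor at the target location: use its value directly.
            return value
        weighted_sum += value / d
        weight_total += 1.0 / d
    if weight_total == 0:
        raise ValueError("at least one neighbor is required")
    return weighted_sum / weight_total


# Toy data (hypothetical): A is 1 unit from T, B is 2 units from T.
T = (0.0, 0.0)
neighbors = [((1.0, 0.0), 30.0), ((0.0, 2.0), 60.0)]
estimate = idw_estimate(T, neighbors)
print(estimate)
```

The closer neighbor A contributes with twice the weight of B, so the estimate is pulled toward C(A).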
Spatial-enabled Mining
[Diagram: Table + Neighborhood Estimates (e.g., population within 2 miles, crime in neighborhood, ...) -> Augmented Table -> Oracle Data Mining -> Mining Results]
Spatial-enabled Mining
[Diagram: Mapviewer and applications built on ODM (Classification, Regression, Association Rules, ...) and on Spatial Analysis building blocks (Spatial Binning, Spatial Estimates, Clustering for polygons using BIRCH+agglomerative)]
Case Study for Spatial-enabled Mining:
How helpful are these estimates?
• Test on a specific dataset
– US Block groups from Census for CA (21K)
– Crime Data for US Blockgroups (from a partner
company)
• Crimerate is number of crimes per 1000 of population
– Separate the data into TRAINING data and TEST
data
– Compute Data Mining models using TRAINING
data
Evaluation
• Predict Crime for TEST regions with and without
spatial estimates using ODM’s Mining functions
– Test Regions: 450 locations in San Francisco area
– Classification (Adaptive Bayes Network)
• Create bins or “classes” of the data and results
• See how well the model predicts the “class” for new test regions
– Regression (Support Vector Machines)
• Predict the exact value of crimerate using SVM regression analysis
– Estimates for spatial neighborhood
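Turning a numeric crimerate into “classes” for the classification test requires a binning step. The slides do not say which binning scheme was used, so the sketch below assumes simple equal-width bins; the function name and the toy values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: everything falls into the first bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Toy crimerates (hypothetical), in crimes per 1000 of population.
crimerates = [0.0, 10.0, 25.0, 40.0]
classes = equal_width_bins(crimerates, 4)
print(classes)
```

The resulting bin index can serve as the class label that a classifier is asked to predict for each test region.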
Spatial Neighborhood
– How do you define neighborhood?
• Buffer around the test location? Quarter-mile to 10 miles
• Nearest neighbors? 2 to 20
– Compute spatial estimates for crime
– Can also be done for population (white, asian, black, hispanic, ...)
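The two neighborhood definitions above (a distance buffer or the k nearest neighbors) can be sketched in a few lines of Python. This assumes planar coordinates in a single distance unit and a brute-force scan with no spatial index; the function names and sample points are hypothetical.

```python
import math


def within_buffer(target, points, radius):
    """Return the points that lie within `radius` of `target`."""
    return [p for p in points if math.dist(target, p) <= radius]


def k_nearest(target, points, k):
    """Return the k points closest to `target`, nearest first."""
    return sorted(points, key=lambda p: math.dist(target, p))[:k]


# Toy data (hypothetical): coordinates in miles around a test location.
test_location = (0.0, 0.0)
points = [(1.0, 0.0), (0.0, 3.0), (4.0, 0.0), (0.5, 0.5)]

buffer_neighbors = within_buffer(test_location, points, 2.0)
nearest_two = k_nearest(test_location, points, 2)
print(buffer_neighbors, nearest_two)
```

A buffer yields a variable number of neighbors (possibly none in sparse areas), while k nearest neighbors always yields k but may reach far away, which is one reason to test both definitions.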
Some Results:
• Classification:
– Accuracy increases from 62% to 89% with 7 nearest
neighbors
• Regression:
– Root-Mean-Square-Error between predicted and actual value improves from ~25 to 8 (5-7 neighbors)
• Detailed results in a white paper on
http://technet.oracle.com/products/spatial
• Visualize the results with Mapviewer
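For reference, the two metrics quoted above can be computed as follows. This is a generic Python sketch; the sample predictions are made up for illustration and are not the case-study data.

```python
import math


def accuracy(predicted, actual):
    """Fraction of test regions whose predicted class equals the actual class."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual numeric values."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, for illustration only.
class_accuracy = accuracy(["low", "high", "high", "low"],
                          ["low", "high", "low", "low"])
crime_rmse = rmse([11.0, 27.0], [10.0, 20.0])
print(class_accuracy, crime_rmse)
```

Accuracy applies to the binned (classification) test, while RMSE applies to the regression test, where the exact crimerate is predicted.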
Summary of the case study
• Adding Neighborhood Influence to Data
– Improves classification accuracy from 62% to 89%
– Best Neighborhood for this case study: 5-7 neighbors or 2-mile
distance
• Details, Additions: White paper on OTN
– http://technet.oracle.com/products/spatial
• Recommendation for Businesses : Spatial-enable the data
– Always geocode customer/business locations
– Materialize demographic information from spatial neighborhood
– Test the data and perform mining tasks
More research needed…
• Current case study:
– SVM without spatial estimates, although worse than with them, is still good: which attributes are helping?
• Colocation Mining
– “Co-location” of items as opposed to “co-occurrence”
in a transaction
– E.g., which sets of items are colocated and what are the
implications (interesting patterns)
– One approach: identify items that co-occur within
“tiled” regions
– Needs tighter integration with association rule mining
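The “tiled regions” approach mentioned above can be sketched by treating each tile as a transaction and counting which item types share a tile. This is a minimal Python illustration under assumed names and toy data; it only produces pair counts and does not perform association rule mining.

```python
from collections import Counter
from itertools import combinations


def colocated_pairs(items, tile_size):
    """Count, for each unordered pair of item types, the tiles containing both.

    items: list of (item_type, x, y) tuples in planar coordinates.
    """
    tiles = {}
    for item_type, x, y in items:
        # Map each location to the square tile that contains it.
        key = (int(x // tile_size), int(y // tile_size))
        tiles.setdefault(key, set()).add(item_type)

    counts = Counter()
    for types in tiles.values():
        for pair in combinations(sorted(types), 2):
            counts[pair] += 1
    return counts


# Toy data (hypothetical item types and coordinates).
items = [
    ("bank", 0.2, 0.3), ("cafe", 0.8, 0.9),
    ("bank", 1.5, 0.1), ("cafe", 1.1, 0.4), ("gym", 1.9, 0.9),
    ("gym", 5.0, 5.0),
]
pairs = colocated_pairs(items, tile_size=1.0)
print(pairs)
```

One known limitation of fixed tiles is that two items just across a tile boundary are not counted as colocated, so results depend on tile size and placement.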