CS548S15_Showcase_Model_Trees


CS548 Spring 2015 Showcase
By Yang Liu, Viseth Sean, Azharuddin Priyotomo
Showcasing work by Le,
Abrahart, and Mount on "M5
Model Tree applied to modelling
town centre area activities for
the city of Nottingham"
References
[WFH 2011] Ian H. Witten, Eibe Frank, Mark A. Hall (2011). Data Mining: Practical Machine
Learning Tools and Techniques 3rd Edition (pp. 67-68, 251-259). Burlington, MA: Morgan
Kaufmann.
[LAM 2007] T. K. T. Le, R. J. Abrahart, N. J. Mount (2007). M5 Model Tree applied to modelling
town centre area activities for the city of Nottingham. Proceedings of the 9th International
Conference on GeoComputation, National Centre for Geocomputation, National University of
Ireland, Maynooth, September 2007.
[WW 1997] Y. Wang, I. H. Witten. Induction of model trees for predicting continuous classes. In
Proc European Conference on Machine Learning Poster Papers, pages 128-137, Prague, Czech
Republic, 1997.
[Quin 1992] J. Ross Quinlan. Learning with Continuous Classes. In Proc. 5th Australian Joint
Conference on Artificial Intelligence, Singapore, pages 343-348, 1992.
Content
❖ Regression Tree & Model Tree
❖ Model Tree Induction Algorithm
❖ Real Application
Regression Tree & Model Tree
Taken from [WFH 2011]
Regression Tree:
A number (constant value) at each leaf
Taken from [WFH 2011]
Model Tree:
A linear regression model at each leaf
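The difference between the two leaf types can be sketched in a few lines of Python. This is a minimal illustration, not code from [WFH 2011]; the class names (`ConstantLeaf`, `LinearLeaf`, `Split`) and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class ConstantLeaf:
    """Regression-tree leaf: predicts one fixed number."""
    value: float

    def predict(self, x: Sequence[float]) -> float:
        return self.value


@dataclass
class LinearLeaf:
    """Model-tree leaf: predicts with a linear regression model."""
    intercept: float
    coefs: Sequence[float]

    def predict(self, x: Sequence[float]) -> float:
        return self.intercept + sum(c * xi for c, xi in zip(self.coefs, x))


@dataclass
class Split:
    """Internal node: tests one attribute against a threshold."""
    attr: int
    threshold: float
    left: "Node"
    right: "Node"

    def predict(self, x: Sequence[float]) -> float:
        child = self.left if x[self.attr] <= self.threshold else self.right
        return child.predict(x)


Node = Union[ConstantLeaf, LinearLeaf, Split]

# Same split, different leaf types (all values are made up).
regression_tree = Split(0, 5.0, ConstantLeaf(10.0), ConstantLeaf(30.0))
model_tree = Split(0, 5.0, LinearLeaf(0.0, [2.0]), LinearLeaf(5.0, [5.0]))
```

With these toy trees, the regression tree returns the same value for every instance that reaches a given leaf, while the model tree's prediction still varies with the attribute values inside the leaf.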
Model Tree Induction (M5)
❖ Follow the ordinary decision tree induction algorithm to build
an initial tree.
❖ Splitting criterion: standard deviation (Std Dev) of the target
values instead of entropy, but based on the same rationale: the
lower the Std Dev in the resulting subsets, the shallower the
subtree and the shorter the tree/rule.
❖ Pruning: the algorithm stays the same, except that a subtree is
replaced by a regression plane instead of a constant.
❖ Smoothing: remove any sharp discontinuities that exist
between neighboring leaves of the pruned tree.
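The splitting criterion and the smoothing step above can be sketched as follows. The sketch assumes the standard deviation reduction (SDR) form of the criterion and the smoothing formula p' = (n*p + k*q) / (n + k) as we understand them from [WFH 2011]; the function names, the toy data, and the default k = 15 are assumptions made for illustration.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction: sd(T) - sum(|Ti| / |T| * sd(Ti))."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_split(xs, ys):
    """Return (threshold, sdr) of the best binary split on one numeric attribute."""
    pairs = sorted(zip(xs, ys))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = sdr(ys, [left, right])
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


def smooth(p, q, n, k=15.0):
    """Blend the prediction from below with this node's own model.

    p: prediction passed up from the node below
    q: prediction of the linear model at this node
    n: number of training instances reaching the node below
    k: smoothing constant (15 is an assumed default)
    """
    return (n * p + k * q) / (n + k)


# Toy data with two obvious groups (made up for illustration).
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
threshold, gain = best_split(xs, ys)
```

On this toy data the chosen threshold falls between the two groups, because that split leaves the smallest weighted Std Dev in the subsets. Smoothing then pulls each leaf prediction toward the predictions of the models higher up the path to the root.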
Real Application
❖ Analyzing patterns of city
activities using spatial data
❖ Main Attributes: TCPIs
❖ Spatial data is usually stored
as coordinates and
topology, and is data that
can be mapped. Spatial data
is often accessed,
manipulated or analyzed
through Geographic
Information Systems (GIS).
TCPI
❖ Town Center Performance Indicator
❖ Indicators used for defining vital activities in a town
center
❖ A set of 8 TCPIs was publicly agreed on for this application
Town Center Performance Indicators
Taken from [LAM 2007]
GIS input layers: 8 considered TCPIs for Nottingham’s town centre
The Main Problems
❖ The different perceptions of the significance of each
TCPI and their relative importance
❖ How to choose a representative sample
❖ How many linear models to use in the tree
Spatial Data Collection (Cool Stuff)
User sprays on map
Change spray size
Wipe the map
Add New Area
Write in comments
When done, click on the “send” button
Model Tree Creation
❖ Data instances: 4250 instances as training set,
generated by random sampling
❖ Attributes: 8 TCPIs (Leisure, car park, commerce,
public, pedestrian, industry, population, education)
❖ Splitting input space of the training set (town center
area activities) into sub spaces (sub-areas)
❖ Building a linear regression model (at the leaves) for
each sub-space
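The last two steps above (split the input space, then fit one linear model per sub-space) can be sketched with a single attribute and a single split. This is only an illustration under stated assumptions: the target function, the sample size of 60, the fixed threshold, and all function names are hypothetical, and [LAM 2007] uses 8 attributes and 4250 sampled instances rather than this toy setup.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a0 + a1 * x; returns (a0, a1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return mean_y, 0.0  # all x equal: fall back to a constant
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    return mean_y - a1 * mean_x, a1


def fit_two_leaf_tree(xs, ys, threshold):
    """Split the input space at `threshold` and fit one line per sub-space."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": fit_line(*zip(*left)),
        "right": fit_line(*zip(*right)),
    }


def predict(tree, x):
    """Route x to its sub-space and apply that leaf's linear model."""
    a0, a1 = tree["left"] if x <= tree["threshold"] else tree["right"]
    return a0 + a1 * x


def hypothetical_target(x):
    """Made-up ground truth with two linear regimes."""
    return 1.0 + 2.0 * x if x <= 5.0 else 30.0 - 3.0 * x


# Training set generated by random sampling from a toy population.
rng = random.Random(0)
population = [i / 10 for i in range(101)]
sample_x = rng.sample(population, 60)
sample_y = [hypothetical_target(x) for x in sample_x]

tree = fit_two_leaf_tree(sample_x, sample_y, threshold=5.0)
```

Because the toy target is exactly linear on each side of the threshold, each leaf model recovers its regime almost exactly; a single global regression line could not fit both regimes at once.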
An Example of M5 Algorithm
Taken from [LAM 2007]
Splitting the input space of the training set [X1, X2] using the M5 algorithm
Each model is a linear regression model Y = a0 + a1X1 + a2X2
The Model Tree
Associated
indicators
❖ Commerce
❖ Pedestrian
❖ Leisure
❖ Car_park
❖ Public_building
Taken from [LAM 2007]
Tree model results from 4250 instances for eight TCPIs
Less associated
indicators
❖ Population
❖ Industry
❖ Education
Why choose 14 linear models?
Taken from [LAM 2007]
Nottingham Mental Map
● Target output of the
model
● The darker the red
color, the higher the
confidence that an area
belongs to town center
area activities.
Taken from [LAM 2007]
A single overall public mental town centre
map (web-based GIS survey)
Result In Maps
14th linear model
High density of commercial activity and
pedestrian flow
13th linear model
High density of commercial activity
Taken from [LAM 2007]
12th linear model
Lower density of commercial activity
& high density of leisure
and pedestrian flow
Result In Maps
3rd linear model
Lower density of commercial activity &
high residential use
2nd linear model
High density of industry
Taken from [LAM 2007]
Pros of this Model Tree
❖ Shows how significant each indicator
(attribute) is for prediction
❖ Shows the degree to which each indicator explains the output
(town center area activities)
❖ Is particularly useful given the temporal and complex
characteristics of urban areas
Thank You!!!