Data Visualization using IRIS Explorer
Data Mining and Visualization
Jeremy Walton
NAG Ltd, Oxford
24 October 2002
Data Mining & Visualization
Tools
NAG Data Mining Components
  Statistical & machine learning routines (written in C)
  Data cleaning
  Data transformation
  Model building
    classification, clustering, prediction
IRIS Explorer
  Modular visualization environment
  Visual programming interface
  Extensible - users can add new modules
Example
Image from Landsat Multi-Spectral Scanner
36 independent variables per region
  Each region = 3 by 3 array of pixels
  4 spectral bands per pixel
  Each pixel = 80 m × 80 m
Each pixel is in one of six classes
  types of land use
Want to extrapolate class values to other locations
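
The count of 36 variables follows from 3 x 3 pixels times 4 spectral bands. A minimal sketch of that flattening is given here, assuming a hypothetical `region[row][col][band]` layout and synthetic values; neither the layout nor the function name `flatten_region` comes from the original data set.

```python
# Hypothetical layout: region[row][col][band], 3 x 3 pixels, 4 bands per pixel.
ROWS, COLS, BANDS = 3, 3, 4

def flatten_region(region):
    """Turn a 3 x 3 x 4 nested list into one flat feature vector."""
    return [value for row in region for pixel in row for value in pixel]

# Synthetic values (not real Landsat data): 100*row + 10*col + band,
# chosen so each entry shows which pixel and band it came from.
region = [[[100 * r + 10 * c + b for b in range(BANDS)]
           for c in range(COLS)]
          for r in range(ROWS)]

features = flatten_region(region)
print(len(features))  # 36
```

Each region then becomes one row of a table with 36 columns, plus one class label.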
Treatment (1)
Use principal component analysis
  reduces 36 dimensions to 2
  explains ~85% of variance
Choose three classes
  2 - cotton crop
  4 - damp grey soil
  5 - soil with vegetation stubble
2 independent variables, 1 class variable
1364 points
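
The reduction step can be illustrated with a small sketch. This is not the NAG routine used in the talk: it is a pure-Python PCA using power iteration with deflation, run on synthetic 36-dimensional data generated from two hidden factors. All function names, the random seeds and the data are assumptions made for illustration.

```python
import math
import random

def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, rng, iterations=300):
    """Approximate the dominant eigenvalue and eigenvector by power iteration."""
    d = len(matrix)
    vec = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector.
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(d))
                for i in range(d))
    return value, vec

def pca(rows, n_components=2, seed=0):
    """Return (scores, explained variance fraction per component)."""
    rng = random.Random(seed)
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, explained = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov, rng)
        components.append(vec)
        explained.append(value / total_variance)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(r[j] * comp[j] for j in range(d)) for comp in components]
              for r in centered]
    return scores, explained

# Synthetic stand-in data: 200 points in 36 dimensions driven by two factors.
rng = random.Random(1)
u = [rng.gauss(0, 1) for _ in range(36)]
w = [rng.gauss(0, 1) for _ in range(36)]
rows = []
for _ in range(200):
    a, b = rng.gauss(0, 3), rng.gauss(0, 1)
    rows.append([a * u[j] + b * w[j] + rng.gauss(0, 0.1) for j in range(36)])

scores, explained = pca(rows)
print(len(scores[0]), round(sum(explained), 3))
```

The sum of `explained` plays the role of the ~85% figure on the slide; on this synthetic data it is much higher because the data were built from exactly two factors.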
Treatment (2)
Model data using a decision tree (Quinlan’s C4.5)
  Each node splits data into two sets
  Aims to maximise separation of classes
  Splitting continues recursively
Classify original data using decision tree
  each point is assigned to a class
  92% agreement with original classes
Classify new data using decision tree
  e.g. points on a grid
  establish boundaries of classes
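
The three steps (build the tree, check agreement on the original points, classify a grid) can be sketched as follows. This is a simplified stand-in, not C4.5 itself: it picks binary threshold splits by plain information gain, with no gain ratio and no pruning. The data are three synthetic, well-separated clusters labelled 2, 4 and 5, and all names are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(points, labels):
    """Return (gain, feature, threshold) of the best binary split, or None."""
    base, n, best = entropy(labels), len(labels), None
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for p, l in zip(points, labels) if p[f] <= t]
            right = [l for p, l in zip(points, labels) if p[f] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / n
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def build_tree(points, labels, depth=0, max_depth=4):
    """Recursively split; a leaf is a class label, a node is a 4-tuple."""
    split = best_split(points, labels) if depth < max_depth else None
    if split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [(p, l) for p, l in zip(points, labels) if p[f] <= t]
    right = [(p, l) for p, l in zip(points, labels) if p[f] > t]
    return (f, t,
            build_tree([p for p, _ in left], [l for _, l in left],
                       depth + 1, max_depth),
            build_tree([p for p, _ in right], [l for _, l in right],
                       depth + 1, max_depth))

def classify(tree, point):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if point[f] <= t else right
    return tree

# Synthetic data: three 5 x 5 clusters of points, one per class.
centres = {2: (0.0, 0.0), 4: (5.0, 0.0), 5: (0.0, 5.0)}
points, labels = [], []
for label, (cx, cy) in centres.items():
    for i in range(5):
        for j in range(5):
            points.append((cx + (i - 2) * 0.3, cy + (j - 2) * 0.3))
            labels.append(label)

tree = build_tree(points, labels)

# Step 2: agreement between predicted and original classes.
predicted = [classify(tree, p) for p in points]
agreement = sum(a == b for a, b in zip(predicted, labels)) / len(labels)

# Step 3: classify points on a regular grid to map out class boundaries.
grid = [[classify(tree, (x * 0.5, y * 0.5)) for x in range(-2, 13)]
        for y in range(-2, 13)]
print(agreement, len(grid), len(grid[0]))
```

On real data the agreement would normally fall below 100%, as with the 92% reported above; the synthetic clusters here are separable by construction. The `grid` of predicted labels is the kind of array that could then be passed to a visualization tool to draw the class regions.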
Visualization
Original class values
Predicted class values
Visualization
Lessons learnt / next steps
Visual programming interface
Easy to link components & IRIS Explorer
building modules using C interface
Extensible to other data sets?
What about larger data sets?