Data Visualization using IRIS Explorer

Data Mining and Visualization
Jeremy Walton
NAG Ltd, Oxford
24 October 2002
Tools

NAG Data Mining Components
    Statistical & machine learning routines (written in C)
    Data cleaning
    Data transformation
    Model building: classification, clustering, prediction
IRIS Explorer
    Modular visualization environment
    Visual programming interface
    Extensible - users can add new modules
Example

Image from Landsat Multi-Spectral Scanner
    Each pixel = 80 m²
36 independent variables per region
    Each region = 3 by 3 array of pixels
    4 spectral bands per pixel
Each pixel is in one of six classes
    types of land use
Want to extrapolate for class values elsewhere
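
The 36 independent variables follow from the region layout: 3 x 3 pixels times 4 spectral bands. A minimal sketch of that flattening is given here. The helper name `region_features`, the row-major ordering and the synthetic input are assumptions made for illustration; they are not taken from the talk or from the NAG components.

```python
# Hypothetical helper: flatten one 3 x 3 region of 4-band pixels into 36 values.
BANDS = 4   # spectral bands per pixel
SIDE = 3    # each region is SIDE x SIDE pixels


def region_features(region):
    """region[row][col] is a sequence of BANDS intensities for one pixel."""
    if len(region) != SIDE or any(len(row) != SIDE for row in region):
        raise ValueError("expected a 3 by 3 region")
    features = []
    for row in region:
        for pixel in row:
            if len(pixel) != BANDS:
                raise ValueError("expected 4 spectral bands per pixel")
            features.extend(pixel)   # assumed ordering: row, column, then band
    return features


# Synthetic region for illustration only (not real Landsat data).
example = [[[r * 10 + c + b for b in range(BANDS)] for c in range(SIDE)]
           for r in range(SIDE)]
vector = region_features(example)
print(len(vector))
```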
Treatment (1)

Use principal component analysis
    reduces 36 dimensions to 2
    explains ~85% of variance
Choose three classes
    2 - cotton crop
    4 - damp grey soil
    5 - soil with vegetation stubble
2 independent variables, 1 class variable
    1364 points
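
The dimension reduction described on this slide can be sketched as follows: centre the data, form the covariance matrix, keep the two leading eigenvectors, and report the fraction of variance they carry. This sketch uses only the Python standard library and finds the eigenvectors by power iteration with deflation, which is a stand-in chosen for brevity and not necessarily what the NAG routine does. The function name `pca` and the synthetic 4-dimensional data are assumptions; the real data has 36 dimensions.

```python
import math
import random


def pca(rows, k=2, iterations=500, seed=0):
    """Project rows onto their first k principal components.

    Returns (scores, explained), where explained[i] is the fraction of
    the total variance carried by component i.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total = sum(cov[i][i] for i in range(d))   # total variance = trace
    rng = random.Random(seed)
    components, explained = [], []
    for _ in range(k):
        # Power iteration from a random start converges to the leading eigenvector.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        explained.append(lam / total)
        # Deflation: remove the component just found before seeking the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(r[j] * v[j] for j in range(d)) for v in components]
              for r in centered]
    return scores, explained


# Synthetic data for illustration only: two hidden factors plus small noise.
data_rng = random.Random(42)
rows = []
for _ in range(200):
    t, s = data_rng.uniform(-10, 10), data_rng.uniform(-3, 3)
    rows.append([t + data_rng.gauss(0, 0.1), t - s + data_rng.gauss(0, 0.1),
                 s + data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)])

scores, explained = pca(rows, k=2)
print("fraction of variance explained:", sum(explained))
```

Each row of `scores` holds the 2 independent variables for one point; the class variable is carried alongside unchanged.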
Treatment (2)

Model data using a decision tree (Quinlan’s C4.5)
    Each node splits data into two sets
    Aims to maximise separation of classes
    Splitting continues recursively
Classify original data using decision tree
    each point is assigned to a class
    92% agreement with original classes
Classify new data using decision tree
    e.g. points on a grid
    establish boundaries of classes
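
The three steps on this slide (build a tree, classify the original points, classify a grid) can be sketched as below. This is a simplified stand-in and not C4.5 itself: it picks binary threshold splits by plain information gain and omits C4.5's gain ratio and pruning. All names (`build`, `classify`, `best_split`) and the synthetic three-cluster data are assumptions for illustration; only the class labels 2, 4 and 5 come from the talk.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(points, labels):
    """Return (gain, feature, threshold) for the best binary split, or None."""
    base, n = entropy(labels), len(labels)
    best = None
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / n
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def build(points, labels, depth=0, max_depth=5):
    """Each node splits the data into two sets; splitting continues recursively."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority                      # leaf: a class label
    split = best_split(points, labels)
    if split is None or split[0] <= 1e-12:
        return majority
    _, f, t = split
    left = [(p, y) for p, y in zip(points, labels) if p[f] <= t]
    right = [(p, y) for p, y in zip(points, labels) if p[f] > t]
    return (f, t,
            build([p for p, _ in left], [y for _, y in left], depth + 1, max_depth),
            build([p for p, _ in right], [y for _, y in right], depth + 1, max_depth))


def classify(tree, point):
    while isinstance(tree, tuple):           # internal node: (feature, threshold, left, right)
        f, t, left, right = tree
        tree = left if point[f] <= t else right
    return tree


# Synthetic, well-separated clusters for illustration only (not the Landsat data).
rng = random.Random(1)
centres = {2: (0.0, 0.0), 4: (5.0, 0.0), 5: (0.0, 5.0)}
points, labels = [], []
for label, (cx, cy) in centres.items():
    for _ in range(30):
        points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        labels.append(label)

tree = build(points, labels)

# Classify the original data and measure agreement with the original classes.
predicted = [classify(tree, p) for p in points]
agreement = sum(a == b for a, b in zip(predicted, labels)) / len(labels)

# Classify new data: points on a regular grid, to expose the class boundaries.
grid = [(x * 0.5, y * 0.5) for x in range(-4, 15) for y in range(-4, 15)]
grid_classes = [classify(tree, g) for g in grid]
print("agreement:", agreement)
```

Colouring each grid point by its entry in `grid_classes` gives the kind of boundary picture that the grid classification is meant to establish.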
Visualization
Original class values
Predicted class values
Visualization
Lessons learnt / next steps

Visual programming interface
Easy to link components & IRIS Explorer
    building modules using C interface
Extensible to other data sets?
What about larger data sets?