DataJewel¹:
Tightly Integrating Visualization
with Temporal Data Mining
Mihael Ankerst, David H. Jones,
Anne Kao, Changzhou Wang
¹ US patent pending
DataJewel: A Novel Architecture
for Temporal Data Mining
Motivation:
In different domains, different kinds of patterns are of interest
→ Architecture that provides access to many temporal
mining algorithms
Databases are built based on organizational needs
→ Architecture that links databases together
Databases can be huge
→ Data has to be compressed
Current data mining tools are built for data mining experts
→ Architecture that is intuitive and easy to use
Visual Data Mining
                   Data Mining Algorithms   Visualization
Actionable                   +                    -
Evaluation                   +                    -
Flexibility                  -                    +
User Interaction             -                    +
[Diagram: Visual Data Mining combines Data Mining and Information Visualization]
Visual Data Mining Architecture:
Tightly Integrated Visualization
[Figure: three ways to combine visualization with a data mining (DM) algorithm; each path leads from Data through a Result to Knowledge]
Preceding Visualization (PV): Visualization of the data, followed by the DM-Algorithm
Subsequent Visualization (SV): DM-Algorithm, followed by Visualization of the result
Tightly Integrated Visualization (TIV): Visualization + Interaction accompany DM-Algorithm step 1 … step n
Architecture of DataJewel
Data source layer
  Access and link multiple heterogeneous databases and data sources
Statistical layer
  Compression, aggregation, sampling
Data mining layer
  Extensible set of data mining algorithms for automatic pattern discovery
Visualization layer
  Extensible set of visualizations for representing the data and the patterns
  + interaction capabilities for the user to incorporate domain expertise
The Visualization Component
Time        Event type   Location  …
09/11/2001  Door broken  Seattle   …
09/12/2001  …            …         …
[Figure: calendar for January 2002 (weekday columns S M T W T F S); detail for Tuesday, Jan 1st 2002; event types: Doors, Lights, Engine, Landing Gear]
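The step from an event table to a calendar-style view can be sketched roughly as follows. This is only an illustration under stated assumptions, not DataJewel's implementation: the records, the function names `daily_counts` and `month_grid`, and the cell layout are all hypothetical.

```python
import calendar
from collections import Counter
from datetime import date

# Hypothetical event records (date, event type); not data from the slides.
events = [
    (date(2002, 1, 1), "Doors"),
    (date(2002, 1, 1), "Lights"),
    (date(2002, 1, 1), "Doors"),
    (date(2002, 1, 2), "Engine"),
    (date(2002, 1, 15), "Landing Gear"),
]

def daily_counts(records):
    """Count events per (day, event type) pair."""
    return Counter(records)

def month_grid(year, month, counts):
    """Return the month as weeks starting on Sunday.

    Each cell is None (padding outside the month) or
    (day of month, {event type: count}).
    """
    cal = calendar.Calendar(firstweekday=calendar.SUNDAY)
    grid = []
    for week in cal.monthdayscalendar(year, month):
        row = []
        for day in week:
            if day == 0:  # padding cell outside the month
                row.append(None)
                continue
            current = date(year, month, day)
            cell = {etype: n for (d, etype), n in counts.items() if d == current}
            row.append((day, cell))
        grid.append(row)
    return grid

counts = daily_counts(events)
grid = month_grid(2002, 1, counts)
```

Each cell carries the per-type frequencies for one day, which is the information a renderer would need to draw colored segments inside a calendar cell.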
The Temporal Mining Component
Goal: Mining algorithms should be
• Very efficient (results in interactive time)
• Able to discover different types of patterns:
  single event: recurrence, periodicity, …
  multiple events: similarity, causality, clustering, …
• Tightly integrated with the visualization
Solution:
• Algorithm computes the pattern and updates the visualization
  by assigning unique colors only to the events contained
  in the pattern
All algorithms result in updating the color assignment:
- CalendarView visualizes the data and the patterns
- The same color assignment interface is used by the user and the algorithm
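One way to picture a color assignment interface shared by the user and the algorithms is the minimal sketch below. The class `ColorAssignment`, the function `highlight_pattern`, and the palette are hypothetical names chosen for illustration; the slides do not describe the actual interface.

```python
class ColorAssignment:
    """Maps event types to colors; unassigned types get a neutral default."""

    def __init__(self, default="grey"):
        self.default = default
        self._colors = {}

    def assign(self, event_type, color):
        self._colors[event_type] = color

    def reset(self):
        self._colors.clear()

    def color_of(self, event_type):
        return self._colors.get(event_type, self.default)

# Hypothetical palette; patterns with more event types than colors are truncated by zip.
PALETTE = ["red", "blue", "green", "orange"]

def highlight_pattern(assignment, pattern_event_types):
    """What a mining algorithm might do: color only the events in its pattern."""
    assignment.reset()
    for event_type, color in zip(pattern_event_types, PALETTE):
        assignment.assign(event_type, color)

colors = ColorAssignment()
colors.assign("Doors", "red")                    # the user colors an event type by hand
highlight_pattern(colors, ["Engine", "Lights"])  # an algorithm replaces the assignment
```

Because both the user and the algorithm write to the same object, a visualization that reads `color_of` does not need to know which of the two produced the current coloring.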
The Temporal Mining Component
Implemented new mining algorithms
• LongestStreak
• Most Deviations
• Correlated Events
Basic ideas of the algorithms are motivated by control charting
(stabilized p-chart)
[Figure: event frequency over time with the mean marked; labeled values: 7, 5, 5, 6, 10]
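The slides do not spell out the algorithms, so the following is only a guess at the flavor of a control-chart-inspired streak search. `stabilized_p` computes the standard stabilized p-chart statistic z_i = (p_i - p̄) / sqrt(p̄(1 - p̄) / n_i); `longest_streak` is a hypothetical reading of "LongestStreak" as the longest run of consecutive days above the mean. The counts and sample sizes are invented.

```python
from math import sqrt

def stabilized_p(counts, sizes):
    """Standardized z-values of a stabilized p-chart.

    counts[i]: occurrences of the event on day i
    sizes[i]:  number of opportunities on day i (hypothetical, e.g. flights)
    Assumes the overall proportion is strictly between 0 and 1.
    """
    p_bar = sum(counts) / sum(sizes)
    return [(c / n - p_bar) / sqrt(p_bar * (1 - p_bar) / n)
            for c, n in zip(counts, sizes)]

def longest_streak(z_values, threshold=0.0):
    """Longest run of consecutive days with z above threshold.

    Returns (start index, length); (0, 0) if no day exceeds the threshold.
    """
    best_start, best_len = 0, 0
    start = None
    for i, z in enumerate(z_values):
        if z > threshold:
            if start is None:
                start = i
            length = i - start + 1
            if length > best_len:
                best_start, best_len = start, length
        else:
            start = None
    return best_start, best_len

# Invented example: seven days, 100 opportunities per day.
counts = [2, 3, 9, 8, 10, 2, 1]
sizes = [100] * 7
z = stabilized_p(counts, sizes)
streak = longest_streak(z)
```

A single pass over the daily values keeps the cost linear in the number of days, which fits the goal of answering in interactive time.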
The Statistical & Database Component
• Access to data from different databases
• Precompute compressed/aggregated/sampled data
• Use lookup tables to further compress data
→ Currently, we can analyze millions of records in real time
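Lookup-table compression can be illustrated with a simple dictionary encoding: each distinct value is stored once and the rows keep only small integer codes. This is a generic sketch, not necessarily the scheme DataJewel uses; the function names and the sample values are hypothetical.

```python
def build_lookup(values):
    """Return (lookup table, codes): distinct values stored once, rows as integer codes."""
    lookup, index, codes = [], {}, []
    for value in values:
        if value not in index:
            index[value] = len(lookup)
            lookup.append(value)
        codes.append(index[value])
    return lookup, codes

def decode(lookup, codes):
    """Recover the original column from the lookup table and the codes."""
    return [lookup[code] for code in codes]

# Hypothetical event-type column with repeated strings.
event_types = ["Door broken", "Light out", "Door broken", "Door broken", "Light out"]
lookup, codes = build_lookup(event_types)
```

The gain grows with repetition: a column of long strings with few distinct values shrinks to one short table plus a list of integers.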
The Statistical & Database Component
Airline_a (Procurement DB, Maintenance DB)
Date       ATA  Complaint_txt  …
12/1/2000  73   …              …
12/1/2000  73   …              …
15/1/2000  49   …              …
Airline_b (Maintenance DB)
Date       ATA  Complaint_txt  …
1/1/2000   35   …              …
1/1/2000   35   …              …
1/1/2000   39   …              …
The Statistical & Database Component
Aggregate data with:
SELECT Date, ATA, COUNT(*) AS Freq
FROM airline_a
GROUP BY Date, ATA
ORDER BY Date, ATA
Result for Airline_a:
Date       ATA  Freq
12/1/1999  73   27
15/1/1999  49   9
…          …    …
The Statistical & Database Component
Aggregate data with:
SELECT Date, ATA, COUNT(*) AS Freq
FROM airline_b
GROUP BY Date, ATA
ORDER BY Date, ATA
Result for Airline_b:
Date      ATA  Freq
1/1/2000  35   344
1/1/2000  39   193
…         …    …
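The aggregation query from the slides can be tried out on a tiny in-memory database. The sketch below uses Python's standard `sqlite3` module; the table contents are miniature stand-ins for the slide examples, the helper `aggregate` is a hypothetical name, and dates are stored as ISO strings (an assumption) so that ORDER BY sorts them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Miniature stand-ins for the two airline tables (assumed schema).
for table in ("airline_a", "airline_b"):
    conn.execute(f"CREATE TABLE {table} (Date TEXT, ATA INTEGER, Complaint_txt TEXT)")

conn.executemany("INSERT INTO airline_a VALUES (?, ?, ?)", [
    ("2000-01-12", 73, "..."),
    ("2000-01-12", 73, "..."),
    ("2000-01-15", 49, "..."),
])
conn.executemany("INSERT INTO airline_b VALUES (?, ?, ?)", [
    ("2000-01-01", 35, "..."),
    ("2000-01-01", 35, "..."),
    ("2000-01-01", 39, "..."),
])

def aggregate(conn, table):
    """Run the slides' aggregation query on one table.

    The table name is interpolated directly, so only pass trusted names.
    """
    query = (f"SELECT Date, ATA, COUNT(*) AS Freq FROM {table} "
             "GROUP BY Date, ATA ORDER BY Date, ATA")
    return conn.execute(query).fetchall()

freq_a = aggregate(conn, "airline_a")
freq_b = aggregate(conn, "airline_b")
```

Each result row replaces all raw records that share a date and ATA code with a single frequency, which is the compressed form the visualization works on.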
User-Centric Data Mining
User selects data source/attributes
Data is compressed and loaded
Data is visualized
User invokes algorithm
User interacts with visualization
User selects visualization technique
User selects date range
Raw data is shown
DataJewel – Scenario: Mining Algorithm
Using 41 “different” colors…
Press here to run the mining algorithm
DataJewel – Scenario: User Interaction
Screenshots
One airline, one model, ATA: 49 (airborne auxiliary power)
Conclusions
Data mining algorithms and visualization techniques can
complement each other well
CalendarView is a new visualization technique that represents
the frequency of daily events
DataJewel uses the same visualization to represent the data and
the patterns. The color assignment interface is used by both the
user (to incorporate domain knowledge) and the computer (to
represent the discovered patterns). These two key properties
make the system much easier for domain experts to apply.
Future work: user studies, new visualizations, algorithms, …