Worcester Polytechnic Institute XMDVTOOL 4.2B

XmdvTool
Interactive Visual Data Exploration System for
High-dimensional Data Sets
http://davis.wpi.edu/~xmdv
Matthew O. Ward, Elke A. Rundensteiner,
Jing Yang, Punit Doshi, Geraldine Rosario,
Allen R. Martin, Ying-Huey Fua, Daniel Stroe
Worcester Polytechnic Institute
This work partially funded by NSF Grants IIS-9732897, IRIS-9729878 and IIS-0119276
XmdvTool Features
• Hierarchical visualization and interaction tools for
exploring very large high-dimensional data sets to
discover patterns, trends and outliers
• Applications:
– Bioterrorism Detection
– Bioinformatics and Drug Discovery
– Space Science
– Geology and Geochemistry
– Systems Monitoring and Performance Evaluation
– Economics and Business
– Simulation Design and Analysis
• Multi-platform support (Unix, Linux, Windows)
• Public domain software: http://davis.wpi.edu/~xmdv
Xmdv: Main Features
• Scale-up to High Dimensions: Visual Hierarchical
Dimension Reduction
• Scale-up to Large Data Sets: Interactive Hierarchical
Displays, Database Backend with MinMax Encoding,
Semantic Caching and Adaptive Prefetching
• Interlinked Multi-Displays: Parallel Coordinates,
Glyphs, Scatterplot Matrices, Dimensional Stacking
• Visual Interaction Tools: N-Dimensional Brushes,
Structure-Based Brushing, InterRing
Scale-Up for Large Number of Dimensions
Solution to High Dimensional Datasets:
• Group Similar Dimensions into
Dimension Hierarchy
• Navigate Dimension Hierarchy by
InterRing
• Form Lower Dimensional Spaces by
Dimension Clusters
• Convey Dimension Cluster
Information by Dissimilarity Display
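A minimal sketch of grouping similar dimensions into clusters, which would form one level of a dimension hierarchy. The dissimilarity measure (1 minus the absolute Pearson correlation), the greedy grouping, and all names and sample values here are assumptions made for illustration; the slides do not specify the measure or the clustering algorithm XmdvTool uses.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def dissimilarity(a, b):
    # Assumed measure: near 0 for strongly correlated dimensions, near 1 for uncorrelated ones.
    return 1.0 - abs(pearson(a, b))

def cluster_dimensions(columns, threshold):
    """Greedy grouping: a dimension joins the first cluster whose
    representative (its first member) is within `threshold`."""
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            if dissimilarity(values, columns[cluster[0]]) <= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical data: two redundant height dimensions and one unrelated dimension.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [3, 9, 4, 8, 5],
}
clusters = cluster_dimensions(columns, threshold=0.1)
```

Each resulting cluster could then be shown as a single axis in a lower dimensional display; repeating the grouping on cluster representatives would build further levels.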
Visual Hierarchical Dimension Reduction Process
[Figure panels: A 42-dimensional Data Set; Dimension Hierarchy; Interaction Tool: InterRing; A 4-Dimensional Subspace]
InterRing - Dimension Hierarchy Navigation and Manipulation
• Roll-up/Drill-down
• Distort
• Rotate
• Zoom in/out
• Modify
Dissimilarity Display
• Three Axes Method
• Diagonal Plot Method
• Axis Width Method
• Mean-Band Method
Scale-up for Large Number of Records
Solution to Large Scale Datasets:
• Group Similar Records into
Data Hierarchy
• Navigate Data Hierarchy by
Structure-Based Brushing
• Represent Data Clusters by
Mean-Band Method
• Provide Database Backend Support
using MinMax Tree, Caching,
Prefetching
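As a rough illustration of representing a data cluster with a mean and a band, the sketch below summarizes one cluster per dimension. It assumes the band spans the minimum and maximum of the cluster's members on each axis; the function name `mean_band` and the sample records are hypothetical.

```python
def mean_band(records):
    """Return one (min, mean, max) triple per dimension for a cluster of records."""
    dims = list(zip(*records))  # transpose: one tuple of values per dimension
    return [(min(d), sum(d) / len(d), max(d)) for d in dims]

# Hypothetical cluster of three 2-dimensional records.
cluster = [(1.0, 10.0), (2.0, 14.0), (3.0, 12.0)]
summary = mean_band(cluster)
```

In a parallel coordinates display, each triple would give the band's lower edge, the mean line, and the band's upper edge on one axis.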
Interactive Hierarchical Display
[Figure: 2D example of Hierarchical Clustering and Structure-Based Brushing]
[Figure: Flat Display and Hierarchical Display, using the Mean-Band Method in Parallel Coordinates]
Scalability of Data Access
• Approach
– Attach database system to visualization front-end
• MinMax hierarchy encoding
– Key idea: avoid recursive processing
– Pre-computed
• Caching
– Key idea: reduce response time and network traffic
• Prefetching
– Key idea: use application hints and predict user patterns
– Performed during idle time
Scalability of Data Access: MinMax Hierarchy Encoding
• Pre-compute object positions
– level-of-detail (L)
– extent values (x,y)
– preserve tree structure
• New query semantics
– objects are now rectangles
– select objects that touch L
– select objects that touch (x, y)
– structure-based brush = intersection of two selections
[Figure: objects drawn as rectangles over level of detail (L) and extent values (x, y); query = (x, y, L)]
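The encoding above can be sketched as follows. This is an illustrative reading of the slide, not XmdvTool's actual schema: it assumes leaves are numbered left to right, that a node's extent is the range of leaf numbers beneath it, and that a node is visible from its own depth up to (but excluding) its children's depth. The names `Node`, `encode` and `brush` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

def encode(root):
    """Pre-compute one row (name, lo, hi, l_from, l_to) per node, once, off-line."""
    rows = []
    next_leaf = 0

    def visit(node, depth):
        nonlocal next_leaf
        if not node.children:
            lo = hi = next_leaf
            next_leaf += 1
            l_to = float("inf")      # assumed: a leaf stays visible at all finer levels
        else:
            extents = [visit(child, depth + 1) for child in node.children]
            lo, hi = extents[0][0], extents[-1][1]
            l_to = depth + 1
        rows.append((node.name, lo, hi, depth, l_to))
        return lo, hi

    visit(root, 0)
    return rows

def brush(rows, x, y, level):
    """Structure-based brush as a flat scan: rows that touch `level`
    intersected with rows whose extent touches [x, y]. No tree traversal."""
    return sorted(name for name, lo, hi, l_from, l_to in rows
                  if l_from <= level < l_to and lo <= y and hi >= x)

tree = Node("root", [
    Node("A", [Node("a1"), Node("a2")]),
    Node("B", [Node("b1"), Node("b2"), Node("b3")]),
])
rows = encode(tree)
coarse = brush(rows, 1, 2, 1)   # clusters at level 1 overlapping leaves 1..2
fine = brush(rows, 3, 4, 2)     # leaves at level 2 overlapping leaves 3..4
```

Because each row carries its own extent and level interval, the same selection could be expressed as a range predicate in a database query rather than a recursive walk.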
Scalability of Data Access: Caching
• Purpose
– reduce response time and network traffic
• Issues
– visual query cannot directly translate into object IDs
→ high-level cache specification to avoid complete scans
• Semantic caching
– queries are cached rather than objects
– minimize cost of cache lookup
– dynamically adapt cached queries to patterns of queries
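A minimal sketch of the semantic caching idea, in which cached entries are described by the query that produced them rather than by object IDs. It assumes one-dimensional range queries and only handles the case where a new query is fully contained in a cached one; the class `SemanticCache` and the `fetch` callback are hypothetical stand-ins for the database backend.

```python
class SemanticCache:
    def __init__(self, fetch):
        self.fetch = fetch        # hypothetical backend call: (lo, hi) -> list of (key, value)
        self.entries = []         # list of ((lo, hi), rows): the query is the cache key
        self.backend_calls = 0

    def query(self, lo, hi):
        # Lookup compares query ranges, so no scan over cached objects is needed
        # to decide whether the cache can answer.
        for (c_lo, c_hi), rows in self.entries:
            if c_lo <= lo and hi <= c_hi:
                return [r for r in rows if lo <= r[0] <= hi]
        self.backend_calls += 1
        rows = self.fetch(lo, hi)
        self.entries.append(((lo, hi), rows))
        return rows

# Hypothetical backend holding ten (key, value) objects.
data = [(i, i * i) for i in range(10)]
cache = SemanticCache(lambda lo, hi: [r for r in data if lo <= r[0] <= hi])

first = cache.query(2, 7)     # miss: goes to the backend
second = cache.query(3, 5)    # hit: contained in the cached query (2, 7)
calls_after_hit = cache.backend_calls
third = cache.query(6, 9)     # miss: only partly covered by (2, 7)
```

A fuller version would split a partly covered query into a cached part and a remainder, and would merge or evict cached queries as the user's query pattern changes.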
Scalability of Data Access: Prefetching
• Strategy
– Speculative (no specific hints): navigation remains local; both user and data set influence exploration
– Adaptive (strategy changes over time): evolves as more knowledge becomes available
– Non-pure (interruptible prefetching): leave buffer in consistent state
• Requirements
– non-pure prefetching + large transactions & small object size + semantic caching → small granularity (object level)
– speculative, non-pure prefetcher → cache replacement policy + guessing method
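One way to picture interruptible, object-level prefetching is the loop below. It is a sketch under assumptions: the prefetcher checks for user activity between objects and never stops mid-object, so the buffer only ever holds complete objects. The names `prefetch`, `fetch_one` and `user_is_active` are hypothetical.

```python
def prefetch(predicted_ids, fetch_one, buffer, user_is_active):
    """Fetch predicted objects one at a time during idle time.
    Returns the number of objects added to the buffer."""
    fetched = 0
    for obj_id in predicted_ids:
        if user_is_active():
            break                          # interrupt between objects only
        if obj_id not in buffer:
            buffer[obj_id] = fetch_one(obj_id)
            fetched += 1
    return fetched

# Hypothetical run: the user becomes active at the third idle check.
idle_checks = iter([False, False, True])
buffer = {7: "obj-7"}
fetched = prefetch([7, 8, 9, 10], lambda i: f"obj-{i}", buffer,
                   lambda: next(idle_checks))
```

Working at object granularity keeps each unit of work small, which is what makes stopping at any point safe.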
Scalability of Data Access:
Experimental Evaluation
[Figure: Effectiveness of Caching. Response time (seconds) for four caching configurations: Client OFF / Server OFF, Client OFF / Server ON, Client ON / Server OFF, Client ON / Server ON]
[Figure: Effectiveness of Prefetcher. % improvement in response time versus delay between user operations (seconds)]
Conclusions:
• Caching reduces response time by 80%
• Prefetching further reduces response time by 30%
• Designing better prefetching strategies might help
further reduce response time
Scalability of Data Access: Prefetching
Localized Speculative Strategies
• Random Strategy
• Direction Strategy
• Focus Strategy (data set driven): hot regions near the current navigation window
• Vector Strategies, based on movements m(n-2), m(n-1), m(n), m(n+1)
– Mean Strategy
– Exponential Weight Average Strategy
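The two vector strategies can be sketched as estimators of the next movement from past movements. This is an illustration only: it treats each movement as a single signed number, and the smoothing factor `alpha` and the recurrence used for the exponential weight average are assumptions, since the slide does not give the formulas.

```python
def mean_strategy(moves):
    """Predict the next movement as the plain average of past movements."""
    return sum(moves) / len(moves)

def ewa_strategy(moves, alpha=0.5):
    """Predict the next movement with an exponentially weighted average,
    so that recent movements count more than older ones."""
    estimate = moves[0]
    for m in moves[1:]:
        estimate = alpha * m + (1 - alpha) * estimate
    return estimate

# Hypothetical history m(n-3) .. m(n); the user recently sped up.
moves = [1, 1, 1, 3]
mean_guess = mean_strategy(moves)
ewa_guess = ewa_strategy(moves)
```

With this history the exponentially weighted estimate reacts more strongly to the latest movement than the mean does, which is the intended difference between the two.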
Xmdv System Implementation
• Tools
– C/C++
– TCL/TK
– OpenGL
– Oracle 8i
– Pro*C
[Figure: system architecture. Labels: OFF-LINE PROCESS, ON-LINE PROCESS, MEMORY, DB, Flat Data, Hierarchical Data, MinMax Labeling, Loader, Schema Info, Translator, Rewriter, Queries, Buffer, Exploration Variables, Estimator, GUI, User, Prefetcher, Library: Random, Direction, Focus, Mean, EWA]
Publications (available at http://davis.wpi.edu/~xmdv)
• Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, “InterRing:
An Interactive Tool for Visually Navigating and Manipulating
Hierarchical Structures”, InfoVis 2002, to appear
• Punit R. Doshi, Elke A. Rundensteiner, Matthew O. Ward and Daniel
Stroe, “Prefetching For Visual Data Exploration”,
Technical Report WPI-CS-TR-02-07, 2002
• Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, “Interactive
Hierarchical Displays: A General Framework for Visualization and
Exploration of Large Multivariate Data Sets”, Computers and Graphics
Journal, 2002, to appear
• Daniel Stroe, Elke A. Rundensteiner and Matthew O. Ward, “Scalable
Visual Hierarchy Exploration”, Database and Expert Systems
Applications, pages 784-793, Sept. 2000
• Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner,
“Hierarchical Parallel Coordinates for Exploration of Large Datasets”,
IEEE Proc. of Visualization, pages 43-50, Oct. 1999
• Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner,
“Navigating Hierarchies with Structure-Based Brushes”, IEEE
Proceedings of Visualization, pages 43-50, Oct. 1999