Paper Presentation

Download Report

Transcript Paper Presentation

Research Report
Design and Evaluation of Incremental
Data Structures and Algorithms for
Dynamic Query Interfaces
Institution & Authors
University of Maryland

Department of Computer Science

Human-Computer Interaction Laboratory

http://www.cs.umd.edu/projects/hcil
Authors
Egemen Tanin
 Richard Beigel
 Ben Shneiderman

8 May 2001
info vis - spr 2001
2
Abstract
Dynamic query interfaces (DQIs) are a recently
developed database access mechanism that
provides continuous real-time feedback to the user
during query formulation. Previous work shows that
DQIs are an elegant and powerful interface to small
databases. Unfortunately, when applied to large
databases, previous DQI algorithms slow to a crawl.
We present a new incremental approach to DQI
algorithms and display updates that work well
with large databases, both in theory and in practice.
8 May 2001
info vis - spr 2001
3
Outline & Keywords
Outline
1.
2.
3.
4.
5.
6.
7.
Keywords
Dynamic Querying
The Incremental
Approach
Data Structures &
Algorithms
Theoretical Complexity
Experiments
Conclusions
Future Work
8 May 2001







info vis - spr 2001
Data Structure
Algorithm
Database
User Interface
Information Visualization
Direct Manipulation
Dynamic Query
4
Dynamic Querying
Unlike textual query
languages such as SQL,
dynamic query interfaces
(DQIs) are graphical
Continuous feedback to user
as the query is being
formulated
Tightly coupled: as the hit set
varies all widgets are
updated to show the hit set’s
bounding rectangle
“Details on demand”
8 May 2001
Query Input: Widgets




Range sliders
Alphanumeric sliders
Toggles
Checkboxes
Query Output:
Graphical Display



info vis - spr 2001
Starfield
Bars
Charts
5
8 May 2001
info vis - spr 2001
6
8 May 2001
info vis - spr 2001
7
The Incremental Approach
In DQIs


Queries formed
incrementally
Intermediate result
visualization



8 May 2001
Sliders
Conjunctions
Constraints
Efficient algorithms and
supporting data
structures required
Incremental query
formulation paradigm
info vis - spr 2001
8
Incremental Approach
Innovations for Efficiency
Active Subset

Reprocessing
Of limited size, stored in
main memory


Time, not space is
constraint in DQI
algorithms
Auxiliary data structures
are only reconstructed
when user clicks on a
widget

Auxiliary data structures

Augment active subset
with data structures that
facilitate continuous
querying

8 May 2001
1 second (or less) delay
for recomputation
Incremental Display

Response time needed
of ~0.1 seconds for
continuous operations
info vis - spr 2001
Slight changes in query
tend to cause slight
changes in output

Compute and display
differences for
continuous updates
9
Data Structures & Algorithms i
Control Structure
Query Previewer


DQI algorithms to be
used in tandem with
a query previewer
Allows user to
browse a huge
database and select
a manageably small
subset to scan
8 May 2001



info vis - spr 2001
Selected subset
passed from query
previewer to DQI
Bounding rectangle
determines extremes
for each attribute
When user action
extends beyond
subset, control is
passed back
10
(0,100)
(100,100)
(60,40)
DQI Subset
(20,20)
(0,0)
8 May 2001
(100,0)
info vis - spr 2001
11
Data Structures & Algorithms ii
Major Operations



Setup
Selection
Querying
Setup




8 May 2001
info vis - spr 2001
when query previewer
passes control to DQI
Initial display is
generated from active
subset
Copies and scales each
attribute to the range
[1,p] where p is number
of pixels in attribute’s
range slider
Happens infrequently,
more time allotted for this
task
12
Data Structures & Algorithms iii
Querying
Selection


When user clicks on a
range slider
Algorithm calculates
auxiliary data structures


Depends on currently
selected attribute and
current ranges for other
attributes
8 May 2001

Occurs continuously as the
user drags the mouse to
update a slider
 Each changed mouse
position is a single
query
~ 0.1 second response
required

DQI computes maximum
hit set during selection
DQI computes
information needed for
redisplay during selection
info vis - spr 2001
13
1 second response
required or users get
annoyed



Can spend memory to
optimize, if needed
Data Structures & Algorithms iv
Compute Max Hit Set




Determined by extreme
values for attribute and
current ranges for other
attributes
Partitions max hit set into
p buckets, one for each
user-specificable value
for the current attribute
Store each bucket and all
left-to-right partial sums
of these sizes
Linear-time counting
sort
8 May 2001
info vis - spr 2001
14
Data Structures & Algorithms v
Compute Redisplay Info





Facilitate computation of
histogram and tight
coupling of range sliders
Two dimensional array
(size p2) maintained
Allows determination of
ranges for all other
sliders in constant time
Scan buckets for old
value vs. new value
Display update in time
linear with number of
changes
8 May 2001
info vis - spr 2001
15
Theoretical Complexity i
r = # records in active subset
a = # attributes
b = # bytes needed to store
value of single attribute
p = length in pixels of each
range slider
f = area in pixels of the starfield
u = average # pixels to be
updated in starfield per query
(non-trivial dependencies)
m = # records in max hit set
8 May 2001
Active Subset
(r  a  b) bytes
Rescaled Active Subset
O(r  a) bytes
Bucket Partition
O(r  a) bytes
Data structures for tight coupling
O(a  p) bytes
Data structures range histograms
O(a  p2) bytes
Starfield
f bytes
info vis - spr 2001
16
Theoretical Complexity ii
r = # records in active subset
a = # attributes
b = # bytes needed to store
value of single attribute
p = length in pixels of each
range slider
f = area in pixels of the starfield
u = average # pixels to be
updated in starfield per query
(non-trivial dependencies)
m = # records in max hit set
8 May 2001
Setup Time
O (r  a  b)
info vis - spr 2001
17
Theoretical Complexity iii
r = # records in active subset
a = # attributes
b = # bytes needed to store
value of single attribute
p = length in pixels of each
range slider
f = area in pixels of the starfield
u = average # pixels to be
updated in starfield per query
(non-trivial dependencies)
m = # records in max hit set
1.
2.
3.
4.
5.
1.
8 May 2001
Selection Time
Determine max hit set
O (r  a)
Sort max hit set
O (m)
Compute auxiliary data
structures for tight coupling
O (a  p + m  a)
Compute auxiliary data
structures for histograms
O (a  p2 + m  a)
Total Selection Time
O (a  (r + m + p2 )) =
O (a  (r + p2 ))
info vis - spr 2001
18
Theoretical Complexity iii
r = # records in active subset
a = # attributes
b = # bytes needed to store
value of single attribute
p = length in pixels of each
range slider
f = area in pixels of the starfield
u = average # pixels to be
updated in starfield per query
(non-trivial dependencies)
m = # records in max hit set
8 May 2001
1.
Querying Time
Tight coupling
O (a )
2.
Computing Histograms
O (a  p)
3.
Starfield Update
O (u )
Total Querying Time
1.
O (a  p + u )
info vis - spr 2001
19
Experiments i
Preliminary experiments
show that the incremental
approach can deal with
active subset of 100,000
records with 10 attributes
each
Film Finder could handle
database 10,000 records
with 10 attributes
8 May 2001
Improvement by
one order magnitude
1.
2.
3.
Experimental Environment
Experiments and Results
Experimental Complexity
info vis - spr 2001
20
Experiments ii
Experimental Environment
Sample DQI with range
sliders



Starfield display, preview
bar, range sliders
Variable sizes for display,
point size and sliders
SUN SPARC Station 5



32MB ram
UNIX os
Motif and C
Batch Processing
8 May 2001
Measurements


Setup, Selection, Querying,
File read, Data structure
setup, Sub-selection, Subsetup
Repeated without graphics
for ‘pure’ query time
Varied the following:

Total attributes, total
records, starfield size, point
sizes on starfield, range
slider sizes and jump sizes
Worst Case Analysis
info vis - spr 2001
21
Experiments iii
Attributes (a)
Experiments and Results
7200 runs


2,4,6,8 or 10
3600 with starfield display
and preview bar disabled
3600 with starfield display
and preview bar
1st 3600: Pure query no
more than 20 milliseconds
(average 10 milliseconds)
faster

Time to update internal
data structures negligible
compared to starfield
update times
8 May 2001
Starfield Size (f)
4002, 5002 or 6002 pixels
Point Size (d)
12, 32, 52 or 72 pixels
Range Slider Size (p)
150, 200, 250 pixels
Dataset Size (r)
10,000, 25,000, 50,000, 75,000 or
100,000 records
Jump Size/range slider size
(j/p)
1/50, 1/25, 1/10, or 1/5
info vis - spr 2001
22
Experiments iv
Experimental Complexity
Complexity analysis based
on 2nd 3600 queries with
starfield display and
preview bar enabled
Ideally, experimental
results will equate with
theoretical prediction

Predicted run-time to
hopefully equal
experimental run-time
8 May 2001
Multiple linear regression
used to generate best fit
line from data

Error compensation


Experimental error
Algorithmic error
Setup
Selection
Querying
info vis - spr 2001
23
Experiments v
8 May 2001
info vis - spr 2001
24
Experiments vi
Evaluation
1.
X2 test to assess correlation
between experiments and
theoretical terms

2.
Actual values for Setup,
Selection, Query highly
correlated with estimated
values
Predictive test to see if future
outcomes can be estimated
based on past experiment



Setup deviation 9.5%
Selection deviation: 3.97%
Querying deviation: 16.63%
8 May 2001
Discussion
Incremental approach
achieved better querying and
display times than previous
implementations using
standard data structures
Consumed less memory

Data structures created when
needed
Preview bar, histogram, tight
coupling info without making
additional queries or spending
additional processor time
Memory is secondary problem
compared to starfield updates
which is significant for large r
info vis - spr 2001
25
Conclusions & Future Work
Conclusions
Future Work
The new incremental approach for
queries and display updates
introduces a better way of dealing
with large databases. Experiments
show this approach is faster than
previous approaches and can deal
with an order of magnitude of larger
datasets. The querying time is
dominated by the starfield update
time. The incremental approach
enables faster display because only
the difference between consecutive
queries is updated in the data
structures and on the starfield
display.
8 May 2001




info vis - spr 2001
Make another order of
magnitude increase in size of
datasets that DQIs can handle
Implement other widgets
Try spatial data structures like
K-D tree
Combine DQIs with query
previewer to produce a new
state of the art in interactive
dynamic database access
26
8 May 2001
info vis - spr 2001
27
Data Visualization Sliders
AT&T Bell Laboratories
Stephen G. Eick
Abstract
Computer sliders are a generic user input mechanism
for specifying a numeric value from a range. For data
visualization, the effectiveness of sliders may be
increased by using the space inside the slider as



An interactive color scale,
A barplot for discrete data, and
A density plot for continuous data.
The idea is to show the selected values in relation to
the data and its distribution. Furthermore, the
selection mechanism may be generalized using a
painting metaphor to specify arbitrarily, disconnected
intervals while maintaining an intuitive user-interface.
8 May 2001
info vis - spr 2001
29
Outline & Keywords
Outline



Introduction
Data Visualization
Sliders
Summary
Keywords





8 May 2001
info vis - spr 2001
High Interaction
Thresholding
Information Visualization
Selection
Dynamic Graphics
30
Introduction
Sliders are a generalpurpose user input
mechanism to specify an
input from a well-defined
range
Sliders are easy to use,
intuitive, and provide a
sensitive mechanism for
specifying values.
Sliders have a threshold bar
positioned within a scale that
the user manipulates with a
mouse to select a value.
8 May 2001
There are many slider or
slider-like applications
Common idea is filtering.
The pruning of visual clutter
from data-rich displays by
adjusting sliders is
particularly effective in
information visualization, and
even more so when done
dynamically.
info vis - spr 2001
31
Data Visualization Sliders
Eick improves
sliders - use of
internal slider space


For color scale
For data values



‘tick marks’
Density plot or bar
length
‘painting metaphor’
8 May 2001



Distribution of data

Pop-up menu options



info vis - spr 2001
Toggle slider views
Color rescale for
increased color fidelity
Range zooming for
increased scale
sensitivity
Animation
Labeling to print statistic
values
Interactive partition
adjustment
32
Data Visualization Sliders
A: combines visualization
color scale and slider with
interaction techniques
B: extends A by showing
smooth distribution
C: maps info to one color
coded bar
D: generalizes C to encode
info in bar’s length
8 May 2001
info vis - spr 2001
33
Summary
Generalizes the generic
functionality of sliders along
several orthogonal directions
User can specify
disconnected intervals while
preserving intuitive slider
interface
Internal space as a color
scale
Interactively rebinding colors
to active bars
Adjusting color divisions
8 May 2001
Presenting distribution of
data
Showing individual data
values
Move between
representations under user
control
Linking sliders to data they
control suggests many
natural and obvious
extensions
info vis - spr 2001
34
the end
8 May 2001
info vis - spr 2001
35