DM15: Visualization and Data Mining
Visualization and Data Mining
Introduction
What is Data Visualization?
How does Data Visualization Work?
History: Jacques Bertin
Image Theory
“Image”: a definition
Data Visualization and its use today
What are the benefits of Data Visualization?
Examples of Data Visualization
Conclusion
References
What is Data Visualization?
Data visualization is the process of converting raw data
into easily understood pictures of information that
enable fast and effective decisions.
Data -> Easily Understood Pictures
Jacques Bertin, who wrote the classic work on graphical
visualization, “Semiology of Graphics”, states that the
“transformation from numbers to insight requires two stages.”
Data/Processes -> (Algorithm) -> Image -> (Perception) -> Insight
Bertin’s Seven Visual Variables
position
form
orientation
color
texture
value
size
combined with a visual semantics for linking data attributes to visual elements
Image Theory
Visual processing occurs in three steps:
1) formation of the retinal image,
2) decomposition of the retinal image information into an array of specialized representations, and
3) reassembly of the information into object perception.
Uses Today
Data-driven decisions are increasingly made without access
to the information that traditional forms of information
presentation provide.
Information visualization is emerging as an important
fusion of graphics, scientific visualization, databases, and
human-computer interaction.
The military and commercial industries use data visualization to convey
complex results as understandable images.
What are the benefits of Data Visualization?
Data visualization allows users to see several different
perspectives of the data.
Data visualization makes it possible to interpret vast
amounts of data.
Data visualization offers the ability to note exceptions in
the data.
Data visualization allows the user to analyze visual
patterns in the data.
Outline
Graphical excellence and lie factor
Representing data in 1, 2, and 3-D
Representing data in 4+ dimensions
Parallel coordinates
Scatterplots
Stick figures
Visualization Role
Support interactive exploration
Help in result presentation
Disadvantage: requires human eyes
Can be misleading
Bad Visualization:
Spreadsheet
Year Sales
1999 2,110
2000 2,105
2001 2,120
2002 2,121
2003 2,124
[Chart: Sales by year, 1999 to 2003; Y-axis runs from 2095 to 2130]
What is wrong with this graph?
Bad Visualization:
Spreadsheet with misleading Y-axis
[Same sales table and chart as on the previous slide; Y-axis runs from 2095 to 2130]
The Y-axis scale gives the WRONG impression of a big change.
Better Visualization
[Same sales table; chart of Sales by year, 1999 to 2003, with the Y-axis running from 0 to 3000]
An axis that starts at 0 (here 0 to 3000) gives the
correct impression of a small change.
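The effect of the axis choice can be checked with a little arithmetic. The sketch below is only an illustration: the helper name `visual_span` is hypothetical, and the axis limits are the tick ranges read off the two charts (2095 to 2130 and 0 to 3000). It compares how much of the chart height the spread in sales occupies under each axis with the actual relative spread in the data.

```python
# Sales figures from the slide's table.
sales = {1999: 2110, 2000: 2105, 2001: 2120, 2002: 2121, 2003: 2124}

def visual_span(values, axis_min, axis_max):
    """Fraction of the axis height covered by the spread of the values.

    Hypothetical helper for illustration only.
    """
    return (max(values) - min(values)) / (axis_max - axis_min)

values = list(sales.values())

# Truncated axis, as in the misleading chart.
truncated = visual_span(values, 2095, 2130)
# Axis starting at zero, as in the better chart.
zero_based = visual_span(values, 0, 3000)
# Spread relative to the smallest value in the data.
relative_spread = (max(values) - min(values)) / min(values)

print(f"truncated axis:  {truncated:.1%} of chart height")
print(f"zero-based axis: {zero_based:.1%} of chart height")
print(f"spread in data:  {relative_spread:.1%}")
```

With the truncated axis the spread fills more than half of the chart, while the data itself varies by less than one percent; the zero-based axis keeps the visual impression in line with the data.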
Lie Factor
\[
\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}
= \frac{(5.3 - 0.6)/0.6}{(27.5 - 18.0)/18.0} = \frac{7.833}{0.528} = 14.8
\]
Tufte requirement: 0.95<Lie Factor<1.05
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
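As a rough check of the arithmetic on this slide, the lie factor can be computed directly. This is a minimal sketch: the function names are hypothetical, and the inputs are the slide's numbers, which appear to come from Tufte's fuel-economy example (graphic sizes 0.6 and 5.3, data values 18.0 and 27.5).

```python
def relative_change(first, second):
    """Size of an effect, measured as change relative to the first value."""
    return abs(second - first) / abs(first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual

# Numbers from the slide: graphic sizes 0.6 -> 5.3, data values 18.0 -> 27.5.
lf = lie_factor(0.6, 5.3, 18.0, 27.5)
# Tufte's requirement as stated on the slide.
within_tufte_bounds = 0.95 < lf < 1.05

print(f"lie factor = {lf:.1f}")
print(f"within Tufte's bounds: {within_tufte_bounds}")
```

A graphic whose visual change matches the change in the data has a lie factor of 1; the example here is far outside the 0.95 to 1.05 range.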
Tufte’s Principles of Graphical Excellence
Give the viewer
the greatest number of ideas
in the shortest time
with the least ink in the smallest space.
Tell the truth about the data!
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
Visualization Methods
Visualizing in 1-D, 2-D and 3-D
well-known visualization methods
Visualizing more dimensions
Parallel Coordinates
Other ideas
1-D (Univariate) Data
Representations
[Figures: Tukey box plot marking low, middle 50%, high, and mean; histogram]
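The numbers behind these two representations can be sketched with the Python standard library. Everything here is an assumption for illustration: the sample data is made up, the function names are hypothetical, and "low" and "high" are taken as the minimum and maximum (a strict Tukey box plot instead ends its whiskers at 1.5 times the interquartile range).

```python
import statistics
from collections import Counter

# Hypothetical sample; the slide does not give the underlying data.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]

def box_plot_summary(values):
    """Values a box plot marks: low, quartiles (middle 50%), high, mean."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "low": min(values),        # assumption: whisker ends at the minimum
        "q1": q1,                  # lower edge of the middle 50%
        "median": median,
        "q3": q3,                  # upper edge of the middle 50%
        "high": max(values),       # assumption: whisker ends at the maximum
        "mean": statistics.mean(values),
    }

def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

summary = box_plot_summary(data)
hist = histogram(data, bin_width=2)

print(summary)
print(hist)
```

The box plot compresses the distribution into a handful of summary values, while the histogram keeps the shape of the distribution at the cost of choosing a bin width.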
2-D (Bivariate) Data
Scatter plot, …
[Figure: scatter plot with axes price and mileage]
3-D Data (projection)
[Figure: 3-D projection; one axis is price]
3-D image
(requires 3-D blue and red glasses)
Taken by Mars Rover Spirit, Jan 2004
Visualization Summary
Many methods
Visualization is possible in more than 3-D
Aim for graphical excellence