Transcript Slide 1


Defined the goal of information visualization and discussed the
visualization tasks for BI.

Identified methods of enhancing understanding and amplifying
cognition:
◦ Reduce search time and enhance recognition of patterns (using pre-attentive
processing);
◦ Provide focus/emphasis through affordances.

Reviewed heuristics from Tufte and Nielsen.

Saw an example of a multivariate visualization for the task of
communication.

Understand quantitative relationships (optional review)
◦ Nominal vs. ordinal vs. interval vs. hierarchical relationships
◦ Ranking vs. ratio vs. correlation
◦ Measures of average and distribution

Concepts of tables and graphs
◦ Tables are used to see individual values; graphs are used to reveal
relationships among multiple values
◦ Tables and graphs should be sorted to highlight the key message.
◦ Relative use of pie charts, bar charts, line charts, sparklines, small
multiples, box plots, etc.
◦ Showing relationships vs. deviation vs. correlation vs. ranking vs.
time-series vs. part-to-whole vs. distribution
◦ Importance of sorting tables and graphs.
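The measures of average and distribution reviewed above, including the five numbers a box plot draws, can be computed with Python's standard `statistics` module (Python 3.8+ for `quantiles`). This is a minimal sketch: the sales figures are hypothetical, and `statistics.quantiles` with its default "exclusive" method is only one of several quartile conventions, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical monthly sales figures (illustrative only)
sales = [12, 15, 11, 19, 22, 15, 30, 14, 17, 16]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return min(values), q1, median, q3, max(values)

print("mean:  ", statistics.mean(sales))
print("median:", statistics.median(sales))
print("stdev: ", round(statistics.stdev(sales), 2))  # sample standard deviation
print("five-number summary:", five_number_summary(sales))
```

The mean and the median differ here because one large value (30) pulls the mean upward; that kind of skew is what a box plot makes visible at a glance.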

Finish evaluating a few sample individual visualizations.

Explain how visualizations fit within the overall BI
architecture.

Discuss the differences between OLAP and data
mining.

Present dashboards as the most common OLAP
visualization tool.

Begin discussion of data mining.

“Business Intelligence” is making purposeful use of
data in decision making.

The goals of BI are:
◦ To support human decision making by providing as much
understandable, complete, relevant, well-organized
information as necessary and helpful.
◦ To automate some decisions to relieve humans of routine
decision making tasks.
◦ To discover new issues, relationships, and correlations that humans
may not readily conceive.

[Figure: BI architecture. Data sources (ERP, legacy, POS, other OLTP/Web, external data) → ETL process (select, extract, transform, integrate, load, replication) → enterprise data warehouse with metadata → data marts (finance, engineering, marketing, ...), or a "no data marts" option → access through API/middleware → applications (visualization): routine business reporting; data/text mining; OLAP, dashboards, Web; custom-built applications.]

Data Sources available for input.

ETL tools to bring input data into an integrated data source.

Integrated Data Source (usually a data warehouse).
◦ Structured and unstructured data.
◦ Internal and external data.

Metadata repository.
◦ Data definitions and meanings.
◦ Business rules and process decisions.

Analytical tools.
◦ OLAP: Online Analytical Processing
◦ Statistical analysis.
◦ Data Mining.

Data Visualization.
◦ Graphs, tables, pictures.
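The ETL step that brings input data into an integrated data source can be illustrated with a toy pipeline. This is a minimal sketch that assumes two hypothetical sources (a POS feed and an ERP feed) using different field names and units. Python's built-in `sqlite3` in-memory database stands in for the data warehouse. Real ETL tools add scheduling, error handling, and incremental loads.

```python
import sqlite3

# Hypothetical source records: two systems describe the same kind of sale differently.
pos_rows = [{"store": "S1", "sku": "A100", "qty": 3, "amount_cents": 2997}]
erp_rows = [("S2", "A100", 5, 49.95)]  # (store_id, product, units, dollars)

def extract():
    """Select and extract raw records from each source."""
    return pos_rows, erp_rows

def transform(pos, erp):
    """Integrate both sources into one schema: (store, product, qty, dollars)."""
    unified = [(r["store"], r["sku"], r["qty"], r["amount_cents"] / 100) for r in pos]
    unified += [(store, product, units, dollars) for store, product, units, dollars in erp]
    return unified

def load(rows, conn):
    """Load the integrated rows into a warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(store TEXT, product TEXT, qty INTEGER, dollars REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
print(conn.execute(
    "SELECT product, SUM(qty), SUM(dollars) FROM sales_fact GROUP BY product"
).fetchall())
```

Converting cents to dollars during the transform step is the kind of unit reconciliation that has to happen before data from different systems can be analyzed together.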

The vast majority of output from BI is OLAP-related.

Provide information to support both ad-hoc and recurring (standard) queries
for managerial decision making.

Provide multi-dimensional data analysis techniques.

Work primarily with data aggregation.

Data mart/derived data model.

Provide advanced statistical analysis.

Support access to very large databases through additional data
structures, such as the multidimensional cubes in SQL Server Analysis Services.

Contain enhanced query optimization algorithms to speed up query
processing (e.g., SQL Server Analysis Services).
OLAP Results
◦ Generates relatively standardized reports and answers to ad-hoc queries.
◦ Answers questions such as:
  Which products sold the most (by quantity), by type of product and geographic region?
  Which stores are currently most profitable? Which are least profitable?
◦ Used frequently to support short- and long-term managerial decision making.
OLAP Visualization
◦ Presented in standard displays that are accessed frequently.
◦ Dashboard format used to provide a quick and comprehensive overview of business status.
◦ Presented in Excel or another spreadsheet format.
◦ Output displayed either with a standard report generator (Crystal Reports, Access, etc.) or graphically.
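The multi-dimensional aggregation behind an OLAP question like "which products sold the most, by type of product and geographic region?" can be sketched without a cube engine. The idea is to sum a measure over any chosen subset of dimensions (a roll-up) and to fix one dimension's value (a slice). This is a minimal plain-Python sketch. The fact rows and the dimension names `product_type`, `region`, and `quarter` are hypothetical. A product such as SQL Server Analysis Services precomputes and stores aggregates like these instead of recomputing them for every query.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions plus one measure (quantity sold).
facts = [
    {"product_type": "Hardware", "region": "East", "quarter": "Q1", "qty": 120},
    {"product_type": "Hardware", "region": "West", "quarter": "Q1", "qty": 80},
    {"product_type": "Software", "region": "East", "quarter": "Q1", "qty": 200},
    {"product_type": "Software", "region": "West", "quarter": "Q2", "qty": 150},
    {"product_type": "Hardware", "region": "East", "quarter": "Q2", "qty": 60},
]

def roll_up(rows, dims, measure="qty"):
    """Sum `measure` grouped by the given dimensions (one view of the cube)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

def slice_(rows, **fixed):
    """Keep only rows whose dimensions match the fixed values (an OLAP slice)."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

by_type_region = roll_up(facts, ["product_type", "region"])
best = max(by_type_region, key=by_type_region.get)
print(by_type_region)
print("Top seller:", best)
print("East only, by quarter:", roll_up(slice_(facts, region="East"), ["quarter"]))
```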

Data mining is the set of activities used to find new,
hidden or unexpected patterns in data.

Data mining tools:
◦ use large sets of data;
◦ uncover patterns based on statistical and artificial intelligence
algorithms;
◦ form computer models based on the findings; and
◦ use the models to predict business behavior.

Common synonyms for data mining include knowledge
discovery, information harvesting, & pattern analysis.

Proactive tools, used for discovery and prediction.
Data Mining Results
◦ Generates information about patterns in data.
◦ Provides answers to previously ambiguous questions, but a question area must be defined.
◦ May produce information such as:
  Which products should be promoted to a predefined type/category of customer?
  Which patients have the greatest likelihood of being hospitalized within the next year?
  Which securities are the most profitable to buy/sell in a particular environment?
Data Mining Visualization
◦ Focus is on discovery and analysis, rather than reporting, monitoring, or communicating a message.
◦ Uses primarily graphical output to display the patterns.
◦ Included as part of the data mining tool.
◦ Results can also be incorporated in standardized reporting tools and/or dashboards, but by that time the information has already been “discovered.”

How many people between the ages of 15-30 are
diagnosed with type 2 diabetes?

What is the quantity breakdown by county in the U.S.
for people diagnosed with type 2 diabetes?

What is the relationship between weight, exercise, age,
smoking, and the prevalence of type 2 diabetes?

What demographic factors are related to type 2
diabetes?

How many different customers did we serve? How many
applicants did we place?

Which customer was our most profitable?

Which customers have the greatest likelihood of increasing
their number of temporary employees next year?

Which geographic region was our most profitable last
quarter?

Which geographic region has the fastest growth rate
measured by number of employees placed over the last 3
years?
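The last question ranks regions by growth rate. One common way to measure growth over several years is the compound annual growth rate, CAGR = (end / start)^(1 / years) - 1. This is a minimal sketch: the placement counts are hypothetical, and CAGR is an illustrative choice of growth measure that the slide does not specify.

```python
# Hypothetical number of employees placed per year, by region (oldest year first).
# Four yearly observations span three years of growth.
placements = {
    "Northeast": [400, 460, 530, 610],
    "Midwest":   [300, 310, 330, 345],
    "South":     [250, 320, 390, 500],
}

def cagr(series):
    """Compound annual growth rate between the first and last observations."""
    years = len(series) - 1
    return (series[-1] / series[0]) ** (1 / years) - 1

# Rank regions from fastest to slowest growth.
ranked = sorted(placements, key=lambda r: cagr(placements[r]), reverse=True)
for region in ranked:
    print(f"{region:10s} {cagr(placements[region]):.1%}")
```

Growth rate and total volume can give different rankings. Here the Northeast places the most people, but the South grows fastest.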

Most common visualization method for OLAP.

Visual display – not printed.

Must have metrics. What is a metric, again?

Key Information
◦ Most important information to monitor one or more objectives
◦ Usually related directly to key performance indicators
◦ Consolidated

Fits on one screen (no scrolling!)

Designed to be monitored at a glance

http://www.infosol.com/business%20intelligence/library-dashboards.aspx

http://www.dundas.com/dashboard/online-examples/

http://www.tableausoftware.com/

http://www.exceluser.com/dash/samples.htm

http://dashboardsbyexample.com/

http://www.dashboardzone.com/

http://www.it-performs.com/services/dashboardcentre/dashboard-videos

http://www.youtube.com/watch?v=3Stuh7-RyuE

http://www.youtube.com/watch?v=EJ9CNhgh8EY

http://www.dminebi.com/dmine-dashboard-videos/

http://www.youtube.com/watch?v=V9GMCSWjyI&feature=related

http://www.youtube.com/watch?v=0AS9TIK1QFk&feature=related

Derived from the work on executive information
systems (late 1980s through the 1990s).

Further roots in the work on the “balanced
scorecard” concept to broaden perspective from
financials alone.

Uses the dashboard metaphor to develop fast
recognition and appeal.
Strategic
◦ Audience: Executives, managers
◦ Use: High-level performance; relationships
◦ Design: Simple displays; provide context; include forecasts
◦ Issues and cautions: Beware too much information; avoid subtle gradations; link to KPIs; don’t bother with real-time data

Analytical
◦ Audience: Managers, analysts
◦ Use: Detailed understanding of KPI factors
◦ Design: Rich comparisons; more context; multivariate
◦ Issues and cautions: Provide drill-down; enable exploration; show movement; allow examination of causes; probably doesn’t require real-time data

Operational
◦ Audience: Executives, managers, BOG
◦ Use: Run daily, weekly, monthly operations
◦ Design: Maintain awareness through dynamic, simple displays
◦ Issues and cautions: Make specific information available; provide drill-down; exceptions are critical; requires real-time data; use hovering
Category: Measures
◦ Sales: Billings; bookings; # of orders; order amounts
◦ Finance: Revenues; expenses; profits
◦ Marketing: Market share; ad campaign $; customer demographics
◦ Tech Support: # of support calls; resolved cases; customer satisfaction; call duration
◦ IT: Network downtime; system usage; fixed app defects
◦ Human Resources: Employee satisfaction; employee turnover; count of open positions; count of late reviews

Overall design
◦ Exceeding the boundaries of a single screen.
◦ Limiting design to the dashboard metaphor.
◦ Choosing ineffective or inappropriate visualization methods.
◦ Poor flow or arrangement of the data presented.

Content
◦ Choosing a deficient, inappropriate or ineffective measure.
◦ Supplying inadequate context for the data.
◦ Displaying excessive detail or precision.

Detailed design (look and feel)
◦ Misusing or overusing color; meaningless variety of color and shape.
◦ Poor highlighting of important data.
◦ Cluttering the display with useless decoration.

Delivers information that is:
◦ Exceptionally well-organized.
◦ Condensed.
◦ Focused on summaries and exceptions.
◦ Specific to the requirements of the audience.
◦ Presented on the audience’s medium of choice (computer, phone, tablet, etc.).
◦ Flexible.
◦ Able to be pursued in more detail beyond the dashboard.
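Focusing on summaries and exceptions means the dashboard compares each measure against its target and highlights only the measures that need attention. This is a minimal sketch. The `Kpi` class, the `status` function, the targets, and the 5% "watch" tolerance are all hypothetical choices, and the KPI names are borrowed from the category list earlier in the deck.

```python
from dataclasses import dataclass

@dataclass
class Kpi:
    name: str
    actual: float
    target: float
    higher_is_better: bool = True

def status(kpi, tolerance=0.05):
    """Classify a KPI as 'ok', 'watch', or 'exception' relative to its target.

    'watch' = off target on the wrong side, but by no more than `tolerance`
    (a fraction of the target); anything worse is an 'exception'.
    """
    gap = (kpi.actual - kpi.target) / kpi.target
    if not kpi.higher_is_better:
        gap = -gap  # for measures like downtime, a lower actual is good
    if gap >= 0:
        return "ok"
    return "watch" if gap >= -tolerance else "exception"

# Hypothetical current values and targets
kpis = [
    Kpi("Revenues ($K)", actual=980, target=1000),
    Kpi("Customer satisfaction", actual=4.6, target=4.5),
    Kpi("Network downtime (hrs)", actual=6, target=4, higher_is_better=False),
]

# A dashboard would emphasize only the exceptions.
exceptions = [k.name for k in kpis if status(k) == "exception"]
print(exceptions)
```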

Understand and make best use of screen real estate

Maximize the data-ink/total-ink ratio (or the data
pixels/total pixels ratio)

Eliminate all unnecessary non-data pixels

De-emphasize all non-data pixels and make them slip
into the background of the overall design

Highlight the most important data pixels
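Tufte's data-ink ratio is the share of a graphic's ink devoted to displaying data. Equivalently, it is 1 minus the share that could be erased without losing information. On a screen, counting pixels is a rough stand-in for ink. This is a minimal sketch that assumes a hypothetical tiny rendered chart, stored as a grid of characters: "D" marks a data pixel, "N" marks a non-data pixel such as a border or gridline, and "." marks the blank background, which does not count as ink.

```python
# Hypothetical 4x8 rendering of a small bar chart:
# 'D' = data pixel (bars), 'N' = non-data pixel (border, gridline), '.' = background.
chart = [
    "NNNNNNNN",
    "N.D...DN",
    "N.D.D.DN",
    "NNDNDNDN",
]

def data_ink_ratio(grid):
    """Data pixels divided by all inked (non-background) pixels."""
    data = sum(row.count("D") for row in grid)
    ink = sum(len(row) - row.count(".") for row in grid)
    return data / ink

print(f"data-ink ratio: {data_ink_ratio(chart):.2f}")
```

Erasing or lightening the border pixels ("N") raises the ratio toward 1.0 without removing any data, which is the point of the guidelines above.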
[Figure: regions of a display labeled by visual emphasis: emphasized; neither emphasized nor de-emphasized; de-emphasized.]
[The slide shows the following table twice, in two different visual treatments.]
Salesperson     Jan     Feb     Mar
Bill Bassett    2,834   4,340   4,885
Jenny Martin    5,890   7,439   6,493
Luis Marquez    3,899   6,889   8,593
Bob Taylor      1,250   3,445   5,443
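Following the earlier advice to sort tables so that they highlight the key message, the same data can be reordered, for example by quarter total. This is a minimal sketch. Ranking by the Jan to Mar total is an illustrative choice of sort key that the slide does not specify.

```python
# Monthly sales from the table above: (salesperson, Jan, Feb, Mar)
rows = [
    ("Bill Bassett", 2834, 4340, 4885),
    ("Jenny Martin", 5890, 7439, 6493),
    ("Luis Marquez", 3899, 6889, 8593),
    ("Bob Taylor", 1250, 3445, 5443),
]

# Sort high to low by the quarter's total so the top performer leads the table.
by_total = sorted(rows, key=lambda r: sum(r[1:]), reverse=True)

print(f"{'Salesperson':<14}{'Jan':>7}{'Feb':>7}{'Mar':>7}{'Total':>8}")
for name, jan, feb, mar in by_total:
    print(f"{name:<14}{jan:>7,}{feb:>7,}{mar:>7,}{jan + feb + mar:>8,}")
```

Sorting by a different key, such as March sales alone, would put Luis Marquez first instead. The sort key should match the message the table is meant to deliver.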
[Figure: two bar charts of values for Store 1, Store 44, Store 8, and Store 6, the first drawn on a 0 to 15,000 axis and the second on a 0 to 12,000 axis.]

Grid lines in graphs that don’t need precision

Backgrounds that don’t provide delineation of sections
on the dashboard

3-D that doesn’t provide additional variables or layers
of analysis

Drawings that are not part of the data – including
detailed logos

Colors that don’t highlight or emphasize data

Meters and gauges that don’t take advantage of pre-attentive processing

Arrange the overall design to reflect how the intended
audience “thinks” about the decisions to be made.

Group related data.

Arrange the data in a meaningful order (low to high or
high to low).

Use bright colors sparingly and judiciously.

Avoid use of a colored background.

White space is an effective delimiter.

Use fonts with good legibility and readability.

Also graphical, but designed for an analyst to discover
patterns, not to communicate information for
managerial decision making.

Must understand a bit more about data mining while
discussing visualization.
Opening Vignette:
Data Mining Goes to Hollywood!

A Typical Classification Problem
Dependent variable: box-office receipts, grouped into nine classes.
Class No.   Range (in $ millions)
1           < 1 (Flop)
2           > 1 and < 10
3           > 10 and < 20
4           > 20 and < 40
5           > 40 and < 65
6           > 65 and < 100
7           > 100 and < 150
8           > 150 and < 200
9           > 200 (Blockbuster)
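Turning continuous box-office receipts into the nine classes of the dependent variable is a binning step. This is a minimal sketch using Python's standard `bisect` module. The slide does not say which class a value exactly on a boundary (e.g., exactly $10 million) belongs to, so the sketch assumes that boundary values go to the higher class. The function name is hypothetical.

```python
from bisect import bisect_right

# Class boundaries in $ millions, taken from the class table above.
BOUNDARIES = [1, 10, 20, 40, 65, 100, 150, 200]

def box_office_class(receipts_millions):
    """Map gross receipts ($M) to class 1 (flop) through 9 (blockbuster).

    Assumption: a value exactly on a boundary goes to the higher class.
    """
    return bisect_right(BOUNDARIES, receipts_millions) + 1

for gross in (0.4, 12.5, 72.0, 250.0):
    print(gross, "->", box_office_class(gross))
```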
Independent Variable   Number of Possible Values   Values
MPAA Rating            5                           G, PG, PG-13, R, NR
Competition            3                           High, Medium, Low
Star value             3                           High, Medium, Low
Genre                  10                          Sci-Fi, Historic Epic Drama, Modern Drama, Politically Related,
                                                   Thriller, Horror, Comedy, Cartoon, Action, Documentary
Special effects        3                           High, Medium, Low
Sequel                 1                           Yes, No
Number of screens      1                           Positive integer
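One reading of the "Number of Possible Values" column is the number of model inputs each variable contributes after encoding. Under that reading, each multi-valued categorical variable gets one indicator (one-hot) column per category, Sequel becomes a single 0/1 flag, and Number of screens is used as a raw count. The slide does not confirm this interpretation. This is a minimal sketch under that assumption, and the `encode` function and the example movie are hypothetical.

```python
# Category lists from the table above; list order fixes the column positions.
CATEGORICAL = {
    "mpaa": ["G", "PG", "PG-13", "R", "NR"],
    "competition": ["High", "Medium", "Low"],
    "star_value": ["High", "Medium", "Low"],
    "genre": ["Sci-Fi", "Historic Epic Drama", "Modern Drama", "Politically Related",
              "Thriller", "Horror", "Comedy", "Cartoon", "Action", "Documentary"],
    "special_effects": ["High", "Medium", "Low"],
}

def encode(movie):
    """Encode one movie as a flat numeric input vector (hypothetical scheme)."""
    vector = []
    for field, categories in CATEGORICAL.items():
        vector += [1 if movie[field] == c else 0 for c in categories]  # one-hot
    vector.append(1 if movie["sequel"] else 0)  # Yes/No as a single flag
    vector.append(movie["screens"])             # positive integer, used as-is
    return vector

# Hypothetical example movie
movie = {"mpaa": "PG-13", "competition": "High", "star_value": "Medium",
         "genre": "Action", "special_effects": "High", "sequel": True, "screens": 3200}
x = encode(movie)
print(len(x), x)
```

The vector length, 5 + 3 + 3 + 10 + 3 + 1 + 1 = 26, equals the sum of the "Number of Possible Values" column, which is consistent with this reading.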
[Figure: The DM process map in IBM SPSS Modeler, showing the model development process and the model assessment process.]
Prediction Models*
Model              Type         Count (Bingo)  Count (1-Away)  Accuracy (% Bingo)  Accuracy (% 1-Away)  Standard deviation
SVM                Individual   192            104             55.49%              85.55%               0.93
ANN                Individual   182            120             52.60%              87.28%               0.87
C&RT               Individual   140            126             40.46%              76.88%               1.05
Random Forest      Ensemble     189            121             54.62%              89.60%               0.76
Boosted Tree       Ensemble     187            104             54.05%              84.10%               0.84
Fusion (Average)   Ensemble     194            120             56.07%              90.75%               0.63
* Training set: 1998–2005 movies; Test set: 2006 movies
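The two accuracy rows follow from the counts. "% Bingo" is the share of test movies assigned to exactly the right class, and "% 1-Away" additionally credits predictions that are one class off. The slide does not state the size of the test set. Dividing each count by its percentage gives 346 movies for every model (e.g., 192 / 0.5549 ≈ 346), and the sketch assumes that inferred value.

```python
# Counts from the results table; TEST_SET_SIZE is inferred, not stated on the slide.
TEST_SET_SIZE = 346

results = {  # model: (count exactly right, count one class away)
    "SVM": (192, 104),
    "ANN": (182, 120),
    "C&RT": (140, 126),
    "Random Forest": (189, 121),
    "Boosted Tree": (187, 104),
    "Fusion (Average)": (194, 120),
}

def accuracies(bingo, one_away, n=TEST_SET_SIZE):
    """Return (% Bingo, % 1-Away) as percentages rounded to two decimals."""
    return round(100 * bingo / n, 2), round(100 * (bingo + one_away) / n, 2)

for model, (bingo, one_away) in results.items():
    pct_bingo, pct_1away = accuracies(bingo, one_away)
    print(f"{model:<17}{pct_bingo:>7.2f}%{pct_1away:>7.2f}%")
```

Every recomputed percentage matches the table, so "% 1-Away" is cumulative: it includes the exact hits as well as the near misses.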