A Look at Data Mining
Presented by:
Charles Hollingsworth
Flavia Peynado
Ritch Overton
DSc8020, Group Presentation, July 31, 2002
What is Data Mining?
It may be described as the process of
extracting previously unidentified, valid, and
actionable information from large databases
and then using the information to make crucial
business decisions.
Why the need for data mining?
Business environment is constantly
changing.
Customer Behavior Patterns
Market Saturation
New niche markets
Increased commoditization
Time to market
Shorter product life cycles
Increased competition and business risks
Drivers
The Customer
Products
Competition
Operations/Data
Assets.
Enablers
Data flood
Growth of data warehousing
New IT solutions
New research in machine learning
Process overview contd.
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Data Transformation
5. Data Mining
6. Analysis of Results
7. Assimilation of Results
Effort needed at each stage of data mining
[Figure: bar chart of Effort (vertical axis, 0 to 60) by stage. Legible stage labels: identification of objectives, preparation, data mining, analysis of results.]
Visualization
Goal is to provide a summary and overview of a dataset
Promotes Understanding: Deconstructive process
Promotes Trust: Constructive process
Narrows the gap between human and computer during
data analysis
Types of Visualization Tools
Histograms
Bar Charts
Time-Series Plots
Decision Trees
Scatter plots
Coxcomb Plots
Pie Charts
Stereograms
Line Plots
Moseley’s X-rays
Histogram
Graphically illustrates how many
observations fall in various categories
[Figure: Histogram for Diameter. Horizontal axis: Category, with bins <=0.455, .455-.465, and so on in steps of 0.01 up to .535-.545, then >0.545. Vertical axis: 0 to 100.]
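The counting behind a histogram can be sketched as follows. This is a minimal illustration, not the method used to build the slide's chart: the function name `histogram_counts`, the three bin edges, and the diameter values are all hypothetical.

```python
from bisect import bisect_left


def histogram_counts(values, edges):
    """Count observations per category.

    Categories are: <= edges[0], (edges[0], edges[1]], ..., > edges[-1].
    `edges` must be sorted in increasing order.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_left returns the number of edges strictly below v,
        # which is exactly the index of the category v belongs to.
        counts[bisect_left(edges, v)] += 1
    return counts


# Hypothetical diameter measurements (not taken from the slide).
diameters = [0.45, 0.455, 0.46, 0.47, 0.48, 0.50]
edges = [0.455, 0.465, 0.475]
labels = ["<=0.455", ".455-.465", ".465-.475", ">0.475"]

# Print a crude text histogram, one '#' per observation.
for label, count in zip(labels, histogram_counts(diameters, edges)):
    print(f"{label:>10} {'#' * count}")
```

A value that falls exactly on an edge is counted in the lower category, matching the "<=" label on the first bin.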
Bar Chart
Categories are placed on the vertical axis,
rather than on the horizontal axis as in a histogram
Scatter Plot
Graphical representation of the relationship
between two variables.
[Figure: scatter plot of Salary (vertical axis, 0 to 25) against Domestic Gross (horizontal axis, 0 to 200).]
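One common way to put a number on the relationship a scatter plot shows is the Pearson correlation coefficient. The slide does not mention it; it is brought in here only as an illustration, and the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical (domestic gross, salary) pairs, not read off the slide.
gross = [20, 60, 90, 150, 180]
salary = [2, 6, 8, 15, 20]
print(f"r = {pearson_r(gross, salary):.3f}")
```

A value near +1 or -1 indicates points lying close to a straight line; a value near 0 indicates no linear relationship.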
Pie Chart
Radii are used to divide a circle into wedges. The
resulting angles represent the values of the wedges.
[Figure: pie chart, Spring 2000 Salary Survey. Wedges: <$30,000; $30,000 to $39,999; $40,000 to $49,999; $50,000 to $59,999; $60,000 to $69,999; More than $70,000; No Answer.]
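The wedge angles of a pie chart follow from each category's share of the total. A minimal sketch, assuming made-up response counts (the survey's actual figures are not in the transcript) and a hypothetical helper named `wedge_angles`:

```python
def wedge_angles(counts):
    """Map each category to its wedge angle in degrees.

    Each angle is the category's share of the total times 360.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {label: 360.0 * n / total for label, n in counts.items()}


# Hypothetical response counts for illustration only.
responses = {
    "<$30,000": 10,
    "$30,000 to $39,999": 30,
    "$40,000 to $49,999": 40,
    "No Answer": 20,
}
for label, angle in wedge_angles(responses).items():
    print(f"{label:<20} {angle:6.1f} degrees")
```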
Line Plot
Connects consecutive data points to
enhance visualization
Time-Series Plot: Playfair’s
• Helpful in forecasting future values
• Time variable is placed on the horizontal axis
• Makes patterns in data more apparent
• The area between two time-series curves was emphasized to show the difference between them, representing the balance of trade.
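The quantity that the shaded area between the two curves represents can be computed directly as exports minus imports for each period. A small sketch with hypothetical numbers; the function name `balance_of_trade` and the period labels are assumptions, not data from the chart.

```python
def balance_of_trade(exports, imports):
    """Return {period: exports - imports} for periods present in both series."""
    shared = sorted(exports.keys() & imports.keys())
    return {period: exports[period] - imports[period] for period in shared}


# Hypothetical series keyed by period number (not data from the chart).
exports = {1: 5, 2: 7, 3: 9}
imports = {1: 3, 2: 9, 3: 9}

for period, balance in balance_of_trade(exports, imports).items():
    status = "surplus" if balance > 0 else "deficit" if balance < 0 else "balanced"
    print(period, balance, status)
```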
Decision Trees
Conventions for Decision Trees:
1.
Composed of nodes (points in time) and branches (possible
decisions).
2.
Squares represent decision nodes, circles represent
probability nodes, triangles represent end nodes.
3.
Probabilities are listed on probability branches.
4.
Monetary values are listed on the branches where they
occur.
5.
Decision maker has no control over probability branches.
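The conventions above map naturally onto a small data structure. The sketch below evaluates a tree by rolling back expected monetary value: averaging over probability branches and taking the best branch at decision nodes. That evaluation rule is a standard technique assumed here, not something stated on the slide, and the class names, the `rollback` function, and the example figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class End:
    """Triangle: an end node with an optional final payoff."""
    payoff: float = 0.0


@dataclass
class Chance:
    """Circle: branches are (probability, monetary value on branch, child)."""
    branches: List[Tuple[float, float, "Node"]]


@dataclass
class Decision:
    """Square: branches are (label, monetary value on branch, child)."""
    branches: List[Tuple[str, float, "Node"]]


Node = Union[End, Chance, Decision]


def rollback(node: Node) -> float:
    """Expected monetary value of a node under the assumed rollback rule."""
    if isinstance(node, End):
        return node.payoff
    if isinstance(node, Chance):
        if abs(sum(p for p, _, _ in node.branches) - 1.0) > 1e-9:
            raise ValueError("probabilities on a chance node must sum to 1")
        # The decision maker has no control here, so average over outcomes.
        return sum(p * (value + rollback(child)) for p, value, child in node.branches)
    # Decision node: the decision maker picks the most valuable branch.
    return max(value + rollback(child) for _, value, child in node.branches)


# Hypothetical example: launch a product (cost 10) or do nothing.
market = Chance([(0.5, 60.0, End()), (0.5, 0.0, End())])
tree = Decision([("launch", -10.0, market), ("do nothing", 0.0, End())])
print(rollback(tree))
```

Monetary values sit on the branches where they occur, so the cost of launching is attached to the "launch" branch rather than to a node.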
Coxcomb Plot
In 1858, Florence Nightingale
constructed graphs of her own design,
which she called “Coxcombs”.
In a Coxcomb the radius of each wedge varies with the value,
whereas in a pie chart the angle of the wedge varies.
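The contrast with a pie chart can be made concrete in a few lines. This sketch assumes the usual polar-area convention, in which every wedge has the same angle and the wedge area is proportional to the value, so the radius grows with the square root of the value. The slide states only that the radii vary; the function name `coxcomb_wedges` and the sample values are hypothetical.

```python
from math import sqrt


def coxcomb_wedges(values, max_radius=1.0):
    """Return (angle_in_degrees, radius) for each wedge.

    Assumes equal angles and area proportional to value, so that
    radius = max_radius * sqrt(value / largest value).
    """
    if not values or min(values) < 0 or max(values) <= 0:
        raise ValueError("values must be non-negative with a positive maximum")
    angle = 360.0 / len(values)
    largest = max(values)
    return [(angle, max_radius * sqrt(v / largest)) for v in values]


# Hypothetical monthly counts (not Nightingale's data).
for angle, radius in coxcomb_wedges([1, 4, 9, 16]):
    print(f"angle {angle:.0f} degrees, radius {radius:.2f}")
```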
Stereogram
Luigi Perozzo, from the Annali di Statistica,
1880
The population of Sweden from 1750-1875
by age groups
Moseley’s X-rays
Led Henry Moseley to discover that
the atomic number is more than a serial
number: it has a physical basis.
Moseley proposed that the atomic
number equals the positive charge of the nucleus,
which in a neutral atom matches the number of electrons.
Other Visualization Tools
Doughnut
Area Chart
Box Plot
Radar
Algorithms
Predictive
Regression
Classification
Descriptive
Parallel Formulation of Classification
Association Rule Discovery
Sequential Pattern Discovery Analysis
Clustering
Applying
Relevance to managers
Decreasing Costs
Valuing Appropriately
Effective Implementation
Conclusion
Converging Developments
Data compilation
Processing power
Maturing Algorithms
Visualization
Accessible Resources