DAT439: Data Mining Algorithms and Usage
Data Mining in SQL Server 2000 and Yukon
Richard Lees
[email protected]
RichardLees.com.au
Agenda
What isn’t Data Mining
Demo
What is Data Mining
Demo
What’s Coming in Yukon
Create a data mine
4 ways to view data mine
Demo
Questions throughout
Which Questions are Data Mining?
Who are our biggest customers?
What are customers buying with cigars?
What are the customer retention levels of our branches?
Which customers have bought olives, feta cheese but no ciabatta bread?
Which regions have the highest male/female ratio of single 20 somethings?
Which region has the lowest customer retention level, and who are its lost customers?
Demonstration
Ad hoc query
Drill through to details
Business Intelligence tool
History of OLAP and Data Mining
19xx: Custom data mining available to the Fortune 100
1993: Codd defined 12 rules for OLAP
1998: Microsoft SQL 7
• OLAP v1
1999: OLAP on the Web
• ThinSlicer
• Many others
2000: Microsoft SQL 2000
• OLAP v2
• Data Mining
• English Query
SAS and SPSS offer data mining tools to those who can afford them
Future: Data Mining V2
• SQL 2005
• BI Tools
Sample Data I Will Be Using
Wellington Libraries Loan DB
We wanted sample data for data mining
They were just writing off a data warehouse project
“The experts have spent 12 months trying to import data!”
“How could Microsoft help us? The data are in IBM databases!”
What is Data Mining?
“Data mining is the use of powerful software tools to discover significant traits or relationships, from databases or data warehouses and often used to predict future events”
It exploits statistical algorithms such as decision trees, clustering, sequence clustering, association, Naïve Bayes, neural network and time series algorithms
Once the “knowledge” is extracted it:
Can be used to discover
Can be used to predict values of other cases
OLAP versus Data Mining
OLAP
• Is about fast ad hoc querying
• Analysis by dimensions and measures
• Gives precise answers
Data Mining
• May use RDBMS or OLAP source
• Is about discovering and predicting
• Gives imprecise answers
OLAP is not a prerequisite for data mining, but it almost always comes first (learning to ride a bike before a car)
Clusters
[Figure: clusters of cases; axes: Annual Income, Age]
Library Clusters
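As a rough illustration of what a clustering algorithm does with cases described by age and annual income, here is a minimal k-means sketch. k-means is used only as a stand-in, since the slides do not say which clustering method the product applies. The function name `kmeans`, the starting centroids and the six sample cases are all invented for this example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group;
        # keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups

# Hypothetical (age, annual income in thousands) cases, invented for illustration.
cases = [(22, 25), (25, 30), (24, 28), (48, 90), (52, 95), (50, 85)]
centroids, groups = kmeans(cases, centroids=[cases[0], cases[3]])
print(centroids)
print(groups)
```

With real data the two attributes would normally be rescaled first, otherwise the one with the larger numeric range dominates the distance.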
Decision Trees
Input data
About cases
Discovering relationships
Predicting outcomes
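To make "input data about cases, discovering relationships, predicting outcomes" concrete, the sketch below finds the single best split for a one-level decision tree using Gini impurity. This is a generic textbook approach, not necessarily the splitting rule SQL Server uses. The helper names `gini` and `best_split`, the "age" attribute and the "renews"/"lapses" outcomes are assumptions made up for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_split(cases, feature):
    """Find the threshold on one numeric feature with the lowest weighted Gini impurity."""
    best = None
    for threshold in sorted({row[feature] for row, _ in cases}):
        left = [label for row, label in cases if row[feature] <= threshold]
        right = [label for row, label in cases if row[feature] > threshold]
        if not left or not right:
            continue  # a split must put cases on both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(cases)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical cases: input attributes plus the outcome to predict.
cases = [
    ({"age": 19}, "renews"),
    ({"age": 23}, "renews"),
    ({"age": 31}, "renews"),
    ({"age": 45}, "lapses"),
    ({"age": 52}, "lapses"),
    ({"age": 60}, "lapses"),
]
threshold, impurity = best_split(cases, "age")
print(threshold, impurity)  # 31 0.0

# Predicting an outcome for a new case uses the discovered rule.
new_case = {"age": 28}
print("renews" if new_case["age"] <= threshold else "lapses")
```

A full decision tree repeats the same search inside each branch until the groups are pure enough or too small to split.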
Data Mining
Demo with real data
Build a data mine
View data mine
1. Browse dependencies
2. Browse decision trees
3. Query using MDX
4. Query using ThinMiner
5. Batch update
Elite
Embedded
Uses of Data Mining
Risk assessment
Claim likelihood
Customer profitability predictions
Fraud detection
Treatment efficacy
Product suggestions
Web shopping
Call centre tool
Successful Data Mining Projects
Two additional Critical Success Factors
1. Discover something interesting
2. Profit from discovery
For example
ComputerFleet
(Localhost)
What’s Coming in Yukon
Decision Trees
Sequence Clustering
Clustering
Association
Time Series
Naïve Bayes
Confusion Matrix
Neural Networks
Lift Charts
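Confusion matrices and lift charts are both ways of checking how well a mining model predicts. The sketch below shows the arithmetic behind each on a tiny scored data set. It is a generic illustration, not the Yukon tooling; the function names `confusion_matrix` and `lift_at`, the 0.5 cut-off and the ten scored cases are all assumptions invented here.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))

def lift_at(scored, fraction):
    """Lift = hit rate in the top-scored fraction divided by the overall hit rate."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    top = ranked[: max(1, round(len(ranked) * fraction))]
    top_rate = sum(hit for _, hit in top) / len(top)
    overall_rate = sum(hit for _, hit in ranked) / len(ranked)
    return top_rate / overall_rate

# Hypothetical model output: (predicted probability, actual outcome 1 or 0).
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
          (0.4, 0), (0.3, 1), (0.2, 0), (0.1, 0), (0.05, 0)]
actual = [hit for _, hit in scored]
predicted = [1 if score >= 0.5 else 0 for score, _ in scored]

matrix = confusion_matrix(actual, predicted)
print("true positives:", matrix[(1, 1)])
print("false positives:", matrix[(0, 1)])
print("false negatives:", matrix[(1, 0)])
print("true negatives:", matrix[(0, 0)])
print("lift in top 20%:", lift_at(scored, 0.2))
```

A lift chart plots this ratio (or the cumulative share of hits captured) for every fraction of the ranked list, so a model is useful to the extent its curve sits above the random baseline.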
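Naïve Bayes
[Diagram labels: Actual, Actual declared, Judged, Posterior (actual)]
Actual: P(NOK) = .30, P(OK) = .70
Judged, when actual is NOK: P(J NOK | NOK) = .90 (joint .27), P(J OK | NOK) = .10 (joint .03)
Judged, when actual is OK: P(J NOK | OK) = .20 (joint .14), P(J OK | OK) = .80 (joint .56)
P(J NOK) = (.3 x .9) + (.7 x .2) = .41
P(J OK) = (.3 x .1) + (.7 x .8) = .59
Posterior (actual), given J NOK:
P(NOK | J NOK) = .27 / .41 = .66
P(OK | J NOK) = .14 / .41 = .34
Posterior (actual), given J OK:
P(NOK | J OK) = .03 / .59 = .05
P(OK | J OK) = .56 / .59 = .95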
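The figures on this slide follow from Bayes' rule: multiply each prior by the chance of the judgement, then divide by the total chance of that judgement. A minimal sketch that recomputes the posteriors from the slide's priors (.30 / .70) and judgement rates (.90 / .10 and .20 / .80) is given below. The variable and function names are illustrative only.

```python
# Priors and judgement rates are the slide's numbers; names are illustrative.
prior = {"NOK": 0.30, "OK": 0.70}

# likelihood[actual][judged] = P(judged | actual)
likelihood = {
    "NOK": {"J NOK": 0.90, "J OK": 0.10},
    "OK": {"J NOK": 0.20, "J OK": 0.80},
}

def posterior(judged):
    """Return P(actual | judged) for each actual state using Bayes' rule."""
    joint = {actual: prior[actual] * likelihood[actual][judged] for actual in prior}
    evidence = sum(joint.values())  # P(judged), e.g. .27 + .14 = .41 for "J NOK"
    return {actual: joint[actual] / evidence for actual in joint}

for judged in ("J NOK", "J OK"):
    rounded = {actual: round(p, 2) for actual, p in posterior(judged).items()}
    print(judged, rounded)
# J NOK {'NOK': 0.66, 'OK': 0.34}
# J OK {'NOK': 0.05, 'OK': 0.95}
```

The Naïve Bayes algorithm applies the same rule across many input attributes at once, treating them as independent of one another given the outcome.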
Demonstration
Yukon
Development
New algorithms
Lift chart
Profit curve
Query tool
Questions:
References
Microsoft Research http://Research.Microsoft.com/research/pubs
Richard Lees
[email protected]
http://RichardLees.com.au