Database Clustering and Summary Generation


Promising “Newer” Technologies to Cope with the Information Flood

• Knowledge Discovery and Data Mining (KDD)
• Agent-based Technologies
• Ontologies and Knowledge Brokering
• Non-traditional data analysis techniques
• Model Generation

As an example to explain / discuss technologies
Christoph F. Eick: Introduction Knowledge Discovery and Data Mining (KDD)
Why Do We Need So Many Data Mining / Analysis Techniques?

• No generally good technique exists.
• Different methods make different assumptions about the data set to be analyzed.
• Cross-fertilization between different methods is desirable and frequently helpful in obtaining a deeper understanding of the analyzed data set.
Data Mining and Business Intelligence

Layers, from top to bottom (the potential to support business decisions increases toward the top):
• Making Decisions
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: Statistical Analysis, Querying and Reporting
• Data Warehouses / Data Marts: OLAP, MDA
• Data Sources: Paper, Files, Information Providers, Database Systems, OLTP

Roles shown alongside the layers, from top to bottom: End User, Business Analyst, Data Analyst, DBA
Example: Decision Tree Approach
Decision Tree Approach2
Decision Trees
Example:
• Conducted a survey to see which customers were interested in a new car model
• Want to select customers for an advertising campaign

Training set (table sale):

custId  car     age  city  newCar
c1      taurus  27   sf    yes
c2      van     35   la    yes
c3      van     40   sf    yes
c4      taurus  22   sf    yes
c5      merc    50   la    no
c6      taurus  25   la    no
One Possibility
Decision tree over the sale training set from the Decision Trees example:

age<30
  Y: city=sf
    Y: likely
    N: unlikely
  N: car=van
    Y: likely
    N: unlikely
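
The tree with root test age<30 can be read as nested conditionals. The following is a minimal sketch, assuming the six sale records from the training set; the function names `classify_by_age_tree` and `training_errors` and the tuple layout of the records are illustrative choices, not part of the slides.

```python
# Training set "sale" from the slides: (custId, car, age, city, newCar).
TRAINING_SET = [
    ("c1", "taurus", 27, "sf", "yes"),
    ("c2", "van", 35, "la", "yes"),
    ("c3", "van", 40, "sf", "yes"),
    ("c4", "taurus", 22, "sf", "yes"),
    ("c5", "merc", 50, "la", "no"),
    ("c6", "taurus", 25, "la", "no"),
]


def classify_by_age_tree(car, age, city):
    """Tree with root test age<30 (hypothetical function name)."""
    if age < 30:
        # Y branch of the root: test city=sf.
        return "likely" if city == "sf" else "unlikely"
    # N branch of the root: test car=van.
    return "likely" if car == "van" else "unlikely"


def training_errors(classifier):
    """Return the custIds whose prediction disagrees with newCar."""
    wrong = []
    for cust_id, car, age, city, new_car in TRAINING_SET:
        expected = "likely" if new_car == "yes" else "unlikely"
        if classifier(car, age, city) != expected:
            wrong.append(cust_id)
    return wrong


if __name__ == "__main__":
    print(training_errors(classify_by_age_tree))  # prints []
```

An empty list means the tree labels all six training customers consistently with their newCar value.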
Another Possibility
Decision tree over the sale training set from the Decision Trees example:

car=taurus
  Y: city=sf
    Y: likely
    N: unlikely
  N: age<45
    Y: likely
    N: unlikely
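
Both trees (root test age<30 and root test car=taurus) fit the six training records, yet they need not agree on customers outside the training set. The sketch below illustrates this under stated assumptions: the tuple encoding of the trees, the names `AGE_TREE`, `CAR_TREE`, `classify`, and `fits_training_set`, and the unseen customer `newcomer` are all hypothetical and not taken from the slides.

```python
# Inner node: (test, yes_subtree, no_subtree); leaf: a label string.
# Tests are functions of a customer record (a dict). This encoding is an
# illustrative assumption, not something the slides prescribe.
AGE_TREE = (
    lambda r: r["age"] < 30,
    (lambda r: r["city"] == "sf", "likely", "unlikely"),
    (lambda r: r["car"] == "van", "likely", "unlikely"),
)
CAR_TREE = (
    lambda r: r["car"] == "taurus",
    (lambda r: r["city"] == "sf", "likely", "unlikely"),
    (lambda r: r["age"] < 45, "likely", "unlikely"),
)

# Training set "sale" from the slides.
TRAINING_SET = [
    {"custId": "c1", "car": "taurus", "age": 27, "city": "sf", "newCar": "yes"},
    {"custId": "c2", "car": "van", "age": 35, "city": "la", "newCar": "yes"},
    {"custId": "c3", "car": "van", "age": 40, "city": "sf", "newCar": "yes"},
    {"custId": "c4", "car": "taurus", "age": 22, "city": "sf", "newCar": "yes"},
    {"custId": "c5", "car": "merc", "age": 50, "city": "la", "newCar": "no"},
    {"custId": "c6", "car": "taurus", "age": 25, "city": "la", "newCar": "no"},
]


def classify(tree, record):
    """Walk the tree from the root until a leaf label is reached."""
    while not isinstance(tree, str):
        test, yes_branch, no_branch = tree
        tree = yes_branch if test(record) else no_branch
    return tree


def fits_training_set(tree):
    """True if the tree labels every training record correctly."""
    return all(
        classify(tree, r) == ("likely" if r["newCar"] == "yes" else "unlikely")
        for r in TRAINING_SET
    )


if __name__ == "__main__":
    print(fits_training_set(AGE_TREE), fits_training_set(CAR_TREE))  # True True
    # Hypothetical unseen customer, not in the slides.
    newcomer = {"car": "van", "age": 28, "city": "la"}
    print(classify(AGE_TREE, newcomer), classify(CAR_TREE, newcomer))
    # prints: unlikely likely
```

The disagreement on the unseen customer shows why fitting the training set alone does not decide which tree to prefer.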
Summary KDD

• KDD: discovering interesting patterns from large amounts of data
• A natural evolution of database technology, in great demand, with wide applications
• A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
• Mining can be performed in a variety of information repositories
• Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
• Multi-disciplinary activity
• Important issues: KDD methodologies and user interactions, scalability, tool use and tool integration, preprocessing, interpretation of results, finding good parameter settings when running data mining tools, …
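
The KDD process steps named in the summary can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the two sources `CUSTOMERS` and `SURVEY`, the incomplete record c7, the age bins, the `min_support` threshold, and every function name are hypothetical; the remaining values mirror the sale example from the decision tree slides.

```python
from collections import Counter

# Two hypothetical sources to integrate. Record c7 is invented so that
# the cleaning step has something to remove.
CUSTOMERS = [
    {"custId": "c1", "car": "taurus", "age": 27, "city": "sf"},
    {"custId": "c2", "car": "van", "age": 35, "city": "la"},
    {"custId": "c3", "car": "van", "age": 40, "city": "sf"},
    {"custId": "c4", "car": "taurus", "age": 22, "city": "sf"},
    {"custId": "c5", "car": "merc", "age": 50, "city": "la"},
    {"custId": "c6", "car": "taurus", "age": 25, "city": "la"},
    {"custId": "c7", "car": "van", "age": None, "city": "sf"},
]
SURVEY = {"c1": "yes", "c2": "yes", "c3": "yes", "c4": "yes",
          "c5": "no", "c6": "no", "c7": "yes"}


def clean(records):
    """Data cleaning: drop records with missing attribute values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(customers, survey):
    """Data integration: join the two sources on custId."""
    return [dict(r, newCar=survey[r["custId"]])
            for r in customers if r["custId"] in survey]


def select(records, attributes):
    """Data selection: keep only the task-relevant attributes."""
    return [{a: r[a] for a in attributes} for r in records]


def transform(records):
    """Transformation: discretize age into two bins (assumed cut at 30)."""
    return [dict(r, age="<30" if r["age"] < 30 else ">=30") for r in records]


def mine(records, target="newCar"):
    """Data mining: count target outcomes per attribute=value pair."""
    counts = Counter()
    for r in records:
        for attr, value in r.items():
            if attr != target:
                counts[(attr, value, r[target])] += 1
    return counts


def evaluate(counts, min_support=2):
    """Pattern evaluation: keep values that always imply one outcome."""
    totals = Counter()
    for (attr, value, _), n in counts.items():
        totals[(attr, value)] += n
    return sorted((attr, value, outcome, n)
                  for (attr, value, outcome), n in counts.items()
                  if n == totals[(attr, value)] and n >= min_support)


def present(patterns):
    """Knowledge presentation: print each pattern as a readable rule."""
    for attr, value, outcome, n in patterns:
        print(f"{attr}={value} -> newCar={outcome} ({n} records)")


def run_pipeline():
    records = integrate(clean(CUSTOMERS), SURVEY)
    records = transform(select(records, ["car", "age", "city", "newCar"]))
    return evaluate(mine(records))


if __name__ == "__main__":
    present(run_pipeline())
    # prints:
    # car=van -> newCar=yes (2 records)
    # city=sf -> newCar=yes (3 records)
```

The mining step here is a simple co-occurrence count chosen for brevity; any of the data mining functionalities listed in the summary could take its place in the chain.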
Where to Find References?

• Data mining and KDD (SIGKDD member CD-ROM):
– Conference proceedings: KDD, and others, such as PKDD, PAKDD, etc.
– Journal: Data Mining and Knowledge Discovery
• Database field (SIGMOD member CD-ROM):
– Conference proceedings: ACM-SIGMOD, ACM-PODS, VLDB, ICDE, EDBT, DASFAA
– Journals: ACM-TODS, J. ACM, IEEE-TKDE, JIIS, etc.
• AI and Machine Learning:
– Conference proceedings: Machine learning, AAAI, IJCAI, etc.
– Journals: Machine Learning, Artificial Intelligence, etc.
• Statistics:
– Conference proceedings: Joint Stat. Meeting, etc.
– Journals: Annals of Statistics, etc.
• Visualization:
– Conference proceedings: CHI, etc.
– Journals: IEEE Trans. Visualization and Computer Graphics, etc.