Introduction to Data Mining
by
Yen-Hsien Lee
Department of Information Management
College of Management
National Sun Yat-Sen University
March 4, 2003
Outline
• What is Data Mining
• Data Mining Process
• Properties of Data Mining Applications
• Data Mining Techniques
What is Data Mining?
• Data mining is the process of extracting
previously unknown, valid, and actionable
patterns, knowledge, or high-level information
from large databases.
Data Mining Process
• Selection: Data → Target Data
• Preprocessing: Target Data → Preprocessed Data
• Transformation: Preprocessed Data → Transformed Data
• Mining: Transformed Data → Patterns
• Interpretation/Evaluation: Patterns → Knowledge
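The stages of this process (selection, preprocessing, transformation, mining, interpretation/evaluation) can be sketched as a chain of small functions. This is only an illustrative sketch: the records, the field names, and the trivial "pattern" computed in the mining step are all assumptions made for the example, not part of the original slides.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw_data = [
    {"customer": "a", "age": 25, "spend": 120.0, "region": "north"},
    {"customer": "b", "age": None, "spend": 80.0, "region": "north"},
    {"customer": "c", "age": 41, "spend": 300.0, "region": "north"},
    {"customer": "d", "age": 38, "spend": 260.0, "region": "north"},
    {"customer": "e", "age": 52, "spend": 90.0, "region": "south"},
]

def select(records, region):
    """Selection: keep only the records relevant to the question (target data)."""
    return [r for r in records if r["region"] == region]

def preprocess(records):
    """Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: reduce each record to the features used for mining."""
    return [(r["age"], r["spend"]) for r in records]

def mine(points):
    """Mining: a deliberately trivial pattern, mean spend by age group."""
    under_30 = [spend for age, spend in points if age < 30]
    thirty_plus = [spend for age, spend in points if age >= 30]
    return {
        "mean_spend_under_30": mean(under_30),
        "mean_spend_30_plus": mean(thirty_plus),
    }

def evaluate(patterns):
    """Interpretation/evaluation: turn the pattern into a yes/no finding."""
    return patterns["mean_spend_30_plus"] > patterns["mean_spend_under_30"]

target = select(raw_data, "north")   # Data -> Target Data
clean = preprocess(target)           # -> Preprocessed Data
features = transform(clean)          # -> Transformed Data
patterns = mine(features)            # -> Patterns
knowledge = evaluate(patterns)       # -> Knowledge
```

In practice each stage is far more involved, and the process is usually iterative: results from evaluation often send the analyst back to selection or preprocessing.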
Properties of Data Mining Applications
• Business-question-driven process
• Multiple data mining techniques potentially
appropriate for a single data mining task
• Hybrid approach for better data mining
results
• Importance of data prospecting (selection)
and cleaning (preprocessing)
• Unavoidable knowledge post-processing
• etc.
Data Mining Techniques
• Classification
– Process that establishes class descriptions, in terms
of attributes, from a set of instances (called training
examples) in a database.
• Clustering Analysis
– Process of creating a partition so that all members
of each cluster are similar according to some
metric (e.g., distance between objects).
• Association Rule Analysis
– Discovery of association rules showing attribute-value
conditions that occur frequently together in
a given set of data.
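To make "conditions that occur frequently together" concrete, association rules are commonly scored with support (the fraction of transactions containing all items of the rule) and confidence (the fraction of transactions containing the left-hand side that also contain the right-hand side). The sketch below is a brute-force illustration restricted to single-item rules; the transactions, thresholds, and function names are assumptions for the example, and it is not an efficient mining algorithm.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Support of lhs and rhs together, divided by the support of lhs."""
    both = support(set(lhs) | set(rhs), transactions)
    return both / support(lhs, transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Enumerate rules {x} -> {y} that meet both thresholds (brute force)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is not frequent, so neither rule qualifies
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, pair_support, conf))
    return rules

rules = pair_rules(transactions, min_support=0.6, min_confidence=0.9)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

With these sample transactions only one rule passes both thresholds: every basket that contains beer also contains diapers, while the reverse rule has lower confidence.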
Data Mining Techniques (Cont’d)
• Sequential Pattern Analysis
– Discovery of the sequential occurrence of items
across ordered transactions over time.
• Time-series Similarity Analysis
– To find those sequences that are similar to a query
sequence Q (called whole matching), or to identify
the sequences that contain subsequences similar to
Q (called subsequence matching).
• Link Analysis
• Text Mining
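The difference between whole matching and subsequence matching in time-series similarity analysis can be illustrated with a small sketch. It assumes Euclidean distance as the similarity measure and a tolerance `eps`, neither of which is specified on the slide; the sequences and function names are hypothetical, and the sliding-window scan is the naive approach rather than an indexed search.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def whole_matches(sequences, query, eps):
    """Whole matching: sequences of the query's length within eps of it."""
    return [
        name
        for name, seq in sequences.items()
        if len(seq) == len(query) and euclidean(seq, query) <= eps
    ]

def subsequence_matches(sequences, query, eps):
    """Subsequence matching: (name, offset) of every window within eps."""
    n = len(query)
    hits = []
    for name, seq in sequences.items():
        for start in range(len(seq) - n + 1):
            if euclidean(seq[start:start + n], query) <= eps:
                hits.append((name, start))
    return hits

# Hypothetical data: Q is the query sequence.
Q = [1.0, 2.0, 3.0]
sequences = {
    "s1": [1.0, 2.0, 3.1],
    "s2": [5.0, 5.0, 5.0],
    "s3": [0.0, 0.0, 1.0, 2.0, 3.0, 0.0],
}

whole = whole_matches(sequences, Q, eps=0.5)
partial = subsequence_matches(sequences, Q, eps=0.5)
```

Here `s1` is a whole match for Q, while `s3` is too long to be a whole match but contains a subsequence identical to Q starting at offset 2.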