Knowledge Discovery in Database

Knowledge Discovery in
Database (KDD)
Knowledge Discovery Process
• The whole process of extraction of implicit,
previously unknown and potentially useful
knowledge from a large database
– It includes data selection, cleaning,
enrichment, coding, data mining, and
reporting
– Data Mining is the key stage of Knowledge
Discovery Process
• The process of finding the desired information in a
large database
Knowledge Discovery Process
• Example: the database of a magazine
publisher which sells five types of magazines
– cars, houses, sports, music and comics
– Data mining: Find interesting customer properties
• What is the profile of a reader of a car magazine?
• Is there any correlation between an interest in cars and
an interest in comics?
• Apply knowledge discovery process
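The correlation question above can be made concrete once each reader is reduced to two yes/no flags. The following is a minimal sketch, assuming a hypothetical list of (reads car magazine, reads comics) pairs coded as 1/0. It uses the phi coefficient, a standard correlation measure for two binary variables; the slides do not name a specific measure, so this choice is an assumption.

```python
from math import sqrt

# Hypothetical data: (reads_car_magazine, reads_comics), coded 1 = yes, 0 = no.
readers = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]

def phi_coefficient(pairs):
    """Phi coefficient of two binary variables, computed from the 2x2 counts."""
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return 0.0  # one variable is constant, so no correlation can be measured
    return (n11 * n00 - n10 * n01) / denominator

phi = phi_coefficient(readers)
print(round(phi, 3))
```

A value near 0 suggests no relationship, while values near 1 or -1 suggest a strong positive or negative relationship.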
Data Selection
• Select the information about people who
have subscribed to a magazine
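As a small illustration of the selection step, the sketch below keeps only the people with at least one subscription. The table layout and the field names (client_id, name, magazines) are hypothetical.

```python
# Hypothetical customer table; the field names are illustrative assumptions.
customers = [
    {"client_id": 1, "name": "Jones", "magazines": ["car"]},
    {"client_id": 2, "name": "Smith", "magazines": []},
    {"client_id": 3, "name": "King", "magazines": ["music", "comics"]},
]

def select_subscribers(rows):
    """Keep only the people who have subscribed to at least one magazine."""
    return [row for row in rows if row["magazines"]]

subscribers = select_subscribers(customers)
print([row["client_id"] for row in subscribers])
```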
Cleaning
• Pollution: typing errors, customers who move
without notifying a change of address, and
people who give incorrect information about
themselves
– Pattern recognition algorithms can help detect such errors
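One way such pollution might be detected is by flagging records that share an address and have nearly identical names. This is a minimal sketch, not the specific algorithm the slides refer to: it uses the standard-library difflib.SequenceMatcher, and the records, field names, and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical client records; the second is a likely misspelling of the first.
records = [
    {"client_id": 23003, "name": "Johnson", "address": "1 Downing Street"},
    {"client_id": 23009, "name": "Jonson", "address": "1 Downing Street"},
    {"client_id": 23013, "name": "Clinton", "address": "2 Boulevard"},
]

def similar(a, b, threshold=0.85):
    """True when two strings are nearly identical (threshold is an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def likely_duplicates(rows):
    """Return pairs of client ids with the same address and similar names."""
    pairs = []
    for i, left in enumerate(rows):
        for right in rows[i + 1:]:
            same_address = left["address"] == right["address"]
            if same_address and similar(left["name"], right["name"]):
                pairs.append((left["client_id"], right["client_id"]))
    return pairs

print(likely_duplicates(records))
```

Flagged pairs would still need a human or a further rule to decide which record to keep.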
Cleaning
• Lack of domain consistency
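A domain consistency check can be sketched as a set of range rules per field. The rules below (age between 0 and 120, non-negative income), the field names, and the reference date are all assumptions chosen for illustration; the slide does not list specific rules.

```python
from datetime import date

def domain_violations(row, today=date(2000, 1, 1)):
    """Return the names of fields whose values fall outside their assumed domain."""
    problems = []
    age = today.year - row["birth_date"].year
    if not 0 <= age <= 120:  # assumed plausible age range
        problems.append("birth_date")
    if row["income"] < 0:  # assumed rule: income cannot be negative
        problems.append("income")
    return problems

# Hypothetical records for illustration.
print(domain_violations({"birth_date": date(1850, 1, 1), "income": 18500}))
print(domain_violations({"birth_date": date(1960, 4, 13), "income": 18500}))
```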
Enrichment
• Extra information about the clients is needed,
such as date of birth, income, amount
of credit, and whether or not an individual
owns a car or a house
Enrichment
• The new information needs to be easily
joined to the existing client records
– This allows more knowledge to be extracted
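Joining the extra information to the client records can be sketched as a lookup on a shared key. The key name client_id and all values below are hypothetical; clients without extra information are kept unchanged.

```python
# Hypothetical existing client records.
clients = [
    {"client_id": 1, "name": "Jones"},
    {"client_id": 2, "name": "Smith"},
]

# Hypothetical purchased or collected extra information, keyed by client id.
extra_by_id = {
    1: {"birth_date": "1960-04-13", "income": 18500, "credit": 17800,
        "car_owner": "no", "house_owner": "yes"},
}

def enrich(rows, extra):
    """Left-join the extra fields onto each client record by client_id."""
    enriched = []
    for row in rows:
        merged = dict(row)  # copy so the original record is not modified
        merged.update(extra.get(row["client_id"], {}))
        enriched.append(merged)
    return enriched

enriched_clients = enrich(clients, extra_by_id)
print(enriched_clients[0]["income"])
```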
Coding
• We select only those records that have
enough information to be of value (row)
• Project the fields in which we are interested
(column)
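The two operations above correspond to a row selection followed by a column projection. A minimal sketch, assuming hypothetical field names and treating a record as having "enough information" when the required fields are present:

```python
REQUIRED = ("birth_date", "income")           # assumed minimum for a useful record
FIELDS = ("client_id", "birth_date", "income", "car_owner")  # fields of interest

# Hypothetical enriched records; None marks a missing value.
rows = [
    {"client_id": 1, "name": "Jones", "birth_date": "1960-04-13",
     "income": 18500, "car_owner": "no"},
    {"client_id": 2, "name": "Smith", "birth_date": None,
     "income": None, "car_owner": "yes"},
]

def select_and_project(records):
    """Drop incomplete rows, then keep only the fields of interest."""
    complete = [r for r in records
                if all(r.get(field) is not None for field in REQUIRED)]
    return [{field: r.get(field) for field in FIELDS} for r in complete]

coded_input = select_and_project(rows)
print(coded_input)
```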
Coding
• Code the information which is too detailed
– Address to region
– Birth date to age
– Divide income by 1000
– Divide credit by 1000
– Convert cars yes-no to 1-0
– Convert purchase date to month numbers
starting from 1990
• The way in which we code the information will
determine the type of patterns we find
• Coding has to be performed repeatedly in order to get
the best results
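The coding rules listed above can be sketched as a single function applied to each record. The field names, the city-to-region table, and the reference date used to compute age are assumptions; month numbering treats January 1990 as month 1.

```python
from datetime import date

# Hypothetical mapping from a city in the address to a region code.
REGION_BY_CITY = {"London": 1, "Manchester": 2}

def code_record(row, today=date(1995, 1, 1)):
    """Apply the coding rules to one record and return the coded version."""
    birth = row["birth_date"]
    # Subtract one year if the birthday has not yet occurred this year.
    age = today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))
    purchase = row["purchase_date"]
    return {
        "region": REGION_BY_CITY.get(row["city"]),
        "age": age,
        "income_k": row["income"] / 1000,
        "credit_k": row["credit"] / 1000,
        "car_owner": 1 if row["car_owner"] == "yes" else 0,
        "purchase_month": (purchase.year - 1990) * 12 + purchase.month,
    }

# Hypothetical record for illustration.
example = {
    "city": "London", "birth_date": date(1960, 4, 13), "income": 18500,
    "credit": 17800, "car_owner": "no", "purchase_date": date(1994, 4, 15),
}
coded = code_record(example)
print(coded)
```

Changing a rule here, for example using age bands instead of exact ages, changes which patterns later mining steps can find, which is why coding is usually repeated.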
Coding
• We are interested in the relationships
between readers of different magazines
– Perform flattening operation
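Flattening turns one record per subscription into one record per client, with a 1/0 attribute for each magazine. A minimal sketch, assuming the input is a hypothetical list of (client_id, magazine) pairs:

```python
MAGAZINES = ("car", "house", "sports", "music", "comics")

# Hypothetical input: one (client_id, magazine) pair per subscription.
subscriptions = [(1, "car"), (1, "comics"), (2, "music")]

def flatten(rows):
    """Return one record per client with a 1/0 flag for every magazine."""
    flat = {}
    for client_id, magazine in rows:
        record = flat.setdefault(client_id, {name: 0 for name in MAGAZINES})
        record[magazine] = 1
    return flat

flat_table = flatten(subscriptions)
print(flat_table[1])
```

With all magazine flags on a single record, relationships between readers of different magazines can be examined row by row.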
Knowledge Discovery Process
Business-Question-Driven Process
Steps of a KDD Process
• Learning the Application Domain
– Relevant Prior Knowledge and Goals of Application
• Creating a Target Data Set
– Data Selection
• Data Cleaning and Enrichment
– May Take 60% of Effort
• Data Reduction and Transformation (Coding)
– Find Useful Features, Dimensionality Reduction
• Choosing Functions of Data Mining
– Summarization, Association, Classification, Regression, Clustering, …
• Choosing the mining algorithm(s)
• Data mining
– Search for Patterns of Interest
• Pattern Evaluation and Knowledge Presentation
– Visualization, Transformation, Removing Redundant Patterns, etc.
• Use of Discovered Knowledge
Exercises 1
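1. What is the RFM indicator? What is it used for?
2. What is data mining? What is its goal?
3. Why do small companies not need data mining?
4. How should large companies get to know their customers?
5. Describe a real-world application of data mining (it must
not be the same as the examples in the slides).
6. List and explain the steps of the Knowledge Discovery
in Database (KDD) process.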