Machine Learning, Data Mining

Download Report

Transcript Machine Learning, Data Mining

Copyright R. Weber
Machine Learning, Data Mining
INFO 629
Dr. R. Weber
The picnic game
• How did you reason to find the rule?
Copyright R. Weber
• According to Michalski (1983) A theory and
methodology of inductive learning. In Machine
Learning, chapter 4,
“inductive learning is a heuristic search through a
space of symbolic descriptions (i.e., generalizations)
generated by the application of rules to training
instances.”
Learning
• Rote Learning
– Learn multiplication tables
• Supervised Learning
– Examples are used to help a program identify a concept
– Examples are typically represented with attribute-value pairs
– Notion of supervision originates from guidance from
examples
• Unsupervised Learning
Copyright R. Weber
– Human efforts at scientific discovery, theory formation
Inductive Learning
• Learning by generalization
• Performance of classification tasks
– Classification, categorization, clustering
• Rules indicate categories
• Goal:
Copyright R. Weber
– Characterize a concept
Concept Learning is a Form of
Inductive Learning
•Learner uses:
Copyright R. Weber
–positive examples (instances ARE examples of a
concept) and
–negative examples (instances ARE NOT examples of
a concept)
Concept Learning
Copyright R. Weber
• Needs empirical validation
• Dense or sparse data determine quality of
different methods
Validation of Concept Learning i
• The learned concept should be able to correctly
classify new instances of the concept
Copyright R. Weber
– When it succeeds in a real instance of the concept it
finds true positives
– When it fails in a real instance of the concept it
finds false negatives
Validation of Concept Learning ii
• The learned concept should be able to correctly
classify new instances of the concept
Copyright R. Weber
– When it succeeds in a counterexample it finds true
negatives
– When it fails in a counterexample it finds false
positives
Basic classification tasks
Copyright R. Weber
• Classification
• Categorization
• Clustering
Copyright R. Weber
Categorization
Copyright R. Weber
Classification
Copyright R. Weber
Clustering
Clustering
•
•
•
•
Copyright R. Weber
Data analysis method applied to data
Data should naturally possess groupings
Goal: group data into clusters
Resulting clusters are collections where objects within a cluster
are similar to each other
• Objects outside the cluster are dissimilar to objects inside
• Objects from one cluster are dissimilar to objects in other
clusters
• Distance measures are used to compute similarity
Rule Learning
Copyright R. Weber
• Learning widely used in data mining
• Version Space Learning is a search method to
learn rules
• Decision Trees
Version Space i
Copyright R. Weber
A=1,B=1,C=1  Outcome=1
A=0,B=.5,C=.5  Outcome=0
A=0,B=0,C=.3  Outcome=.5
• Creates tree that includes all possible combinations
• Does not learn for rules with disjunctions (i.e. OR
statements)
• Incremental method, trains additional data without the
need to retrain all data
Decision trees
Copyright R. Weber
• Knowledge representation formalism
• Represent mutually exclusive rules (disjunction)
• A way of breaking up a data set into classes or
categories
• Classification rules that determine, for each instance
with attribute values, whether it belongs to one or
another class
Decision trees
consist of:
-leaf nodes (classes)
- decision nodes
(tests on attribute values)
-from decision nodes
branches grow for each
possible outcome of the test
From Cawsey, 1997
Decision tree induction
Copyright R. Weber
• Goal is to correctly classify all example data
• Several algorithms to induce decision trees: ID3
(Quinlan 1979) , CLS, ACLS, ASSISTANT, IND,
C4.5
• Constructs decision tree from past data
• Not incremental
• Attempts to find the simplest tree (not guaranteed
because it is based on heuristics)
ID3 algorithm
•From:
– a set of target classes
–Training data containing objects of more than one
class
Copyright R. Weber
•ID3 uses test to refine the training data set into
subsets that contain objects of only one class
each
•Choosing the right test is the key
How does ID3 chooses tests
Copyright R. Weber
• Information gain or ‘minimum entropy’
• Maximizing information gain corresponds to
minimizing entropy
•Predictive features (good indicators of the
outcome)
Copyright R. Weber
ID3 algorithm
No.
Student
First last year?
Male?
Works hard?
Drinks?
First this year?
1
Richard
yes
yes
no
yes
yes
2
Alan
yes
yes
yes
no
yes
3
Alison
no
no
yes
no
yes
4
Jeff
no
yes
no
yes
no
5
Gail
yes
no
yes
yes
yes
6
Simon
no
yes
yes
yes
no
Copyright R. Weber
ID3 algorithm
No.
Student
First last year?
Male?
Works hard?
Drinks?
First this year?
1
Richard
yes
yes
no
yes
yes
2
Alan
yes
yes
yes
no
yes
3
Alison
no
no
yes
no
yes
4
Jeff
no
yes
no
yes
no
5
Gail
yes
no
yes
yes
yes
6
Simon
no
yes
yes
yes
no
Copyright R. Weber
ID3 algorithm
No.
Student
First last year?
Male?
Works hard?
Drinks?
First this year?
1
Richard
yes
yes
no
yes
yes
2
Alan
yes
yes
yes
no
yes
3
Alison
no
no
yes
no
yes
4
Jeff
no
yes
no
yes
no
5
Gail
yes
no
yes
yes
yes
6
Simon
no
yes
yes
yes
no
Copyright R. Weber
ID3 algorithm
No.
Student
First last year?
Male?
Works hard?
Drinks?
First this year?
1
Richard
yes
yes
no
yes
yes
2
Alan
yes
yes
yes
no
yes
3
Alison
no
no
yes
no
yes
4
Jeff
no
yes
no
yes
no
5
Gail
yes
no
yes
yes
yes
6
Simon
no
yes
yes
yes
no
Copyright R. Weber
ID3 algorithm
No.
Student
First last year?
Male?
Works hard?
Drinks?
First this year?
1
Richard
yes
yes
no
yes
yes
2
Alan
yes
yes
yes
no
yes
3
Alison
no
no
yes
no
yes
4
Jeff
no
yes
no
yes
no
5
Gail
yes
no
yes
yes
yes
6
Simon
no
yes
yes
yes
no
Copyright R. Weber
ID3 algorithm
No.
Student
First last year?
Male?
Works hard?
Drinks?
First this year?
1
Richard
yes
yes
no
yes
yes
2
Alan
yes
yes
yes
no
yes
3
Alison
no
no
yes
no
yes
4
Jeff
no
yes
no
yes
no
5
Gail
yes
no
yes
yes
yes
6
Simon
no
yes
yes
yes
no
Copyright R. Weber
ID3 algorithm
No.
Student
First last year?
Male?
Works hard?
Drinks?
First this year?
1
Richard
yes
yes
no
yes
yes
2
Alan
yes
yes
yes
no
yes
3
Alison
no
no
yes
no
yes
4
Jeff
no
yes
no
yes
no
5
Gail
yes
no
yes
yes
yes
6
Simon
no
yes
yes
yes
no
Explanation-based
learning
Copyright R. Weber
• Incorporates domain knowledge into the
learning process
• Feature values are assigned a relevance factor
if their values are consistent with domain
knowledge
• Features that are assigned relevance factors
are considered in the learning process
Familiar Learning Task
•
•
•
•
Copyright R. Weber
Learn relative importance of features
Goal: learn individual weights
Commonly used in case-based reasoning
Methods include a similarity measure to get feedback
about verify their relative importance: feedback
methods
• Search methods: gradient descent
• ID3
Classification
using Naive Bayes
• Naïve Bayes classifier uses two sources of information to classify a new
instance
– The distribution of the rtaining dataset (prior probability)
– The region surrounding the new instance in the dataset (likelihood)
Copyright R. Weber
• Naïve because assumes conditional independence not always applicable
• It is made to simplify the computation and in this sense considered to be
“Naïve”.
• Conditional independence reduces the requirement for large number of
observations
• Bias in estimating probabilities often may not make a difference in practice - it is the order of the probabilities, not their exact values, that determine the
classifications.
• Comparable in performance with classification trees and with neural
networks
• Highly accurate and fast when applied to large databases
• Some links:
– http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm
– http://www.statsoft.com/textbook/stnaiveb.html
KDD: definition
Knowledge Discovery in Databases (KDD) is the
non-trivial process of identifying valid, novel, and
potential useful and understandable patterns in data.
(R.Feldman,2000)
KDD is the non-trivial process of identifying valid,
novel,
potentially
useful,
and
ultimately
understandable patterns in data (Fayad, PiatetskyShapiro, Smyth 1996 p. 6).
Copyright R. Weber
Data mining is one of the steps in the KDD process.
Text mining concerns applying data mining
techniques to unstructured text.
SELECTED
PROCESSED
TRANSFORMED
DATA
DATA
DATA
DATA
The KDD Process
preprocessing
Data mining
filtering
transformation
interpretation
patterns
KNOWLEDGE
browsing
Data mining tasks i
• Predictive modeling/risk
assessment
Classification, decision trees
• Database segmentation
Copyright R. Weber
Kohonen nets, clustering techniques
Data mining tasks ii
• Link analysis
Rules:
• Association generation
• Relationships between entities
• Deviation detection
Copyright R. Weber
• How things change over time, trends
KDD applications
• Fraud detection
– Telecom (calling cards, cell phones)
– Credit cards
– Health insurance



Loan approval
Investment analysis
Marketing and sales data analysis
Copyright R. Weber



Identify potential customers
Effectiveness of sales campaign
Store layout
Text mining
Copyright R. Weber
The problem starts with a query and
the solution is a set of information
(e.g., patterns, connections, profiles,
trends) contained in several different
texts that are potentially relevant to
the initial query.
Text mining applications
• IBM Text Navigator
– Cluster documents by content;
– Each document is annotated by the 2 most frequently
used words in the cluster;
Copyright R. Weber
• Concept Extraction (Los Alamos)
– Text analysis of medical records;
– Uses a clustering approach based on trigram
representation;
– Documents in vectors, cosine for comparison;