Lecture 8, Feb 16


Machine Learning
Mehdi Ghayoumi
MSB rm 132
[email protected]
Office hours: Thursday, 11-12 a.m.
Machine Learning
• “Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time.”
– Herbert Simon
• “Learning is constructing or modifying representations of what is being experienced.”
– Ryszard Michalski
• “Learning is making useful changes in our minds.”
– Marvin Minsky
Machine Learning
• Decision Trees
• In the 1960s, Hunt and colleagues used exhaustive-search decision-tree methods (CLS) to model human concept learning.
• In the late 1970s, Quinlan developed ID3, which uses the information gain heuristic to learn expert systems from examples.
• Quinlan’s updated decision-tree package, C4.5, was released in 1993.
Machine Learning
 Classification:
predict a categorical output from categorical and/or real-valued inputs
 Decision trees are among the most popular data mining tools:
 Easy to understand
 Easy to implement
 Easy to use
 Computationally cheap
Machine Learning
• Extremely popular method
– Credit risk assessment
– Medical diagnosis
– Market analysis
– Bioinformatics
– Chemistry …
Machine Learning
• Internal decision nodes
– Univariate: tests a single attribute, x_i
– Multivariate: tests a combination of all attributes, x
• Leaves
– Classification: class labels, or class proportions
– Regression: a numeric value (the average of the r values reaching the leaf, or a local fit)
• Learning is greedy: find the best split, then recurse on each subset (a sketch follows below)
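As a minimal Python sketch (not the lecture’s reference code) of this structure: internal nodes test a single attribute, leaves store a class label, and learning recurses greedily. The best_attribute helper is assumed here; an information-gain version is sketched after a later slide.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attribute: Optional[int] = None               # index of the attribute tested (internal node)
    children: dict = field(default_factory=dict)  # attribute value -> child Node
    label: Optional[str] = None                   # class label (leaf)

def build(examples, attributes):
    """examples: list of (attribute_tuple, label); attributes: list of attribute indices."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not attributes:
        # Pure subset, or no attributes left to split on: make a majority-label leaf.
        return Node(label=max(set(labels), key=labels.count))
    a = best_attribute(examples, attributes)      # greedy choice; helper assumed here
    node = Node(attribute=a)
    for v in {x[a] for x, _ in examples}:         # one branch per observed value of a
        subset = [(x, y) for x, y in examples if x[a] == v]
        node.children[v] = build(subset, [b for b in attributes if b != a])
    return node
```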
Machine Learning
• Occam’s razor (c. 1320):
– Prefer the simplest hypothesis that fits the data.
– The principle states that the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory.
• Albert Einstein:
“Make everything as simple as possible, but not simpler.” Why?
– In part, it is a philosophical position.
– Simple explanations/classifiers are more robust.
– Simple classifiers are more understandable.
Machine Learning
• Objective:
Shorter trees are preferred over larger trees.
• Idea:
We want attributes that classify the examples well, and the best attribute is selected at each step: choose the attribute that partitions the learning set into subsets that are as “pure” as possible (a sketch follows below).
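A minimal sketch of this selection step, assuming examples are (attribute-vector, label) pairs; it uses entropy, defined formally on the next slides, as the impurity measure:

```python
import math

def entropy(labels):
    """Impurity of a list of class labels (formal definition on the next slides)."""
    n = len(labels)
    return -sum((labels.count(l) / n) * math.log2(labels.count(l) / n)
                for l in set(labels))

def best_attribute(examples, attributes):
    """Greedy choice: the attribute whose split leaves the least weighted entropy."""
    def remainder(a):
        total = 0.0
        for v in {x[a] for x, _ in examples}:
            subset = [y for x, y in examples if x[a] == v]
            total += len(subset) / len(examples) * entropy(subset)
        return total
    return min(attributes, key=remainder)  # maximizing gain = minimizing remainder
```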
Machine Learning
 Each branch corresponds to an attribute value
 Each internal node has a splitting predicate
 Each leaf node assigns a classification
Machine Learning
• Entropy (disorder, impurity) of a set of examples, S, relative
to a binary classification is:
Entropy(S) = - p1 log2(p1) - p0 log2(p0)
where p1 is the fraction of positive examples in S and p0 is
the fraction of negatives.
Machine Learning
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.94
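A quick check of this arithmetic in Python (9 positive and 5 negative examples out of 14):

```python
import math

p1, p0 = 9 / 14, 5 / 14
print(round(-p1 * math.log2(p1) - p0 * math.log2(p0), 2))  # 0.94
```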
Machine Learning
• If all examples are in one category, entropy is zero (we define
0log(0)=0)
• If examples are equally mixed (p1=p0=0.5), entropy is a maximum of 1.
• Entropy can be viewed as the number of bits required on average to
encode the class of an example in S where data compression (e.g.
Huffman coding) is used to give shorter codes to more likely cases.
• For multi-class problems with c categories, entropy generalizes to:
Entropy(S) = - Σ_{i=1}^{c} pi log2(pi)
Machine Learning
Thank you!