Transcript Part 3
Introduction to Machine Learning and Text Mining
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Naïve Approach: When all you have is a hammer…
[Diagram: Data → Target Representation]

Slightly less naïve approach: Aimless wandering…
[Diagram: Data → Target Representation]

Expert Approach: Hypothesis driven
[Diagram: Data → Target Representation]
Suggested Readings

Witten, I. H., Frank, E., & Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques, third edition. Elsevier: San Francisco.
What is machine learning?

Automatically or semi-automatically:
inducing concepts (i.e., rules) from data,
finding patterns in data,
explaining data,
making predictions.

[Diagram: Data → Learning Algorithm → Model; Model + New Data → Classification Engine → Prediction]
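The pipeline on this slide (a learning algorithm induces a model from data, and a classification engine applies that model to new data) can be sketched in miniature. The majority-class "learner" below is my own toy illustration, not an algorithm from the lecture:

```python
from collections import Counter

def learn(data):
    """A trivially simple 'learning algorithm': induce a model
    (here, just the majority label) from labeled training data."""
    labels = [label for _, label in data]
    return Counter(labels).most_common(1)[0][0]  # the "model"

def classify(model, new_instance):
    """The 'classification engine': apply the model to new data."""
    return model  # a majority-class model ignores the features

# Hypothetical labeled training data: (features, label)
data = [({"outlook": "sunny"}, "no"),
        ({"outlook": "overcast"}, "yes"),
        ({"outlook": "rainy"}, "yes")]

model = learn(data)
prediction = classify(model, {"outlook": "sunny"})
print(prediction)  # -> yes
```

Any real learner is more sophisticated, but the shape is the same: training produces a model, and prediction is a separate step that applies the model to unseen instances.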
If Outlook = sunny, no
else if Outlook = overcast, yes
else if Outlook = rainy and Windy = TRUE, no
else yes
Perfect on training data
If Outlook = sunny, no
else if Outlook = overcast, yes
else if Outlook = rainy and Windy = TRUE, no
else yes
Perfect on training data, but performance on testing data? Not perfect.
If Outlook = sunny, no
else if Outlook = overcast, yes
else if Outlook = rainy and Windy = TRUE, no
else yes
IMPORTANT!
If you evaluate the performance of your rule on the same data you trained on, you won't get an accurate estimate of how well it will do on new data.
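The rule above can be transcribed directly into code and scored on the data it was fit to, which is exactly the flattering evaluation the slide warns about. The toy weather rows below are my own illustrative sample, not the full dataset from the lecture:

```python
def play_rule(outlook, windy):
    """The decision rule from the slide, transcribed directly."""
    if outlook == "sunny":
        return "no"
    elif outlook == "overcast":
        return "yes"
    elif outlook == "rainy" and windy:
        return "no"
    else:
        return "yes"

# A few rows in the style of the classic weather dataset
# (an illustrative sample, not the slide's data).
train = [("sunny", False, "no"),
         ("overcast", True, "yes"),
         ("rainy", True, "no"),
         ("rainy", False, "yes")]

# Scoring on the data the rule was induced from gives a perfect score...
train_acc = sum(play_rule(o, w) == y for o, w, y in train) / len(train)
print(train_acc)  # -> 1.0
```

...but that 1.0 says nothing reliable about accuracy on new data, which is the motivation for cross validation below.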
Simple Cross Validation

Let's say your data has attributes A, B, and C, and you want to train a rule to predict D. Split the data into seven folds. Then:

Fold 1: train on folds 2, 3, 4, 5, 6, 7 and apply the trained model to fold 1. The result is Accuracy1.
Fold 2: train on folds 1, 3, 4, 5, 6, 7 and apply the trained model to fold 2. The result is Accuracy2.
Fold 3: train on folds 1, 2, 4, 5, 6, 7 and apply the trained model to fold 3. The result is Accuracy3.
Fold 4: train on folds 1, 2, 3, 5, 6, 7 and apply the trained model to fold 4. The result is Accuracy4.
Fold 5: train on folds 1, 2, 3, 4, 6, 7 and apply the trained model to fold 5. The result is Accuracy5.
Fold 6: train on folds 1, 2, 3, 4, 5, 7 and apply the trained model to fold 6. The result is Accuracy6.
Fold 7: train on folds 1, 2, 3, 4, 5, 6 and apply the trained model to fold 7. The result is Accuracy7.

Finally: average Accuracy1 through Accuracy7.
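The seven-fold procedure can be sketched as a short function. The round-robin fold split and the majority-class learner in the usage example are my own simplifications for illustration, not details from the lecture:

```python
from collections import Counter

def cross_validate(data, k, learn, apply_model):
    """Simple k-fold cross validation: hold out each fold in turn,
    train on the remaining folds, and average the per-fold accuracies."""
    folds = [data[i::k] for i in range(k)]  # round-robin split into k folds
    accuracies = []
    for i, test_fold in enumerate(folds):
        train_data = [row for j, f in enumerate(folds) if j != i for row in f]
        model = learn(train_data)
        correct = sum(apply_model(model, x) == y for x, y in test_fold)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / k  # average of Accuracy1 .. AccuracyK

# Usage with a hypothetical majority-class learner (an illustration,
# not the rule learner from the slides):
learn = lambda rows: Counter(y for _, y in rows).most_common(1)[0][0]
apply_model = lambda model, x: model
data = [(i, "yes") for i in range(10)] + [(i, "no") for i in range(4)]
acc = cross_validate(data, 7, learn, apply_model)
print(acc)  # average accuracy over the 7 folds
```

Because every instance serves as test data exactly once, the averaged accuracy is a far better estimate of performance on new data than accuracy on the training set.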
Working with Text

Represent text as a vector where each position corresponds to a term. This is called the "bag of words" approach.

Vocabulary: Cheese, Cows, Eat, Hamsters, Make, Seeds

"Cows make cheese."   → 110010
"Hamsters eat seeds." → 001101

But "Cheese makes cows." gets the same representation as "Cows make cheese."!
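A minimal sketch of the binary bag-of-words encoding from these slides. The crude 's'-stripping normalization (so that "makes" matches "make", reproducing the slide's point that "Cheese makes cows." gets the identical vector) is my own simplification:

```python
import re

VOCAB = ["cheese", "cows", "eat", "hamsters", "make", "seeds"]

def bag_of_words(text):
    """Binary bag-of-words vector over VOCAB: 1 if the term occurs,
    else 0.  Word order is discarded entirely."""
    tokens = re.findall(r"[a-z]+", text.lower())
    present = set()
    for tok in tokens:
        if tok in VOCAB:
            present.add(tok)
        elif tok.endswith("s") and tok[:-1] in VOCAB:
            present.add(tok[:-1])  # crude normalization: 'makes' -> 'make'
    return "".join("1" if term in present else "0" for term in VOCAB)

print(bag_of_words("Cows make cheese."))    # -> 110010
print(bag_of_words("Hamsters eat seeds."))  # -> 001101
print(bag_of_words("Cheese makes cows."))   # -> 110010 (same vector!)
```

The identical vectors for the last two sentences are exactly the weakness the slide highlights: a bag of words cannot tell who makes what.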
Part of Speech Tagging
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition/subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund/present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd ps. sing. present
32. VBZ Verb, 3rd ps. sing. present
33. WDT wh-determiner
34. WP wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB wh-adverb
Basic Types of Features

Unigram: single words, e.g. prefer, sandwich, take
Bigram: pairs of words next to each other, e.g. machine_learning, eat_wheat
POS-Bigram: pairs of POS tags next to each other, e.g. DT_NN, NNP_NNP
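Word bigrams and POS bigrams can be extracted the same way, by pairing adjacent items and joining them with "_" as on the slide. The example words and tags below are my own illustration:

```python
def bigrams(tokens):
    """Pairs of adjacent items, joined with '_' as on the slide.
    Works for word tokens and equally for POS-tag sequences."""
    return [a + "_" + b for a, b in zip(tokens, tokens[1:])]

words = ["machine", "learning", "helps"]   # illustrative tokens
tags = ["DT", "NN", "VBZ"]                 # Penn Treebank-style POS tags

print(bigrams(words))  # -> ['machine_learning', 'learning_helps']
print(bigrams(tags))   # -> ['DT_NN', 'NN_VBZ']
```

Unigram features are just the tokens themselves; bigrams recover some of the word-order information that the bag-of-words representation throws away.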
Keep this picture in mind…

Machine learning isn't magic, but it can be useful for identifying meaningful patterns in your data when used properly. Proper use requires insight into your data.
?