
Machine Learning Overview
Tamara Berg
CS 590-133 Artificial Intelligence
Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Kilian Weinberger, Deva Ramanan
Announcements
• HW4 is due April 3
• Reminder: Midterm2 next Thursday
– Next Tuesday’s lecture topics will not be included (but material
will be on the final so attend!)
• Midterm review
– Monday, 5pm in FB009
Midterm Topic List
Be able to define the following terms and answer basic questions about them:
Reinforcement learning
– Passive vs Active RL
– Model-based vs model-free approaches
– Direct utility estimation
– TD Learning and TD Q-learning
– Exploration vs exploitation
– Policy Search
– Application to Backgammon/Aibos/helicopters (at a high level)
Probability
– Random variables
– Axioms of probability
– Joint, marginal, conditional probability distributions
– Independence and conditional independence
– Product rule, chain rule, Bayes rule
Midterm Topic List
Bayesian Networks: General
– Structure and parameters
– Calculating joint and conditional probabilities
– Independence in Bayes Nets (Bayes Ball)
Bayesian Inference
– Exact Inference (Inference by Enumeration, Variable Elimination)
– Approximate Inference (Forward Sampling, Rejection Sampling, Likelihood
Weighting)
– Networks for which efficient inference is possible
Naïve Bayes
– Parameter learning including Laplace smoothing
– Likelihood, prior, posterior
– Maximum likelihood (ML), maximum a posteriori (MAP) inference
– Application to spam/ham classification
– Application to image classification (at a high level)
Midterm Topic List
HMMs
– Markov Property
– Markov Chains
– Hidden Markov Model (initial distribution, transitions, emissions)
– Filtering (forward algorithm)
Machine Learning
– Unsupervised/supervised/semi-supervised learning
– K Means clustering
– Training, tuning, testing, generalization
Machine learning
Image source: https://www.coursera.org/course/ml
Machine learning
• Definition
– Getting a computer to do well on a task
without explicitly programming it
– Improving performance on a task based on
experience
Big Data!
What is machine learning?
• Computer programs that can learn from data
• Two key components
– Representation: how should we represent the data?
– Generalization: the system should generalize from its
past experience (observed data items) to perform well
on unseen data items.
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled
examples
Clustering
– The assignment of objects into groups (aka clusters)
so that objects in the same cluster are more similar to
each other than objects in different clusters.
– Clustering is a common technique for statistical data
analysis, used in many fields, including machine
learning, data mining, pattern recognition, image
analysis and bioinformatics.
– Similarity measured by Euclidean distance, angle between data vectors, etc.
K-means clustering
• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:

D(X, M) = \sum_{k=1}^{K} \sum_{i \in \text{cluster } k} (x_i - m_k)^2
Source: Hinrich Schutze
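A minimal k-means sketch in Python (NumPy only), assuming a data matrix X of shape (N, D); the value of K, the iteration cap, and the random initialization are illustrative choices, not prescribed by the slides:

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    # Minimize D(X, M): the sum over clusters k of squared distances
    # from each point x_i in cluster k to its center m_k.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random init
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = np.array([X[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(K)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels

Usage: centers, labels = kmeans(np.random.rand(200, 2), K=3).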
Hierarchical clustering strategies
• Agglomerative clustering
• Start with each data point in a separate cluster
• At each iteration, merge two of the “closest” clusters
• Divisive clustering
• Start with all data points grouped into a single cluster
• At each iteration, split the “largest” cluster
Produces a hierarchy of clusterings
Divisive Clustering
• Top-down (instead of bottom-up as in Agglomerative
Clustering)
• Start with all data points in one big cluster
• Then recursively split clusters
• Eventually each data point forms a cluster on its own.
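A minimal agglomerative-clustering sketch, under the assumption that "closest" means single-link distance (the slides leave the linkage criterion open); function and variable names are illustrative:

import numpy as np

def agglomerative(X, n_clusters):
    # Bottom-up: start with each point in its own cluster, then
    # repeatedly merge the two closest clusters.
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-link: distance between the closest pair of members.
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters.pop(b)   # merge the closest pair
    return clusters

Recording the sequence of merges, instead of stopping at n_clusters, yields the full hierarchy of clusterings.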
Flat or hierarchical clustering?
• For high efficiency, use flat clustering (e.g., k-means)
• For deterministic results, use hierarchical clustering
• When a hierarchical structure is desired, use a hierarchical algorithm
• Hierarchical clustering can also be applied if K cannot be
predetermined (can start without knowing K)
Source: Hinrich Schutze
Clustering in Action – example from computer vision
Recall: Bag of Words Representation
• Represent document as a “bag of words”
Bag-of-features models
Slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Bags of features for image classification
1. Extract features
2. Learn “visual vocabulary”
3. Represent images by frequencies of “visual words”
1. Feature extraction
[Figure: local features extracted from sample images]
2. Learning the visual vocabulary
[Figure: extracted features clustered; the cluster centers form the visual vocabulary]
Example visual vocabulary (Fei-Fei et al. 2005)
3. Image representation
[Figure: histogram of visual-word frequencies over the vocabulary]
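A minimal sketch of step 3, assuming local descriptors have already been extracted and a visual vocabulary already learned (e.g., with k-means as above); names are illustrative:

import numpy as np

def bag_of_words(descriptors, vocabulary):
    # descriptors: (n, d) local features from one image
    # vocabulary:  (K, d) cluster centers learned from training features
    # Quantize: assign each descriptor to its nearest visual word.
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    # Count word frequencies and normalize to a histogram summing to 1.
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()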
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled
examples
Example: Sentiment analysis
http://gigaom.com/2013/10/03/stanford-researchers-to-open-source-model-they-say-has-nailed-sentiment-analysis/
http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
Example: Image classification
Input: images; desired output: labels (apple, pear, tomato, cow, dog, horse)
http://yann.lecun.com/exdb/mnist/index.html
Example: Seismic data
[Scatter plot: body wave magnitude vs. surface wave magnitude; earthquakes and nuclear explosions form separable classes]
The basic classification framework
y = f(x)   (x: input; f: classification function; y: output)
• Learning: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the parameters of the prediction function f
• Inference: apply f to a never-before-seen test example x and output the predicted value y = f(x)
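A minimal sketch of this framework with 1-nearest-neighbor as the prediction function f (the first method listed below); class and variable names are illustrative assumptions:

import numpy as np

class NearestNeighbor:
    # Learning: memorize the labeled training set {(x_i, y_i)}.
    def fit(self, X_train, y_train):
        self.X = np.asarray(X_train)
        self.y = np.asarray(y_train)
        return self

    # Inference: y = f(x) returns the label of the closest stored example.
    def predict(self, x):
        dists = ((self.X - x) ** 2).sum(axis=1)
        return self.y[dists.argmin()]

Usage: f = NearestNeighbor().fit(X_train, y_train); f.predict(x_test) then plays the role of y = f(x).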
Naïve Bayes classifier

f(x) = \arg\max_y P(y \mid x)
     = \arg\max_y P(y) \, P(x \mid y)
     = \arg\max_y P(y) \prod_d P(x_d \mid y)

where x_d is a single dimension or attribute of x.
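A minimal sketch of this decision rule for discrete features, assuming the prior P(y) and per-dimension likelihoods P(x_d | y) have already been estimated (e.g., by counting with Laplace smoothing); computing in log space is a standard numerical choice, not something the slides prescribe:

import numpy as np

def naive_bayes_predict(x, log_prior, log_likelihood):
    # Return argmax_y P(y) * prod_d P(x_d | y), computed in log space.
    # log_prior[y]            = log P(y)
    # log_likelihood[y][d][v] = log P(x_d = v | y)
    scores = {}
    for y in log_prior:
        scores[y] = log_prior[y] + sum(
            log_likelihood[y][d][x[d]] for d in range(len(x)))
    return max(scores, key=scores.get)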
Example: Image classification
Input: Image → Representation → Classifier (e.g., Naïve Bayes, Neural Net, etc.) → Output: Predicted label (“Car”)
Example: Training and testing
Training set (labels known) vs. test set (labels unknown)
• Key challenge: generalization to unseen examples
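A minimal, self-contained sketch of this protocol on synthetic data (the data, the 80/20 split, and the 1-nearest-neighbor classifier are all illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # synthetic inputs (assumption)
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic labels (assumption)

# Hold out unseen examples: "train" on 80%, test on the remaining 20%.
X_tr, y_tr, X_te, y_te = X[:80], y[:80], X[80:], y[80:]

# 1-nearest-neighbor prediction for each held-out test point.
preds = np.array([y_tr[((X_tr - x) ** 2).sum(axis=1).argmin()] for x in X_te])
print("test accuracy:", (preds == y_te).mean())   # estimate of generalization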
Some classification methods
• Nearest neighbor (with 10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …
• Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
• Support Vector Machines and Kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; …
• Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
Classification … more soon
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled
examples
Supervised learning has many successes
• recognize speech
• steer a car
• classify documents
• classify proteins
• recognize faces, objects in images
• …
Slide Credit: Avrim Blum
However, for many problems, labeled
data can be rare or expensive.
Need to pay someone to do it, requires special testing, …
Unlabeled data is much cheaper.
Examples: speech, customer modeling, images, protein sequences, medical outcomes, web pages. [From Jerry Zhu]
Can we make use of cheap
unlabeled data?
Slide Credit: Avrim Blum
Semi-Supervised Learning
Can we use unlabeled data to augment a
small labeled sample to improve learning?
But unlabeled data is missing the most important info (the labels)!
But maybe it still has useful regularities that we can use…
Slide Credit: Avrim Blum
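The slides end on this question; one common answer, not detailed here and offered only as an illustrative sketch, is self-training: fit on the labeled sample, pseudo-label the unlabeled points the model is confident about, and refit. The sketch assumes an sklearn-style classifier interface:

import numpy as np

def self_train(clf, X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
    # Fit on the labeled sample, then repeatedly promote unlabeled points
    # with confident predictions into the labeled set and refit.
    # Assumes clf has sklearn-style fit(), predict_proba(), and classes_.
    X_lab, y_lab = np.asarray(X_lab), np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab)
    for _ in range(rounds):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        keep = proba.max(axis=1) >= threshold          # confident predictions
        if not keep.any():
            break                                       # nothing confident: stop
        pseudo = clf.classes_[proba[keep].argmax(axis=1)]   # pseudo-labels
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~keep]
    return clf.fit(X_lab, y_lab)

For example, self_train(GaussianNB(), X_lab, y_lab, X_unlab) with from sklearn.naive_bayes import GaussianNB.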