
Machine Learning Overview
Tamara Berg
CS 560 Artificial Intelligence
Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Kilian Weinberger, Deva Ramanan
Announcements
• HW3 due Oct 29, 11:59pm
Machine learning
Image source: https://www.coursera.org/course/ml
Machine learning
• Definition
– Getting a computer to do well on a task
without explicitly programming it
– Improving performance on a task based on
experience
Big Data!
What is machine learning?
• Computer programs that can learn from data
• Two key components
– Representation: how should we represent the data?
– Generalization: the system should generalize from its
past experience (observed data items) to perform well
on unseen data items.
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled
examples
Clustering
– The assignment of objects into groups (aka clusters)
so that objects in the same cluster are more similar to
each other than objects in different clusters.
– Clustering is a common technique for statistical data
analysis, used in many fields, including machine
learning, data mining, pattern recognition, image
analysis and bioinformatics.
– Requires a measure of similarity between data items: Euclidean distance, angle between data vectors, etc.
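A minimal numpy sketch of the two similarity measures just named (the library choice is mine; the slides do not prescribe one):

```python
import numpy as np

def euclidean_distance(x, y):
    """Straight-line distance between two data vectors."""
    return np.linalg.norm(x - y)

def cosine_angle(x, y):
    """Angle (in radians) between two data vectors."""
    cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos_sim, -1.0, 1.0))  # clip guards rounding error

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(euclidean_distance(a, b))  # 3.741...
print(cosine_angle(a, b))        # 0.0 -- parallel vectors, same direction
```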
K-means clustering
• Want to minimize the sum of squared Euclidean distances between points $x_i$ and their nearest cluster centers $m_k$:

$$D(X, M) = \sum_{k=1}^{K} \; \sum_{i \in \text{cluster } k} \lVert x_i - m_k \rVert^2$$
K-means
[Figure: x’s indicate the initialization of the 3 cluster centers]
Iterate until convergence:
1) Compute assignment of
data points to cluster
centers
2) Update cluster centers
with mean of assigned
points
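A minimal numpy sketch of the two-step loop above. Random-point initialization and the assumption that no cluster goes empty are simplifications for illustration:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate the two steps until convergence: assign points to the
    nearest center, then move each center to the mean of its points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init: k random points
    for _ in range(n_iters):
        # 1) Compute assignment of data points to cluster centers.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2) Update cluster centers with the mean of assigned points
        #    (assumes no cluster ends up empty).
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # converged
            return new_centers, labels
        centers = new_centers
    return centers, labels
```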
Flat vs Hierarchical Clustering
• Flat algorithms
– Usually start with a random partitioning of docs into
groups
– Refine iteratively
– Main algorithm: k-means
• Hierarchical algorithms
– Create a hierarchy
– Bottom-up: agglomerative
– Top-down: divisive
Hierarchical clustering strategies
• Agglomerative clustering
• Start with each data point in a separate cluster
• At each iteration, merge two of the “closest” clusters
• Divisive clustering
• Start with all data points grouped into a single cluster
• At each iteration, split the “largest” cluster
Both strategies produce a hierarchy of clusterings.
Divisive Clustering
• Top-down (instead of bottom-up as in Agglomerative
Clustering)
• Start with all data points in one big cluster
• Then recursively split clusters
• Eventually each data point forms a cluster on its own.
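A short sketch of agglomerative (bottom-up) clustering using scipy, an assumed dependency here; "single" linkage (minimum pairwise distance) is one common definition of "closest":

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))  # toy 2-D data

# Bottom-up: each point starts in its own cluster; the two "closest"
# clusters are merged at every step, producing a full hierarchy.
Z = linkage(X, method='single')

# The hierarchy can be cut afterwards to get, e.g., 3 flat clusters --
# note that K did not need to be known before clustering.
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)
```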
Flat or hierarchical clustering?
• For high efficiency, use flat clustering (e.g. k-means)
• For deterministic results: hierarchical clustering
• When a hierarchical structure is desired: hierarchical
algorithm
• Hierarchical clustering can also be applied if K cannot be
predetermined (can start without knowing K)
Clustering in Action – example
from computer vision
Recall: Bag of Words Representation
• Represent a document as a “bag of words”
Bag of features for images
• Represent an image as a “bag of words”
Bags of features for image classification
1. Extract features
2. Learn “visual vocabulary”
3. Represent images by frequencies of “visual words”
1. Feature extraction
[Figure: local feature patches extracted from example images]
2. Learning the visual vocabulary
[Figure: the extracted descriptors are clustered; the cluster centers form the visual vocabulary]
Example visual vocabulary (Fei-Fei et al. 2005)
3. Image representation
[Figure: histogram of visual-word frequencies; x-axis: visual words, y-axis: frequency]
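A sketch of steps 2 and 3, assuming the local descriptors from step 1 (e.g. SIFT vectors) are already available as numpy arrays; the no-empty-cluster assumption is again a simplification:

```python
import numpy as np

def build_vocabulary(all_descriptors, k, n_iters=50, seed=0):
    """Step 2: k-means over descriptors pooled from the training images;
    the k cluster centers are the "visual words"."""
    rng = np.random.default_rng(seed)
    words = all_descriptors[rng.choice(len(all_descriptors), k, replace=False)]
    for _ in range(n_iters):
        nearest = np.linalg.norm(all_descriptors[:, None] - words[None],
                                 axis=2).argmin(axis=1)
        words = np.array([all_descriptors[nearest == j].mean(axis=0)
                          for j in range(k)])  # assumes no empty cluster
    return words

def bag_of_words(image_descriptors, words):
    """Step 3: one image -> normalized histogram of visual-word frequencies."""
    nearest = np.linalg.norm(image_descriptors[:, None] - words[None],
                             axis=2).argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(words)).astype(float)
    return hist / hist.sum()
```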
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled
examples
Example: Sentiment analysis
http://gigaom.com/2013/10/03/stanford-researchers-to-open-source-model-they-say-has-nailed-sentiment-analysis/
http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
Example: Image classification
Input: images. Desired output: category labels (apple, pear, tomato, cow, dog, horse).
http://yann.lecun.com/exdb/mnist/index.html
Example: Seismic data
Classify events as earthquakes vs. nuclear explosions, based on body wave magnitude (x-axis) and surface wave magnitude (y-axis).
The basic classification framework
y = f(x), where x is the input, f is the classification function, and y is the predicted output
• Learning: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the parameters of the prediction function f
• Inference: apply f to a never-before-seen test example x and output the predicted value y = f(x)
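To make the framework concrete, here is a minimal sketch where f is a nearest-class-centroid rule, chosen purely for illustration (the slide treats f abstractly):

```python
import numpy as np

def learn(X_train, y_train):
    """Learning: estimate the parameters of f (one centroid per class)
    from the labeled training set {(x1,y1), ..., (xN,yN)}."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centroids

def f(x, params):
    """Inference: apply f to a never-before-seen example x."""
    classes, centroids = params
    return classes[np.linalg.norm(centroids - x, axis=1).argmin()]

X = np.array([[0., 0.], [0., 1.], [5., 5.], [6., 5.]])
y = np.array(['apple', 'apple', 'pear', 'pear'])
params = learn(X, y)
print(f(np.array([5.5, 5.2]), params))  # -> 'pear'
```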
Naïve Bayes Classification
Choose the class that maximizes:

$$P(C, W_1, \ldots, W_n) = P(C) \prod_i P(W_i \mid C)$$

$$c^* = \operatorname*{argmax}_{c \in C} \; P(c) \prod_i P(W_i \mid c)$$
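A small sketch of this rule for bag-of-words documents. Working in log space and add-one (Laplace) smoothing are my additions, since the slide gives only the argmax formula:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Estimate P(c) and P(w|c) from labeled documents (lists of words)."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    vocab = {w for doc in docs for w in doc}
    cond = {}
    for c in classes:
        total = sum(counts[c].values())
        # Add-one smoothing so unseen (class, word) pairs get nonzero mass.
        cond[c] = {w: (counts[c][w] + 1) / (total + len(vocab)) for w in vocab}
    return prior, cond

def classify_nb(doc, prior, cond):
    """argmax over classes of log P(c) + sum_i log P(w_i | c)."""
    return max(prior, key=lambda c: math.log(prior[c]) +
               sum(math.log(cond[c][w]) for w in doc if w in cond[c]))

docs = [["great", "movie"], ["bad", "movie"], ["great", "plot"]]
labels = ["pos", "neg", "pos"]
prior, cond = train_nb(docs, labels)
print(classify_nb(["great", "movie"], prior, cond))  # -> 'pos'
```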
Example: Image classification
Input image → Representation → Classifier (e.g. Naïve Bayes, neural net, etc.) → Output: predicted label (“Car”)
Example: Training and testing
Training set (labels known)
Test set (labels
unknown)
• Key challenge: generalization to unseen
examples
Some classification methods
• Nearest neighbor: Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; … (scales to 10^6 examples)
• Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
• Support Vector Machines and kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; …
• Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
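As a taste of the first method in the list, a from-scratch 1-nearest-neighbor sketch (real systems at the 10^6-example scale use fast approximate search instead of this brute-force loop):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    """1-NN: return the label of the training example closest to x."""
    i = np.linalg.norm(X_train - x, axis=1).argmin()
    return y_train[i]

X_train = np.array([[0., 0.], [1., 1.], [9., 9.]])
y_train = np.array(['cat', 'cat', 'dog'])
print(nn_classify(X_train, y_train, np.array([8., 8.5])))  # -> 'dog'
```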
Classification … more soon
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled
examples
Supervised learning has many successes
• recognize speech
• steer a car
• classify documents
• classify proteins
• recognize faces and objects in images
• ...
Slide Credit: Avrim Blum
However, for many problems, labeled data can be rare or expensive.
Need to pay someone to do it, requires special testing,…
Unlabeled data is much cheaper:
Speech
Customer modeling
Images
Protein sequences
Medical outcomes
Web pages
[From Jerry Zhu]
Can we make use of cheap unlabeled data?
Slide Credit: Avrim Blum
Semi-Supervised Learning
Can we use unlabeled data to augment a
small labeled sample to improve learning?
But unlabeled data is missing the most important info!!
But maybe it still has useful regularities that we can use.
Slide Credit: Avrim Blum