Machine Learning - Carleton College
Download
Report
Transcript Machine Learning - Carleton College
Machine Learning
Definition 1
– “The subfield of AI concerned with
programs that learn from experience”
– Russell / Norvig, AIMA
Definition 2
– “the application of induction algorithms,
which is one step in the knowledge
discovery process.”
– Machine Learning definition in glossary from
Machine Learning at
http://robotics.stanford.edu/~ronnyk/glossary.html
David R. Musicant
Supervised Learning Classification
Example: Cancer diagnosis
Patient ID # of Tumors Avg Area Avg Density Diagnosis
1
5
20
118
Malignant
2
3
15
130
Benign
3
7
10
52
Benign
4
2
30
100
Malignant
Use this training set to learn how to classify patients
where diagnosis is not known:
Patient ID # of Tumors Avg Area Avg Density Diagnosis
101
4
16
95
?
102
9
22
125
?
103
1
14
80
?
Input Data
Training Set
Test Set
Classification
The input data is often easily obtained, whereas the
classification is not.
David R. Musicant
Classification Problem
Goal: Use training set + some learning method to
produce a predictive model.
Use this predictive model to classify new data.
Sample applications:
Application
Medical Diagnosis
Input Data
Noninvasive tests
Optical Character
Recognition
Protein Folding
Scanned bitmaps
Research Paper
Acceptance
David R. Musicant
Classification
Results from invasive
measurements
Letter A-Z
Amino acid construction Protein shape (helices,
loops, sheets)
Words in paper title
Paper accepted or rejected
Application: Breast Cancer Diagnosis
Research by Mangasarian,Street, Wolberg
David R. Musicant
Breast Cancer Diagnosis Separation
Research by Mangasarian,Street, Wolberg
David R. Musicant
Application: Document Classification
The Federalist Papers
– Written in 1787-1788 by Alexander Hamilton, John Jay, and
James Madison to persuade residents of the State of New
York to ratify the U.S. Constitution
– All written under the pseudonym “Publius”
Who wrote which of them?
– Hamilton wrote 56 papers
– Madison wrote 50 papers
– 12 disputed papers, generally understood to be written by
Hamilton or Madison, but not known which
Research by Bosch, Smith
David R. Musicant
Federalist Papers Classification
Graphic by Fung
David R. Musicant
Research by Bosch, Smith
Application: Face Detection
Training data is a collection of Faces and NonFaces
Rotation and Mirroring added in to provide robustness
Image obtained from work by Osuna, Freund, and Girosi at
http://www.ai.mit.edu/projects/cbcl/res-area/object-detection/face-detection.html
David R. Musicant
Face Detection Results
Image obtained from work by Osuna, Freund, and Girosi at
http://www.ai.mit.edu/projects/cbcl/res-area/object-detection/face-detection.html
David R. Musicant
Nearest Neighbor
Simple effective approach for supervised
learning problems
Patient ID # of Tumors Avg Area Avg Density Diagnosis
1
5
20
118
Malignant
2
3
15
130
Benign
3
7
10
52
Benign
4
2
30
100
Malignant
Envision each example as a point in ndimensional space
– Picture with 2 of them
Classify test point same as nearest training
point (Euclidean distance)
David R. Musicant
k-Nearest Neighbor
Nearest Neighbor can be subject to
noise
– Incorrectly classified training points
– Training anomalies
k-Nearest Neighbor
– Find k nearest training points (k odd) and
vote on which classification
Training time?
Testing time?
Works on numerical data
David R. Musicant