Machine Learning - Carleton College

Download Report

Transcript Machine Learning - Carleton College

Machine Learning

Definition 1
– “The subfield of AI concerned with
programs that learn from experience”
– Russell / Norvig, AIMA

Definition 2
– “the application of induction algorithms,
which is one step in the knowledge
discovery process.”
– Machine Learning definition in glossary from
Machine Learning at
http://robotics.stanford.edu/~ronnyk/glossary.html
David R. Musicant
Supervised Learning Classification

Example: Cancer diagnosis
Patient ID # of Tumors Avg Area Avg Density Diagnosis
1
5
20
118
Malignant
2
3
15
130
Benign
3
7
10
52
Benign
4
2
30
100
Malignant

Use this training set to learn how to classify patients
where diagnosis is not known:
Patient ID # of Tumors Avg Area Avg Density Diagnosis
101
4
16
95
?
102
9
22
125
?
103
1
14
80
?
Input Data

Training Set
Test Set
Classification
The input data is often easily obtained, whereas the
classification is not.
David R. Musicant
Classification Problem



Goal: Use training set + some learning method to
produce a predictive model.
Use this predictive model to classify new data.
Sample applications:
Application
Medical Diagnosis
Input Data
Noninvasive tests
Optical Character
Recognition
Protein Folding
Scanned bitmaps
Research Paper
Acceptance
David R. Musicant
Classification
Results from invasive
measurements
Letter A-Z
Amino acid construction Protein shape (helices,
loops, sheets)
Words in paper title
Paper accepted or rejected
Application: Breast Cancer Diagnosis
Research by Mangasarian,Street, Wolberg
David R. Musicant
Breast Cancer Diagnosis Separation
Research by Mangasarian,Street, Wolberg
David R. Musicant
Application: Document Classification

The Federalist Papers
– Written in 1787-1788 by Alexander Hamilton, John Jay, and
James Madison to persuade residents of the State of New
York to ratify the U.S. Constitution
– All written under the pseudonym “Publius”

Who wrote which of them?
– Hamilton wrote 56 papers
– Madison wrote 50 papers
– 12 disputed papers, generally understood to be written by
Hamilton or Madison, but not known which
Research by Bosch, Smith
David R. Musicant
Federalist Papers Classification
Graphic by Fung
David R. Musicant
Research by Bosch, Smith
Application: Face Detection


Training data is a collection of Faces and NonFaces
Rotation and Mirroring added in to provide robustness
Image obtained from work by Osuna, Freund, and Girosi at
http://www.ai.mit.edu/projects/cbcl/res-area/object-detection/face-detection.html
David R. Musicant
Face Detection Results
Image obtained from work by Osuna, Freund, and Girosi at
http://www.ai.mit.edu/projects/cbcl/res-area/object-detection/face-detection.html
David R. Musicant
Nearest Neighbor

Simple effective approach for supervised
learning problems
Patient ID # of Tumors Avg Area Avg Density Diagnosis
1
5
20
118
Malignant
2
3
15
130
Benign
3
7
10
52
Benign
4
2
30
100
Malignant

Envision each example as a point in ndimensional space
– Picture with 2 of them

Classify test point same as nearest training
point (Euclidean distance)
David R. Musicant
k-Nearest Neighbor

Nearest Neighbor can be subject to
noise
– Incorrectly classified training points
– Training anomalies

k-Nearest Neighbor
– Find k nearest training points (k odd) and
vote on which classification



Training time?
Testing time?
Works on numerical data
David R. Musicant