slides (in ppt) - Cinvestav

Pattern Recognition &
Machine Learning
Debrup Chakraborty
[email protected]
The Initials
Time:
Wednesday: 16 hrs to 18 hrs
Friday: 16 hrs to 18 hrs
Course website:
http://delta.cs.cinvestav.mx/~debrup/Machine_Learning.html
Books:
1) Pattern Classification: Duda, Hart and Stork
2) Machine Learning: Mitchell
3) Neural Networks: Haykin
The Initials (contd.)
Grading policies:
4 homeworks (20%)
2 exams (30%)
1 project/term paper (50%)
The lectures will be in English; I am really sorry about that.
We are good at recognizing patterns
We can recognize faces with ease
We can understand spoken words
We can read handwriting
... and many more
It would be nice to make machines
perform these tasks.
Informally, pattern recognition deals with techniques and methods that make machines recognize patterns.
[Figure: Satellite image of Kolkata in 4 channels: Red, Green, Blue and Infrared]
Which parts are land and which are water?
Where is the airport?
Protein Fold Prediction
1) Proteins are sequences of amino acid molecules. There are 20 distinct types of amino acids, and their sequences can form a great many protein molecules.
2) An amino acid chain must fold in a specific way in order to carry out its activity.
3) The activity and function of a protein depend on the way it folds. Thus the fold information is necessary for determining the function of a protein.
Problem: Given an amino acid sequence, tell me
the fold it will undergo.
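To treat this as a pattern recognition problem, a sequence of any length has to become a fixed-length feature vector. The slides do not say how; one simple encoding, assumed here purely for illustration, is amino acid composition: the fraction of each of the 20 amino acids in the chain. The function name `composition` and the toy sequence are hypothetical.

```python
# Hypothetical feature extractor: amino acid composition (an assumed encoding).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def composition(sequence: str) -> list[float]:
    """Return a 20-dimensional vector: the fraction of each amino acid in `sequence`."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence:
        if residue not in counts:
            raise ValueError(f"unknown residue code: {residue!r}")
        counts[residue] += 1
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


features = composition("MKTAYIAKQR")  # made-up toy sequence, not a real protein
print(len(features))
```

Composition throws away the order of the residues, so it can only serve as a crude baseline; fold prediction methods generally need richer features.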
Text Categorization
Given a large corpus of text documents (say, news items), tell me the category of each document.
Similar problems:
Classify music in a music corpus
Classify images in an image corpus
Retrieve documents/music/images from a database that are similar to a given item x.
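Retrieval "similar to x" needs both a representation and a similarity measure. A minimal sketch, assuming a bag-of-words representation and cosine similarity (one common choice; the slides do not fix either), ranks a toy corpus against a query. The helper names and the corpus are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a deliberately crude whitespace tokenizer."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, bag_of_words(doc)), reverse=True)
    return ranked[:k]


corpus = [
    "the football team won the match",
    "stock prices fell on economic news",
    "the election results were announced",
]
print(retrieve("economic news and stock markets", corpus))
```

The same ranking idea carries over to music or images once each item is turned into a feature vector.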
Other Problems
•Predict whether a patient hospitalized because of a heart attack will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements of the patient.
•Predict the price of a stock 6 months from now on the basis of company performance measures and economic data.
•Identify the digits in a handwritten zip code (CP) from digitized images.
•Estimate the amount of glucose in the blood of a diabetic patient from the infrared absorption spectrum of the blood.
•Estimate the risk factors for prostate cancer based on clinical and demographic variables.
Pattern Recognition
(What do the experts say?)
Duda and Hart, 1973: “A field concerned with machine recognition of meaningful regularities”.
Bezdek, 1981: “Pattern recognition is the search for structure in data”.
Types of Data
[Figure: the two types of data, Object Data (an object described by features such as Wt, Ht and legs) and Relational Data]
Object Data – Numeric Features
The characteristics of an object are encoded in a vector called the Feature Vector.
Each component of the vector represents some attribute of the
object. These components are called features.
Example:
Iris data: a data set in $\mathbb{R}^4$ representing iris flowers. The features are the sepal length, sepal width, petal length and petal width of 150 iris flowers of 3 different types.
Multichannel satellite image: images captured by different sensors, each recording a different frequency band of the electromagnetic radiation from the Earth's surface. Each pixel then has one feature per channel.
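Both examples can be written down as feature vectors in a few lines. In the sketch below, an iris flower becomes a vector in $\mathbb{R}^4$ (the values are the first sample of the classic Iris data, a setosa), and a multichannel image is flattened so that every pixel becomes one vector with one component per channel. The class `IrisSample`, the function `pixels_to_vectors` and the 2 x 2 toy image are hypothetical; its channel values are made up.

```python
from dataclasses import dataclass


@dataclass
class IrisSample:
    # The four Iris features, in centimetres.
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

    def feature_vector(self) -> tuple[float, float, float, float]:
        return (self.sepal_length, self.sepal_width, self.petal_length, self.petal_width)


# First sample of the classic Iris data (species: setosa).
x = IrisSample(5.1, 3.5, 1.4, 0.2).feature_vector()


def pixels_to_vectors(image):
    """Flatten an H x W image whose pixels hold C channel values into
    H * W feature vectors, one C-dimensional vector per pixel."""
    return [tuple(pixel) for row in image for pixel in row]


# Toy 2 x 2 image with 4 channels (Red, Green, Blue, Infrared); values are made up.
image = [
    [(10, 20, 30, 5), (12, 22, 31, 80)],
    [(11, 19, 29, 6), (13, 21, 33, 85)],
]
vectors = pixels_to_vectors(image)
print(x, len(vectors), len(vectors[0]))
```

Deciding which parts of the Kolkata image are land and which are water then becomes a problem of labelling each of these per-pixel vectors.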
Relational Data
A relational matrix whose values are assigned by humans or
computed from features:

$R = [r_{ij}]_{n \times n}$

Example (n = 4):

            German   Russian   Chinese   Japanese
German      1.0      0.6       0.2       0.25
Russian     0.6      1.0       0.1       0.2
Chinese     0.2      0.1       1.0       0.8
Japanese    0.25     0.2       0.8       1.0
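With relational data, the objects themselves are never described; only $r_{ij}$ is available. The sketch below reads the slide's 4 x 4 language matrix (German, Russian, Chinese, Japanese) as a symmetric similarity matrix with $r_{ii} = 1$ and finds, for each language, the most related other one. The slide also says $r_{ij}$ may be computed from features without saying how; the Gaussian similarity used for that in `gaussian_relation` is an assumed, illustrative choice, and both helper names are hypothetical.

```python
import math

LANGUAGES = ["German", "Russian", "Chinese", "Japanese"]

# The 4 x 4 relational matrix from the slide, row by row: R[i][j] = r_ij.
R = [
    [1.0, 0.6, 0.2, 0.25],
    [0.6, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.8],
    [0.25, 0.2, 0.8, 1.0],
]


def most_related(i, r):
    """Index j != i with the largest r_ij."""
    return max((j for j in range(len(r)) if j != i), key=lambda j: r[i][j])


def gaussian_relation(features, sigma=1.0):
    """Assumed way of computing R from feature vectors:
    r_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), so r_ii = 1 and r_ij shrinks with distance."""
    n = len(features)
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(features[i], features[j])) / (2 * sigma**2))
            for j in range(n)
        ]
        for i in range(n)
    ]


for i, name in enumerate(LANGUAGES):
    print(name, "is most related to", LANGUAGES[most_related(i, R)])
```

Read this way, the matrix groups German with Russian and Chinese with Japanese.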
Pattern Recognition Systems
Preprocessing
Feature extraction
Feature analysis
The main recognition task
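The four stages above can be chained as ordinary functions, each consuming the previous stage's output. The concrete operations in this sketch are all hypothetical stand-ins chosen for brevity: preprocessing removes the mean of a raw 1-D signal, feature extraction computes energy, peak value and number of sign changes, feature analysis keeps a chosen subset of those features, and recognition assigns the class of the nearest made-up prototype.

```python
from typing import Sequence

Vector = list[float]


def preprocess(signal: Vector) -> Vector:
    """Hypothetical preprocessing: subtract the mean to remove a constant offset."""
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]


def extract_features(signal: Vector) -> Vector:
    """Hypothetical feature extraction: energy, peak value, number of sign changes."""
    energy = sum(s * s for s in signal)
    peak = max(abs(s) for s in signal)
    sign_changes = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    return [energy, peak, float(sign_changes)]


def analyze_features(features: Vector, keep: Sequence[int] = (0, 2)) -> Vector:
    """Hypothetical feature analysis: keep only a chosen subset of the features."""
    return [features[k] for k in keep]


def recognize(features: Vector, prototypes: dict[str, Vector]) -> str:
    """Main recognition task: return the label of the nearest prototype."""
    def sq_dist(label: str) -> float:
        return sum((f - p) ** 2 for f, p in zip(features, prototypes[label]))
    return min(prototypes, key=sq_dist)


def recognition_system(raw: Vector, prototypes: dict[str, Vector]) -> str:
    """Chain the stages in the order listed on the slide."""
    return recognize(analyze_features(extract_features(preprocess(raw))), prototypes)


PROTOTYPES = {"smooth": [4.0, 0.0], "oscillating": [4.0, 3.0]}  # made-up class prototypes
print(recognition_system([1.0, 3.0, 1.0, 3.0], PROTOTYPES))
```

Keeping the stages separate makes it possible to swap one of them (say, a different feature extractor) without touching the others.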
Recognition Involves Learning
Learning in this context means:
1) Extracting knowledge from past experience
2) Representing the knowledge efficiently
3) Using the knowledge for future predictions/recognitions
Types of Learning
Supervised learning
Learning with a teacher
Unsupervised learning
Learning without a teacher
Reinforcement learning
Learning with a critic
Supervised Learning
[Figure: Training Set → Learning Algorithm → h; New example → h → Prediction]
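The diagram separates the learning algorithm (which sees the training set) from the function h it produces (which sees only new examples). A minimal sketch, assuming 1-nearest-neighbor as the learning algorithm (the slides do not name one), makes that separation explicit: `learn_1nn` consumes the training set and returns h, and h alone is used for prediction. The names and the toy training set are hypothetical.

```python
from typing import Callable, Hashable, Sequence, Tuple

Point = Tuple[float, ...]


def learn_1nn(training_set: Sequence[Tuple[Point, Hashable]]) -> Callable[[Point], Hashable]:
    """Learning algorithm (1-nearest-neighbor): "learning" just stores the
    training set; the returned h labels a new example like its nearest stored example."""
    stored = list(training_set)
    if not stored:
        raise ValueError("training set is empty")

    def h(x: Point) -> Hashable:
        def sq_dist(example):
            xi, _ = example
            return sum((a - b) ** 2 for a, b in zip(x, xi))
        _, label = min(stored, key=sq_dist)
        return label

    return h


training_set = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]
h = learn_1nn(training_set)
print(h((1.1, 0.9)), h((4.1, 3.9)))
```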
Supervised Learning (Contd.)
Supervised learning systems vary according to the type of function learned.
Basic types:
Function approximation systems (numeric outputs): $h: \mathbb{R}^p \to \mathbb{R}^q$
Classifier systems (outputs are classes): $h: \mathbb{R}^p \to \mathbb{R}^q$
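The difference between the two types lies only in what h returns. In this sketch, with hypothetical coefficients and class names, the function approximator maps $\mathbb{R}^2$ to a numeric vector in $\mathbb{R}^2$, while the classifier computes one score per class and returns the class with the largest score. Writing that class as a one-hot vector in $\mathbb{R}^q$ (q = number of classes) is one assumed way of reading the classifier's codomain.

```python
CLASS_NAMES = ("water", "land")  # hypothetical classes


def function_approximator(x):
    """Numeric output: a hypothetical linear map h: R^2 -> R^2."""
    x1, x2 = x
    return (2.0 * x1 + 0.5 * x2, x1 - x2)


def classifier(x, class_names=CLASS_NAMES):
    """Class output: one hypothetical linear score per class; the highest score wins."""
    x1, x2 = x
    scores = (x2 - x1, x1 - x2)  # score for "water", score for "land"
    best = max(range(len(scores)), key=lambda k: scores[k])
    return class_names[best]


def one_hot(label, class_names=CLASS_NAMES):
    """Assumed encoding of a class label as a vector in R^q."""
    return tuple(1.0 if c == label else 0.0 for c in class_names)


print(function_approximator((1.0, 2.0)), classifier((1.0, 5.0)), one_hot("land"))
```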
Supervised Learning (Contd.)
It is assumed that x and y bear an (unknown) functional relationship, say $y = S(x)$.
It is assumed that the pairs $(x_i, y_i)$ are generated from a fixed (but unknown), time-invariant probability distribution.
The training set:
$L = \{(x_i, y_i),\ i = 1, 2, \ldots, N\}$
Goal: given L, find h that closely resembles S.
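To experiment with learning algorithms, one can simulate this setting. The sketch below plays the role of "nature": it picks a stand-in for the unknown S (here $\sin 2\pi x$, a hypothetical choice) and draws each $(x_i, y_i)$ independently from the same fixed distribution, with $x \sim \text{Uniform}[0, 1]$. The optional Gaussian noise goes beyond the slide's $y = S(x)$ and is an assumption; it is off by default.

```python
import math
import random


def S(x: float) -> float:
    """Stand-in for the unknown target function; a learner never sees this."""
    return math.sin(2 * math.pi * x)


def draw_training_set(n: int, noise_sd: float = 0.0, seed: int = 0):
    """Draw L = {(x_i, y_i), i = 1..n}, each pair independently from one fixed
    distribution: x ~ Uniform[0, 1], y = S(x) (+ optional Gaussian noise, an assumption)."""
    rng = random.Random(seed)  # fixed seed: the same "time-invariant" distribution every run
    L = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = S(x)
        if noise_sd > 0:
            y += rng.gauss(0.0, noise_sd)
        L.append((x, y))
    return L


L = draw_training_set(5)
print(L)
```

A learning algorithm would receive only `L`; comparing its h against `S` on fresh draws is what the next slide calls measuring resemblance.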
Supervised Learning (Contd.)
How do we measure whether h resembles S?
Training error: the error on the training data points,
$J = \sum_{i=1}^{N} \left(h(x_i) - y_i\right)^2$
Test error (generalization error): the error on points not in the training set (difficult to measure).
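A common way to approximate the test error is to hold some labelled points out of training and evaluate h on them. The sketch computes the training error J as a sum of squared differences and compares per-point errors on training and held-out points. The learner `learn_constant` is a hypothetical, deliberately weak one (h always predicts the mean training target), and the data follow an assumed rule y = 2x + 1.

```python
def training_error(h, L):
    """J = sum over (x_i, y_i) in L of (h(x_i) - y_i)^2."""
    return sum((h(x) - y) ** 2 for x, y in L)


def mean_error(h, data):
    """Average squared error per point, so sets of different sizes can be compared."""
    return training_error(h, data) / len(data)


def learn_constant(L):
    """Hypothetical weak learner: h ignores x and predicts the mean training target."""
    mean_y = sum(y for _, y in L) / len(L)
    return lambda x: mean_y


data = [(float(x), 2.0 * x + 1.0) for x in range(6)]
# Fixed split for reproducibility; in practice the held-out points are usually chosen at random.
L, held_out = data[:4], data[4:]
h = learn_constant(L)
print(training_error(h, L), mean_error(h, L), mean_error(h, held_out))
```

The held-out error is only an estimate: it depends on which points happen to be held out, and those points can no longer be used for training.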
Bad Generalization – An example
[Figure: four plots of y against x]
Thank You