Team 2 - K-NN - Seidenberg School of Computer Science and Information Systems

Pattern Recognition
K-Nearest Neighbor Explained
By
Arthur Evans
John Sikorski
Patricia Thomas
Overview

Pattern Recognition, Machine Learning, Data Mining: How do they fit together?
Example Techniques
K-Nearest Neighbor Explained
Data Mining

Searching through electronically stored data in an automatic way
Solving problems with already known data
Essentially, discovering patterns in data
Has several subsets, from statistics to machine learning
Machine Learning

Construct computer programs that improve with use
A methodology
Draws from many fields: Statistics, Information Theory, Biology, Philosophy, Computer Science . . .
Several sub-disciplines: Feature Extraction, Pattern Recognition
Pattern Recognition

Operation and design of systems that detect patterns in data
The algorithmic process
Applications include image analysis, character recognition, speech analysis, and machine diagnostics.
Pattern Recognition Process

Gather data
Determine features to use
Extract features
Train your recognition engine
Classify new instances
Artificial Neural Networks

A type of artificial intelligence that attempts to imitate the way a human brain works.
Creates connections between processing elements, the computer equivalent of neurons.
A supervised technique.
ANN continued

Tolerant of errors in data
Many applications: speech recognition, visual scene analysis, robot control
Best at interpreting complex real-world sensor data.
The Brain

The human brain has about 10^11 neurons
Each connects to about 10^4 other neurons
Neurons switch in about 10^-3 seconds
  Slow compared to computers at 10^-13 seconds
The brain recognizes a familiar face in about 10^-1 seconds
  Only 200-300 cycles at its switching rate
The brain utilizes MASSIVE parallel processing, considering many factors at once.
Neural Network Diagram
[Figure: neural network diagram produced with the Stuttgart Neural Network Simulator]
Bayesian Theory

Deals with statistical probabilities
One of the best approaches for classifying text
Requires prior knowledge about the expected probabilities
Conveyor Belt Example

Want to sort apples and oranges on a conveyor belt.
Notice that 80% of the fruit are oranges, so the prior probability P(ω_orange) = 0.8.
Bayesian decision theory says:
Decide ω_orange if P(ω_orange | x) > P(ω_apple | x); otherwise decide ω_apple.
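A minimal sketch of this decision rule in Python. Only the 0.8 / 0.2 priors come from the slide; the class-conditional likelihoods P(x | class) below are made-up numbers for a hypothetical color measurement x, included purely for illustration.

```python
# Bayes decision rule for the conveyor-belt example.
# Priors come from the slide (80% oranges); the likelihoods are invented.
priors = {"orange": 0.8, "apple": 0.2}

def likelihood(x, label):
    """Hypothetical P(x | label) for a color measurement x in [0, 1],
    where higher x means the fruit looks more orange."""
    return x if label == "orange" else 1.0 - x

def decide(x):
    # Posterior is proportional to likelihood * prior (Bayes' theorem);
    # the shared evidence term P(x) cancels out of the comparison.
    score_orange = likelihood(x, "orange") * priors["orange"]
    score_apple = likelihood(x, "apple") * priors["apple"]
    return "orange" if score_orange > score_apple else "apple"

print(decide(0.7))  # orange
print(decide(0.1))  # apple: strong evidence can still outweigh the prior
```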
Clustering

A process of partitioning data into meaningful sub-classes (clusters).
Most techniques are unsupervised.
Two main categories:
  Hierarchical: nested classes displayed as a dendrogram
  Non-Hierarchical: each item belongs to one and only one cluster; clusters are not related.
Phylogenetic Tree - Hierarchical
[Figure: dendrogram of Ciliary Neurotrophic Factor across species: Rattus norvegicus, Mus musculus, Homo sapiens, Equus caballus, Gallus gallus, Oryctolagus cuniculus, Macaca mulatta]
Non-Hierarchical: K-Means Method
[Figure: K-means iterations - initial cluster seeds, initial cluster boundaries, cluster assignments after one iteration, new cluster assignments]
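A minimal k-means sketch corresponding to the panels above, assuming two clusters (k = 2); the 2-D points and initial seeds are made up for illustration.

```python
# Minimal k-means: assign each point to its nearest seed, move each seed to
# the mean of its cluster, and repeat. The points and seeds are made up.
import math

points = [(1, 1), (1.5, 2), (3, 4), (5, 7), (3.5, 5), (4.5, 5)]
seeds = [(1, 1), (5, 7)]  # initial cluster seeds, k = 2

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

for _ in range(10):  # a handful of iterations is plenty for this tiny example
    # Assignment step: each point joins the cluster of its nearest seed.
    clusters = [[] for _ in seeds]
    for p in points:
        nearest = min(range(len(seeds)), key=lambda i: dist(p, seeds[i]))
        clusters[nearest].append(p)
    # Update step: move each seed to the mean of its assigned points.
    seeds = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
             for c in clusters if c]

print(seeds)     # final cluster centers
print(clusters)  # final cluster assignments
```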
Decision Tree

A flow-chart-like tree structure
Internal node denotes a test on an attribute (feature)
Branch represents an outcome of the test
  All records in a branch have the same value for the tested attribute
Leaf node represents a class label or class label distribution
Example of Decision Tree

Decision tree forecast for playing golf (P = play, N = don't play):
outlook = sunny    -> test humidity: high -> N, normal -> P
outlook = overcast -> P
outlook = rain     -> test windy: true -> N, false -> P
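The same tree transcribed directly into nested conditionals, just to make the branching explicit; nothing here goes beyond the branches listed above.

```python
# The golf decision tree above as nested conditionals ("P" = play, "N" = don't play).
def play_golf(outlook, humidity, windy):
    if outlook == "sunny":
        return "N" if humidity == "high" else "P"
    if outlook == "overcast":
        return "P"
    if outlook == "rain":
        return "N" if windy else "P"
    raise ValueError("unknown outlook: " + outlook)

print(play_golf("sunny", "high", windy=False))      # N
print(play_golf("overcast", "normal", windy=True))  # P
print(play_golf("rain", "normal", windy=True))      # N
```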
Instance Based Learning

Training consists of simply storing the data
No generalizations are made
All calculations occur at classification time
  Referred to as “lazy learning”
Can be very accurate, but computationally expensive
Instance Based Methods

Locally weighted regression
Case based reasoning
Nearest Neighbor
Advantages

The training stage is trivial, so the model adapts easily to new instances
Very accurate
Different “features” may be used for each classification
Able to model complex data with less complex approximations
Difficulties

All processing is done at query time: computationally expensive
Determining an appropriate distance metric for retrieving related instances
Irrelevant features may have a negative impact
Case Based Reasoning

Does not use Euclidean space
Instances are represented as complex logical descriptions
Examples:
  Retrieving help desk information
  Legal reasoning
  Conceptual design of mechanical devices
Case Based Process

Based on the idea that current problems are similar to past problems
Apply matching algorithms to past problem-solution pairs
Nearest Neighbor

Assumes all instances correspond to points in n-dimensional space
The nearest neighbor is the instance closest in Euclidean distance:
D = sqrt((Ax - Bx)^2 + (Ay - By)^2 + ...)
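A direct implementation of that distance for feature vectors of any length (a minimal sketch; the two sample vectors are made up).

```python
# Euclidean distance: D = sqrt((A1 - B1)^2 + (A2 - B2)^2 + ...)
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance((3, 200), (5, 180)))  # two hypothetical feature vectors
```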
Feature Extraction

Features: unique characteristics that define an object
The features used depend on the problem you are trying to solve
Developing a good feature set is more art than science
Sample Case – Identify Flower Species

Consider two features:
  Petal count: range from 3-15
  Color: range from 0-255
Assumptions:
  No two species have exactly the same color
  Multiple species have the same petal count
Graph of Instances
[Figure: Species A and Species B instances plotted by petal count and color, with the query point marked]
Calculate Distances
[Figure: distances from the query point to the Species A and Species B instances]
Species is Closest Neighbor
[Figure: the query is labeled with the species of its single nearest neighbor]
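A 1-nearest-neighbor sketch of that idea. Only the two features (petal count, color) come from the slides; the stored instances and query values below are made up.

```python
# 1-nearest neighbor: label the query with the class of the closest stored instance.
import math

training = [
    ((4, 30), "Species A"), ((5, 45), "Species A"),
    ((12, 200), "Species B"), ((14, 220), "Species B"),
]  # ((petal count, color), species) - hypothetical values

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def classify_1nn(query):
    _, label = min(training, key=lambda item: distance(item[0], query))
    return label

print(classify_1nn((6, 60)))    # Species A
print(classify_1nn((13, 190)))  # Species B
```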
Problems

The data range for each feature is different
Noisy data may lead to wrong conclusions
One attribute may carry more importance than another
Without Normalization
[Figure: instances plotted on raw scales - color ranges 0-255 while petal count ranges only 3-15]
Normalized
Normalize by subtracting the smallest value from all values, then dividing by the largest; all values then range from 0 to 1.
[Figure: the same instances plotted on normalized 0-1 axes for petal count and color]
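A min-max normalization sketch following that rule (subtract the smallest value, then divide by the largest remaining value); the sample petal counts are made up within the 3-15 range from the earlier slide.

```python
# Min-max normalization: every value ends up in the 0-1 range.
def normalize(values):
    low = min(values)
    span = max(values) - low
    return [(v - low) / span for v in values]

petal_counts = [3, 5, 8, 12, 15]  # hypothetical samples from the 3-15 range
print(normalize(petal_counts))    # [0.0, ..., 1.0]
```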
Noise Strategies

Take an average of the k closest instances
  K-nearest neighbor
Prune noisy instances
K-Nearest Neighbors
K = 5
[Figure: the query and its 5 nearest neighbors among the Species A and Species B instances]
Identify the query as the majority class of its k nearest neighbors.
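A k-nearest-neighbor sketch extending the earlier 1-NN code to a majority vote over the k closest instances (k = 5, as in the slide). The training instances, including one deliberately "noisy" one, are made up.

```python
# k-NN: the k closest stored instances vote on the query's class.
import math
from collections import Counter

training = [
    ((4, 0.10), "Species A"), ((5, 0.15), "Species A"), ((6, 0.20), "Species A"),
    ((12, 0.80), "Species B"), ((13, 0.85), "Species B"), ((14, 0.90), "Species B"),
    ((5, 0.82), "Species B"),  # a "noisy" instance sitting near Species A
]

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def classify_knn(query, k=5):
    neighbors = sorted(training, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(classify_knn((6, 0.18), k=5))  # Species A despite the nearby noisy instance
```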
Prune “Noisy” Instances

Keep track of how often an instance correctly predicts a new instance
When that value drops below a certain threshold, remove the instance from the graph
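One way to sketch that bookkeeping in code: count how often each stored instance, when it is the nearest neighbor, predicts the query's true label, and drop instances whose hit rate falls below a threshold. The counting scheme, the 0.5 threshold, and the minimum-trials guard are all illustrative assumptions, not a procedure prescribed by the slides.

```python
# Track each stored instance's prediction record and prune the unreliable ones.
hits = {}    # instance index -> number of correct predictions
trials = {}  # instance index -> number of times it was the nearest neighbor

def record_prediction(index, predicted_label, true_label):
    trials[index] = trials.get(index, 0) + 1
    if predicted_label == true_label:
        hits[index] = hits.get(index, 0) + 1

def prune(instances, threshold=0.5, min_trials=5):
    kept = []
    for i, instance in enumerate(instances):
        n = trials.get(i, 0)
        # Keep instances that are still untested or that predict accurately enough.
        if n < min_trials or hits.get(i, 0) / n >= threshold:
            kept.append(instance)
    return kept

instances = ["a", "b", "c"]
for _ in range(5):
    record_prediction(0, "Species A", "Species A")  # instance 0 keeps getting it right
    record_prediction(1, "Species A", "Species B")  # instance 1 keeps misfiring
print(prune(instances))  # ['a', 'c'] once instance 1 drops below the threshold
```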
“Pruned” Graph
[Figure: the same plot with the noisy instances removed]
Avoid Overfitting - Occam's Razor
[Figure: three candidate fits - A: poor but simple; B: good but less simple; C: excellent but too data-specific]
Weights

Weights are added to features that are more significant than others in producing accurate predictions. Multiply the feature value by the weight.
[Figure: the normalized petal count axis stretched to 0-2 by a weight of 2, while the other feature remains 0-1]
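A sketch of that idea: multiply each feature by its weight before computing the distance. The weight of 2 on petal count mirrors the rescaled axis above; the normalized instances are made up.

```python
# Weighted Euclidean distance: significant features count for more.
import math

weights = (2.0, 1.0)  # petal count weighted 2x relative to color (assumed)

def weighted_distance(a, b):
    return math.sqrt(sum((w * (ai - bi)) ** 2 for w, ai, bi in zip(weights, a, b)))

query = (0.2, 0.9)                           # normalized (petal count, color)
print(weighted_distance(query, (0.3, 0.1)))  # close in petal count, far in color
print(weighted_distance(query, (0.8, 0.8)))  # far in petal count, close in color
```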
Validation

Used to calculate error rates and the overall accuracy of the recognition engine
Leave-one-out: use n-1 instances in the classifier, test on the held-out instance, repeat n times
Holdout: divide the data into n groups, use n-1 groups in the classifier, test on the remaining group, repeat n times
Bootstrapping: test with a randomly sampled subset of instances
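A leave-one-out sketch for the 1-NN classifier used earlier: hold out each instance in turn, classify it with the rest, and report the fraction classified correctly. The tiny dataset is again made up.

```python
# Leave-one-out: train on n-1 instances, test on the held-out one, repeat n times.
import math

data = [
    ((4, 0.10), "Species A"), ((5, 0.15), "Species A"), ((6, 0.20), "Species A"),
    ((12, 0.80), "Species B"), ((13, 0.85), "Species B"), ((14, 0.90), "Species B"),
]

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def classify_1nn(training, query):
    _, label = min(training, key=lambda item: distance(item[0], query))
    return label

correct = 0
for i, (features, true_label) in enumerate(data):
    held_out_training = data[:i] + data[i + 1:]   # leave instance i out
    if classify_1nn(held_out_training, features) == true_label:
        correct += 1

print("leave-one-out accuracy:", correct / len(data))  # 1.0 on this toy data
```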
Potential Pattern Recognition Problems

Are there adequate features to distinguish the different classes?
Are the features highly correlated?
Are there distinct subclasses in the data?
Is the feature space too complex?