Supervised Machine Learning: Classification Techniques


Supervised Machine Learning: Classification Techniques
Chaleece Sandberg
Chris Bradley
Kyle Walsh
Supervised Machine Learning

- SML: machine performs a function (e.g., classification) after training on a data set where inputs and desired outputs are provided
- Following training, the SML algorithm is able to generalize to new, unseen data (see the sketch after this list)
- Application: "Data Mining"
  - Often, large amounts of data must be handled efficiently
  - Look for relevant information, patterns in data
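As a concrete illustration of the train-then-generalize workflow, here is a minimal sketch assuming scikit-learn is available; the tiny data set and the nearest-neighbor classifier are illustrative choices, not from the slides.

```python
# Supervised learning in miniature: train on labeled examples, then
# generalize to unseen inputs. Data and classifier choice are illustrative.
from sklearn.neighbors import KNeighborsClassifier

# Training set: inputs paired with desired outputs (labels).
X_train = [[170, 60], [180, 80], [160, 50], [175, 75]]   # e.g., height, weight
y_train = ["B", "A", "B", "A"]

model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# After training, the model classifies new, unseen data.
print(model.predict([[178, 78], [162, 52]]))  # expected: ['A', 'B']
```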
Decision Trees

- Logic-based algorithm
- Sort instances (data) according to feature values… a hierarchy of tests
- Nodes: features
  - Root node: the feature that best divides the data
    - Algorithms exist for determining the best root node
- Branches: values the node can assume
Decision Trees, an example

[Figure: a diagnostic decision tree. INPUT: data (symptom). The tree tests features such as low RBC count (yes/no), size of cells (large/small), bleeding (stop bleeding), B12 deficiency, and a gastrin assay (negative/positive). OUTPUT: category (condition), e.g., anemia.]
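A minimal sketch of how such a hierarchy of tests can be learned with scikit-learn's DecisionTreeClassifier; the toy feature table and labels below are invented for illustration and are not the slide's medical data.

```python
# Minimal decision-tree sketch (hypothetical toy data, not the slide's medical example).
# DecisionTreeClassifier learns a hierarchy of feature tests, rooted at the best-dividing feature.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: each row is [low_RBC_count, large_cells, bleeding]; labels are conditions.
X = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
]
y = ["anemia", "blood_loss", "healthy", "anemia", "healthy"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned hierarchy of tests, then classify a new, unseen instance.
print(export_text(tree, feature_names=["low_RBC_count", "large_cells", "bleeding"]))
print(tree.predict([[1, 1, 0]]))
```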
Decision Trees: Assessment

- Advantages:
  - Classification of data based on limiting features is intuitive
  - Handles discrete/categorical features best
- Limitations:
  - Danger of "overfitting" the data
  - Not the best choice for accuracy
Bayesian Networks

- Graphical algorithm that encodes the joint probability distribution of a data set
- Captures probabilistic relationships between variables
- Based on the probability that instances (data) belong in each category
Bayesian Networks, an example

[Figure: example Bayesian network (Wikipedia, 2008).]
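A minimal sketch of how such a network encodes a joint distribution and lets probabilities be updated from outcomes, assuming the familiar rain/sprinkler/wet-grass structure; every probability table below is an illustrative assumption, not a value from the slides.

```python
# Minimal Bayesian-network sketch (assumed rain/sprinkler/wet-grass structure;
# all probability tables are illustrative).
from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},    # P(Sprinkler | Rain)
               False: {True: 0.4, False: 0.6}}
P_wet = {(True, True): 0.99, (True, False): 0.9,   # P(Wet | Sprinkler, Rain)
         (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Joint probability given by the network's factorization."""
    p_wet = P_wet[(sprinkler, rain)]
    return (P_rain[rain]
            * P_sprinkler[rain][sprinkler]
            * (p_wet if wet else 1.0 - p_wet))

# Update: P(Rain=True | Wet=True), enumerating the hidden variable (Sprinkler).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(num / den)  # posterior probability of rain given wet grass
```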
Bayesian Networks: Assessment

- Advantages:
  - Takes into account prior information regarding relationships among features
  - Probabilities can be updated based on outcomes
  - Fast!… with respect to learning classification
  - Can handle incomplete sets of data
  - Avoids "overfitting" of data
- Limitations:
  - Not suitable for data sets with many features
  - Not the best choice for accuracy
Neural Networks

- Used for:
  - Classification
  - Noise reduction
  - Prediction
- Great because:
  - Able to learn
  - Able to generalize
- Kiran: Plaut's (1996) semantic neural network that could be lesioned and retrained – useful for predicting treatment outcomes
- Mikkulainen: evolving neural network that could adapt to the gaming environment – useful learning application
Neural Networks: Biological Basis
Feed-forward Neural Network

[Figure: a perceptron and a feed-forward network with a hidden layer.]
Neural Networks: Training

- Presenting the network with sample data and modifying the weights to better approximate the desired function
- Supervised Learning
  - Supply the network with inputs and desired outputs
  - Initially, the weights are randomly set
  - Weights modified to reduce the difference between actual and desired outputs
  - Backpropagation

Backpropagation
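Below is a minimal backpropagation sketch in NumPy, assuming a small 2-4-1 sigmoid network trained on XOR; the architecture, learning rate, and number of iterations are illustrative choices, not from the slides.

```python
# Minimal backpropagation sketch: a small sigmoid network trained on XOR.
# Architecture (2-4-1), learning rate, and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs

# Initially, the weights are randomly set.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for _ in range(10000):
    # Forward pass: input layer -> hidden layer -> output.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)

    # Backward pass: propagate the output error back through the network.
    d_out = (Y - T) * Y * (1 - Y)
    d_hid = (d_out @ W2.T) * H * (1 - H)

    # Modify weights to reduce the difference between actual and desired outputs.
    W2 -= lr * H.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)

# Outputs should approach the XOR targets [0, 1, 1, 0].
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```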
Support Vector Machines

Perceptron Revisited: Linear Classifier

- Decision function: y(x) = sign(w·x + b)
- Decision boundary: w·x + b = 0, with w·x + b > 0 on one side and w·x + b < 0 on the other
- Many separating hyperplanes fit the data; which one is the best?
Notion of Margin

- Distance from a data point to the hyperplane: r = |w·x + b| / ||w|| (numeric check below)
- Data points closest to the boundary are called support vectors
- Margin d is the distance between the two classes

[Figure: separating hyperplane with margin d and point-to-hyperplane distance r.]
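A quick numeric check of the distance formula; the weight vector, bias, and data points below are made-up values for illustration.

```python
# Distance from data points to the hyperplane w·x + b = 0: r = |w·x + b| / ||w||.
# Weight vector, bias, and points are illustrative values.
import numpy as np

w = np.array([2.0, 1.0])
b = -2.0
points = np.array([[1.0, 1.0], [0.0, 0.0], [2.0, 3.0]])

r = np.abs(points @ w + b) / np.linalg.norm(w)
print(r)  # the smallest distances belong to the support vectors
```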
Maximizing Margin

- Maximizing the margin is a quadratic optimization problem (a fitted example follows below)
- Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them
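One way to see the outcome of that optimization without solving the quadratic program by hand is to fit a linear SVM with scikit-learn and read off the margin width 2/||w||; the toy data and the large C value (approximating a hard margin) are illustrative assumptions.

```python
# Fit a linear (maximum-margin) SVM and report the margin width 2 / ||w||.
# Toy data and the large C value are illustrative choices.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0], [-1, -1], [-2, -2], [-2, 0]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
w = clf.coef_[0]

print("margin width:", 2.0 / np.linalg.norm(w))
print("support vectors:", clf.support_vectors_)  # the points closest to the boundary
```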
Kernel Trick

- What if the dataset is non-linearly separable?
- We use a kernel to map the data to a higher-dimensional space

[Figure: 1-D points on the x axis that are not linearly separable become separable after mapping to the (x, x²) plane.]
Non-linear SVMs: Feature spaces

- General idea: the original space can always be mapped to some higher-dimensional feature space where the training set becomes separable: Φ: x → φ(x)
Examples of Kernel Trick

- For the example in the previous figure, the non-linear mapping is x → φ(x) = (x, x²)
- A more commonly used kernel is the radial basis function (RBF) kernel: K(xᵢ, xⱼ) = exp(−||xᵢ − xⱼ||² / 2σ²)
Advantages and Applications of SVM

- Advantages of SVM
  - Unlike neural networks, the class boundaries don't change as the weights change
  - Generalizability is high because the margin is maximized
  - No local minima and robustness to outliers
- Applications of SVM
  - Used in almost every conceivable situation where automatic classification of data is needed
  - (Example from class) Raymond Mooney and his KRISPER natural language parser
The Future of Supervised Learning (1)

- Generation of synthetic data
  - A major problem with supervised learning is the necessity of having large amounts of training data to obtain a good result
  - Why not create synthetic training data from real, labeled data?
  - Example: use a 3D model to generate multiple 2D images of some object (such as a face) under different conditions (such as lighting)
  - Labeling only needs to be done for the 3D model, not for every 2D image (a miniature sketch of the idea follows below)
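The same idea in miniature, as a hedged sketch: one labeled example is expanded into many labeled training examples by varying conditions; the random "image", brightness values, and flipping below merely stand in for rendering a 3D model under different lighting.

```python
# Miniature version of the synthetic-data idea: one labeled example is expanded into
# many labeled training examples by varying conditions (brightness, mirroring).
# Purely illustrative stand-in for rendering a 3D model under different lighting.
import numpy as np

face, label = np.random.rand(32, 32), "person_A"   # one labeled "image"

synthetic = []
for brightness in (0.6, 0.8, 1.0, 1.2):
    for flipped in (False, True):
        img = np.clip(face * brightness, 0.0, 1.0)
        if flipped:
            img = np.fliplr(img)
        synthetic.append((img, label))              # label is inherited, not re-annotated

print(len(synthetic), "training examples from one labeled source")
```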
The Future of Supervised Learning (2)

- Future applications
  - Personal software assistants learning from past usage the evolving interests of their users in order to highlight relevant news (e.g., filtering scientific journals for articles of interest)
  - Houses learning from experience to optimize energy costs based on the particular usage patterns of their occupants
  - Analysis of medical records to assess which treatments are more effective for new diseases
  - Enable robots to better interact with humans
References

- http://homepage.psy.utexas.edu/homepage/class/Psy394U/Hayhoe/cognitive%20science%202008/talks:readings/
- http://www.ai-junkie.com/ann/evolved/nnt1.html
- http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
- http://cbcl.mit.edu/cbcl/people/heisele/huang-blanz-heisele.pdf
- http://www.grappa.univ-lille3.fr/~gilleron/introML.pdf