Transcript Document

Classification for Lifelog Management
September 30, 2008
Sung-Bae Cho
Agenda
• Introduction to Classification
– Discriminant Analysis
– Decision Tree
– Density Estimation
– Artificial Neural Networks
• Comparison of Classification Methods
• An Example of Classification for Life Log Analysis
Classification
• Definition
– A procedure in which individual items are placed into groups based on
quantitative information on one or more characteristics inherent in the
items (referred to as traits, variables, characters, etc.) and based on a
training set of previously labeled items.
• Formally, the problem can be stated as follows:
– Given training data {(x_1, y_1), …, (x_n, y_n)}, produce a classifier h : X → Y
which maps an object x ∈ X to its classification label y ∈ Y.
• For example, if the problem is filtering spam, then x_i is some representation
of an email and y_i is either "Spam" or "Non-Spam" (a code sketch follows below).
• Classification algorithms are typically used in pattern recognition systems.
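As a minimal sketch of this setup (my illustration, not from the slides), the following assumes scikit-learn is available; the toy emails and labels are invented:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled training data {(x_1, y_1), ..., (x_n, y_n)} -- invented examples
emails = ["win money now", "meeting at noon", "cheap pills win", "lunch tomorrow?"]
labels = ["Spam", "Non-Spam", "Spam", "Non-Spam"]

# The classifier h : X -> Y maps an email representation to a label
h = make_pipeline(CountVectorizer(), MultinomialNB())
h.fit(emails, labels)
print(h.predict(["win cheap money"]))  # -> ['Spam']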
Other Types of Classification Problem
• Can consider classification as an estimation problem, where the goal is to
estimate a function of the form
P(class | x) = f(x; θ)
– where the feature vector input is x, and the function f is typically
parameterized by some parameters θ.
• In the Bayesian approach to this problem, instead of choosing a single
parameter vector θ, the result is integrated over all possible thetas,
– with the thetas weighted by how likely they are given the training data D:
P(class | x) = ∫ f(x; θ) P(θ | D) dθ
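A tiny numerical sketch of this integration (my illustration): with a discrete grid of θ values standing in for the integral, the prediction becomes a sum of per-θ predictions weighted by P(θ | D). The coin-flip model and the counts are invented:

import numpy as np

thetas = np.linspace(0.01, 0.99, 99)   # grid standing in for "all possible thetas"
heads, tails = 7, 3                    # invented training data D

likelihood = thetas**heads * (1 - thetas)**tails
posterior = likelihood / likelihood.sum()       # P(theta | D), uniform prior

# P(y = heads | x, D) = sum over theta of P(y | theta) * P(theta | D)
print(np.sum(thetas * posterior))               # ~0.667, i.e. (7+1)/(10+2)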
Features of Classification
• Structure of a classification task
– Prior probability: relative frequency with which the classes occur in the
population
– Criterion for separating the classes: the underlying relation that uses
observed attributes to distinguish individuals belonging to each class
– Misclassification cost: cost associated with making a wrong classification
• Benefits
– Mechanical classification is faster
– Human operators may have biases
– Cheaper: a diagnosis can be made from external symptoms, avoiding surgery
– Understanding structural relationships between the response and the
measured variables
• Issues
– Accuracy, Speed, Comprehensibility
– Time to learn
Examples of Classification Algorithms
• Discriminant analysis
– Linear / Quadratic discriminant
– Logistic regression
– Support vector machine
• Decision tree
• Density estimation
– k-nearest neighbor
• Artificial neural networks
– Perceptron
– Multi layer perceptron
• Probabilistic classification
– Naive Bayes classifier
– Bayesian networks*
– Temporal classification
• Hidden Markov models*
• Dynamic Bayesian networks*
* at the next lectures
Discriminant Analysis
• Definition
– Divide the sample space by a series of lines in two dimensions, planes in 3-D
and, more generally, hyperplanes in many dimensions
– The line dividing two classes is drawn to bisect the line joining the centers
of those classes
– The direction of the line is determined by the shape of the clusters of
points
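A minimal sketch of such a linear split (my illustration, assuming scikit-learn; the two Gaussian clusters are invented):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)),   # cluster of class 0
               rng.normal([3, 3], 1, (50, 2))])  # cluster of class 1
y = np.array([0] * 50 + [1] * 50)

# The discriminant draws a line between the two cluster centers
lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[0.5, 0.5], [2.8, 3.1]]))     # -> [0 1]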
Discriminant Analysis: Example
Discriminant Analysis: Issues and Properties
• Requires numerical attributes and no missing values
• Dummy variables for categories
• Maximizes separation between the classes in a least-squares sense
• Attributes must be linearly independent
• No attribute can be constant within each class (add random noise)
• Canonical variates for multiclass problems
• Especially useful when the class means are ordered, or lie along a simple
curve in attribute space
• Logistic regression maximizes the conditional probability P(class | x)
• Assumes that the form of the underlying density functions (or their ratio) is known
Support Vector Machines
• Definition:
– A set of related supervised learning methods used for classification
– Viewing input data as two sets of vectors in an n-dimensional space
– Constructing a separating hyperplane in the space
• which maximizes the margin between the two data sets
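A minimal sketch (my illustration, assuming scikit-learn; the toy points are invented) showing the maximum-margin hyperplane and its support vectors:

from sklearn import svm

X = [[0, 0], [1, 1], [0, 1], [4, 4], [5, 5], [4, 5]]  # two invented point sets
y = [0, 0, 0, 1, 1, 1]

clf = svm.SVC(kernel="linear").fit(X, y)   # maximum-margin separating hyperplane
print(clf.support_vectors_)                # the points that pin down the margin
print(clf.predict([[2, 2], [4.5, 4.5]]))   # -> [0 1]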
Linear Support Vector Machines
[Figure: two classes separated by a maximum-margin hyperplane; the points lying on the margin boundaries are the support vectors.]
Decision Tree
• Definition
– Space is divided into boxes, and at each stage in the procedure, each box
is examined to see if it may be split into more boxes
– The splits are usually parallel to the coordinate axes
• Structure
[Figure: a tree of attribute tests branching down to class-labeled leaves.]
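A minimal sketch of such axis-parallel box splitting (my illustration, assuming scikit-learn; the toy points are invented):

from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]]  # invented toy points
y = ["A", "A", "A", "B", "B", "B"]

tree = DecisionTreeClassifier().fit(X, y)
# Each printed split has the form "feature <= value", i.e. parallel to an axis
print(export_text(tree, feature_names=["x1", "x2"]))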
Decision Tree: Example
Classification Process of a Tree
• Generate_tests:
– Attr = val, Attr < val, Attr ∈ Subset, …
• Test:
– Gini, Entropy, KS, Twoing, … (see the impurity sketch after this list)
• Stop_criterion:
– Num cases, significance (e.g., χ²-test), overgrow & prune
• Info:
– Most frequent class ("mode"), …
• Prune:
– Reduced-error, cost complexity, …
[Figure: training data feed the learning algorithm, which produces classification rules that are evaluated on testing data.]
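As a sketch of the Gini and entropy tests named above (my illustration), both measure how impure the class distribution inside a box is:

import math

def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

print(gini([5, 5]), entropy([5, 5]))    # mixed box: 0.5, 1.0 (maximal impurity)
print(gini([10, 0]), entropy([10, 0]))  # pure box: 0.0, 0.0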
Density Estimation
• Definition
– Classification rule that assigns x to class A_d if p(A_d | x) = max_i p(A_i | x),
i.e., it maximizes the a posteriori probability
– For two classes A_i and A_j, x is assigned to A_i when
π_j c(j, i) f_j(x) ≤ π_i c(i, j) f_i(x)
where π is the class prior, c(·,·) the misclassification cost, and f the class-conditional density
– Kernel methods
– Naive Bayes → assumes variable independence
– Projection methods
• Example: Nearest Neighbors
– 1-nearest neighbor rule, in which the training set is searched for the
‘nearest’ (in a defined sense) previous example, whose class is then
assumed for the new case
– k-nearest neighbor: the natural extension, taking the most frequent class
among the k nearest neighbors
– Parameter-free model (the choice of distance matters!)
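A minimal k-nearest-neighbor sketch (my illustration, assuming scikit-learn; the 1-D samples are invented):

from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [10], [11], [12]]   # invented 1-D training samples
y = ["low", "low", "low", "high", "high", "high"]

# Each query takes the most frequent class among its k = 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1.5], [10.5]]))     # -> ['low' 'high']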
Nearest Neighbours: Example
Density Estimation: Issues
• The presence of irrelevant variables affects the model
• Requires numerical variables, or a special distance function for categorical
variables
• A variable's scale affects the model
• All of the samples must be stored
Artificial Neural Networks
• Black Box Modeling
• An artificial neural network (ANN) is a mathematical model or computational
model based on biological neural networks
• It consists of an interconnected group of artificial neurons and processes
information using a connectionist approach to computation
• In most cases an ANN is an adaptive system that changes its structure based
on external or internal information that flows through the network during the
learning phase
The Neuron Model
[Figure: a single neuron. The inputs I1 … I5 enter through the weights w1 … w5, are summed, and pass through the activation function f to give the output O.]
sum = Σ_n w_n·I_n
f: activation function
⇒ O_i = f_i( Σ_n w_n·I_n )
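A minimal sketch of this neuron model (my illustration; the sigmoid activation, inputs, and weights are invented choices):

import math

def neuron(inputs, weights):
    s = sum(w * x for w, x in zip(weights, inputs))  # sum = Σ_n w_n·I_n
    return 1.0 / (1.0 + math.exp(-s))                # f: sigmoid activation

I = [0.5, 0.1, 0.9, 0.0, 0.3]    # invented inputs I_1..I_5
w = [0.4, -0.2, 0.7, 0.1, -0.5]  # invented weights w_1..w_5
print(neuron(I, w))              # the output O = f(sum)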
Multi Layer Perceptron Model
[Figure: a multilayer perceptron. The input pattern I1 … I5 is fed forward through the weights (w31, w32, w33, w34, …) to the output O1; the error signals (e31, e32, e33, e34) are fed back to adjust the weights.]
e = d_1 - o_1
e: error, d: desired output, o: network output, w: weights
Artificial Neural Networks: Principles
• Universal approximator
E = (1/2) Σ_j Σ_i ( y_ji - Y_ji )²
• Weights are modified to minimize the classification error:
∇E(w) = ( ∂E(w)/∂w_1 , ∂E(w)/∂w_2 , … ),  w ← w - η·∇E(w)
• The outputs represent P( Class | example )
Back-propagation Learning
• The error E is a function of the weights and the training data
• Minimize error E using the steepest descent
Downhill = -dE/dW
W_new = W_old - η·dE/dW_old
η: learning rate
[Figure: the error E plotted against a weight W, with gradient descent stepping downhill toward the minimum.]
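A minimal steepest-descent sketch (my illustration) on a toy error function E(W) = (W - 3)² whose gradient is known in closed form:

eta = 0.1   # learning rate
W = 0.0     # initial weight
for step in range(50):
    grad = 2 * (W - 3)     # dE/dW for E(W) = (W - 3)^2
    W = W - eta * grad     # W_new = W_old - eta * dE/dW
print(W)    # converges toward the minimum at W = 3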
Artificial Neural Networks: Issues
• To make a generalizable network, one has to determine a suitable number of
hidden nodes
– Too few → the network does not learn enough…
– Too many → over-learning
Comparison of Classification Methods
• MV: missing values handled
• Cost matrix: used in learning (L), in testing (T), or not (N)
• Interpretability: easily understood classifier
• Method: comprehensibility of the method
• Parameters: good guidelines or automatic selection of parameters
• User-fr: user friendliness
• Data: numerical (N), categorical (C)

Algo.         MV   Cost   Interpr.   Method   Param's   User-fr   Data
Density       N    T      1          5        2         N         N
Discriminant  N    T      3          4        4         Y         N
Tree          Y    LT     5          4        5         Y         NC
Neural Nets   N    T      1          3        3         N         N
VTT Technical Research Centre of Finland
• Representing complex entities: Any single
methodology alone is often insufficient
– For example, knowledge representation needs a
hierarchy utilizing heterogeneous methodologies
(Minsky)
[Figure: processing layers from Sensing through Representation (Fuzzy Logic) and Clustering to Modeling & Classifying (Naïve Bayes) and Service (Fuzzy Control).]
• Laerhoeven and Cakmakci (2000)
– 4 layers (statistical cues, Kohonen clustering, kNN classification, and a
Markov chain supervisory layer)
• Advantage of using high-level features
– Control and 'accurate touch' to the data: maintained up to the classification layer
– If the feature itself is discriminative enough, the classification layer is
not even necessary
Overview
• Disadvantage of using high-level features
– The development of proper signal-processing algorithms is laborious
(compared to the use of very low-level features)
Context Representation
• A central problem in utilizing context information in application development
– It is difficult to attach the meaning of real-world concepts to extracted features
– "When environment loudness is SILENT, set ringing volume LOW" rather than
"When environment loudness is -20 dB, set ringing tone volume to 2.5"
• Context extraction: Abstract raw sensor signals and compress information
– How well features describe some parts of the real world context
– Extracted features: Context atoms (the smallest amount of context
information)
Context Information Processing (1)
Context Information Processing (2)
• Context atom vector C = (c_1, …, c_k), where k is the maximum number of
context atoms and the values of context atoms are between 0 and 1 (see the sketch below)
• Advantages
– Composing the ontology for sensor-based context information
• Direct control of various types of applications
– Multidimensional context atom vector C: Utilization of explorative data
analysis methods
– Extendable (straightforward to add new context atoms into the vector C)
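A minimal sketch of a context atom vector (my illustration; the atom names and membership values are invented):

# Invented context atoms with fuzzy values in [0, 1]
context_atoms = {
    "loudness_silent": 0.9,
    "loudness_loud":   0.1,
    "motion_walking":  0.2,
    "light_dark":      0.7,
}
C = list(context_atoms.values())   # the k-dimensional context atom vector
print("C =", C)

# If one atom is discriminative enough, an application can be driven directly,
# without a classification layer:
if context_atoms["loudness_silent"] > 0.8:
    print("set ringing volume LOW")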
Some of the Quantized Features
Naive Bayes Classifier
[Figure: naive Bayes classifier. The structure and the CPTs (conditional probability tables) are learned by counting over the training data, giving the joint and conditional probability distributions.]
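A minimal counting-based naive Bayes sketch (my illustration; the binary atoms, labels, and Laplace smoothing are my choices, not taken from the slides):

from collections import Counter, defaultdict

data = [  # invented (context atoms, context class) training pairs
    ({"silent": 1, "dark": 1}, "home"),
    ({"silent": 1, "dark": 0}, "home"),
    ({"silent": 0, "dark": 0}, "street"),
    ({"silent": 0, "dark": 1}, "street"),
]
prior = Counter(label for _, label in data)          # class counts
cpt = defaultdict(Counter)                           # CPTs learned by counting
for atoms, label in data:
    for atom, value in atoms.items():
        cpt[(atom, label)][value] += 1

def classify(atoms):
    # P(class | atoms) is proportional to P(class) * product of P(atom_i | class)
    scores = {}
    for label, n in prior.items():
        p = n / len(data)
        for atom, value in atoms.items():
            p *= (cpt[(atom, label)][value] + 1) / (n + 2)  # Laplace smoothing
        scores[label] = p
    return max(scores, key=scores.get)

print(classify({"silent": 1, "dark": 1}))  # -> 'home'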
Experiment
• Two kinds of data
– Measured in controlled conditions (1)
– Real world scenario (2)
• Experiment (1)
– Nine individual context-class data sets were recorded (each five times
separately in controlled conditions)
– Length of each nine-class data set: 30 seconds (30 sets of 1-second atoms)
– Nine sets, each summing to 150 vectors
– To create variability in the data:
• Five samples of different songs from two albums
• Speech and acceleration-based contexts from four different persons
– Training data set = test data set
Experiment (2)
• Experiment (2)
– Home environment
– Multiple simultaneous contexts
– Only those contexts that can be defined in each phase are counted
– Five testees (each performed the scenario twice) → ten test data sets with a
length of 6-8 minutes
– The last test data set → ignored (the box wires had broken)
– Final amount of scenario data sets → 9 (60 minutes)
– Different songs, three different cars
– Manual on-the-fly segmentation is not accurate, and there are many disturbances
– 1) training data set = test data set, 2) cross-validation
Recognition Results as an Intensity Graph
Resolution = 1 second (5-second moving average)
Classification Accuracy (9 contexts, 2 BN)
Recognition Results
• Recognition Results for 13 Individual Contexts (9 scenarios)
[Figure: recognition accuracies for the 13 contexts, comparing the BN and direct classification approaches.]
Context Data
• Data gathering → laptop
• 1-second resolution (4-second sliding window)
• 3 minutes
• Experiments → offline
Experiments with User Scenario
• Fluctuation of control signals
– Oscillation at various information-representation levels
– Moving-average methods, or smoothing when changing font sizes (see the
sketch after this list)
• Extension (adding new types of
contexts)
– Complicates the development of a control system
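A minimal moving-average sketch (my illustration) for damping a fluctuating control signal before it drives the display:

def moving_average(signal, window=5):
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

control = [1, 0, 1, 0, 1, 1, 1, 0, 1, 1]  # invented oscillating control signal
print(moving_average(control))            # varies far less abruptly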
User Reactions
• Ten cell phone users were interviewed
– 5 of them → normal users (use basic communication applications)
– 5 of them → active cell phone users (actively use most of the cell phone
applications and features)
• Interview process
– Explaining the scenario presented in the table to a user
– Explaining to a user how adaptive applications operate in general and
particularly during the user scenario
• Control signals and screenshots of adapted service contents
– Users were interviewed
• Application adaptation is an acceptable feature, but mistakes in adaptation
are not accepted
• Users want to have control over the device’s operations
Summary
• Classification for context awareness
• Classification method application
– Applying naïve Bayesian classifier in learning and classifying the contexts
of a mobile device user
– Mobile device-oriented → sensors are embedded (instead of being
distributed in the environment)
– Recognizes static short-term events (descriptive and generic, at the
intermediate levels of the context-representation layer structure);
sequences of these events could be utilized at the upper level
– Naïve Bayesian learning and inference: Straightforward and
computationally very efficient
Bibliography
• Book: D. Michie, D.J. Spiegelhalter, and C.C. Taylor (eds.), Machine Learning,
Neural and Statistical Classification. Statlog Project.
• Book: Weiss, S. M. and Kulikowski, C. A. (1991). Computer systems that
learn: classification and prediction methods from statistics, neural networks,
machine learning and expert systems. Morgan Kaufmann, San Mateo, CA.
• Book: Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition.
Academic Press, 2nd edition
• Paper: C.M. van der Walt and E. Barnard, "Data characteristics that
determine classifier performance", in Proceedings of the Sixteenth Annual
Symposium of the Pattern Recognition Association of South Africa, pp. 160-165, 2006.
• Classifier showdown: A practical comparison of classification algorithms Link
• Statistical Pattern Recognition Toolbox for Matlab - Link