Transcript Document

Feature Extraction for lifelog management
September 25, 2008
Sung-Bae Cho
0
Agenda
• Feature Extraction
– Temporal Feature Extraction
– Spatial Feature Extraction
• Feature Extraction Example
– Tracking
• Summary & Review
Feature Extraction: Motivation
• Data compression: Efficient storage
• Data characterization
– Data understanding: analysis
• Discovering data characteristics
– Clustering: unknown labels
– Classification: known labels
– Data characterization
– Data simulation: synthesis
• Modeling data
– Model selection
– Model parameter estimation
• Prediction
– Feature forecast
– Raw data forecast
– Pre-processing for further analysis
• Tracking
• Visualization: reduction of visual clutter
• Comparison
• Search: large collections of data sets
• Database management: efficient retrieval
2
Features
• Features are confusable
• Regions of overlap represent the
classification error
• Error rates can be computed with knowledge of the joint probability
distributions (see the sketch below)
• Context can be used to reduce overlap
• In real problems, features are
confusable and represent actual
variation in the data
• The traditional role of the signal
processing engineer has been to
develop better features
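To make the error-rate point concrete, here is a minimal sketch assuming two hypothetical one-dimensional class-conditional Gaussians with equal priors (the densities and the threshold are invented for illustration; the slide itself gives no numbers):

```python
# Hedged sketch: with fully known class-conditional densities, the
# Bayes error of a single feature is the probability mass that falls
# on the wrong side of the optimal decision threshold.
from scipy.stats import norm

# Hypothetical feature: class A ~ N(0, 1), class B ~ N(2, 1), equal priors.
# With equal priors and variances, the optimal threshold is the midpoint.
threshold = 1.0
p_error = 0.5 * norm.sf(threshold, loc=0, scale=1) \
        + 0.5 * norm.cdf(threshold, loc=2, scale=1)
print(f"Bayes error of this feature: {p_error:.3f}")  # about 0.159
```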
3
An Example (1)
• Problem: Sorting fish
– Incoming fish are sorted according to
species using optical sensing
(sea bass or salmon?)
• Problem Analysis:
– Set up sensors and take some sample
images to extract features
– Consider features
• Length
• Lightness
• Width
• Number and shape of fins
• Position of mouth
• …
(Pipeline diagram: Sensing → Segmentation → Feature Extraction)
4
An Example (2)
• Length is a poor discriminator
• Lightness is a better feature than
length because it reduces the
misclassification error
• We can select the lightness feature
• We can also combine features
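A hedged sketch of feature combination on the fish example, using synthetic lightness/width data and a nearest-mean classifier (the data, the class means, and the classifier choice are all illustrative assumptions, not the slide's method):

```python
# Combining two weak features can separate classes that either
# feature alone confuses. Synthetic (lightness, width) measurements.
import numpy as np

rng = np.random.default_rng(0)
salmon = rng.normal([4.0, 6.0], 0.8, size=(100, 2))    # (lightness, width)
seabass = rng.normal([6.0, 4.0], 0.8, size=(100, 2))

means = np.array([salmon.mean(axis=0), seabass.mean(axis=0)])

def classify(x):
    """Assign x to the class whose mean is closest in feature space."""
    return ["salmon", "sea bass"][np.argmin(np.linalg.norm(means - x, axis=1))]

print(classify(np.array([4.2, 5.8])))  # likely "salmon"
```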
5
Feature: Definition
• Feature or attribute: usually a physical measurement or category associated
with a spatial location and a temporal instant
– Continuous, e.g., elevation
– Categorical, e.g., forest label
• Every domain has a different definition for features, regions of interest, or
objects
• A feature is a cluster or a boundary/region of points that satisfy a set of predefined criteria
– The criteria can be based on any quantities, such as shape, time,
similarity, orientation, and spatial distribution
6
Feature Categories (1)
• Statistical features
– Density distribution of spatially distributed measurements
• e.g., nests of eagles and hawks, tree types
– Statistical central moments per region computed from raster
measurements over region definitions
• e.g., average elevation of counties
• Temporal features
– Temporal rate of spatial propagation
• e.g., AIDS spreading from large cities
– Seasonal spatially-local changes
• e.g., precipitation changes
7
Feature Categories (2)
• Geometrical features
– Distance, e.g., Optical Character Recognition (OCR)
– Circular, e.g., SAR scattering centers
– Arcs, e.g., semiconductor wafers
– Linear, e.g., roads in aerial photography
– Curvilinear, e.g., isocontours in a DEM (digital elevation model)
– Complex, e.g., map symbols & annotations
• Spectral features
– Areas with a defined spectral structure (morphology)
• Areas with homogeneous measurements (color, texture)
8
Feature Extraction
• Feature extraction
– Transforming the input data into a set of features that still describes the
data with sufficient accuracy
– In pattern recognition and image processing, feature extraction is a
special form of dimensionality reduction
• Why?
– When the input data to an algorithm is too large to be processed
and it is suspected to be redundant (much data, but not much information)
– Analysis with a large number of variables generally requires a large
amount of memory and computation power or a classification algorithm
which overfits the training sample and generalizes poorly to new samples
→ Need to transform the input into a reduced representation: a set of features
9
Goal of Feature Extraction
• Transform measurements from one space into another space in order to
(a) compress data or (b) characterize data
• Examples:
– Data compression:
• Noise removal: filtering
• Data representation: raster → vector
• Information redundancy removal: multiple band de-correlation
– Data characterization:
• Similarity and dissimilarity analysis
• Statistical, geometrical and spectral analysis
10
Feature Extraction Methods
• Dimensionality reduction techniques
– Principal components analysis (PCA): A vector space transform used to
reduce multidimensional data sets to lower dimensions for analysis
(sketched after this list)
– Multifactor dimensionality reduction (MDR): Detecting and characterizing
combinations of attributes that interact to influence a dependent or class
variable
– Nonlinear dimensionality reduction: Assumes the data of interest lie on
a non-linear manifold embedded within the higher-dimensional space
– Isomap: Computing a quasi-isometric, low-dimensional embedding of a
set of high-dimensional data points
• Latent semantic analysis (LSA): Analyzing relationships between a set of
documents and terms by producing a set of concepts related to them
• Partial least squares (PLS-regression): Finding a linear model describing
some predicted variables in terms of other observable variables
• Feature selection methods: feature selection is a special case of feature extraction
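Of the methods above, PCA is the most common; here is a minimal sketch via the SVD, assuming a samples-by-features matrix (self-contained NumPy, not any particular toolkit's API):

```python
# Minimal PCA: center the data, take the SVD, keep the top-k axes.
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal axes."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                             # scores in k dimensions

X = np.random.default_rng(1).normal(size=(200, 10))
Z = pca(X, 2)
print(Z.shape)  # (200, 2)
```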
11
Feature Selection Methods
• Search approaches
– Exhaustive
– Best first
– Simulated annealing
– Genetic algorithm
– Greedy forward selection (sketched after this list)
– Greedy backward elimination
• Filter metrics
– Correlation
– Mutual information
– Entropy
– Inter-class distance
– Probabilistic distance
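A sketch of one search approach from the list above, greedy forward selection, driven by a pluggable score; the correlation-based score below is a stand-in filter metric, not a prescribed choice:

```python
# Greedy forward selection: repeatedly add the feature that most
# improves score(X[:, selected], y) until k features are chosen.
import numpy as np

def forward_select(X, y, score, k):
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = max(remaining, key=lambda j: score(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Example filter-style score: mean absolute correlation with the target.
def corr_score(Xs, y):
    return np.mean([abs(np.corrcoef(Xs[:, j], y)[0, 1])
                    for j in range(Xs.shape[1])])

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X[:, 0] + 0.1 * rng.normal(size=100)     # feature 0 is informative
print(forward_select(X, y, corr_score, 2))   # picks feature 0 first
```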
12
Spatial Feature Extraction Example
• Distance features
• Density features
• Orientation features
• Mutual point distance features
13
Temporal Feature Extraction Example
• Temporal features from point data
– Deformation changes over time
• Extracted features: Horizontal, Vertical, Diagonal
• Temporal features from raster data
– Precipitation changes over time
• Example: Image subtraction to obtain features that can be clustered
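A minimal sketch of the image-subtraction idea on synthetic rasters (the grid size and change threshold are invented for illustration):

```python
# Differencing two raster snapshots (e.g., precipitation grids) yields
# per-cell change features that can then be clustered.
import numpy as np

rng = np.random.default_rng(3)
raster_t0 = rng.random((64, 64))                        # grid at time t0
raster_t1 = raster_t0 + rng.normal(0, 0.05, (64, 64))   # grid at time t1

change = raster_t1 - raster_t0          # temporal change feature per cell
significant = np.abs(change) > 0.1      # mask of cells that changed a lot
print(significant.sum(), "cells changed noticeably")
```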
14
Feature Extraction Applications
• Activity recognition
• Place tracking
• Face recognition
• Remote sensing
• Bioinformatics
• Structural engineering
• Robotics
• Biometrics
• GIS (Geographic information system)
• Semiconductor defect analysis
• Earthquake engineering
• Plant biology
• Medicine
• Sensing
• …
15
Agenda
• Feature Extraction
– Temporal Feature Extraction
– Spatial Feature Extraction
• Feature Extraction Example
– Tracking
• Summary & Review
Tracking
• A well-known research area that uses temporal feature extraction methods
• Observing persons or objects on the move and supplying a time-ordered
sequence of their location data to a model
– e.g., so that the motion can be depicted on a display
• When processing a video sequence: finding the location of an object in the
scene on each frame of the sequence
• Tracking example
– Human/objects tracking: e.g., GPS sensor based car position tracking
– Tracking a body part: e.g., accelerometer-based hand/leg movement
tracking
– Eye tracking: analyzing eye images
– Object tracking in camera footage
17
An Example of Tracking
• Tracking of human behavior
– Recognizing gesture behaviors in a cricket game
– Reference:
• M. Ko, G. West, S. Venkatesh, and M. Kumar, Using dynamic time
warping for online temporal fusion in multisensor systems, Information
Fusion, 2007
• Used tracking method
– DTW (dynamic time warping)
• An algorithm for measuring similarity between two sequences which
may vary in time or speed
– e.g., Automatic speech recognition coping with different speaking
speeds
• Any data that can be turned into a linear sequence can be
analyzed with DTW
18
Motivation
• We need a method for temporal fusion of raw data or feature data
– Fusion levels: raw, feature, and decision level
• Requirements for a multi-sensor temporal fusion method
– Variable types: multi-dimensional, temporal, discrete, and continuous sensors
– Variable-length data
• Proposal: multi-sensor fusion using DTW
– Extending the DTW algorithm
• Considering end-points
• Supporting fusion of diverse, heterogeneous sensory data
19
Used Sensor Data
• Sensor: ADXL202: 3-axis, ±2 g, 150 Hz accelerometer
– 2 sensors (one per wrist)
– 6 channels of data
• Data
– 4 human subjects & 65 (20 + 15 × 3) samples
– 12 gestures in a cricket game: cancel call, dead ball, last hour, …
20
Behavior System Structure based on DTW
• Sliding window: Transmits data units of a specified size
• Data pre-processing: Converts raw data into a test template
• DTW recognizer: Measures similarity between the test & class templates
• Decision module: Selects the behavior of the best-matching template
21
Preprocessing
• Input data
– Online: streaming sensor values
– Offline: segmented sensor values
• Preprocessing methods
– Signal filter: noise & outlier elimination
– Normalization
• Preprocessing for temporal data
– Sliding window
– End point detection based on DTW
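A hedged sketch of these preprocessing steps: fixed-size sliding windows with overlap, plus per-channel z-score normalization. The window width of 50 and overlap of 30 merely echo the W/O notation used in the experiments below and are not claimed to be the paper's exact settings:

```python
# Segment a multi-channel stream into overlapping windows, then
# normalize each window per channel before template matching.
import numpy as np

def sliding_windows(signal, width, overlap):
    """Yield windows of `width` samples, advancing by width - overlap."""
    step = width - overlap
    for start in range(0, len(signal) - width + 1, step):
        yield signal[start:start + width]

def zscore(window):
    """Normalize each channel to zero mean and unit variance."""
    return (window - window.mean(axis=0)) / (window.std(axis=0) + 1e-8)

stream = np.random.default_rng(4).normal(size=(1000, 6))  # 6-channel stream
templates = [zscore(w) for w in sliding_windows(stream, 50, 30)]
print(len(templates), templates[0].shape)
```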
22
Dynamic Time Warping (1)
(Figure: a worked DTW example aligning an input sample against a class
template in a distance table.)
• Minimum warping path: DTW(C, T) = min over warping paths W of
(1/NF) Σ_q d(i(q), j(q))
– NF: normalization factor
• Distance table (D): D(i, j) = d(i, j) + min{D(i−1, j), D(i−1, j−1), D(i, j−1)}
23
Dynamic Time Warping (2)
• Local distance:
– C = (c_1, …, c_I): class template with length I
– T = (t_1, …, t_J): test template with length J
– d(i, j): distance between class & test template elements
• Warping path (W) definition
– W = (w_1, …, w_Q) with w_q = (i(q), j(q))
– i(q) ∈ {1, …, I}, j(q) ∈ {1, …, J}
– Constraints (see the sketch below)
• Continuity
• End-point
• Monotonicity
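A minimal DTW sketch consistent with the definitions above: Euclidean local distances d(i, j), a cumulative distance table D filled with the standard continuity/monotonicity recurrence, and the path cost normalized by NF = I + J (that particular choice of NF is an assumption, not the paper's stated value):

```python
# DTW between a class template C (I x V) and a test template T (J x V).
import numpy as np

def dtw(C, T):
    I, J = len(C), len(T)
    # Local distance table d(i, j): Euclidean distance between rows.
    d = np.linalg.norm(C[:, None, :] - T[None, :, :], axis=2)
    # Cumulative distance table D; infinities enforce the end-point
    # constraint that the path starts at (1, 1).
    D = np.full((I + 1, J + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j],      # vertical
                                            D[i - 1, j - 1],  # diagonal
                                            D[i, j - 1])      # horizontal
    return D[I, J] / (I + J)  # minimum warping path cost, normalized

C = np.random.default_rng(5).normal(size=(20, 6))
print(dtw(C, C))  # identical sequences -> 0.0
```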
24
Class Template Selection
• Class template selection method
– Random selection
– Normal selection
– Minimum selection
– Average selection
– Multiple selection
– Random, minimum, multiple selection
• End region
– Band-DP (E = E2 − E1)
• Rejection threshold
25
Distance Measurement
• Distance calculation in DTW (sketched below)
– Extended Euclidean distance: d(i, j) = sqrt( Σ_v W_v (C(i, v) − T(j, v))² )
– Cosine correlation coefficient
– where
• Multi-sequence class template: C (I × V)
• Multi-sequence test template: T (J × V)
• V: number of variables
• W_V: positive weight vector
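Hedged sketches of the two local distances named above, for V-dimensional template rows; these are the standard textbook forms, and the paper's exact variants may differ:

```python
# Two local distances between a class-template row c_i and a
# test-template row t_j, each a V-vector.
import numpy as np

def weighted_euclidean(c_i, t_j, w):
    """Extended (weighted) Euclidean distance; w is a positive weight vector."""
    return np.sqrt(np.sum(w * (c_i - t_j) ** 2))

def cosine_distance(c_i, t_j):
    """1 - cosine similarity; small when the vectors point the same way."""
    return 1.0 - np.dot(c_i, t_j) / (np.linalg.norm(c_i) * np.linalg.norm(t_j))

c, t = np.ones(6), np.ones(6) * 2
print(weighted_euclidean(c, t, np.ones(6)), cosine_distance(c, t))  # 2.449..., 0.0
```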
26
Decision Module
• Nearest neighbor algorithm (sketched below)
– Used with normal, minimum, and average selection
– Decision: choose the class n whose template yields the smallest DTW distance
– where
• N: number of class templates, 1 ≤ n ≤ N
• C_n: class template; D_n: distance table
• Method: kNN
– Used with multiple selection: templates C_{n,m}
– M: number of selected class templates per class; k: 1 ≤ k ≤ M
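A sketch of the decision module: plain nearest neighbor over per-class DTW distances, plus a kNN variant for multiple selection; the distances and labels in the usage lines are placeholders:

```python
# Decision module over DTW distances.
import numpy as np

def nearest_neighbor(distances):
    """distances[n] = DTW distance to class template C_n; pick the smallest."""
    return int(np.argmin(distances))

def knn(distances, labels, k):
    """distances/labels over all templates C_{n,m}; majority vote among
    the k closest templates."""
    order = np.argsort(distances)[:k]
    votes = np.bincount(np.asarray(labels)[order])
    return int(np.argmax(votes))

print(nearest_neighbor([0.8, 0.3, 1.1]))            # -> class 1
print(knn([0.8, 0.3, 0.4, 1.1], [0, 1, 1, 2], 3))   # -> class 1
```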
27
Experimental Setup
• Environments
– Pentium 4, 3.2 GHz, 1 GB RAM, Windows XP
• Comparison
– HMM
• Experiments
– Off-line temporal fusion
– On-line temporal fusion
– Sensor-based
• Gesture recognition based on an accelerometer
• Scenario recognition based on diverse sensors
28
Experiment: Sensor Data
• W : sliding window size, O : overlap size, F : features
29
Experiment: Results (DTW vs. HMM)
• Performance of DTW was better
– Raw data: Data in – decision out
– Filtered data: Feature in – decision out
Data            HMM           DTW
Raw data        85.7~86.5%    97.9%
Filtered data   87.8~88.1%    92.5~96.4%
W≠50, O≠30      73.9~78.8%    96~98%
30
Experiment: Results (Online) (1)
• Class template selection methods comparison
• Min-1: minimum selection; Min-4: minimum + multiple selection
• RD-1: random selection; RD-4: random + multiple selection
• K: parameter for kNN; NF: normalization factor
31
Experiment: Results (Online) (2)
• Gesture recognition
– 12 gestures
– Minimum distance comparison between sample & class
32
Experiment 2: Setup
• Multiple sensor fusion
• Sensors
– 3-axis Accelerometer
– Light
– Temperature
– Humidity
– Microphone
– …
• Data: J. Mantyjarvi et al., 2004
– 5 scenarios, 5 times each
• 1~5 min. each
– Data from 32 sensors
– 46,045 samples
33
Experiment 2: Results (Offline)
• DTW classification rate
• HMM classification rate
– With randomly selected training data
• T1: 20 samples, 75.1~88.1%
• T2: minimum selection, 72.5~78%
34
Experiment 2: Results (Online)
• Classification rate
35
Agenda
• Feature Extraction
– Temporal Feature Extraction
– Spatial Feature Extraction
• Feature Extraction Example
– Tracking
• Summary & Review
Summary
• Feature extraction
– Data sources
– Feature categories
– Applications
• Review
– Why is feature extraction important?
– How would you extract important features from data?
– What features would you recommend for tracking from sensor data?
37
Further Information
• Feature Selection for Knowledge Discovery and Data Mining (Book)
• An Introduction to Variable and Feature Selection (Survey)
• Toward integrating feature selection algorithms for classification and
clustering (Survey)
• JMLR Special Issue on Variable and Feature Selection: Link
• Searching for Interacting Features: Link
• Feature Subset Selection Bias for Classification Learning: Link
• M. Hall, Correlation-based Feature Selection for Machine Learning, 1999: Link
• H. C. Peng, F. Long, and C. Ding, "Feature selection based on mutual
information: criteria of max-dependency, max-relevance, and min-redundancy,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27,
no. 8, pp. 1226-1238, 2005: Link
38