Synthesis of Streaming Data from Multiple
Sensors via Embedded Data Extraction
April 15th, 2004 Project Report
Magdiel Galán
CSE591: DataMining
Dr. Huan Liu
Spring 2004
http://www.public.asu.edu/~mgalan/StreamProjApr15.ppt
Outline






Problem/Project Description
Sampling
Smoothing
Clustering
Current Status
Plans
Project Description


Synthesis of Streaming Data from Multiple
Sensors (~100’s) via Embedded Data
Extraction for mission critical applications.
Work in conjunction with Motorola’s Human
Interface Lab (on-going project)

Simulation Environment
Project Description

Goal: Develop a driver assistance system that provides
feedback, but not control, during unsafe instances.
  From distractions caused by cellphones, PDAs, eMail, …
Why: Targeting a government initiative to create a
safer car environment in the information age explosion
How: Develop an intelligent system by mining streaming
data from multiple automotive sensors

Development work is being done on a driving simulator with
projection screens and up to 400 parameters/sensors,
including video links for eye-gaze and foot-pedal movement
Sample Cases

Case Scenario #1:

Passing slow traffic
  which slowed down due to an accident
  at which you are also rubber-necking
  while fidgeting with your radio
Case Scenario #2:

Making a left turn
  while hearing directions from MapTracker
  while checking the time because you are late
  while reaching for the cellphone with an incoming call
Simulation Environment
[Figure: driving simulator, "150 Simulated View" driving experience. Labeled sensors/inputs: Gas, Batt, EngineTemp, Acceleration, Lateral Acc., PDA, GearShift, Oil, Air Bag, GPS, Driver, Internet, CellPhone, A/C, CD, Sonar Proximity Sensor, RPMs, Wheel Rotation, Brake Pressure]
Motivation

Primary Interest: Robotics

Merging of Sensors/Sensor Fusion
  optical
  proximity (IR, sonar, radar)
  location (GPS, visual maps)
  movement (actuators, rotations)
  system (battery, temperature, bump switches)
Problem: decide agent’s next best action vs. a goal
Not too dissimilar from an Automobile environment
Other Applications:

Manufacturing Environment

Increase Yields/Productivity/Reduce Defects using quality
control daily monitor data (100’s to 1K’s of parameters)

Pentium Ex.: Oxide Thickness, Poly Width, Boron Implant
Density, Plasma Etch eV’s, Litho PM, Diffuser RPMs, etc…
Stream Data Properties

Numerical/Continuous
  Speed
  Steering/Heading
  Acceleration (Forward/Lateral)
  Distance (Lane Edge, Vehicle in Front)
  Lane Position
Categorical
  Gear: P/R/D/OD/L1/L2
  Headlights On/Off
  Radio/CD On
  Incoming Call
Sampling Rate: 60 Hz
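One possible in-memory representation of a single stream sample is sketched below. The field names are hypothetical, and the units of every field other than speed, steering and heading (which the plots later in this deck give in km/h and degrees) are left unspecified because the slides do not state them.

```python
from dataclasses import dataclass
from enum import Enum

SAMPLE_RATE_HZ = 60  # sampling rate stated on the slide


class Gear(Enum):
    """Categorical gear positions listed on the slide."""
    P = "P"
    R = "R"
    D = "D"
    OD = "OD"
    L1 = "L1"
    L2 = "L2"


@dataclass(frozen=True)
class Sample:
    # Numerical/continuous attributes (field names are assumptions)
    speed_kmh: float
    steering_deg: float
    heading_deg: float
    accel_forward: float
    accel_lateral: float
    dist_lane_edge: float
    dist_vehicle_front: float
    lane_position: float
    # Categorical attributes
    gear: Gear
    headlights_on: bool
    radio_cd_on: bool
    incoming_call: bool


def sample_time_s(index: int) -> float:
    """Elapsed time in seconds of the index-th sample at 60 Hz."""
    return index / SAMPLE_RATE_HZ
```

Keeping the categorical fields as an enum and booleans, rather than encoding them as numbers, makes it explicit that arithmetic such as averaging does not apply to them.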
Critical/Special Conditions






Left/Right Turn
Passing/Changing Lanes
U-Turn
Reverse
Tailgating
Not On Road
Some Warning Signs


Lane Drifting
Erratic Behavior
  droopy eyes
  eyes not facing the road
  foot/pedal movement does not correspond with road conditions
Incoming Call while performing Critical Maneuver
Goal

Identify Instances outside normal patterns
as an indication of an Abnormal Situation


Hence – Need to draw Driver’s Attention to
Impending Situation
Ultimate Goal:

Develop a bootstrapping mechanism that
combines driving situation classifiers (e.g.
LeftTurn/Passing) with instance
selection methods in active learning

Bootstrapping: selecting high-utility data for retraining
Instance Selection Properties



Instance representative
Instance selection → reduce rows
Ideal outcome of instance selection:
  choose a data subset that achieves the same result as
  the whole data with little or no deterioration in
  performance P
Should be model independent:
  ΔP(Mi) ≐ ΔP(Mj) for any two mining algorithms Mi, Mj
[LM01]
Problem#1: Sampling

Initial step towards instance selection:
select representative subset…

Divide into collection of elements which must
cover the whole population without
overlapping [GHL01]

These are called sampling units
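A minimal sketch of the sampling-unit idea for a recorded stream, assuming the units are simply consecutive fixed-size windows. The function names and the choice of keeping the first element of each unit are illustrative assumptions, not the project's actual sampling scheme.

```python
from typing import List, Sequence


def sampling_units(stream: Sequence[float], unit_size: int) -> List[Sequence[float]]:
    """Split the stream into consecutive, non-overlapping units.

    Every sample falls in exactly one unit, so the units cover the whole
    population; the last unit may be shorter than unit_size.
    """
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    return [stream[i:i + unit_size] for i in range(0, len(stream), unit_size)]


def one_per_unit(stream: Sequence[float], unit_size: int) -> List[float]:
    """Systematic sample: keep the first element of every unit."""
    return [unit[0] for unit in sampling_units(stream, unit_size)]
```

For example, ten samples split with `unit_size=4` give units of length 4, 4 and 2, and `one_per_unit` keeps the samples at positions 0, 4 and 8.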
Sampling Results
Sampling at 10 ms
(x-axis: signal duration; y-axis: count)
Problem#2: Smoothing


Reduce/Filter out noise and outliers.
Smoothing Techniques used:

Bin Median/Rolling Average [LM01]/[D03]
  Median preferred over mean since it is less sensitive to outliers
Thresholding/Bin Boundaries [LM01]/[HK01]
  10% offset threshold
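The two smoothing families above can be sketched as follows. The bin and window sizes are free parameters, and the reading of the "10% offset threshold" used here (hold the last accepted value until a sample departs from it by more than 10%) is an assumption, since the slides do not define it.

```python
from statistics import median


def bin_median(values, bin_size):
    """Replace every value by the median of its (non-overlapping) bin."""
    out = []
    for i in range(0, len(values), bin_size):
        chunk = values[i:i + bin_size]
        out.extend([median(chunk)] * len(chunk))
    return out


def rolling_median(values, window):
    """Median over the current sample and up to window-1 preceding samples."""
    return [median(values[max(0, i - window + 1):i + 1])
            for i in range(len(values))]


def threshold_hold(values, offset=0.10):
    """Hold the last accepted value until a sample departs from it by more
    than `offset`, taken as a fraction of the held value (assumed meaning
    of the 10% offset threshold)."""
    out = []
    held = None
    for v in values:
        if held is None or abs(v - held) > offset * abs(held):
            held = v
        out.append(held)
    return out
```

With speed readings `[50, 51, 200, 52, 53, 54]` and a bin size of 3, the spike at 200 is replaced by the bin median 51, whereas a bin mean would have been pulled up to about 100.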
PreSmoothing - RAW Data
x-axis: driving time elapsed in minutes
y-axis: speed(km/h); steering(degrees), heading(degrees)
RAW Data Map/Course
Route Map – starting point at (0,0)
Smoothing Results - Median
x-axis: driving time elapsed in minutes
y-axis: speed(km/h); steering(degrees), heading(degrees)
Smoothing Results - Median
Smoothing Results - Threshold
Smoothing Results - Threshold
Dr. Liu’s Incremental Instance
Selection Algorithm
Given: Data streams with instances I
Output: indicative instances
For each data stream
  Do the following incrementally
    Create a profile P for I
    Check new instance i against P
    if i is an outlier of P
      Return i
    else
      Update P with i
  End do
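The incremental loop above could look like the following for one numerical stream. The slides do not say what the profile P contains, so this sketch assumes a running mean and variance and flags an instance when it lies more than `k` standard deviations from the mean; `Profile`, `k` and `warmup` are hypothetical names and parameters.

```python
import math


class Profile:
    """Running mean/variance of one stream (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_outlier(self, x, k=3.0, warmup=10):
        # Never flag anything until the profile has seen enough data.
        if self.n < max(warmup, 2):
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) > k * std


def indicative_instances(stream, k=3.0, warmup=10):
    """Yield (index, value) for every instance that is an outlier of P."""
    profile = Profile()
    for i, x in enumerate(stream):
        if profile.is_outlier(x, k, warmup):
            yield i, x          # "Return i": report it, do not absorb it
        else:
            profile.update(x)   # "Update P with i"
```

As in the pseudocode, a flagged instance is not used to update the profile, so a single spike does not widen what counts as normal. The loop reads each instance once and keeps only three numbers per stream.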
Outliers
Problem#3: Clustering

Why?




Data is Unclassified
Previous results used numerical data on the most
significant key parameters
Develop clusters exemplifying ALL attributes
Select instances that do not belong to a cluster
as the triggering mechanism
Stream Clustering Challenges


Large “Unclassified” Database
Fast On-Line Resolution within a small window
  0.5 to 2 or 3 seconds
One Pass Only restriction (need fast I/O)
Mix of Numerical and Categorical Data
  Traditional algorithms do not work well for categorical
  attributes (remember P/R/D/OD/L1/L2, or CD On)
    Centroid approach cannot be used
    Hard to reflect the properties of the neighborhood of the points
Memory Constraints
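To make the mixed-attribute difficulty concrete, here is a sketch of a Gower-style dissimilarity combined with a leader-style single pass. This is a swapped-in textbook technique used for illustration only, not the clustering method of this project (which is still under development); the record layout, `numeric_ranges` and `radius` are assumptions.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity in [0, 1] for records given as dicts.

    Numeric attributes (the keys of numeric_ranges) contribute
    |a - b| / range, capped at 1; every other attribute contributes
    0 on a match and 1 on a mismatch.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            total += min(abs(a[key] - b[key]) / numeric_ranges[key], 1.0)
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


def one_pass_assign(stream, numeric_ranges, radius):
    """Leader-style single pass over the stream.

    A record within `radius` of a stored representative joins that
    cluster; otherwise it is flagged and stored as a new representative.
    """
    leaders, flagged = [], []
    for record in stream:
        dists = [mixed_distance(record, l, numeric_ranges) for l in leaders]
        if not dists or min(dists) > radius:
            flagged.append(record)
            leaders.append(record)
    return leaders, flagged
```

Representatives are actual records rather than centroids, which sidesteps averaging a gear position. The list of representatives still grows without bound here, so the memory constraint is not addressed by this sketch.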
Clustering Techniques vs.
Streaming Data

SVM



Good at handling multidimensional data
Not good – need classified data, lots of I/O,
data in memory
BIRCH


Good at handling multidimensional data, large
databases; single scan, linear I/O time
Not good – predominantly for “numerical” type
of attributes; order dependent
Clustering Techniques vs.
Streaming Data (2)

CURE (Clustering Using REpresentatives) [D03]



Good at handling outliers; hierarchical
Not good – random sampling (won’t fit
streaming)
ROCK (RObust Clustering Using LinKs)[D03]


Good at Hierarchical clustering for categorical
attributes
Not good: Random sampling for scale up
My 1st Clustering Attempt…
[Figure annotation: Move in Reverse]
My 1st Clustering Attempt (2)
[Figure annotation: Zoom Next Page]
My 1st Clustering Attempt (3)
[Figure annotation: Move in Reverse]
Current Status/Plans


This is an ON-GOING project
Cluster Technique Development


Evolve from known methods?
Generalization of the technique

Not just Automobile Streaming Data
References




[LM01] H. Liu, H. Motoda. “Data Reduction via Instance Selection”. Instance
Selection and Construction for Data Mining. 2001. KAP. ASU Library
[GHL01] B. Gu, F. Hu, H. Liu. “Sampling: Knowing Whole From its Part”. Instance
Selection and Construction for Data Mining. 2001. KAP. ASU Library
[HK01] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Chps. 3, 8:
Data Cleaning, Clustering. Morgan Kaufmann. ASU Library
[D03] M. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall. Chps. 3-5:
Mining Techniques, Classification, Clustering. ASU Library