sensor data fusion


Abdelzaher (UIUC)
Aggarwal (IBM)
Bar-Noy (CUNY)
The QoI Problem
• Understand the QoI delivered by fusion algorithms that use information networks
[Figure: block diagram. Labels: mission requirements; actionable information (QoI delivered); data fusion algorithm; INARC; (object-level) QoI specification; CNARC; information network (1. quantify QoI, 2. optimize value/cost); control; data delivered; communication network; sensors, reports, and human sources.]
Dimensions of QoI in Data Fusion Applications
• Estimation/prediction accuracy (this year)
• Provenance and corroboration (outer years)
• Timeliness/freshness (outer years)
• Security/trust (outer years)
• …
Motivation
Disjoint analysis stovepipes developed in non-overlapping communities
State of the art:
• Signal data fusion: fusion of hard sources. Methods: Bayesian analysis, maximum likelihood estimation, etc.
• Information network analysis: fusion of soft sources. Methods: ranking, clustering, etc.
• Machine learning: fusion of text and images. Methods: transfer knowledge, CCM, etc.
• Trust, social networks: fusion from human sources. Methods: fact-finding, influence analysis, etc.
Missing: a QoI theory for when inference uses both sensors and information network sources
Inputs: sensors, reports, and human sources
Contributions
Advances in I1.2: unified analysis of sensors and information network links
• Quantifiable estimation/prediction accuracy (QoI): high QoI, quantifiable uncertainty
Bayesian Interpretation of Information Networks
• Bayesian analysis
• Maximum likelihood estimation
• Can use heterogeneous content
• Can integrate prior knowledge
• Applied to hard and soft sources
Builds on: signal data fusion; information network analysis; machine learning
Trust, social networks: collaboration on social factors is underway
Inputs: sensors, reports, and human sources
Research Milestones
Due: Description
Q1: Estimation-theoretic QoI analysis. Formulation of analytic models for quantifying accuracy of prediction/estimation results.
Q2: Extended analysis of semantic links in information networks. Formulation of information network abstractions that are amenable to analysis as new sensors in a data fusion framework.
Q3: Data pool quality metrics and impact of data fusion. Formulation of metrics for data selection when all data cannot be used/sent.
Q4: Validation of QoI theory. Documentation and publications.
Outline
• Thread 1: QoI/cost trade-off analysis in a sensor data fusion problem (a restricted class of information networks where links = correlations among sensors)
  - Optimal, cost-aware sensor selection
  - Understanding quality-cost trade-off in sensor estimation/prediction problems
• Thread 2: Generalized information network analysis as a sensor data fusion problem
  - Map a broad category of link analysis problems to sensor fusion problems
  - Apply fusion theory to quality/cost trade-off analysis
• Thread 3: Feedback to communication network
  - Derive QoI metrics for data pools
  - Optimize feedback to underlying communication network to deliver the right data
• Thread 4: Experimentation and validation
Thread 1: QoI/Cost Analysis in Sensor Data Fusion
Part I: The Sensor Selection Problem
• The redundancy relationships between different sensors are known a priori and can be represented as virtual links
• We design efficient theoretical models (for “link analysis”) that exploit these links for sensor selection
  - Integer programming formulation
  - Greedy approximation algorithm
• Our greedy selection algorithm has a guaranteed, small constant approximation bound
• Our approach is much more effective than baseline sampling strategies
• Experimental results illustrate the effectiveness and efficiency of the approach
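The greedy idea can be sketched on a toy instance. This is not the authors' algorithm, whose details the slides do not give. It is the textbook greedy heuristic for maximum weighted coverage, which under a cardinality limit is known to achieve a (1 - 1/e) fraction of the optimum. The sensor names, the `covers` sets (a sensor "covers" itself plus the sensors it is strongly correlated with), and the weights are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_sensor_selection(
    covers: Dict[str, Set[str]],
    weight: Dict[str, float],
    k: int,
) -> Tuple[List[str], float]:
    """Pick up to k sensors; each round takes the sensor adding the most uncovered weight."""
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for sensor in sorted(covers):  # sorted for deterministic tie-breaking
            if sensor in chosen:
                continue
            gain = sum(weight[v] for v in covers[sensor] - covered)
            if gain > best_gain:
                best, best_gain = sensor, gain
        if best is None:  # no remaining sensor adds any coverage
            break
        chosen.append(best)
        covered |= covers[best]
    return chosen, sum(weight[v] for v in covered)


# Hypothetical instance: nodes are sensors, links are known correlations.
covers = {
    "s1": {"s1", "s2", "s3"},
    "s2": {"s2", "s1"},
    "s3": {"s3", "s1", "s4"},
    "s4": {"s4", "s5"},
    "s5": {"s5"},
}
weight = {"s1": 1.0, "s2": 1.0, "s3": 2.0, "s4": 3.0, "s5": 1.0}
selected, covered_weight = greedy_sensor_selection(covers, weight, k=2)
print(selected, covered_weight)
```

With a budget of two sensors, the first pick is the sensor whose neighborhood carries the most weight, and the second pick only counts weight that is still uncovered.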
An Information Network Interpretation
• Link analysis for maximum weighted graph coverage
  - Nodes = sensors (measured variables)
  - Links = correlations
More on that later, by Amotz Bar-Noy …
Thread 1: QoI/Cost Analysis in Sensor Data Fusion
Part II: The Estimation/Prediction Problem
• Analyze the cost/quality trade-off in estimation problems
[Figure: Fusion → Outcome]
Insights
• Complex general system models with a large number of parameters are hard to train (they need a lot of training data) and have a high inference cost (they need a lot of inputs)
  - Poor cost/quality trade-off
• Main idea: break up complex general models into trees of simpler (but more specialized) models
  - Each model has fewer parameters → lower run-time data collection cost
  - Each model may fit its special case better → higher accuracy
  - Improved cost/quality trade-off!
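A toy illustration of the accuracy half of this argument, assuming hypothetical fuel-use data with two regimes (city and highway) and a single input, speed. One least-squares line is fit to all the data, then one line per regime behind a single root split. The numbers are made up for illustration, and the cost side (fewer inputs per leaf) is not modeled here.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a*x + b (closed form, single input)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mean_abs_error(model: Tuple[float, float], xs: List[float], ys: List[float]) -> float:
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: fuel use vs. speed, with a different linear law per regime.
city = [(float(v), 12.0 - 0.10 * v) for v in range(10, 60, 5)]
highway = [(float(v), 2.0 + 0.08 * v) for v in range(60, 130, 5)]

# Single general model over all data.
all_x = [x for x, _ in city + highway]
all_y = [y for _, y in city + highway]
single_err = mean_abs_error(fit_line(all_x, all_y), all_x, all_y)

# Tree of specialized models: one root test (speed < 60), one simple model per leaf.
weighted_leaf_errors = []
for part in (city, highway):
    xs, ys = [p[0] for p in part], [p[1] for p in part]
    weighted_leaf_errors.append(mean_abs_error(fit_line(xs, ys), xs, ys) * len(xs))
split_err = sum(weighted_leaf_errors) / len(all_x)

print(f"single model MAE: {single_err:.3f}, split model MAE: {split_err:.3f}")
```

Each leaf model only has to describe its own regime, which is why the per-leaf fits are much tighter than the single global fit on this data.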
When Cost is Concerned
[Figure: regression tree with internal nodes A1 to A5 and terminal nodes T, compared against a cost budget. Legend: Ai = sorted attribute Ai used; Aj = unsorted attribute Aj used; T = terminal.]
• Example: used attributes {A1, A2, A3, A5}; cost is the sum of Cost{A1, A2, A3, A5}
• The cost of a node is defined as the sum of the costs of its splitting attributes and the costs of its predicting attributes
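The node cost definition can be written down directly. This is a minimal sketch with hypothetical per-attribute costs and a hypothetical split between splitting and predicting attributes, since the slide does not say which of A1, A2, A3, A5 plays which role. It also assumes an attribute used in both roles is paid for once, which is how the set notation Cost{A1, A2, A3, A5} reads.

```python
from typing import Dict, Iterable


def node_cost(
    split_attrs: Iterable[str],
    predict_attrs: Iterable[str],
    attr_cost: Dict[str, float],
) -> float:
    """Cost of a node: total cost of every distinct attribute the node needs."""
    # Assumption: an attribute used for both splitting and predicting is charged once.
    used = set(split_attrs) | set(predict_attrs)
    return sum(attr_cost[a] for a in used)


# Hypothetical acquisition costs per attribute.
attr_cost = {"A1": 5, "A2": 3, "A3": 8, "A4": 2, "A5": 4}

cost = node_cost(split_attrs=["A1", "A5"], predict_attrs=["A2", "A3"], attr_cost=attr_cost)
print(cost)  # 5 + 4 + 3 + 8 = 20
```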
Approach: Two Level Cost Prune
[Figure: tree with nodes A1 to A5 and terminal nodes T; terminals are marked as within budget or beyond budget relative to the cost budget.]
• Terminal level: start dropping the least important non-splitter attributes from the prediction set
• Parent level: generate children to parent if children cannot meet the cost budget at the terminal level
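One possible reading of the two pruning levels, as a hedged sketch rather than the authors' procedure. At the terminal level, the least important non-splitter predictors are dropped until the leaf fits the budget. At the parent level, a split whose children cannot fit is collapsed back into the parent; that interpretation of the parent-level rule is an assumption. The `importance` scores, costs, and function names are hypothetical.

```python
from typing import Dict, List, Optional


def prune_terminal(
    split_attrs: List[str],
    predict_attrs: List[str],
    importance: Dict[str, float],
    attr_cost: Dict[str, float],
    budget: float,
) -> Optional[List[str]]:
    """Drop least important non-splitter predictors until the leaf is within budget.

    Returns the surviving prediction attributes, or None if the leaf cannot fit.
    """
    def cost(preds: List[str]) -> float:
        return sum(attr_cost[a] for a in set(split_attrs) | set(preds))

    kept = list(predict_attrs)
    # Splitter attributes must be measured anyway to reach the leaf,
    # so only non-splitters are candidates for removal.
    droppable = sorted(
        (a for a in kept if a not in split_attrs), key=lambda a: importance[a]
    )
    while cost(kept) > budget and droppable:
        kept.remove(droppable.pop(0))
    return kept if cost(kept) <= budget else None


def prune_parent(children_fit: List[bool]) -> str:
    """Parent level (assumed reading): collapse the split if any child cannot fit."""
    return "keep split" if all(children_fit) else "collapse children into parent"


attr_cost = {"A1": 5, "A2": 3, "A3": 8, "A4": 2, "A5": 4}   # hypothetical
importance = {"A2": 0.6, "A3": 0.1, "A4": 0.3}              # hypothetical

fits = prune_terminal(["A1", "A5"], ["A2", "A3", "A4"], importance, attr_cost, budget=15)
too_tight = prune_terminal(["A1", "A5"], ["A2", "A3", "A4"], importance, attr_cost, budget=8)
print(fits, too_tight, prune_parent([fits is not None, too_tight is not None]))
```

In the second call the splitter attributes alone exceed the budget, so no amount of terminal-level pruning helps and the decision moves up to the parent.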
Evaluation
(Predicting Vehicular Fuel Consumption)
Prediction Accuracy
Method Used: Prediction Error (%)
• Single Model* (cost-insensitive): 34.39%
• Cube Model (cost-insensitive): 21.25%
• Cost-insensitive Hybrid Regression Tree: 19.47%
• Cost-sensitive Hybrid Regression Tree: 18.88%
Cost column values as extracted (their assignment to methods is not recoverable from the transcript): 33, 34, 23, 35
*Single Model: uses all data (without splitting into subspaces) to build a single regression model for prediction.
An Information Network Interpretation
• Different predictors are chosen depending on measured conditions
[Figure: Fusion → Outcome]
Generalizing the Distance Metric
(Collaboration with I2.1)
• Generalize links to represent a broader distance metric
  - Nodes = items, link weight = semantic similarity
• PictureNet: a service for sharing pictures of areas in distress (e.g., to quickly survey damage in the aftermath of disasters)
  - Goal: reduce redundant image transmissions to maximize situation awareness under resource constraints
  - Nodes = pictures, link weight = similarity between content
[Figure: each router decides which pictures to forward; the selected pictures are marked "Send these pictures".]
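A router-side selection of this kind could look like the sketch below. It is not PictureNet's actual algorithm, which the slides do not describe. It is a generic farthest-first heuristic that, under a forwarding budget, repeatedly picks the picture least similar to those already chosen. The two-dimensional feature vectors and picture names are hypothetical stand-ins for real image descriptors.

```python
import math
from typing import Dict, List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (larger = less similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(features: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Farthest-first selection of up to `budget` mutually dissimilar pictures."""
    names = sorted(features)
    if not names or budget <= 0:
        return []
    chosen = [names[0]]  # arbitrary but deterministic starting picture
    while len(chosen) < min(budget, len(names)):
        # Pick the picture whose closest already-chosen picture is farthest away.
        best = max(
            (n for n in names if n not in chosen),
            key=lambda n: min(distance(features[n], features[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical content descriptors; bridge_2 is a near-duplicate of bridge_1.
features = {
    "bridge_1": (0.0, 0.0),
    "bridge_2": (0.1, 0.0),
    "hospital": (5.0, 1.0),
    "school": (1.0, 6.0),
}
to_send = select_diverse(features, budget=3)
print(to_send)
```

With room for three pictures, the near-duplicate is the one left behind, which is the redundancy reduction the slide describes.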
Evaluation of PictureNet
(Collaboration with I2.1)
• Improves the fraction of identified distress areas by up to 20%
• 20% better survivor recovery potential
Thread 2: Generalized (Information Network) Link Analysis as a Sensor Fusion Problem
• Links as arbitrary “claims” relating arbitrary nodes to arbitrary assertions
• How to combine information from sensors and from information network link analysis, while offering a rigorous quantification of QoI (e.g., correctness probability)?
[Figure: human sources (John, Mike, Sally) make claims (Claim1 to Claim4); these are combined (+) with vibration sensors, acoustic sensors, and an infrared motion sensor observing a target; query: P(armed convoy) = ?]
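For the sensor side alone, a correctness probability such as P(armed convoy) can be computed with a standard naive Bayes update. The sketch below assumes the sensor reports are conditionally independent given the hypothesis. The prior and the per-sensor detection and false-alarm rates are hypothetical numbers chosen for illustration.

```python
from typing import Dict, Tuple


def naive_bayes_posterior(prior: float, likelihoods: Dict[str, Tuple[float, float]]) -> float:
    """P(target | reports) for a binary hypothesis under conditional independence.

    `likelihoods` maps each sensor to (P(report | target), P(report | no target)).
    """
    p_true, p_false = prior, 1.0 - prior
    for p_given_target, p_given_no_target in likelihoods.values():
        p_true *= p_given_target
        p_false *= p_given_no_target
    z = p_true + p_false  # normalizing constant
    return p_true / z


# Hypothetical detection / false-alarm rates for three positive reports.
reports = {
    "vibration": (0.8, 0.3),
    "acoustic": (0.7, 0.2),
    "infrared": (0.9, 0.4),
}
p = naive_bayes_posterior(prior=0.1, likelihoods=reports)
print(f"P(armed convoy | reports) = {p:.3f}")
```

Three moderately reliable positive reports lift a 10% prior to a 70% posterior in this example; the open question on the slide is how to fold human claims into the same calculation.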
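Main Contribution:
A Bayesian Interpretation of Information Network Link Analysis
• A fact-finder (example graph: sources John, Mike, Sally; claims Claim1 to Claim4):
$$\mathrm{Rank}(\mathrm{Claim}_j) = \frac{1}{\alpha_j} \sum_{k \in \mathrm{Sources}_j} \mathrm{Rank}(\mathrm{Source}_k)$$
$$\mathrm{Rank}(\mathrm{Source}_i) = \frac{1}{\beta_i} \sum_{k \in \mathrm{Claims}_i} \mathrm{Rank}(\mathrm{Claim}_k)$$
where $\alpha_j$ and $\beta_i$ are normalization factors.
Equivalent to:
• A Bayesian classifier (naïve Bayes):
$$P(\mathrm{Target}_j \mid \mathrm{Sensors}) = \frac{P(\mathrm{Target}_j) \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{Z}$$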
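The mutual-reinforcement structure of a basic fact-finder can be sketched as a short iteration. This is a generic version, not the Bayesian variant the slide announces. Normalizing by the maximum rank in each pass is an assumption standing in for the normalization factors, and the source-to-claim links below are hypothetical because the slide's graph edges did not survive extraction.

```python
from typing import Dict, Set, Tuple


def fact_finder(
    claims_of: Dict[str, Set[str]], iterations: int = 20
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Iteratively rank sources and claims from a source -> claims graph."""
    sources = sorted(claims_of)
    claims = sorted(set().union(*claims_of.values()))
    sources_of = {c: {s for s in sources if c in claims_of[s]} for c in claims}

    src_rank = {s: 1.0 for s in sources}
    claim_rank = {c: 1.0 for c in claims}
    for _ in range(iterations):
        # Rank(Claim_j) is proportional to the summed rank of the sources asserting it.
        claim_rank = {c: sum(src_rank[s] for s in sources_of[c]) for c in claims}
        top = max(claim_rank.values())
        claim_rank = {c: r / top for c, r in claim_rank.items()}
        # Rank(Source_i) is proportional to the summed rank of the claims it makes.
        src_rank = {s: sum(claim_rank[c] for c in claims_of[s]) for s in sources}
        top = max(src_rank.values())
        src_rank = {s: r / top for s, r in src_rank.items()}
    return src_rank, claim_rank


# Hypothetical links between the slide's sources and claims.
claims_of = {
    "John": {"Claim1"},
    "Mike": {"Claim1", "Claim2", "Claim3"},
    "Sally": {"Claim3", "Claim4"},
}
src_rank, claim_rank = fact_finder(claims_of)
print(src_rank)
print(claim_rank)
```

The ranks produced this way order sources and claims but are not probabilities, which is the gap the Bayesian interpretation is meant to close.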
Evaluation Results
• Comparison with four fact-finders from previous literature
• No a priori information on individual sources or assertions
• Significantly improved prediction accuracy of source correctness probability
• Less than 1% false positives and false negatives for large networks
Abdelzaher, Cao, Govindan, LaPorta, Yener
INARC/CNARC Interactions
(Courtesy of Ramesh Govindan, CNARC)
What is the role of the Communication Network?
• Inference algorithms operate on linked structures
• The information network contains linked information structures (e.g., a track)
• Information from the communication network (CN) populates these structures
The Puzzle Model of Information Structures
• Each information structure can be thought of as a puzzle
• For example, in a track, a puzzle piece represents a segment of the track
• At any given instant, one or more pieces of the puzzle may be missing
• Each missing piece is involved in multiple constraints
Relationship Between Information and Communication Networks
• The information network runs algorithms to determine which puzzle piece to fill next
  - This depends on the kind of decision to be made, but in general it is the piece that increases confidence the most
• The fusion application expresses the “request to fill the puzzle piece” as a desired-QoI request to the communication network
This model is a framework for a collection of collaborative research efforts between INARC and CNARC
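One way to make "the piece that increases confidence the most" concrete is expected information gain on a binary decision. The sketch below is an interpretation, not the method used in the INARC/CNARC work. Each missing piece is modeled as a binary observation with hypothetical likelihoods under the hypothesis and its negation. The piece names and the request dictionary format are also assumptions.

```python
import math
from typing import Dict, Tuple


def entropy(p: float) -> float:
    """Binary entropy in bits; zero when the decision is certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_entropy_after(prior: float, p_pos_h: float, p_pos_not_h: float) -> float:
    """Expected entropy of P(H) after observing one binary piece of evidence."""
    p_pos = prior * p_pos_h + (1 - prior) * p_pos_not_h
    p_neg = 1.0 - p_pos
    post_pos = prior * p_pos_h / p_pos if p_pos > 0 else prior
    post_neg = prior * (1 - p_pos_h) / p_neg if p_neg > 0 else prior
    return p_pos * entropy(post_pos) + p_neg * entropy(post_neg)


def next_piece(prior: float, pieces: Dict[str, Tuple[float, float]]) -> Dict[str, object]:
    """Choose the missing piece with the largest expected reduction in uncertainty."""
    gains = {
        name: entropy(prior) - expected_entropy_after(prior, a, b)
        for name, (a, b) in pieces.items()
    }
    best = max(sorted(gains), key=lambda name: gains[name])
    # Hypothetical request format handed to the communication network.
    return {"piece": best, "expected_gain_bits": gains[best]}


# Hypothetical missing track segments: (P(match | H), P(match | not H)).
pieces = {
    "segment_3": (0.9, 0.2),  # strongly discriminates between the hypotheses
    "segment_7": (0.6, 0.5),  # barely informative
}
request = next_piece(prior=0.5, pieces=pieces)
print(request)
```

The choice depends on the decision being made: the same missing segment can be highly informative for one hypothesis and nearly useless for another.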
Road Ahead (Q3):
(Example is courtesy of Ramesh Govindan, CNARC)
[Figure: a suspect's track as an information structure with missing puzzle pieces. Decision: "Are the suspects working together?" The most important piece to be filled is highlighted.]
Road Ahead (Q3):
QoI Metrics for Data Pools
Main Approach:
• Use results from sensor selection and quality/cost optimization to determine what to send
• Understand the impact of prior knowledge and constraints
Key!
[Figure: decision "Are the suspects working together?" with the most important piece to be filled highlighted.]
Abdelzaher, Adali, Han, Huang, Roth, Szymanski
Thread 4: Validation
The Apollo FactFinder
• Apollo: improves fusion QoI from noisy human and sensor data
• Demo at IPSN 2011 (April)
• Collects data from cell-phones
• Interfaced to Twitter and TwitPic (collects text and images)
• Can use sensors and human text
• Analysis on several data sets: what really happened?
Apollo Architecture
Apollo: Towards Factfinding in Participatory Sensing, H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth, B. Szymanski, and S. Adali, demo session at IPSN 2011, the 10th International Conference on Information Processing in Sensor Networks, April 2011, Chicago, IL, USA.
Thread 4: Data Sets (Q4)
• Track data from cell-phones in a controlled experiment
• Tweets on the Japan earthquake, tsunami, and nuclear emergency
• 2 million tweets from the Egypt unrest
Road Ahead
Develop a unifying QoI-assurance theory for fact-finding/fusion from heterogeneous sources
• Sources
  - Different media: signals, text, images, …
  - Different authors: physical sensors, humans
• Capabilities
  - Computes accurate best estimates of probabilities of correctness
  - Computes accurate confidence bounds on results
  - Enhances QoI/cost trade-offs in data fusion systems
  - Integrates sensor and information network link analysis into a unified analytic framework for QoI assessment
  - Accounts for data dependencies, constraints, context, and prior knowledge
  - Accounts for the effect of social factors such as trust, influence, and homophily on opinion formation, propagation, and perception (in human sensing)
• Impact: Enhanced warfighter ability to get accurate information
Collaborations
QoI Task I1.2: QoI/cost analysis (unified theory for estimation/prediction and information network link analysis)
[Figure: collaboration diagram around QoI Task I1.2. Task labels: QoI Mining Task I3.1; Community Modeling S2.2; Decisions under Stress S3.1; Provenance Task T1.3; Fusion Task I1.1; In-network Storage I2.1/C2.1; Capacity Task I1.2; Sister QoI Task I1.2. Link labels: new link analysis algorithms; social network models; new fusion algorithms (accurate, timely).]
Relation with CNARC
[Figure: block diagram. Labels: mission requirements; (fusion) QoI delivered; optimal value/cost; data fusion application; INARC; (object-level) QoI specification; CNARC; information network; control; data delivered; underlying communication network.]
Twitter and the Human Sensor:
Crowd-sourcing and Social Dynamics
Relation with SCNARC and Trust
(Courtesy of Boleslaw Szymanski, SCNARC)
 Understanding the “human sensor”
 Computing quantifiably reliable information from large numbers of soft,
unreliable data sources
Collection of tweets from Egypt
Anti-Government Protests Jan – Feb 2011
Behavioral trust based on tweeting interactions
Tahrir Square, February 11, 2011. © 2011 Human Rights Watch
Human-centric Sensing, M. Srivastava (UCLA), T. Abdelzaher (UIUC), B. Szymanski (RPI),
Philosophical Transactions of the Royal Society, 2011.
Papers
Collaborative – Multi-institution:
 Q1 (UIUC+IBM): Hossein Ahmadi, Tarek Abdelzaher, Jiawei Han, Raghu Ganti and Nam
Pham, “On Reliable Modeling of Open Cyber-physical Systems and its Application to
Green Transportation,” ICCPS, Chicago, IL, April 2011.
 Q1 (IBM+CUNY): Charu Aggarwal, Amotz Bar-Noy, Simon Shamoun, “On Sensor
Selection in Linked Information Networks,” submitted to DCoSS 2011
 Q1 (UIUC+IBM): Dong Wang, Hossein Ahmadi, Tarek Abdelzaher, Harsha Chenji, Radu
Stoleru, Charu Aggarwal, “Optimizing Quality-of-Information in Cost-sensitive Sensor
Data Fusion,” submitted to DCoSS 2011.
 Q2 (UIUC+IBM): Tarek Abdelzaher, Dong Wang, Hossein Ahmadi, Jeff Pasternack, Dan
Roth, Omid Fetemieh, and Hieu Le, Charu Aggarwal, “On Bayesian Interpretation of
Fact-finding in Information Networks,” submitted to Fusion 2011
Collaborative – Inter-center:
 Q2 (I+SC): Mani Srivastava, Tarek Abdelzaher, Boleslaw K. Szymanski, “Human-centric
Sensing,” Philosophical Transactions of the Royal Society, special issue on Wireless Sensor
Networks, expected in 2011 (invited).
 Q2 (I+SC): H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J.
Han, D. Roth, B. Szymanski, S. Adali, “Apollo: Towards Factfinding in Participatory
Sensing,” IPSN Demo, April 2011
 Q3 (I+C): Md Y. S. Uddin, Guo-Jun Qi, and Tarek Abdelzaher, Guohong Cao, “PhotoNet:
A Similarity-aware Image Delivery Service for Situation Awareness,” IPSN Demo, April
2011
More Papers
I1.1-I1.2 Collaboration (Multi-institution):
• (UIUC+IBM): G. Qi, C. Aggarwal, T. Huang, “Towards Semantic Knowledge Propagation between text and web images,” WWW Conference, 2011.
• (UIUC+IBM): Guo-Jun Qi, Charu Aggarwal, Yong Rui, Qi Tian, Shiyu Chang and Thomas Huang, “Towards Cross-Category Knowledge Propagation for Learning Cross-domain Concepts,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, Colorado, June 21-23, 2011.
• (IBM+UIUC): C. Aggarwal, Y. Zhao, P. Yu, “On Wavelet Decomposition of Uncertain Text Streams,” CIKM Conference, 2011.
• (UIUC+IBM): G. Qi, C. Aggarwal, T. Huang, “Transfer learning with distance functions between text and web images,” submitted to the ACM KDD Conference, 2011.
• (UIUC+IBM): G. Qi, C. Aggarwal, H. Ji, T. Huang, “Exploring Content and Context-based Links in Social Media: A Latent Space Method,” submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Military Relevance
• Enhanced warfighter decision-making ability based on better quality assessment of fusion outputs
• Improved lifetime of deployed fusion assets based on improved QoI/cost performance
• Foundations of a unified QoI assurance theory for fusion systems that utilize both sensors and information networks
• A quantitative understanding of the benefits of exploiting information network links in data fusion
• Improved QoI/cost trade-offs in networked data fusion systems