Fundamentals of Context-aware Real

Abdelzaher (UIUC)
Research Milestones
Due | Description
Q1 | Estimation-theoretic QoI analysis. Formulation of analytic models for quantifying accuracy of prediction/estimation results.
Q2 | Extended analysis of semantic links in information networks. Formulation of information network abstractions that are amenable to analysis as new sensors in a data fusion framework.
Q3 | Data pool quality metrics and impact of data fusion. Formulation of metrics for data selection when all data cannot be used/sent.
Q4 | Validation of QoI theory. Documentation and publications.
This Talk:
Towards a QoI Theory for Data Fusion from Sensors + Information network links
Signal data fusion: fusion of hard sources
• Methods: Bayesian analysis, maximum likelihood estimation, etc.
Information Network Analysis: fusion of soft sources
• Methods: ranking, clustering, etc.
Machine Learning: fusion of text and images
• Methods: transfer knowledge, CCM, etc.
Trust, Social Networks: fusion from human sources
• Methods: fact-finding, influence analysis, etc.
Sensors, reports, and human sources
Sensor Fusion
Example: Target Classification
• Different sensors (of known reliability, false alarm rates, etc.) are used to classify targets
• Well-developed theory exists to combine possibly conflicting sensor measurements to accurately estimate target attributes.
  • Bayesian analysis
  • Maximum likelihood
  • Kalman filters
  • etc.
[Figure: vibration sensors, acoustic sensors, and an infrared motion sensor observing a target]
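As an illustration of the Bayesian analysis listed above, here is a minimal sketch of fusing binary detections from sensors whose detection and false alarm rates are known. The function name, the conditional-independence assumption, the prior, and all numeric rates are hypothetical and chosen only for the example.

```python
from math import prod

def fuse_binary_sensors(prior, readings):
    """Posterior probability that the target is present.

    prior    -- P(target present) before any reading
    readings -- list of (fired, p_detect, p_false_alarm) tuples, where
                p_detect      = P(sensor fires | target present)
                p_false_alarm = P(sensor fires | target absent)
    Assumes sensors are conditionally independent given the target state.
    """
    like_present = prod(pd if fired else 1.0 - pd for fired, pd, _ in readings)
    like_absent = prod(pfa if fired else 1.0 - pfa for fired, _, pfa in readings)
    numerator = prior * like_present
    return numerator / (numerator + (1.0 - prior) * like_absent)

# Hypothetical readings: two sensors fire, one stays silent.
readings = [
    (True, 0.9, 0.2),    # vibration sensor
    (True, 0.8, 0.1),    # acoustic sensor
    (False, 0.7, 0.05),  # infrared motion sensor
]
posterior = fuse_binary_sensors(0.5, readings)
print(round(posterior, 3))
```

The conflicting reading from the silent sensor lowers the posterior but does not override the two sensors that fired, because each reading is weighted by that sensor's known reliability.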
Information Network Mining
Example: Fact-finding
• Example 1:
  • Consider a graph of who published where (but no prior knowledge of these individuals and conferences)
  • Rank conferences and authors by importance in their field
  [Figure: graph linking authors (Han, Roth, Abdelzaher) to venues (WWW, KDD, Fusion, Sensys)]
• Example 2:
  • Consider a graph of who said what (sources and assertions but no prior knowledge of their credibility)
  • Rank sources and assertions by credibility
  [Figure: graph linking sources (John, Mike, Sally) to claims (Claim1 to Claim4)]
The Challenge
• How to combine information from sensors and information network links to offer a rigorous quantification of QoI (e.g., correctness probability) with minimal prior knowledge?
[Figure: the source-claim graph (John, Mike, Sally; Claim1 to Claim4) combined with vibration sensors, acoustic sensors, and an infrared motion sensor observing a target, to answer P(armed convoy) = ?]
Applications
• Understand Civil Unrest
  • Remote situation assessment
  • Use Twitter feeds, news, cameras, …
• Expedite Disaster Recovery
  • Damage assessment and first response
  • Use sensor feeds, eyewitness reports, …
• Reduce Traffic Congestion
  • Mapping traffic congestion in a city
  • Use crowd-sourcing (of cell-phone GPS measurements), speed sensor readings, eyewitness reports, …
Approach: Back to the Basics
• Interpret the simplest fact-finder as a classical (Bayesian) sensor fusion problem
• Identify the duality between information link analysis and Bayesian sensor fusion (links = sensor readings)
• Use that duality to quantify probability of correctness of fusion (i.e., information link analysis) results
• Incrementally extend analysis to more complex information network models and mining algorithms
An Interdisciplinary Team
Tasks: QoI Mining Task I3.1, Fusion Task I1.1, QoI Task I1.2
• Abdelzaher (QoI, sensor fusion)
• Roth (fact-finders, machine learning)
• Aggarwal, Han (data mining, veracity analysis)
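The Bayesian Interpretation
• The Simplest Fact-finder:
$$\mathrm{Rank}(\mathrm{Claim}_j) = \frac{1}{\alpha_j} \sum_{k \in \mathrm{Sources}_j} \mathrm{Rank}(\mathrm{Source}_k)$$
$$\mathrm{Rank}(\mathrm{Source}_i) = \frac{1}{\beta_i} \sum_{k \in \mathrm{Claims}_i} \mathrm{Rank}(\mathrm{Claim}_k)$$
[Figure: graph linking sources (John, Mike, Sally) to claims (Claim1 to Claim4)]
• The Simplest Bayesian Classifier (Naïve Bayesian):
$$P(\mathrm{Target}_j \mid \mathrm{Sensors}) = \frac{P(\mathrm{Target}_j) \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{Z}$$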
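The two rank updates of the simplest fact-finder can be sketched as an alternating iteration over the source-claim graph. This is a minimal sketch, not the authors' implementation: the function name, the choice to normalize each round by the maximum rank, the iteration count, and the example edges are all assumptions made for illustration.

```python
def simple_fact_finder(claims_of, iterations=20):
    """Iterate the two rank updates on a bipartite source-claim graph.

    claims_of -- dict mapping each source to the set of claims it asserts.
    Returns (source_rank, claim_rank), each scaled so the top rank is 1.0.
    Normalizing by the maximum is an assumption for this sketch.
    """
    # Invert the graph: which sources assert each claim.
    sources_of = {}
    for source, claims in claims_of.items():
        for claim in claims:
            sources_of.setdefault(claim, set()).add(source)

    source_rank = {source: 1.0 for source in claims_of}
    claim_rank = {}
    for _ in range(iterations):
        # Rank of a claim: normalized sum of the ranks of its sources.
        claim_rank = {c: sum(source_rank[s] for s in srcs)
                      for c, srcs in sources_of.items()}
        top = max(claim_rank.values())
        claim_rank = {c: r / top for c, r in claim_rank.items()}
        # Rank of a source: normalized sum of the ranks of its claims.
        source_rank = {s: sum(claim_rank[c] for c in claims)
                       for s, claims in claims_of.items()}
        top = max(source_rank.values())
        source_rank = {s: r / top for s, r in source_rank.items()}
    return source_rank, claim_rank

# Hypothetical edges; the slide's figure does not specify who said what.
claims_of = {
    "John": {"Claim1", "Claim2"},
    "Mike": {"Claim1", "Claim2", "Claim3"},
    "Sally": {"Claim4"},
}
source_rank, claim_rank = simple_fact_finder(claims_of)
```

In this example, claims backed by several well-ranked sources rise to the top, while a claim asserted only by an otherwise uncorroborated source sinks.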
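The Equivalence Condition
$$P(\mathrm{Target}_j \mid \mathrm{Sensors}) = \frac{P(\mathrm{Target}_j) \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{Z}$$
• We know that for a sufficiently small $x_k$:
$$\prod_k (1 + x_k) \approx 1 + \sum_k x_k$$
• Consider individually unreliable sensors:
$$\frac{P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{P(\mathrm{Sensor}_k)} = 1 + x_{jk}, \quad x_{jk} \ll 1$$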
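A Bayesian Fact-finder
• By duality, if:
  Sensors ↔ Sources
  Measured States ↔ Claims
• Then, Bayes Theorem eventually leads to:
$$\mathrm{Rank}(\mathrm{Claim}_j) = \sum_{k \in \mathrm{Sources}_j} \mathrm{Rank}(\mathrm{Source}_k)$$
$$\mathrm{Rank}(\mathrm{Source}_i) = \sum_{k \in \mathrm{Claims}_i} \mathrm{Rank}(\mathrm{Claim}_k)$$
• and:
$$P(\mathrm{Claim}_j \mid \mathrm{network}) \propto \mathrm{Rank}(\mathrm{Claim}_j) + 1$$
$$P(\mathrm{Source}_i \mid \mathrm{network}) \propto \mathrm{Rank}(\mathrm{Source}_i) + 1$$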
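One way to see where a "rank plus one" form can come from is to substitute the small-deviation condition into the naïve Bayesian posterior. The sketch below assumes that condition holds for every sensor; the symbol $C_j$, which collects the terms that do not depend on the deviations, is introduced here for illustration only.

```latex
\begin{aligned}
P(\mathrm{Target}_j \mid \mathrm{Sensors})
  &= \frac{P(\mathrm{Target}_j)}{Z} \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k \mid \mathrm{Target}_j) \\
  &= \frac{P(\mathrm{Target}_j)}{Z} \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k)\,(1 + x_{jk}) \\
  &\approx C_j \Bigl(1 + \sum_{k \in \mathrm{Sensors}_j} x_{jk}\Bigr),
  \qquad C_j = \frac{P(\mathrm{Target}_j)}{Z} \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k)
\end{aligned}
```

Read through the duality, the sum over sensors becomes a sum over the sources linked to a claim, which suggests why the posterior takes the shape of a constant times a summed rank plus one.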
Fusion of Sensors and Information Networks
[Figure: measurements from sensors (Sensor1 to Sensor3) and from an information network of sources (Source1 to Source3) and claims (Claim1 to Claim4) feed a common fusion result]
• Putting fusion of sensors and information network link analysis on a common analytic foundation:
  • Can quantify probability of correctness of results
  • Can leverage existing theory to derive accuracy bounds
Simulation-based Evaluation
• Generate thousands of “assertions” (some true, some false; unknown to the fact-finder)
• Generate tens of sources (each source has a different probability of being correct; unknown to the fact-finder)
• Sources make true/false assertions consistently with their probability of correctness
• A link is created between each source and each assertion it makes
• Analyze the resulting network to determine:
  • The set of true and false assertions
  • The probability that a source is correct
• No prior knowledge of individual sources and assertions is assumed
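The generation steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the simulator used in the evaluation: the function name, the default sizes (30 sources, 1000 assertions, 100 claims per source), the even split between true and false assertions, and the reliability range of 0.5 to 0.95 are all hypothetical.

```python
import random

def simulate_network(n_sources=30, n_assertions=1000,
                     claims_per_source=100, seed=0):
    """Generate a synthetic source-assertion network.

    Returns (truth, reliability, links):
      truth       -- list of booleans, hidden ground truth per assertion
      reliability -- list of floats, hidden P(correct) per source
      links       -- set of (source, assertion) pairs; this is the only
                     input a fact-finder would be allowed to see
    """
    rng = random.Random(seed)
    truth = [rng.random() < 0.5 for _ in range(n_assertions)]
    reliability = [rng.uniform(0.5, 0.95) for _ in range(n_sources)]
    true_ids = [a for a, is_true in enumerate(truth) if is_true]
    false_ids = [a for a, is_true in enumerate(truth) if not is_true]

    links = set()
    for source, p_correct in enumerate(reliability):
        for _ in range(claims_per_source):
            # With probability p_correct the source asserts something true.
            pool = true_ids if rng.random() < p_correct else false_ids
            links.add((source, rng.choice(pool)))
    return truth, reliability, links

truth, reliability, links = simulate_network()
```

Only `links` would be handed to the fact-finder; `truth` and `reliability` are held back and used afterwards to score its output.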
Evaluation Results
Comparison to 4 fact-finders from literature
• Significantly improved prediction accuracy of source correctness probability (from 20% error to 4% error)
• (Almost) no false positives for larger networks (> 30 sources)
• Below 1% false negatives for larger networks (> 30 sources)
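The slides do not say how these figures were computed. One plausible reading, sketched below, scores a fact-finder by its false positive rate, its false negative rate, and the mean absolute error of its estimated source correctness probabilities. The function name, the metric definitions, and the toy inputs are assumptions for illustration.

```python
def evaluation_metrics(truth, predicted, reliability, estimated):
    """Score fact-finder output against hidden ground truth.

    truth, predicted       -- per-assertion booleans (actual vs. inferred)
    reliability, estimated -- per-source P(correct) (actual vs. inferred)
    Returns (false_positive_rate, false_negative_rate, mean_abs_error).
    """
    false_pos = sum(1 for t, p in zip(truth, predicted) if p and not t)
    false_neg = sum(1 for t, p in zip(truth, predicted) if t and not p)
    n_false = sum(1 for t in truth if not t)
    n_true = len(truth) - n_false
    fp_rate = false_pos / n_false if n_false else 0.0
    fn_rate = false_neg / n_true if n_true else 0.0
    mean_abs_error = (sum(abs(r - e) for r, e in zip(reliability, estimated))
                      / len(reliability))
    return fp_rate, fn_rate, mean_abs_error

# Toy inputs, chosen only to exercise the function.
fp_rate, fn_rate, mean_abs_error = evaluation_metrics(
    truth=[True, True, False, False],
    predicted=[True, False, True, False],
    reliability=[0.9, 0.6],
    estimated=[0.8, 0.7],
)
```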
Abdelzaher, Adali, Han, Huang, Roth, Szymanski
Coming up: The Apollo FactFinder
• Apollo: Improves fusion QoI from noisy human and sensor data.
• Demo at IPSN 2011 (in April)
• Collects data from cell-phones
• Interfaced to Twitter
• Can use sensors and human text
• Analysis on several data sets: what really happened?
Apollo Architecture
H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth, B. Szymanski, and S. Adali, “Apollo: Towards Factfinding in Participatory Sensing,” demo session at IPSN 2011, the 10th International Conference on Information Processing in Sensor Networks, April 2011, Chicago, IL, USA.
Apollo Datasets
• Track data from cell-phones in a controlled experiment
• 2 million tweets from Egypt Unrest
• Tweets on Japan Earthquake, Tsunami and Nuclear Emergency
Immediate Extensions
• Non-independent sources
  • Sources that have a common bias, sources where one influences another, etc.
  • Collaboration opportunities with SCNARC and Trust
• Non-independent claims
  • Claims that cannot be simultaneously true
  • Claims that increase or decrease each other’s probability
• Mixture of reliable and unreliable sources
  • More reliable sources can help calibrate correctness of less reliable sources
Road Ahead
Develop a unifying QoI-assurance theory for fact-finding/fusion from hard and soft sources
• Sources
  • Use different media: signals, text, images, …
  • Feature different authors: physical sensors, humans
• Capabilities
  • Computes accurate best estimates of probabilities of correctness
  • Computes accurate confidence bounds in results
  • Enhances QoI/cost trade-offs in data fusion systems
  • Integrates sensor and information network link analysis into a unified analytic framework for QoI assessment
  • Accounts for data dependencies, constraints, context and prior knowledge
  • Accounts for the effect of social factors such as trust, influence, and homophily on opinion formation, propagation, and perception (in human sensing)
• Impact: Enhanced warfighter ability to assess information
Collaborations
QoI Task I1.2: QoI/cost analysis (unified theory for estimation/prediction and information network link analysis)
• Fusion Task I1.1 (w/Dan Roth): Account for prior knowledge and constraints
• QoI Mining Task I3.1 (w/Jiawei Han): Consider new link analysis algorithms
• Community Modeling S2.2 and Decisions under Stress S3.1 (w/Boleslaw Szymanski and Sibel Adali): Model humans in the loop
• OICC Task C1.2 (w/Aylin Yener): Increase OICC
• Sister QoI Task C1.1 (w/Ramesh Govindan): Improve communication resource efficiency
Collaborations
Collaborative – Multi-institution:
• Q2 (UIUC+IBM): Tarek Abdelzaher, Dong Wang, Hossein Ahmadi, Jeff Pasternack, Dan Roth, Omid Fatemieh, Hieu Le, and Charu Aggarwal, “On Bayesian Interpretation of Fact-finding in Information Networks,” submitted to Fusion 2011
Collaborative – Inter-center:
• Q2 (I+SC): H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth, B. Szymanski, S. Adali, “Apollo: Towards Factfinding in Participatory Sensing,” IPSN Demo, April 2011
• Q2 (I+SC): Mani Srivastava, Tarek Abdelzaher, Boleslaw Szymanski, “Human-centric Sensing,” Philosophical Transactions of the Royal Society, special issue on Wireless Sensor Networks, expected in 2011 (invited).
Invited Session on QoI at Fusion 2011 (co-chaired with Ramesh Govindan, CNARC)
Military Relevance
• Enhanced warfighter decision-making ability based on better quality assessment of fusion outputs
• A unified QoI assurance theory for fusion systems that utilize both sensors and information networks
  • Offers a quantitative understanding of the benefits of exploiting information network links in data fusion
  • Enhances result accuracy and provides confidence bounds in result correctness