Workshop report

ISCTSC
Workshop A7
Best Practices in Data Fusion
Objectives
• Identify the state of the art and the state of practice
• Identify key research challenges and
opportunities
• Identify tangible ways to accelerate
methodological innovation and adoption in
practice
What exactly is data fusion?
• Using more than one data source to estimate a
parameter of interest
[Figure: Direct measurement. A real-world process generates the world ‘today’ (N), which is observed directly by Measurements 1, 2, …, n (X1, X2, …, Xn).]
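For the direct case, where every measurement observes the same quantity and only an error model is needed, one textbook way to fuse the measurements is inverse-variance weighting. The sketch below assumes each Xi is an unbiased measurement of N with a known, independent error variance; the function name fuse_direct and the example numbers are illustrative and do not come from the workshop.

```python
def fuse_direct(measurements):
    """Fuse independent, unbiased direct measurements of one quantity N.

    measurements: list of (value, variance) pairs, i.e. X1..Xn together with
    the error variances of the assumed error model.
    Returns (fused estimate, variance of the fused estimate).
    """
    # Each measurement is weighted by its precision (1 / variance).
    weights = [1.0 / var for _, var in measurements]
    total = sum(weights)
    estimate = sum(w * x for w, (x, _) in zip(weights, measurements)) / total
    return estimate, 1.0 / total


# Hypothetical example: two independent counts of the same population N.
estimate, variance = fuse_direct([(1000.0, 400.0), (1100.0, 100.0)])
print(estimate, variance)
```

Here the fused estimate is about 1080 with variance about 80, lower than either input variance; that gain only holds if the assumed error model (unbiased, independent errors) is correct.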
[Figure: Indirect measurement. A real-world process generates the world ‘today’ (N), which interacts with other quantities as captured in existing domain models; Measurements 1, 2, …, m (Y1, Y2, …, Ym) observe those related quantities rather than N itself.]
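In the indirect case a system process model is needed to map N onto what is actually measured. A minimal sketch, assuming the simplest possible process model, a known linear relation Yj = aj · N + ej with independent zero-mean errors, gives a weighted least-squares estimate of N; direct measurements enter as the special case aj = 1. The coefficients aj, the function name fuse_linear and the numbers are hypothetical.

```python
def fuse_linear(observations):
    """Weighted least-squares estimate of a scalar parameter N.

    observations: list of (y, a, variance) triples. Each observation is
    assumed to follow the linear process model y = a * N + e, where e has
    zero mean and the given variance, independently across observations.
    Direct measurements are the special case a = 1.
    """
    # Precision of the estimate: sum of a^2 / variance over observations.
    precision = sum(a * a / var for _, a, var in observations)
    weighted_sum = sum(a * y / var for y, a, var in observations)
    return weighted_sum / precision, 1.0 / precision


# Hypothetical example: a direct count of N, plus an indirect measurement
# (e.g. total trips) assumed to equal 2 * N plus noise.
estimate, variance = fuse_linear([
    (1000.0, 1.0, 400.0),   # direct: X1
    (2150.0, 2.0, 1600.0),  # indirect: Y1, linked to N through a = 2
])
print(estimate, variance)
```

The indirect measurement alone implies N ≈ 1075 with variance 400, so the two sources carry equal weight and the fused result is 1037.5 with variance 200. Real process models are rarely this simple, but the structure (error model plus process model) is the same.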
State of practice (SOP) & state of the art (SOA) (1)
• There is a long history of data fusion in transport, but it is very fragmented
• Examples
– Synthetic population generation
– OD matrix updating
– Data enrichment in discrete choice model estimation
– Network state estimation
– Activity pattern feature extraction from trace data
– Use of multiple survey modes
– Activity and time use survey consolidation
– Population exposure modelling
– Public transport (e.g. UK bus) OD matrix estimation
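The slides list OD matrix updating without naming a method. One classic technique for it is biproportional fitting (the Furness method, also known as iterative proportional fitting), which rescales a prior OD matrix until it matches newly observed origin and destination totals. The sketch below assumes consistent totals (same grand total) and a non-negative seed; the function name furness and the numbers are illustrative.

```python
def furness(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Biproportional (Furness / IPF) update of an OD matrix.

    seed: prior OD matrix as a list of rows, e.g. from an older survey.
    row_targets: observed origin totals; col_targets: observed destination
    totals. Both sets of totals must have the same grand total.
    """
    m = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row so that it sums to its observed origin total.
        for i, target in enumerate(row_targets):
            s = sum(m[i])
            if s > 0:
                m[i] = [v * target / s for v in m[i]]
        # Scale each column so that it sums to its observed destination total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in m)
            if s > 0:
                for row in m:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows also match.
        if max(abs(sum(row) - t) for row, t in zip(m, row_targets)) < tol:
            break
    return m


od = furness([[10, 20], [30, 40]], row_targets=[40, 60], col_targets=[50, 50])
print(od)
```

Because the update is purely multiplicative, zero cells in the seed stay zero, so the prior matrix's structure is preserved while its margins are fused with the new counts.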
Summary: SOP & SOA (2)
• Problem types:
– Direct observation by multiple methods
• Requires error model
• Does not in general require system process model
– Direct and indirect observation
• Requires error model
• Additionally requires a system process model to link indirect observations to parameters of interest
• Methods:
– ‘Record linking’ methods (e.g., statistical matching,
data mining, imputation, fuzzy logic)
– Model-based inference (e.g., full-information maximum likelihood (FIML), filtering, Bayesian inference)
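As an illustration of the ‘record linking’ family, statistical matching can be sketched as nearest-neighbour hot-deck imputation: each record in a recipient survey receives a variable copied from the most similar donor record in another survey, where similarity is measured on the connecting variables both surveys share. This sketch assumes unscaled squared Euclidean distance and implicitly relies on the usual conditional independence assumption (the donated and recipient-only variables are independent given the connecting variables); the function name and the survey records are hypothetical.

```python
def nn_hot_deck(recipients, donors, common_vars, donated_var):
    """Nearest-neighbour statistical matching (hot-deck imputation).

    recipients, donors: lists of dicts, one per survey record.
    common_vars: names of the connecting variables present in both files.
    donated_var: variable observed only in the donor file.
    Returns copies of the recipient records with donated_var filled in.
    """
    fused = []
    for r in recipients:
        # Donor closest to this recipient on the connecting variables.
        best = min(
            donors,
            key=lambda d: sum((r[v] - d[v]) ** 2 for v in common_vars),
        )
        record = dict(r)
        record[donated_var] = best[donated_var]
        fused.append(record)
    return fused


# Hypothetical travel-survey recipients and time-use-survey donors.
recipients = [{"age": 35, "hh_size": 3, "trips": 4},
              {"age": 72, "hh_size": 2, "trips": 2}]
donors = [{"age": 33, "hh_size": 3, "work_hours": 8},
          {"age": 70, "hh_size": 1, "work_hours": 0}]
fused = nn_hot_deck(recipients, donors, ["age", "hh_size"], "work_hours")
print(fused)
```

In practice the connecting variables would normally be standardised or weighted before computing distances; otherwise variables with large ranges (here, age) dominate the match.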
Research needs (1)
• Enabling research
– Better metadata (survey/data collection process + context) to support informed fusion (especially important in the era of Web 2.0)
– More professional and disciplined protocols in reporting
data treatments in published work
– Better techniques for disclosure management
– Understanding how to make the business case for data
fusion
• Benefits – sample size, precision
• Barriers – perception of ‘made up data’, threat to incumbent
data providers
Research needs (2)
• Methodological research
– Detecting genuinely conflicting information (not fusable) – a form of specification test
– Better means of validating fused data
– Better methods for modelling the propagation of data and model uncertainty during data fusion, to enhance confidence in fused data
– Are deterministic (e.g. ‘mean imputation’) approaches adequate? How seriously do they distort the covariance structure?
– Better re-sampling/Bayesian methods in high dimensions
– Integrate methods from small area estimation (SAE)
– Opportunities to reduce respondent burden by split designs and ex-post fusion (à la SP surveys and analysis) – which questions are substitutable?
– For record matching, what are the key connecting variables?
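The question of how seriously mean imputation distorts the covariance structure can be illustrated on toy data. The sketch below deletes two values of y from a small, hypothetical, perfectly linear dataset and fills them with the observed mean; the helper names and data are illustrative only.

```python
def sample_cov(a, b):
    """Sample covariance with the n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in values]


x = [1, 2, 3, 4, 5, 6]
y_true = [2, 4, 6, 8, 10, 12]        # hypothetical complete data, y = 2x
y_obs = [2, None, 6, 8, None, 12]    # two values deleted for illustration
y_imp = mean_impute(y_obs)

print("true:    cov(x, y) =", sample_cov(x, y_true),
      " var(y) =", sample_cov(y_true, y_true))
print("imputed: cov(x, y) =", sample_cov(x, y_imp),
      " var(y) =", sample_cov(y_imp, y_imp))
```

In this example the mean of y is unchanged (7), but cov(x, y) falls from 7 to 5.2 and var(y) from 14 to 10.4, because every imputed value sits exactly at the mean. Stochastic or multiple imputation is the usual remedy, at the cost of added complexity.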
Research needs (3)
• Research infrastructure
– Establish a more consistent and complete taxonomy of data fusion problems, methods, outcomes
– Establish reference datasets and reference ‘cases’