Proposal Replicates for Spatially Clustered Processes Rafal Wojcik
Proposal replicates for spatially clustered processes
Rafal Wojcik, Dennis McLaughlin, Hamed Almohammad and Dara Entekhabi, MIT
Spatially clustered processes are pervasive in nature.
How can we incorporate their intermittent structure into ensemble data assimilation?
Can we do more to ensure that our estimates are physically realistic?
Forest fire, Colorado
Midwest thunderstorms
(2D space, 1D time)
Algae bloom, Washington
Rainfall Data Assimilation – Merging Diverse Observations
• Develop Bayesian (ensemble) data assimilation procedures that can efficiently merge remote sensing and ground-based measurements of spatially clustered processes (e.g. rainfall).
• These procedures will be feature-based versions of particle filtering/importance sampling or MCMC.
Bayesian Perspective
Extend the Bayesian formalism to accommodate geometric features and to integrate prior information with new measurements:
$p(C \mid d) \propto L(d \mid C)\, p(C)$
where $C$ is the feature, $d$ is the measurement, $p(C \mid d)$ is the posterior, $L(d \mid C)$ is the likelihood, and $p(C)$ is the prior.
Use an ensemble representation:
$p(C \mid d) = \sum_i w_i\, \delta(C - C_i)$, with weights $w_i = \dfrac{L(d \mid C_i)\, p(C_i)}{q(C_i, d)}$
where $q(C, d)$ is the proposal.
Relationship between true and measured images:
$d = C + \varepsilon$
This gives the likelihood expression in terms of the observation error PDF:
$L(d \mid C) = p_{\varepsilon}(d - C)$
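The ensemble weights can be illustrated with a minimal sketch. It assumes that the likelihood, prior, and proposal have already been evaluated for each replicate and are supplied as log-densities; the function name `importance_weights` and the numerical values are hypothetical and do not come from the slides.

```python
import numpy as np

def importance_weights(log_lik, log_prior, log_proposal):
    """Normalized weights w_i proportional to L(d|C_i) p(C_i) / q(C_i, d)."""
    log_w = (np.asarray(log_lik, dtype=float)
             + np.asarray(log_prior, dtype=float)
             - np.asarray(log_proposal, dtype=float))
    log_w -= log_w.max()           # shift for numerical stability before exponentiating
    w = np.exp(log_w)
    return w / w.sum()             # normalize so the weights sum to one

# Hypothetical log-density values for three replicates.
# Here the proposal equals the prior, so the weights reduce to normalized likelihoods.
log_lik = [-1.0, -2.0, -0.5]
w = importance_weights(log_lik, [-3.0] * 3, [-3.0] * 3)
```

When the proposal equals the prior, as in this usage example, the weights depend only on the likelihood, which is the standard particle filter case.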
Requirements for a feature-based Bayesian formulation:
1. Generate realistic clustered proposal images
[Figure: example clustered proposal images]
2. Define observation error probability measure over set of possible error
images.
$L(d \mid C) \propto \exp\left(-\|C - d\|_{\Sigma}\right)$
Is $\|\cdot\|_{\Sigma}$ a relevant measure of similarity between observations and proposal replicates?
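For concreteness, here is a small sketch of a likelihood of this form. The slides do not define the weighted norm, so the code assumes a Mahalanobis-type norm with a diagonal covariance, $\|v\|_{\Sigma} = \sqrt{\sum_k v_k^2 / \sigma_k^2}$; the function name and the `sigma2` parameter are illustrative only.

```python
import numpy as np

def log_likelihood_weighted_norm(C, d, sigma2=1.0):
    """Unnormalized log L(d|C) = -||C - d||_Sigma, with Sigma = diag(sigma2) (assumed form)."""
    residual = (np.asarray(C, dtype=float) - np.asarray(d, dtype=float)).ravel()
    variances = np.asarray(sigma2, dtype=float).ravel()   # scalar or per-pixel variances
    return -float(np.sqrt(np.sum(residual ** 2 / variances)))

# A replicate that matches the measurement exactly scores highest (log-likelihood 0)
d = np.zeros((2, 2))
ll_match = log_likelihood_weighted_norm(d, d)
ll_miss = log_likelihood_weighted_norm(np.ones((2, 2)), d)
```

Because this norm is computed pixel by pixel, it depends only on how many pixels differ and by how much, not on where the differing pixels sit relative to a cluster.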
How can we define measurement error norm?
• should preserve spatially intermittent features of the real process (e.g. rainfall)
• metrics used to compare replicates and measurements should be sensitive to clustering.
How similar are these images?
Euclidean metric
[Figure: two comparisons of a rain replicate with measured rain; legend: rain replicate (=1), measured rain (=1), no rain (=0); Euclidean dist = 4 in both comparisons]
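The limitation of the Euclidean metric can be reproduced with a small sketch. The binary images below are hypothetical and are not the ones shown on the slide; they only illustrate that a displaced rain cluster yields the same Euclidean distance whether it lies next to the measured cluster or far from it.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two images treated as flat vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

meas = np.zeros((8, 8), dtype=int)
meas[2:4, 1:5] = 1                  # measured 2x4 rain cluster (8 pixels)

near = np.zeros_like(meas)
near[4:6, 1:5] = 1                  # same cluster directly below the measured one, no overlap

far = np.zeros_like(meas)
far[6:8, 4:8] = 1                   # same cluster in the opposite corner

d_near = euclidean_distance(meas, near)   # 16 differing pixels -> 4.0
d_far = euclidean_distance(meas, far)     # 16 differing pixels -> 4.0
```

Both replicates differ from the measurement in 16 pixels, so the metric cannot tell the near miss from the far miss.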
Image characterization: cluster based image compression
Initial cluster centers and scattered rain pixels
Neural gas finds “best” locations for cluster centers
[Figure legend: center of rain pixel; cluster center at (xi, yi)]
Image is concisely characterized by the cluster centers’ coordinates (xi, yi)
The neural gas (NG) algorithm identifies a 10-D feature vector characterizing each image replicate.
[Figure: image replicates with neural gas cluster centers numbered 1 to 5]
POOR RESULTS:
The numbering of neural gas centers has a strong impact on the aggregate distance measure.
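The ordering problem can be seen in a short sketch. This is a basic textbook-style neural gas update under assumed annealing schedules, not the authors' implementation; the synthetic rain-pixel coordinates, parameter values, and names are hypothetical.

```python
import numpy as np

def neural_gas(points, n_centers=5, n_epochs=50, eps=(0.5, 0.01), lam=(2.0, 0.1), seed=0):
    """Fit cluster centers to 2-D points with a basic rank-based neural gas update."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), n_centers, replace=False)].copy()
    n_steps = n_epochs * len(points)
    step = 0
    for _ in range(n_epochs):
        for x in points[rng.permutation(len(points))]:
            t = step / n_steps
            eps_t = eps[0] * (eps[1] / eps[0]) ** t     # annealed learning rate
            lam_t = lam[0] * (lam[1] / lam[0]) ** t     # annealed neighborhood range
            dist = np.linalg.norm(centers - x, axis=1)
            rank = np.argsort(np.argsort(dist))         # 0 for the closest center
            centers += eps_t * np.exp(-rank / lam_t)[:, None] * (x - centers)
            step += 1
    return centers

# Hypothetical rain-pixel coordinates drawn around three storm cells
rng = np.random.default_rng(1)
rain_pixels = np.vstack([rng.normal(loc, 0.5, size=(40, 2))
                         for loc in ([2, 2], [8, 3], [5, 8])])

centers = neural_gas(rain_pixels)
feature = centers.ravel()               # 10-D feature vector (x1, y1, ..., x5, y5)
relabeled = centers[::-1].ravel()       # identical centers, numbered in reverse order
ordering_gap = float(np.linalg.norm(feature - relabeled))
```

The two feature vectors describe exactly the same set of centers, yet their Euclidean distance (`ordering_gap`) is nonzero, which is the sensitivity to numbering noted above.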
Image characterization: Jaccard metric
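For two binary vectors (images) $A$ and $B$, the Jaccard similarity is defined as:
$J(A, B) = \begin{cases} \dfrac{A \cdot B}{A \cdot A + B \cdot B - A \cdot B} & \text{if } A \cdot A + B \cdot B - A \cdot B > 0 \\ 1 & \text{otherwise} \end{cases}$
and the Jaccard metric is defined as:
$D(A, B) = 1 - J(A, B)$
[Figure: Venn diagram of $A$ and $B$ with regions $A \cdot A - A \cdot B$, $A \cdot B$, and $B \cdot B - A \cdot B$; the union is $A \cdot A + B \cdot B - A \cdot B$]
This can be generalized to real positive vectors using:
$J(A, B) = \begin{cases} \dfrac{\sum_i \min(A_i, B_i)}{\sum_i \max(A_i, B_i)} & \text{if } \sum_i \max(A_i, B_i) > 0 \\ 1 & \text{otherwise} \end{cases}$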
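The definitions above translate directly into a few lines of code. The sketch below implements the generalized (min/max) form, which reduces to the binary form for 0/1 images; the function names and the test images are hypothetical.

```python
import numpy as np

def jaccard_similarity(a, b):
    """Generalized Jaccard similarity for non-negative arrays (binary images included)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.maximum(a, b).sum()
    if denom <= 0:
        return 1.0                       # both images empty: similarity defined as 1
    return float(np.minimum(a, b).sum() / denom)

def jaccard_distance(a, b):
    """Jaccard metric D = 1 - J."""
    return 1.0 - jaccard_similarity(a, b)

meas = np.zeros((8, 8), dtype=int)
meas[2:4, 1:5] = 1                       # measured 2x4 rain cluster

shifted = np.zeros_like(meas)
shifted[2:4, 2:6] = 1                    # same cluster shifted by one column (6 pixels overlap)

far = np.zeros_like(meas)
far[6:8, 4:8] = 1                        # same cluster with no overlap

d_shifted = jaccard_distance(meas, shifted)   # 1 - 6/10
d_far = jaccard_distance(meas, far)           # 1 - 0/16
```

In this example the partially overlapping replicate is closer to the measurement than the non-overlapping one, so the distance responds to how much of the rain area is shared.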
Image characterization: Jaccard metric
[Figure: the same two comparisons of a rain replicate with measured rain; legend: rain replicate (=1), measured rain (=1), no rain (=0); Jaccard dist = 0.8 and Jaccard dist = 0.7]
Feature Ensembles – Training Images & Priors
The multipoint technique identifies patterns within a moving template that scans the training image.
[Figure: training image; template; template patterns; number of times each template pattern occurs; pattern probability; replicate generator]
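The pattern-counting step can be sketched as follows. This is a simplified illustration that assumes a binary training image and a small rectangular template; it covers only the counting and probability estimation, not the replicate generator, and all names and values are hypothetical.

```python
from collections import Counter

import numpy as np

def template_pattern_probabilities(training_image, template_shape=(2, 2)):
    """Scan a binary training image with a moving template and estimate pattern probabilities."""
    img = np.asarray(training_image, dtype=int)
    th, tw = template_shape
    counts = Counter()
    for r in range(img.shape[0] - th + 1):
        for c in range(img.shape[1] - tw + 1):
            # Flatten the pixels under the template into a hashable pattern key
            pattern = tuple(img[r:r + th, c:c + tw].ravel().tolist())
            counts[pattern] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}

# Hypothetical training image containing a single 3x3 rain cluster
training = np.zeros((6, 6), dtype=int)
training[1:4, 1:4] = 1
probs = template_pattern_probabilities(training)
```

Here `probs[(1, 1, 1, 1)]` is the estimated probability that a 2x2 window is fully raining; a replicate generator would sample from such probabilities to reproduce the clustering seen in the training image.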
Replicate generation -- Unconditional simulation
[Figure: replicates, measurement, and training image]
Rain/no-rain probabilities and the cluster size distribution are preserved.
Conditional simulation
[Figure: replicates, measurement, and training image]
The conditional-ensemble approach is analogous to “nudging” (van Leeuwen, 2010).
Constructing ensembles of proposal replicates for Bayesian estimation
How do we generate a moderate-sized proposal (or prior) that properly represents uncertainty in the measurement while including a reasonable number of replicates that are "close" to the true image?
[Figure: measurement, likelihood L(d | C), and truth]
Constructing ensembles of proposal replicates for Bayesian estimation
[Figure: conditional ensembles (1%, 5%, and 20% of pixels); 500 replicates]
Conditional ensemble (1% of pixels) – sorted using Jaccard metric
[Figure: measurement, followed by the eight BEST and eight WORST replicates]
JACCARD DISTANCE
BEST: 0.64885, 0.66276, 0.67246, 0.68113, 0.68378, 0.68765, 0.6882, 0.69091
WORST: 0.94444, 0.94677, 0.94763, 0.94802, 0.95918, 0.96164, 0.96169, 0.96429
Conditional ensemble (5% of pixels) – sorted using Jaccard metric
[Figure: measurement, followed by the eight BEST and eight WORST replicates]
JACCARD DISTANCE
BEST: 0.55, 0.56765, 0.56923, 0.57143, 0.57447, 0.57468, 0.58075, 0.58228
WORST: 0.76796, 0.76882, 0.76943, 0.77128, 0.77975, 0.78453, 0.79221, 0.80052
Conditional ensemble (20% of pixels) – sorted using Jaccard metric
[Figure: measurement, followed by the eight BEST and eight WORST replicates]
JACCARD DISTANCE
BEST: 0.35452, 0.37209, 0.37895, 0.37919, 0.37954, 0.38141, 0.38436, 0.38721
WORST: 0.53311, 0.5387, 0.54125, 0.54321, 0.54777, 0.5538, 0.55696, 0.56308
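Sorting an ensemble by Jaccard distance to the measurement, as in the rankings above, can be sketched in a few lines. The replicates here are hypothetical shifted clusters rather than multipoint simulations. The conversion of distances to weights via exp(-D/scale) is an assumption made for illustration (with `scale` a hypothetical tuning parameter), not a result from the slides.

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard metric D = 1 - sum(min) / sum(max); 0 when both images are empty."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.maximum(a, b).sum()
    return 0.0 if denom <= 0 else 1.0 - float(np.minimum(a, b).sum() / denom)

def rank_replicates(replicates, measurement, scale=0.1):
    """Return replicate indices (best first), distances, and normalized weights."""
    dist = np.array([jaccard_distance(r, measurement) for r in replicates])
    order = np.argsort(dist)                       # smallest distance = best match
    w = np.exp(-(dist - dist.min()) / scale)       # assumed likelihood form exp(-D / scale)
    return order, dist, w / w.sum()

# Hypothetical ensemble: a 2x4 rain cluster shifted by 0 to 4 columns
measurement = np.zeros((8, 10), dtype=int)
measurement[3:5, 1:5] = 1
replicates = []
for shift in (3, 0, 4, 1, 2):
    rep = np.zeros_like(measurement)
    rep[3:5, 1 + shift:5 + shift] = 1
    replicates.append(rep)

order, dist, weights = rank_replicates(replicates, measurement)
```

The returned `order` lists replicates from best to worst, mirroring the BEST and WORST groups reported for each conditional ensemble.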
Conclusions
Clustered processes require a feature-based approach to
Bayesian estimation which does not rely on Gaussian
assumptions.
One option is to use importance sampling over the space of
possible features. This requires that we 1) generate
appropriate proposal images and 2) define an observation
error probability measure based on an appropriate norm.
The Jaccard metric is a promising choice for this norm that
orders differing images in an intuitive fashion.
Conditional multi-point random field generators can be used to
produce realistic clustered proposal replicates.
Future work will combine these ideas to obtain a feature-based
procedure for rainfall data assimilation.
Characterizing random fields using multipoint statistics
Proposal Replicates for Spatially Clustered Processes
Rafal Wojcik, Dennis McLaughlin, Hamed Almohammad and Dara Entekhabi, MIT, U.S.
Long-term objectives
[Figure: diagram with labels: truth; measurements; feature-preserving data assimilation scheme; proposal ensemble generator; update; MAP estimate]
Measurement sources: geostationary satellite (e.g. GOES); rain gage; microwave LEO satellite (e.g. NOAA, TRMM, SSMI); radar (e.g. NEXRAD)
Short-term objective
• Identify ways to characterize and generate random ensembles of realistic spatially clustered replicates (images) for ensemble-based data assimilation
• These procedures will be feature-based versions of particle filtering/importance sampling or MCMC.
[Figure: Replicate 1, Replicate 2, Replicate 3, Replicate 4, …: possible alternatives for summer rain storms]
Particle filter
$p(C \mid d) = \sum_i w_i\, \delta(C - C_i)$
[Figure labels: $C_i$; $L(d \mid C)$]
Common assumption in particle filters:
$L(d \mid C) \propto \exp\left(-\|C - d\|_{\Sigma}\right)$
Is $\|\cdot\|_{\Sigma}$ a relevant measure of similarity between observations and proposal replicates?
Image characterization
How do we describe a feature? Discretize over an n-pixel grid.
Feature support: $2^n$ possible features
Feature represented as a vector of pixel values $(x_1, x_2, \ldots, x_n)$
Feature support + texture: ∞ possible features
[Figure: geometric aspects of a typical NEXRAD summer rainstorm; labels: boundary of clouds; boundary of feature support; no rain; rain; texture (rain intensity) within support]