MCMC (continued)


MCMC: Particle Theory
By Marc Sobel
Particle Theory: Can we understand it?
The Dynamic Model Setup: Heavy Theory
Y_t = h(t, X_{1:t}, θ) + W_t,   t = 1, 2, ...
• The W_t are independent errors.
• θ are unobserved parameters, independent of the X's.
• The X_t are unobserved, generated independently or by a Markov process.
• {X_t : t = 1, ...} is called the signal process.
• The Y_t are the observations.
• The function h is typically unknown.
Particle Filters: Lighter Theory
• More familiar dynamic model setting:
Y_t ~ q_t(· | X_t, θ);
X_t ~ f_t(· | X_{t-1}, θ);
• Even more familiar (for unknown g, h):
Y_t = g(X_t) + ε_t;
X_t = h(X_{t-1}) + e_t;
Particle Filter: Goal
• The goal in particle filter theory is to simulate (all or part of) the unobserved posterior distribution of the signal process {X_t : t = 1, ...} = X_{1:t}, i.e., π(X_{1:t} | Y_{1:t}), as well as additional parameters if necessary. Specifically, we would like to simulate the signal process at time t, i.e., the posterior distribution of X_{1:t}. The particles referred to above take the form
P = { X_{1:t}^{(1)}, X_{1:t}^{(2)}, ..., X_{1:t}^{(k)} }.
• These particles are designed to approximate the posterior distribution π(X_{1:t} | Y_{1:t}) through the introduction of appropriate weights (see next).
Particle Filter: Weights
• The (normalized) weights W^{(1)}, ..., W^{(k)} attached to the particles are designed so that the posterior probability π(X_{1:t}^{(i)} | Y_{1:t}) at a particle X^{(i)} is given by the weight W^{(i)} (i = 1, ..., k). Thus, for example,
P(X_j ≤ a_j; j = 1, ..., t) = Σ_{l: X_j^{(l)} ≤ a_j, j = 1, ..., t} W_{1:t}^{(l)}.
• Since weights are always assumed to be normalized, we need only specify them up to a constant of proportionality.
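As a concrete illustration of how the weights are used, here is a minimal Matlab sketch; the particle values and raw weights are made up purely for illustration.

% Minimal sketch: normalized weights turn a particle set into posterior probabilities.
k  = 1000;
Xp = randn(1, k);                     % hypothetical particle values X^(1),...,X^(k)
w  = exp(-0.5 * (Xp - 1).^2);         % hypothetical unnormalized weights
W  = w / sum(w);                      % normalize (weights matter only up to proportionality)

a     = 0.5;
probA = sum(W(Xp <= a));              % approximate posterior P(X <= a | Y)
postM = sum(W .* Xp);                 % approximate posterior mean E(X | Y)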
Convergence
• The goal of convergence is that the estimated posterior distribution of the X_t's given the observations,
π̂_t(A) ≈ P(X_t ∈ A | y_{0:t-1}),
corresponds to what it should be. Typically the posterior distribution is defined via the weights w_t^{(k)}.
• This induces, using K as a kernel, the density
π̂_t(x) = Σ_k w_t^{(k)} K((X_t^{(k)} - x)' h (X_t^{(k)} - x)) / Σ_k K((X_t^{(k)} - x)' h (X_t^{(k)} - x)).
Convergence (continued)
• We would like the resulting measures to converge:
‖π̂_t - π̂‖ → 0,
or
∫ π̂_t log(π̂_t / π̂) + ∫ π̂ log(π̂ / π̂_t) → 0.
Particle Filters: Bootstrap
• Assume that the (Y, X)'s are independent of each other. Then we can build particle filters sequentially by defining the weights by
W_{1:t}^{(k)} ∝ W_{1:t-1}^{(k)} · π(X_t^{(k)} | Y_t) ∝ W_{1:t-1}^{(k)} · ℓ(Y_t | X_t^{(k)}) π(X_t^{(k)}),
for particles formed by X_{1:t}^{(k)} = (X_{1:t-1}^{(k)}, X_t^{(k)}).
• But, by normalization, the term W_{1:t-1}^{(k)} drops out. So we can define the weights on the new X_t's by
W_t^{(k)} ∝ π(X_t | Y_t) ∝ ℓ(Y_t | X_t) π(X_t).
Particles PSLAM (continued)
• To get the bootstrap particle filter, we simulate the X's from the prior distribution of X_t. This leaves us to define the weights by
W_t^{(l)} ∝ π(X_t^{(l)} | Y_t) ∝ ℓ(Y_t | X_t^{(l)}),
for X_t's selected from the prior. We might then cull the particles, choosing only those with reasonably large weights.
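A minimal Matlab sketch of one bootstrap step under a toy model (the state and observation equations and noise scales here are illustrative assumptions, not the models used later in the slides): extend each particle from the prior (the state equation), weight by the likelihood of the new observation, and cull by resampling.

% One bootstrap-filter step (illustrative toy model).
k     = 1000;
xPrev = randn(1, k);                  % particles approximating p(X_{t-1} | Y_{1:t-1})
yt    = 1.7;                          % the new observation (made-up value)

xNew  = 0.5 * xPrev + randn(1, k);    % extend from the prior: X_t = h(X_{t-1}) + e_t
w     = normpdf(yt, xNew, 1);         % weight by the likelihood of Y_t given X_t
W     = w / sum(w);                   % normalize

idx   = randsample(k, k, true, W);    % cull: resample, favoring high-weight particles
xPost = xNew(idx);                    % particles approximating p(X_t | Y_{1:t})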
Particle Selection: Elementary
• Shephard and Pitt (in 1994) proposed dividing up the particles into homogeneous groups (i.e., based on the values of functions of the particles), and resampling from each group separately. This has the advantage that we aren't working with collections of particles which combine apples and oranges. The weights within group k take the form
W_t^{(l)} ∝ ℓ(Y_t | X_{t,k}^{(l)}) / ℓ(Y_t | X_{t,k}).
A Probabilistic Approach
• The following algorithms take a probabilistic approach:
p(x_t, m | z_{1:t}, u_{1:t})
x_t = state of the robot at time t
m = map of the environment
z_{1:t} = sensor inputs from time 1 to t
u_{1:t} = control inputs from time 1 to t
Full vs. Online SLAM
• Full SLAM is an example of a particle filter which is not a bootstrap filter.
• Full SLAM calculates the robot state over all time up to time t:
p(x_{1:t}, m | z_{1:t}, u_{1:t})
• Online SLAM calculates the robot state for the current time t:
p(x_t, m | z_{1:t}, u_{1:t}) = ∫∫ ... ∫ p(x_{1:t}, m | z_{1:t}, u_{1:t}) dx_1 dx_2 ... dx_{t-1}
Weights
• We've already said that weights reflect the posterior probability of a particle. In sequential MCMC we construct these weights sequentially. In bootstrap particle analysis we simulate using the prior, but sometimes this doesn't make sense (see the examples later). One mechanism is to use another distribution (i.e., one less prone to outliers) to generate particle extensions.
Weights (continued)
• Liu (2001) recommends using a proposal (trial) distribution q to reconstruct the current posterior. Use q to simulate the X's. The weights then take on the form
W_{1:n} = W_{1:n-1} · π(X_t | X_{1:t-1}) f(Y_t | X_t) / q(X_t | X_{1:t-1}).
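A minimal Matlab sketch of this weight update; the Gaussian prior, likelihood, and wider proposal q used here are illustrative assumptions, not densities taken from the slides.

% Proposal-based update: W = W_prev * pi(X_t|X_{1:t-1}) f(Y_t|X_t) / q(X_t|X_{1:t-1}).
k     = 1000;
xPrev = randn(1, k);                  % particles at time t-1
Wprev = ones(1, k) / k;               % their normalized weights
yt    = 2.3;                          % new observation (made-up value)

qMu   = 0.5 * xPrev;  qSd = 3;        % proposal q(. | X_{1:t-1}), wider than the prior
xNew  = normrnd(qMu, qSd);            % simulate the X's from q

priorD = normpdf(xNew, 0.5 * xPrev, 1);     % pi(X_t | X_{1:t-1})
likeD  = normpdf(yt, xNew, 1);              % f(Y_t | X_t)
propD  = normpdf(xNew, qMu, qSd);           % q(X_t | X_{1:t-1})

W = Wprev .* priorD .* likeD ./ propD;      % updated (unnormalized) weights
W = W / sum(W);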
Kernel Density Estimate-based Particle Filters
• Use a kernel (Gaussian or otherwise) as the proposal density q. This puts likelihood weights on the points selected by the prior:
f̂(x) = Σ_{l=1}^{k} L(y_{1:t} | x_{1:t}^{(l)}) K((x_{1:t}^{(l)} - x)' h (x_{1:t}^{(l)} - x)) / Σ_{l=1}^{k} L(y_{1:t} | x_{1:t}^{(l)}).
Doucet Particle Filters
• Doucet recommends using the kernel K as a proposal density. His weighting becomes
W_{1:n} = W_{1:n-1} · π(X_t | X_{1:t-1}) f(Y_t | X_t) / K(X_t | X_{1:t-1}).
Effective Sample Size
• The effective sample size of a weighted distribution is the effective number of unique particles. For normalized weights,
ESS = 1 / Σ_{i=1}^{k} (W_t^{(i)})²,
which equals k when the weights are uniform.
• When the ESS gets too small, we devise a weight threshold. A particle survives if:
• A) its weight is above the threshold, or
• B) it survives with probability (w/thresh).
• All rejected particles are restarted from time t = 0.
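A minimal Matlab sketch of the effective sample size and the survival rule above; the weights and the threshold value are illustrative.

% Effective sample size and threshold survival.
k      = 1000;
w      = rand(1, k).^3;                        % hypothetical raw weights
W      = w / sum(w);                           % normalized weights

ESS    = 1 / sum(W.^2);                        % effective sample size (= k for uniform weights)

thresh = 2 / k;                                % an illustrative weight threshold
keep   = (W >= thresh) | (rand(1, k) < W / thresh);  % survive if above the threshold,
                                               % otherwise survive with probability w/thresh
survivors = find(keep);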
Culling or Sampling Particles
• We can see that for non-bootstrap particle filters, particles tend to multiply beyond all restriction if no culling is used. Culling (or resampling) removes some (unlikely) particles so they don't multiply too rapidly.
• A) Residual resampling: At step t-1, extend each particle x_{1:t-1}^{(j)} via
x_t* ~ π(· | x_{1:t-1}^{(j)}, y_{1:t}),   with x_{1:t}^{(j)} = (x_{1:t-1}^{(j)}, x_t*);
reconstruct the weights for the new particles, and retain m^{(j)} = ⌊k w_{1:t}^{(j)}⌋ (j = 1, ..., k) copies of the particle x_{1:t}^{(j)}. Draw the remaining m_0 = k - m^{(1)} - ... - m^{(k)} from the particle stream. Residual resampling has the effect of killing particles with small weights and emphasizing particles with large weights.
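A minimal Matlab sketch of residual resampling for k particles with normalized weights W; the raw weights here are illustrative.

% Residual resampling: keep floor(k*W(j)) copies of particle j, then draw the
% remaining m0 particles using the residual weights.
k      = 1000;
w      = rand(1, k).^3;                        % hypothetical raw weights
W      = w / sum(w);

m      = floor(k * W);                         % guaranteed copies of each particle
m0     = k - sum(m);                           % residual draws still needed
resid  = k * W - m;                            % residual weights
residW = resid / sum(resid);

idx = repelem(1:k, m);                         % deterministic copies
if m0 > 0
    idx = [idx, randsample(k, m0, true, residW)'];   % residual draws
end
% idx indexes the resampled stream; the new weights are uniform (1/k).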
Cull Particles (continued)
Thus, a particle with weight .01 in a stream of size 50 is killed.
Project: Should you first 'extend' particles and then resample, or vice versa?
• B) Simple resampling: Extend particles as above. Sample particles from the full stream according to the new weights.
• This has the effect of 'keeping' particles with low weights.
• Thus, a particle with weight .01 in a stream of size 50 has a .01 chance of being selected (whereas for residual resampling it has a chance of 0).
Resampling (continued)
• C) General resampling: Define 'rescaled' probability weights (a^{(1)}, ..., a^{(k)}) (usually related to the w_t weights). Choose particles based on these weights and assign the 'new weights' w_t^{(*,j)} = w_t^{(j)} / a^{(j)} to the corresponding particles. Rescale the new weights.
• D) Effective sample size together with general sampling: Sample until the effective sample size falls below a threshold. Accept particles whose weights are bigger than an appropriate weight threshold c (i.e., the weight median). Accept particles whose weights w are below the threshold c with probability (w/c), and otherwise reject them.
• General resampling (continued): We can generalize residual sampling by choosing only large weights to resample from, and then resampling with scaled weights. Suppose some particles have different numbers of continuations than others. In this case we might want to select them with weighted probabilities that take this into account, i.e., make the a's larger for more continuations and smaller for fewer continuations. Once these continuations have been implemented, we reweight by w* = w/a.
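A minimal Matlab sketch of general resampling; the particular rescaling a used here (square-root weights) is an illustrative choice, not one prescribed in the slides.

% General resampling: select by the a's, then reweight by w/a and rescale.
k    = 1000;
w    = rand(1, k).^3;  w = w / sum(w);         % hypothetical normalized weights

a    = sqrt(w);  a = a / sum(a);               % rescaled selection probabilities
idx  = randsample(k, k, true, a)';             % select particles using the a's

wNew = w(idx) ./ a(idx);                       % new weights w* = w/a for the selected particles
wNew = wNew / sum(wNew);                       % rescale the new weights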
The Central Problem with Particle Filters
• At each time stage t we are trying to reconstruct the posterior distribution π(· | x_{1:t-1}^{(j)}, y_{1:t-1}), and we build future particles on this estimate. Mistakes at each stage amplify mistakes later on. The posterior reconstruction is typically done using weights. For this reason, there are many algorithms supporting ways to improve the posterior distribution reconstruction:
• A) Use kernel density estimators to reconstruct the posterior.
• B) Use Metropolis-Hastings to reconstruct the posterior.
• C) Divide particles into separate streams, reconstructing the posterior for each one separately.
Pitt-Shephard Particle Theory
• Suppose the new X's which are sampled a posteriori are heterogeneous, i.e., they can be divided into homogeneous streams. Call the streams s = 1, ..., k. Define, for each stream s, with Z_i = s denoting that X_i is in the s'th stream,
f(Y_{i,s} | θ_s) = L( Y_{i,s} | (1/n_s) Σ_{i: Z_i = s} X_i ),   i = 1, ..., n_s;  s = 1, ..., k.
• We then divide sampling into two parts. First we sample a stream using
π_s ∝ f(Y_{i,s} | θ_s),   s = 1, ..., k.
Pitt-Shephard (continued)
• Then, we sample within the chosen stream using the weights
w_i^{(s)} = f(Y_{i,s} | P_{i,s}) / f(Y_{i,s} | θ_s).
• Suppose we have very heterogeneous particles, and there is a way to divide them into homogeneous streams. Then this makes sense
• (e.g., mixture Kalman filter type examples).
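A rough Matlab sketch of the two-stage idea; this is an interpretation of the slides in which the stream-level likelihood is evaluated at the stream mean, and the particles, stream labels, observation, and Gaussian likelihood are all illustrative.

% Stage 1: sample a stream; Stage 2: sample a particle within the stream.
k = 1000;  S = 4;
X = randn(1, k) + repmat(0:3, 1, k/S);         % hypothetical heterogeneous particles
Z = repmat(1:S, 1, k/S);                       % stream label Z_i = s for particle i
y = 2.5;                                       % the observation (made-up value)

fS = zeros(1, S);
for s = 1:S
    fS(s) = normpdf(y, mean(X(Z == s)), 1);    % stream-level likelihood at the stream mean
end
pS = fS / sum(fS);
s0 = randsample(S, 1, true, pS);               % stage 1: pick a stream

inS = find(Z == s0);
wIn = normpdf(y, X(inS), 1) ./ fS(s0);         % stage 2 weights: particle likelihood / stream likelihood
wIn = wIn / sum(wIn);
i0  = inS(randsample(numel(inS), 1, true, wIn));   % the sampled particle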
Example of a Linear Dynamic Model
• In what follows we consider the following linear dynamic model:
X_t = .5 X_{t-1} + 5 + V_t;
Y_t = 2 X_t + W_t;
σ_v² = 9;  σ_w² = 4.
• We compute Λ_t = E(X_t² | Y).
• The Kalman filter gives us a solution to the posterior distribution. We can check this against the bootstrap and other filters.
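Since this model is linear and Gaussian, the exact filter is available in closed form. Here is a Matlab sketch of the Kalman recursion for this model, computing Λ_t = E(X_t² | Y) = m_t² + P_t along one simulated path; the initial moments (mean 5, variance 9) are taken to match the initialization used in the Matlab code at the end of these slides.

% Kalman filter for X_t = .5*X_{t-1} + 5 + V_t (var 9), Y_t = 2*X_t + W_t (var 4).
T = 500;
x = zeros(1, T);  y = zeros(1, T);
x(1) = 5 + 3*randn;  y(1) = 2*x(1) + 2*randn;
for t = 2:T                                    % simulate one path of the model
    x(t) = 0.5*x(t-1) + 5 + 3*randn;
    y(t) = 2*x(t) + 2*randn;
end

m = 5;  P = 9;                                 % assumed initial mean and variance of X_1
Lambda = zeros(1, T);
for t = 1:T
    if t > 1
        m = 0.5*m + 5;                         % predict: prior mean of X_t
        P = 0.25*P + 9;                        % predict: prior variance of X_t
    end
    S = 4*P + 4;                               % predictive variance of Y_t
    K = 2*P / S;                               % Kalman gain
    m = m + K*(y(t) - 2*m);                    % posterior mean given Y_{1:t}
    P = (1 - 2*K)*P;                           % posterior variance given Y_{1:t}
    Lambda(t) = m^2 + P;                       % Lambda_t = E(X_t^2 | Y_{1:t})
end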
Comparing bootstrap particles with the real thing: computing Λ_t over 500 time periods.
Histogram of the difference between bootstrap particle and real parameters.
A More Complicated Example of a Dynamic Model
• In what follows we consider the following quadratic dynamic models:
X_t = .5 X_{t-1} + 5 + κ_i X_{t-1}² + V_t;
Y_t = 2 X_t + W_t;
κ_1 = .005;  κ_2 = .01;
σ_v² = 9;  σ_w² = 4.
• The Kalman filter again gives us stepwise solutions to the posterior distribution. We can check this against the bootstrap and other filters.
Quadratic model for κ_1 = .005: bootstrap versus real parameters.
Absolute difference between the bootstrap particle filter and the real parameter for κ_1 = .005.
Switch to Harder Model
• For the second quadratic model, when κ_2 = .01, the bootstrap particle filter breaks down entirely before time t = 50.
• We switch to residual resampling.
Residual resampling versus real parameters when κ_2 = .01.
Histogram: note how much larger the differences are in this case.
Switch to General Sampling
• Now we use residual sampling as long as the effective sample size stays above 500.
• When it falls below 500, we keep only those particles:
• A) with weights above .002, or
• B) accepted with probability (w/.002).
General sampling versus real parameters for the quadratic model: coefficient = .01.
Histogram of the differences between general sampling and real parameters: note how small the differences are.
Mixture Kalman Filters: Tracking Models
• Define models which have a switching mechanism P(KF_i | KF_j) in going from Kalman filter KF_j to KF_i (i, j = 1, ..., d). Updates use standard Kalman filters for a given model KF_j and then, with probability P(KF_i | KF_j), switch to filter KF_i.
KF_i :  Y_t = a_i X_t + b_i + σ_i Z_t ;   X_t = c_i X_{t-1} + d_i + τ_i W_t ,   i = 1, ..., k.
Kalman Filters
• We have the updating formula
P(Y_t | I_t = r) = ∫ P(I_t = r) P(X_t | X_{t-1}, I_t = r) P(Y_t | X_t, I_t = r) dX_t,
which can generate particles consisting of Kalman filters.
• We can use Pitt-Shephard to sample these models. In effect, we divide the Kalman filters into homogeneous groups, and then sample them as separate streams.
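A Matlab sketch of this update weight in the linear-Gaussian case, where the integral is available in closed form; the regime parameters, switching matrix, previous posterior moments, observation, and the use of P(I_t = r | I_{t-1}) for P(I_t = r) are all illustrative assumptions.

% Closed-form P(Y_t | I_t = r) for regimes Y_t = a_r*X_t + b_r + sig_r*Z_t,
% X_t = c_r*X_{t-1} + d_r + tau_r*W_t.
a = [2 1];    b = [0 1];    sig = [2 1];       % observation equations for regimes r = 1,2
c = [.5 .9];  d = [5 0];    tau = [3 1];       % state equations for regimes r = 1,2
Pswitch = [.9 .1; .2 .8];                      % P(I_t = r | I_{t-1} = j)

mPrev = 8;  vPrev = 4;  jPrev = 1;             % previous posterior mean/variance and regime
yt = 18;                                       % the new observation (made-up value)

wr = zeros(1, 2);
for r = 1:2
    mPred = c(r)*mPrev + d(r);                 % prior mean of X_t under regime r
    vPred = c(r)^2*vPrev + tau(r)^2;           % prior variance of X_t under regime r
    yMean = a(r)*mPred + b(r);                 % predictive mean of Y_t
    yVar  = a(r)^2*vPred + sig(r)^2;           % predictive variance of Y_t
    wr(r) = Pswitch(jPrev, r) * normpdf(yt, yMean, sqrt(yVar));  % switch prob. times P(Y_t | I_t = r)
end
wr = wr / sum(wr);                             % probabilities of switching to each regime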
MKF's
• Mixture Kalman filters are useful for tracking. The Kalman filters represent different objects at a particular location. The probabilities represent the chance that one object, rather than the others, is actually visible, subject to the usual occlusion. The real posterior distribution for MKF's is easy to calculate.
Matlab Code: Part I
% Particle Model (the linear dynamic model above):
%   X[t] = .5*X[t-1] + 5 + 3*V[t],  V[t] ~ N(0,1)
%   Y[t] = 2*X[t] + 2*W[t],         W[t] ~ N(0,1)
XP=zeros(100,1000);                                % draws from the exact (Kalman) posterior
X(1,1:1000)=5+3*normrnd(0,1,1,1000);               % simulate states at time 1 (prior N(5,9))
Y(1,1:1000)=2*X(1,1:1000)+2*normrnd(0,1,1,1000);   % simulate observations at time 1
W=normpdf(Y(1,:),2*X(1,:),2*ones(1,1000));         % likelihood weights (observation sd = 2)
WW=W/sum(W);                                       % normalize the weights
XBP(1,:)=randsample(X(1,:),1000,true,WW);          % bootstrap particles: resample by weight
mm=((2*Y(1,:)/4)+(5/9))/(1+(1/9));                 % exact posterior mean given Y(1)
sg=sqrt(1/(1+(1/9)));                              % exact posterior standard deviation
XP(1,:)=normrnd(mm,sg*ones(1,1000));               % draws from the exact posterior at time 1
Matlab Code: Part II
for tim=2:100
    for jj=1:1000
        X(tim,jj)=.5*X(tim-1,jj)+5+3*randn;        % propagate each simulated state path
        Y(tim,jj)=2*X(tim,jj)+2*randn;             % observation for each path
    end
    W=normpdf(Y(tim,:),2*X(tim,:),2*ones(1,1000)); % likelihood weights (observation sd = 2)
    WW=W/sum(W);
    XBP(tim,:)=randsample(X(tim,:),1000,true,WW);  % bootstrap particles at time tim
    mm=((2*Y(tim,:)/4)+(5/9)+.5*X(tim-1,:)/9)/(1+(1/9)); % exact posterior mean given X(tim-1), Y(tim)
    sg=sqrt(1/(1+(1/9)));                          % exact posterior standard deviation
    XP(tim,:)=normrnd(mm,sg*ones(1,1000));         % draws from the exact conditional posterior
end