
Introduction to Monte Carlo Methods
D.J.C. MacKay
Rasit Onur Topaloglu
Ph.D. candidate
[email protected]
Outline
•Problems Monte Carlo (MC) deals with
•Uniform Sampling
•Importance Sampling
•Rejection Sampling
•Metropolis
•Gibbs Sampling
•Speeding up MC : Hybrid MC and Over-relaxation
Definition of Problems
Problem 1: Generate samples from a given distribution
Problem 2: Estimate the expectation of a function under a given
probability distribution for a variable
Monte Carlo for High Dimensions
•Solving the first problem => solving the second one
•Just evaluate the function at the samples and average them out (see the
sketch below)
•The accuracy is independent of the dimensionality of the space sampled
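A minimal sketch of this estimator: both the target distribution (a 10-dimensional standard Gaussian) and the function phi are illustrative choices, not from the slides.

```python
import numpy as np

# Estimate E[phi(x)] under P by averaging phi over samples drawn from P.
rng = np.random.default_rng(0)

def phi(x):
    return np.sum(x**2, axis=-1)          # illustrative function phi(x) = ||x||^2

samples = rng.standard_normal((100_000, 10))   # R samples from a 10-D Gaussian P(x)
estimate = phi(samples).mean()                 # (1/R) * sum_r phi(x^(r))
print(estimate)   # ~10; the error shrinks as 1/sqrt(R), independent of dimension
```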
Sampling from P(x)
•Assume we can evaluate P(x) only within a multiplicative constant:
P(x) = P*(x) / Z
•Even if P*(x) can be evaluated, this is still a hard problem because:
• Z is not known
•It is not easy to draw samples in high dimensions (except for the Gaussian)
ex: A sample from a univariate Gaussian can be calculated as:
x = sqrt(2 * ln(1/u1)) * cos(2*pi*u2),
where u1 and u2 are uniform in [0,1] (sketched below)
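A short sketch of the Box-Muller style transform stated above: two uniform random numbers give one standard Gaussian sample.

```python
import numpy as np

# Two uniforms in [0,1] -> one standard Gaussian sample.
rng = np.random.default_rng(0)

def gaussian_sample():
    u1, u2 = rng.uniform(size=2)
    return np.sqrt(2.0 * np.log(1.0 / u1)) * np.cos(2.0 * np.pi * u2)

x = np.array([gaussian_sample() for _ in range(50_000)])
print(x.mean(), x.var())   # approximately 0 and 1
```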
Difficulty of Sampling at High
Dimensions
•Can discretize the function (figure on the right) and sample from the
discrete approximation
•This is costly at high dimensions: B^N points for B bins and N dimensions
Uniform Sampling
•Tries to solve the 2nd problem
•Draw R samples uniformly from the state space
•Z_R is the normalizing constant:
Z_R = sum_r P*(x^(r))
•Estimate the expectation by:
Φ̂ = sum_r φ(x^(r)) P*(x^(r)) / Z_R
•For distributions whose probability is concentrated in a small region, a huge
number of points must be sampled before φ(x) is evaluated in that region a
reasonable number of times => requires lots of samples
•Thus, uniform sampling is seldom useful
Importance Sampling
•Tries to solve the 2nd problem
•Introduce a simpler density Q that we can sample from and evaluate (up to a
constant)
•Values of x where Q(x)>P(x) are over-represented; values of x where
Q(x)<P(x) are under-represented => introduce weights:
w_r = P*(x^(r)) / Q*(x^(r)),   Φ̂ = sum_r w_r φ(x^(r)) / sum_r w_r
(see the sketch below)
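A minimal importance-sampling sketch using the weights above; the unnormalized target P*(x) (a standard Gaussian), the wider Gaussian proposal Q, and phi(x) = x^2 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):                 # unnormalized target P*(x)
    return np.exp(-0.5 * x**2)

def q_star(x, scale=3.0):      # unnormalized proposal Q*(x), a wider Gaussian
    return np.exp(-0.5 * (x / scale) ** 2)

x = rng.normal(scale=3.0, size=100_000)     # samples drawn from Q
w = p_star(x) / q_star(x)                   # importance weights w_r
phi_hat = np.sum(w * x**2) / np.sum(w)      # weighted estimate of E[phi(x)]
print(phi_hat)                              # ~1, the variance of the target
```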
Reliability of Importance Sampling
[Figure: the estimate Φ̂ vs. the number of samples, for a Gaussian importance
sampler and a Cauchy importance sampler]
•An importance sampler should have heavy tails in problems where infrequent
samples might be influential
Rejection Sampling
•Again, a proposal density Q is assumed
•Also assume we know a constant c such that c*Q*(x) >= P*(x) for all x
•Generate x using Q*(x)
•Draw u uniformly from the interval [0, c*Q*(x)]
•If u <= P*(x), accept and add x to the list of random numbers (see the
sketch below)
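A rejection-sampling sketch of the steps above; the bimodal target P*(x), the Gaussian proposal, and the constant c are illustrative assumptions (c is chosen so that c*Q*(x) >= P*(x) everywhere).

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):                      # unnormalized bimodal target
    return np.exp(-0.5 * (x - 2) ** 2) + np.exp(-0.5 * (x + 2) ** 2)

def q_star(x, scale=4.0):           # unnormalized Gaussian proposal Q*(x)
    return np.exp(-0.5 * (x / scale) ** 2)

c = 12.0                            # satisfies c * Q*(x) >= P*(x) for all x here

accepted = []
while len(accepted) < 10_000:
    x = rng.normal(scale=4.0)               # draw x from Q
    u = rng.uniform(0.0, c * q_star(x))     # u uniform in [0, c*Q*(x)]
    if u <= p_star(x):                      # accept if u falls under P*(x)
        accepted.append(x)
```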
Transition to Markov Chain MC
•Importance and rejection sampling work well if Q is
similar to P
•In complex problems, it is difficult to find a single such
Q over the state space
Metropolis Method
•The proposal density Q depends on the current state x(t): Q(x'; x(t))
•Compute the acceptance ratio:
a = [P*(x') Q(x(t); x')] / [P*(x(t)) Q(x'; x(t))]
Accept the new state if a >= 1;
else, accept with probability a
•In comparison to rejection sampling, rejected points are not discarded: the
current state is repeated, so consecutive samples are correlated
=> Metropolis may have to be run longer to generate
independent samples from P(x)! (see the sketch below)
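A Metropolis sketch with a symmetric Gaussian proposal, for which Q(x(t); x') = Q(x'; x(t)) and the ratio reduces to a = P*(x')/P*(x(t)); the target and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(x):                          # unnormalized target density
    return np.exp(-0.5 * (x - 2) ** 2) + np.exp(-0.5 * (x + 2) ** 2)

eps = 1.0                               # proposal step size
x = 0.0                                 # initial state x^(0)
chain = []
for _ in range(50_000):
    x_new = x + eps * rng.standard_normal()   # propose x' ~ Q(x'; x)
    a = p_star(x_new) / p_star(x)             # acceptance ratio (symmetric Q)
    if a >= 1 or rng.uniform() < a:           # accept with probability min(1, a)
        x = x_new
    chain.append(x)                           # rejected moves repeat the state
```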
Disadvantages of Large Step Size
•In high dimensions, the proposal Q typically uses a length scale ε much
smaller than the size L of the state space
•Because:
•Large steps are highly unlikely to be accepted
=> Limited or no movement in space! => biased estimates
A Lower Bound for Independent
Samples
•Metropolis explores the space by a random walk
•Random walks take a long time
•After T steps of size ε, the state has typically moved only about ε*sqrt(T)
•As Monte Carlo is trying to achieve independent samples, it requires
~(L/ε)^2 steps to get a sample independent of the initial condition
•This rule of thumb can be used as a lower bound
An Example using this Bound
[Figure: state vs. time, shown at the 100th, 400th and 1200th iterations]
•Takes ~10^2 = 100 steps to reach an end state
•Metropolis still provides biased estimates even after a large number of
iterations
Gibbs Sampling
•As opposed to the previous methods, a distribution of at least 2 dimensions
is required
•Q is defined in terms of the conditional distributions of the joint
distribution P(x)
•The assumption is that P(x) is too complex to evaluate directly, but
P(x_i | {x_j}, j ≠ i) is tractable
Gibbs Sampling on an Example
•Start with x=(x1,x2); fix x2(t) and sample x1 from P(x1|x2)
•Then fix x1 and sample x2 from P(x2|x1) (see the sketch below)
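A Gibbs-sampling sketch for a bivariate Gaussian, where both conditionals are known Gaussians; the zero-mean, unit-variance target with correlation rho = 0.9 is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.9

x1, x2 = 0.0, 0.0
samples = []
for _ in range(10_000):
    # sample x1 from P(x1 | x2) = N(rho * x2, 1 - rho^2), keeping x2 fixed
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal()
    # sample x2 from P(x2 | x1) = N(rho * x1, 1 - rho^2), keeping x1 fixed
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal()
    samples.append((x1, x2))
```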
Gibbs Sampling with K Variables
•In comparison to Metropolis, every proposal is always accepted
•Each variable is updated in turn from its conditional:
x_i^(t+1) ~ P(x_i | x_1^(t+1), .., x_{i-1}^(t+1), x_{i+1}^(t), .., x_K^(t))
•In bigger models, groups of variables are jointly sampled
Comparison of MC Methods in
High Dimensions
•Importance and rejection sampling result in very large weights and constants
respectively, leading to inaccurate or lengthy simulations => not practical
•Metropolis requires at least ~(σ_max/σ_min)^2 steps to acquire independent
samples => might be lengthy
•Gibbs sampling has properties similar to Metropolis, but has no adjustable
parameters => most practical
Practical Questions About Monte
Carlo
•Can we predict how long it takes to reach equilibrium?
Use the simple bound proposed above
•Can we determine convergence in a running simulation?
Yet another difficult problem
•Can we speed up the convergence time and the time
between independent samples?
Reducing Random Walk in
Metropolis : Hybrid MC
•Most probability distributions can be written in the form:
P(x) = e^(−E(x)) / Z
•Introduce a momentum variable p with kinetic energy K(p) = p·p / 2
•Create (asymptotically) samples from the joint distribution:
P_H(x, p) = e^(−[E(x) + K(p)]) / Z_H
•Pick p at random from the Gaussian e^(−K(p)) / Z_K
•Update x and p by following the Hamiltonian dynamics (the gradient of E
drives p, and p drives x), followed by an accept/reject step
•The distance traversed then grows linearly with the number of steps, rather
than as the square root, as in a random walk (see the sketch below)
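A minimal hybrid (Hamiltonian) Monte Carlo sketch for a Gaussian target with E(x) = 0.5·xᵀAx; the matrix A, step size, and trajectory length are illustrative assumptions, and the gradient is computed analytically.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[250.25, -249.75],
              [-249.75, 250.25]])        # strongly correlated Gaussian target

def energy(x):                            # E(x)
    return 0.5 * x @ A @ x

def grad(x):                              # dE/dx
    return A @ x

def hmc_step(x, eps=0.05, n_leapfrog=20):
    p = rng.standard_normal(x.shape)      # pick p from N(0, I), i.e. e^(-K(p))
    x_new, p_new = x.copy(), p.copy()
    p_new -= 0.5 * eps * grad(x_new)      # leapfrog: half step in p
    for _ in range(n_leapfrog - 1):
        x_new += eps * p_new              # full step in x
        p_new -= eps * grad(x_new)        # full step in p
    x_new += eps * p_new
    p_new -= 0.5 * eps * grad(x_new)      # final half step in p
    # accept/reject on the total energy H = E(x) + K(p)
    dH = (energy(x_new) + 0.5 * p_new @ p_new) - (energy(x) + 0.5 * p @ p)
    accept = (dH <= 0) or (rng.uniform() < np.exp(-dH))
    return x_new if accept else x

x = np.zeros(2)
chain = []
for _ in range(1_000):
    x = hmc_step(x)
    chain.append(x)
```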
An Illustrative Example for Hybrid MC
[Figure: trajectories of hybrid Monte Carlo vs. random-walk Metropolis]
•Over a number of iterations, the hybrid trajectories indicate less
correlated samples
Reducing Random Walk in
Gibbs : Over-relaxation
•Use the former value x(t) as well when calculating x(t+1); for a Gaussian
conditional with mean μ and standard deviation σ:
x(t+1) = μ + α (x(t) − μ) + (1 − α^2)^(1/2) σ ν,   ν ~ Normal(0,1),  −1 < α < 0
•Suitable for speeding up the process when variables are highly correlated
•Useful for Gaussians; not straightforward for other types of conditional
distributions (see the sketch below)
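An over-relaxation sketch for the same bivariate Gaussian as in the Gibbs example, using the update above; rho and alpha are illustrative assumptions, and alpha = 0 recovers plain Gibbs sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, alpha = 0.998, -0.98

def overrelaxed_update(x_old, cond_mean, cond_std):
    # x(t+1) = mu + alpha*(x(t) - mu) + sqrt(1 - alpha^2) * sigma * nu
    nu = rng.standard_normal()
    return cond_mean + alpha * (x_old - cond_mean) + np.sqrt(1 - alpha**2) * cond_std * nu

x1, x2 = 0.0, 0.0
samples = []
for _ in range(40):                                # 40 iterations, as in the figure
    x1 = overrelaxed_update(x1, rho * x2, np.sqrt(1 - rho**2))   # P(x1|x2)
    x2 = overrelaxed_update(x2, rho * x1, np.sqrt(1 - rho**2))   # P(x2|x1)
    samples.append((x1, x2))
```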
An Illustrative Example for
Over-Relaxation
•State space of a bivariate Gaussian, sampled for 40 iterations
•The over-relaxation samples cover the state space better
Reducing Random Walk: Simulated
Annealing
•Introduce a parameter T (temperature) and gradually reduce it to 1:
P_T(x) ∝ e^(−E(x)/T)
•A high T makes transitions easier (the distribution is flatter)
•As opposed to its use in optimization, T is not
reduced to 0 but to 1 (see the sketch below)
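A sketch of simulated annealing layered on the Metropolis sampler, lowering T gradually towards 1 rather than 0; the energy function and cooling schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):                              # E(x) for a well-separated bimodal target
    return min((x - 10) ** 2, (x + 10) ** 2) / 2.0

x = 0.0
samples = []
for t in range(20_000):
    T = max(1.0, 50.0 * 0.999 ** t)         # temperature schedule, ending at T = 1
    x_new = x + rng.standard_normal()       # Metropolis proposal
    a = np.exp(-(energy(x_new) - energy(x)) / T)
    if rng.uniform() < a:
        x = x_new
    if T == 1.0:
        samples.append(x)                   # keep only samples collected at T = 1
```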
Applications of Monte Carlo
Differential Equations
Ex: Steady-state temperature distribution of an annulus, governed by Laplace's
equation: ∂²T/∂x² + ∂²T/∂y² = 0
The finite difference approximation using grid dimension h:
T(x,y) = 1/4 [T(x+h,y) + T(x−h,y) + T(x,y+h) + T(x,y−h)]
•With 1/4 probability, one of the neighbouring states is selected (a random walk)
•The value found when a boundary is reached is kept; the mean of many such runs
gives the solution (see the sketch below)
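A random-walk sketch for Laplace's equation on a square grid; the grid size and the boundary condition (top edge held at 1, the other edges at 0) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                                        # grid indices run from 0 to N

def boundary_value(i, j):
    return 1.0 if i == 0 else 0.0             # top edge at temperature 1, rest at 0

def solve_point(i0, j0, n_walks=2_000):
    total = 0.0
    steps = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])
    for _ in range(n_walks):
        i, j = i0, j0
        while 0 < i < N and 0 < j < N:        # walk until a boundary is reached
            di, dj = steps[rng.integers(4)]   # each neighbour chosen with prob. 1/4
            i, j = i + di, j + dj
        total += boundary_value(i, j)         # keep the boundary value
    return total / n_walks                    # mean over runs approximates T(i0, j0)

print(solve_point(10, 10))                    # centre of the square, ~0.25
```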
Applications of Monte Carlo
Integration
To evaluate an integral:
•Bound the function with a box
•The ratio of points under the function to all points within the box gives
the ratio of the area under the function to the area of the box (see the
sketch below)
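A hit-or-miss integration sketch of the idea above; f(x) = sin(x) on [0, π] bounded by a box of height 1 is an illustrative choice (exact integral: 2).

```python
import numpy as np

rng = np.random.default_rng(0)

a, b, h = 0.0, np.pi, 1.0                    # box bounding f on [a, b]
n = 200_000
x = rng.uniform(a, b, size=n)
y = rng.uniform(0.0, h, size=n)
under = np.mean(y <= np.sin(x))              # fraction of points under the curve
integral = under * (b - a) * h               # ratio of areas times box area
print(integral)                              # ~2
```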
Applications of Monte Carlo
Image Processing :Automatic eye-glass removal:
•MCMC is used instead of gradient-based methods to solve the MAP criterion
used to locate the eyeglass points
[C. Wu, C. Liu, H.-Y. Shum, Y.-Q. Xu and Z. Zhang, "Automatic Eyeglasses Removal from Face
Images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 26, No. 3, pp. 322-336,
Mar. 2004]
Applications of Monte Carlo
Image segmentation:
•Data-driven techniques such as edge detection, tracing and clustering are
combined with MCMC to speed up the search
[Z. Tu and S.-C. Zhu, "Image Segmentation by Data-Driven Markov Chain Monte Carlo", IEEE
Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 657-673, May 2002]
Forward Discrete Probability Propagation
Rasit Onur Topaloglu
Ph.D. candidate
[email protected]
The Problem
Lowest level: physical parameters
High level: circuit parameters, ex: gm = √(2·k·Id)
•The tree relates physical parameters to circuit parameters
•Structured according to the SPICE formula hierarchy
•Given pdf's for the lowest level parameters, find the pdf's at the highest
level
Motivation for Probability Propagation
•Estimation of the distributions of high level parameters is needed to
examine the effects of process variations
•The Gaussian assumption attributed to these parameters is no longer
accurate in current technologies
Find a novel propagation method
GOALS
Determinism : obtain the stochastic output deterministically, using known formulas
Algebraic tractability : enabling manual applicability
Speed & accuracy : be comparable to or outperform Monte Carlo
and other parametric methods
Parametric Belief Propagation
Calculations are handled at each node:
•Each node receives and sends messages to its parents and children until
equilibrium
•Parent to child (π) : causal information
•Child to parent (λ) : diagnostic information
Parametric Belief Propagation
•When the arrows in the hierarchy tree indicate linear addition operations on
Gaussians, analytic formulations are possible
•Not straightforward for other or non-standard distributions
Shortcomings of Monte Carlo
•Non-determinism : not manually applicable
•Limited to certain distributions : random number generators in most CAD
packages only provide certain distributions
•Accuracy : may miss points that are less likely to occur due to random
sampling unless a very large number of samples is used;
limited by the performance of the random number generator
Monte Carlo – FDPP Comparison
one-to-many relationships and custom pdf's
•Non-standard pdf's are not possible without a custom
random number generator
•Monte Carlo overestimates in one-to-many relationships, as the
same sample is used along each branch
[Figure: hierarchy with nodes P1, P2, P3, P4 illustrating a one-to-many
relationship]
Operations Necessary to Implement
FDPP
Analytic operations on continuous distributions are difficult; instead,
operations on discrete distributions are implemented:
•F (Forward) : given a function, estimates the distribution of the
next node in the formula hierarchy
•Q (Quantize) : discretizes a pdf in order to operate on its samples
•B (Band-pass) : decreases the number of samples for
computational efficiency
•R (Re-bin) : reduces the number of samples for
computational efficiency
Necessary Operators (Q, F, B, R) on a
Hierarchical Tree
[Figure: hierarchical tree in which the lowest-level parameters T and NSUB are
quantized (Q), combined through F into PHIf, then band-passed and re-binned (B, R)]
•Repeated until we acquire the high level distribution (ex. G)
Probability Discretization Theory:
QN Operator; p and r domains
pdf(X) lives in the p-domain; spdf(X), also written ρ(X), lives in the r-domain
Certain operators are easier to apply in the r-domain
ρ(X) = Q_N(pdf(X))
•Q_N band-pass filters pdf(X) and divides it into bins
•N in Q_N indicates the number of bins
Characterizing an spdf
•We can write spdf(X), in the r-domain, as:
ρ(X) = Σ_{i=1..N} p_i δ(x − w_i)
where:
p_i = ∫_{m+(i−1)Δ}^{m+iΔ} pdf(X) dx      p_i : probability of the i'th impulse
w_i = m + (i − 1/2) Δ                    w_i : value of the i'th impulse
F Operator
•The F operator implements a function over spdf's:
ρ(Y) = F(ρ(X_1), .., ρ(X_r))        X_i, Y : random variables
•The function is applied to the individual impulses
•The individual probabilities are multiplied:
ρ(Y) = Σ_{s_1,..,s_r} p^{X_1}_{s_1} · .. · p^{X_r}_{s_r} δ(y − f(w^{X_1}_{s_1}, .., w^{X_r}_{s_r}))
where each s_i ranges over the set of all samples (impulses) belonging to X_i
Band-pass, Be, Operator
ρ'(X) = B_e(ρ(X))
Margin-based definition:
•Eliminate samples having values out of range:
ρ'(X) = Σ_{i : (w_i ∈ [m, n]) ∧ (w_i ∈ ρ(X))} p_i δ(x − w_i)
Error-based definition:
•Eliminate the samples with probabilities least likely to occur:
ρ'(X) = Σ_{i : (p_i > max_i(p_i)/e) ∧ (p_i ∈ ρ(X))} p_i δ(x − w_i)
Re-bin, RN, Operator
[Figure: impulses after F are united into Δ-wide bins, giving the resulting spdf(X)]
ρ'(X) = R_N(ρ(X))
•Samples falling into the same bin are congregated into one impulse:
ρ'(X) = Σ_i p_i δ(x − w_i)    where: p_i = Σ_{s_j s.t. w_j ∈ b_i} p_j
and w_i is the center of bin b_i
Error Analysis
•If the quantizer is uniform and Δ is small, the quantization error random
variable Q is uniformly distributed
Variance of the quantization error:
σ_Q² = E[Q²] = ∫_{−Δ/2}^{Δ/2} q² pdf(Q) dq = Δ²/12
Distortion caused by representing the samples in a bin by a single sample:
d(m_i, w_j) = (m_i − w_j)²        m_i : center of the i'th bin
Total distortion:
D = Σ_{i, j : (w_j ∈ b_i)} d(m_i, w_j) p_j
Algorithm Implementing the F Operator
Until each random variable has its spdf computed:
    For each r.v. which has all ancestor spdf's computed:
        For each sample in X1:
            ...
                For each sample in Xr:
                    Place an impulse with height p1·..·pr at x = f(v1,..,vr)
        Apply the B and R algorithms to this r.v.
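A Python sketch of the F operator over spdf's represented as (values, probabilities) arrays; the function name, the data layout, and the example inputs are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import product

def f_operator(f, *spdfs):
    """Apply f to every combination of one impulse from each input spdf.

    Each spdf is a (values, probabilities) pair of 1-D arrays.  The output
    impulse is placed at f(w1, .., wr) with height p1 * .. * pr.
    """
    values, probs = [], []
    for combo in product(*[zip(w, p) for (w, p) in spdfs]):
        ws = [w for (w, _) in combo]
        ps = [p for (_, p) in combo]
        values.append(f(*ws))                 # impulse location y = f(w1, .., wr)
        probs.append(np.prod(ps))             # impulse height p1 * .. * pr
    return np.array(values), np.array(probs)

# Example: Y = X1 * X2, with each input given as a small spdf.
spdf_x1 = (np.array([1.0, 2.0]), np.array([0.5, 0.5]))
spdf_x2 = (np.array([3.0, 4.0, 5.0]), np.array([0.2, 0.6, 0.2]))
y_vals, y_probs = f_operator(lambda a, b: a * b, spdf_x1, spdf_x2)
```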
Algorithm for the B and R Operators
Find the maximum and minimum values wi among the impulses
Divide this range into M bins
For each bin:
    Place a quantizing impulse at the center of the bin with a height pi equal
    to the sum of all impulses within the bin
Find the maximum probability, pi-max, of the quantized impulses within the bins
Eliminate impulses within bins which have a quantized impulse with smaller
probability than error-rate * pi-max
Find the new maximum and minimum values wi among the impulses
Divide this range into N bins
For each bin:
    Place an impulse at the center of the bin with height equal to the sum of
    all impulses within the bin
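A simplified Python sketch of the band-pass (B, error-based) and re-bin (R) operators on an spdf given as (values, probabilities) arrays; for brevity the probability threshold is applied to individual impulses rather than to the intermediate M-bin quantization, and the function names and parameters are illustrative assumptions.

```python
import numpy as np

def band_pass(values, probs, error_rate):
    """Drop the impulses least likely to occur (error-based B operator)."""
    keep = probs >= error_rate * probs.max()
    return values[keep], probs[keep]

def re_bin(values, probs, n_bins):
    """Congregate impulses falling into the same bin into one impulse (R operator)."""
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    new_probs = np.zeros(n_bins)
    np.add.at(new_probs, idx, probs)           # sum impulse heights per bin
    keep = new_probs > 0
    return centers[keep], new_probs[keep]      # one impulse per non-empty bin
```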
Monte Carlo – FDPP Comparison
[Figure: pdf of Vth and pdf of ID; solid: FDPP, dotted: Monte Carlo]
•A close match is observed after interpolation
Monte Carlo – FDPP Comparison with
a Low Sample Number
Pdf of F
Pdf of F
solid : FDPP,100 samples
noisy : Monte Carlo, 1000 and 100000 samples respectively
•Monte Carlo inaccurate for moderate number of samples
•Indicates FDPP can be manually applied without major
accuracy degradation
Monte Carlo – FDPP Comparison
Benchmark example
[Figure: pdf of n7; solid: FDPP, dotted: Monte Carlo, triangles: belief
propagation]
•Edges define a linear sum, ex: n5 = n2 + n3
Faulty Application of Monte Carlo
Benchmark example
[Figure: pdf of n7; solid: FDPP, dotted: Monte Carlo, triangles: belief
propagation]
•Distributions at the internal nodes n4, n5, n6 should be re-sampled when
using Monte Carlo
•Monte Carlo is not optimal for internal nodes with non-standard
distributions
Conclusions
•Forward Discrete Probability Propagation is introduced as
an alternative to Monte Carlo based methods
•FDPP should be preferred when low-probability samples are important,
algebraic intuition is needed, non-standard pdf's are present, or
one-to-many relationships are present