The Monte Carlo method
Download
Report
Transcript The Monte Carlo method
The Monte Carlo method
1
(1,1)
(-1,1)
(0,0)
1
Z=
0
1
(-1,-1)
If X2+Y21
o/w
(1,-1)
• (X,Y) is a point chosen uniformly at random
in a 22 square centered at the origin (0,0).
• P(Z=1)=/4.
2
Assume we run this experiment m times, with
Zi being the value of Z at the ith run.
m
.
If W=iZi, then E[W ] E[ Z i ] E[ Z i ]
4
i 1
i 1
m
m
W
W’=4W/m is an estimate of .
m 4
3
By Chernoff bound,
m m
Pr[| W ' | ] Pr[| W
|
]
4
4
Pr[| W E[W ] | E[W ]]
2e
m 2 / 12
.
Def: A randomized algorithm gives an (,)approximation for the value V if the output X
of the algorithm satisfies Pr[|X-V|V] 1-.
4
The above method for estimating gives an
(,)-approximation, as long as <1 and m
large enough.
12 ln (2 / )
m / 12
2e
m
.
2
2
5
Thm1: Let X1,…,Xm be independent and
identically distributed indicator random
variables, with =E[Xi]. If m (3ln(2/))/2,
m
1
then Pr[|
X | ] .
m
i 1
i
I.e. m samples provide an (,)-approximation
for .
Pf: Exercise!
6
Def: FPRAS: fully polynomial randomized
approximation scheme.
A FPRAS for a problem is a randomized
algorithm for which, given an input x and any
parameters and with 0<,<1, the
algorithm outputs an (,)-approximation to
V(x) in time poly(1/,ln-1,|x|).
7
Def: DNF counting problem: counting the
number of satisfying assignments of a
Boolean formula in disjunctive normal form
(DNF).
Def: a DNF formula is a disjunction of clauses
C1C2… Ct, where each clause is a
conjunction of literals.
Eg.
(x1x2x3)(x2x4)(x1x3x4)
8
Counting the number of satisfying
assignments of a DNF formula is actually #Pcomplete.
Counting the number of Hamiltonian cycles in
a graph and counting the number of perfect
matching in a bipartite graph are examples of
#P-complete problems.
9
A naïve algorithm for DNF counting problem:
Input: A DNF formula F with n variables.
Output: Y = an approximation of c(F)
1. X0.
The number of satisfying
Assigments of F.
2. For k=1 to m, do:
(a) Generate a random assignment for the
n variables, chosen uniformly at random.
(b) If the random assignment satisfies F,
then X X+1.
3. Return Y (X/m)2n.
10
Analysis
1
Xk=
If the k-th iteration in the algorithm
generated a satisfying assignment;
o/w.
0
Pr[Xk=1]=c(F)/2n.
m
Let X= X k and then E[X]=mc(F)/2n.
k 1
E[ X ]2n
E[Y ]
c( F ).
m
11
Analysis
By Theorem 1, X/m gives an (,)approximation of c(F)/2n, and hence Y gives
an (,)-approximation of c(F), when m
32nln(2/))/2c(F).
If c(F) 2n/poly(n), then this is not too bad,
m is a poly.
But, if c(F)=poly(n), then m=O(2n/c(F))!
12
Analysis
Note that if Ci has li literals then there are
exactly 2n-li satisfying assignments for Ci.
Let SCi denote the set of assignments that
satisfy clause i.
U={(i,a):
1itt and aSCi}.
t
nl
|
SC
|
2
|U|= i
t
i 1
i 1
Want to estimate c( F ) | SCi | .
i 1
Define S={(i,a): 1it and aSCi, aSCj for j<i}.
i
13
DNF counting algorithm II:
1.
2.
3.
Input: A DNF formula F with n variables.
Output: Y: an approximation of c(F).
X0.
For k=1 to m do:
t
(a) With probability | SCi | | SCi | choose,
i 1
uniformly at random an assignment aSCi.
(b) If a is not in any
SCj, j<i, then XX+1.
t
Return Y ( X / m) | SCi |.
i 1
14
DNF counting algorithm II:
Note that |U|t|S|. Why? t
Let Pr[i is chosen]=| SCi | | SCi | | SCi | | U |
i 1
Then Pr[(i,a) is chosen]
=Pr[i is chosen]Pr[a is chosen|i is chosen]
| SCi | 1
1
.
=
| U | | SCi |
|U |
15
DNF counting algorithm II:
Thm: DNF counting algorithm II is an FPRAS for
DNF counting problem when m=(3t/2)ln(2/).
Pf: Step 2(a) chooses an element of U
uniformly at random.
The probability that this element belongs to S is
at least 1/t.
Fix any ,>0, and let m= (3t/2)ln(2/).
poly(t,1/,ln(1/))
16
DNF counting algorithm II:
The processing time of each sample is poly(t).
By Thm1, with m samples, X/m gives an (,)approximation of c(F)/|U| and hence Y gives an
(,)-approximation of c(F).
17
Counting with Approximate Sampling
Def: w: the output of a sampling algorithm for
a finite sample space .
The sampling algorithm generates an -uniform
sample of if, for any subset S of , |Pr[wS]|S|/|||.
18
Def: A sampling algorithm is a fully polynomial
almost uniform sampler (FPAUS) for a problem
if, given an input x and >0, it generates an uniform sample of (x) in time poly(|x|,ln(1/)).
Consider an FPAUS for independent sets would
take as input a graph G=(V,E) and a parameter
.
The sample space: the set of all independent
sets in G.
19
Goal: Given an FPAUS for independent sets, we
construct an FPRAS for counting the number of
independent sets.
Assume G has m edges, and let e1,…,em be an
arbitrary ordering of the edges.
Ei: the set of the first i edges in E and let
Gi=(V,Ei).
(Gi): denote the set of independent sets in Gi.
20
| (Gm ) | | (Gm 1 ) |
| (G1 ) |
| (G ) |
| (G0 ) | .
| (Gm 1 ) | | (Gm 2 ) |
| (G0 ) |
|(G0)|=2n. Why?
To estimate |(G)|, we need good estimates
for r | (Gi ) | , i 1,.., m.
i
| (Gi 1 ) |
21
~
Let ri be estimate for ri, then
~
ri .
i 1
To evaluate
the error, we need to bound the
m ~
ratio R ri .
i 1
| (G ) |~ 2
m
n
ri
To have an (,)-approximation, we want
Pr[|R-1|]1-.
22
~
Lemma: Suppose that for all i, 1im, ri is an
(/2m,/m)-approximation for ri.
Then Pr[|R-1|]1-.
Pf:
For each 1im, we have
~
Pr[| ri ri |
2m
ri ] 1
m
.
23
~
ri ] .
Equivalently, Pr[| ri ri |
2m
m
~
Pr[ i, | ri ri |
ri ] .
2m
~
Pr[i, | ri ri |
ri ] 1 .
2~m
ri
Pr[i,1
1
] 1 .
2m ri
2m
ri
m m ~
m
Pr[(1
) (1
) ] 1 .
2m
2m
i 1 ri
Pr[1 R 1 ] 1 .
24
1.
Estimating ri:
Input: Graph Gi-1=(V,Ei-1) and Gi=(V,Ei)
~
Output: ri = an approximation of ri.
X0
2.
Repeat for M=(1296m2/2)ln(2m/) independent trials:
3.
(a) Generate an (/6m)-uniform sample from (Gi-1).
(b) If the sample is an independent set in Gi, then XX+1.
Return ~
ri X/M.
25
Lemma: When m1 and 0<1, the procedure
for estimating ri yields an (/2m,/m)approximation for ri.
Pf:
Suppose Gi-1 and Gi differ in that edge {u,v} is
in Gi but not in Gi-1.
(Gi) (Gi-1).
An independent set in (Gi-1)\(Gi) contains
both u and v.
26
Associate each I(Gi-1)\(Gi) with an
independent set I\{v}(Gi).
Note that I’(Gi) is associated with no more
than one independent set I’{v}(Gi-1)\(Gi),
thus |(Gi-1)\(Gi)| |(Gi)|.
It follows that
| (Gi ) |
| (Gi ) |
1
ri
.
| (Gi 1 ) | | (Gi ) | | (Gi 1 ) \ (Gi ) | 2
27
Let Xk=1 if the k-th sample is in (Gi) and 0 o/w.
Because our samples are generated by an
(/6m)-uniform sampler, by definition,
| (Gi ) |
Pr[ X k 1]
| (Gi 1 ) |
6m
.
| (Gi ) |
E[ X k ]
.
| (Gi 1 ) | 6m
28
By linearity of expectations,
M
X
E[
k 1
M
k
]
| (Gi ) |
.
| (Gi 1 ) |
6m
M
X
ri ] ri | E[ k 1
| E[~
M
k
]
| (Gi ) |
.
| (Gi 1 ) |
6m
Since ri1/2, we have
1
1
E[ ~
ri ] ri
.
6m
2 6m 3
29
If M3ln(2m/)/(/12m)2(1/3)=1296m2-2ln(2m/),
then
~
ri
~
~
~
Pr ~ 1
E[ ri ] .
Pr ri E[ ri ]
12m
12m
m
E[ ri ]
Equivalently, with probability 1- /m,
~
ri
1
1
.
~
12m E[ ri ]
12m
-----(1)
30
~
|
E
[
r
]
r
|
, we have
i
i
As
6m
E[ ~
ri ]
1
1
.
6mri
ri
6mri
Using, ri1/2, then
E[ ~
ri ]
1
1
. -----(2)
3m
ri
3m
31
Combining (1) and (2), with probability 1-/m,
~
ri
1
(1
)(1
) (1
)(1
) 1
.
2m
3m
12m
ri
3m
12m
2m
This gives the desired (/2m,/m)-approximation.
Thm: Given FPAUS for independent sets in any
graph, we can construct an FPRAS for the
number of independent sets in a graph G.
32
The Markov Chain Monte Carlo Method
The Markov Chain Monte Carlo Method provides
a very general approach to sampling from a
desired probability distribution.
Basic idea: Define an ergodic Markov chain
whose set of states is the sample space and
whose stationary distribution is the required
sampling distribution.
33
Lemma: For a finite state space and
neighborhood structure {N(x)|x}, let
N=maxx|N(x)|.
Let M be any number such that MN. Consider a
Markov chain where
1/M
if xy and yN(x)
Px,y=
0
if xy and yN(x)
1-N(x)/M if x=y.
If this chain is irreducible and aperiodic, then the
stationary distribution is the uniform distribution.
34
Pf:
For xy, if x=y, then xPx,y=yPy,x, since
Px,y=Py,x=1/M.
It follows that the uniform distribution x=1/||
is the stationary distribution by the following
theorem.
35
Thm:
P: transition matrix of a finite irreducible and
ergodic Markov chain. If there are nonnegative
n
numbers =(0,..,n) such that i 1
i 0
and if, for any pair of states i,j, iPi,j=jPj,i, then
is the stationary distribution corresponding to
P. n
n
i Pi , j j Pj ,i j . P .
Pf:
n
i 0
i 0
Since i 1, it follows that is the unique
i 0
stationary distribution of the Markov Chain.
36
1.
2.
Eg. Markov chain with states from independent sets
in G=(V,E).
X0 is an arbitrary independent set in G.
To compute Xi+1:
(a) choose a vertex v uniformly at random from V;
(b) if vXi then Xi+1=Xi\{v};
(c) if vXi and if adding v to Xi still gives an
independent set, then Xi+1=Xi{v};
(d) o/w, Xi+1=Xi.
37
The neighbors of a state Xi are independent sets
that differ from Xi in just one vertex.
Since every state is reachable from the empty
set, the chain is irreducible.
Assume G has at least one edge (u,v), then the
state {v} has a self-loop (Pv,v>0), thus aperiodic.
When xy, it follows that Px,y=1/|V| or 0, by the
previous lemma, the stationary distribution is the
uniform distribution.
38
The Metropolis Algorithm
(When stationary distribution is nonuniform)
Lemma: For a finite state space and neighborhood
structure {N(x)|x}, let N=maxx|N(x)|.
Let M be and number such that MN. For all x, let
x>0 be the desired probability of state x in the
stationary distribution. Consider a Markov chain
Px,y=
(1/M)min(1,y/x)
0
1-yxPx,y
if xy and yN(x)
if xy and yN(x)
if x=y.
Then, if this chain is irreducible and aperiodic, then
the stationary distribution is given by x.
39
Pf:
For any xy, if xy, then Px,y=1/M and
Py,x=(1/M)(x/y).
It follows that Px,y=1/M=(y/x)Py,x.
xPx,y=yPy,x.
Similarly, for x>y.
Again, by the previous theorem, x’s form the
stationary distribution.
40
Eg. Create a Markov chain, in the stationary
distribution, each independent set I has
probability proportional to |I|, for some >0.
I.e. x=|Ix|/B, where Ix is the independent set
corresponding to state x and B=x|Ix|.
Note that, when =1, this is the uniform
distribution.
41
1.
2.
X0 is an arbitrary independent set in G.
To compute Xi+1:
(a) choose v uniformly at random from V;
(b) if vXi then Xi+1=Xi\{v} with probability min(1,1/);
(c) if vXi and Xi{v} still gives an independent set,
then put Xi+1=Xi{v} with probability min(1,);
(d) o/w, set Xi+1=Xi.
42