Università dell'Insubria, Como, Italy
Introduction to Quantum Monte Carlo
$\hat{H}\Psi = E\Psi$
Dario Bressanini
http://scienze-como.uninsubria.it/bressanini
UNAM, Mexico City, 2007
Why do simulations?
• Simulations are a general method for “solving”
many-body problems. Other methods usually
involve approximations.
• Experiment is limited and expensive.
Simulations can complement the experiment.
• Simulations are easy even for complex systems.
• They scale up with computer power.
Buffon needle experiment, AD 1777
[Figure: a needle of length L dropped onto a floor ruled with parallel lines a distance d apart.]
The probability that the needle crosses a line is $p = \frac{2L}{\pi d}$ (for L ≤ d), so counting crossings gives an estimate of π.
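The experiment is easy to repeat on a computer. Here is a minimal Python sketch (my own illustration, assuming a needle no longer than the line spacing, L ≤ d) that estimates π from the crossing frequency:

```python
import math
import random

def buffon_pi(n_throws=1_000_000, L=1.0, d=2.0):
    """Estimate pi via Buffon's needle: p(cross) = 2L / (pi d) for L <= d."""
    crossings = 0
    for _ in range(n_throws):
        y = random.uniform(0.0, d / 2.0)            # center-to-nearest-line distance
        theta = random.uniform(0.0, math.pi / 2.0)  # needle angle w.r.t. the lines
        if y <= (L / 2.0) * math.sin(theta):        # the needle crosses a line
            crossings += 1
    return 2.0 * L * n_throws / (d * crossings)     # invert p = 2L / (pi d)

print(buffon_pi())   # ~3.14, with statistical fluctuations
```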
Simulations
• “The general theory of quantum mechanics is now
almost complete. The underlying physical laws
necessary for the mathematical theory of a large
part of physics and the whole of chemistry are thus
completely known, and the difficulty is only that
the exact application of these laws leads to
equations much too complicated to be soluble.”
Dirac, 1929
General strategy

• How to solve a deterministic problem using a Monte Carlo method?
• Rephrase the problem using a probability distribution:
$A = \int P(\mathbf{R})\, f(\mathbf{R})\, d\mathbf{R}, \qquad \mathbf{R} \in \mathbb{R}^N$
• "Measure" A by sampling the probability distribution:
$A \approx \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{R}_i), \qquad \mathbf{R}_i \sim P(\mathbf{R})$
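As a toy illustration of this recipe (my own example, not from the slides), here is a Python sketch estimating ⟨x²⟩ = ∫ P(x) x² dx for a standard normal P(x), whose exact value is 1:

```python
import random

def mc_average(f, sampler, n=100_000):
    """Estimate A = integral P(R) f(R) dR by averaging f over samples R_i ~ P."""
    return sum(f(sampler()) for _ in range(n)) / n

# P = standard normal, f(x) = x^2; the exact answer is 1
print(mc_average(lambda x: x * x, lambda: random.gauss(0.0, 1.0)))  # ~1.00
```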
Monte Carlo Methods

• The points R_i are generated using random numbers. This is why the methods are called Monte Carlo methods.
• Metropolis, Ulam, Fermi, Von Neumann (~1945)
• We introduce noise into the problem!!
• Our results have error bars...
• ... Nevertheless it might be a good way to proceed
Stanislaw Ulam (1909-1984)
S. Ulam is credited as the inventor of the Monte Carlo method in the 1940s; it solves mathematical problems using statistical sampling.
Why Monte Carlo?
• We can approximate the numerical value of a
definite integral by the definition:

$\int_a^b f(x)\, dx \approx \sum_{i=1}^{L} f(x_i)\, \Delta x$
• where we use L points x_i, uniformly spaced.
Error in Quadrature
• Consider an integral in D dimensions:

$\int_V f(\mathbf{R})\, dx_1 dx_2 \cdots dx_D \approx \sum f(\mathbf{R})\, (\Delta x)^D$
• N = L^D uniformly spaced points, proportional to the CPU time
• The error with N sampling points is
$\int f(\mathbf{R})\, d\mathbf{R} - \sum f(\mathbf{R})\, (\Delta x)^D \propto N^{-1/D}$
Monte Carlo Estimates of Integrals
• If we sample the points not on regular grids,
but randomly (uniformly distributed), then
$\int_V f(\mathbf{X})\, d\mathbf{X} \approx \frac{V}{N} \sum_{i=1}^{N} f(\mathbf{X}_i)$
where we assume the integration domain is a regular box of volume V = L^D.
Monte Carlo Error
• From probability theory one can show that the
Monte Carlo error decreases with sample size
N as
$\epsilon \propto \frac{1}{\sqrt{N}}$
• Independent of dimension D (good).
• To get another decimal place takes 100 times
longer! (bad)
MC is advantageous for large dimensions
• Error by simple quadrature ∝ N^(-1/D)
• Error using smarter quadrature ∝ N^(-A/D)
• Error by Monte Carlo: always ∝ N^(-1/2)
• Monte Carlo is always more efficient for large D (usually D > 4-6)
Monte Carlo Estimates of π

$\pi = \int_{-1}^{1} 2\sqrt{1 - x^2}\, dx$
[Figure: the unit half-circle, crossing the x axis at (1, 0).]
We can estimate π using Monte Carlo.
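A sketch of this estimate in Python: sample x uniformly on (-1, 1), average f(x) = 2√(1 - x²), and multiply by the interval length (b - a) = 2:

```python
import math
import random

def mc_pi(n=1_000_000):
    """Estimate pi = integral_{-1}^{1} 2 sqrt(1 - x^2) dx by uniform sampling."""
    a, b = -1.0, 1.0
    total = sum(2.0 * math.sqrt(1.0 - random.uniform(a, b) ** 2) for _ in range(n))
    return (b - a) * total / n   # (b - a) * <f>

print(mc_pi())   # ~3.1416, with error ~ 1/sqrt(n)
```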
Monte Carlo Integration
• Note that:
• we can automatically estimate the error by computing the standard deviation of the sampled function values
• all points generated are independent
• all points generated are used
Inefficient?
• If the function is strongly peaked, the process is inefficient
[Figure: a sharply peaked function, plotted on the interval (-3, 3).]
$\int_a^b f(x)\, dx \approx \frac{b - a}{N} \sum_{i=1}^{N} f(x_i)$
• We should generate more points where the function is large
• Use a non-uniform distribution!
General Monte Carlo
• If the samples are not drawn uniformly but
with some probability distribution p(R), we
can compute by Monte Carlo:

$\int f(\mathbf{R})\, p(\mathbf{R})\, d\mathbf{R} \approx \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{R}_i), \qquad \mathbf{R}_i \sim p(\mathbf{R})$
where p(R) is normalized: $\int p(\mathbf{R})\, d\mathbf{R} = 1$
Monte Carlo
• so
$I = \int f(\mathbf{R})\, d\mathbf{R} = \int p(\mathbf{R})\, \frac{f(\mathbf{R})}{p(\mathbf{R})}\, d\mathbf{R}$
$I \approx \frac{1}{N} \sum_i \frac{f(\mathbf{R}_i)}{p(\mathbf{R}_i)}, \qquad \mathbf{R}_i \sim p(\mathbf{R})$
• Convergence is guaranteed by the Central Limit Theorem
• The statistical error → 0 if p(R) ∝ f(R); convergence is faster
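A small Python illustration of the f/p trick (my own toy case): integrate the peaked function f(x) = e^(-x²/2) over the whole real line by sampling from a normal distribution p that is exactly proportional to f, so the weights f/p are constant and the statistical error vanishes:

```python
import math
import random

def importance_integral(n=100_000):
    """I = integral exp(-x^2/2) dx, sampling x ~ N(0, 1); exact value sqrt(2 pi)."""
    sqrt2pi = math.sqrt(2.0 * math.pi)
    total = 0.0
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        f = math.exp(-0.5 * x * x)
        p = math.exp(-0.5 * x * x) / sqrt2pi   # the sampling density
        total += f / p                         # constant weight: zero variance
    return total / n

print(importance_integral())   # sqrt(2 pi) ~ 2.5066, with zero statistical error
```

With a p that only roughly matches f, the weights fluctuate and the error is finite, but still far smaller than for uniform sampling.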
Warning!
• Beware of Monte Carlo integration routines in
libraries: they usually cannot assume anything
about your functions since they must be
general.
• They can be quite inefficient
• Also beware of standard compiler supplied
Random Number Generators (they are known
to be bad!!)
Equation of state of a fluid
The problem: compute
the equation of state (p
as a function of particle
density N/V ) of a fluid
in a box given some
interaction potential
between the particles
Assume that for every configuration of the particles we can compute
the potential energy V(R)
The Statistical Mechanics Problem

• For equilibrium properties we can just compute the Boltzmann multi-dimensional integrals:
$\langle A \rangle = \frac{\int A(\mathbf{R})\, e^{-E(\mathbf{R})/k_B T}\, d\mathbf{R}}{\int e^{-E(\mathbf{R})/k_B T}\, d\mathbf{R}}$
where the energy usually is a sum over pairs:
$E(\mathbf{R}) = \sum_{i<j} V(d_{ij})$
An inefficient recipe

• For 100 particles (not really the thermodynamic limit), the integrals are in 300 dimensions.
• The naïve MC procedure would be to distribute the particles uniformly in the box, throwing them in randomly.
• If the density is high, throwing particles at random will put some of them too close to each other.
• Almost all such generated points will give a negligible contribution, due to the Boltzmann factor.
An inefficient recipe
$\langle A \rangle = \frac{\int A(\mathbf{R})\, e^{-E(\mathbf{R})/k_B T}\, d\mathbf{R}}{\int e^{-E(\mathbf{R})/k_B T}\, d\mathbf{R}}$
• E(R) becomes very large and positive
• We should try to generate more points where E(R) is close to the minima
The Metropolis Algorithm

• How do we do it?
• Use the Metropolis algorithm (M(RT)², 1953) ... and a powerful computer
• The algorithm is a random walk (Markov chain) in configuration space. Points are not independent.
"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."
John Von Neumann
Importance Sampling

• The idea is to use Importance Sampling, that is, sampling more where the function is large
• "..., instead of choosing configurations randomly, ..., we choose configurations with a probability exp(-E/k_BT) and weight them evenly."
- from the M(RT)² paper
The key Ideas

• Points are no longer independent!
• We consider a point (a walker) that moves in configuration space according to some rules
A Markov Chain

• A Markov chain is a random walk through configuration space: R1 → R2 → R3 → R4 → ...
• Given Rn there is a transition probability to go to the next point Rn+1: p(Rn → Rn+1), a stochastic matrix
• In a Markov chain, the distribution of Rn+1 depends only on Rn. There is no memory.
• We must use an ergodic Markov chain.
The key Ideas

• Choose an appropriate p(Rn → Rn+1) so that at equilibrium we sample a distribution π(R) (for this problem it is just π = exp(-E/k_BT))
• A sufficient condition is to impose detailed balance.
• Consider an infinite number of walkers, and two positions R and R'
• At equilibrium, the number of walkers that go from R → R' is equal to the number of walkers that go from R' → R, even though p(R → R') ≠ p(R' → R) in general
The Detailed Balance
$\pi(\mathbf{R})\, p(\mathbf{R} \to \mathbf{R}') = \pi(\mathbf{R}')\, p(\mathbf{R}' \to \mathbf{R})$
• π(R) is the distribution we want to sample
• We have the freedom to choose p(R → R'):
$\frac{p(\mathbf{R} \to \mathbf{R}')}{p(\mathbf{R}' \to \mathbf{R})} = \frac{\pi(\mathbf{R}')}{\pi(\mathbf{R})}$
Rejecting points


• The third key idea is to use rejection to enforce detailed balance
• p(R → R') is split into a transition step and an acceptance/rejection step:
$p(\mathbf{R} \to \mathbf{R}') = T(\mathbf{R} \to \mathbf{R}')\, A(\mathbf{R} \to \mathbf{R}')$
• T(R → R') generates the next "candidate" point
• A(R → R') decides whether to accept or reject this point
The Acceptance probability
$\frac{T(\mathbf{R} \to \mathbf{R}')\, A(\mathbf{R} \to \mathbf{R}')}{T(\mathbf{R}' \to \mathbf{R})\, A(\mathbf{R}' \to \mathbf{R})} = \frac{\pi(\mathbf{R}')}{\pi(\mathbf{R})}$
• Given some T, a possible choice for A is
$A(\mathbf{R} \to \mathbf{R}') = \min\left(1,\ \frac{\pi(\mathbf{R}')\, T(\mathbf{R}' \to \mathbf{R})}{\pi(\mathbf{R})\, T(\mathbf{R} \to \mathbf{R}')}\right)$
• For a symmetric T:
$A(\mathbf{R} \to \mathbf{R}') = \min\left(1,\ \frac{\pi(\mathbf{R}')}{\pi(\mathbf{R})}\right)$
What it does
  (R) 
A(R  R)  min 1,

  (R ) 
 Suppose π(R’) ≥ π(R)


move is always accepted
Suppose π(R’) < π(R)

move is accepted with probability π(R’)/π(R)

Flip a coin

The algorithm samples regions of large π(R)

Convergence is guaranteed but the rate is not!!
33
IMPORTANT!

• Accepted and rejected states count the same!
• When a point is rejected, you add the previous one to the averages
• Measure the acceptance ratio; set it to roughly 1/2 by varying the "step size"
• Exact: no time-step error and, in principle, no ergodicity problems (but no dynamics)
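A generic Metropolis sampler in Python that follows these rules (a sketch; the one-dimensional double-well energy is my own stand-in for V(R)):

```python
import math
import random

def metropolis(energy, x0=0.0, n_steps=100_000, step=0.5, kT=1.0):
    """Sample pi(x) ~ exp(-E(x)/kT) with symmetric uniform moves."""
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        x_try = x + random.uniform(-step, step)
        # A = min(1, pi(x_try) / pi(x))
        if random.random() < math.exp(-(energy(x_try) - energy(x)) / kT):
            x = x_try
            accepted += 1
        samples.append(x)   # accepted or rejected, the current point counts
    return samples, accepted / n_steps

samples, acc = metropolis(lambda x: (x * x - 1.0) ** 2)
print(acc, sum(samples) / len(samples))  # tune `step` for acc ~ 1/2; mean ~ 0 by symmetry
```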
Quantum Mechanics

• We wish to solve HΨ = EΨ to high accuracy
• The solution usually involves computing integrals in high dimensions: 3-30000
• The "classic" approach (from 1929):
• Find an approximate Ψ (... but a good one ...)
• ... whose integrals are analytically computable (Gaussians)
• Compute the approximate energy
• Chemical accuracy: ~ 0.001 hartree ~ 0.027 eV
VMC: Variational Monte Carlo

• Start from the Variational Principle:
$\langle H \rangle = \frac{\int \Psi(\mathbf{R})\, H \Psi(\mathbf{R})\, d\mathbf{R}}{\int \Psi^2(\mathbf{R})\, d\mathbf{R}} \ge E_0$
• Translate it into Monte Carlo language:
$\langle H \rangle = \int P(\mathbf{R})\, E_L(\mathbf{R})\, d\mathbf{R}$
$E_L(\mathbf{R}) = \frac{H \Psi(\mathbf{R})}{\Psi(\mathbf{R})} \qquad P(\mathbf{R}) = \frac{\Psi^2(\mathbf{R})}{\int \Psi^2(\mathbf{R})\, d\mathbf{R}}$
VMC: Variational Monte Carlo
$E = \langle H \rangle = \int P(\mathbf{R})\, E_L(\mathbf{R})\, d\mathbf{R}$
• E is a statistical average of the local energy over P(R):
$E = \langle H \rangle \approx \frac{1}{N} \sum_{i=1}^{N} E_L(\mathbf{R}_i), \qquad \mathbf{R}_i \sim P(\mathbf{R})$
Recipe:
• take an appropriate trial wave function
• distribute N points according to P(R)
• compute the average of the local energy
Error bars estimation

• In Monte Carlo it is easy to estimate the statistical error. If
$A = \int P(\mathbf{R})\, f(\mathbf{R})\, d\mathbf{R}, \qquad \mathbf{R} \in \mathbb{R}^N$
$A \approx \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{R}_i), \qquad \mathbf{R}_i \sim P(\mathbf{R})$
the associated statistical error is
$\frac{\sigma(A)}{\sqrt{N}}, \qquad \sigma^2(A) = \langle A^2 \rangle - \langle A \rangle^2 \approx \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{R}_i)^2 - \left( \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{R}_i) \right)^2$
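In code, the estimate and its error bar come from the same samples; a minimal sketch, continuing the ⟨x²⟩ toy example used earlier:

```python
import math
import random

def mc_with_error(f, sampler, n=100_000):
    """Return (mean, error) with error = sigma(A) / sqrt(N)."""
    samples = [f(sampler()) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n   # sigma^2(A)
    return mean, math.sqrt(var / n)

mean, err = mc_with_error(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
print(f"{mean:.4f} +/- {err:.4f}")   # ~1.0000 +/- 0.0045
```

For correlated Metropolis samples this simple formula underestimates the error; in practice one also corrects for the autocorrelation (e.g. by blocking).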
The Metropolis Algorithm
[Flowchart: from R_i, propose a move and call the Oracle on R_try; if accepted, R_{i+1} = R_try; if rejected, R_{i+1} = R_i; in both cases compute the averages.]
The Metropolis Algorithm
The Oracle:
$p = \left( \frac{\Psi(\text{new})}{\Psi(\text{old})} \right)^2$
if p ≥ 1            /* accept always */
    accept move
if 0 ≤ p < 1        /* accept with probability p */
    if p > rnd()
        accept move
    else
        reject move
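Putting the variational principle, the local energy, and the Oracle together, here is a minimal VMC program in Python for a hypothetical test case of my choosing: the hydrogen atom with trial function Ψ = e^(-αr), for which E_L = -α²/2 + (α-1)/r and the exact ground state is E₀ = -1/2 hartree at α = 1:

```python
import math
import random

def vmc_hydrogen(alpha=0.9, n_steps=200_000, step=0.5):
    """VMC for the H atom with trial Psi = exp(-alpha r)."""
    pos = [1.0, 0.0, 0.0]
    r = 1.0
    energies = []
    accepted = 0
    for _ in range(n_steps):
        trial = [x + random.uniform(-step, step) for x in pos]
        r_try = math.sqrt(sum(x * x for x in trial))
        # the Oracle: p = (Psi_new / Psi_old)^2
        p = math.exp(-2.0 * alpha * (r_try - r))
        if p > random.random():
            pos, r = trial, r_try
            accepted += 1
        # accepted and rejected points both count;
        # E_L diverges at r -> 0 unless alpha = 1 (the cusp condition)
        energies.append(-0.5 * alpha * alpha + (alpha - 1.0) / r)
    return sum(energies) / len(energies), accepted / n_steps

energy, acc = vmc_hydrogen()
print(energy, acc)   # ~ -0.495 hartree at alpha = 0.9; exactly -0.5 at alpha = 1
```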
VMC: Variational Monte Carlo



• No need to analytically compute integrals: complete freedom in the choice of the trial wave function.
• Can use explicitly correlated wave functions
• Can satisfy the cusp conditions
[Figure: the He atom, with electron-nucleus distances r1 and r2 and electron-electron distance r12.]
$\Psi = e^{-a r_1 - b r_2 + c\, r_{12}}$
VMC advantages

• Can compute lower bounds:
$\langle H \rangle - \sigma \le E_0 \le \langle H \rangle, \qquad \sigma^2 = \langle H^2 \rangle - \langle H \rangle^2$
• Can go beyond the Born-Oppenheimer approximation, with any potential, in any number of dimensions.
• Example: the Ps2 molecule (e+e+e-e-) in 2D and 3D, or M+m+M-m- as a function of M/m
Properties of the Local energy
H(R )
1 2  ( R )
EL ( R ) 

 V (R )
( R )
2 ( R )

For an exact eigenstate EL is a constant

At particles coalescence the divergence of V must
be cancelled by the divergence of the kinetic term

For an approximate trial function, EL is not
constant
43
Reducing Errors
$E = \langle H \rangle \approx \frac{1}{N} \sum_{i=1}^{N} E_L(\mathbf{R}_i), \qquad \text{error} \approx \frac{\sigma(E_L)}{\sqrt{N}}$
$\sigma^2(E_L) = \langle H^2 \rangle - \langle H \rangle^2 \approx \frac{1}{N} \sum_{i=1}^{N} E_L(\mathbf{R}_i)^2 - \left( \frac{1}{N} \sum_{i=1}^{N} E_L(\mathbf{R}_i) \right)^2$
• For a trial function, if E_L can diverge, the statistical error will be large
• To eliminate the divergence we impose Kato's cusp conditions
Kato's cusp conditions on Ψ

• We can include the correct analytical structure:
electron-electron cusps: $\Psi(r_{12} \to 0) \propto 1 + \tfrac{1}{2} r_{12}$
electron-nucleus cusps: $\Psi(r \to 0) \propto 1 - Z r$
Optimization of Ψ

• Suppose we have variational parameters in the trial wave function Ψ(R; c) that we want to optimize
• The straightforward optimization of E is numerically unstable, because E_L can diverge:
$E_T(\mathbf{c}) \approx \frac{1}{N} \sum_{i=1}^{N} E_L(\mathbf{R}_i; \mathbf{c})$
• For a finite N this estimate can be unbounded
• Also, our energies have error bars, so different parameter sets can be difficult to compare
Optimization of Ψ

• It is better to optimize the variance
$\sigma^2(H) = \langle H^2 \rangle - \langle H \rangle^2 \ge 0$
• It is a measure of the quality of the trial function:
$\sigma^2(H(\mathbf{c})) \approx \frac{1}{N} \sum_{i=1}^{N} \left( E_L(\mathbf{R}_i; \mathbf{c}) - \langle H(\mathbf{c}) \rangle \right)^2 \ge 0$
• Even for finite N it is numerically stable.
• The lowest σ² will not have the lowest E, but it is usually close
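To make this concrete, here is a toy continuation of the hydrogen-atom example above (my own illustration): for Ψ = e^(-αr) one can sample r from Ψ²r² directly (a Gamma distribution, so no Metropolis is needed) and scan σ²(E_L) over α; it vanishes only at the exact α = 1:

```python
import random

def sigma2_EL(alpha, n=100_000):
    """Variance of E_L = -alpha^2/2 + (alpha-1)/r for Psi = exp(-alpha r).
    The radial density Psi^2 r^2 is a Gamma(3, 1/(2 alpha)) distribution."""
    es = []
    for _ in range(n):
        r = random.gammavariate(3.0, 1.0 / (2.0 * alpha))
        es.append(-0.5 * alpha * alpha + (alpha - 1.0) / r)
    mean = sum(es) / n
    return sum((e - mean) ** 2 for e in es) / n

for alpha in (0.8, 0.9, 1.0, 1.1):
    print(alpha, sigma2_EL(alpha))   # minimum (exactly 0) at alpha = 1
```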
Optimization of Ψ

• Meaning of the optimization of σ²:
$\langle H \rangle - \sigma \le E_0 \le \langle H \rangle$
• We are trying to reduce the distance between the upper and the lower bound
• For which potential V' is Ψ_T an eigenfunction?
$-\frac{1}{2} \frac{\nabla^2 \Psi_T}{\Psi_T} + V' = E_T$
• We want V' to be "close" to the real V:
$\min \int \Psi_T^2\, (V - V')^2\, d\mathbf{R} = \min \sigma^2(H)$
VMC drawbacks

• The error bar goes down as N^(-1/2)
• It is computationally demanding
• The optimization of Ψ becomes difficult as the number of nonlinear parameters increases
• It depends critically on our skill to invent a good Ψ
• There exist exact, automatic ways to get better wave functions. Let the computer do the work ...
To be continued...
In the last episode: VMC. Today: DMC.
First Major VMC Calculation

• W. McMillan, thesis, 1964
• VMC calculation of the ground state of liquid helium-4
• Applied MC techniques from classical liquid theory
VMC advantages and drawbacks

• Simple, easy to implement
• Intrinsic error bars
• Usually obtains 60-90% of the correlation energy
• The error bar goes down as N^(-1/2)
• It is computationally demanding
• The optimization of Ψ becomes difficult as the number of nonlinear parameters increases
• It depends critically on our skill to invent a good Ψ
Diffusion Monte Carlo

• VMC is a "classical" simulation method
"Nature isn't classical, dammit, and if you want to make a simulation of nature, you'd better make it quantum mechanical, and by golly it's a wonderful problem, because it doesn't look so easy."
Richard P. Feynman
• Suggested by Fermi in 1945, but implemented only in the 70's
Diffusion equation analogy
• The time dependent Schrödinger equation is similar to a diffusion equation:
$i\hbar\, \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m} \nabla^2 \Psi + V \Psi$   (time evolution)
$\frac{\partial C}{\partial t} = D\, \nabla^2 C - k\, C$   (diffusion + branch)
• The diffusion equation can be "solved" by directly simulating the system
Can we simulate the Schrödinger equation?
Imaginary Time Sch. Equation

• The analogy is only formal: Ψ is a complex quantity, while C is real and positive
$\Psi(\mathbf{R}, t) = \sum_n a_n\, e^{-i E_n t/\hbar}\, \phi_n(\mathbf{R})$
• If we let the time t be imaginary, then Ψ can be real!
$\frac{\partial \Psi}{\partial \tau} = D\, \nabla^2 \Psi - V \Psi$
• This is the imaginary time Schrödinger equation
 as a concentration

• Ψ is interpreted as a concentration of fictitious particles, called walkers
• The Schrödinger equation is simulated by a process of diffusion, growth and disappearance of walkers
$\frac{\partial \Psi}{\partial \tau} = D\, \nabla^2 \Psi - V \Psi$
$\Psi(\mathbf{R}, \tau) = \sum_i a_i\, \phi_i(\mathbf{R})\, e^{-(E_i - E_R)\tau}$
$\Psi(\mathbf{R}, \tau \to \infty) \to a_0\, \phi_0(\mathbf{R})\, e^{-(E_0 - E_R)\tau}$   (ground state)
Diffusion Monte Carlo
SIMULATION: discretize time
• Diffusion process:
$\frac{\partial \Psi}{\partial \tau} = D\, \nabla^2 \Psi \quad\Rightarrow\quad \Psi(\mathbf{R}, \tau) = e^{-(\mathbf{R} - \mathbf{R}_0)^2 / 4 D \tau}$
• Kinetic process (branching):
$\frac{\partial \Psi}{\partial \tau} = -(V(\mathbf{R}) - E_R)\, \Psi \quad\Rightarrow\quad \Psi(\mathbf{R}, \tau) = e^{-(V(\mathbf{R}) - E_R)\, \tau}\, \Psi(\mathbf{R}, 0)$
First QMC calculation in chemistry
77 lines of Fortran code!
Formal development
$i\hbar\, \frac{\partial \Psi}{\partial t} = \hat{H} \Psi \quad\Rightarrow\quad \Psi(t) = e^{-i t \hat{H}/\hbar}\, \Psi(0)$
• Formally, in imaginary time:
$\Psi(\tau) = e^{-\tau \hat{H}}\, \Psi(0)$
• In the coordinate representation:
$\langle \mathbf{R}' | \Psi(\tau) \rangle = \langle \mathbf{R}' | e^{-\tau \hat{H}} | \Psi(0) \rangle = \int \langle \mathbf{R}' | e^{-\tau \hat{H}} | \mathbf{R} \rangle\, \langle \mathbf{R} | \Psi(0) \rangle\, d\mathbf{R}$
Schrödinger Equation in integral form

• Monte Carlo is good at integrals...
$\Psi(\mathbf{R}', \tau) = \int G(\mathbf{R} \to \mathbf{R}', \tau)\, \Psi(\mathbf{R}, 0)\, d\mathbf{R}$
$G(\mathbf{R} \to \mathbf{R}', \tau) = \langle \mathbf{R}' | e^{-\tau \hat{H}} | \mathbf{R} \rangle$
• We interpret G as a probability to move from R to R' in a time step τ, and we iterate this equation
Iteration of Schrödinger Equation

• We can iterate this equation:
$\Psi(\mathbf{R}'', 2\tau) = \int G(\mathbf{R}' \to \mathbf{R}'', \tau)\, G(\mathbf{R} \to \mathbf{R}', \tau)\, \Psi(\mathbf{R}, 0)\, d\mathbf{R}\, d\mathbf{R}'$
Zassenhaus formula

• In general we do not have the exact G:
$e^{-\tau \hat{H}} = e^{-\tau (T + V)} = e^{-\tau T}\, e^{-\tau V}\, e^{-\tau^2 [T,V]/2} \cdots$
$e^{-\tau \hat{H}} = e^{-\tau T}\, e^{-\tau V} + O(\tau^2)$
• We must use a small time step τ, but at the same time we must let the total imaginary time go to infinity
Trotter theorem

• A and B do not commute; use the Trotter theorem:
$e^{A + B} = \lim_{n \to \infty} \left( e^{A/n}\, e^{B/n} \right)^n$
• Figure out what each operator does independently and then alternate their effects. This is rigorous in the limit n → ∞.
• In DMC, A is the diffusion operator and B is the branching operator
Short Time approximation
(R' , )   G(R  R' , )(R,0)dR
G(R  R' , )  e
V ( R ')  ( R R ')2 / 2
e

Diffusion + branching

At equilibrium the algorithm will sample f0

The energy can be estimated as
E0 
 f0 HT dR
 f  dR
0
T

N
 H (R )
T
i 1
N
i
  (R )
i 1
T
i
64
The DMC algorithm
A picture for H2+
Short Time approximation
Importance sampling
$G(\mathbf{R} \to \mathbf{R}', \tau) \approx e^{-\tau V(\mathbf{R}')}\, e^{-(\mathbf{R} - \mathbf{R}')^2 / 2\tau}$
• V can diverge, so branching can be inefficient
• We can transform the Schrödinger equation by multiplying by Ψ_T, defining f(R, τ) = Ψ_T(R) Ψ(R, τ):
$\frac{\partial f(\mathbf{R}, \tau)}{\partial \tau} = \frac{1}{2} \nabla^2 f - \nabla \cdot \left( f\, \nabla \ln \Psi_T(\mathbf{R}) \right) - E_L(\mathbf{R})\, f(\mathbf{R}, \tau)$
Importance sampling
f (R, )
1 2

   f  ( f ln T (R ))  EL (R ) f (R, )

2

Similar to a Fokker-Plank equation

Simulated by diffusion+drift+branching

To the pure diffusion algorithm we added a drift
step that pushes the random walk in directions of
increasing trial function
R'  R   ln T (R)
69
Importance sampling
f (R, )
1 2

   f  ( f ln T (R ))  EL (R ) f (R, )

2

The branching term now is

Fluctuations are controlled

At equilibrium it samples:
e
E L ( R )
f (R,  )  T (R)f0 (R)
70
DMC Algorithm
• Initialize a population of walkers {R_i}
• For each walker, move by diffusion (a Gaussian displacement ξ of variance δτ) plus drift:
$\mathbf{R}' = \mathbf{R} + \boldsymbol{\xi} + \delta\tau\, \nabla \ln \Psi_T(\mathbf{R})$
DMC Algorithm
• Compute the branching weight
$w = e^{-\delta\tau\, (E_L(\mathbf{R}') - E_{\mathrm{ref}})}$
• Duplicate R' to M copies: M = int(ξ + w), with ξ uniform in [0, 1)
• Compute statistics
• Adjust E_ref to keep the average population constant
• Iterate...
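A compact Python sketch of this loop for a hypothetical 1D test case of my choosing (the harmonic oscillator V = x²/2 with trial Ψ_T = e^(-αx²/2), so ∇ln Ψ_T = -αx, E_L = α/2 + (1-α²)x²/2, and the exact ground-state energy is 1/2):

```python
import math
import random

ALPHA, DTAU, N_TARGET = 0.8, 0.01, 500

def local_energy(x):
    return 0.5 * ALPHA + 0.5 * (1.0 - ALPHA**2) * x * x

def dmc(n_blocks=100, n_steps=100):
    walkers = [random.gauss(0.0, 1.0) for _ in range(N_TARGET)]
    e_ref = sum(map(local_energy, walkers)) / len(walkers)
    estimates = []
    for block in range(n_blocks):
        for _ in range(n_steps):
            new_walkers = []
            for x in walkers:
                # diffusion + drift
                x_new = x + random.gauss(0.0, math.sqrt(DTAU)) - DTAU * ALPHA * x
                # branching: M = int(xi + w) copies
                w = math.exp(-DTAU * (local_energy(x_new) - e_ref))
                new_walkers.extend([x_new] * int(random.random() + w))
            walkers = new_walkers
            # population control: adjust E_ref toward the target population
            e_ref += 0.1 * math.log(N_TARGET / len(walkers))
        if block >= n_blocks // 2:   # discard the equilibration blocks
            estimates.append(sum(map(local_energy, walkers)) / len(walkers))
    return sum(estimates) / len(estimates)

print(dmc())   # ~0.5, the exact ground-state energy, even though alpha != 1
```

The average of E_L over the equilibrated walkers is the mixed estimator, which converges to E₀ up to a small time-step error.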
Good for Helium studies

• Thousands of theoretical and experimental papers have been published on helium, in its various forms:
$\hat{H} \phi_n(\mathbf{R}) = E_n\, \phi_n(\mathbf{R})$
Atom
Small Clusters
Droplets
Bulk
Stability Chart
[Figure: stability chart of ³He_m ⁴He_n clusters (m = 0-5, n = 0-11), with regions marked Bound (L=0), Unbound, Unknown, and Terra Incognita, and entries such as ³He₂⁴He₂ (L=0 S=0), ³He₂⁴He₄ (L=1 S=1), and ³He₃⁴He₈ and ³He₃⁴He₄ (L=0 S=1/2, L=1 S=1/2).]
Good for vibrational problems
For electronic structure?
The Fermion Problem

• Wave functions for fermions have nodes.
• The diffusion equation analogy is lost. We would need to introduce positive and negative walkers: the (in)famous Sign Problem.
• If we knew the exact nodes of Ψ, we could exactly simulate the system by QMC methods, restricting the random walk to a positive region bounded by the nodes.
• Unfortunately, the exact nodes are unknown. Use approximate nodes from a trial Ψ. Kill the walkers if they cross a node.
[Figure: a wave function with + and - regions separated by a node.]
Common misconception on nodes
• Nodes are not fixed by antisymmetry alone: antisymmetry fixes only a (3N-3)-dimensional subset
Common misconception on nodes
• They have (almost) nothing to do with orbital nodes.
• It is (sometimes) possible to use nodeless orbitals
Common misconceptions on nodes
• A common misconception is that on a node two like electrons are always close. This is not true.
Common misconceptions on nodes
• The nodal theorem is NOT VALID in N dimensions
• Higher energy states do not mean more nodes (Courant and Hilbert)
• It is only an upper bound
Common misconceptions on nodes
• Not even for the same symmetry species
[Figure: the Courant counterexample on a square domain.]
Tiling Theorem (Ceperley)
[Figure: a nodal partition that is impossible for a ground state.]
• Nodal domains must all have the same shape
• The Tiling Theorem does not say how many nodal domains we should expect!
Nodes are relevant
• Levinson Theorem: the number of nodes of the zero-energy scattering wave function gives the number of bound states
• Fractional quantum Hall effect
• Quantum Chaos (billiards)
[Figure: nodal patterns of an integrable system vs. a chaotic system.]
The Fixed Node approximation

• Since in general we do not know the exact nodes, we resort to approximate nodes
• We use the nodes of some trial function
• The energy is an upper bound to E₀
• The energy depends only on the nodes; the rest of Ψ_T affects the statistical error
• Usually very good results! Even poor Ψ_T's usually have good nodes
Trial Wave functions

• For small systems (N < 7): specialized forms (linear expansions, Hylleraas, ...)
• For larger systems (up to ~ 200): the Slater-Jastrow form
$\Psi = \left( \sum_i c_i D_i \right) e^{J}$
• A sum of Slater determinants
• Jastrow factor: a polynomial parametrized in interparticle distances
A little intermezzo: the Be atom nodal structure
$\Psi_{CI} = 1s^2 2s^2 + c\, 1s^2 2p^2$
[Figure: cuts of the Be nodal surface in the (r1 - r2, r3 - r4, r1 + r2) coordinates, comparing the Ψ_HF = 0 and Ψ_CI = 0 nodes.]
Be nodal structure
• The node is
$(r_1 - r_2)(r_3 - r_4) + c\,(r_{13}^2 - r_{14}^2 - r_{23}^2 + r_{24}^2) + \ldots = 0$
• Now there are only two nodal domains
• It can be proved that the exact Be wave function has exactly two nodal regions
See Bressanini, Ceperley and Reynolds, http://scienze-como.uninsubria.it/bressanini/
Be nodal structure

• A physicist's proof... (David Ceperley)
• 4 electrons: 1 and 2 spin up, 3 and 4 spin down
• The Tiling Theorem applies. There are at most 4 nodal domains.
[Figure: the regions connected by the permutations P̂₁₂, P̂₃₄ and P̂₁₂P̂₃₄ acting on a point R.]
Be nodal structure

• We need to find a point R and a path R(t) that connects R to P₁₂P₃₄R so that Ψ(R(t)) ≠ 0
• Consider the point R = (r₁, -r₁, r₃, -r₃)
• Ψ is invariant w.r.t. rotations
• Path: rotating by π around the r₁ × r₃ axis, Ψ is constant
• But Ψ(R) ≠ 0: Ψ_exact = Ψ_HF + higher terms, with Ψ_HF(R) = 0 and higher terms ≠ 0
[Figure: the four electron positions r₁, r₂, r₃, r₄.]
An example

• High precision total energy calculations of molecules
• An example: what is the most stable C24 fullerene?
• QMC could make consistent predictions of the lowest structure
• Other methods are not capable of making consistent predictions about the stability of fullerenes
DMC advantages and drawbacks

• Correlation between particles is automatically taken into account.
• Exact for boson systems
• Fixed node for electrons obtains 85-95% of the correlation energy. Very good results in many different fields.
• Works at T = 0. For T > 0 one must use Path Integral MC.
• Not a "black box"
• It is computationally demanding for large systems
• Derivatives of Ψ are very hard. Not good enough yet.
Current research

• Current research focuses on:
• Applications: nanoscience, solid state, condensed matter, nuclear physics, geometries of molecules, ...
• Estimating derivatives of the wave function
• Solving the sign problem (very hard!!)
• Making it an O(N) method (currently O(N³)) to treat bigger systems (currently about 200 particles)
• Better wave functions
• Better optimization methods
A reflection...
A new method for calculating properties in nuclei, atoms, molecules, or solids automatically provokes three sorts of negative reactions:
• A new method is initially not as well formulated or understood as existing methods
• It can seldom offer results of a comparable quality before a considerable amount of development has taken place
• Only rarely do new methods differ in major ways from previous approaches
Nonetheless, new methods need to be developed to handle problems that are vexing to or beyond the scope of the current approaches.
(Slightly modified from Steven R. White, John W. Wilkins and Kenneth G. Wilson)
THE END