Lecture 3
Stochastic Differentiation
Leonidas Sakalauskas
Institute of Mathematics and Informatics
Vilnius, Lithuania
EURO Working Group on Continuous Optimization
Content

- Concept of stochastic gradient
- Analytical differentiation of expectation
- Differentiation of the objective function of two-stage SLP
- Finite difference approach
- Stochastic perturbation approximation
- Likelihood approach
- Differentiation of integrals given by inclusion
- Simulation of stochastic gradient
- Projection of stochastic gradient
Expected objective function
Stochastic programming deals with objective and/or constraint functions defined as the expectation of a random function:

$$F(x) = \mathbf{E} f(x, \xi), \qquad f : \mathbb{R}^n \times \Omega \to \mathbb{R},$$

where $\xi$ is an elementary event in the probability space

$$(\Omega, \Sigma, \mathbf{P}_x),$$

and $\mathbf{P}_x$ is the measure defined by the probability density function

$$p : \mathbb{R}^n \times \Omega \to \mathbb{R}.$$
Concept of stochastic gradient
The methods of nonlinear stochastic
programming are built using the concept of
stochastic gradient.
The stochastic gradient of the function $F(x)$ is a random vector $g(x, \xi)$ such that:

$$\mathbf{E}\, g(x, \xi) = \frac{\partial F(x)}{\partial x}$$
Methods of stochastic differentiation

Several estimators have been examined for the stochastic gradient:
- Analytical approach (AA);
- Finite difference approach (FD);
- Likelihood ratio approach (LR);
- Simulated perturbation stochastic approximation (SPSA).
Stochastic gradient: an analytical approach

$$F(x) = \int_{\mathbb{R}^n} f(x, z)\, p(x, z)\, dz$$

$$\frac{\partial F(x)}{\partial x} = \int_{\mathbb{R}^n} \Big( \nabla_x f(x, z) + f(x, z)\, \nabla_x \ln p(x, z) \Big)\, p(x, z)\, dz$$

$$g(x, \xi) = \nabla_x f(x, \xi) + f(x, \xi)\, \nabla_x \ln p(x, \xi)$$
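As a concrete illustration (not from the slides), the sketch below assumes $\xi \sim N(x, \sigma^2 I)$, so that $\nabla_x \ln p(x, \xi) = (\xi - x)/\sigma^2$; the integrand f is an arbitrary smooth test function.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, xi):
    # Illustrative integrand (an assumption, not from the lecture).
    return float(np.sum((x + xi) ** 2))

def grad_f(x, xi):
    # Gradient of f with respect to x.
    return 2.0 * (x + xi)

def aa_gradient(x, sigma=1.0):
    """One-sample analytical (score-function) stochastic gradient:
    g(x, xi) = grad_x f(x, xi) + f(x, xi) * grad_x ln p(x, xi),
    with xi ~ N(x, sigma^2 I), whose score is (xi - x) / sigma^2."""
    xi = rng.normal(loc=x, scale=sigma)
    score = (xi - x) / sigma ** 2
    return grad_f(x, xi) + f(x, xi) * score

x = np.array([1.0, -2.0])
# Averaging many one-sample gradients approximates dF/dx.
print(np.mean([aa_gradient(x) for _ in range(20_000)], axis=0))
```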
Analytical approach (AA)
Assume that the density of the random variable does not depend on the decision variable. Then the analytical stochastic gradient coincides with the gradient of the random function under the integral:

$$g^1(x, \xi) = \frac{\partial f(x, \xi)}{\partial x}$$
Analytical approach (AA)
Let us consider the two-stage SLP:

$$F(x) = c \cdot x + \mathbf{E} \min_y\, q \cdot y \;\to\; \min,$$

$$W y + T x = h, \qquad A x \le b, \qquad y \in \mathbb{R}^m,\; y \ge 0, \qquad x \in X,$$

where the vectors $q$, $h$ and the matrices $W$, $T$ can in general be random.
Analytical approach (AA)
The stochastic analytical gradient is defined as

$$g^1(x, \xi) = c - T^T u^*,$$

where $u^*$ belongs to the set of solutions of the dual problem

$$(h - T x)^T u^* = \max_u \left\{ (h - T x)^T u \;\middle|\; W^T u - q \le 0,\; u \in \mathbb{R}^m \right\}$$
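A minimal sketch of computing this gradient for one realisation with SciPy (the data c, q, W, T, h below are made up for illustration; u* is read from the dual values that the HiGHS solver reports for the equality constraints):

```python
import numpy as np
from scipy.optimize import linprog

def two_stage_gradient(x, c, q, W, T, h):
    """g(x, xi) = c - T^T u* for one realisation xi = (q, W, T, h),
    assuming the second-stage problem  min_y { q.y : W y = h - T x, y >= 0 }."""
    res = linprog(q, A_eq=W, b_eq=h - T @ x, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError("second-stage LP infeasible or unbounded at this x")
    u_star = res.eqlin.marginals   # duals: sensitivity of the optimum to the rhs
    return c - T.T @ u_star

# Made-up deterministic data just to exercise the function.
c = np.ones(3)
q = np.array([1.0, 2.0, 1.5, 3.0])
W = np.array([[1.0, 0.5, 1.5, 1.0],
              [0.5, 1.0, 1.0, 1.5]])
T = np.array([[0.2, 0.1, 0.0],
              [0.0, 0.3, 0.1]])
h = np.array([4.0, 5.0])
print(two_stage_gradient(np.zeros(3), c, q, W, T, h))
```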
Finite difference (FD) approach
Let us approximate the gradient of the random function by finite differences. Then the $i$th component of the stochastic gradient is computed as:

$$g^2_i(x, \xi) = \frac{f(x + \delta e_i, \xi) - f(x, \xi)}{\delta}, \qquad i = 1, \dots, n,$$

where $e_i$ is the vector with zero components except the $i$th one, which is equal to 1, and $\delta$ is some small value.
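A minimal sketch of this estimator (the interface f(x, xi) and the default step are assumptions); note that it costs n + 1 evaluations of f per realisation:

```python
import numpy as np

def fd_gradient(f, x, xi, delta=1e-4):
    """Finite-difference stochastic gradient: component i equals
    (f(x + delta*e_i, xi) - f(x, xi)) / delta, all at the same realisation xi."""
    g = np.empty(x.size)
    f0 = f(x, xi)                    # base value, reused for every component
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = 1.0                   # i-th coordinate vector
        g[i] = (f(x + delta * e, xi) - f0) / delta
    return g
```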
Simulated perturbation stochastic approximation (SPSA)

$$g^3_i(x, \xi) = \frac{f(x + \delta \Delta, \xi) - f(x - \delta \Delta, \xi)}{2\, \delta\, \Delta_i},$$

where $\Delta$ is a random vector whose components take the values 1 or -1 with probabilities $p = 0.5$, and $\delta$ is some small value (Spall, 1992).
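A sketch of the SPSA estimator under the same assumed interface; it needs only two evaluations of f per realisation, regardless of the dimension n:

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_gradient(f, x, xi, delta=1e-2):
    """SPSA estimator (Spall, 1992): a single random +/-1 perturbation Delta
    is shared by all components, so only two f-evaluations are needed."""
    Delta = rng.choice([-1.0, 1.0], size=x.size)   # Rademacher vector
    diff = f(x + delta * Delta, xi) - f(x - delta * Delta, xi)
    return diff / (2.0 * delta * Delta)   # componentwise, since Delta_i = +/-1
```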
Likelihood ratio (LR) approach
$$F(x) = \int_{\mathbb{R}^n} f(x + z)\, p(z)\, dz$$

$$g^4(x, \xi) = -\big( f(x + \xi) - f(x) \big)\, \frac{\partial \ln p(\xi)}{\partial \xi}$$
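A sketch assuming, for illustration, that $\xi$ is standard normal, so that $\partial \ln p(\xi)/\partial \xi = -\xi$; subtracting $f(x)$ leaves the expectation unchanged (the score has zero mean) but reduces the variance:

```python
import numpy as np

rng = np.random.default_rng(0)

def lr_gradient(f, x):
    """Likelihood-ratio estimator for F(x) = E f(x + xi), assuming xi ~ N(0, I),
    whose log-density has gradient -xi."""
    xi = rng.standard_normal(x.size)
    return -(f(x + xi) - f(x)) * (-xi)

f = lambda v: float(np.sum(v ** 2))   # illustrative test function, F(x) = |x|^2 + n
x = np.array([1.0, -2.0])
print(np.mean([lr_gradient(f, x) for _ in range(20_000)], axis=0))  # approximates 2x
```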
Stochastic differentiation of integrals given by inclusion

Let us consider the integral over a set given by an inclusion:

$$F(x) = \int_{f(x, z) \in B} p(x, z)\, dz$$
Stochastic differentiation of integrals given by inclusion

The gradient of this function is defined as

$$G(x) = \int_{f(x, z) \in B} \left( \frac{\partial p(x, z)}{\partial x} + q(x, z) \right) dz,$$

where $q(x, z)$ is defined through the derivatives of $p$ and $f$ (see Uryasev (1994), (2002)).
Simulation of stochastic gradient
We assume here that a Monte-Carlo sample of a certain size $N$ is provided for any $x \in \mathbb{R}^n$:

$$Z = (z^1, z^2, \dots, z^N),$$

where the $z^i$ are independent random copies of $\xi$, i.e., distributed according to the density $p(x, z)$.
Sampling estimators of the objective function

The sampling estimator of the objective function:

$$\tilde{F}(x) = \frac{1}{N} \sum_{i=1}^{N} f(x, z^i)$$

and the sampling variance are computed as:

$$\tilde{D}^2(x) = \frac{1}{N} \sum_{i=1}^{N} \Big( f(x, z^i) - \tilde{F}(x) \Big)^2$$
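Both estimators are a single pass over the sample; a minimal sketch (the interface f(x, z) is an assumption):

```python
import numpy as np

def objective_estimators(f, x, Z):
    """Sample mean and sample variance of f(x, .) over the Monte-Carlo
    sample Z, one realisation z^i per row; the variance uses divisor N,
    as on the slide."""
    vals = np.array([f(x, z) for z in Z])
    F_tilde = vals.mean()
    D2_tilde = np.mean((vals - F_tilde) ** 2)
    return F_tilde, D2_tilde
```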
Sampling estimator of the gradient
The gradient is evaluated using the same random sample:

$$\tilde{G}(x) = \frac{1}{N} \sum_{i=1}^{N} g(x, z^i)$$
Sampling estimator of the gradient
The sampling covariance matrix

$$A(x) = \frac{1}{N - n} \sum_{i=1}^{N} \Big( g(x, z^i) - \tilde{G}(x) \Big) \Big( g(x, z^i) - \tilde{G}(x) \Big)^T$$

is applied later on for normalising the gradient estimator.
The Hotelling statistic, say, can be used for testing the value of the gradient:

$$T^2 = \frac{(N - n)}{n}\, \tilde{G}(x)^T A(x)^{-1} \tilde{G}(x)$$
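A sketch combining the three estimators (the interface g(x, z) and the divisor conventions follow the slides as reconstructed above):

```python
import numpy as np

def gradient_test(g, x, Z, n):
    """Sample-mean gradient, sampling covariance matrix (divisor N - n),
    and the Hotelling statistic T^2 = (N - n) G^T A^{-1} G / n for testing
    whether the true gradient differs from zero."""
    G = np.array([g(x, z) for z in Z])       # shape (N, n)
    N = G.shape[0]
    G_tilde = G.mean(axis=0)
    centred = G - G_tilde
    A = centred.T @ centred / (N - n)        # sampling covariance matrix
    T2 = (N - n) * G_tilde @ np.linalg.solve(A, G_tilde) / n
    return G_tilde, A, T2
```

A small value of T² indicates that, at sample size N, the estimated gradient cannot be statistically distinguished from zero.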
Computer simulation


Two-stage stochastic linear optimisation problem. The dimensions of the task are as follows:
- the first stage has 10 rows and 20 variables;
- the second stage has 20 rows and 30 variables.

http://www.math.bme.hu/~deak/twostage/l1/20x20.1/ (2006-01-20).
Wrap-Up and conclusions


- The methods of nonlinear stochastic programming are built using the concept of the stochastic gradient.
- Several methods exist to obtain the stochastic gradient, and they allow the objective function and the stochastic gradient to be evaluated using the same random sample.