Robust Optimization Concepts and Examples

Download Report

Transcript Robust Optimization Concepts and Examples

Robust Optimization
Concepts and Examples
Yuriy Zinchenko
Shane G. Henderson
ORIE, Cornell University
Outline
• What can go wrong with LP?
• A familiar blend problem
• The general picture
– Robust linear programming
– Software, resources, practicalities
• Radiation therapy for cancer
treatment
Zinchenko and Henderson 2005
2
What can go wrong with LP?
Tough LP problem:
max x + y
s/t 1 x  1
1y  1
x, y  0
?
Zinchenko and Henderson 2005
3
Blend Problem
$
$$
but properties
change with
time
$$$
blend to get output
properties at
minimum cost
for any
input properties
within reason
Zinchenko and Henderson 2005
4
Blend constraints
• Typical constraint looks like
Low ≤ 10 x1 + 12 x2 + 7 x3 ≤ High
• Changes to
Low ≤ a1 x1 + a2 x2 + a3 x3 ≤ High
for any vector a that is “close” to (10, 12, 7)
Zinchenko and Henderson 2005
5
General robust LP
min cTx
s/t A(1) x  b1
A(2) x  b2
A(3) x  b3
Zinchenko and Henderson 2005
6
A more detailed view
Simple linear constraint
ax  1
x
0
with a “close” to 1, namely
0a2
Want x to work for all such a
How do we deal with it?
Zinchenko and Henderson 2005
7
ax
 1, x  0
for all
max a x  1,
0a2
x0
0  a  2, x  0
2x1, x0
x  1/2 , x  0
Zinchenko and Henderson 2005
8
A slightly more involved example:
ax+by  1
where (a, b) “close” to (1, 1), namely in
Ellipsoidal (spherical)
“uncertainty” set U
(a, b) is in U if
(a, b) = (a0, b0) + (Da, Db)
with (a0, b0) = (1, 1) and Da2 + Db2  1
Zinchenko and Henderson 2005
9
Ellipsoidal “uncertainty” set U
(a, b) = (a0, b0) + (Da, Db)
(a0, b0) = (1, 1)
Da2 + Db2  1
U
(a0, b0)
Want (x, y) to satisfy
a x + b y  1,
for all (a, b) from U
Zinchenko and Henderson 2005
10
ax+by1
max
for all
(a, b) in U
ax+by 1
(a, b) in U
What can we say about
ax+by?
a x + b y = (a0 + Da) x + (b0 + Db) y
= (a0 x + b0 y) + (Da x + Db y)
Zinchenko and Henderson 2005
11
For a moment, think of (x, y)
as your objective function (fixed)
max a x + b y ( 1 ?)
(a, b) in U
(x, y)
U
(a0, b0)
same as
(a0 x + b0 y) + max (Da x + Db y) ( 1 ?)
Da2 + Db2  1
Zinchenko and Henderson 2005
12
max (Da x + Db y) ( 1 - (a0 x + b0 y) ?)
Da2 + Db2  1
Here
Da x + Db y
 ||(x, y)||
= (x2 + y2)1/2
(x1, y1)
U
(a0, b0) (x2, y2)
the “length”
of (x, y)
Zinchenko and Henderson 2005
13
ax+by1
max
for all
(a, b) in U
ax+by 1
(a, b) in U
(a0 x + b0 y) + max (Da x + Db y)  1
Da2 + Db2  1
||(x, y)||  1 - (a0 x + b0 y)
Zinchenko and Henderson 2005
14
Good news
Can handle constraints of this type
||(x, y)||  1 - (1 x + 1 y)
easily (the so-called second-order conic
programming (SOCP))
Not much harder than linear programming!
Zinchenko and Henderson 2005
15
General Robust LP formulation
Robust LP:
max cTx
s/t A(i) x  bi, i = 1,…,m
where
c, x  Rn, A(i)  R1 x n, A(i)=A(i)0 + wi Pi
with
wi  R1 x ki, ||wi||  1, i=1,…,m, Pi  Rki x n
Zinchenko and Henderson 2005
16
SOCP equivalent:
max cTx
s/t || Pi x ||  bi - A(i)0 x, i = 1,…,m
Probabilistic interpretation:
think of A(i) taken from an a-level set of
your favorite probability distribution (e.g.
multivariate normal)
the robust constraint will read
satisfy the constraint with a given
probability a
Zinchenko and Henderson 2005
17
Where’d the ellipse come from?
• Expert opinion
• Statistics: Averages live in ellipsoids
• Doesn’t have to be an ellipse. Can be
some other shape (e.g., boxes)
Zinchenko and Henderson 2005
18
Software
Commercial:
• Mosek (http://www.mosek.com/)
“Free”:
• SeDuMi (http://sedumi.mcmaster.ca/)
• SDPT3.x
(http://www.math.nus.edu.sg/~mattohkc/sdpt3.html/)
Zinchenko and Henderson 2005
19
Practicalities
• Realistic problem sizes
– number of variables/constraints on the
order of 103 – 104
– depends (greatly) on the problem data
structure/sparsity
• Possible to obtain a “good”,
“inexpensive” approximation with LP
Zinchenko and Henderson 2005
20
Generality
• Possible to extend this approach to quite a
few other convex programming problems
Resources
• Lectures on Modern Convex Optimization:
Analysis, Algorithms, and Engineering
Applications
by A. Ben-Tal, A. S. Nemirovskii
• Google for Robust Optimization (robust LP
etc.)
Zinchenko and Henderson 2005
21
Joint work with Millie Chu (Cornell)
and Michael B. Sharpe (Princess
Margaret Hospital, Toronto)
Zinchenko and Henderson 2005
22
Cancer treatment
• About 1.3 million new cancer cases in
the U.S. each year
• Nearly 60% receive radiation therapy
(in conjunction with surgery,
chemotherapy etc)
Zinchenko and Henderson 2005
23
External beam radiation therapy
• Radiation delivered
by a linear
accelerator
• Cancer cells more
susceptible than
normal cells
• Overlay beams
from different
angles
• Dose given in daily
fractions for ~ 6
weeks
Zinchenko and Henderson 2005
24
Intensity Modulated Radiation Therapy
• Block parts of the radiation beam – discretize
the whole beam into a grid of smaller “beamlets”
• Choose different intensities for each beamlet
Intensity Modulated
Radiation Therapy
Collaborative Working
Group, 2001
Zinchenko and Henderson 2005
25
Treatment Planning
Goal: Choose beam angles and beamlet intensities that
deliver enough radiation to kill all tumor cells, while
avoiding healthy organs & tissue as much as possible
- Take CT scan
- Delineate target
region and healthy
structures
- Discretize body as
small cubes, or
“voxels”
- Formulate & solve a
mathematical
program to find a
Zinchenko and Henderson 2005
“good” plan
26
Princess Margaret Hospital
Robust Treatment Planning
Setup errors
+ Patient motion
+ Structural changes during treatment
= uncertainty in geometry
• Don’t rescan patient much if at all
• Use RO to “robustify” mathematical
program
Zinchenko and Henderson 2005
27
Model Formulation
• Many different formulations exist – we use a
penalty formulation
minimize:
penalty objective
subject to:
Pr(Dose to voxel i in healthy structure k ≤ Uk)
≥ 0.95
Pr(Dose to voxel i in tumor ≥ L)
≥ 0.95
x = beamlet intensities
Zinchenko and Henderson 2005
≥ 0
28
Computational Results
• Prostate: tumor + 5 healthy regions
• 5 equi-spaced beams, ~ 225 beamlets
from each angle
• Voxel size = 2 cm, ~ 400 total voxels
• Solver: Mosek, v. 3.0.1.18
• Solve time = 6 seconds (LP), 45
minutes (SOCP)
Zinchenko and Henderson 2005
29
Dose-Volume Histograms
% of structure
receiving ≥ x
Gy
deterministic
solution’s plan
DVH of
expected dose
stochastic
solution’s plan
Zinchenko and Henderson 2005
30
Comparison
• Simulate 10 treatments (45 fractions each)
• For each of the 10 treatments, and for
each solution (deterministic & stochastic),
– calculated dose delivered to each voxel in each
fraction
– summed over the 45 fractions to get total dose
delivered to each voxel
– plotted DVH
Zinchenko and Henderson 2005
31
DVH – Treatment 1
det
stoch
Zinchenko and Henderson 2005
32
DVH – Treatment 2
det
stoch
Zinchenko and Henderson 2005
33
DVH – Treatment 3
det
stoch
Zinchenko and Henderson 2005
34
DVH – Treatment 4
det
stoch
Zinchenko and Henderson 2005
35
DVH – Treatment 5
det
stoch
Zinchenko and Henderson 2005
36
DVH – Treatment 6
det
stoch
Zinchenko and Henderson 2005
37
DVH – Treatment 7
det
stoch
Zinchenko and Henderson 2005
38
DVH – Treatment 8
det
stoch
Zinchenko and Henderson 2005
39
DVH – Treatment 9
det
stoch
Zinchenko and Henderson 2005
40
DVH – Treatment 10
det
stoch
Zinchenko and Henderson 2005
41
Conclusions
• LP “pushes you into a corner”
• True situation never same as data
• Robust LP: Find good solution that is
always feasible within reason
• Efficient solution methods: can solve
real problems
• Software available
Zinchenko and Henderson 2005
42
A Bit More Detail
• Di(x) = Dose delivered to voxel i in N fractions, with intensities x, a random
variable
Di(x) is the sum of N random variables (N = 45), assume iid,
apply CLT, so Di(x) is approximately normally distributed
• Take n sample shifts, s1,...,sn, with associated probabilities p = (p1,...,pn)T
• Let ai(∙)T = ai(s1)T
ai(s2)T
…
dose delivered to voxel i, shifted by sj,
from each beamlet with unit intensity
ai(sn)T
so that NpTai(∙)Tx = expected total dose delivered to voxel i, for N fractions.
• Let vi(x) = sample variance of dose delivered to voxel i

Di(x) ~ Normal ( NpTai(·)Tx, Nvi(x) )
Zinchenko and Henderson 2005
43
Probabilistic Constraints
• Want constraints to be violated with low probability (say, δ = .05)

• Example: maximum dose constraint on voxel i in Hk:
Assuming Di(x) ~ Normal ( NpTai(∙)Tx, Nvi(x) ),
 m
k
Want P(Di(x) > mk) ≤ δ



 Di (x)  NpT ai ()T x m k  NpT ai ()T x 

P



Nvi (x)
Nvi (x)


m k  NpT a i ()T x
 z1
Nvi (x)

m k  NpT ai ()T x
vi (x) 
z1 N
mk  NpT ai ()T x
Rai () x 
2
z1 N
T
• Second order cone program (SOCP)
Zinchenko and Henderson 2005
44
Dose-Volume Constraints
• Physicians like constraints of form:
“<= fraction fk of structure Hk gets >= dk”
• 0-1 var for each voxel: = 1 if dose is > dk.
• MIP: Hard to solve!
• Many voxels get near max allowed dose
• Alternative: upper bound the “excess” dose.
For healthy structure Hk, we require:
 ( Na x  d
iH k
T
i
k
' )  gk
• Linear constraints☺
Zinchenko and Henderson 2005
45