Transcript ppt - COR@L

Emory University
Inexact Methods for
PDE-Constrained
Optimization
Frank Edward Curtis
Northwestern University
Joint work with Richard Byrd and Jorge Nocedal
February 12, 2007
Nonlinear Optimization

“One” problem
min
f ( x)
s.t.
g ( x)  0
h( x )  0
x
Circuit Tuning

Building blocks:
 Transistors

(switches) and Gates (logic units)
Improve aspects of the circuit – speed, area,
power – by choosing transistor widths
AT1
AT2
w1
w2
d1
AT3
d2
(A. Wächter, C. Visweswariah, and A. R. Conn, 2005)
Circuit Tuning

Building blocks:
 Transistors

Improve aspects of the circuit – speed, area,
power – by choosing transistor widths
AT1
AT2

(switches) and Gates (logic units)
w1
w2
d1
AT3
d2
Formulate an optimization problem
min
s.t.
f (AT, w, d)
AT3  max AT1  d1, AT2  d2,...
(A. Wächter, C. Visweswariah, and A. R. Conn, 2005)
Strategic Bidding in Electricity Markets

Independent operator collects
bids and sets production
schedule and “spot price” to
minimize cost to consumers
(Pereira, Granville, Dix, and Barroso, 2004)
min
s.t.
bids  P
PT e  demand
Strategic Bidding in Electricity Markets


Independent operator collects
min
bids  P
bids and sets production
schedule and “spot price” to
s.t. PT e  demand
minimize cost to consumers
Electricity production companies “bid” on how
much they will charge for one unit of electricity
max
s.t.
(spot  cost)  P
bids  P 
min
P  arg min 

T
 s.t. P e  demand 
(Pereira, Granville, Dix, and Barroso, 2004)
Strategic Bidding in Electricity Markets


Independent operator collects
min
bids  P
bids and sets production
schedule and “spot price” to
s.t. PT e  demand
minimize cost to consumers
Electricity production companies “bid” on how
much they will charge for one unit of electricity
max
s.t.



(spot  cost)  P
bids  P 
min
P  arg min 

T
 s.t. P e  demand 
Bilevel problem
Equivalent to MPCC
Hard geometry!
(Pereira, Granville, Dix, and Barroso, 2004)
x0
y0
xy  0
y
x
Challenges for NLP algorithms









Very large problems
Numerical noise
Availability of derivatives
Degeneracies
Difficult geometries
Expensive function evaluations
Real-time solutions needed
Integer variables
Negative curvature
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Equality constrained optimization
Goal: solve the problem
min
f ( x)
s.t.
c( x)  0
x
e.g., minimize the difference
between observed and expected
behavior, subject to atmospheric
flow equations (Navier-Stokes)
x
*
Equality constrained optimization
Goal: solve the problem
min
f ( x)
s.t.
c( x)  0
x
Define: the derivatives
g ( x) :
f ( x)
A( x) :
c i ( x)
2
W ( x,  ) :  xx L( x,  )


Define: the Lagrangian
L( x,  )  f ( x)  T c( x)
Goal: solve KKT conditions
 g ( x)  A( x)T  

0
c( x)


Sequential Quadratic Programming (SQP)

Two “equivalent” step computation techniques
Algorithm: Newton’s method
W

A
 g  AT  
AT  d 
    

0   
 c 
Algorithm: the SQP subproblem
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Sequential Quadratic Programming (SQP)

Two “equivalent” step computation techniques
Algorithm: Newton’s method
W

A
 g  AT  
AT  d 
    

0   
 c 
W

A
A 

0
T
Algorithm: the SQP subproblem
min
g T d  12 d TWd
s.t.
c  Ad  0
d
KKT matrix
• Cannot be formed
• Cannot be factored
Sequential Quadratic Programming (SQP)

Two “equivalent” step computation techniques
Algorithm: Newton’s method
W

A
 g  AT  
AT  d 
    

0   
 c 
W

A
A 

0
T
Algorithm: the SQP subproblem
min
g T d  12 d TWd
s.t.
c  Ad  0
d
KKT matrix
• Cannot be formed
• Cannot be factored
Linear system solve
• Iterative method
• Inexactness
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Unconstrained optimization
Goal: minimize a nonlinear objective
min f ( x )
x
Algorithm: Newton’s method (CG)
 f ( xk )d k  f ( xk )
2
Unconstrained optimization
Goal: minimize a nonlinear objective
min f ( x )
x
Algorithm: Newton’s method (CG)
 f ( xk )d k  f ( xk )
2
Note: choosing any
intermediate step ensures
global convergence to a
local solution of NLP
xk
(Steihaug, 1983)
Nonlinear equations
Goal: solve a nonlinear system
F ( x)  0
Algorithm: Newton’s method
F ( xk )d k   F ( xk )
Nonlinear equations
Goal: solve a nonlinear system
F ( x)  0
Algorithm: Newton’s method
F ( xk )d k   F ( xk )
any step with
xk
F ( xk )d k   F ( xk )  rk
and
rk   F ( xk ) , 0    1
ensures descent
(Dembo, Eisenstat, and Steihaug, 1982)
(Eisenstat and Walker, 1994)
Line Search SQP Framework
W

A

 g  AT  
AT  d 
    


0  
 c 
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Define “exact” penalty function
( x)  f ( x)   c( x)
Line Search SQP Framework
W

A

 g  AT  
AT  d 
    


0  
 c 
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Define “exact” penalty function
( x)  f ( x)   c( x)
 ( x  d )
 ( x  d )
 1
  10


Algorithm Outline (exact steps)

for k = 0, 1, 2, …
 Compute
step by…
W

A
 Set
 g  AT  
AT  d 
    

0   
c


penalty parameter to ensure descent on…
( x)  f ( x)   c( x)
 Perform
backtracking line search to satisfy…
( x  d )  ( x) D (d )
 Update
iterate
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
… which may lead to
an increase in the
model objective
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Quadratic/linear model of merit function
W

A

 g  AT  
AT  d 
    


0  
 c 
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Create model
m(d )  f  g T d  12 d TWd   ( c  Ad )

Quantify reduction obtained from step
mred (d )  m(0)  m(d )
  g d  d Wd   ( c  c  Ad )
T
1
2
T
Quadratic/linear model of merit function
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Create model
m(d )  f  g T d  12 d TWd   ( c  Ad )

Quantify reduction obtained from step
mred (d )  m(0)  m(d )
  g d  d Wd   ( c  c  Ad )
T
1
2
T
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
… which may lead to
an increase in the
model objective
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
… which may lead to
an increase in the
model objective
… but this is ok since
we can account for this
conflict by increasing
the penalty parameter
mred (d )   c , 0    1
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
… which may lead to
an increase in the
model objective
… but this is ok since
we can account for this
conflict by increasing
the penalty parameter
g T d  12 d T Wd

, 0  1
(1   ) c
Algorithm Outline (exact steps)

for k = 0, 1, 2, …
 Compute
step by…
W

A
 Set
 g  AT  
AT  d 
    

0   
c


penalty parameter to ensure descent on…
( x)  f ( x)   c( x)
 Perform
backtracking line search to satisfy…
( x  d )  ( x) D (d )
 Update
iterate
First attempt
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Proposition: sufficiently small residual
(  , r )   ( g  AT  , c) , 0    1

Test: 61 problems from CUTEr test set

1e-8
1e-7
1e-6
1e-5
1e-4
1e-3
1e-2
1e-1
Success
100%
100%
97%
97%
90%
85%
72%
38%
Failure
0%
0%
3%
3%
10%
15%
28%
62%
First attempt… not robust
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Proposition: sufficiently small residual
(  , r )   ( g  AT  , c) , 0    1

… not enough for complete robustness
 We
have multiple goals (feasibility and optimality)
 Lagrange multipliers may be completely off

… may not have descent!
Second attempt

Step computation: inexact SQP step
W

A

 g  AT     
AT  d 
    
 
0   
 c  r 
Recall the line search condition
( x  d )  ( x) D (d )

We can show
D (d )  g d    c  r
T

Second attempt

Step computation: inexact SQP step
W

A

 g  AT     
AT  d 
    
 
0   
 c  r 
Recall the line search condition
( x  d )  ( x) D (d )

We can show
D (d )  g d    c  r
T

... but how negative should this be?
Algorithm Outline (exact steps)

for k = 0, 1, 2, …
 Compute
step
W

A
 Set
 g  AT  
AT  d 
    

0   
c


penalty parameter to ensure descent
( x)  f ( x)   c( x)
 Perform
backtracking line search
( x  d )  ( x) D (d )
 Update
iterate
Algorithm Outline (inexact steps)

for k = 0, 1, 2, …
 Compute
step and set penalty parameter to ensure
descent and a stable algorithm
W

A
 g  AT     
AT  d 
    
 
0   
c

 r 
 Perform
( x)  f ( x)   c( x)
backtracking line search
( x  d )  ( x) D (d )
 Update
iterate
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
 g T d  12 d TWd    c  r    c
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
Step is acceptable if for
0    1, 0   :
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
r  ,
   g  AT 
 g T d  12 d TWd    c  r    c
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
Step is acceptable if for
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
0    1, 0   :
r  c ,
  c
 g T d  12 d TWd    c  r    c
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
Step is acceptable if for
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
0    1, 0   :
r  c ,
  c
g T d  12 d T Wd

, 0  1
(1   ) c  r 
 g T d  12 d TWd    c  r    c
Algorithm Outline

for k = 0, 1, 2, …
 Iteratively
solve
W

A
 g  AT     
AT  d 
    
 
0   
 c  r 
 Until
r   c , 0   1
  c , 0
r  , 0  
or
   g  AT  , 0    1
mred (d )   c
 Update penalty parameter
 Perform backtracking line search
 Update iterate
Termination Test

Observe KKT conditions
g  AT   max 1, g  opt
, 0   opt  1
c  max 1, c( x0 )  feas , 0   feas  1
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Assumptions

The sequence of iterates is contained in a convex
set and the following conditions hold:
 the
objective and constraint functions and their first
and second derivatives are bounded
 the multiplier estimates are bounded
 the constraint Jacobians have full row rank and their
smallest singular values are bounded below by a
positive constant
 the Hessian of the Lagrangian is positive definite with
smallest eigenvalue bounded below by a positive
constant
Sufficient Reduction to Sufficient Decrease

Taylor expansion of merit function yields
D (d )  g T d    c  r


Accepted step satisfies
mred (d )   c , 0    1
g d    c  r    d Wd   c
T
1
2
T
D (d )   12 d TWd   c

  d  c
2

Intermediate Results
r   c , 0   1
  c , 0

g d  d Wd
, 0  1
(1   ) c  r 
T
1
2
T
r  , 0  
  g A  , 0
T
mred (d )   c
d
is bounded above
 
is bounded above

is bounded below by
a positive constant
Sufficient Decrease in Merit Function

D (d ; x,  )   d  c
2


 ( x;  )   ( x  d ;  )    d  c

lim d k
k 
2

 ck  0
lim Z kT g k  0
k 
2

Step in Dual Space

We converge to an optimal primal solution, and
g  A     g  AT  , 0     1

T
(for sufficiently small || c || and || d || )
Therefore,
lim ck  0
k 
lim g k  A k  0
k 
T
k
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Problem Formulation

Tikhonov-style regularized inverse problem
to solve for a reasonably large mesh size 
 Want to solve for small regularization parameter 
 Want


SymQMR for linear system solves
Input parameters:
 g  AT  
 

r   
 
 c 
Recall:
r   c , 0   1
  c , 0
(Curtis and Haber, 2007)
  0.1,   1,
  1,   0.1
r  , 0  
or
   g  AT  , 0    1
mred (d )   c
Numerical Results
n
m

1024
512
1e-6
 
r  
 
 g  AT  


 c 

Iters.
Time
Total LS Avg. LS
Iters.
Iters.
Avg. Rel.
Res.
0.5
29
29.5s
1452
50.1
3.12e-1
0.1
12
11.37s
654
54.5
6.90e-2
0.01
9
11.60s
681
75.7
6.27e-3
(Curtis and Haber, 2007)
Numerical Results
n
m

1024
512
1e-6
 
r  
 
 g  AT  


 c 

Iters.
Time
Total LS Avg. LS
Iters.
Iters.
Avg. Rel.
Res.
0.5
29
29.5s
1452
50.1
3.12e-1
0.1
12
11.37s
654
54.5
6.90e-2
0.01
9
11.60s
681
75.7
6.27e-3
(Curtis and Haber, 2007)
Numerical Results
n
m
1024
512
1e-1


Iters.
Time
1e-6
12
1e-7
11.40s
Total LS Avg. LS
Iters.
Iters.
654
54.5
Avg. Rel.
Res.
6.90e-2
11
14.52s
840
76.4
6.99e-2
1e-8
8
10.57s
639
79.9
6.15e-2
1e-9
11
18.52s
1139
104
8.65e-2
1e-10
19
44.41s
2708
143
8.90e-2
(Curtis and Haber, 2007)
Numerical Results
n
m
8192
4096
1e-1


Iters.
1e-6
15
Total LS Avg. LS
Iters.
Iters.
264.47s 1992
133
Avg. Rel.
Res.
8.13e-2
1e-7
11
236.51s 1776
161
6.89e-2
1e-8
9
204.51s 1567
174
6.77e-2
1e-9
11
347.66s 2681
244
8.29e-2
1e-10
16
805.14s 6249
391
8.93e-2
(Curtis and Haber, 2007)
Time
Numerical Results
n
m
65536
32768
1e-1


Iters.
1e-6
15
Total LS Avg. LS
Iters.
Iters.
5055.9s 4365
291
Avg. Rel.
Res.
8.46e-2
1e-7
10
4202.6s 3630
363
8.87e-2
1e-8
12
5686.2s 4825
402
7.96e-2
1e-9
12
6678.7s 5633
469
8.77e-2
1e-10
14
14783s
895
8.63e-2
(Curtis and Haber, 2007)
Time
12525
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Review and Future Challenges

Review
 Defined a globally convergent inexact SQP algorithm
 Require only inexact solutions of primal-dual system
 Require
only matrix-vector products involving
objective and constraint function derivatives
 Results also apply when only reduced Hessian of
Lagrangian is assumed to be positive definite
 Numerical experience on model problem is promising

Future challenges
 (Nearly) Singular constraint Jacobians
 Inexact derivative information
 Negative curvature
 etc., etc., etc….
Negative Curvature
 Big question
 What
is the best way to handle negative curvature
(i.e., when the reduced Hessian may be indefinite)?

Small question
 What
is the best way to handle negative curvature in
the context of our inexact SQP algorithm?
 We have no inertia information!

Smaller question
 When can we handle negative curvature in the
context of our inexact SQP algorithm with NO
algorithmic modifications?
 When do we know that a given step is OK?
 Our analysis of the inexact case leads to a few
observations…
Why Quadratic Models?
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
xk
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Why Quadratic Models?
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
xk
Provides a good…
• direction? Yes
• step length? Yes
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Provides a good…
• direction? Maybe
• step length? Maybe
Why Quadratic Models?
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
xk


min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
One can use our stopping criteria as a mechanism for
determining which are good directions
All that needs to be determined is whether the step
lengths are acceptable
Unconstrained Optimization
Hd   g  

min g T d  12 d T Hd
d
Direct method is the angle test
 gT d   g d

Indirect method is to check the conditions
   g , d Hd   d
T
or
 g d  g , d  g
T
2
2
Unconstrained Optimization
Hd   g  

min g T d  12 d T Hd
d
Direct method is the angle test
 gT d   g d

Indirect method is to check the conditions
   g , d Hd   d
T
or
step quality
 g d  g , d  g
T
2
2
step length
Constrained Optimization
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
g T d  12 d TWd
s.t.
c  Ad  0
d
Step quality determined by
r   c , 0   1
  c , 0

min
r  , 0  
or
   g  AT  , 0    1
mred (d )   c
Step length determined by
d Wd   d
T
2
or
d   max  c , r 
Thanks!
Actual Stopping Criteria
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
g T d  12 d TWd
s.t.
c  Ad  0
d
Stopping conditions: 0   ,   1, 0   , 
r  c
  c

min
r   max  c , 1
or

  max  c ,  g  AT 
mred (d )   max  c , r  c 
Model reduction condition
 g T d  2 d TWd    c  r    max  c , r  c 

Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g T d  2 d T Wd
Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g T d  2 d T Wd
Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g T d  2 d T Wd
Some region
around the
exact solution
Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g d  2 d Wd
T

T
Ellipse distorted
toward the
linearized
constraints
Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g T d  2 d T Wd