Transcript ppt - COR@L

University of North Carolina at Chapel Hill
Inexact Methods for
PDE-Constrained
Optimization
Frank Edward Curtis
Northwestern University
Joint work with Richard Byrd and Jorge Nocedal
January 31, 2007
Nonlinear Optimization

“One” problem
min
f ( x)
s.t.
g ( x)  0
h( x )  0
x
Circuit Tuning

Building blocks:
 Transistors

(switches) and Gates (logic units)
Improve aspects of the circuit – speed, area,
power – by choosing transistor widths
AT1
AT2
w1
w2
d1
AT3
d2
(A. Wächter, C. Visweswariah, and A. R. Conn, 2005)
Circuit Tuning

Building blocks:
 Transistors

Improve aspects of the circuit – speed, area,
power – by choosing transistor widths
AT1
AT2

(switches) and Gates (logic units)
w1
w2
d1
AT3
d2
Formulate an optimization problem
min
s.t.
f (AT, w, d)
AT3  max AT1  d1, AT2  d2,...
(A. Wächter, C. Visweswariah, and A. R. Conn, 2005)
Strategic Bidding


Electricity production companies “bid” on how
much they will charge for one unit of electricity
Independent operator collects bids and sets
production schedule and “spot price” to
minimize cost to consumers
max
s.t.
(spot  cost)  P
bids  P 
min
P  arg min 

T
 s.t. P e  demand 
(Pereira, Granville, Dix, and Barroso, 2004)
Strategic Bidding


Electricity production companies “bid” on how
much they will charge for one unit of electricity
Independent operator collects bids and sets
production schedule and “spot price” to
minimize cost to consumers
max
s.t.



(spot  cost)  P
bids  P 
min
P  arg min 

T
 s.t. P e  demand 
Bilevel problem
Equivalent to MPCC
Hard geometry!
(Pereira, Granville, Dix, and Barroso, 2004)
x0
y0
xy  0
y
x
Challenges for NLP algorithms









Very large problems
Numerical noise
Availability of derivatives
Degeneracies
Difficult geometries
Expensive function evaluations
Real-time solutions needed
Integer variables
Negative curvature
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Equality constrained optimization
Goal: solve the problem
min
f ( x)
s.t.
c( x)  0
x
e.g., minimize the difference
between observed and expected
behavior, subject to atmospheric
flow equations (Navier-Stokes)
Equality constrained optimization
Goal: solve the problem
min
f ( x)
s.t.
c( x)  0
x
Define: the derivatives
g ( x) :
f ( x)
A( x) :
c i ( x)
2
W ( x,  ) :  xx L( x,  )


Define: the Lagrangian
L( x,  )  f ( x)  T c( x)
Goal: solve KKT conditions
 g ( x)  A( x)T  

0
c( x)


Sequential Quadratic Programming (SQP)

Two “equivalent” step computation techniques
Algorithm: Newton’s method
W

A
 g  AT  
AT  d 
    

0   
 c 
Algorithm: the SQP subproblem
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Sequential Quadratic Programming (SQP)

Two “equivalent” step computation techniques
Algorithm: Newton’s method
W

A
 g  AT  
AT  d 
    

0   
 c 
W

A
A 

0
T
Algorithm: the SQP subproblem
min
g T d  12 d TWd
s.t.
c  Ad  0
d
KKT matrix
• Cannot be formed
• Cannot be factored
Sequential Quadratic Programming (SQP)

Two “equivalent” step computation techniques
Algorithm: Newton’s method
W

A
 g  AT  
AT  d 
    

0   
 c 
W

A
A 

0
T
Algorithm: the SQP subproblem
min
g T d  12 d TWd
s.t.
c  Ad  0
d
KKT matrix
• Cannot be formed
• Cannot be factored
Linear system solve
• Iterative method
• Inexactness
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Unconstrained optimization
Goal: minimize a nonlinear objective
min f ( x )
x
Algorithm: Newton’s method (CG)
 f ( xk )d k  f ( xk )
2
Unconstrained optimization
Goal: minimize a nonlinear objective
min f ( x )
x
Algorithm: Newton’s method (CG)
 f ( xk )d k  f ( xk )
2
Note: choosing any
intermediate step ensures
global convergence to a
local solution of NLP
xk
(Steihaug, 1983)
Nonlinear equations
Goal: solve a nonlinear system
F ( x)  0
Algorithm: Newton’s method
F ( xk )d k   F ( xk )
Nonlinear equations
Goal: solve a nonlinear system
F ( x)  0
Algorithm: Newton’s method
F ( xk )d k   F ( xk )
any step with
xk
F ( xk )d k   F ( xk )  rk
and
rk   F ( xk ) , 0    1
ensures descent
(Dembo, Eisenstat, and Steihaug, 1982)
(Eisenstat and Walker, 1994)
Line Search SQP Framework
W

A

 g  AT  
AT  d 
    


0  
 c 
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Define “exact” penalty function
( x)  f ( x)   c( x)

Implement a line search
( x  d )  ( x) D (d )
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
… which may lead to
an increase in the
model objective
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Quadratic/linear model of merit function
W

A

 g  AT  
AT  d 
    


0  
 c 
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Create model
m(d )  f  g T d  12 d TWd   ( c  Ad )

Quantify reduction obtained from step
mred (d )  m(0)  m(d )
  g d  d Wd   ( c  c  Ad )
T
1
2
T
Quadratic/linear model of merit function
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Create model
m(d )  f  g T d  12 d TWd   ( c  Ad )

Quantify reduction obtained from step
mred (d )  m(0)  m(d )
  g d  d Wd   ( c  c  Ad )
T
1
2
T
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
… which may lead to
an increase in the
model objective
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
… which may lead to
an increase in the
model objective
… but this is ok since
we can account for this
conflict by increasing
the penalty parameter
mred (d )   c , 0    1
Exact Case
W

A
 g  AT  
AT  d 
    


0  
 c 
Exact step minimizes
the objective on the
linearized constraints
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
… which may lead to
an increase in the
model objective
… but this is ok since
we can account for this
conflict by increasing
the penalty parameter
g T d  12 d T Wd

, 0  1
(1   ) c
Algorithm Outline (exact steps)

for k = 0, 1, 2, …
 Compute
step by…
W

A
 Set
 g  AT  
AT  d 
    

0   
c


penalty parameter to ensure descent on…
( x)  f ( x)   c( x)
 Perform
backtracking line search to satisfy…
( x  d )  ( x) D (d )
 Update
iterate
First attempt
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Proposition: sufficiently small residual
(  , r )   ( g  AT  , c) , 0    1

Test: 61 problems from CUTEr test set

1e-8
1e-7
1e-6
1e-5
1e-4
1e-3
1e-2
1e-1
Success
100%
100%
97%
97%
90%
85%
72%
38%
Failure
0%
0%
3%
3%
10%
15%
28%
62%
First attempt… not robust
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Proposition: sufficiently small residual
(  , r )   ( g  AT  , c) , 0    1

… not enough for complete robustness
 We
have multiple goals (feasibility and optimality)
 Lagrange multipliers may be completely off

… may not have descent!
Second attempt

Step computation: inexact SQP step
W

A

 g  AT     
AT  d 
    
 
0   
 c  r 
Recall the line search condition
( x  d )  ( x) D (d )

We can show
D (d )  g d    c  r
T

Second attempt

Step computation: inexact SQP step
W

A

 g  AT     
AT  d 
    
 
0   
 c  r 
Recall the line search condition
( x  d )  ( x) D (d )

We can show
D (d )  g d    c  r
T

... but how negative should this be?
Algorithm Outline (exact steps)

for k = 0, 1, 2, …
 Compute
step
W

A
 Set
 g  AT  
AT  d 
    

0   
c


penalty parameter to ensure descent
( x)  f ( x)   c( x)
 Perform
backtracking line search
( x  d )  ( x) D (d )
 Update
iterate
Algorithm Outline (inexact steps)

for k = 0, 1, 2, …
 Compute
step and set penalty parameter to ensure
descent and a stable algorithm
W

A
 g  AT     
AT  d 
    
 
0   
c

 r 
 Perform
( x)  f ( x)   c( x)
backtracking line search
( x  d )  ( x) D (d )
 Update
iterate
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
 g T d  12 d TWd    c  r    c
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
Step is acceptable if for
0    1, 0   :
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
r  ,
   g  AT 
 g T d  12 d TWd    c  r    c
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
Step is acceptable if for
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
0    1, 0   :
r  c ,
  c
 g T d  12 d TWd    c  r    c
Inexact Case
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
Step is acceptable if for
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
0    1, 0   :
r  c ,
  c
g T d  12 d T Wd

, 0  1
(1   ) c  r 
 g T d  12 d TWd    c  r    c
Algorithm Outline

for k = 0, 1, 2, …
 Iteratively
solve
W

A
 g  AT     
AT  d 
    
 
0   
 c  r 
 Until
r   c , 0   1
  c , 0
r  , 0  
or
   g  AT  , 0    1
mred (d )   c
 Update penalty parameter
 Perform backtracking line search
 Update iterate
Termination Test

Observe KKT conditions
g  AT   max 1, g  opt
, 0   opt  1
c  max 1, c( x0 )  feas , 0   feas  1
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Assumptions

The sequence of iterates is contained in a convex
set and the following conditions hold:
 the
objective and constraint functions and their first
and second derivatives are bounded
 the multiplier estimates are bounded
 the constraint Jacobians have full row rank and their
smallest singular values are bounded below by a
positive constant
 the Hessian of the Lagrangian is positive definite with
smallest eigenvalue bounded below by a positive
constant
Sufficient Reduction to Sufficient Decrease

Taylor expansion of merit function yields
D (d )  g T d    c  r


Accepted step satisfies
mred (d )   c , 0    1
g d    c  r    d Wd   c
T
1
2
T
D (d )   12 d TWd   c

  d  c
2

Intermediate Results
r   c , 0   1
  c , 0

g d  d Wd
, 0  1
(1   ) c  r 
T
1
2
T
r  , 0  
  g A  , 0
T
mred (d )   c
d
is bounded above
 
is bounded above

is bounded below by
a positive constant
Sufficient Decrease in Merit Function

D (d ; x,  )   d  c
2


 ( x;  )   ( x  d ;  )    d  c

lim d k
k 
2

 ck  0
lim Z kT g k  0
k 
2

Step in Dual Space

We converge to an optimal primal solution, and
g  A     g  AT  , 0     1

T
(for sufficiently small || c || and || d || )
Therefore,
lim ck  0
k 
lim g k  A k  0
k 
T
k
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Problem Formulation

Tikhonov-style regularized inverse problem
to solve for a reasonably large mesh size 
 Want to solve for small regularization parameter 
 Want


SymQMR for linear system solves
Input parameters:
 g  AT  
 

r   
 
 c 
Recall:
r   c , 0   1
  c , 0
(Curtis and Haber, 2007)
  0.1,   1,
  1,   0.1
r  , 0  
or
   g  AT  , 0    1
mred (d )   c
Numerical Results
n
m

1024
512
1e-6
 
r  
 
 g  AT  


 c 

Iters.
Time
Total LS Avg. LS
Iters.
Iters.
Avg. Rel.
Res.
0.5
29
29.5s
1452
50.1
3.12e-1
0.1
12
11.37s
654
54.5
6.90e-2
0.01
9
11.60s
681
75.7
6.27e-3
(Curtis and Haber, 2007)
Numerical Results
n
m

1024
512
1e-6
 
r  
 
 g  AT  


 c 

Iters.
Time
Total LS Avg. LS
Iters.
Iters.
Avg. Rel.
Res.
0.5
29
29.5s
1452
50.1
3.12e-1
0.1
12
11.37s
654
54.5
6.90e-2
0.01
9
11.60s
681
75.7
6.27e-3
(Curtis and Haber, 2007)
Numerical Results
n
m
1024
512
1e-1


Iters.
Time
1e-6
12
1e-7
11.40s
Total LS Avg. LS
Iters.
Iters.
654
54.5
Avg. Rel.
Res.
6.90e-2
11
14.52s
840
76.4
6.99e-2
1e-8
8
10.57s
639
79.9
6.15e-2
1e-9
11
18.52s
1139
104
8.65e-2
1e-10
19
44.41s
2708
143
8.90e-2
(Curtis and Haber, 2007)
Numerical Results
n
m
8192
4096
1e-1


Iters.
1e-6
15
Total LS Avg. LS
Iters.
Iters.
264.47s 1992
133
Avg. Rel.
Res.
8.13e-2
1e-7
11
236.51s 1776
161
6.89e-2
1e-8
9
204.51s 1567
174
6.77e-2
1e-9
11
347.66s 2681
244
8.29e-2
1e-10
16
805.14s 6249
391
8.93e-2
(Curtis and Haber, 2007)
Time
Numerical Results
n
m
65536
32768
1e-1


Iters.
1e-6
15
Total LS Avg. LS
Iters.
Iters.
5055.9s 4365
291
Avg. Rel.
Res.
8.46e-2
1e-7
10
4202.6s 3630
363
8.87e-2
1e-8
12
5686.2s 4825
402
7.96e-2
1e-9
12
6678.7s 5633
469
8.77e-2
1e-10
14
14783s
895
8.63e-2
(Curtis and Haber, 2007)
Time
12525
Outline

Problem Formulation



Inexact Framework




Merit function and sufficient decrease
Satisfying first order conditions
Numerical Results



Unconstrained optimization and nonlinear equations
Stopping conditions for linear solver
Global Behavior


Equality constrained optimization
Sequential Quadratic Programming
Model inverse problem
Accuracy tradeoffs
Final Remarks


Future work
Negative curvature
Review and Future Challenges

Review
 Defined a globally convergent inexact SQP algorithm
 Require only inexact solutions of primal-dual system
 Require
only matrix-vector products involving
objective and constraint function derivatives
 Results also apply when only reduced Hessian of
Lagrangian is assumed to be positive definite
 Numerical experience on model problem is promising

Future challenges
 (Nearly) Singular constraint Jacobians
 Inexact derivative information
 Negative curvature
 etc., etc., etc….
Negative Curvature
 Big question
 What
is the best way to handle negative curvature
(i.e., when the reduced Hessian may be indefinite)?

Small question
 What
is the best way to handle negative curvature in
the context of our inexact SQP algorithm?
 We have no inertia information!

Smaller question
 When can we handle negative curvature in the
context of our inexact SQP algorithm with NO
algorithmic modifications?
 When do we know that a given step is OK?
 Our analysis of the inexact case leads to a few
observations…
Why Quadratic Models?
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
xk
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Why Quadratic Models?
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
xk
Provides a good…
• direction? Yes
• step length? Yes
min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
Provides a good…
• direction? Maybe
• step length? Maybe
Why Quadratic Models?
W

A
 g  AT     
A  d 
    
 
0   
 c  r 
T
xk


min
g T d  12 d TWd
s.t.
c  Ad  0
d
xk
One can use our stopping criteria as a mechanism for
determining which are good directions
All that needs to be determined is whether the step
lengths are acceptable
Unconstrained Optimization
Hd   g  

min g T d  12 d T Hd
d
Direct method is the angle test
 gT d   g d

Indirect method is to check the conditions
   g , d Hd   d
T
or
 g d  g , d  g
T
2
2
Unconstrained Optimization
Hd   g  

min g T d  12 d T Hd
d
Direct method is the angle test
 gT d   g d

Indirect method is to check the conditions
   g , d Hd   d
T
or
step quality
 g d  g , d  g
T
2
2
step length
Constrained Optimization
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
g T d  12 d TWd
s.t.
c  Ad  0
d
Step quality determined by
r   c , 0   1
  c , 0

min
r  , 0  
or
   g  AT  , 0    1
mred (d )   c
Step length determined by
d Wd   d
T
2
or
d   max  c , r 
Thanks!
Actual Stopping Criteria
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
g T d  12 d TWd
s.t.
c  Ad  0
d
Stopping conditions: 0   ,   1, 0   , 
r  c
  c

min
r   max  c , 1
or

  max  c ,  g  AT 
mred (d )   max  c , r  c 
Model reduction condition
 g T d  2 d TWd    c  r    max  c , r  c 

Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g T d  2 d T Wd
Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g T d  2 d T Wd
Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g T d  2 d T Wd
Some region
around the
exact solution
Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g d  2 d Wd
T

T
Ellipse distorted
toward the
linearized
constraints
Constraint Feasible Case
W

A

 g  AT     
A  d 
    
 
0   
 c  r 
T
min
g T d  12 d TWd
s.t.
c  Ad  0
d
If feasible, conditions reduce to
xk
r 
   g  AT 
1   
r   g T d  2 d T Wd