VLSI Physical Design, Springer Verlag
Download
Report
Transcript VLSI Physical Design, Springer Verlag
© KLMH
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8 – Timing Closure
Original Authors:
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
1
Lienig
Andrew B. Kahng, Jens Lienig, Igor L. Markov, Jin Hu
© KLMH
Chapter 8 – Timing Closure
8.1
Introduction
8.2
Timing Analysis and Performance Constraints
8.2.1 Static Timing Analysis
8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
8.3.1 Net-Based Techniques
8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
8.4.2 Prim-Dijkstra Tradeoff
8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
8.5.1 Gate Sizing
8.5.2 Buffering
8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
2
Lienig
8.7 Conclusions
Introduction
© KLMH
8.1
System Specification
Partitioning
Architectural Design
ENTITY test is
port a: in bit;
end ENTITY test;
Functional Design
and Logic Design
Chip Planning
Circuit Design
Placement
Physical Design
DRC
LVS
ERC
Physical Verification
and Signoff
Clock Tree Synthesis
Signal Routing
Fabrication
Timing Closure
Packaging and Testing
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
3
Lienig
Chip
Introduction
© KLMH
8.1
IC layout must satisfy geometric constraints, electrical constraints,
power & thermal constraints as well as timing constraints
Setup (long-path) constraints
Hold (short-path) constraints
Chip designers must complete timing closure
Optimization process that meets timing constraints
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
4
Lienig
Integrates point optimizations discussed in previous chapters, e.g.,
placement and routing, with specialized methods to improve circuit performance
Introduction
© KLMH
8.1
Components of timing closure covered in this lecture:
Timing-driven placement (Sec. 8.3) minimizes signal delays
when assigning locations to circuit elements
Timing-driven routing (Sec. 8.4) minimizes signal delays
when selecting routing topologies and specific routes
Physical synthesis (Sec. 8.5) improves timing by changing the netlist
Sizing transistors or gates: increasing the width:length ratio of transistors
to decrease the delay or increase the drive strength of a gate
Inserting buffers into nets to decrease propagation delays
Restructuring the circuit along its critical paths
Performance-driven physical design flow (Sec. 8.6)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
5
Lienig
Introduction
© KLMH
8.1
Timing optimization engines must estimate circuit delays quickly and accurately
to improve circuit timing
Timing optimizers adjust propagation delays through circuit components,
with the primary goal of satisfying timing constraints, including
Setup (long-path) constraints, which specify the amount of time a data input signal
should be stable (steady) before the clock edge for each storage element
(e.g., flip-flop or latch)
Hold-time (short-path) constraints, which specify the amount of time a data input
signal should be stable after the clock edge at each storage element
VLSI Physical Design: From Graph Partitioning to Timing Closure
tcombDelay t hold t skew
Chapter 8: Timing Closure
6
Lienig
tcycle tcombDelay t setup t skew
Introduction
Timing closure is the process of satisfying timing constraints
through layout optimizations and netlist modifications
Industry jargon: “the design has closed timing”
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
7
Lienig
© KLMH
8.1
Timing Analysis and Performance Constraints
© KLMH
8.2
8.1
Introduction
8.2
Timing Analysis and Performance Constraints
8.2.1 Static Timing Analysis
8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
8.3.1 Net-Based Techniques
8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
8.4.2 Prim-Dijkstra Tradeoff
8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
8.5.1 Gate Sizing
8.5.2 Buffering
8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
8
Lienig
8.7 Conclusions
Timing Analysis and Performance Constraints
© KLMH
8.2
Sequential circuit, “unrolled” in time
Combinational
Logic
FF
Copy 1
Combinational
Logic
Copy 2
FF
Combinational
Logic
FF
Copy 3
Combinational logic
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
9
Lienig
Storage elements
© 2011 Springer Verlag
Clock
Timing Analysis and Performance Constraints
© KLMH
8.2
Main delay concerns in sequential circuits
Gate delays are due to gate transitions
Wire delays are due to signal propagation along wires
Clock skew is due to the difference in time the sequential elements activate
Need to quickly estimate sequential circuit timing
Perform static timing analysis (STA)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
10
Lienig
Assume clock skew is negligible, postpone until after clock network synthesis
Static Timing Analysis
© KLMH
8.2.1
STA: assume worst-case scenario where every gate transitions
Given combinational circuit, represent as directed acyclic graph (DAG)
Every edge (node) has weight = wire (gate) delay
Compute the slack = RAT – AAT for each node
RAT is the required arrival time, latest time signal can transition
AAT is the actual arrival time
By convention, AAT is defined at the output of every node
Negative slack at any output means the circuit does not meet timing
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
11
Lienig
Positive slack at all outputs means the circuit meets timing
Static Timing Analysis
© KLMH
8.2.1
Combinational circuit as DAG
(0.15)
a
y (2)
(0.2)
w (2)
(0.2)
f
(0.1)
(0.1) x (1)
(0.3)
z (2)
(0.1)
a (0)
(0)
(0)
y (2)
(0.15)
(0.1)
b (0) (0.1)
(0.2)
w (2) (0.2)
x (1)
(0.3)
(0.6)
c (0)
(0.1)
VLSI Physical Design: From Graph Partitioning to Timing Closure
f (0)
(0.25)
z (2)
Chapter 8: Timing Closure
12
Lienig
c
s
(0.25)
© 2011 Springer Verlag
b
Static Timing Analysis
© KLMH
8.2.1
Compute AATs at each node:
AAT(v) max AAT(u) t (u, v)
uFI (v )
where FI(v) is the fanin nodes, and t(u,v) is the delay between u and v
(AATs of inputs are given)
y (2)
(0.15)
A0
A 3.2
s
(0)
A0
(0.1)
b (0) (0.1)
A0
w (2) (0.2)
x (1)
A 1.1
A 5.65
(0.3)
(0.6)
c (0)
(0.2)
(0.1)
A 0.6
VLSI Physical Design: From Graph Partitioning to Timing Closure
f (0)
A 5.85
(0.25)
z (2)
A 3.4
Chapter 8: Timing Closure
13
Lienig
(0)
© 2011 Springer Verlag
a (0)
Static Timing Analysis
© KLMH
8.2.1
Compute RATs at each node:
RAT(v) min RAT(u) t (u, v)
uFO(v )
where FO(v) are the fanout nodes, and t(u,v) is the delay between u and v
(RATs of outputs are given)
y (2)
(0.15)
R 0.95
R 3.1
s
(0)
R -0.35
(0.1)
b (0) (0.1)
R -0.35
w (2) (0.2)
x (1)
R 0.75
R 5.3
(0.3)
(0.6)
c (0)
(0.2)
(0.1)
R 0.95
VLSI Physical Design: From Graph Partitioning to Timing Closure
f (0)
R 5.5
(0.25)
z (2)
R 3.05
Chapter 8: Timing Closure
14
Lienig
(0)
© 2011 Springer Verlag
a (0)
Static Timing Analysis
© KLMH
8.2.1
Compute slacks at each node:
slack(v) RAT(v) AAT(v)
A0
R -0.35
S -0.35
(0)
b (0) (0.1)
(0.6)
A0
R -0.35
S -0.35
c (0)
A 3.2
(0.1) R 3.1 (0.2)
S -0.1
w (2) (0.2)
x (1)
A 1.1
R 0.75 (0.3)
S -0.35
(0.1)
A 0.6
R 0.95
S 0.35
VLSI Physical Design: From Graph Partitioning to Timing Closure
A 5.65
(0.25) R 5.3
S -0.35
f (0)
A 5.85
R 5.5
S -0.35
z (2)
A 3.4
R 3.05
S -0.35
Chapter 8: Timing Closure
15
Lienig
s
(0)
A0
R 0.95
S 0.95
y (2)
(0.15)
© 2011 Springer Verlag
a (0)
Delay Budgeting with the Zero-Slack Algorithm
© KLMH
8.2.2
Establish timing budgets for nets
Gate and wire delays must be optimized during timing driven layout design
Wire delays depend on wire lengths
Wire lengths are not known until after placement and routing
Delay budgeting with the zero-slack algorithm
Let vi be the logic gates
Let ei be the nets
Let DELAY(v) and DELAY(e) be the delay of the gate and net, respectively
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
16
Lienig
Timing budget TB(v) of a gate corresponds to DELAY(v) + DELAY(e)
Delay Budgeting with the Zero-Slack Algorithm
© KLMH
8.2.2
Input: timing graph G(V,E)
Output: timing budgets TB for each v V
1. do
2. (AAT,RAT,slack) = STA(G)
3. foreach (vi V)
4.
TB[vi] = DELAY(vi) + DELAY(ei)
5. slackmin = ∞
6. foreach (v V)
7.
if ((slack[v] < slackmin) and (slack[v] > 0))
8.
slackmin = slack[v]
9.
vmin = v
10. if (slackmin ≠ ∞)
11.
path = vmin
12.
ADD_TO_FRONT(path,BACKWARD_PATH(vmin,G))
13.
ADD_TO_BACK(path,FORWARD_PATH(vmin,G))
14.
s = slackmin / |path|
15.
for (i = 1 to |path|)
16.
node = path[i]
// evenly distribute
17.
TB[node] = TB[node] + s
// slack along path
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
17
Lienig
18. while (slackmin ≠ ∞)
Delay Budgeting with the Zero-Slack Algorithm
© KLMH
8.2.2
Forward Path Search (FORWARD_PATH(vmin,G))
Input: node vmin with minimum slack slackmin, timing graph G
Output: maximal downstream path path from vmin such that no node v V affects
the slack of path
1. path = vmin
2. do
3. flag = false
4. node = LAST_ELEMENT(path)
5. foreach (fanout node fo of node)
6.
if ((RAT[fo] == RAT[node] + TB[fo]) and (AAT[fo] == AAT[node] + TB[fo]))
7.
ADD_TO_BACK(path,fo)
8.
flag = true
9.
break
10. while (flag == true)
VLSI Physical Design: From Graph Partitioning to Timing Closure
// remove vmin
Chapter 8: Timing Closure
18
Lienig
11. REMOVE_FIRST_ELEMENT(path)
Delay Budgeting with the Zero-Slack Algorithm
© KLMH
8.2.2
Backward Path Search (BACKWARD_PATH(vmin,G))
Input: node vmin with minimum slack slackmin, timing graph G
Output: maximal upstream path path from vmin such that no node v V affects the
slack of path
1.
path = vmin
2.
do
3.
flag = false
4.
node = FIRST_ELEMENT(path)
5.
foreach (fanin node fi of node)
6.
if ((RAT[fi] == RAT[node] – TB[fi]) and (AAT[fi] == AAT[node] – TB[fi]))
7.
ADD_TO_FRONT(path,fi)
8.
flag = true
9.
break
10. while (flag == true)
VLSI Physical Design: From Graph Partitioning to Timing Closure
// remove vmin
Chapter 8: Timing Closure
19
Lienig
11. REMOVE_LAST_ELEMENT(path)
Delay Budgeting with the Zero-Slack Algorithm Example
Example: Use the zero-slack algorithm to distribute slack
Format: <AAT, Slack, RAT>, [timing budget]
© KLMH
8.2.2
O1: <13,4,17>
I1
I2
I3
<1,4,5> [0]
O2: <6,8,14>
<3,4,7> [0]
2
<0,5,5> [0]
4
<7,4,11> [0]
<13,4,17> [0]
O1
6
<1,6,7> [0]
<6,8,14> [0]
<3,5,8> [0]
VLSI Physical Design: From Graph Partitioning to Timing Closure
<6,5,11> [0]
0
Chapter 8: Timing Closure
O2
20
Lienig
3
I4
Delay Budgeting with the Zero-Slack Algorithm Example
Example: Use the zero-slack algorithm to distribute slack
Format: <AAT, Slack, RAT>, [timing budget]
Find the path with the minimum nonzero slack
© KLMH
8.2.2
O1: <13,4,17>
I1
I2
I3
<1,4,5> [0]
O2: <6,8,14>
<3,4,7> [0]
2
<0,5,5> [0]
4
<7,4,11> [0]
<13,4,17> [0]
O1
6
<1,6,7> [0]
<6,8,14> [0]
<3,5,8> [0]
VLSI Physical Design: From Graph Partitioning to Timing Closure
<6,5,11> [0]
0
Chapter 8: Timing Closure
O2
21
Lienig
3
I4
Delay Budgeting with the Zero-Slack Algorithm Example
Example: Use the zero-slack algorithm to distribute slack
Format: <AAT, Slack, RAT>, [timing budget]
Find the path with the minimum slack
Distribute the slacks and update the timing budgets
© KLMH
8.2.2
O1: <17,0,17>
I1
I2
I3
<1,0,1> [1]
O2: <6,8,14>
<3,0,4> [1]
2
<0,2,2> [0]
4
<9,0,9> [1]
<16,0,16> [1]
O1
6
<1,4,5> [0]
<6,8,14> [0]
<3,4,7> [0]
VLSI Physical Design: From Graph Partitioning to Timing Closure
<6,4,10> [0]
0
Chapter 8: Timing Closure
O2
22
Lienig
3
I4
Delay Budgeting with the Zero-Slack Algorithm Example
Example: Use the zero-slack algorithm to distribute slack
Format: <AAT, Slack, RAT>, [timing budget]
Find the path with the minimum slack
Distribute the slacks and update the timing budgets
© KLMH
8.2.2
O1: <17,0,17>
I1
I2
I3
<1,0,1> [1]
O2: <6,8,14>
<4,0,4> [1]
2
<0,0,0> [2]
4
<9,0,9> [1]
<16,0,16> [1]
O1
6
<1,4,5> [0]
<6,8,14> [0]
<3,4,7> [0]
VLSI Physical Design: From Graph Partitioning to Timing Closure
<6,4,10> [0]
0
Chapter 8: Timing Closure
O2
23
Lienig
3
I4
Delay Budgeting with the Zero-Slack Algorithm Example
Example: Use the zero-slack algorithm to distribute slack
Format: <AAT, Slack, RAT>, [timing budget]
Find the path with the minimum slack
Distribute the slacks and update the timing budgets
© KLMH
8.2.2
O1: <16,0,16>
I1
I2
I3
<1,0,1> [1]
O2: <6,8,14>
<4,0,4> [1]
2
<0,0,0> [2]
4
<9,0,9> [1]
<16,0,16> [1]
O1
6
<1,2,3> [2]
<6,8,14> [0]
<3,2,5> [0]
VLSI Physical Design: From Graph Partitioning to Timing Closure
<6,2,8> [2]
0
Chapter 8: Timing Closure
O2
24
Lienig
3
I4
Delay Budgeting with the Zero-Slack Algorithm Example
Example: Use the zero-slack algorithm to distribute slack
Format: <AAT, Slack, RAT>, [timing budget]
Find the path with the minimum slack
Distribute the slacks and update the timing budgets
© KLMH
8.2.2
O1: <17,0,17>
I1
I2
I3
<1,0,1> [1]
O2: <10,4,14>
<4,0,4> [1]
2
<0,0,0> [2]
4
<9,0,9> [1]
<16,0,16> [1]
O1
6
<1,0,1> [3]
<10,4,14> [0]
<3,1,4> [0]
VLSI Physical Design: From Graph Partitioning to Timing Closure
<7,0,7> [3]
0
Chapter 8: Timing Closure
O2
25
Lienig
3
I4
Delay Budgeting with the Zero-Slack Algorithm Example
Example: Use the zero-slack algorithm to distribute slack
Format: <AAT, Slack, RAT>, [timing budget]
Find the path with the minimum slack
Distribute the slacks and update the timing budgets
© KLMH
8.2.2
O1: <17,0,17>
I1
I2
I3
<1,0,1> [1]
O2: <10,4,14>
<4,0,4> [1]
2
<0,0,0> [2]
4
<9,0,9> [1]
<16,0,16> [1]
O1
6
<1,0,1> [3]
<10,4,14> [4]
<3,0,3> [1]
VLSI Physical Design: From Graph Partitioning to Timing Closure
<7,0,7> [3]
0
Chapter 8: Timing Closure
O2
26
Lienig
3
I4
Timing-Driven Placement
© KLMH
8.3
8.1
Introduction
8.2
Timing Analysis and Performance Constraints
8.2.1 Static Timing Analysis
8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
8.3.1 Net-Based Techniques
8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
8.4.2 Prim-Dijkstra Tradeoff
8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
8.5.1 Gate Sizing
8.5.2 Buffering
8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
27
Lienig
8.7 Conclusions
Timing-Driven Placement
© KLMH
8.3
Timing-driven placement optimizes circuit delay
to satisfy timing constraints
Let T be the set of all timing endpoints
Constraint satisfaction is measured by worst negative slack (WNS)
W NS minslack(τ)
τ
Or total negative slack (TNS)
TNS
slack(τ)
τ, slack ( τ ) 0
Classifications: net-based, path-based, integrated
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
28
Lienig
Net-Based Techniques
© KLMH
8.3.1
Net weights are added to each net – placer optimizes weighted wirelength
Static net weights: computed before placement (never changes)
ω1 if slack 0
Discrete net weights: w
where ω1 > 0, ω2 > 0, and ω2 > ω1
ω
if
slack
0
2
slack
Continuous net weights: w 1
t
α
where t is the longest path delay
and α is a criticality exponent
Based on net sensitivity to TNS and slack
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
29
Lienig
w wo α(slacktarget slack) swSLACK β swTNS
Net-Based Techniques
© KLMH
8.3.1
Dynamic net weights: (re)computed during placement
Estimate slack at every iteration:
slackk slackk 1 sLDELAY L
where ΔL is the change in wirelength
1
2 υ k 1 1 if among the top 3% of critical nets
Update net criticality: υ k
1
υ k 1
otherwise
2
Update net weight:
Variations include updating every j iterations, different relations
between criticality and net weight
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
30
Lienig
wk wk 1 1 υ k
Embedding STA into Linear Programs for Placement
© KLMH
8.3.2
Construct a set of constraints for timing-driven placement
Physical constraints define locations of cells
Timing constraints define slack requirements
Optimize an optimization objective
Improving worst negative slack (WNS)
Improving total negative slack (TNS)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
31
Lienig
Improving a combination of both WNS and TNS
Embedding STA into Linear Programs for Placement
© KLMH
8.3.2
For physical constraints, let:
xv and yv be the center of cell v V
Ve be the set of cells connected to net e E
left(e), right(e), bottom(e), and top(e) respectively be the coordinates
of the left, right, bottom, and top boundaries of e’s bounding box
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
32
Lienig
δx(v,e) and δy(v,e) be pin offsets from xv and yv for v’s pin connected to e
Embedding STA into Linear Programs for Placement
© KLMH
8.3.2
Then, for all v Ve:
left(e) xv δ x (v, e)
right(e) xv δ x (v, e)
bottom(e) yv δ y (v, e)
top(e) yv δ y (v, e)
Define e’s half-perimeter wirelength (HPWL):
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
33
Lienig
L(e) right(e) left(e) top(e) bottom(e)
Embedding STA into Linear Programs for Placement
© KLMH
8.3.2
For timing constraints, let
tGATE(vi,vo) be the gate delay from an input pin vi to the output pin vo for cell v
tNET(e,uo,vi) be net e’s delay from cell u’s output pin uo to cell v’s input pin vi
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
34
Lienig
AAT(vj) be the arrival time on pin j of cell v
Embedding STA into Linear Programs for Placement
© KLMH
8.3.2
For every input pin vi of cell v :
AAT(vi ) AAT(uo ) t NET (uo , vi )
For every output pin vo of cell v :
AAT(vo ) AAT(vi ) tGATE (vi , vo )
For every pin τp in a sequential cell τ:
slack(τ p ) RAT(τ p ) AAT(τ p )
Ensure that every slack(τp) 0
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
35
Lienig
Embedding STA into Linear Programs for Placement
© KLMH
8.3.2
Optimize for total negative slack:
max :
slack(τ
p)
τ p Pins( τ), τ
Optimize for worst negative slack:
max : WNS
Optimize а linear combination of multiple parameters:
min :
L(e) α W NS
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
36
Lienig
eE
Timing-Driven Routing
© KLMH
8.4
8.1
Introduction
8.2
Timing Analysis and Performance Constraints
8.2.1 Static Timing Analysis
8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
8.3.1 Net-Based Techniques
8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
8.4.2 Prim-Dijkstra Tradeoff
8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
8.5.1 Gate Sizing
8.5.2 Buffering
8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
37
Lienig
8.7 Conclusions
Timing-Driven Routing
© KLMH
8.4
Timing-driven routing seeks to minimize:
Maximum sink delay: delay from the source to any sink in a net
Total wirelength: routed length of the net
For a signal net net, let
s0 be the source node
sinks = {s1, … ,sn} be the sinks
G = (V,E) be a corresponding weighted graph where:
V = {v0,v1, … ,vn} represents the source and sink nodes of net, and
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
38
Lienig
the weight of an edge e(vi,vj) E represents the routing cost between vi and vj
Timing-Driven Routing
© KLMH
8.4
For any spanning tree T over G, let:
radius(T) be the length of the longest source-sink path in T
cost(T) be the total edge weight of T
Trade off between “shallow” and “light” trees
“Shallow” trees have minimum radius
Shortest-paths tree
Constructed by Dijkstra’s Algorithm
“Light” trees have minimum cost
Minimum spanning tree (MST)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
39
Lienig
Constructed by Prim’s Algorithm
Timing-Driven Routing
© KLMH
8.4
2
3
2
2
3
3
6
6
7
5
5
s0
s0
radius(T) = 8
cost(T) = 20
radius(T) = 13
cost(T) = 13
radius(T) = 11
cost(T) = 16
“Shallow”
“Light”
Tradeoff between
shallow and light
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
40
Lienig
s0
© 2011 Springer Verlag
5
The Bounded-Radius, Bounded-Cost Algorithm
© KLMH
8.4.1
Trades off radius for cost by setting upper bounds on both
In the bounded-radius, bounded-cost (BRBC) algorithm, let:
TS be the shortest-paths tree
TM be the minimum spanning tree
TBRBC is the tree constructed with parameter ε that satisfies:
radius (TBRBC ) (1 ε) radius (TS )
and
When ε = 0, TBRBC has minimum radius
When ε = ∞, TBRBC has minimum cost
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
41
Lienig
2
cost (TBRBC ) 1 cost (TM )
ε
Prim-Dijkstra Tradeoff
© KLMH
8.4.2
Prim-Dijkstra Tradeoff based on Prim’s algorithm and Dijkstra’s algorithm
From the set of sinks S, iteratively add sink s based on different cost function
Prim’s algorithm cost function:
Dijkstra’s algorithm cost function:
Prim-Dijkstra Tradeoff cost function:
cost(s0 , si ) cost(si , s j )
γ cost(s0 , si ) cost(si , s j )
γ is a constant between 0 and 1
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
42
Lienig
cost(si , s j )
Prim-Dijkstra Tradeoff
© KLMH
8.4.2
4
4
7
11
8
8
7
9
s0
radius(T) = 19
cost(T) = 35
radius(T) = 15
cost(T) = 39
γ = 0.25
γ = 0.75
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
43
Lienig
s0
© 2011 Springer Verlag
9
7
Minimization of Source-to-Sink Delay
© KLMH
8.4.3
Iteratively forms a tree by adding sinks, and optimizes for critical sink(s)
In the critical-sink routing tree (CSRT) problem, minimize:
n
α(i) t (s , s )
0
i
i 1
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
44
Lienig
where α(i) are sink criticalities for sinks si, and t(s0,si) is the delay from s0 to si
Minimization of Source-to-Sink Delay
© KLMH
8.4.3
In the critical-sink Steiner tree problem, construct a
minimum-cost Steiner tree T for all sinks except for the most critical sink sc
Add in the critical sink by:
H0: a single wire from sc to s0
H1: the shortest possible wire that can join sc to T, so long as the path
from s0 to sc is the shortest possible total length
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
45
Lienig
HBest: try all shortest connections from sc to edges in T and from sc to s0.
Perform timing analysis on each of these trees and pick the one
with the lowest delay at sc
Physical Synthesis
© KLMH
8.5
8.1
Introduction
8.2
Timing Analysis and Performance Constraints
8.2.1 Static Timing Analysis
8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
8.3.1 Net-Based Techniques
8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
8.4.2 Prim-Dijkstra Tradeoff
8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
8.5.1 Gate Sizing
8.5.2 Buffering
8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
46
Lienig
8.7 Conclusions
Physical Synthesis
© KLMH
8.5
Physical synthesis is a collection of timing optimizations to fix negative slack
Consists of creating timing budgets and performing timing corrections
Timing budgets include:
allocating target delays along paths or nets
often during placement and routing stages
can also be during timing correction operations
Timing corrections include:
gate sizing
buffer insertion
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
47
Lienig
netlist restructuring
Gate Sizing
© KLMH
8.5.1
size (vC ) size (vB ) size (v A )
Let a gate v have 3 sizes A, B, C, where:
Gate with a larger size has lower output resistance
When load capacitances are large:
t (vC ) t (v B ) t (v A )
Gate with a smaller size has higher output resistance
When load capacitances are small:
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
48
Lienig
t (vC ) t (v B ) t (v A )
Gate Sizing
© KLMH
8.5.1
Let a gate v have 3 sizes A, B, C, where:
size (vC ) size (vB ) size (v A )
40
40
35
35
Delay (ps)
30
25
20
15
23
18
24
21
25
24
30
27
30
26
27
33
28
A
B
C
20
15
0.5
1.0
1.5
2.0
2.5
Load Capacitance (fF)
VLSI Physical Design: From Graph Partitioning to Timing Closure
3.0
Chapter 8: Timing Closure
49
Lienig
5
© 2011 Springer Verlag
10
Gate Sizing
© KLMH
8.5.1
a
b
vA
v
d
C(d) = 1.5
e
C(e) = 1.0
f
C(f) = 0.5
t(vA) = 40
VLSI Physical Design: From Graph Partitioning to Timing Closure
C(d) = 1.5
e
C(e) = 1.0
f
C(f) = 0.5
a
b
vC
d
C(d) = 1.5
e
C(e) = 1.0
f
C(f) = 0.5
t(vC) = 28
Chapter 8: Timing Closure
50
Lienig
a
b
d
Buffering
© KLMH
8.5.2
Buffer: a series of two serially-connected inverters
Improve delays by
speeding up the circuit or serving as delay elements
changing transition times
shielding capacitive load
Drawbacks:
Increased area usage
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
51
Lienig
Increased power consumption
Buffering
© KLMH
8.5.2
vB
f C(f) = 1
g C(g) = 1
C(vB) = 5 fF
h C(h) = 1
t(vB) = 45 ps
d C(d) = 1
e C(e) = 1
a
b
vB
y
C(vB) = 3 fF
f C(f) = 1
g C(g) = 1
h C(h) = 1
t(vB) = 33 ps
C(y) = 3 fF
t(y) = t(vB) + t(y) = 66 ps
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
52
Lienig
a
b
d C(d) = 1
e C(e) = 1
Netlist Restructuring
© KLMH
8.5.3
Netlist restructuring only changes existing gates, does not change functionality
Changes include
Cloning: duplicating gates
Redesign of fanin or fanout tree: changing the topology of gates
Swapping communicative pins: changing the connections
Gate decomposition: e.g., changing AND-OR to NAND-NAND
Boolean restructuring: e.g., applying Boolean laws to change circuit gates
Can also do reverse transformations of above, e.g., downsizing, merging
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
53
Lienig
Netlist Restructuring
© KLMH
8.5.3
a
b
vB
d
e
C(d) = 1
C(e) = 1
f
C(f) = 1
g
C(g) = 1
h
C(h) = 1
a
d C(d) = 1
e C(e) = 1
vA
b
f
vB
C(f) = 1
g C(g) = 1
h C(h) = 1
and reduce downstream capacitance
d
e
v
f
…
…
a
b
d
v
e
f
g
h
VLSI Physical Design: From Graph Partitioning to Timing Closure
v’
…
g
h
Chapter 8: Timing Closure
54
Lienig
a
b
Cloning can reduce fanout capacitance
© 2011 Springer Verlag
Netlist Restructuring
© KLMH
8.5.3
Redesigning the fanin tree can change AATs
a <4>
b <3>
(1)
f <6>
(1)
b <3>
c <1>
d <0>
(1)
f <5>
(1)
(1)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
55
Lienig
© 2011 Springer Verlag
c <1>
d <0>
a <4>
(1)
Netlist Restructuring
© KLMH
8.5.3
Redesigning fanout trees can change delays on specific paths
path1
path1
y1 (1)
(1)
(1)
(1)
(1)
y2 (1)
path2
path2
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
56
Lienig
© 2011 Springer Verlag
y2 (1)
Netlist Restructuring
© KLMH
8.5.3
Swapping commutative pins can change the final delay
a <0>
b <1>
f <5>
(1)
(2)
c <2>
b <1>
a <0>
(1)
(1)
f <3>
(1)
(2)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
57
Lienig
© 2011 Springer Verlag
c <2>
(1)
(1)
Netlist Restructuring
© KLMH
8.5.3
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
58
Lienig
© 2011 Springer Verlag
Gate decomposition can change the general structure of the circuit
Netlist Restructuring
© KLMH
8.5.3
Boolean restructuring uses laws or properties,
e.g., distributive law, to change circuit topology
(a + b)(a + c) = a + bc
a <4>
(1)
b <1>
x <6>
(1)
y <6>
x <5>
(1)
y <6>
(1)
(1)
(1)
(1)
x(a,b,c) = (a + b)(a + c)
x(a,b,c) = a + bc
y(a,b,c) = (a + c)(b + c)
y(a,b,c) = ab + c
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
59
Lienig
c <2>
(1)
b <1>
c <2>
(1)
© 2011 Springer Verlag
a <4>
Performance-Driven Design Flow
© KLMH
8.6
8.1
Introduction
8.2
Timing Analysis and Performance Constraints
8.2.1 Static Timing Analysis
8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
8.3.1 Net-Based Techniques
8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
8.4.2 Prim-Dijkstra Tradeoff
8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
8.5.1 Gate Sizing
8.5.2 Buffering
8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
60
Lienig
8.7 Conclusions
Performance-Driven Design Flow
© KLMH
8.6
Baseline Physical Design Flow
1. Floorplanning, I/O placement, power planning
2. Logic synthesis and technology mapping
3. Global placement and sequential element legalization
4. Clock network synthesis
5. Global routing and layer assignment
6. Congestion-driven detailed placement and legalization
7. Detailed routing
8. Design for manufacturing
9. Physical verification
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
61
Lienig
10. Mask optimization and generation
© KLMH
Performance-Driven Design Flow
Video
Video
Pre/Postprocessing Codec DSP
Control + DSP
Audio
Audio
Pre/Postprocessing
Codec
Embedded
Protocol
Controller for
Processing
Dataplane
Processing
Baseband DSP
Baseband
PHY
MAC/Control
Main Applications
Security
Memory
CPU
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
62
Lienig
Analog-to-Digital and
Digital-to-Analog Converter
Analog Processing
Floorplanning Example
© 2011 Springer Verlag
8.6
Performance-Driven Design Flow
© KLMH
8.6
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
63
Lienig
© 2011 Springer Verlag
Global Placement Example
Performance-Driven Design Flow
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
64
Lienig
© 2011 Springer Verlag
Clock Network Synthesis Example
© KLMH
8.6
Performance-Driven Design Flow
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
65
Lienig
© 2011 Springer Verlag
Global Routing Congestion Example
© KLMH
8.6
Performance-Driven Design Flow
© KLMH
8.6
Chip Planning and Logic Design
Performance-Driven
Block Shaping, Sizing
and Placement
I/O Placement
Chip Planning
With Optional Net Weights
Performance-Driven
Block-Level
Delay Budgeting
Single Global Net Routes
and Buffering
Trial Synthesis and
Floorplanning
fails
Logic Synthesis and
Technology Mapping
Power Planning
RTL
Timing Estimation
passes
(see full flow chart in Figure 8.26)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
66
Lienig
© 2011 Springer Verlag
Block-level or Top-level Global Placement
Performance-Driven Design Flow
© KLMH
8.6
Block-level or Top-level Global Placement
Global Placement
Obstacle-Avoiding Single
Global Net Topologies
With Optional Net Weights
Physical Buffering
Delay Estimation
Using Buffers
OR
Layer Assignment
Virtual Buffering
Buffer Insertion
Physical Synthesis
(see full flow chart in Figure 8.26)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
67
Lienig
Static
Timing Analysis
passes with
fixable violations
© 2011 Springer Verlag
fails
Performance-Driven Design Flow
© KLMH
8.6
Physical Synthesis
Timing Correction
Timing-Driven
Restructuring
Boolean Restructuring
and Pin Swapping
AND
passes
Gate Sizing
Redesign of Fanin
and Fanout Trees
Routing
(see full flow chart in Figure 8.26)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
68
Lienig
Static
Timing Analysis
© 2011 Springer Verlag
fails
Performance-Driven Design Flow
© KLMH
8.6
Routing
Legalization of
Sequential Elements
Clock Network Synthesis
With Layer Assignment
Timing-Driven Routing
passes
(Re-)Buffering and
Timing Correction
Global Routing
fails
Static
Timing Analysis
Detailed Routing
Timing-driven
Legalization + CongestionDriven Detailed Placement
2.5D or 3D
Parasitic Extraction
(see full flow chart in Figure 8.26)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
69
Lienig
© 2011 Springer Verlag
Sign-off
Performance-Driven Design Flow
© KLMH
8.6
Sign-off
passes
Manufacturability,
Electrical, Reliability
Verification
passes
Mask Generation
Design Rule Checking
Layout vs. Schematic
Antenna Effects
Electrical Rule Checking
(see full flow chart in Figure 8.26)
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
70
Lienig
Static Timing Analysis
fails
© 2011 Springer Verlag
ECO Placement and Routing
fails
© KLMH
Summary of Chapter 8 – Timing Constraints and Timing Analysis
Circuit delay is measured on signal paths
From primary inputs to sequential elements; from sequentials to primary outputs
From sequentials to sequentials
Components of path delay
Gate delays: over-estimated by worst-case transition per gate
(to ensure fast Static Timing Analysis)
Wire delays: depend on wire length and (for nets with >2 pins) topology
Timing constraints
Actual arrival times (AATs) at primary inputs and output pins of sequentials
Required arrival times (RATs) at primary outputs and input pins of sequentials
Static timing analysis
Two linear-time traversals compute AATs and RATs for each gate (and net)
At each timing point: slack = RAT-AAT
Negative slack = timing violation; critical nets/gates are those with negative slack
Time budgeting: divides prescribed circuit delay into net delay bounds
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
71
Lienig
© KLMH
Summary of Chapter 8 – Timing-Driven Placement
Gate/cell locations affect wire lengths, which affect net delays
Timing-driven placement optimizes gate/cell locations to improve timing
Interacts with timing analysis to identify critical nets, then biases placement opt.
Must keep total wirelength low too, otherwise routing will fail
Timing optimization may increase routing congestion
Placement by net weighting
The least invasive technique for timing-driven placement
Performs tentative placement, then changes net weights based on timing analysis
Placement by net budgeting
Allocates delay bounds for each net; translates delay bounds into length bounds
Performs placement subject to length constraints for individual nets
Placement based on linear programming
Placement is cast as a system of equations and inequalities
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
72
Lienig
Timing analysis and optimization are incorporated using additional inequalities
© KLMH
Summary of Chapter 8 – Timing-Driven Routing
Timing-driven routing has several aspects
Individual nets: trading longer wires for shorter source-to-sink paths
Coupling capacitance and signal integrity: parallel wires act as capacitors
and can slow-down/speed-up signal transitions
Full-netlist optimization: prioritize the nets that should be optimized first
Individual net optimization
One extreme: route each source-to-sink path independently (high wirelength)
Another extreme: use a Minimum Spanning Tree (low wirenegth, high delay)
Tunable tradeoff: a hybrid of Prim and Dijkstra algorithms
Coupling capacitance and signal integrity
Parallel wires are only worth attention when they transition at the same time
Identify critical nets, push neighboring wires further away to limit crosstalk
Full-netlist optimization
Run trial routing, then run timing analysis to identify critical nets
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
73
Lienig
Then adjust accordingly, repeat until convergence
© KLMH
Summary of Chapter 8 – Physical Synthesis
Traditionally, place-and-route have been performed after the netlist is known
However, fixing gate sizes and net topologies early
does not account for placement-aware timing analysis
Gate locations and net routes are not available
Physical synthesis uses information from trial placement to modify the netlist
Net buffering: splits a net into smaller (approx. equal length) segments
A long net has high capacitance, the driver may be too weak
Gate/buffer sizing: increases driver strength & physical size of a gate
Large gates have higher input pin capacitance, but smaller driver resistance
Larger gates can drive larger fanouts, longer nets; faster transitions
Large gates require more space, larger upstream drivers
Gate cloning: splits large fanouts
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8: Timing Closure
74
Lienig
Cloned gates can be placed separately, unlike with a single larger gate