Scalable Detailed Placement Legalization for Complex Sub


Scalable Detailed Placement
Legalization for Complex
Sub-14nm Constraints
Kwangsoo Han, Andrew B. Kahng and Hyein Lee
{kwhan, abk, hyeinlee}@ucsd.edu
http://vlsicad.ucsd.edu/
ECE Department, UC San Diego
Outline
• Motivation & Previous Work
• Problem Formulation
• Our Approach
• Experimental Setup and Results
• Conclusion
Motivation
• In old technology nodes, once the library cells were
correctly designed, design rule violations (DRVs)
could not occur during placement
• Limitations of patterning resolution lead to complex
design rules for front-end-of-line (FEOL) layers
• Placing several ‘legal’ standard cells next to each
other may cause violations of FEOL layer rules
⇒ A final detailed cell placement phase is needed to maintain placement legality with respect to the new N10 FEOL rules
Cell Layout in N10 Node
• The FEOL layers which affect legal placement
include implant layer, oxide diffusion layer and poly
• Implant layers determine the threshold voltage (Vt) of transistors
• Oxide diffusion (OD) defines the active region of
transistors
• Dummy poly gates are inserted at the (vertical) standard
cell boundaries to avoid edge device variability
[Figure: N10 standard cell layout with pins A and Y. Legend: Fin, Poly, Oxide diffusion (OD), M1, Middle of line, M2 power/ground, Cell boundary / implant region]
(1) Minimum implant width (IW)
• Limitation of the current optical lithography technology
⇒ New design rule (i.e., minimum implant width)
• Two same-Vt cells are misaligned vertically
⇒ A narrow, “staircase” implant layer shape
⇒ Inter-row IW (IW1) violation
• A narrow cell is surrounded by different-Vt cells
⇒ Intra-row IW (IW2) violation
[Figure: rows of HVT and LVT cells illustrating an inter-row (IW1) violation and an intra-row (IW2) violation]
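The intra-row (IW2) case can be sketched as a scan over one placement row. This is only an illustration, not the checker used in the work: the per-site Vt labels, the function name `narrow_vt_runs` and the example row are assumptions, and the 4-site minimum is taken from the backup slide on designs and technologies.

```python
from itertools import groupby

def narrow_vt_runs(row_vt, min_width):
    """Return (start_site, length, vt) for each same-Vt run shorter than min_width.

    row_vt holds one Vt label per placement site, e.g. "HVT" or "LVT".
    Runs touching the row ends are reported too; a real checker might
    treat row ends differently (assumption).
    """
    violations = []
    start = 0
    for vt, group in groupby(row_vt):
        length = len(list(group))
        if length < min_width:
            violations.append((start, length, vt))
        start += length
    return violations

# Hypothetical row: a 2-site LVT cell squeezed between HVT cells.
row = ["HVT"] * 5 + ["LVT"] * 2 + ["HVT"] * 4
print(narrow_vt_runs(row, min_width=4))  # [(5, 2, 'LVT')]
```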
(2) Minimum OD jog length (OW)
• Cells can have different oxide diffusion (OD) region heights
• Lithographic corner rounding
⇒ Minimum OD jog length rule
• Abutment of cells with different OD heights
⇒ Causes an OD jog length violation
[Figure: OD regions of abutting cells, with an OD jog at the cell boundary]
(3) Drain-drain abutment (DDA)
• Dummy poly gates create extra dummy transistors
• Dummy transistors can induce leakage power
• Dummy transistors must be tied off to power/ground rails
• Two drain nodes are abutted
• Extra dummy poly gate ⇒ tied off to power/ground rails
• Cell flipping/displacement
[Figure: source (S) and drain (D) nodes at abutting cell edges, showing a drain-drain abutment and a legal (√) arrangement]
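A minimal sketch of how cell flipping can remove a drain-drain abutment. The `Cell` record, its `left`/`right` edge labels and the helper names are hypothetical; real cells would carry this information per device row and per library characterization.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    name: str
    left: str   # "S" or "D": node at the left cell edge when unflipped
    right: str  # "S" or "D": node at the right cell edge when unflipped

def edges(cell, flipped):
    """Return (left, right) edge nodes, swapped when the cell is mirrored."""
    return (cell.right, cell.left) if flipped else (cell.left, cell.right)

def has_dda(left_cell, left_flipped, right_cell, right_flipped):
    """True when the touching edges of two abutting cells are both drains."""
    _, touching_right_edge = edges(left_cell, left_flipped)
    touching_left_edge, _ = edges(right_cell, right_flipped)
    return touching_right_edge == "D" and touching_left_edge == "D"

a = Cell("A", left="S", right="D")
b = Cell("B", left="D", right="S")
print(has_dda(a, False, b, False))  # True: drain meets drain
print(has_dda(a, False, b, True))   # False: flipping B puts its source at the boundary
```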
Previous Work
• Dynamic programming-based approaches
• Optimal interleaving for intra-row optimization [Hur and Lillis, ICCAD00]
• Row-based placement [Kahng et al., ASPDAC99, GLSVLSI04]
• Integer Linear Programming (ILP)-based approaches
• Placement by branch-and-price [Ramachandaran et al., ASPDAC05]
• MIP-based detailed placement [Li and Koh, ISPD12]
• DDA-aware placement
• [Du and Wong, DATE14] propose a graph model with shortest-path
algorithm
• Use cell flipping and adjacent-cell swapping
• No consideration of inter-row constraints (e.g., IW constraint)
Our work: MILP-based optimization that provides comprehensive support for N10-relevant FEOL rules
Our Contributions
• Develop a mixed integer linear programming (MILP)-based placer, called DFPlacer
• Address new DRVs caused by complex N10 FEOL rules
• Propose a scalable partitioning-based optimization
method
• Incorporate our flow into a commercial tool-based
placement and routing (P&R) flow for evaluation
• Provide insight into timing and area impacts of the
dummy poly gate library cell strategy
• Standard cells with dummy poly gates (DDA and OW
violation free)
• Standard cells without dummy poly gates
Detailed Placement Problem Formulation
• Input: Placement with design rule violations
• Objective: Legal placement with minimum cell
displacements
• Subject to:
• Minimum implant width (IW) constraint
• Minimum oxide diffusion jog length (OW) constraint
• Drain-drain abutment (DDA) constraint
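[Figure: rows of HVT and LVT cells annotated with an IW violation, an OW violation at an OD step across a cell boundary, and a DDA violation between two drain (D) nodes]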
Mixed-ILP Model [Li12]
• Single-cell-placement binary variable $\lambda_c^k$
• Placement state k (location and orientation) of cell c
• Site occupation variable $s_{crq}^k$
• Represents whether site (r,q) is occupied by cell c with placement state k
• $\lambda_c^k = \{x_c, y_c, f_c\}$, where $x_c$ ($y_c$) is the x (y) location of cell c and $f_c$ indicates whether c is flipped
[Figure: cell c on a site grid with coordinates (0,0) and (1,7) marked, with example states $\lambda_c^1 = \{0, 1, 0\}$ and $\lambda_c^2 = \{4, 0, 1\}$ and site occupation values $s_{c11}^1 = 1$, $s_{c21}^1 = 1$, $s_{c31}^1 = 0$]
[Li12] S. Li and C.-K. Koh, “Mixed Integer Programming Models for Detailed Placement”, Proc. ISPD, 2012, pp. 87-94.
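The placement-state idea can be sketched as follows: each candidate state of a cell is an (x, y, flipped) triple, and its site occupation follows directly from the state. The search-range parameters `max_dx` and `max_dy`, the row/column reading of (r,q) and all function names are assumptions for illustration, not details from [Li12] or the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    x: int        # leftmost site column occupied by the cell
    y: int        # row index
    flipped: int  # 1 if the cell is mirrored, else 0

def enumerate_states(x_init, y_init, width, n_rows, n_cols, max_dx, max_dy):
    """List candidate states within a displacement box around the initial location."""
    states = []
    for y in range(max(0, y_init - max_dy), min(n_rows - 1, y_init + max_dy) + 1):
        for x in range(max(0, x_init - max_dx), min(n_cols - width, x_init + max_dx) + 1):
            for flipped in (0, 1):
                states.append(State(x, y, flipped))
    return states

def occupied_sites(state, width):
    """Sites (r, q) covered by a cell of the given width in this state."""
    return {(state.y, state.x + i) for i in range(width)}

states = enumerate_states(x_init=2, y_init=0, width=2, n_rows=1, n_cols=6, max_dx=1, max_dy=0)
print(len(states))                          # 6: three x positions times two orientations
print(occupied_sites(State(4, 0, 1), 3))    # sites (0,4), (0,5), (0,6)
```

One binary variable per entry of `states` then plays the role of $\lambda_c^k$, and `occupied_sites` gives the constants $s_{crq}^k$.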
Placement Problem Formulation
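Objective (minimize displacements):
$$\min \sum_{\text{cells } c} \left( |x_c - x_{c,\mathrm{init}}| + |y_c - y_{c,\mathrm{init}}| \right)$$
For each cell c, select one placement state:
$$\sum_{\text{states } k} \lambda_c^k = 1$$
Orientation, x/y location and site occupation are determined by $\lambda_c^k$:
$$f_c = \sum_{\text{states } k} f_c^k \lambda_c^k, \qquad x_c = \sum_{\text{states } k} x_c^k \lambda_c^k, \qquad y_c = \sum_{\text{states } k} y_c^k \lambda_c^k, \qquad s_{crq} = \sum_{\text{states } k} s_{crq}^k \lambda_c^k$$
Placement constraints (no overlap), for each site (r,q):
$$\sum_{\text{cells } c} s_{crq} \le 1$$
+ more constraints to support IW, OW and DDA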
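To make the formulation concrete, the sketch below solves a toy instance by exhaustive search instead of an MILP solver. It picks exactly one state per cell, rejects choices whose occupied sites overlap, and minimizes the total Manhattan displacement. The data layout and the name `legalize_bruteforce` are assumptions, and exhaustive search is a stand-in that only works for a handful of cells.

```python
from itertools import product

def legalize_bruteforce(cells):
    """cells: name -> {"init": (x, y), "width": w, "states": [(x, y, flipped), ...]}.

    Returns (minimum total displacement, chosen state per cell), or
    (None, None) if no overlap-free choice exists.
    """
    names = list(cells)
    best_cost, best_choice = None, None
    # One state per cell: the analogue of sum_k lambda_c^k = 1.
    for choice in product(*(cells[n]["states"] for n in names)):
        used, overlap = set(), False
        for name, (x, y, _f) in zip(names, choice):
            sites = {(y, x + i) for i in range(cells[name]["width"])}
            if used & sites:  # a site would be occupied twice
                overlap = True
                break
            used |= sites
        if overlap:
            continue
        cost = sum(abs(x - cells[n]["init"][0]) + abs(y - cells[n]["init"][1])
                   for n, (x, y, _f) in zip(names, choice))
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, dict(zip(names, choice))
    return best_cost, best_choice

# Two 2-site cells that initially overlap in row 0.
example = {
    "a": {"init": (0, 0), "width": 2, "states": [(x, 0, 0) for x in range(3)]},
    "b": {"init": (1, 0), "width": 2, "states": [(x, 0, 0) for x in range(4)]},
}
print(legalize_bruteforce(example))  # (1, {'a': (0, 0, 0), 'b': (2, 0, 0)})
```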
IW Constraints Formulation
• New: $v_{rq}$, a binary vector indicating the Vt of site (r,q)
• Vt boundaries are checked with inter-/intra-row variables
• At a Vt boundary, at least |W| consecutive sites must have the same Vt
• At a Vt boundary where two vertically neighboring sites have the same Vt, that Vt must be kept for at least |W| sites in both the upper and lower rows
[Figure: two Vt boundary examples, each with |W| = 3]
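One possible reading of the inter-row condition, written as a plain check rather than as MILP constraints: wherever two vertically adjacent rows carry the same Vt, that shared Vt should continue for at least |W| sites. The function name and the list-of-labels representation are assumptions, and the actual constraints in the work are expressed over the $v_{rq}$ variables.

```python
def narrow_vertical_overlaps(upper, lower, min_width):
    """Return (start_site, length, vt) for shared-Vt column runs shorter than min_width.

    upper and lower hold one Vt label per site for two adjacent rows.
    A run is a maximal stretch of columns where both rows have the same label.
    """
    violations = []
    n = min(len(upper), len(lower))
    q = 0
    while q < n:
        if upper[q] != lower[q]:
            q += 1
            continue
        start, vt = q, upper[q]
        while q < n and upper[q] == vt and lower[q] == vt:
            q += 1
        if q - start < min_width:
            violations.append((start, q - start, vt))
    return violations

# Misaligned HVT regions overlap in a single column: a "staircase" shape.
upper = ["HVT"] * 4 + ["LVT"] * 4
lower = ["LVT"] * 3 + ["HVT"] * 5
print(narrow_vertical_overlaps(upper, lower, min_width=3))  # [(3, 1, 'HVT')]
```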
OW and DDA Constraints Formulation
• Pre-characterize all adjacency conditions which
violate OW and/or DDA for each cell pair
• Add mutual exclusion constraints
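• If states $\lambda_{c_1}^{i}$ and $\lambda_{c_2}^{j}$ form a forbidden pair, add $\lambda_{c_1}^{i} + \lambda_{c_2}^{j} \le 1$
[Figure: two abutting cells in states $\lambda_{c_1}^{i}$ and $\lambda_{c_2}^{j}$]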
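The mutual exclusion step can be sketched as a loop over state pairs of two cells. The `violates` callback stands in for the pre-characterized OW/DDA lookup and is hypothetical, as are the tuple layout of the states and the restriction to the case where the first cell sits directly to the left of the second.

```python
def forbidden_pairs(states_1, width_1, states_2, violates):
    """Return index pairs (i, j) that need lambda_{c1}^i + lambda_{c2}^j <= 1.

    states_*: list of (x, y, flipped) tuples.
    violates(f1, f2): hypothetical lookup, True if c1 with flip f1 abutting
    c2 with flip f2 on its right breaks the OW or DDA rule.
    Only the "c1 directly left of c2" adjacency is handled here.
    """
    pairs = []
    for i, (x1, y1, f1) in enumerate(states_1):
        for j, (x2, y2, f2) in enumerate(states_2):
            abutting = (y1 == y2) and (x1 + width_1 == x2)
            if abutting and violates(f1, f2):
                pairs.append((i, j))
    return pairs

# Hypothetical rule: only the unflipped/unflipped abutment is illegal.
states_c1 = [(0, 0, 0), (0, 0, 1)]
states_c2 = [(2, 0, 0), (2, 0, 1), (3, 0, 0)]
pairs = forbidden_pairs(states_c1, 2, states_c2, lambda f1, f2: f1 == 0 and f2 == 0)
print(pairs)  # [(0, 0)]
```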
Distributable Global Optimization
• Limitation of MILP-based approach ⇒ Runtime
• Distributable optimization of many windows of cells
• Split the post-route layout into small clips
• Run optimization for each clip with fixed boundaries
• Cells on boundaries are handled by shifting windows
[Figure: layout split into clips with fixed boundary cells, shown for the 1st and 2nd iterations with shifted windows]
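A sketch of the window-shifting idea: the layout is tiled into clips, and on alternate iterations the grid is offset so that cells on earlier clip boundaries fall inside a clip. The half-window offset and the function name `make_windows` are assumptions, since the slides only state that windows are shifted.

```python
def make_windows(layout_w, layout_h, win_w, win_h, iteration):
    """Tile the layout with clips (x0, y0, x1, y1), clipped to the layout area.

    Odd iterations shift the grid by half a window in x and y (assumed shift).
    """
    off_x = win_w // 2 if iteration % 2 else 0
    off_y = win_h // 2 if iteration % 2 else 0
    windows = []
    y = -off_y
    while y < layout_h:
        x = -off_x
        while x < layout_w:
            windows.append((max(x, 0), max(y, 0),
                            min(x + win_w, layout_w), min(y + win_h, layout_h)))
            x += win_w
        y += win_h
    return windows

first = make_windows(8, 4, 4, 2, iteration=0)
second = make_windows(8, 4, 4, 2, iteration=1)
print(len(first), len(second))  # 4 9
```

In this example the boundary at x = 4 from the first iteration lies in the interior of the clip (2, 1, 6, 3) in the second. Because clips within one iteration do not overlap, each could be handed to a separate solver process.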
Overall Flow
[Flowchart]
Routed layout w/ DRVs
⇒ DFPlacer
• Global optimization: optimization for each window ⇒ solve multiple windows in parallel ⇒ shift partitioning lines
• Local optimization (remaining DRVs): make new windows ⇒ remove overlapping windows ⇒ solve multiple windows in parallel
• Each window: complex constraints for N10 (DDA, OW, IW) ⇒ ILP formulation ⇒ ILP solver (CPLEX)
• #DRVs < δ ? N: iterate; Y: cell location solution
⇒ ECO routing
⇒ Routed layout with #DRVs < δ
Experimental Setup
• SP&R tools: Synopsys Design Compiler H-2013.03-SP3 and Cadence Encounter Digital Implementation System XL 13.1
• Technology: two kinds of 7nm dual-Vt libraries
• 62 standard cells without dummy poly gates (CWOD)
• 62 standard cells with dummy poly gates (CWD)
• Design: AES, JPEG [OpenCores], ARM Cortex M0, ARM Cortex M0 x 3
• *_d: implemented with CWD library
• *_nd: implemented with CWOD library
[OpenCores] http://opencores.com/

Design     #Inst   LVT (%)   Util. (%)   WL (um)   Area (um2)
M0_nd       8260   52        77          114685     7668
AES_nd     12147   54        78          142294     8894
M0x3_nd    27248   56        80          392540    24463
JPEG_nd    47948   51        77          694624    49629
M0_d        8238   51        77          116866     8668
AES_d      12491   54        80          150632    10596
M0x3_d     26690   55        79          409579    27400
JPEG_d     48317   52        77          764738    55824

[Figure: inverter cell layouts (pins A and Y) in the CWD library and in the CWOD library. Legend: Fin, Poly, Oxide diffusion (OD), M1, Middle of line, M2 power/ground, Cell boundary / implant region]
Experimental Results (1)
• Report ∆wirelength (∆WL) and ∆worst setup slack (∆WSS)
• Up to 3.42% wirelength increase
• *_nd cases show similar or slightly larger ∆WL% than *_d cases
• ∆WSS ranges from -19ps to 68ps
• Positive ∆WSS ⇒ there is room to improve timing
[Figure: bar charts of ∆WL (%) and ∆WSS (ps)]
Experimental Results (2)
• Global optimization fixes ~90% of DRVs
• Runtimes of global optimization with the CWOD library are 1.8x longer than with the CWD library (except for Cortex M0)
• The runtime of the global optimization phase can be further reduced with more computing resources
[Figure: remaining violations (%) versus runtime (sec) for m0_nd, aes_nd, m0x3_nd, jpeg_nd, m0_d, aes_d, m0x3_d and jpeg_d. Annotations: after global optimization (3rd iteration), 90% of violations are fixed; 1.8x runtime gap]
Experimental Results (3)
• DFPlacer fixes 99% of design rule violations
Design     Init. IW #vio.   Init. DDA/OW #vio.   Final total #vio.
M0_nd        926             1611                  25
AES_nd      1771             1900                  34
M0x3_nd     3514             4230                  65
JPEG_nd     4056            12024                 164
M0_d         988                0                  10
AES_d       1566                0                  11
M0x3_d      2810                0                  27
JPEG_d      6296                0                  43
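The 99% figure can be checked against the per-design counts. The short script below recomputes the fraction of initial violations that were fixed; only the function name `fix_rate` and the dictionary layout are introduced here, and the numbers are the ones reported on this slide.

```python
# (initial IW violations, initial DDA/OW violations, final total violations)
results = {
    "M0_nd":   (926, 1611, 25),
    "AES_nd":  (1771, 1900, 34),
    "M0x3_nd": (3514, 4230, 65),
    "JPEG_nd": (4056, 12024, 164),
    "M0_d":    (988, 0, 10),
    "AES_d":   (1566, 0, 11),
    "M0x3_d":  (2810, 0, 27),
    "JPEG_d":  (6296, 0, 43),
}

def fix_rate(init_iw, init_dda_ow, final_total):
    """Percentage of the initial violations that no longer remain."""
    initial = init_iw + init_dda_ow
    return 100.0 * (initial - final_total) / initial

for design, counts in results.items():
    print(f"{design}: {fix_rate(*counts):.2f}% fixed")
```

Every design comes out at roughly 99%, which matches the summary bullet.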
• Example solution
[Figure: layout clip before and after DFPlacer, annotated with DDA, IW and OW violations and with the cells that are flipped or moved]
Conclusion and Future Work
• Propose a scalable detailed placement
legalization flow for complex FEOL constraints
arising at the foundry 10nm node
• Constraints include minimum implant width,
minimum OD jog rules and drain-drain abutment
• Fixes 99% of DRVs with 3% increase in wirelength
and minimal impact on timing
• Future work
• Timing and wirelength-driven placement legalization
• “Smart ECO” method for few remaining DRVs after global
placement legalization
Thank you!
Experimental Setup: Designs and Technologies
• Minimum OD jog length = 4 site widths
• Minimum implant width = 4 site widths
• Number of violating cell-pair adjacencies
• Minimum implant width rule violation: 7172 out of 15376 (= 62 x 62 x 2 x 2)
• Minimum OD jog length rule violation: 280 out of 15376
• 7nm cell library with scaled 28nm BEOL (back-end-of-line) LEF
• Site width/height: 0.136/0.9 um
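The pair counts above can be reproduced with a few lines of arithmetic. Reading the two factors of 2 as the two orientations of each cell in a pair is an assumption. The slide gives only the product.

```python
n_cells = 62
orientations = 2  # assumed meaning of each "x 2" factor: unflipped / flipped

total_pairs = n_cells * n_cells * orientations * orientations
iw_violating = 7172
ow_violating = 280

print(total_pairs)                                      # 15376
print(f"IW: {100 * iw_violating / total_pairs:.1f}%")   # IW: 46.6%
print(f"OW: {100 * ow_violating / total_pairs:.1f}%")   # OW: 1.8%
```

So nearly half of all ordered, oriented cell-pair adjacencies violate the implant width rule, while under 2% violate the OD jog rule.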
[Figure: OAI22 cell in the 7nm node (pins A1, A0, B0, B1, Y), scaled by 2.5x to the minimum M1 and M2 pitches of the 28nm node]
Scaling of 7nm Cells
• Scale 7nm cells by 2.5X
• Left figure is the scaled OAI22_X1
• All the pins are on track with 0.135um M1 vertical pitch
• However, Encounter does not work with a 0.135um M1 vertical pitch
• Right figure shows the modified OAI22_X1 (fit to a 0.136um M1 vertical pitch)
• Increase width from 0.81um to 0.816um (= 0.81 + (0.81/135))
• Shift the pins to be aligned to the vertical track with 0.136um pitch
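The width adjustment is a re-gridding from a 0.135um to a 0.136um track pitch. A small sketch of that arithmetic follows; the helper `snap_pin` and the half-pitch pin offset used in the example are assumptions based on the dimension labels in the figure.

```python
old_pitch, new_pitch = 0.135, 0.136  # M1 vertical track pitch in um
old_width = 0.81                     # scaled OAI22_X1 width in um

tracks = round(old_width / old_pitch)  # number of track pitches across the cell
new_width = tracks * new_pitch

def snap_pin(x_old):
    """Move a pin x-coordinate to the same track position on the new pitch."""
    return round(x_old / old_pitch * new_pitch, 4)

print(tracks)               # 6
print(round(new_width, 3))  # 0.816
print(snap_pin(0.0675))     # 0.068: first track, half a pitch from the cell edge
```

The result agrees with the expression on the slide, since 0.81 + 0.81/135 = 0.81 + 0.006 = 0.816.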
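[Figure: left, scaled OAI22_X1 with width 0.81um and M1 tracks at 0.135um pitch; right, modified OAI22_X1 with width 0.816um and M1 tracks at 0.136um pitch (0.068um from each cell edge). Both cells are 0.9um high, with pins A1, A0, B0, B1 and Y]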