Presentation kit - UCSD VLSI CAD Laboratory


Learning-Based Prediction of Embedded Memory Timing Failures During Initial Floorplan Design
Wei-Ting J. Chan, Kun Young Chung, Andrew B. Kahng, Nancy D. MacDonald and Siddhartha Nath
UC San Diego / VLSI CAD Laboratory
-1-
Outline







Motivation
Previous Work
Our Work
Multiphysics Analysis
Modeling Methodology
Results
Conclusions
-2-
Early Prediction of Slack Failure in SRAMs

Timing closure is time-consuming and complex at advanced
nodes, which significantly increases turnaround time
– Multiphysics effects (IR drop, thermal, etc.) affect timing
closure

Floorplanning with SRAMs is complicated
– Creates placement and routing blockages
– Makes timing unpredictable at the post-P&R stage


Early prediction of post-P&R slack can reduce design cost
and turnaround time
Post-P&R timing estimation at the floorplan stage is
challenging due to many factors
– Wire delay must be estimated without information on spatial
embedding
– Gate delay must be estimated without information on
buffering
No tool predicts post-P&R slack at an early design stage
-3-
Single vs. Multiple Physics


Design teams can achieve more accurate timing results by
closing multiphysics analysis loops
But multiphysics results are non-trivial to predict in early stages

Multiphysics STA: performing STA with more than one “physics”
Examples of multiple physics: IR, thermal, reliability, crosstalk,
etc.
[Figure: SRAM slack (ps) vs. implementation index for SRAM #1 and SRAM #5. Series: No IR, Static IR, Dynamic IR (1st to 4th loop). Annotations: 29ps, 25ps]
-4-
Challenge: Sensitivity of Slack to Spacing
between Memories

Slack values vary in a highly nonobvious and/or noisy manner as
the spacing is changed

The spacing (channel width) between memories is varied in
steps of 10μm
The difference in slack can be larger than 300ps at a spacing of
10μm due to congestion, buffer placement, etc.
[Figure: floorplan with memories labeled 1 to 5 separated by sram_spacing, blockages, and a placement region for standard cells]
[Figure: WNS of SRAMs (ns) vs. SRAM pitch (µm) for slack-1 to slack-5. Annotation: delta slack > 300ps]
-5-
Challenge: Sensitivity of IR Drop Map to
Power Pad Locations

Distribution density and location choices of power pads
affect the IR drop map

In (a), the IR map has very few IR drop hotspots for
uniformly placed pads

In (b) and (c), the IR maps have more hotspots due to fewer
power pads
[Figure: IR drop maps (a), (b) and (c)]
-6-
Challenge: Abstraction of P&R Stages and
Tool Noise

Modeling must comprehend multiple stages of physical
design

Our approach: an approximate function f to estimate
the combined effects of netlist, constraints, placement,
clock network synthesis, routing, extraction and timing
– 𝑦 = 𝑓(𝑋)
– 𝑋 = netlist, constraints, floorplan parameters
– 𝑦 = Slack (w/, w/o IR)
– 𝑓 = ???
[Figure: design flow. Gate netlist and constraints; floorplan, powerplan; placement; clock network synthesis; routing; extraction, timing, verification; slack (w/, w/o IR); signoff extraction and timing. Labels: Modeling Scope, Costly Iteration]
-7-
Previous Work

Post-P&R timing prediction from netlist
– Adoption of physical synthesis [Alpert07]
– Analytical buffered delay or wire models [Alpert06] [Jones94] [Vujkovic12]
– Detection of congestion during synthesis [Clarke11]
– Models using regression on existing synthesized designs [Karchmer12]
– Thermal-aware delay model at floorplan [Kim12]
– Closed-form SRAM latency model w.r.t. process variation [Yaldiz09]

P&R outcome prediction with machine learning
– Defect classification using SVMs [Huang10]
– Nonlinear ML models for CTS skew [Kahng13]

None of the above works addresses how to avoid suboptimal
decisions at the floorplanning stage
-8-
Our Work




First to propose a modeling methodology to
predict post-P&R slack values at endpoints on
SRAMs at the floorplan stage
We extend our methodology to predict multiphysics
slack values of SRAMs at the floorplan stage
The methodology enables early filtering and improvement of
floorplans that would lead to timing failures at
the post-layout and signoff stages
A new implementation of the Boosting technique,
based on SVMs as weak learners and a
weighting strategy for negative slack outcomes,
to avoid critical timing failures
-9-
Multiphysics Analysis Flow


We consider IR drop (RedHawk) and crosstalk (PTSI) in our
work
Other multiphysics effects such as thermal and reliability
will be explored in the future
[Figure: analysis flow. Timing analysis (PTSI) takes .sdc, .db, .v, .spef and produces timing windows per pin (.timing). IR analysis (RedHawk) takes .lib, .def, .spef, .tech and produces IR drop per instance (.tcl). Temperature, reliability and other physics are not explored in this work]
-10-
Floorplanning and SRAM Placement

Floorplans are parameterized, including core width and
height, SRAM spacings, surrounding space, and widths of
routing channels
[Figure: parameterized floorplan. Labels: core_w, core_h, sram_w, sram_h, sram_spacing, screen_w (buffer screen), blockage_w, blockage_h (blockages emulate SRAMs), vc, hc]
-11-
PDN Design

We also parameterize PDN stripe pitches and stripe widths
[Figure: PDN with VDD and GND stripes, SRAMs and power pads]
– Power ring: V = M9, H = M10 (width = 2µm)
– Top mesh: V = M9, H = M10
– Secondary (local) mesh: M6
– Power rail: M2
– SRAM: from M1 to M4
– M1 to M8: signal routing
-12-
Parameter Selection

Three categories of parameters
– Netlist structure
– Floorplan parameters
– Layout constraints

Sensitivity analysis
– Independent sweeping of each parameter
– Combined effects of parameters using variance inflation factor (VIF)

Parameter | Range of Value(s)
Aspect ratio | 0.8~1.2
Utilization (std cells) | 40%~70%
PDN stripe width | 0.5~3.5μm
PDN stripe pitch | 7~40μm
SRAM spacing (channel width) | 6~24μm
Buffer screen width | 10~16μm
Routing metal layers | 7, 8
Memory placement | {Face-to-face, face-to-back}
Clock period | THEIA = 3.0~4.0ns; nova = 3.2~4.2ns; artificial = 2.0ns
Max transition | 200~280ps
Max fanout | 8~10
Threshold voltage mixes | {LVT}, {LVT, RVT}, {RVT}
Clock buffer sizes | {X32}, {X32, X24}, {X32, X24, X16}
NDRs on clock nets | 1W1S, 2W2S, 3W3S, 3W2S, 2W3S
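The VIF check can be sketched as follows. For a parameter j, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing parameter j on the other parameters. This is a minimal sketch for the two-parameter case only, where R² reduces to the squared Pearson correlation. The function names and sample values are assumptions for illustration, not taken from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def vif_two_params(xs, ys):
    """VIF of one parameter given a single other parameter.

    With only two parameters, R^2 of the regression equals the
    squared Pearson correlation, so VIF = 1 / (1 - r^2).
    """
    r2 = pearson(xs, ys) ** 2
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


# Hypothetical sweep samples (not from the slides).
uncorrelated = vif_two_params([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
collinear = vif_two_params([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

A VIF near 1 suggests the parameter carries information independent of the other; a very large VIF suggests the two are nearly collinear and one may be redundant in the model.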
-13-
List of Parameters
Parameter | Description | Type | Per-memory?
N1 | Max delay across all timing paths at the post-synthesis stage | Netlist | Yes
N2 | Area of cells in the intersection of startpoint fanout and endpoint fanin cones of max-delay incident path | Netlist | Yes
N3 | Number of stages in the max-delay incident path | Netlist | Yes
N4, N5, N6 | Max, min and average product of #transitive fanin and #transitive fanout endpoints | Netlist | Yes
N7 | Width and height of memory | Netlist | Yes
FP1 | Aspect ratio of floorplan | Floorplan | No
FP2 | Standard cell utilization | Floorplan | No
FP3, FP4 | PDN stripe width and pitch | Floorplan | No
FP5 | Size of buffer screen around memories | Floorplan | No
FP6 | Area of blockage (%) relative to floorplan area | Floorplan | No
FP7, FP8 | Lower-left placement coordinates of memories | Floorplan | Yes
FP9, FP10 | Width, height of channels for memories | Floorplan | Yes
FP11 | #memory pins per channel | Floorplan | Yes
C1 | Sum of width and spacing of top-three routing layers after applying nondefault rules (NDRs) | Constraint | No
C2 | % cells that are LVT | Constraint | No
C3, C4 | Max fanout of any instance in data and clock paths | Constraint | No
C5, C6 | Max transition time of any instance in data and clock paths | Constraint | No
C7 | Delay of the largest buffer expressed as FO4 delay | Constraint | No
C8 | Clock period used for P&R expressed as FO4 delay | Constraint | No
C9 | Ratio of clock periods used during synthesis and P&R | Constraint | No
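Parameters C7 and C8 are expressed in units of FO4 delay, and C9 is a ratio of clock periods. A minimal sketch of that normalization is below. The FO4 value, the example periods and the direction of the C9 ratio (synthesis over P&R) are assumptions for illustration; the slides do not give them.

```python
def to_fo4(delay_ps, fo4_delay_ps):
    """Express a delay as a multiple of the FO4 inverter delay."""
    if fo4_delay_ps <= 0:
        raise ValueError("FO4 delay must be positive")
    return delay_ps / fo4_delay_ps


def clock_period_ratio(synthesis_period_ps, pnr_period_ps):
    """Ratio of synthesis clock period to P&R clock period (assumed direction)."""
    return synthesis_period_ps / pnr_period_ps


ASSUMED_FO4_PS = 15.0  # hypothetical FO4 delay, not from the slides

c8 = to_fo4(3000.0, ASSUMED_FO4_PS)      # a 3.0ns P&R clock period in FO4 units
c9 = clock_period_ratio(2700.0, 3000.0)  # a 2.7ns synthesis vs. 3.0ns P&R period
```

Normalizing to FO4 makes timing-related parameters comparable across libraries with different raw gate delays.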
-14-
Modeling Techniques and Flow
Inputs
– Parameters from sequential graph of netlist
– Parameters from floorplan context, constraints
Ground truth
– Slack reports from P&R, multiphysics STA
Modeling techniques
– LASSO with L1 regularization
– SVM with RBF kernel
– ANN with 1 input, 2 hidden, 1 output layer
– Boosting with SVM as weak learner
Combine using weights
Save model and exit
-15-
Boosting with SVM
[Figure: Boosting with SVM weak learners. Input parameters (netlist, floorplan context, constraints) and P&R and multiphysics slack reports feed k SVM weak learners with weights W1, W2, ..., Wk. The learner outputs are scaled by β1, β2, β3, ..., βk and summed (∑) to give the Boosting-predicted output]
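The final summation step of the diagram (weak learner outputs scaled by β1..βk and summed) can be sketched as below. This is only an illustration of the combination step: the weak learners here are stand-in functions rather than trained SVMs, and normalizing by the sum of the β values is an assumption, not something stated on the slide.

```python
def boosted_predict(learners, betas, x):
    """Combine weak-learner predictions as a beta-weighted average.

    learners: callables mapping a feature vector to a slack estimate
    betas:    one non-negative weight per learner
    """
    if len(learners) != len(betas):
        raise ValueError("need exactly one beta per weak learner")
    total = sum(betas)
    if total <= 0:
        raise ValueError("betas must sum to a positive value")
    return sum(b * h(x) for h, b in zip(learners, betas)) / total


# Stand-in weak learners (hypothetical; real ones would be trained SVMs).
weak_learners = [
    lambda x: 1.0,          # always predicts 1.0
    lambda x: 3.0,          # always predicts 3.0
    lambda x: x[0] + x[1],  # simple function of the features
]

equal = boosted_predict(weak_learners[:2], [1.0, 1.0], [0.0, 0.0])
skewed = boosted_predict(weak_learners[:2], [3.0, 1.0], [0.0, 0.0])
```

A learner with a larger β pulls the combined prediction toward its own output, as the `skewed` case shows.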
-16-
Experimental Setup and Testcases






Standard cells: 28nm FDSOI foundry technology
SRAMs: 28nm FDSOI foundry SRAMs
Synthesis: Design Compiler
P&R: IC Compiler
STA: PrimeTime SI (PTSI)
IR drop analysis: APACHE RedHawk
Netlist | Clock Period (ns) | #Std Cells | #SRAMs | Logic Area (μm²) | SRAM Area (μm²)
THEIA v0 | 3 | 147274 | 40 | 157416 | 347252
THEIA v1 | 2.7 | 146505 | 5 | 157068 | 40027
THEIA v2 | 3 | 146914 | 6 | 157012 | 48032
THEIA v3 | 3 | 146243 | 8 | 156212 | 64043
THEIA v4 | 3 | 146606 | 10 | 155991 | 80054
nova | 2 | 66031 | 5 | 68970 | 25117
artificial | 2 | 201015 | 6 | 213075 | 14925
-17-
A General “Tic-tac-toe” Floorplan

A floorplan is divided into an array of “tic-tac-toe” blocks

Three types of blocks are defined: memory, blockage,
and standard cells
– enables generality and parameterizability,
– enables systematic exploration of a discrete design
space, and
– captures how designers tend to floorplan their blocks
[Figure legend: Memory, STD cells, Blockage]
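Because each block takes one of three types, the design space is discrete and can be enumerated. The sketch below assumes a small rectangular grid and a fixed number of memory blocks; the function name, the grid sizes and the filtering rule are illustrative assumptions, not the authors' generator.

```python
from itertools import product

BLOCK_TYPES = ("memory", "blockage", "stdcell")


def enumerate_floorplans(rows, cols, num_memories):
    """Yield every rows x cols grid that has exactly num_memories memory blocks.

    Each grid is a list of rows; each row is a tuple of block types.
    """
    for assignment in product(BLOCK_TYPES, repeat=rows * cols):
        if assignment.count("memory") != num_memories:
            continue
        yield [assignment[r * cols:(r + 1) * cols] for r in range(rows)]


# A 2x2 grid with one memory: 4 positions for the memory, and each of
# the 3 remaining blocks is either blockage or standard cells.
count_2x2 = sum(1 for _ in enumerate_floorplans(2, 2, 1))
first = next(enumerate_floorplans(2, 2, 1))
```

An unconstrained 3x3 grid has 3^9 = 19683 assignments, so in practice a sweep would add further filters (for example symmetry or minimum standard-cell area) before running P&R on each candidate.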
-18-
Example: Memory Placements
[Figure: implementation examples of tic-tac-toe SRAM placements]
[Figure: implementation of cross-, L- and T-shaped floorplans]
-19-
Simple-Minded Modeling Yields Large Errors



No apparent correlation between post-P&R and post-synthesis slack values
Modeling with only netlist parameters
Best result (SVM w/ RBF kernel): worst-case error = 358ps; average error = 42ps

Technique | Worst-Case Error (ps) | Average Error (ps)
LASSO | 565 | 87
SVM (linear) | 412 | 55
SVM (w/ RBF kernel) | 358 | 42
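The two metrics in the table can be computed as below. This sketch assumes both are taken over absolute prediction errors; the slides do not say whether the average is absolute or signed. The sample values are invented for illustration.

```python
def error_summary(actual_ps, predicted_ps):
    """Return (worst-case, average) absolute prediction error in ps."""
    if len(actual_ps) != len(predicted_ps) or not actual_ps:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - a) for a, p in zip(actual_ps, predicted_ps)]
    return max(errors), sum(errors) / len(errors)


# Hypothetical slack values in ps (not from the slides).
actual = [-100.0, 50.0, 200.0]
predicted = [-60.0, 40.0, 210.0]
worst, average = error_summary(actual, predicted)
```

Reporting both values matters here: a model can have a small average error and still miss one endpoint badly, which is what the worst-case column exposes.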
-20-
Post-P&R Slack Prediction
Errors in data points with negative slack are penalized more
to avoid critical timing failures

Worst error = 224ps
Average error = 4ps
[Figure: error of slack prediction (ns) vs. actual slack (ns)]
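One simple way to penalize errors on negative-slack points more heavily is a per-sample weight in the loss. The sketch below uses a weighted mean squared error; the weight value and the choice of squared error are assumptions for illustration, since the slides do not specify the weighting scheme.

```python
def weighted_squared_error(actual, predicted, negative_weight=2.0):
    """Weighted mean squared error with extra weight on negative-slack points.

    Points whose actual slack is negative get weight negative_weight;
    all other points get weight 1.0.
    """
    total = 0.0
    weight_sum = 0.0
    for a, p in zip(actual, predicted):
        w = negative_weight if a < 0 else 1.0
        total += w * (p - a) ** 2
        weight_sum += w
    return total / weight_sum


# Hypothetical slack values in ns: the only error is on the failing endpoint.
actual = [-1.0, 1.0]
predicted = [0.0, 1.0]
unweighted = weighted_squared_error(actual, predicted, negative_weight=1.0)
weighted = weighted_squared_error(actual, predicted, negative_weight=3.0)
```

With the larger weight, the same miss on a failing endpoint costs more, so a model trained against this loss is pushed to fit negative-slack points more closely.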
-21-
Multiphysics Slack Prediction

Annotate per-cell IR-drop from RedHawk in PTSI
Worst error = 253ps
Average error = 9ps
-22-
Modeling Fidelity

False negatives = 3%
– Pessimistic predictions: we provide guidance to
change a floorplan when no change is actually required

False positives = 4%
– Our model incorrectly deems a floorplan to be good

Confusion matrix (number of data points)
Predicted | Actual pass | Actual fail
Pass | 584 | 42
Fail | 31 | 384

Positive slack data points:
Precision: tp/(tp + fp) = 93.3%
Recall: tp/(tp + fn) = 95.0%
Negative slack data points:
Precision: tn/(tn + fn) = 92.5%
Recall: tn/(tn + fp) = 90.1%
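The fidelity numbers follow directly from the four confusion-matrix counts, taking "pass" (positive slack) as the positive class: tp = 584, fp = 42, fn = 31, tn = 384. For the negative class, precision is tn/(tn + fn) and recall is tn/(tn + fp), which reproduces the 92.5% and 90.1% figures. The function and key names below are illustrative.

```python
def fidelity(tp, fp, fn, tn):
    """Precision/recall for both classes plus error rates, in percent."""
    total = tp + fp + fn + tn

    def pct(numerator, denominator):
        return round(100.0 * numerator / denominator, 1)

    return {
        "pos_precision": pct(tp, tp + fp),
        "pos_recall": pct(tp, tp + fn),
        "neg_precision": pct(tn, tn + fn),
        "neg_recall": pct(tn, tn + fp),
        "false_negative_rate": pct(fn, total),
        "false_positive_rate": pct(fp, total),
    }


# Counts from the confusion matrix on this slide.
result = fidelity(tp=584, fp=42, fn=31, tn=384)
```

The false negative and false positive rates here are fractions of all 1041 data points, which matches the 3% and 4% quoted on the slide.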
-23-
Conclusions

Early stage timing failure prediction and timing closure
with multiphysics analyses are important

We present a machine learning-based methodology for
the early stage timing failure prediction problem
– Worst-case error = 224ps (w/o multiphysics)
– Worst-case error = 253ps (w/ multiphysics)

We present a new implementation of Boosting based
on SVMs as weak learners

Our ongoing work includes
– Applying our methodology to product/test engineering data
from an SoC company
– Predicting defectivity in silicon and providing floorplan
guidance to avoid such defectivity
-24-
Acknowledgments

Work supported by Samsung Electronics

We thank P. Agrawal (ANSYS) and J.-A.
Desroses (ST Microelectronics) for their help
with setup and enablement of iterative DVD
analysis and signoff timing flow
-25-
Thank You!
-26-
Backup
-27-