Transcript slides

Proximity Optimization for Adaptive
Circuit Design
Ang Lu, Hao He, and Jiang Hu
Overview
•
•
•
•
Introduction
Proposed Techniques
Experiment Result
Conclusion
2
Design Challenges
Process Variations
Device Aging
Power
3
Adaptive Circuit
• Apply power according to actual chip variations
• More energy-efficient than the worst case desgin
4
Sensors and Tuning Knobs in Adaptive Circuit
• Delay variation sensors
– Critical path replica, used in IBM Power5 processor
– Canary flip-flop, like in Razor
• Tuning knobs
– Adaptive body bias
– Adaptive supply voltage (voltage interpolation)
Liang, et al., Micro 2009
5
Pros and Cons of Adaptive Circuit
Pros:
More energy-efficient than the
worst case designs
Cons:
 Potentially large area overhead
 Higher design complexity
6
Clustering for Adaptive Circuits
Two extreme cases:
• Tune each cell individually?
 Too much overhead
• Tune entire circuit collectively?
 Less energy savings
Achieve desired trade-off?
• Clustering
7
Overview
• Introduction
• Proposed Techniques
– Overall Flow
– Time and Location Aware Cell Clustering
– Clustering Driven Incremental Placement
• Experiment Result
• Conclusion
8
Proposed Flow
9
Existing Clustering Methods
• Manual partition for regular datapath
Timing
Location
• Cluster cells based on spatial
proximity
• Cluster cells based on their timing characteristics
– Kulcarni, et al., TCAD 2008
– Monte Carlo (MC) simulation
– Optimize for each MC run
Timing
Location
10
Need to Consider Both Timing & Spatial Proximity
Both paths A & B are critical
Bad Cluster: A1 & B2
(Similar timing characteristic)
Good Cluster: A2(A1) & B2 (B1)
11
Clustering Algorithm
Clustering algorithm
𝐿𝑙𝑜𝑦𝑑′ 𝑠 𝐾 − 𝑚𝑒𝑎𝑛𝑠 𝑎𝑙𝑔𝑜𝑟𝑖𝑡ℎ𝑚
Distance definition
Dist = α| si – sj | + β| 𝜓i - 𝜓j | + γdij
Timing
Location
• Timing:
• Si : Slack of gate i
• 𝜓i : Slack sensitivity of gate i
• Location:
• dij : Physical distance between gate i & j
12
Timing Slack
si = 𝑚𝑎𝑥(𝑚𝑖𝑛 𝑠i , 𝜃𝑚𝑎𝑥 , 𝜃𝑚𝑖𝑛)
• Slack includes nominal value + standard deviation, or at
process corners
• Cut out threshold 𝜃
Slacks are treated as the same if they are too large or too small
13
Timing Sensitivity
ψ=
𝐶𝑟𝑖𝑡𝑖𝑐𝑎𝑙_𝑃𝑎𝑡ℎ_𝑆𝑙𝑎𝑐𝑘_𝐷𝑒𝑐𝑟𝑒𝑎𝑠𝑒
𝑇𝑢𝑛𝑖𝑛𝑔_𝐶𝑜𝑠𝑡
• Critical path slack decrease due to tuning at cell 𝑐𝑖
• Tuning cost => power increase due to tuning
Subsequent adaptivity optimization prefers low cost tuning
14
Correlations?
• Highly correlated cells are clustered together
• Spatial correlation is partially addressed by spatial
proximity
• Structural correlation
– Not every cell on one critical path need to be tuned
– Structurally correlated cells are rarely too far apart
15
Overview
• Introduction
• Proposed Techniques
– Overall Flow
– Time and Location Aware Cell Clustering
– Clustering Driven Incremental Placement
• Experiment Result
• Conclusion
16
Cluster Driven Incremental Placement
•
•
•
•
•
Make cells of the same cluster to form a contiguous region
Minimize total cell movement
Assignment problem
Simultaneously processing all clusters => multicommodity flow problem
Process one cluster at a time
17
Min-Cost Network Flow Formulation
Source nodes
Sink nodes
18
Implementation
Them min-cost network flow problem is solved by the
Edmond-Karp algorithm
Move cells heuristically for fractional flow solutions
19
Wirelength Overhead Control
• After incremental placement, wirelength increase is
estimated
• If the increase > threshold, rerun clustering with
increased weight on spatial proximity
20
Overview
•
•
•
•
Introduction
Proposed Techniques
Experiment Result
Conclusion
21
Experiment Setup
• Benchmark:
– ICCAD 2014 Incremental Timing-Driven
Placement Contest benchmark suites
– 7 circuits, (130K , 960K) cells
• Adaptive body bias is employed as platform of adaptive circuit design
• # cluster is empirically chosen in a range from 10 to 25
22
Comparison and Methodology
Methods that are compared:
1.
2.
3.
4.
Over design
Location-driven clustering
Timing-driven clustering
Location and timing driven clustering (Ours)
Methodology
For each method, simulate multiple times with varying parameters and report
the average results
23
Testcases and Placement Perturbations
Circuit
# gates
Cells moved
Avg. cell move
distance
edit_dist
130674
5.1%
7
matrix_mult
155341
5.7%
8
vga_lcd
164891
8.1%
27
b19
219268
8.3%
12
leon3mp
649191
3.9%
89
leon2
794286
8.3%
50
netcard
958792
5.5%
90
24
Results from only Forward Body Bias
1.6
1.4
1.2
1.0
0.8
0.6
0.4
0.2
0.0
Over design
Power
location-aware
clustering
∆Area
Timing-aware
clustering
Ours
∆Wire-length
Our method achieves
• 99% timing yield like other methods
• 1/4 less power than over design
• 1/3 less area overhead
• substantial less wire overhead
25
Results from Forward and Reversed Body Bias
1.2
1.0
0.8
0.6
0.4
0.2
0.0
Over design
location-aware
clustering
Power
Timing-aware
clustering
∆Area
Ours
∆Wire-length
Our method achieves
• 99% timing yield
• similar power
• much less area overhead than location-only
• much less wire overhead than timing-only
26
Power/Area – Timing Tradeoff
Circuit “mgc_matrix_mult”
Power/area – Timing tradeoff
Δ Area
Adapt Power
12000
14000
10000
12000
10000
8000
8000
6000
6000
4000
4000
2000
0
3850
2000
3900
3950
4000
4050
4100
0
4150
Timing constraint
Adapt Power
Δ Area
27
Impact of Weighting Factors
Circuit “mgc_matrix_mult”
α| si – sj | + β| 𝜓i - 𝜓j | + γdij
α
β
γ
# clusters
Adapt Power
∆ Area
∆ wire
0
1
1
23
3707
4111
0.00%
5
1
1
24
3500
1851
0.61%
6.5
7.2
15
20
1
5
5
5
1
1
1
1
1
0
2
4
1
1
1
1
0
1
1
1
22
22
12
12
20
24
24
24
7764
9582
10376
6686
3771
3500
3377
4330
8538
10585
10903
8423
2533
1851
1653
3673
0.76%
0.82%
6.70%
8.50%
9.50%
0.61%
0.60%
0.55%
28
Entire Flow Runtime
6000
5000
4000
3000
2000
1000
0
Over Design
Location Cluster
Ours
Runtime (s)
29
Conclusion
Clustering and cluster-driven placement are
proposed for adaptive circuit designs
 Reduce area and power overhead of adaptive
circuit, outperform previous methods
 Assure gates of the same cluster locate
in a contiguous region
 Wire-length increase <1%.
30
Built-In Self Optimization for Variation
Resilience of Analog Filters
Thank you!