FPGA Power Reduction Using Configurable Dual-Vdd
Trace-Based Framework for Concurrent Development of Process and FPGA Architecture Considering Process Variation and Reliability
1Lerong Cheng, 1Yan Lin, 1Lei He, and 2Yu Cao
1EE Department, UCLA
2EE Department, ASU
Address comments to [email protected]
Outline
Introduction
Review of existing work
Process models
Concurrent development of process and architecture
Power and delay
Process variation
Concurrent development for reliability
Device aging
Permanent soft error rate (SER)
Interaction between process variation and reliability
Conclusion
Review of Previous Work
[Figure: Ptrace flow. Inputs: application benchmark set, input vector, FPGA architecture. VPR and Psim produce the trace (circuit element usage, switching activity, short circuit power ratio, critical path structure). Ptrace combines the trace with circuit-level delay, power, and area to give chip-level delay, power, and area.]
Device and architecture co-optimization
Power and delay [Cheng DAC’05]
Process variation [Wong ICCAD’05]
Soft error rate [Lin, ICCAD’07]
Limitations of Ptrace
Ptrace requires a stable SPICE model that covers all process corners
A SPICE model is not available at the early stage of process development
Circuit simulation for all process corners is time-consuming
The accuracy of circuit simulation is not needed for quick architecture evaluation
Ptrace does not handle realistic variation: non-Gaussian variation sources and spatial correlation
Ptrace does not handle device aging
Extended Ptrace (Ptrace2)
[Figure: Ptrace2 flow. Inputs: process parameters and the trace (circuit element statistics, critical path structure, switching activity). Ptrace2 stages: transistor electrical characteristics, circuit-level power and delay estimation, chip-level power and delay estimation, variation analysis, reliability. Outputs: chip level (leakage power, dynamic power, delay), process variation (power distribution, delay distribution), reliability (device aging, soft error rate).]
Early-Stage Circuit Modeling
ITRS MASTAR4 model [ITRS MASTAR4 2005]
Inputs: Lgate, Tox, Nbulk, Xjext, W, Racc, T, Vdd
Outputs: Ioff, Ion, Igon, Igoff, Cg, Cdiff
Circuit Level and Chip Level Power and Delay
Circuit level power and delay
Inverter
Pass transistor driven by an inverter
Chip level power and delay
Similar to the original Ptrace [Cheng DAC’05, Wong ICCAD’05]
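As a rough illustration of how a trace-based estimator can roll circuit-level numbers up to the chip level, the sketch below sums per-element power weighted by the trace statistics and adds up delays along the critical path. It is a simplified assumption, not the Ptrace or Ptrace2 implementation: the class names, field names, and the omission of short-circuit power and unused-element handling are all choices made here for brevity.

```python
from dataclasses import dataclass


@dataclass
class ElementStats:
    """Trace entry for one circuit element type (hypothetical fields)."""
    count: int                  # number of elements of this type on the chip
    switching_activity: float   # average transitions per clock cycle


@dataclass
class ElementModel:
    """Circuit-level characterization of one element type (hypothetical)."""
    energy_per_transition_j: float  # dynamic energy per transition (J)
    leakage_w: float                # leakage power per element (W)
    delay_s: float                  # delay through one element (s)


def chip_power(trace, models, clock_hz):
    """Return (dynamic_w, leakage_w) summed over all element types."""
    dynamic = 0.0
    leakage = 0.0
    for name, stats in trace.items():
        model = models[name]
        dynamic += (stats.count * stats.switching_activity
                    * model.energy_per_transition_j * clock_hz)
        leakage += stats.count * model.leakage_w
    return dynamic, leakage


def critical_path_delay(path, models):
    """Sum element delays along the critical path.

    `path` maps an element type to how many times it appears on the path.
    """
    return sum(n * models[name].delay_s for name, n in path.items())


if __name__ == "__main__":
    trace = {"lut": ElementStats(100, 0.2), "wire": ElementStats(400, 0.1)}
    models = {"lut": ElementModel(1e-14, 1e-6, 2e-10),
              "wire": ElementModel(2e-14, 5e-7, 1e-10)}
    print(chip_power(trace, models, clock_hz=1e8))
    print(critical_path_delay({"lut": 3, "wire": 8}, models))
```

Because the trace is collected once per benchmark and architecture, only the `ElementModel` values need to be recomputed when the device setting changes, which is what makes a quick device sweep feasible.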
Experimental Setting
20 MCNC benchmarks
Assume all 20 MCNC benchmarks are placed in the same chip
ITRS high performance 32nm technology (HP32)
Architecture
Cluster size N=6
LUT size K=7
Wire segment length W=4
Device
Vdd=1.0, 1.05, 1.1 V
Lgate=31, 32, 33 nm
Baseline ITRS HP32
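The device search space above is small (3 Vdd values by 3 Lgate values), so an exhaustive sweep is enough to find the minimum energy-delay (Min-ED) setting. The sketch below shows the shape of such a sweep. `toy_evaluate` is a made-up placeholder with no physical meaning; in the real flow it would be replaced by the chip-level power and delay estimate for each device setting.

```python
from itertools import product

VDD_V = (1.0, 1.05, 1.1)   # supply voltages from the experimental setting
LGATE_NM = (31, 32, 33)    # gate lengths from the experimental setting


def toy_evaluate(vdd_v, lgate_nm):
    """Placeholder model, NOT the Ptrace2 estimate. Returns (power_w, delay_ns)."""
    power_w = 0.5 * vdd_v ** 2 + 0.5 * vdd_v * (32.0 / lgate_nm) ** 6
    delay_ns = 4.0 * (lgate_nm / 32.0) / vdd_v
    return power_w, delay_ns


def energy_delay_product(power_w, delay_ns):
    """ED in nJ*ns: energy (W * ns = nJ) times delay (ns)."""
    return power_w * delay_ns * delay_ns


def min_ed_setting(evaluate=toy_evaluate):
    """Exhaustively search the 3 x 3 device grid for the lowest ED."""
    candidates = product(VDD_V, LGATE_NM)
    return min(candidates,
               key=lambda s: energy_delay_product(*evaluate(*s)))


if __name__ == "__main__":
    vdd, lgate = min_ed_setting()
    print(f"Min-ED setting under the toy model: Vdd={vdd} V, Lgate={lgate} nm")
```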
Delay and Power Tradeoff
3.1X energy span and 1.3X delay span within search space
Power and Delay Optimization
Device    Power (W)   Delay (ns)   Energy (nJ)   ED (nJ·ns)
HP32      1.19        3.90         1.88          22.6
Min-ED    0.77        4.55         3.50          15.9 (-29.4%)
Device tuning reduces energy delay product by 29.4%
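As a quick check of how the Min-ED figures relate to each other, the sketch below recomputes energy and ED from the Min-ED power and delay, assuming energy is power times delay and ED is energy times delay. The function names are illustrative only.

```python
def energy_nj(power_w, delay_ns):
    """Energy over one critical-path delay: W * ns = nJ."""
    return power_w * delay_ns


def ed_product(power_w, delay_ns):
    """Energy-delay product in nJ*ns."""
    return energy_nj(power_w, delay_ns) * delay_ns


# Min-ED row of the table: 0.77 W and 4.55 ns.
min_ed_energy = energy_nj(0.77, 4.55)
min_ed_ed = ed_product(0.77, 4.55)

# Relative reduction against the reported HP32 ED of 22.6 nJ*ns.
reduction = 1.0 - 15.9 / 22.6

if __name__ == "__main__":
    print(f"Energy: {min_ed_energy:.2f} nJ")
    print(f"ED:     {min_ed_ed:.1f} nJ*ns")
    print(f"ED reduction from rounded table values: {100 * reduction:.1f}%")
```

The reduction computed from the rounded table entries comes out near 29.6%, which agrees with the reported 29.4% to within the rounding of the inputs.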
Experimental Setting
Variation sources
Doping density Nbulk
3σg=5% of nominal value, 3σr=3% of nominal value
Gate channel length Lgate
3σg=0.8nm, 3σr=0.6nm
Simulation
M=10,000 sample Monte Carlo simulation
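A minimal sketch of how the variation sources above could be sampled is given below. It rests on several assumptions that the slides do not state: σg is read as a global (chip-wide) component and σr as a random (local) component, both are drawn as independent zero-mean Gaussians, and the nominal Lgate is taken as 32 nm. Nbulk is expressed as a scale factor relative to its nominal value.

```python
import random
import statistics

M_SAMPLES = 10_000            # Monte Carlo sample count from the setting above
LGATE_NOMINAL_NM = 32.0       # assumed nominal gate length

# The slides give 3-sigma values, so divide by 3 to get sigma.
LGATE_SIGMA_G_NM = 0.8 / 3    # global component
LGATE_SIGMA_R_NM = 0.6 / 3    # random (local) component
NBULK_SIGMA_G = 0.05 / 3      # fraction of nominal
NBULK_SIGMA_R = 0.03 / 3      # fraction of nominal


def sample_devices(n_samples=M_SAMPLES, seed=0):
    """Return a list of (lgate_nm, nbulk_scale) samples."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        lgate = (LGATE_NOMINAL_NM
                 + rng.gauss(0.0, LGATE_SIGMA_G_NM)
                 + rng.gauss(0.0, LGATE_SIGMA_R_NM))
        nbulk_scale = (1.0
                       + rng.gauss(0.0, NBULK_SIGMA_G)
                       + rng.gauss(0.0, NBULK_SIGMA_R))
        samples.append((lgate, nbulk_scale))
    return samples


if __name__ == "__main__":
    lgates = [lg for lg, _ in sample_devices()]
    print(f"mean Lgate  = {statistics.mean(lgates):.3f} nm")
    print(f"sigma Lgate = {statistics.stdev(lgates):.3f} nm")
```

Each sample would then be fed to the power and delay estimator to build the chip-level distributions. Handling spatial correlation or non-Gaussian sources would need a different sampler than this one.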
Power and Delay Distribution
Power and Delay Variation
Device    Leakage µ (mW)   Leakage σ (mW)   Delay µ (ns)   Delay σ (ns)
HP32      942              334              3.91           0.119
Min-ED    340              45 (-87%)        4.55           0.159 (+34%)
The Min-ED device setting significantly reduces leakage variation with a small increase in delay variation
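The percentages in the table compare the Min-ED σ against the HP32 σ. The sketch below reproduces them and also computes σ/µ, a relative spread that is not reported on the slide but follows directly from the table values.

```python
def percent_change(new, base):
    """Signed change of `new` relative to `base`, in percent."""
    return 100.0 * (new - base) / base


def coefficient_of_variation(mean, sigma):
    """Relative spread sigma / mean."""
    return sigma / mean


# (mean, sigma) pairs taken from the table above.
leakage_mw = {"HP32": (942.0, 334.0), "Min-ED": (340.0, 45.0)}
delay_ns = {"HP32": (3.91, 0.119), "Min-ED": (4.55, 0.159)}

if __name__ == "__main__":
    print(f"leakage sigma change: "
          f"{percent_change(leakage_mw['Min-ED'][1], leakage_mw['HP32'][1]):.1f}%")
    print(f"delay sigma change:   "
          f"{percent_change(delay_ns['Min-ED'][1], delay_ns['HP32'][1]):.1f}%")
    for device in leakage_mw:
        print(device,
              f"leakage sigma/mu = {coefficient_of_variation(*leakage_mw[device]):.3f}",
              f"delay sigma/mu = {coefficient_of_variation(*delay_ns[device]):.4f}")
```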
NBTI and HCI
The negative-bias-temperature-instability (NBTI) effect increases the threshold voltage of PMOS [Wang DAC’06]
Hot-carrier injection (HCI) increases the threshold voltage of NMOS [Wang CICC’07]
Inputs: Lgate, Tox, Nbulk, Xjext, W, Racc, T, Vdd
Outputs: ΔVth(NBTI), ΔVth(HCI)
Vth Increase Caused by NBTI and HCI
Vth increase is the most significant in the first year
Device burn-in can be applied to reduce the impact of device aging
Impact of Device Burn-in
Device    Burn-in   Current P (mW)   Current D (ns)   10-year P (mW)   10-year D (ns)
HP32      W/O       854              3.90             640 (-25.1%)     4.23 (+8.5%)
HP32      W/        711              4.01             627 (-10.0%)     4.25 (+5.5%)
Min-ED    W/O       328              4.55             311 (-5.2%)      4.64 (+2.0%)
Min-ED    W/        317              4.59             309 (-1.9%)      4.65 (+1.1%)
The high-performance device setting is more sensitive to device aging
Device aging leads to 8.5% delay degradation after 10 years
Device burn-in reduces delay degradation from 8.5% to 5.5% after 10 years
Permanent Soft Error Rate
Single-event upset (SEU) due to cosmic rays or high-energy particles may affect configuration SRAMs in FPGAs and result in permanent soft error
Inputs: Lgate, Tox, Nbulk, Xjext, W, Racc, T, Vdd
Outputs: SER
SER under Different Device Setting
Device    SER (FIT)
HP32      362.45
Min-ED    368.25 (+1.6%)
SER is similar for both device settings
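For readers less familiar with the unit, one FIT is one failure per 10^9 device-hours. The sketch below converts the FIT values above into a mean time to failure, under the common simplifying assumption of a constant failure rate. The function name and the 365-day year are choices made here.

```python
HOURS_PER_YEAR = 8760  # 365-day year, assumed for the conversion


def fit_to_mttf_hours(fit):
    """Mean time to failure in hours for a constant failure rate given in FIT."""
    return 1e9 / fit


if __name__ == "__main__":
    for device, fit in (("HP32", 362.45), ("Min-ED", 368.25)):
        hours = fit_to_mttf_hours(fit)
        print(f"{device}: {hours:,.0f} h, about {hours / HOURS_PER_YEAR:.0f} years")
```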
Impact of Device Aging on Power and Delay Variation
Device    σ Leakage (mW), current   σ Leakage (mW), 10 years   σ Delay (ns), current   σ Delay (ns), 10 years
HP32      334                       116 (-65.2%)               0.119                   0.121 (+1.67%)
Min-ED    45.0                      30.3 (-32.7%)              0.159                   0.159 (+0.16%)
Device aging significantly reduces leakage variation and slightly increases delay variation
Impact of Device Aging and Process Variation on SER
                  Current     10 years   Variation
SRAM SER (FIT)    2.914E-5    +0.3%      -0.18% ~ +0.17%
Neither device aging nor process variation has significant impact on permanent SER
Conclusion
A trace-based framework has been developed to enable concurrent process and FPGA architecture co-development
Device tuning reduces the energy delay product by 29.4%
Applying device burn-in reduces delay degradation from 8.5% to 5.5% within 10 years
Device aging significantly reduces leakage variation but has almost negligible impact on delay variation
Neither device aging nor process variation has a significant impact on permanent SER