FPGA Power Reduction Using Configurable Dual-Vdd

Routing Track Duplication with Fine-Grained Power-Gating for FPGA
Interconnect Power Reduction
Yan Lin, Fei Li and Lei He
EE Department, UCLA
Partially supported by NSF grant CCR-0306682.
Address comments to [email protected].
Outline

Review and Motivation

Interconnect Leakage Power Reduction
using Power-gating

Interconnect Dynamic Power Reduction
using Dual-Vdd

Conclusions and Ongoing Work
Power Limitation of FPGAs

Existing FPGAs are HIGHLY power inefficient
(> 100X more than ASIC)


E.g. [Kusse, ISLPED’98]
Design Example       Vdd     Energy
Xilinx XC4003A       5v      4.2mW/MHz
Static CMOS ASIC     3.3v    5.5uW/MHz
Power is likely the largest limitation for
FPGAs
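The gap quoted above can be checked from the two table rows. This is a minimal sketch, assuming the two energy figures are directly comparable as quoted; the variable names are illustrative only.

```python
# Energy figures quoted from [Kusse, ISLPED'98], converted to W/MHz.
fpga_w_per_mhz = 4.2e-3   # Xilinx XC4003A at 5 V: 4.2 mW/MHz
asic_w_per_mhz = 5.5e-6   # static CMOS ASIC at 3.3 V: 5.5 uW/MHz

# Ratio of FPGA energy to ASIC energy per MHz.
ratio = fpga_w_per_mhz / asic_w_per_mhz
print(f"FPGA/ASIC energy ratio: {ratio:.0f}X")
```

For this design example the ratio comes out well above the ">100X" figure on the slide.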
FPGA Power Reduction

Power aware FPGA CAD algorithms for
existing FPGA architectures



CAD algorithms to minimize power-delay
product [Lamoureux et al, ICCAD’03]
Configuration inversion for leakage reduction
[Anderson et al, FPGA’04]
Power efficient FPGA circuits and
architectures


Dual-Vdd and Vdd-programmable FPGA logic
blocks [Li et al, FPGA’04][Li et al, DAC’04]
Vdd-programmable FPGA interconnects


[Li et al, ICCAD’04]
[Anderson et al, ICCAD’04]
Overall FPGA Structure

Cluster-based Island Style FPGA Structure


Logic blocks are embedded into routing resources
Wire segment connectivity is programmable
FPGA Routing Structure

Subset programmable switch block
An incoming track can be connected to different outgoing tracks with the same track number
Programmable connection block
Vdd-programmable Interconnects
[Li et al, ICCAD’04]

Conventional routing switch

Vdd-programmable switch



Vdd selection for used switch
Power-gating unused switch
Configurable Vdd-level conversion

Avoid excessive leakage when low Vdd switch drives high Vdd
switches
Power
transistor
Limitation of Vdd-programmable
Interconnects [Li et al, ICCAD’04]

Fine-grained Vdd-level converter insertion

Area overhead: 54% for circuit s38584
Leakage overhead: 36% for circuit s38584
SRAM cell overhead: 300% for each switch
Area/SRAM efficient low-power
interconnects are needed
Outline

Review and Motivation

Interconnect Leakage Power Reduction
using Power-gating

Interconnect Dynamic Power Reduction
using Dual-Vdd

Conclusions and Ongoing Work
Low Utilization Rate of
Interconnects

78.15% of total power is consumed by global
interconnect power [Li et al, DAC’04]

47% of global interconnect power is leakage


Why?
Extremely low utilization rate (~12% w/ minimum array)
Circuit    # of total interconnect switches   # of unused interconnect switches   Utilization rate (%)
alu4       36478                              31224                               14.40%
apex4      43741                              37703                               13.80%
bigkey     63259                              54017                               9.87%
clma       653181                             593343                              9.16%
des        87877                              79932                               9.04%
diffeq     42746                              36974                               13.50%
dsip       75547                              70138                               7.16%
elliptic   140296                             125800                              10.33%
ex5p       45404                              39288                               13.47%
frisc      238852                             216993                              9.15%
Average                                                                           11.90%
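The utilization column follows from the two switch counts. A minimal sketch, using three rows copied from the table; the function and variable names are illustrative assumptions, not part of the original flow.

```python
# (total switches, unused switches) for three rows of the table above.
switch_counts = {
    "alu4": (36478, 31224),
    "clma": (653181, 593343),
    "dsip": (75547, 70138),
}

def utilization_pct(total, unused):
    """Percentage of interconnect switches that are actually used."""
    return 100.0 * (total - unused) / total

for name, (total, unused) in switch_counts.items():
    print(f"{name}: {utilization_pct(total, unused):.2f}%")

# Interconnect leakage as a share of total power, from the two slide
# figures: 78.15% of total power is interconnect, 47% of that is leakage.
leakage_share_of_total = 0.7815 * 0.47
print(f"interconnect leakage: {100 * leakage_share_of_total:.1f}% of total power")
```

Multiplying the two slide percentages suggests that roughly a third of total power is interconnect leakage, most of it in switches that carry no signal.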
Interconnect Utilization Rate is
Intrinsically Low

Programmable switch block: no more than 25%
Programmable connection block: only one is used (for 64 tracks)
Power-gating unused interconnects is necessary
Vdd-gateable Routing Switch

Conventional routing switch

Vdd-gateable routing switch


Only two states for a routing switch
  High Vdd
  Power-gating
Enable power-gating capability w/o extra SRAM cells
Power transistor
Vdd-Gateable Connection Block


Conventional connection block

Vdd-gateable connection block
Enable power-gating capability w/ only one extra SRAM
for a connection block


Only n+1 SRAM cells for 2^n connection switches
A low leakage decoder is needed
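The SRAM count above can be illustrated with a behavioral decoder model. This is only a sketch under stated assumptions: the block is taken to have 2^n connection switches, n SRAM cells are assumed to hold the binary index of the one switch that stays powered, and the one extra SRAM cell is assumed to act as a block-level enable. The function name and bit ordering are hypothetical; the slides do not show the actual decoder circuit.

```python
def decode_connection_block(select_bits, enable):
    """Return one power-on flag per connection switch.

    select_bits: list of n SRAM bits, most significant first (assumed encoding).
    enable: the one extra SRAM bit; False power-gates every switch.
    """
    n = len(select_bits)
    if not enable:
        return [False] * (2 ** n)
    index = int("".join("1" if b else "0" for b in select_bits), 2)
    # Exactly one switch is powered; all others are power-gated.
    return [i == index for i in range(2 ** n)]

# 6 select bits + 1 enable bit = 7 SRAM cells for 64 connection switches.
flags = decode_connection_block([1, 0, 0, 1, 0, 1], enable=True)
print(flags.index(True), sum(flags), len(flags))
```

With n = 6 this gives 7 SRAM cells for a 64-track connection block, instead of one gating cell per switch.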
Power and Delay of Vdd-gateable
Switch

Vdd-gateable switch compared to
conventional switch



Dynamic power is almost the same
>300X leakage power reduction
~6% delay increase
        Routing switch delay (ns)               Energy per switch (Joule)
Vdd     w/o power-gating   w/ power-gating      w/o power-gating   w/ power-gating
1.3v    5.90E-11           6.26E-11 (6%)        3.3E-14            3.25E-14
1.0v    6.99E-11           7.42E-11 (6.1%)      1.63E-14           1.65E-14
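The delay and energy comparison can be reproduced from the table entries. A minimal sketch; the dictionary layout and helper name are illustrative assumptions.

```python
# Per Vdd: (delay w/o gating, delay w/ gating, energy w/o gating, energy w/ gating)
rows = {
    "1.3v": (5.90e-11, 6.26e-11, 3.3e-14, 3.25e-14),
    "1.0v": (6.99e-11, 7.42e-11, 1.63e-14, 1.65e-14),
}

def pct_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return 100.0 * (after - before) / before

for vdd, (d0, d1, e0, e1) in rows.items():
    print(f"{vdd}: delay {pct_change(d0, d1):+.1f}%, energy {pct_change(e0, e1):+.1f}%")
```

The delay change is about 6% at both voltages, while the energy per switch moves by less than 2% in either direction, consistent with "dynamic power is almost the same".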
Power Reduction by Power-gating Unused
Interconnects
          Single-Vdd (baseline)                       Total Power Saving
Circuit   Interconnect power (W)   Total power (W)   [Li et al, ICCAD04] (Vdd-programmable interconnects)   Vdd-gateable interconnects
alu4      0.0657                   0.0769            25.13%                                                  29.09%
apex4     0.0437                   0.0500            21.83%                                                  30.70%
bigkey    0.1044                   0.1375            33.38%                                                  24.89%
clma      0.4918                   0.5450            23.42%                                                  45.69%
des       0.1688                   0.2136            36.71%                                                  31.79%
diffeq    0.0292                   0.0360            17.50%                                                  45.20%
dsip      0.1003                   0.1280            34.34%                                                  43.66%
Avg.      --                       --                25.19%                                                  38.18%
Outline
Review and Motivation
Interconnect Leakage Power Reduction using Power-gating
Interconnect Dynamic Power Reduction using Dual-Vdd
  FPGA fabrics and algorithms
  Design flow and quantitative evaluation
Conclusions and Ongoing Work
Pre-Defined Dual-Vdd Routing
Architecture

Reduce dynamic power with dual-Vdd by making use of
timing slack

Partition routing channel into VddH and VddL regions


Vdd-gateable interconnect switch is used
Ratio of VddH/VddL track is an architectural parameter
Ratio of VddH to VddL Track


Determine ratio using dual-Vdd assignment
profile without considering layout constraint
Sensitivity-based dual-Vdd assignment


Assignment unit --- a routing tree
Power sensitivity --- ΔP/ ΔVdd


Power difference for a routing tree between VddH and VddL
Greedy algorithm --- sensitivity based


Initial: uniform VddH assignment
Procedure: assign VddL to routing tree with largest power
sensitivity (but without increasing critical delay)
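The greedy procedure above can be sketched as follows. This is a simplified model under stated assumptions, not the authors' implementation: each routing tree has a fixed power and delay at each Vdd level, a timing path is a list of tree names whose delays add, and the critical delay is the longest path. All names and numbers are hypothetical.

```python
def greedy_dual_vdd(trees, paths, vdd_h=1.5, vdd_l=1.0):
    """Assign VddL to routing trees in order of decreasing power sensitivity.

    trees: {name: {"p_h", "p_l", "d_h", "d_l"}} power/delay at VddH and VddL.
    paths: list of timing paths, each a list of tree names (assumed model).
    """
    level = {t: "H" for t in trees}          # initial: uniform VddH

    def critical_delay():
        return max(
            sum(trees[t]["d_l"] if level[t] == "L" else trees[t]["d_h"] for t in p)
            for p in paths
        )

    target = critical_delay()                # must not be exceeded

    def sensitivity(t):                      # delta P / delta Vdd
        return (trees[t]["p_h"] - trees[t]["p_l"]) / (vdd_h - vdd_l)

    for t in sorted(trees, key=sensitivity, reverse=True):
        level[t] = "L"
        if critical_delay() > target:        # would slow the circuit: undo
            level[t] = "H"
    return level

trees = {
    "a": {"p_h": 9.0, "p_l": 4.0, "d_h": 2.0, "d_l": 3.0},
    "b": {"p_h": 6.0, "p_l": 2.7, "d_h": 1.0, "d_l": 1.5},
    "c": {"p_h": 3.0, "p_l": 1.3, "d_h": 4.0, "d_l": 6.0},
}
paths = [["c"], ["a", "b"]]
result = greedy_dual_vdd(trees, paths)
print(result)
```

In this toy case only tree "a" fits within the available slack. Because slack only shrinks as trees move to VddL, a tree rejected once stays rejected, so a single sorted pass is enough in this model.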
Profile of Dual-Vdd Assignment
Assignment with no critical path delay increase
(VddH:VddL=1.5v:1.0v)

Circuits   # of routing trees   # of logic blocks   # of I/O blocks   VddL routing trees (%)   VddL logic blocks (%)
alu4       782                  162                 22                49.74                    82.10
apex4      849                  134                 28                35.45                    78.36
bigkey     1542                 294                 426               67.77                    85.03
clma       7995                 1358                144               69.74                    89.84
s38417     5426                 982                 135               64.17                    80.05
seq        1138                 274                 76                20.74                    61.62
spla       2091                 461                 122               54.52                    88.47
Avg.       --                   --                  --                54.54                    80.28

Set the ratio of VddH/VddL track to 1:1
Level Converter is NOT Needed
B
A
Wire segment can only be connected to another wire segment
with the same track number via a subset switch block
No level converter is needed in switch block
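The argument above can be checked with a small model. This sketch assumes that the Vdd region of a wire segment is determined only by its track number, and that the lower-numbered half of the tracks is VddH (matching a 1:1 ratio); both the partition rule and the function names are illustrative assumptions.

```python
def track_vdd(track, channel_width, vddh_fraction=0.5):
    """Hypothetical partition: lowest-numbered tracks are VddH, the rest VddL."""
    return "VddH" if track < int(channel_width * vddh_fraction) else "VddL"

def subset_switch_connections(channel_width):
    """In a subset switch block, track i connects only to track i on other sides."""
    return [(i, i) for i in range(channel_width)]

W = 8
mismatches = [
    (a, b)
    for a, b in subset_switch_connections(W)
    if track_vdd(a, W) != track_vdd(b, W)
]
print(mismatches)  # [] : no connection crosses from one Vdd region to the other
```

Since every switch-block connection stays within one track number, it also stays within one Vdd region, so a VddL segment never drives a VddH segment.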
Layout Constraint Due to Dual-Vdd

Dual-Vdd introduces performance
degradation due to layout constraint



Insufficient routing resources for Vdd-matched routing trees
May introduce detours
Solutions


Vdd-programmable interconnects [Li et al, ICCAD’04]
Provide sufficient routing tracks for Vdd-matched routing trees

Control leakage by power-gating unused
interconnects
Design Flow for Dual-Vdd Interconnects
[Figure: design flow. Inputs: Tech Mapped Netlist (Single-Vdd), Arch Spec, Delay/Power Model. Steps: Timing Driven Layout (Single-Vdd); Dual-Vdd Assignment for Routing Trees; Timing Driven Layout (Dual-Vdd); Power-gating Unused Switches; Delay/Power Estimation (dual-Vdd). Outputs: Delay, Power. Annotation: Double Channel width.]
Dual-Vdd Routing Algorithm


Based on the maze routing algorithm in VPR
Modify the cost function
TotalCost(n) = PathCostDv(n) + α · ExpectedCostDv(n, j) + β · Matched(T, n)




TotalCost(n): the cost of routing tree T through wire segment n to the
target sink j
PathCostDv(n): the cost of the path from the current partial routing
tree to wire segment n
ExpectedCostDv(n,j): the estimated cost from wire segment n to the target
sink j
Matched(T,n): boolean function describing Vdd-matching status
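The modified cost function can be sketched as below. This assumes the total cost is the path cost plus a weighted expected cost plus a weighted Vdd-matching term. The weights `alpha` and `beta`, their default values, and the polarity of the matching term (1 on a mismatch, so that mismatched segments cost more) are assumptions for illustration; the slides do not state them.

```python
def matched(tree_vdd, segment_vdd):
    """Vdd-matching term. Assumed polarity: 1 when the levels differ."""
    return 0 if tree_vdd == segment_vdd else 1

def total_cost(path_cost, expected_cost, tree_vdd, segment_vdd,
               alpha=1.0, beta=10.0):
    """Cost of expanding the routing tree through one wire segment."""
    return (path_cost
            + alpha * expected_cost
            + beta * matched(tree_vdd, segment_vdd))

# Same segment geometry, evaluated for a VddL tree on a VddL or VddH track.
same = total_cost(2.0, 3.0, "VddL", "VddL")
diff = total_cost(2.0, 3.0, "VddL", "VddH")
print(same, diff)
```

With a large enough `beta`, the maze router prefers Vdd-matched tracks and only falls back to mismatched ones when no matched route exists.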
Outline
Review and Motivation
Interconnect Leakage Power Reduction using Power-gating
Interconnect Dynamic Power Reduction using Dual-Vdd
  FPGA fabrics and algorithms
  Quantitative evaluation
Conclusions and Ongoing Work
Comparison of Low Power Architectures
[Figure: power (watt) versus clock frequency (MHz) for circuit s38584, comparing arch-SV, arch-PV, arch-PV+PG and arch-DV+PG (1.5W); points are labeled with their Vdd or VddH/VddL settings.]

Dual-Vdd interconnects with fine-grained power gating



May have performance degradation due to layout constraint
Can reduce more power than purely power-gating unused
switches
Achieve 9.78% interconnect dynamic power reduction, 38.68%
total power saving with 1.5W channel width

W is the nominal routing channel width in single-Vdd FPGA
Impact of Routing Channel Width
[Figure: power saving and normalized clock frequency versus channel width, from 1.0W to 2.0W. Labeled power saving values: 34.86%, 38.68%, 45.00%. Labeled normalized clock frequency values: 0.743, 0.838, 0.955.]


We get the power reduction percentage at the maximum clock
frequency achieved by dual-Vdd interconnects
Channel width increases from 1.0W to 2.0W


Power saving increases from 34.86% to 45%
Normalized clock frequency increases from 0.743 to 0.955
Area Overhead of Vdd-gateable
Interconnects

Device area is dominant
                    Single-Vdd   Dual-Vdd w/            Dual-Vdd w/            Dual-Vdd w/            [Li et al,
                    (baseline)   Power-gating (1.0W)    Power-gating (1.5W)    Power-gating (2.0W)    ICCAD’04]
Total FPGA area     7077044      11092744               15420197               20249865               22678225
Area overhead (%)   -            57%                    118%                   186%                   220%
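The overhead row follows from the area row. A minimal sketch, assuming the column order is baseline, the three dual-Vdd channel widths, then [Li et al, ICCAD'04]; the dictionary keys are illustrative labels.

```python
baseline_area = 7077044  # Single-Vdd (baseline) total FPGA area

areas = {
    "Dual-Vdd w/ power-gating (1.0W)": 11092744,
    "Dual-Vdd w/ power-gating (1.5W)": 15420197,
    "Dual-Vdd w/ power-gating (2.0W)": 20249865,
    "[Li et al, ICCAD'04]": 22678225,
}

# Overhead relative to the baseline, rounded to whole percent.
overhead_pct = {
    name: round(100.0 * (area / baseline_area - 1.0))
    for name, area in areas.items()
}
for name, pct in overhead_pct.items():
    print(f"{name}: {pct}%")
```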


Area overhead is mainly due to power transistors for power-gating capability
Track duplication with power-gating vs Vdd-programmable
interconnects [Li et al, ICCAD’04]

More power reduction (45% vs 25%) & less area overhead


Mainly due to Vdd-level converter removal
High Vdd interconnects with power gating is BEST
considering area
Outline
Review and Motivation
Interconnect Leakage Power Reduction using Power-gating
Interconnect Dynamic Power Reduction using Dual-Vdd
Conclusions and Ongoing Work

Conclusions and Ongoing Work

Conclusions




Developed power-gateable interconnects w/ virtually
no extra SRAM cell
Achieved 38.18% total power reduction using Vdd-gateable interconnects
Achieved 24.78% interconnect dynamic power
reduction, 45.00% total power reduction with
duplicated (2W) channel width
Ongoing work



Power-ground design to support dual-Vdd
Optimal mix of Vdd-programmable and Vdd-gateable interconnects
Architecture evaluation considering Vdd
programmability [Lin et al, to appear in FPGA’05]