ECE 636
Reconfigurable Computing
Lecture 16
Power Reduction Techniques for FPGAs
Lecture 16: Power Reduction Techniques
November 5, 2013
Overview
• FPGAs are generally considered power hungry compared to ASIC and processor counterparts
- Mostly due to unused interconnect
• Recent area of extensive research
• Device techniques
- Voltage scaling
- Sleep mode
• Software techniques
- Reduced switching
- Reduced capacitance
Dynamic Power
° Dynamic power is required to charge and discharge load capacitances when transistors switch.
° One cycle involves a rising and falling output.
° On rising output, charge Q = C·VDD is required
° On falling output, charge is dumped to GND
[Figure: gate switching at frequency fsw and driving load capacitance C from VDD; the supply current iDD(t) is labeled with its charge/discharge and short circuit components. Courtesy: Harris]
Dynamic Power
$$
\begin{aligned}
P_{dynamic} &= \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt \\
            &= \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt \\
            &= \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] \\
            &= C V_{DD}^2 f_{sw}
\end{aligned}
$$
Short circuit power <10% of dynamic power
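The result P = C·VDD²·fsw can be checked numerically. The sketch below is illustrative only: the function names and the example values (10 fF load, 1.2 V supply, 100 MHz switching) are assumptions, not figures from the lecture. It computes the power from the closed form and from the charge-counting argument, then shows the quadratic effect of lowering VDD.

```python
def dynamic_power(c_load, vdd, f_sw):
    """Average dynamic power P = C * VDD^2 * f_sw, in watts."""
    return c_load * vdd ** 2 * f_sw


def dynamic_power_by_charge(c_load, vdd, f_sw, period):
    """Same quantity from the integral form: (VDD / T) * charge drawn in T.

    Each full switching cycle draws Q = C * VDD from the supply on the
    rising edge, and there are T * f_sw cycles in an interval of length T.
    """
    charge = (period * f_sw) * (c_load * vdd)
    return vdd * charge / period


# Illustrative values only (assumed, not from the lecture).
C, VDD, FSW = 10e-15, 1.2, 100e6
p_full = dynamic_power(C, VDD, FSW)
p_scaled = dynamic_power(C, 0.8 * VDD, FSW)
print(f"P at VDD     = {p_full * 1e6:.3f} uW")
print(f"P at 0.8*VDD = {p_scaled * 1e6:.3f} uW ({p_scaled / p_full:.2f}x)")
```

Because power depends on the square of VDD, a 20% supply reduction cuts dynamic power to 64% of its original value in this model.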
FPGA Static Power Consumption
° Junction leakage
• Small fraction of leakage
° Gate oxide leakage
• Increases exponentially as Vgs increases
° Subthreshold leakage
• When Vgs < Vt still some source-drain current
• Increases exponentially as Vt decreases
• Decreases exponentially as Vgs decreases
[Figure: technology trend. Courtesy: Nowak]
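The exponential dependence of subthreshold leakage on Vt and Vgs can be illustrated with the usual first-order model I = I0 · 10^((Vgs − Vt)/S). This is a sketch under stated assumptions: the reference current `i0` and the subthreshold swing `swing` (0.1 V per decade) are round illustrative numbers, not values from the lecture.

```python
def subthreshold_current(vgs, vt, i0=1e-7, swing=0.1):
    """First-order subthreshold current in amps for Vgs < Vt.

    i0    : assumed current at Vgs == Vt (illustrative value)
    swing : assumed subthreshold swing in volts per decade of current
    """
    return i0 * 10 ** ((vgs - vt) / swing)


# Off transistor (Vgs = 0): lowering Vt by one swing raises leakage tenfold.
high_vt = subthreshold_current(vgs=0.0, vt=0.4)
low_vt = subthreshold_current(vgs=0.0, vt=0.3)
print(f"leakage ratio, Vt 0.3 V vs 0.4 V: {low_vt / high_vt:.1f}x")

# Fixed Vt: driving Vgs one swing below zero cuts leakage tenfold.
neg_vgs = subthreshold_current(vgs=-0.1, vt=0.4)
print(f"leakage ratio, Vgs -0.1 V vs 0 V: {neg_vgs / high_vt:.2f}x")
```

The same model suggests why high-Vt transistors are attractive for cells that are not on a critical path: each extra swing of threshold voltage removes a decade of leakage.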
FPGA Power Reduction Goals
• Dynamic power goals
- Reduce Vdd along non-critical paths
- Low swing signalling
- Use CAD approaches to limit long high-toggle paths
- Pdynamic = 0.5 * C * Vdd^2 * f
• Static power goals
- Cut-off Vdd for unused transistors
- Use high Vt transistors for SRAM cells
- Various other voltage biasing techniques
Traditional Routing Switch
[Figure: routing switch. A MUX with inputs i1, i2, i3, i4 ... in is controlled by select bits S held in CONFIG SRAM cells; the MUX output (node VINT) feeds a level-restoring buffer (transistors MP1, MP2) that drives OUT. Courtesy: Anderson]
Proposed Switch Designs: Anderson
° Based on 3 observations:
• Routing switch inputs are tolerant to weak-1 signals (level-restoring buffers).
• Considerable slack in FPGA designs, so many switches can be slowed down.
• Most routing switches feed other routing switches.
- Can produce weak-1 logic signals.
“Basic” Switch Design
[Figure: routing switch (CONFIG SRAM cells, select bits S, MUX inputs i1, i2, i3, i4 ... in, output OUT) whose buffer is powered from a virtual supply VVD. Transistors MNX (gate: ~SLEEP) and MPX (gate: LOW_POWER v SLEEP) sit in parallel between VDD and VVD.]
MODE OPERATION:
high-speed: MNX & MPX ON
low-power: MNX ON, MPX OFF
sleep: MNX OFF, MPX OFF
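The mode table can be derived from the two control signals. The sketch below assumes that MNX is an NMOS device and MPX a PMOS device, which is inferred from the names and from the VDD - VTH level in low-power mode rather than stated on the slide; the function name is hypothetical.

```python
def basic_switch_state(low_power, sleep):
    """Return (mode, mnx_on, mpx_on) for the two control inputs.

    MNX is assumed to be NMOS with gate ~SLEEP, so it conducts unless
    SLEEP is set. MPX is assumed to be PMOS with gate (LOW_POWER or SLEEP),
    so it conducts only when both control signals are low.
    """
    mnx_on = not sleep
    mpx_on = not (low_power or sleep)
    if mnx_on and mpx_on:
        mode = "high-speed"
    elif mnx_on:
        mode = "low-power"
    else:
        mode = "sleep"
    return mode, mnx_on, mpx_on


for low_power in (False, True):
    for sleep in (False, True):
        mode, mnx, mpx = basic_switch_state(low_power, sleep)
        print(f"LOW_POWER={low_power!s:5} SLEEP={sleep!s:5} -> "
              f"{mode:10} MNX={'ON' if mnx else 'OFF':3} MPX={'ON' if mpx else 'OFF'}")
```

In this reading SLEEP overrides LOW_POWER: whenever SLEEP is asserted both transistors are off, regardless of the other input.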
High-Speed Mode
[Figure: basic switch with MNX and MPX both ON, so VVD = VDD]
output swing: rail-to-rail.
MODE OPERATION: high-speed: MNX & MPX ON
Low-Power Mode
[Figure: basic switch with MNX ON and MPX OFF, so VVD = VDD - VTH]
output swing: GND-to-(VDD-VTH).
MODE OPERATION: low-power: MNX ON, MPX OFF
Sleep Mode
[Figure: basic switch with MNX and MPX both OFF]
MODE OPERATION: sleep: MNX OFF, MPX OFF
Leakage Power Results: Anderson
[Bar chart: % leakage power reduction vs. high-speed mode for the Basic design, vertical axis 0 to 70. Categories: LP mode, Sleep mode, LP mode (+unused fanout), LP mode (+used fanout), Traditional switch. Data labels as extracted: 60.8, 39.7, 36, 38.7, 0.3.]
Region Constrained Placement
• Rather than just focusing on routing, consider constraining
logic
• Most circuits exhibit locality
• Gayasen: FPGA’2004
Region Constrained Placement
• Several issues to consider
• Size of sleep transistor
- Too large: increases leakage, area
- Too small: affects logic performance
• Size of region
- Too large: possibly unused resources, complicates
placement
- Too small: Sleep transistors take up too much room
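The region-size tradeoff can be illustrated with a toy model. This is not the flow used in the cited work: it assumes a square array, square regions, one sleep transistor per region, and a region that can sleep only if it contains no used logic block. The function name and the example placement are hypothetical, and sleep-transistor area and leakage are not modeled.

```python
def gateable_fraction(used, grid_size, region_size):
    """Fraction of unused logic blocks that sit in completely unused regions.

    used        : set of (x, y) coordinates of occupied logic blocks
    grid_size   : the array is grid_size x grid_size blocks
    region_size : each sleep region is region_size x region_size blocks
    Only a region with no used block can have its sleep transistor turned off.
    """
    busy_regions = {(x // region_size, y // region_size) for x, y in used}
    unused = gated = 0
    for x in range(grid_size):
        for y in range(grid_size):
            if (x, y) in used:
                continue
            unused += 1
            if (x // region_size, y // region_size) not in busy_regions:
                gated += 1
    return gated / unused if unused else 0.0


# Hypothetical placement: a 9x9 design packed into one corner of a 16x16 array.
used_blocks = {(x, y) for x in range(9) for y in range(9)}
for size in (1, 2, 4, 8):
    frac = gateable_fraction(used_blocks, 16, size)
    print(f"region {size}x{size}: {frac:.1%} of unused blocks can sleep")
```

Larger regions strand more unused blocks next to used ones, so fewer of them can be powered down; smaller regions recover them but would need many more sleep transistors.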
Experimental Flow: RCP
• Different region sizes considered for flow
• Area constraints for portions of design determined by hand
• May encourage designers to create granular designs
Power Savings: RCP
• Note significant reduction in leakage power savings as
region size increases
• Bottom curve primarily due to luck
Performance Limitation: RCP
• Performance limited by use of regions
• Nearly 10% clock frequency reduction for many designs
Low-swing Signalling
• Techniques we have examined so far look at tinkering with
supply voltage
• Also possible to modify wire signalling to reduce voltage
swing
• Most of FPGA is made up of interconnect
• Approach targets dynamic power consumption
George and Rabaey: 1997
Low-swing Signalling
• Interconnect swing is at 0.8V while rest of circuit
operates at 1.5V
• Cascode circuitry used at sink to overcome slow
speed issues
• 50% energy savings at cost of 25% delay
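The quoted 50% energy and 25% delay figures can be combined into an energy-delay product, and the 0.8 V / 1.5 V swing can be compared against two idealized wire-energy estimates. This is a first-order sketch: driver and receiver overhead is ignored, and which supply arrangement the original circuit uses is not stated on the slide, so both cases are shown. The function names are hypothetical.

```python
def energy_delay_product(energy_ratio, delay_ratio):
    """Relative energy-delay product versus the full-swing baseline."""
    return energy_ratio * delay_ratio


def swing_energy_ratio(v_swing, vdd, separate_supply):
    """First-order wire energy relative to full swing (C * VDD^2).

    separate_supply=True  : wire charged from a dedicated v_swing rail,
                            energy C * v_swing^2
    separate_supply=False : wire charged from VDD but limited to v_swing,
                            energy C * VDD * v_swing
    """
    if separate_supply:
        return (v_swing / vdd) ** 2
    return v_swing / vdd


# Figures quoted on the slide: 50% energy savings, 25% more delay.
print(f"energy-delay product: {energy_delay_product(0.50, 1.25):.3f} of baseline")

# Idealized wire-only estimates for a 0.8 V swing in a 1.5 V design.
for separate in (True, False):
    ratio = swing_energy_ratio(0.8, 1.5, separate)
    print(f"separate_supply={separate}: wire energy ratio {ratio:.3f}")
```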
Alternate approach: Modifying FPGA CAD
• FPGA architecture modifications impact all designs, even those that don’t care about power
• Can placement and routing be modified to consider dynamic power?
- Need to know which signals are high toggle
- Attempt to minimize length of high-toggle wires
- Minimize impact on performance and area
• Techniques fit well into our previous work on
placement and routing
Lamoureux and Wilton
Modifying FPGA CAD Placement
• Previous cost metrics for annealing considered
bounding box wire length and timing costs
• Include additional term which considers signal
switching activity
FPGA Placement for Power
• Previous cost metrics for annealing considered
bounding box wire length and timing costs
• Include additional term which considers signal
switching activity
• Post-route energy reduced by 3.0%. Power
decreased by 7% but delay increases by 4%
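One way to add a switching-activity term to an annealing cost is sketched below. The slide does not give the exact cost function, so the normalized form, the weights, and all names here are assumptions for illustration: the power term is the activity-weighted bounding-box half-perimeter of each net.

```python
def net_power_cost(nets):
    """Activity-weighted wirelength: sum of activity * bounding-box half-perimeter."""
    return sum(n["activity"] * (n["bb_width"] + n["bb_height"]) for n in nets)


def delta_cost(d_wire, d_timing, d_power, prev_wire, prev_timing, prev_power,
               timing_weight=0.5, power_weight=1.0):
    """Normalized change in annealing cost for one proposed swap.

    The first two terms are the usual wirelength/timing tradeoff; the last
    term is the added power component. Both weights are tuning knobs with
    assumed default values.
    """
    base = (timing_weight * d_timing / prev_timing
            + (1.0 - timing_weight) * d_wire / prev_wire)
    return base + power_weight * d_power / prev_power


# Two hypothetical nets: one busy and long, one quiet and short.
before = [{"activity": 0.50, "bb_width": 6, "bb_height": 4},
          {"activity": 0.05, "bb_width": 2, "bb_height": 2}]
# Proposed swap: the busy net shrinks while the quiet net grows by the same amount.
after = [{"activity": 0.50, "bb_width": 2, "bb_height": 2},
         {"activity": 0.05, "bb_width": 6, "bb_height": 4}]

p_before, p_after = net_power_cost(before), net_power_cost(after)
change = delta_cost(d_wire=0.0, d_timing=0.0, d_power=p_after - p_before,
                    prev_wire=14.0, prev_timing=1.0, prev_power=p_before)
print(f"power cost {p_before:.2f} -> {p_after:.2f}, delta cost {change:+.3f}")
```

Total wirelength is unchanged by this swap, so a cost without the activity term would be indifferent to it; with the term, the annealer favors shortening the high-toggle net.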
FPGA Routing Modifications for Power
• Original routing cost function takes congestion b(n)
and delay(n) into account
• Augment with factor that takes net activity into
account
• Minimize length of most active nets, even in the
presence of congestion.
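A possible shape for the augmented routing cost is sketched below. The slide names only b(n), delay(n), and a net-activity factor, so the blend used here, the use of node capacitance as the power proxy, and the name `act_crit` are assumptions rather than the published formulation.

```python
def node_cost(delay, congestion, capacitance, crit, act_crit):
    """Cost of using routing node n for one connection of a net.

    delay       : delay(n), delay through the node
    congestion  : b(n), congestion cost of the node
    capacitance : capacitance of the node (assumed stand-in for its power cost)
    crit        : timing criticality of the connection, 0..1
    act_crit    : activity criticality of the net, 0..1 (assumed to grow
                  with the net's switching activity)
    """
    non_timing = act_crit * capacitance + (1.0 - act_crit) * congestion
    return crit * delay + (1.0 - crit) * non_timing


# Hypothetical candidates: a short congested wire and a long uncongested one.
short_wire = dict(delay=1.0, congestion=3.0, capacitance=1.0)
long_wire = dict(delay=1.5, congestion=1.0, capacitance=2.5)

for act_crit in (0.0, 0.9):
    cost_short = node_cost(crit=0.2, act_crit=act_crit, **short_wire)
    cost_long = node_cost(crit=0.2, act_crit=act_crit, **long_wire)
    pick = "short" if cost_short < cost_long else "long"
    print(f"act_crit={act_crit}: short={cost_short:.2f} long={cost_long:.2f} -> {pick}")
```

With these made-up numbers, a quiet net detours around congestion while a highly active net keeps the short, low-capacitance wire despite the congestion.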
FPGA Routing for Power Results
• Potential benefits somewhat limited by placement
• Note that most nets have low activity
• Power is decreased by 6% but delay increased by
4%. Energy savings of about 3%
FPGA Embedded Memory Blocks
° Embedded memory blocks (EMBs) are important parts of FPGAs
° Consume roughly 14% of Altera Stratix II dynamic power *
• Increasing in recent designs
* Stratix II Low Power Applications Note, 2005
Embedded Memory Block Port Internal View
[Figure: EMB port internals. Labeled blocks: bit line pre-charge, RAM cell with BIT lines, row decode, column mux, write buffers, sense amps, and latches for Address, Write Enable, Read Enable, Read Data, and Write Data. Labeled signals: Clk, Clk Enable, and MClk, which appears at each clocked block.]
Reducing clocking saves dynamic power
Power Optimization #1
° Convert EMB read enable/write enable signals to associated
read/write clock enable signals
° Limitations
• Each port has read or write enable control signal
• Embedded memory block has read enable input
[Figure: before/after view of an EMB with Data, Write Address, Read Address, Clock, and output Q. Before: Wren and Rden drive the Write enable and Read enable inputs, and Wr clk enable and Rd clk enable are tied to Vcc. After: Wren and Rden drive Wr clk enable and Rd clk enable, and Write enable and Read enable are tied to Vcc.]
Implementation
° Conversion mode
• Ties off R/W enable to RAM clock enables
• Doesn’t make transform if CE already present on port
° Combining mode
• AND user RAM clock enables with derived R/W clock
• Could impact performance
[Figure: AND gate combining Write Enable and User-defined Write Clk Enable into Combined Write Clk Enable]
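The effect of the conversion and combining modes can be sketched by counting how many cycles a write port is actually clocked. This is a behavioral toy, not the tool's implementation: the function name, the trace format, and the example traces are assumptions.

```python
def clocked_cycles(wren_trace, user_ce_trace=None, convert=True):
    """Count cycles in which a RAM write port actually receives a clock edge.

    wren_trace    : per-cycle write enable values (True/False)
    user_ce_trace : per-cycle user clock enable, or None if the port has none
    convert       : False -> clock enable left as the design specified it
                    True  -> write enable moved onto the clock enable;
                             if a user clock enable exists, the two are ANDed
                             (the "combining mode")
    """
    count = 0
    for cycle, wren in enumerate(wren_trace):
        user_ce = True if user_ce_trace is None else user_ce_trace[cycle]
        clock_enable = (user_ce and wren) if convert else user_ce
        if clock_enable:
            count += 1
    return count


# Hypothetical 8-cycle trace: the design writes in only 2 of the 8 cycles.
wren = [False, True, False, False, True, False, False, False]
user_ce = [True, True, True, True, False, False, True, True]

print("no conversion:  ", clocked_cycles(wren, convert=False))
print("conversion mode:", clocked_cycles(wren, convert=True))
print("combining mode: ", clocked_cycles(wren, user_ce, convert=True))
```

The port is clocked only when a write can actually occur, which is where the dynamic power saving comes from; the extra AND gate in combining mode is the source of the possible performance impact.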
FPGA RAM Processing
[Flow: FIFO, Shift Register, RAM specification -> Create Logical Memory -> Logical RAMs/logic -> Logical-to-physical RAM processing -> RAM blocks/logic -> Memory/logic placement -> Placed Memory]
° FIFOs and shift registers converted into logical RAMs
° Logical RAMs mapped to RAM blocks
Mapping RAM to EMBs
° Implementation choice can impact design area, performance, and power.
° Some mappings may require multiple EMBs
[Figure: a user-defined (logical) memory of 16K bits (4k deep x 4 wide) mapped to physical (EMB) memory, either four M4K blocks of 4K bits each or one 512K MRAM]
Memory Organization
° Each EMB can be configured to have different depth and width (e.g. Stratix II M4K)
• 4K words deep, 1 bit wide
• 512 words deep, 8 bits wide
• 128 words deep, 32 bits wide
° All hold 4K bits
° Slightly lower power consumption for wider EMB configurations (not including routing)
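The depth and width options all follow from a fixed bit capacity. The sketch below enumerates them; it assumes power-of-two widths from 1 to 32 and ignores any additional widths a real device may offer (for example with parity bits). The function name is hypothetical.

```python
def emb_configurations(total_bits, widths):
    """Return (depth, width) pairs that use exactly total_bits of storage."""
    return [(total_bits // w, w) for w in widths if total_bits % w == 0]


# Widths are assumed to be powers of two from 1 to 32.
M4K_BITS = 4 * 1024
configs = emb_configurations(M4K_BITS, [1, 2, 4, 8, 16, 32])
for depth, width in configs:
    print(f"{depth:>4} words deep x {width:>2} bits wide = {depth * width} bits")
```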
Area and Delay Optimal Mapping
° Configure each EMB to be as deep as possible
° Number of address bits on each EMB same as on logical memory
° Area and performance efficient: no external logic needed
° Power inefficient: All EMBs must be active during each logical RAM access
[Figure: vertical slicing. A logical memory 4k words deep and 4 bits wide is built from four EMBs, each 4k words deep and 1 bit wide. Addr[0:11] goes to every EMB and each supplies one bit of Data[0:3]; 4 EMBs active during access.]
Alternative Mapping
° Configure EMB to have width of logical RAM (e.g. 1Kx4)
• Allows shutdown of some RAMs each cycle
• But adds some logic
° Saves RAM power, adds combinational logic and register power
[Figure: horizontal slicing, more power efficient. A logical memory 4k words deep and 4 bits wide is built from four EMBs, each 1K deep x 4 wide. Addr[0:9] goes to every EMB; Addr[10:11] feeds an address decoder and selects which EMB drives Data[0:3]; 1 EMB active during access.]
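The difference between the two slicings can be sketched for the 4k x 4 example. The function names are hypothetical; the address split follows the slide's labels, with Addr[10:11] selecting the EMB and Addr[0:9] indexing inside it.

```python
def horizontal_decode(addr, emb_depth):
    """Split a logical address into (EMB index, address inside that EMB).

    With 1K-deep EMBs and a 12-bit address, the upper two bits (Addr[10:11])
    select the EMB and the lower ten bits (Addr[0:9]) index into it.
    """
    return addr // emb_depth, addr % emb_depth


def active_embs(logical_depth, logical_width, emb_depth, emb_width):
    """Return (total EMBs, EMBs clocked per access) for a slicing choice."""
    rows = -(-logical_depth // emb_depth)      # ceiling division
    cols = -(-logical_width // emb_width)
    return rows * cols, cols


print("vertical   (4Kx1 EMBs):", active_embs(4096, 4, 4096, 1))
print("horizontal (1Kx4 EMBs):", active_embs(4096, 4, 1024, 4))
print("address 0xC05 ->", horizontal_decode(0xC05, 1024))
```

Both mappings use four EMBs, but only the horizontal one lets three of them stay idle on each access, at the cost of the decoder and output selection logic.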
RAM Slicing - Example
° Power reduction available with different slicing
[Chart: 4kx32 dynamic power. Vertical axis: dynamic power (mW), 0 to 140. Horizontal axis: maximum depth, 128 to 4k. Multiplexer power increases toward one end of the axis and EMB power toward the other, with the best range marked in between.]
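The structure behind the 4k x 32 sweep can be tabulated without any power numbers. The sketch below assumes 4K-bit EMBs and power-of-two depths; the function name is hypothetical, and it reports only counts, not milliwatts.

```python
def slicing_sweep(logical_depth, logical_width, emb_bits, max_depths):
    """For each maximum EMB depth, report the resulting structure.

    Returns tuples of (max_depth, emb_width, total_embs, active_embs, mux_inputs),
    where mux_inputs is the number of EMB rows that the output multiplexer
    must choose between for each data bit.
    """
    rows_out = []
    for depth in max_depths:
        emb_width = emb_bits // depth
        cols = -(-logical_width // emb_width)   # EMBs side by side (all clocked)
        rows = -(-logical_depth // depth)       # EMBs stacked (one row clocked)
        rows_out.append((depth, emb_width, rows * cols, cols, rows))
    return rows_out


# 4k x 32 logical RAM on 4K-bit EMBs, depths as on the chart's horizontal axis.
table = slicing_sweep(4096, 32, 4096, [128, 256, 512, 1024, 2048, 4096])
print("depth  width  EMBs  active  mux inputs")
for depth, width, total, active, mux in table:
    print(f"{depth:>5}  {width:>5}  {total:>4}  {active:>6}  {mux:>10}")
```

Every choice uses 32 EMBs, but shallow configurations clock few EMBs and need a wide output multiplexer, while deep configurations need no multiplexer and clock many EMBs. That opposing pair of costs is consistent with a best range in the middle of the sweep.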
Power Optimization #2: Power-aware RAM Partitioning
[Flow: FIFO, Shift Register -> Create Logical Memory -> Power-aware Physical RAM processing (with Power Library) -> Insert Decode and Mux Logic -> Memory/Logic Placement -> Completed placement]
° Algorithm considers possible logical to physical RAM mappings
Experimental Approach
° 40 designs evaluated
° Quartus 5.1
° Mapped to smallest possible device and target max frequency
° Simulation with test vectors
° Power analysis with PowerPlay
Memory Power
° 21.0% average reduction for all techniques (9.7% with convert/combine)
[Chart: % dynamic power reduction (-10 to 80) per design. Series: enable convert/combine; enable convert/combine + mem partition.]
Overall Core Dynamic Power
° 6.8% average power reduction for all techniques (2.6% with convert/combine)
[Chart: % dynamic power reduction (-5 to 35) per design. Series: enable convert/combine; enable convert/combine + mem partition.]
Design Performance
° 1.0% average performance loss for all techniques (0.1% for enable convert/combine)
[Chart: average design clock frequency. % frequency improvement (-30 to 10) per design. Series: enable convert/combine; enable convert/combine + mem partition.]
Results Summary
° Almost 7% core dynamic power reduction across all designs
• Some designs benefit more than others
° Minimal clock frequency hit for most designs
                        Enable     Enable             Enable convert/combine
                        convert    convert/combine    + Mem partition
Core dynamic power      -1.8%      -2.6%              -6.8%
Memory dynamic power    -6.3%      -9.7%              -21.0%
Max clk freq            -0.1%      -0.2%              -1.0%
LUT count                0.0%       0.1%               0.7%
Impact of Multiple Embedded Memory Blocks
° Rerun 40 designs but only allow one type of target EMB for each
mapping
° All designs targeted to Stratix II EP2S180
° Significant power impact for most designs versus EP2S180 target
with no restrictions
                      M512       M4K       M-RAM
Designs completed     23         38        4
Core dynamic power    40.4%      6.6%      47.3%
Memory power          279.5%     33.3%     754.0%
Max clk freq.         -2.2%      0.6%      -1.0%
LUT count             0.4%       -0.5%     0.0%
Summary
° Key to reducing RAM power is keeping clocks disabled.
° Movement of read/write enables to clock enables limits dynamic
activity
° Power-aware RAM partitioner attempts to select power-optimal
mapping – combined with clock enable enhancement
° Overall
• About 21% average memory power reduction
- 10% enable convert/combine
• About 7% average dynamic power reduction
- 3% enable convert/combine
• Diversity of EMBs reduces power by 33%
Summary
• FPGA power consumption under consideration at numerous levels: architecture, circuit, CAD, and physical
• FPGA companies just now embracing power-aware CAD,
power-aware architectures on the way
• Many circuit-level techniques still possible
• RTL CAD synthesis techniques provide a promising area for
exploration