Diskreetne Matemaatika. S.


IAY 0600
Digitaalsüsteemide disain (Digital Systems Design)
Low Power Design
Lab. 6
Alexander Sudnitson
Tallinn University of Technology
Motivation for Low Power Design



Low power design is important for several reasons:
Device temperature
– Failure rate, cooling and packaging costs
Battery life
– Mean time between charges, system cost
Environment
– Overall energy consumption
Problems of High Power Dissipation

Continuously increasing
performance demands

Increasing power dissipation of
technical devices

Today: power dissipation is a major problem

High power dissipation leads to:
– Reduced time of operation
– High cooling effort
– Higher weight (batteries)
– Increasing operational costs
– Reduced mobility
– Reduced reliability
Trends - Power Density
[Figure: power density trend, with the levels of a hot plate and a nuclear reactor marked for comparison.]
Source: http://cpudb.stanford.edu/
Power and Energy

Power is drawn from a voltage source attached to the
VDD pin(s) of a chip.

Instantaneous power:
$$P(t) = i_{DD}(t)\,V_{DD}$$

Energy:
$$E = \int_0^T P(t)\,dt = \int_0^T i_{DD}(t)\,V_{DD}\,dt$$

Average power:
$$P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt$$
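The three definitions above can be checked numerically. The following is a minimal Python sketch, assuming a sampled supply-current waveform; the function name `energy_and_average_power`, the sample values and the 1.0 V supply are illustrative assumptions, not values from the slides.

```python
# Sketch: integrate a sampled supply current to obtain energy and average
# power. The waveform and VDD used below are made-up illustrative values.

def energy_and_average_power(i_dd, vdd, dt):
    """Return (E, P_avg) for current samples i_dd (in A) spaced dt seconds apart.

    E is approximated with the trapezoidal rule applied to P(t) = i_DD(t) * VDD.
    """
    if len(i_dd) < 2:
        raise ValueError("need at least two samples")
    power = [i * vdd for i in i_dd]              # instantaneous power P(t)
    energy = sum((power[k] + power[k + 1]) / 2 * dt
                 for k in range(len(power) - 1))
    total_time = dt * (len(i_dd) - 1)            # observation window T
    return energy, energy / total_time


# Hypothetical example: a constant 2 mA drawn from a 1.0 V supply for 1 us.
samples = [2e-3] * 101                           # 101 samples, 100 intervals
E, P_avg = energy_and_average_power(samples, vdd=1.0, dt=1e-8)
print(E, P_avg)
```

For a constant current the average power equals the instantaneous power, which gives a quick sanity check of the integration.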
Metrics: Energy and Power

Energy
– Measured in joules (J) or kWh
– “Measure of the ability of a system to do work or produce a change”
– “No activity is possible without energy.”
Power
– Measured in watts (W) or kW
– “Amount of energy required for a given unit of time.”
Average power
• Average amount of energy consumed per unit time
• Simplified to "power" in clear contexts
Instantaneous power
• Energy consumed per unit time in the limit as the time interval goes to zero
Low Power or Low Energy Design
$$E(T) = \int_0^T P(t)\,dt$$

Power
– Direct impact on instantaneous energy consumption and temperature
– Power consumption is critical for systems limited by heat dissipation
Energy
– Energy is power integrated over time; it determines battery life and environmental impact
– Energy consumption is critical for battery-powered systems
CMOS


We restrict our attention to CMOS devices, since this
technology is the most widely adopted in current
VLSI systems.
– Static, complementary CMOS gates are
remarkably efficient in their use of power to
perform computation
– However, leakage increasingly threatens to drive
up chip power consumption
We use the inverter as the reference circuit for power
consumption analysis.
Consumption in CMOS



Electrical quantity      Water analogy
Voltage (volt, V)        Water pressure (bar)
Current (ampere, A)      Water quantity per second (liter/s)
Energy                   Amount of water

[Figure: node switching between 1 and 0 across the load capacitance CL]
Energy consumption is proportional to capacitive load!
CMOS NAND Gate
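A  B  Y
0  0  1
0  1  1
1  0  1
1  1  0

[Figure: transistor-level NAND gate for A=1, B=1: both pull-up transistors are OFF and both pull-down transistors are ON, so Y=0]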
3-input NAND Gate


Y pulls low if ALL inputs are 1
Y pulls high if ANY input is 0
[Figure: 3-input NAND gate with inputs A, B, C and output Y]
Power consumption analysis
[Figure: CMOS inverter between VDD and GND with input Vin and output Vout, annotated with three sources of dissipation:]
– Static dissipation due to leakage current
– Short-circuit dissipation
– Charge and discharge of a load capacitor

P = Pdyn + Psc + Plk
– Pdyn is the dynamic or switching power (due to charging and
discharging load capacitances);
– Psc is the short-circuit power;
– Plk is the leakage power (static in nature).
Dynamic Energy Consumption
Transition power
[Figure: inverter with supply Vdd, input Vin, output Vout and load capacitance CL]
Energy/transition = $\tfrac{1}{2}\,C_L\,V_{DD}^2$
Total energy (both charge and discharge) = $C_L\,V_{DD}^2$
Power = $C_L\,V_{DD}^2\,f$
Dynamic Energy Consumption
Short-circuit power
[Figure: inverter with supply Vdd, input Vin, output Vout and load capacitance CL]
Energy/transition = $t_{sc}\,V_{DD}\,I_{peak}\,P_{0\to1/1\to0}$
Power = $t_{sc}\,V_{DD}\,I_{peak}\,f$
Leakage Energy
[Figure: transistor in the OFF state at output Vout, showing the leakage components:]
– Gate leakage
– Drain junction leakage
– Subthreshold current
Leakage is independent of switching.
Define and quantify power

For CMOS chips, the traditionally dominant energy
consumption has been in switching transistors, called
dynamic power:
$$\text{Power}_{dynamic} = \tfrac{1}{2} \times \text{CapacitiveLoad} \times \text{Voltage}^2 \times \text{FrequencySwitched}$$

For mobile devices, energy is the better metric:
$$\text{Energy}_{dynamic} = \text{CapacitiveLoad} \times \text{Voltage}^2$$

For a fixed task, slowing the clock rate (frequency
switched) reduces power, but not energy.

Dropping the voltage helps both.
Energy and performance

In some cases, energy can be saved by reducing
performance.
– Speed decreases linearly, power decreases as V^2.
– Power goes down faster than performance.

Example of quantifying power
Suppose a 15% reduction in voltage results in a 15% reduction in
frequency. What is the impact on dynamic power?
$$\begin{aligned}
\text{Power}_{dynamic} &= \tfrac{1}{2} \times \text{CapacitiveLoad} \times \text{Voltage}^2 \times \text{FrequencySwitched} \\
&= \tfrac{1}{2} \times 0.85 \times \text{CapacitiveLoad} \times (0.85 \times \text{Voltage})^2 \times \text{FrequencySwitched} \\
&= (0.85)^3 \times \text{OldPower}_{dynamic} \\
&\approx 0.6 \times \text{OldPower}_{dynamic}
\end{aligned}$$
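The worked example can be reproduced with a few lines of Python. This is a minimal sketch; the function names and the capacitance, voltage and frequency values are placeholders chosen for illustration, since only the 0.85 scaling factor comes from the example.

```python
def dynamic_power(cap_load, voltage, freq_switched):
    """Power_dynamic = 1/2 * CapacitiveLoad * Voltage^2 * FrequencySwitched."""
    return 0.5 * cap_load * voltage ** 2 * freq_switched


def dynamic_energy(cap_load, voltage):
    """Energy_dynamic = CapacitiveLoad * Voltage^2."""
    return cap_load * voltage ** 2


# Placeholder operating point (assumed values; the ratios do not depend on them).
CAP, VOLT, FREQ = 1e-9, 1.0, 1e9
SCALE = 0.85  # 15% reduction in both voltage and frequency

power_ratio = (dynamic_power(CAP, SCALE * VOLT, SCALE * FREQ)
               / dynamic_power(CAP, VOLT, FREQ))
energy_ratio = dynamic_energy(CAP, SCALE * VOLT) / dynamic_energy(CAP, VOLT)

print(f"power ratio:  {power_ratio:.3f}")
print(f"energy ratio: {energy_ratio:.4f}")
```

The power ratio is 0.85 cubed (roughly 0.61), while the energy ratio depends only on the voltage and is 0.85 squared (roughly 0.72).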
Activity Factor


Suppose the system clock frequency = f
Let fsw = αf, where α = activity factor
– If the signal is a clock, α = 1
– If the signal switches once per cycle, α = ½
Dynamic power:
$$P_{dynamic} = \alpha\,C\,V_{DD}^2\,f$$
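A short Python sketch of how the activity factor enters a per-net power estimate is given below. The net names, capacitances, supply voltage and clock frequency are hypothetical values chosen only to make the example concrete.

```python
def switching_power(alpha, cap, vdd, freq):
    """P_dynamic = alpha * C * VDD^2 * f for a single net."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("activity factor must lie in [0, 1]")
    return alpha * cap * vdd ** 2 * freq


# Hypothetical nets: (name, activity factor, capacitance in farads).
nets = [
    ("clk", 1.0, 20e-15),   # clock: alpha = 1
    ("data", 0.5, 10e-15),  # switches once per cycle: alpha = 1/2
    ("idle", 0.0, 10e-15),  # never switches: no dynamic power
]
VDD, F = 1.0, 1e9           # assumed supply voltage and clock frequency

per_net = {name: switching_power(a, c, VDD, F) for name, a, c in nets}
total = sum(per_net.values())
for name, p in per_net.items():
    print(f"{name}: {p * 1e6:.1f} uW")
print(f"total: {total * 1e6:.1f} uW")
```

A net that never toggles contributes nothing to dynamic power regardless of its capacitance, which is the motivation for the techniques that reduce switching activity.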
Power Equations in CMOS
$$P = \alpha\,f\,C_L\,V_{DD}^2 + V_{DD}\,I_{peak}\,(P_{0\to1} + P_{1\to0}) + V_{DD}\,I_{leak}$$
– Dynamic power (first term): ≈ 40–70% today and decreasing relatively
– Short-circuit power (second term): ≈ 10% today and decreasing absolutely
– Leakage power (third term): ≈ 20–50% today and increasing
Trends cont'd
[Figure: power dissipation in W of a 100 mm² chip (axis from 0 to 1400) versus technology node (90 nm, 65 nm, 45 nm, 32 nm, 22 nm, 16 nm), split into dynamic power dissipation and power dissipation by leakage currents.]
Source: S. Borkar (Intel), ‘05
Rules for reducing power consumption



Turn it off.
– Eliminates leakage current.
Slow it down, reduce voltage.
– Performance is linear with clock frequency.
– Power is proportional to V^2.
Don’t change its inputs.
– Dynamic power is activity-dependent.
Logic/circuit optimizations

Turn off gate where possible.
– Not an option in most FPGAs, but it should be.

Operate gate at low voltage.
– Speed decreases linearly, power decreases as V^2.
Transition Probabilities for CMOS Cells
Example: Static 2 Input NOR Cell
Probability is the measure of the likelihood that an event will occur.
If A and B have the same input signal probability:
PA=1 = 1/2
PB=1 = 1/2

Truth table of NOR2 cell
A  B  Out
1  1  0
0  1  0
1  0  0
0  0  1

Then:
POut=0 = 3/4
POut=1 = 1/4
P0→1 = POut=0 * POut=1 = 3/4 * 1/4 = 3/16
Ceff = P0→1 * CL = 3/16 * CL
Copyright Sill Torres, 2012
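The NOR2 calculation can be reproduced by enumerating the truth table. The sketch below is illustrative: the helper name `transition_probability` is an assumption, and it relies on the same model as the slide (independent inputs, and output values in consecutive cycles treated as independent).

```python
from fractions import Fraction
from itertools import product


def transition_probability(gate, input_probs):
    """Return P(0->1) = P(out=0) * P(out=1) for a gate with independent inputs.

    gate        -- function mapping input bits to an output bit
    input_probs -- probability that each input is 1
    """
    p_one = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        # Probability of this row of the truth table.
        weight = Fraction(1)
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1 - p
        if gate(*bits):
            p_one += weight
    return (1 - p_one) * p_one


def nor2(a, b):
    return int(not (a or b))


half = Fraction(1, 2)
p01 = transition_probability(nor2, [half, half])
print(p01)  # 3/16
```

Multiplying the result by CL gives the effective capacitance Ceff used in the dynamic power estimate.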
Transition Probabilities cont’d






A and B with different input signal probabilities:
PA and PB: probability that the respective input is 1
P1: probability that the output is 1
Switching activity in CMOS circuits: P0→1 = P0 * P1
For a 2-input NOR: P1 = (1-PA)(1-PB)
Thus: P0→1 = (1-P1) * P1 = [1-(1-PA)(1-PB)] * [(1-PA)(1-PB)] (see the table below)

Gate   P0→1 = Pout=0 * Pout=1
NOR    (1 - (1 - PA)(1 - PB)) * (1 - PA)(1 - PB)
OR     (1 - PA)(1 - PB) * (1 - (1 - PA)(1 - PB))
NAND   PAPB * (1 - PAPB)
AND    (1 - PAPB) * PAPB
XOR    (1 - (PA + PB - 2PAPB)) * (PA + PB - 2PAPB)
Copyright Sill Torres, 2012
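The closed-form expressions for the 2-input gates can be evaluated directly. In this Python sketch the dictionary `P_ONE` and the function `p01` are illustrative names, and the input probabilities PA = 1/2, PB = 1/5 are assumed values, not taken from the slide.

```python
from fractions import Fraction

# Probability that the output is 1, for 2-input gates with independent inputs.
P_ONE = {
    "NOR":  lambda pa, pb: (1 - pa) * (1 - pb),
    "OR":   lambda pa, pb: 1 - (1 - pa) * (1 - pb),
    "NAND": lambda pa, pb: 1 - pa * pb,
    "AND":  lambda pa, pb: pa * pb,
    "XOR":  lambda pa, pb: pa + pb - 2 * pa * pb,
}


def p01(gate, pa, pb):
    """Switching activity P(0->1) = P(out=0) * P(out=1)."""
    p1 = P_ONE[gate](pa, pb)
    return (1 - p1) * p1


pa, pb = Fraction(1, 2), Fraction(1, 5)  # assumed input probabilities
for gate in P_ONE:
    print(gate, p01(gate, pa, pb))
```

A gate and its complement (NOR/OR, NAND/AND) have the same switching activity, because swapping P0 and P1 leaves the product unchanged.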
Logic Restructuring
– Logic restructuring: changing the topology of a logic
network to reduce transitions
AND: P0→1 = P0 * P1 = (1 - PAPB) * PAPB
[Figure: F = A·B·C·D built from 2-input AND gates in two ways, with all input signal probabilities equal to 0.5]
Chain implementation:
– W = A·B: (1 - 0.25) * 0.25 = 3/16
– X = W·C: 7/64 = 0.109
– F = X·D: 15/256
Tree implementation:
– Y = A·B: 3/16 = 0.188
– Z = C·D: 3/16 = 0.188
– F = Y·Z: 15/256
Chain implementation has a lower overall switching activity than tree
implementation for random inputs (consumes less power due to
differences in the total transition probabilities of the gates).
Minimized area does not result in minimum power.
Source: Jan M. Rabaey
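The comparison between the chain and the tree can be checked by summing the transition probabilities of the three AND gates in each structure. This is a small Python sketch; the helper `and_gate` is an illustrative name, and all inputs are assumed independent with signal probability 0.5, as in the example.

```python
from fractions import Fraction


def and_gate(p_a, p_b):
    """Return (P(out=1), P(0->1)) for a 2-input AND with independent inputs."""
    p_one = p_a * p_b
    return p_one, (1 - p_one) * p_one


half = Fraction(1, 2)

# Chain: W = A*B, X = W*C, F = X*D
p_w, act_w = and_gate(half, half)
p_x, act_x = and_gate(p_w, half)
_, act_f_chain = and_gate(p_x, half)
chain_total = act_w + act_x + act_f_chain

# Tree: Y = A*B, Z = C*D, F = Y*Z
p_y, act_y = and_gate(half, half)
p_z, act_z = and_gate(half, half)
_, act_f_tree = and_gate(p_y, p_z)
tree_total = act_y + act_z + act_f_tree

print(chain_total, tree_total)  # 91/256 111/256
```

The output gate F has the same activity in both structures; the difference comes from the internal nodes, where the chain node X (7/64) switches less often than the tree node Z (3/16).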
Input Ordering
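AND: P0→1 = (1 - PAPB) * PAPB
[Figure: F = A·B·C built from two cascaded 2-input AND gates in two input orderings, with signal probabilities PA = 0.5, PB = 0.2, PC = 0.1]
Ordering 1: X = A·B, F = X·C
– Activity at X: (1 - 0.5×0.2) * (0.5×0.2) = 0.09
Ordering 2: X = B·C, F = X·A
– Activity at X: (1 - 0.2×0.1) * (0.2×0.1) = 0.0196
Beneficial: postponing the introduction of signals with
a high transition rate (signals with signal
probability close to 0.5)
Source: Jan M. Rabaey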
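The effect of input ordering can be explored by evaluating the internal node X for every choice of the first input pair. The Python sketch below uses the signal probabilities from the example; the helper names are illustrative, and inputs are assumed independent.

```python
from fractions import Fraction
from itertools import combinations

# Signal probabilities from the example: P(A=1), P(B=1), P(C=1).
probs = {"A": Fraction(1, 2), "B": Fraction(1, 5), "C": Fraction(1, 10)}


def and_activity(p_a, p_b):
    """P(0->1) of a 2-input AND gate with independent inputs."""
    p_one = p_a * p_b
    return (1 - p_one) * p_one


# Activity of the internal node X for each pair fed to the first gate.
activity_x = {
    pair: and_activity(probs[pair[0]], probs[pair[1]])
    for pair in combinations(sorted(probs), 2)
}
best = min(activity_x, key=activity_x.get)

for pair, act in activity_x.items():
    print(pair, float(act))
print("lowest activity at X:", best)
```

The activity of the output F is the same for every ordering, since F always equals A·B·C; only the internal node changes, and it switches least when the input with probability closest to 0.5 is introduced last.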
Glitching
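[Figure: circuit with inputs A, B, C, internal node X and output Z, and waveforms of X and Z for the input change ABC = 101 → 000 under a unit-delay model]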
This hazard may be propagated through additional logic levels and result
in multiple gate output transitions before the circuit resolves to a final
state, even if the final state is unchanged from the previous state.
Source: Jan M. Rabaey
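A glitch of this kind can be reproduced with a tiny unit-delay simulation. The transcript does not preserve the gate types of the figure, so the sketch below assumes two NOR gates, X = NOR(A, B) and Z = NOR(X, C); this structure and the function names are assumptions made purely for illustration.

```python
def nor(a, b):
    return int(not (a or b))


def simulate(inputs_before, inputs_after, steps=4):
    """Unit-delay simulation of X = NOR(A, B), Z = NOR(X, C) (assumed circuit).

    Returns the list of (X, Z) values, starting from the settled state for
    inputs_before and then applying inputs_after at time 0.
    """
    a, b, c = inputs_before
    x = nor(a, b)
    z = nor(x, c)
    trace = [(x, z)]
    a, b, c = inputs_after
    for _ in range(steps):
        # Each gate output at time t+1 depends on its inputs at time t,
        # so Z is computed from the old value of X.
        x, z = nor(a, b), nor(x, c)
        trace.append((x, z))
    return trace


trace = simulate((1, 0, 1), (0, 0, 0))
z_wave = [z for _, z in trace]
print(z_wave)  # [0, 1, 0, 0, 0]
```

Under these assumptions Z starts and ends at 0 but pulses to 1 for one unit delay, because C falls before the new value of X has propagated. That spurious pulse charges and discharges the load on Z without changing the final result.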
State assignment for low power
State assignment for low power has also been explored. In
general, the state assignment problem has targeted
minimizing area, and this approach tends to reduce power as
well.
Low-power state assignment techniques augment
the state transition graph of the state machine with the state
probabilities and transition probabilities between states, and
use these probabilities to guide the state assignment.
Adjacent binary encodings are assigned to states connected
with high probability edges of the graph. This minimizes the
number of state signal transitions, thus attempting to
minimize transitions in the next state and output signal
combinational logic.
One approach attempts to minimize area in conjunction with
switching activity by generating multiple sets of state
encodings with similar switching energy costs from which a
final assignment is chosen on the basis of area.
State assignment impact on power
Counter encoding
[Figure: state diagram of the counter with states S0 to S7]

State   Gray code   Binary code
S0      000         000
S1      001         001
S2      011         010
S3      010         011
S4      110         100
S5      111         101
S6      101         110
S7      100         111

                                  Gray code   Binary code
Total number of transitions       8           14
Max transitions per clock cycle   1           3
Clock load                        3           3

The table compares Gray and binary state assignments. The comparison
shows that the Gray encoding reduces both the average number of
logic transitions per clock and the overall number of transitions for
a cycle of the state machine.
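The transition counts in the table can be recomputed by measuring the Hamming distance between consecutive state codes over one full cycle of the counter. This is a minimal Python sketch; the helper names are illustrative.

```python
def gray(n):
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def cycle_transitions(codes):
    """Bit transitions for each step S0 -> S1 -> ... -> S7 -> S0."""
    return [hamming(codes[i], codes[(i + 1) % len(codes)])
            for i in range(len(codes))]


binary_codes = list(range(8))
gray_codes = [gray(n) for n in range(8)]

binary_steps = cycle_transitions(binary_codes)
gray_steps = cycle_transitions(gray_codes)

print("binary:", sum(binary_steps), "total,", max(binary_steps), "max per clock")
print("gray:  ", sum(gray_steps), "total,", max(gray_steps), "max per clock")
```

The Gray sequence changes exactly one state bit per clock, including the wrap-around from S7 to S0, whereas the binary sequence changes up to three bits at once.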
Dynamic power management
Dynamic Power Management (DPM) is a design
methodology to reduce power dissipation by
disabling the parts of the circuit that are inactive
– Design methodology to control power versus performance
  • Frequency control → clock gating
  • Voltage control → shutdown
– Control can be located in hardware:
  • Example: gated-clock controller
– Control can be located in software:
  • Example: hard-disk power management
(c) Giovanni De Micheli
Register-transfer optimizations

Hold inputs when a unit’s output will not be used.
– Put register at inputs.

Turn off units when they won’t be used for several
cycles.
– Can’t selectively turn off LEs in most FPGAs.
– Not an option in most FPGAs, but it should be.
Guarded Evaluation
[Figure: guard latches in front of a combinational logic block; signal S]
Guarded evaluation relies on input blocking for transition reduction.
Transparent latches are added to the inputs of existing logic and are
disabled when the logic output can be determined
without new input values being driven from the disabled latches. This
technique is common in the design of datapath functions in low-power processors.
Clock gating
[Figure: register with data input D, clock input C and output Q; its clock is CLK combined with an Enable signal through an AND gate (&)]
Gated clocking is a commonly applied technique used to reduce
power by gating off clock signals to registers, latches, and
clock regenerators. Gating may be done when there is no
required activity to be performed by logic whose inputs are
driven from a set of storage elements. Since new output values
from the logic will be ignored, the storage elements feeding the
logic can be blocked from updating to prevent irrelevant
switching activity in the logic.
Circuit with clock drivers and clock gating
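[Figure: two versions of a circuit with elements R1, R2, R3, R4, CL3 and CL4 clocked by CLK; in the second version CLK is combined with a CLK GATING SIGNAL through an AND gate (&)]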
FSM Stochastic Analysis




Given the FSM description and the input probabilities, the probabilistic
behavior of an FSM can be studied by regarding its transition structure
as a Markov chain.
A Markov process is a stochastic process in which, given the present
state, the past has no influence on the future. In other words, the
future behavior depends only on the current state of the process (the
“Markov property”). A Markov process is called a Markov chain (MC) if
its state space is discrete (either finite or countable).
One example of an MC is a board game in which the player's next move
is determined entirely by rolling dice. To make a move, one takes
into account only the current state of the board; it does not matter
how the game progressed to that state. By contrast, in a card game a
player's move is motivated not only by the cards he or she currently
holds, but also by the cards which have already been played during the
course of the game.
Using the steady-state probabilities obtained from such an analysis, it
is possible to build different kinds of quantitative estimates of the
FSM’s stochastic behavior.
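As an illustration of how steady-state probabilities can be obtained, the Python sketch below repeatedly applies the transition matrix to a starting distribution (power iteration). The two-state matrix is a made-up example rather than an FSM from the course, and the method is assumed to be applied to a chain that is irreducible and aperiodic, so that the iteration converges.

```python
def steady_state(matrix, iterations=200):
    """Approximate the stationary distribution of a row-stochastic matrix.

    matrix[i][j] is the probability of moving from state i to state j.
    """
    n = len(matrix)
    for row in matrix:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must sum to 1")
    dist = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(iterations):
        # One step of the chain: new_dist[j] = sum_i dist[i] * matrix[i][j]
        dist = [sum(dist[i] * matrix[i][j] for i in range(n))
                for j in range(n)]
    return dist


# Hypothetical 2-state chain: state 0 is left with probability 0.1,
# state 1 is left with probability 0.5.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = steady_state(P)
print(pi)
```

For this matrix the balance equation 0.1·π0 = 0.5·π1 gives π = (5/6, 1/6), so the chain spends most of its time in state 0.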
A Case Study: Low-Power Design


To demonstrate the use of applets in conjunction
with FPGA-based development boards, the
procedure of computational kernel extraction and
implementation will be considered in the lab.
Sequential circuits may have an extremely large
number of reachable states, but probabilistic
analysis shows that during normal operation only a
relatively small subset is actually visited. One
power optimization paradigm is based on the
concept of a computational kernel: a highly
optimized logic block that mimics the steady-state
behaviour of the original specification.
Probability distribution of the FSM
The first step of the computational kernel extraction procedure
is probabilistic analysis of the FSM.

State    Steady-state probability
init0    0.5000001408
init1    0.3346775136
init2    0.0877016376
init4    0.0584677584
IOwait   0.0161290368
read0    0.0006720432
write0   0.0006720432
RMACK    0.0006720432
WMACK    0.0006720432
read1    0.0003360216

It can be seen that the FSM benchmark “opus” spends 83% of its
operation time in states “init0” and “init1”.
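The 83% figure follows directly from the table. The Python sketch below sums the listed probabilities and, as one possible way of picking kernel candidates, selects the most probable states until a coverage threshold is reached. The threshold of 0.8 and the function `smallest_covering_set` are illustrative assumptions and are not claimed to be the procedure used by the applet.

```python
# Steady-state probabilities copied from the table above.
steady_state = {
    "init0": 0.5000001408,
    "init1": 0.3346775136,
    "init2": 0.0877016376,
    "init4": 0.0584677584,
    "IOwait": 0.0161290368,
    "read0": 0.0006720432,
    "write0": 0.0006720432,
    "RMACK": 0.0006720432,
    "WMACK": 0.0006720432,
    "read1": 0.0003360216,
}


def smallest_covering_set(probabilities, threshold):
    """Pick the most probable states until their total reaches the threshold."""
    chosen, covered = [], 0.0
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    for state, p in ranked:
        if covered >= threshold:
            break
        chosen.append(state)
        covered += p
    return chosen, covered


kernel, coverage = smallest_covering_set(steady_state, threshold=0.8)
print(kernel, f"{coverage:.1%}")
```

With a threshold of 0.8 the selection stops after "init0" and "init1", whose probabilities add up to about 0.835.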
Decomposed FSM network


After the computational kernel is identified, it should be
separated from the rest of the circuit.
The additive decomposition applet is used to divide the
original circuit into two alternately working sub-FSMs.
Implementation summary


VHDL descriptions of the prototype FSM and the decomposed
network can be generated by the decomposition applet. These
descriptions are used to implement and verify both designs
on an FPGA-based development board.
XPower Analyzer is a tool for power consumption
estimation featured in Xilinx ISE. It is used to evaluate the
quality of the decomposed design in comparison with the
original.

Design       Area (LUTs)   Power consumption (mW)
Original     25            4.65
Decomposed   36            1.85

As the table shows, the dynamic power
consumption has been reduced by a factor of 2.5, while
the area overhead is 44%.
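The two summary figures can be derived from the table with a short calculation, sketched here in Python. The dictionary layout and variable names are illustrative; the numbers are those reported in the table.

```python
# Values from the implementation summary table.
original = {"luts": 25, "power_mw": 4.65}
decomposed = {"luts": 36, "power_mw": 1.85}

# Factor by which power consumption drops after decomposition.
power_reduction = original["power_mw"] / decomposed["power_mw"]

# Extra area of the decomposed design relative to the original, in percent.
area_overhead_pct = (100 * (decomposed["luts"] - original["luts"])
                     / original["luts"])

print(f"power reduced by a factor of {power_reduction:.2f}")
print(f"area overhead: {area_overhead_pct:.0f}%")
```

The ratio 4.65 / 1.85 rounds to 2.5, and the 11 additional LUTs on a base of 25 give the 44% overhead.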
Decomposition applet