Power Saving at Architectural Level
Xiao Xing
March 7, 2005
Purpose of Power Saving in VLSI Circuits
• For portability: so that portable devices don’t require batteries as large as a briefcase.
• For cooling: so one does not have to resort to expensive cooling equipment that might cost more than the circuit being cooled.
Types of Power Consumption
• Dynamic Power (main type of power consumption)
• Short Circuit Power
• Static Power [1]
– Leakage
– Sub-threshold
Power Saving Schemes at Different Levels
• Transistor Level [Decreasing Transistor & Interconnect Capacitances]
• Gate Level [Input Ordering, Tree vs. Chain]
• Logic Level [MCML (Low Voltage Swing), Domino (Small Device Count)]
• Architectural Level [Parallelism, Pipelining, etc.]
– Can save the most power for suitable applications [2]
Pipelining to Save Power
• $P_{\mathrm{Dynamic}} = C \cdot f \cdot V_{DD}^{2} \cdot \alpha$ [3]
• Decreasing VDD has the largest impact on decreasing dynamic power, because dynamic power scales with the square of VDD
• Decreasing VDD should also decrease leakage power
• Sub-threshold & short-circuit power dissipation may go up or down; it might increase due to the slightly increased device count (pipeline registers)
• Decreasing VDD will also slow down the circuit, but with pipelining & parallelism this loss of speed can be compensated
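The quadratic dependence on VDD in the dynamic power formula can be checked numerically. The following is a minimal Python sketch; the function name `dynamic_power` and every numeric value (capacitance, frequency, activity factor, both supply voltages) are illustrative assumptions, not figures from this project.

```python
def dynamic_power(c_farads, f_hz, vdd_volts, alpha):
    """Dynamic power P = C * f * VDD^2 * alpha, in watts."""
    return c_farads * f_hz * vdd_volts ** 2 * alpha

# Assumed operating point: 1 pF switched capacitance, 100 MHz, activity factor 0.5.
p_full = dynamic_power(1e-12, 100e6, 2.5, 0.5)
# Same circuit and clock, with the supply voltage halved.
p_half = dynamic_power(1e-12, 100e6, 1.25, 0.5)

print(f"P at 2.50 V: {p_full * 1e6:.3f} uW")
print(f"P at 1.25 V: {p_half * 1e6:.3f} uW")
print(f"ratio      : {p_half / p_full:.2f}")
```

With C, f, and Alpha held fixed, halving VDD cuts dynamic power to one quarter, whereas halving any one of the other three terms would only halve it.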
Pipeline Operation Illustrated
Idea behind Pipelining for Power Saving
• Pipelining utilizes parallelism to boost the throughput of the non-pipelined circuit
• The throughput boost can be nullified by decreasing VDD of the pipelined circuit (the pipelined circuit now has roughly the same throughput as the non-pipelined circuit)
• But the decreased VDD → decreased dynamic power consumption
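How far VDD can be lowered depends on how gate delay grows as the supply drops, which the slides do not specify. The sketch below assumes the alpha-power-law delay model (delay proportional to VDD / (VDD - Vt)^a) with a hypothetical threshold voltage of 0.5 V and exponent 1.3, and assumes an idealized 4-stage pipeline that can tolerate a 4x slower gate delay. All names and numbers are illustrative assumptions, not results from this project.

```python
def relative_delay(vdd, vt=0.5, a=1.3):
    """Gate delay up to a constant factor, alpha-power-law model (assumed)."""
    return vdd / (vdd - vt) ** a

def vdd_for_slowdown(vdd_ref, slowdown, vt=0.5, a=1.3):
    """Bisect for the supply at which delay is `slowdown` times the delay at vdd_ref."""
    target = slowdown * relative_delay(vdd_ref, vt, a)
    lo, hi = vt + 1e-6, vdd_ref  # delay falls monotonically as VDD rises (a >= 1)
    for _ in range(100):
        mid = (lo + hi) / 2
        if relative_delay(mid, vt, a) > target:
            lo = mid  # too slow at this supply: search higher voltages
        else:
            hi = mid  # fast enough: try a lower supply
    return hi

VDD_REF = 2.5   # assumed starting supply, volts
STAGES = 4      # idealized: each stage has 1/4 of the original critical path

vdd_low = vdd_for_slowdown(VDD_REF, STAGES)
# Same clock and throughput as before, so dynamic power scales with VDD^2 only.
power_ratio = (vdd_low / VDD_REF) ** 2

print(f"reduced VDD        : {vdd_low:.3f} V")
print(f"dynamic power ratio: {power_ratio:.3f} (pipeline register overhead ignored)")
```

The power ratio here ignores the extra pipeline registers, so the real saving would be somewhat smaller.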
Pipelined Data Path for a RISC Micro-Processor
[Figure: pipelined data path. 16- or 32-bit instructions enter, pass through the register file and the ALU, and 16- or 32-bit data is written back. Three sets of pipeline registers, each enabled by the control unit, separate the stages:]
• Instruction Fetch / Register Access: 16 + 7 + 3 + 3 + 3 = 32 flip-flops (16-bit immediate value, 7-bit op code, and 3-bit addresses for read register 1, read register 2, and the destination register)
• Register Access / Execute: 16 + 16 + 7 + 3 = 42 flip-flops (including the 16-bit values in read registers 1 and 2 and the 7-bit op code)
• Execute / Write-Back: 16 + 16 + 3 + 1 = 36 flip-flops (including the most and least significant 16 bits of the ALU output result and the signal from the control unit indicating if 2 writes are needed)
Actual Circuit Utilized to Analyze Pipelining as a Viable Power Saving Scheme
• A 32-Bit Shift Register
– Not large scale, transparent to implement
– 32 flip-flops, pipelined to 4 stages, requiring 3 extra flip-flops, with each extra flip-flop serving as the corresponding pipeline register
– Power ratio is 10+ : 1 (possibly one of the better cases, almost trivializing the power consumed by the pipeline registers), so the power saved by decreasing VDD should substantially outweigh the extra power of the extra flip-flops
– Power ratio comparable to that of a VLSI circuit with its necessary pipeline registers (the number of flip-flops required is generally proportional to the size of the VLSI circuit)
– Parallel version, parallel + pipelined version
– Layout of the flip-flop for power/area, simulation/estimation
– Interested in the relative % of power saved (should be applicable to a bigger picture)
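The 10+ : 1 ratio and the break-even point for the supply reduction can be worked out from the flip-flop counts. This is a rough sketch that assumes every flip-flop dissipates the same dynamic power at a given VDD and clock, which the slides do not state; the variable names are illustrative.

```python
import math

CORE_FLIP_FLOPS = 32      # shift register proper
PIPELINE_FLIP_FLOPS = 3   # extra flip-flops acting as pipeline registers

# Core-to-pipeline-register ratio quoted as "10+ : 1".
ratio = CORE_FLIP_FLOPS / PIPELINE_FLIP_FLOPS

# Relative hardware (and, by assumption, dynamic power) after pipelining.
overhead = (CORE_FLIP_FLOPS + PIPELINE_FLIP_FLOPS) / CORE_FLIP_FLOPS

# Dynamic power scales with VDD^2, so the pipelined circuit breaks even when
# overhead * (VDD_new / VDD_old)^2 == 1.
break_even_vdd_scale = math.sqrt(1 / overhead)

print(f"core : pipeline registers = {ratio:.2f} : 1")
print(f"hardware overhead factor  = {overhead:.4f}")
print(f"break-even VDD scale      = {break_even_vdd_scale:.4f}")
```

Under that equal-power-per-flip-flop assumption, any supply reduction beyond the break-even scale yields a net dynamic power saving.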
Architecture Analyzed
• Plain Shift-Register
– 32 Flip-Flops
– VDD at Max (2.5 or 3V for CMOSP18)
– Input Rate = 1 bit inputted (processed) every 32 clock cycles
– Clock period decreased to find the maximum operating frequency (judged by waveform quality and voltage swing)
– Throughput = Input Rate * Frequency
= (1 bit / 32 cycles) * (f cycles/second)
= x bits/second, where x = f / 32
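The throughput expression can be evaluated directly. In this small sketch the function name and the 100 MHz clock are assumed example values, not a measured maximum operating frequency.

```python
def throughput_bits_per_second(bits, cycles, f_hz):
    """Throughput = input rate (bits per cycle) * clock frequency (cycles per second)."""
    return (bits / cycles) * f_hz

# Assumed example clock of 100 MHz.
f_hz = 100e6
plain = throughput_bits_per_second(1, 32, f_hz)

print(f"plain shift register at {f_hz / 1e6:.0f} MHz: {plain / 1e6:.3f} Mbit/s")
```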
Architecture Analyzed
• Pipelined Shift-Reg
– 35 Flip-Flops
– Input Rate = 1 bit inputted every 8 clock cycles
– VDD and f initially the same as for the plain version, then dropped to achieve the same throughput
[Figure: 8 flip-flops → 1 pipeline flip-flop → 8 flip-flops → 1 pipeline flip-flop → 8 flip-flops → 1 pipeline flip-flop → 8 flip-flops]
Architecture Analyzed
• Parallel Shift-Reg
– 64 Flip-Flops, 1 Demux, 1 Mux
– Input Rate = 2 bits inputted every 32 clock cycles
– VDD and f initially the same as for the plain version, then dropped to achieve the same throughput
[Figure: De-Mux → two parallel branches of 32 flip-flops each → Mux]
Architecture Analyzed
• Pipelined + Parallel
– 70 Flip-Flops, 1 Mux, 1 De-Mux
– Input Rate = 2 bits every 8 clock cycles
– VDD and f initially the same as for the plain version, then dropped to achieve the same throughput
[Figure: two parallel branches, each 8 flip-flops → 1 pipeline flip-flop → 8 flip-flops → 1 pipeline flip-flop → 8 flip-flops → 1 pipeline flip-flop → 8 flip-flops]
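The four architectures can be compared side by side using the flip-flop counts and input rates listed on the slides. The sketch below computes the clock frequency each variant needs, relative to the plain shift register, to reach the same throughput; the dictionary layout and function name are illustrative assumptions.

```python
from fractions import Fraction

# name: (flip-flops, bits accepted, per this many clock cycles), from the slides
ARCHITECTURES = {
    "plain": (32, 1, 32),
    "pipelined": (35, 1, 8),
    "parallel": (64, 2, 32),
    "pipelined + parallel": (70, 2, 8),
}

def relative_clock(name):
    """Clock frequency, as a fraction of the plain design's, giving equal throughput."""
    _, bits, cycles = ARCHITECTURES[name]
    _, ref_bits, ref_cycles = ARCHITECTURES["plain"]
    return Fraction(ref_bits, ref_cycles) / Fraction(bits, cycles)

for name, (flip_flops, bits, cycles) in ARCHITECTURES.items():
    print(f"{name:22s} flip-flops={flip_flops:3d}  "
          f"input rate={bits}/{cycles} bit/cycle  "
          f"relative clock={relative_clock(name)}")
```

A lower required clock frequency is what leaves room to drop VDD, at the cost of the extra flip-flops shown in the second column.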
Summary
• The effectiveness of architectural approaches (pipelining, full parallelism, etc.) as viable power-saving schemes for digital ICs will be simulated on a smaller scale.
• The resulting relative percentage of power saved should be applicable on a grander scale.
• Pipelining an average VLSI circuit may need more than 10% of the hardware/power for the pipeline registers (flip-flops).
Time Table
• Feb 1 → March 1: Literature Survey
• March 8 → March 12: Layout
• March 14 → March 18: Simulating Serial & Pipelined Versions
• March 19 → March 23: Simulating Parallel & the Combo Version
• March 24 → End of March: Preparing for the Final Presentation
• April 1 → April 15: Write up the Final Report
References
[1]. Jan M. Rabaey, “Digital Integrated Circuits”, 2nd Ed., Prentice Hall, 2003
[2]. Jerry Frenkil, “A Multi-Level Approach to Low-Power IC Design”, IEEE Spectrum, Vol. 35, No. 2, 1998
[3]. Anantha P. Chandrakasan, “Low-Power CMOS Digital Design”, IEEE Journal of Solid-State Circuits, pp. 473-484, 1992
[4]. K.K. Parhi, “Low-Power Digital VLSI Approaches”, chapter in Circuits and Systems in the Information Age, edited by Y. Huang and C. Wei, pp. 3-22, IEEE Press, June 1997 (ISCAS-97 Tutorial Book)
Aside
• Portability: If your portable device is very power hungry, then given the limited advancement there has been (and will be) in battery capacity, one would need a very large battery to expect it to keep going and going.
• Cooling: Intel CPUs are getting hotter than they used to be, and an average household may be able to afford a CPU, but not necessarily something as drastic as a vapor-cooling computer case.
Application Suitability for Pipelining-for-Power-Saving:
1) The power consumption of the VLSI circuit being pipelined must be much greater than (>>) the power consumption of the pipeline registers.
2) Large & Complex Data Dependency → Large & Complex
3) Huge discrepancy between the delays of the pipeline stages (e.g. 1 + 1 + 1000 clock cycles)
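Item 3 can be illustrated with a simple model: if the pipeline clock period is set by the slowest stage, the ideal throughput gain is the total delay divided by the largest stage delay. The sketch below assumes that model and ignores pipeline register overhead; the function name and the balanced example are illustrative.

```python
def pipeline_speedup(stage_delays):
    """Ideal throughput gain: unpipelined delay over the slowest stage's delay."""
    return sum(stage_delays) / max(stage_delays)

# Three balanced stages versus the 1 + 1 + 1000 case from item 3.
balanced = pipeline_speedup([334, 334, 334])
unbalanced = pipeline_speedup([1, 1, 1000])

print(f"balanced stages  : {balanced:.3f}x")
print(f"unbalanced stages: {unbalanced:.3f}x")
```

With almost no throughput gain in the unbalanced case, there is almost no slack to trade for a lower VDD.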