Power Optimization Techniques Using
Multiple VDD
Presented by: Rajesh Panda
LOW POWER VLSI DESIGN
(EEL 6936-002)
Dr. Sanjukta Bhanja
Literature Review
1) M. Donno, L. Macchiarulo, A. Macii, E. Macii, and M. Poncino, “Enhanced Clustered Voltage Scaling for Low Power,” GLSVLSI’02, 2002, New York, USA.
2) K. Usami and M. Horowitz, “Clustered Voltage Scaling technique for low-power design,” in Proc. ISLPD, Apr. 1995.
3) Y. Yeh, S. Kuo, and J. Jou, “Converter-Free Multiple-Supply Voltage Scaling Techniques for Low-Power CMOS Digital Design,” IEEE Trans., vol. 20, no. 1, 2001.
4) A. Chandrakasan, S. Sheng, and R. Brodersen, “Low-Power CMOS digital design,” IEEE J. Solid-State Circuits, vol. 27, Apr. 1992.
5) J.M. Chang and M. Pedram, “Energy minimization using multiple supply voltages,” in Proc. ISLPED, 1996.
6) N.H.E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed. Reading, MA: Addison-Wesley, 1992.
7) S. Raje and M. Sarrafzadeh, “Variable Voltage Scheduling,” in Proc. ISLPD, Apr. 1995.
8) C. Yeh and M. Chang, “Gate-Level Voltage Scaling for Low-Power Design Using Multiple Supply Voltages,” IEE Proceedings, vol. 146, no. 6, 1999.
9) V. Sundararajan and K.K. Parhi, “Synthesis of Low Power CMOS VLSI Circuits using Dual Supply Voltages,” DAC-36.
10) J.M. Chang and M. Pedram, “Energy minimization using multiple supply voltages,” IEEE Transactions on VLSI Systems, vol. 5, 1997.
INTRODUCTION
Power optimization has always been a major goal in digital circuit design.
Every gate in the circuit contributes to power dissipation, but only a small fraction of the gates (those on critical paths) determine circuit performance.
High-performance devices are therefore needed only on the critical path.
Circuit design techniques:
1) Multiple Vdd.
2) Multiple threshold voltages.
3) Gate resizing.
A Closer Look at Slack
[Figure: average distribution of gates by slack for 16 benchmark circuits.]
Reference: Chunhong Chen, Ankur Srivastava, and Majid Sarrafzadeh
Multiple Vdd
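Idea: Determine the supply voltage level that allows the results to arrive just in time.
Scaling down Vdd gives a quadratic reduction in power:
$P = C_L \cdot V_{dd}^2 \cdot A \cdot f$
where $C_L$ is the load capacitance, $A$ the switching activity, and $f$ the clock frequency.
It also reduces speed:
$t_d = \frac{1}{2} \, C_L \, V_{dd} \left[ \frac{1}{C_1 (V_{dd} - V_{tn})^2} + \frac{1}{C_2 (V_{dd} + V_{tp})^2} \right]$
Dual Vdd to maintain performance:
Gates on the critical path are assigned high Vdd, and gates on noncritical paths are assigned low Vdd.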
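
To make the trade-off concrete, the two formulas above can be evaluated for a 5 V to 3 V scaling. This is a minimal sketch: all numeric values (load capacitance, activity, frequency, thresholds, and the constants C1 and C2) are illustrative assumptions, and Vtp is taken as a negative number so that (Vdd + Vtp) is the PMOS overdrive.

```python
def dynamic_power(c_load, vdd, activity, freq):
    """Dynamic switching power: P = C_L * Vdd^2 * A * f."""
    return c_load * vdd ** 2 * activity * freq


def gate_delay(c_load, vdd, vtn, vtp, c1, c2):
    """Delay model from the slide; vtp is assumed negative (PMOS threshold)."""
    return 0.5 * c_load * vdd * (
        1.0 / (c1 * (vdd - vtn) ** 2) + 1.0 / (c2 * (vdd + vtp) ** 2)
    )


# Assumed, illustrative values only (not taken from the slides).
C_LOAD, ACTIVITY, FREQ = 1e-13, 0.5, 1e8
VTN, VTP, C1, C2 = 0.7, -0.7, 1e-4, 1e-4

power_ratio = (dynamic_power(C_LOAD, 3.0, ACTIVITY, FREQ)
               / dynamic_power(C_LOAD, 5.0, ACTIVITY, FREQ))
delay_ratio = (gate_delay(C_LOAD, 3.0, VTN, VTP, C1, C2)
               / gate_delay(C_LOAD, 5.0, VTN, VTP, C1, C2))

print(f"power ratio (3 V vs 5 V): {power_ratio:.2f}")
print(f"delay ratio (3 V vs 5 V): {delay_ratio:.2f}")
```

Under these assumptions the power drops to 0.36 of its 5 V value while the gate delay roughly doubles, which is why low Vdd is only affordable on paths with slack.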
Level Converter
Low-Vdd gates cannot directly drive high-Vdd gates:
The PMOS in the high-Vdd gate does not turn off completely
This results in a flow of static current
Insertion of level converters is required:
They are similar to the amplifiers used in memories
Problem with Level Converters
Level converters introduce a new source of power
dissipation.
They take more silicon area.
They add delay to the circuit.
Approach: We need a strategy to limit the number of level converters!
Clustered Voltage Scaling
Usami and Horowitz proposed the Clustered Voltage Scaling (CVS) structure to limit the number of level converters.
CVS clusters the gates into two sets: a set of gates at high Vdd and a set of gates at low Vdd.
CVS structure: Primary inputs -> High-Vdd cells -> Low-Vdd cells -> Level converters -> Primary outputs.
The CVS algorithm is a search algorithm that tries to substitute as many cells as possible with low-Vdd cells while maintaining the required performance.
CVS Structure
[Figure: Primary inputs -> VH cluster (supplied by VddH) -> VL cluster (supplied by VddL) -> level converters (LC) -> Primary outputs]
CVS Algorithm
1. Pick a new cell C connected to a primary output (PO).
2. Substitute it with the analogous VddL cell.
3. Perform a new static timing analysis.
4. If the new timing is worse than the original, go back to step 1.
5. Pick a cell feeding the last substituted cell.
6. Verify its viability for substitution through a DFS.
7. If the new timing is worse than the original, go back to step 5.
8. If there are unanalyzed PO cells, go back to step 1.
Reference: Monica Donno et al.
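
The steps above can be sketched on a toy netlist. This is a simplified illustration, not the implementation of Usami and Horowitz or Donno et al.: the netlist, the per-gate delays, and the timing bound `REQUIRED_TIME` are hypothetical, the timing test compares against a fixed bound, and the delay and power of the level converters are ignored.

```python
# Hypothetical toy netlist: gate -> fanin gates (empty list = fed by primary inputs).
FANIN = {
    "g1": [], "g2": [],
    "g3": ["g1", "g2"], "g4": ["g2"],
    "g5": ["g3"], "g6": ["g4"],
}
PRIMARY_OUTPUT_GATES = ["g5", "g6"]
DELAY = {"H": 1.0, "L": 1.5}   # assumed gate delay at VddH and at VddL
REQUIRED_TIME = 3.5            # assumed timing constraint


def fanouts(gate):
    """Gates that take `gate` as an input."""
    return [g for g, ins in FANIN.items() if gate in ins]


def critical_delay(level):
    """Static timing analysis: longest arrival time over all gates."""
    arrival = {}

    def arrive(g):
        if g not in arrival:
            arrival[g] = DELAY[level[g]] + max(
                (arrive(u) for u in FANIN[g]), default=0.0)
        return arrival[g]

    return max(arrive(g) for g in FANIN)


def try_lower(gate, level):
    """Try `gate` at VddL; on success continue depth-first into its fanins."""
    if level[gate] == "L":
        return
    # CVS structure: a low-Vdd gate may only feed low-Vdd gates (or an output LC).
    if any(level[f] == "H" for f in fanouts(gate)):
        return
    level[gate] = "L"
    if critical_delay(level) > REQUIRED_TIME:
        level[gate] = "H"      # timing violated: undo the substitution
        return
    for u in FANIN[gate]:
        try_lower(u, level)


def cvs():
    level = {g: "H" for g in FANIN}
    for po in PRIMARY_OUTPUT_GATES:
        try_lower(po, level)
    return level


result = cvs()
print(result)
```

In this toy case only the two output gates g5 and g6 end up at low Vdd, because lowering any second gate on the same path would exceed the assumed timing bound.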
Application of Original CVS Algorithm
This is the algorithm which was used in the CVS structure proposed
by Usami and Horowitz.
[Figure: example circuit with cells labeled 1 to 10, illustrating the original CVS algorithm. Reference: Monica Donno et al.]
Partial DFS Algorithm
Forward DFS -> Checks whether substitution is feasible for all the transitive fanouts of a node -> Might take a long time!
Donno et al. proposed an alternative implementation to improve results and/or execution time without changing the basic CVS: the "Partial DFS Algorithm".
Partial DFS Algorithm -> Stops the search whenever a node is declared infeasible -> Skips to the next PO -> Search space is reduced by cutting substitutions that are unlikely to affect the results substantially -> Saves computation time!
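
The early-exit rule can be illustrated with a small search that counts timing checks. This is only one possible reading of the rule, not Donno et al.'s code: the fanin graph and the `FEASIBLE` table (standing in for the outcome of a timing check) are hypothetical, and the exhaustive variant is modeled simply as a search that keeps exploring sibling branches after a failure.

```python
# Hypothetical fanin graph rooted at one primary-output cell "po".
FANIN = {"po": ["a", "b"], "a": ["c", "d"], "b": ["e"], "c": [], "d": [], "e": []}
# Assumed result of the timing check per cell (True = substitution keeps timing).
FEASIBLE = {"po": True, "a": False, "b": True, "c": True, "d": True, "e": True}


class StopSearch(Exception):
    """Raised by the partial search to abandon the current primary output."""


def search(cell, partial, checked, lowered):
    checked.append(cell)            # one timing check per visited cell
    if not FEASIBLE[cell]:
        if partial:
            raise StopSearch        # partial DFS: give up on this PO entirely
        return                      # exhaustive DFS: prune this branch only
    lowered.append(cell)
    for u in FANIN[cell]:
        search(u, partial, checked, lowered)


def run(partial):
    """Return (cells checked, cells substituted) for the search from "po"."""
    checked, lowered = [], []
    try:
        search("po", partial, checked, lowered)
    except StopSearch:
        pass
    return checked, lowered


print("exhaustive:", run(partial=False))
print("partial:   ", run(partial=True))
```

Here the partial search performs two checks instead of four, at the cost of missing two substitutions that the exhaustive search finds; the technique pays off when the skipped substitutions contribute little.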
Application of Partial DFS Algorithm
[Figure: the same example circuit with cells labeled 1 to 10, illustrating the partial DFS algorithm. Reference: Monica Donno et al.]
Results for two Algorithms
The following results are for c6288, the largest benchmark circuit the authors considered (Donno et al.).

Algorithm      Circuit   Power Red.   CPU Time
DFS            C6288     0.35%        20 min
Partial DFS    C6288     0.35%        8 min
CFMV Scaling
Y.J. Yeh, S.Y. Kuo, and J.Y. Jou proposed the converter-free multiple-voltage (CFMV) scaling technique.
Approach: No level converters at all!
How? -> Put constraints on the voltage differences between adjacent gates!
Idea -> No static current if
$V_{ddR} > V_{dd} - |V_{tp}|$
$V_{ddR}$: Reduced supply voltage
$V_{tp}$: Threshold voltage of the PMOS
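
As a quick illustration of the condition above, the check can be written as a one-line predicate. The function name and the numeric values are assumptions for illustration; this is a first-order check only and does not model subthreshold current.

```python
def is_converter_free(vdd_reduced, vdd, vtp):
    """First-order test: no static current if VddR > Vdd - |Vtp|."""
    return vdd_reduced > vdd - abs(vtp)


VTP = -0.7  # assumed PMOS threshold voltage in volts

print(is_converter_free(4.5, 5.0, VTP))  # 4.5 V driver into a 5 V gate
print(is_converter_free(3.0, 5.0, VTP))  # 3.0 V driver into a 5 V gate
```

With the assumed threshold, a 4.5 V gate may drive a 5 V gate directly, while a 3 V gate may not, so the reduced supply has to stay within about one PMOS threshold of the full supply.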
How to Determine VddR
The subthreshold effect makes the prediction of VddR imprecise.
Solution: Determine VddR with a circuit simulator, such as HSPICE, once the acceptable value of static current is given.
Arrangement of Supply Voltages
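[Figure: Primary inputs -> cluster $C_{n-1}$ (supplies $V_{dd,n-1}$, $V_{ss,n-1}$) -> … -> cluster $C_1$ (supplies $V_{dd,1}$, $V_{ss,1}$) -> cluster $C_0$ (supplies $V_{dd,0}$, $V_{ss,0}$) -> Primary outputs]
$V_{dd,0} > V_{dd,1} > \dots > V_{dd,n-1}$ and $(V_{dd,i} - V_{dd,i+1}) < V_{st}$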
CFMV Structure
A combinational circuit can be represented as a directed acyclic graph G = (V, E).
Proper Directed Cut: [V1, V2] is a proper directed cut of G if V2 contains all the sinks of G, all the boundary vertices of G, and all the vertices in their reachable set.
[Figure: example graph with two cuts; C1 is a proper directed cut but C2 is not.]
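
The definition can be checked mechanically on a small graph. The sketch below is an interpretation under stated assumptions: the example graph is hypothetical, boundary vertices are left out because the slide does not define them, and "reachable set" is read as "no edge leaves V2".

```python
def is_proper_directed_cut(edges, v2):
    """Check that v2 holds every sink and is closed under reachability."""
    vertices = {v for edge in edges for v in edge}
    sinks = vertices - {u for u, _ in edges}   # vertices with no outgoing edge
    if not sinks <= v2:
        return False
    # Closure: an edge starting inside v2 must also end inside v2.
    return all(w in v2 for u, w in edges if u in v2)


# Hypothetical DAG: a->c, b->c, c->d, b->e, e->d (d is the only sink).
EDGES = [("a", "c"), ("b", "c"), ("c", "d"), ("b", "e"), ("e", "d")]

print(is_proper_directed_cut(EDGES, {"c", "d"}))   # sink included, closed
print(is_proper_directed_cut(EDGES, {"b", "d"}))   # edge b->c leaves the set
print(is_proper_directed_cut(EDGES, {"c", "e"}))   # sink d is missing
```

Only the first candidate qualifies: it contains the sink d, and every edge that starts inside it also ends inside it.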
Algorithm for 2 supply voltages
DFS (m)
    For (each vertex v with voltage level m) Do
        DFS-Visit (v, m);

DFS-Visit (v, m)
    If (v is marked) Then
        return;
    If (v is a sink or boundary vertex) Then
        Mark v;
    Else
        For (each fanin vertex u of v) Do
            DFS-Visit (u, m);
        If (all the voltage levels of v’s fanins are (m+1)) Then
            set v’s voltage level to (m+1);
            If (there exists negative slack) Then
                set v’s voltage level back to m;
        Mark v;
Reference: Yeh et al.
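
A runnable sketch of the two-supply pseudocode is given below. It is an illustration under assumptions, not the code of Yeh et al.: the netlist, the gate delays, and `REQUIRED_TIME` are hypothetical, level 0 denotes the higher supply and level 1 the reduced one, and boundary vertices are omitted because only two supplies are used.

```python
# Hypothetical netlist: gate -> fanin gates (empty list = fed by primary inputs).
FANIN = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g2"], "g5": ["g3", "g4"]}
SINKS = {"g5"}               # gates driving primary outputs stay at level m
DELAY = {0: 1.0, 1: 1.2}     # assumed gate delay at voltage level 0 and level 1
REQUIRED_TIME = 3.3          # assumed timing constraint


def has_negative_slack(level):
    """Static timing analysis: True if the longest path misses REQUIRED_TIME."""
    arrival = {}

    def arrive(g):
        if g not in arrival:
            arrival[g] = DELAY[level[g]] + max(
                (arrive(u) for u in FANIN[g]), default=0.0)
        return arrival[g]

    return max(arrive(g) for g in FANIN) > REQUIRED_TIME


def dfs_visit(v, m, level, marked):
    if v in marked:
        return
    if v in SINKS:
        marked.add(v)
        return
    for u in FANIN[v]:
        dfs_visit(u, m, level, marked)      # visit fanins first (post-order)
    # A gate may move to level m+1 only if every fanin is already there.
    if all(level[u] == m + 1 for u in FANIN[v]):
        level[v] = m + 1
        if has_negative_slack(level):
            level[v] = m                    # timing violated: revert
    marked.add(v)


def dfs(m, level):
    marked = set()
    for v in list(FANIN):
        if level[v] == m:
            dfs_visit(v, m, level, marked)
    return level


levels = dfs(0, {g: 0 for g in FANIN})
print(levels)
```

In this toy example the two input-side gates g1 and g2 move to the reduced supply, while g3 and g4 are reverted because lowering either one would exceed the assumed timing bound. The low-voltage cluster therefore grows from the primary inputs toward the outputs, matching the arrangement of supply voltages shown earlier.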
Results of CFMV
Circuit   CVS(5,3)     CVS(5,3)   CFMV(2 way)   CFMV(2 way)
          Power Red.   CPU time   Power Red.    CPU time
C432      0.11%        0.01       4.18%         0.02
C880      17.08%       0.07       14.25%        0.10
C1908     6.53%        0.06       17.36%        0.41
C6288     1.69%        0.44       8.63%         1.97
Summary
According to Yeh et al., an average power reduction of 9 to 18% can be obtained using the CFMV technique.
We can observe that CFMV takes more CPU time than CVS.
I wonder whether we can improve the CPU time by using the partial DFS algorithm here too, without substantially affecting the results.
This is indeed a very challenging research topic!