Advanced VLSI Design - WSU EECS


EE 587
SoC Design & Test
Partha Pande
School of EECS
Washington State University
1
Power & Low Power Design
Physical Design Methodologies
2
Metric 1 : Power
• Ref. 5.9 of HJS
• If we improve a design's power but the change slows down the
circuit, then it might not be acceptable
• Comparing the power of two designs might be misleading
– The lower-power design might just be slower
3
Metric 2 : Energy / Operation
• Rather than looking at power, look at the total energy needed to
complete some operation. This fixes the obvious problem with the
power metric, since changing the operating frequency does not
change the answer
4
Metric 3 : EDP (Energy-Delay Product)
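The three metrics can be compared with a minimal sketch. The two designs and their numbers below are hypothetical, chosen only so that both use the same energy per operation while one runs at half the speed; power is taken as energy per operation divided by delay, assuming operations run back to back.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    energy_per_op_j: float  # energy to complete one operation (joules)
    delay_s: float          # time to complete one operation (seconds)

    def power_w(self) -> float:
        # Average power, assuming operations run back to back
        return self.energy_per_op_j / self.delay_s

    def edp(self) -> float:
        # Energy-delay product (joule-seconds)
        return self.energy_per_op_j * self.delay_s


# Hypothetical designs: same energy per operation, different speed
fast = Design("fast", energy_per_op_j=2.0e-12, delay_s=1.0e-9)
slow = Design("slow", energy_per_op_j=2.0e-12, delay_s=2.0e-9)

for d in (fast, slow):
    print(f"{d.name}: P = {d.power_w():.2e} W, "
          f"E/op = {d.energy_per_op_j:.2e} J, EDP = {d.edp():.2e} J*s")
```

On power alone the slow design looks better, energy per operation shows the two as equal, and EDP penalizes the slow design for its longer delay.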
5
Energy vs. Delay
6
Technology Optimization
• Energy per transition is proportional to $V_{dd}^2$
• When the supply voltage approaches the threshold voltage,
delay increases significantly
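A minimal sketch of this trade-off, assuming a first-order delay model of the form Vdd / (Vdd - Vt)^a (an alpha-power-law style approximation that the slides do not state explicitly). The threshold voltage, the exponent, and the supply values are illustrative assumptions.

```python
def energy_per_transition(c_load: float, vdd: float) -> float:
    """Energy per transition, up to a constant factor: C * Vdd^2."""
    return c_load * vdd ** 2


def relative_delay(vdd: float, vt: float, a: float = 2.0) -> float:
    """Assumed first-order delay model: Vdd / (Vdd - Vt)^a (arbitrary units)."""
    if vdd <= vt:
        raise ValueError("model only applies for Vdd above the threshold")
    return vdd / (vdd - vt) ** a


VT = 0.7        # assumed threshold voltage (V)
C_LOAD = 1.0    # normalized load capacitance

for vdd in (3.3, 2.5, 1.8, 1.2, 0.9):
    e = energy_per_transition(C_LOAD, vdd)
    d = relative_delay(vdd, VT)
    print(f"Vdd = {vdd:.1f} V: relative energy = {e:6.2f}, relative delay = {d:6.2f}")
```

Energy falls quadratically as the supply is lowered, while the delay term grows sharply once Vdd comes within a few hundred millivolts of the assumed threshold.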
7
Technology Optimization
• Modification of the threshold voltage
• Lowering the threshold voltage allows the supply voltage to be
reduced, but the benefit is offset by an increase in leakage current
8
Transistor Sizing
• Optimum transistor sizing
• The first stage is driving the gate capacitance of the second and the
parasitic capacitance
• The input gate capacitance of both stages is given by N·Cref, where Cref
represents the gate capacitance of a MOS device with the smallest
allowable (W/L)
9
Transistor Sizing
• When there is no parasitic capacitance contribution (i.e., α = 0),
the energy increases linearly with respect to N, and using devices
with the smallest (W/L) ratios results in the lowest power.
• At high values of α, when parasitic capacitances begin to
dominate over the gate capacitances, the power decreases
temporarily with increasing device sizes and then starts to
increase, resulting in an optimal value for N.
• The initial decrease in supply voltage achieved from the reduction
in delays more than compensates for the increase in capacitance due
to increasing N.
• After some point the increase in capacitance dominates the
achievable reduction in voltage, since the incremental speed
increase with transistor sizing is very small.
• Minimum-sized devices should be used when the total load
capacitance is not dominated by the interconnect.
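The behavior described above can be reproduced with a small sketch. It assumes a first-order model that the slides do not spell out: the parasitic capacitance is α·Cref, the stage delay is proportional to (1 + α/N)·Vdd/(Vdd - Vt)^2, and the energy is proportional to (α + N)·Cref·Vdd^2. For each N the supply is lowered until the delay matches the N = 1 design; the reference supply and threshold values are illustrative.

```python
import math


def voltage_term(v: float, vt: float) -> float:
    """Voltage dependence of delay in the assumed model: V / (V - Vt)^2."""
    return v / (v - vt) ** 2


def supply_for_term(k: float, vt: float) -> float:
    """Solve V / (V - Vt)^2 = k for the root above Vt (closed form)."""
    return ((2 * k * vt + 1) + math.sqrt(4 * k * vt + 1)) / (2 * k)


def energy_at_fixed_delay(n: int, alpha: float,
                          v_ref: float = 3.0, vt: float = 0.7) -> float:
    """Relative energy (units of Cref) with Vdd scaled to hold delay constant."""
    target_delay = (1 + alpha) * voltage_term(v_ref, vt)  # delay at N = 1
    k = target_delay / (1 + alpha / n)                    # allowed voltage term
    vdd = supply_for_term(k, vt)
    return (alpha + n) * vdd ** 2


def best_size(alpha: float, n_max: int = 50) -> int:
    """Sizing factor N in 1..n_max with the lowest energy at fixed delay."""
    return min(range(1, n_max + 1), key=lambda n: energy_at_fixed_delay(n, alpha))


for alpha in (0, 20):
    print(f"alpha = {alpha}: lowest energy at N = {best_size(alpha)}")
```

With α = 0 the delay does not depend on N in this model, so the supply cannot be lowered and the minimum-size device wins; with a large α the minimum moves to an intermediate N.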
10
Power Dissipation in Interconnects
• In the deep-submicron era, interconnect wires (and the
associated driver and receiver circuits) are responsible for an
ever increasing fraction of the energy consumption of an
integrated circuit.
• Most of this increase is due to global wires, such as busses and
clock and timing signals.
• More than 90% of the power dissipation of traditional FPGA
components (over a wide range of applications) is due to the
interconnect
• For gate array and cell library based designs it has been found
that the power consumption of wires and clock signals can be
up to 40% and 50% of the total on-chip power consumption
respectively.
11
Energy Metric
$E_{dyn} = (C_w + C_L) \cdot V_{DD} \cdot V_{swing}$
12
Low-swing Circuits
Conventional Level Converter
• Extra power rail
• Special low-Vt device needed
13
Dynamically-Enabled Drivers
[Figure: dynamically-enabled driver circuit. Labels: VDD, REF, PRE, in, EN, EN2, SA, CL, out]
• The basic idea is to control the charging/discharging time of the
drivers so that a desirable swing on the interconnect is obtained.
• Wire is floating when the driver is disabled
14
Low Swing Bus
• Power dissipated in an n-bit bus:
$P = n \cdot f \cdot C_w \cdot V_{DD}^2$
• Increasing the number of switching bits n causes a proportional
increase in power dissipation
15
Low Swing Bus
• The voltage swing can be reduced by using an
additional bus wire, called the dummy ground
• This dummy ground is initially discharged to the
real ground level and then immediately isolated
from the ground.
• The charge of the bus wiring capacitance is
discharged to the dummy ground instead of the
real ground.
• When n bits of the bus signals switch from "1" to
"0," the voltage swing is reduced to
$V_{swing} = \frac{V_{dd}}{n + 1}$
[Figure: n switching bus wires (total capacitance nCw) connected to the dummy ground wire (capacitance Cw)]
16
Low Swing Bus
• The bus power dissipation required to switch n bits of
the bus is given as
$P = n \cdot f \cdot C_w \cdot V_{swing} \cdot V_{dd} = \frac{n}{n + 1} \cdot f \cdot C_w \cdot V_{dd}^2$
• The voltage swing is further reduced as the number of switching bits
increases
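A short sketch comparing the conventional full-swing bus power, n·f·Cw·Vdd^2, with the dummy-ground bus power, n·f·Cw·Vswing·Vdd where Vswing = Vdd/(n + 1). The frequency, wire capacitance, and supply values are illustrative assumptions.

```python
def conventional_bus_power(n: int, f: float, cw: float, vdd: float) -> float:
    """Full-swing bus: P = n * f * Cw * Vdd^2."""
    return n * f * cw * vdd ** 2


def dummy_ground_swing(n: int, vdd: float) -> float:
    """Swing when n wires discharge into one dummy-ground wire: Vdd / (n + 1)."""
    return vdd / (n + 1)


def dummy_ground_bus_power(n: int, f: float, cw: float, vdd: float) -> float:
    """Low-swing bus: P = n * f * Cw * Vswing * Vdd = n/(n+1) * f * Cw * Vdd^2."""
    return n * f * cw * dummy_ground_swing(n, vdd) * vdd


F = 100e6      # assumed switching frequency (Hz)
CW = 1e-12     # assumed capacitance per bus wire (F)
VDD = 3.3      # assumed supply voltage (V)

for n in (1, 4, 16, 32):
    p_full = conventional_bus_power(n, F, CW, VDD)
    p_low = dummy_ground_bus_power(n, F, CW, VDD)
    print(f"n = {n:2d}: full swing = {p_full:.3e} W, "
          f"dummy ground = {p_low:.3e} W, ratio = {p_low / p_full:.3f}")
```

The ratio of the two powers is 1/(n + 1), so the low-swing bus power stays below f·Cw·Vdd^2 regardless of how many bits switch.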
17
SSDLC
[Figure: SSDLC circuit. Labels: VDD, in, in2, CL, transistors P1, P2, P3, N1, N2, N3, nodes A and B, out]
• Symmetric Source-Follower Driver with Level Converter
• The driver limits the interconnect swing from Vtn to Vdd-Vtn
• Assume that node in2 goes from low to high, i.e., from Vtn to Vdd-Vtn.
• Initially, node A sits at Vtn and node B sits at ground.
• During the transition period, with both N3 and P3 conducting, A and B rise to Vdd-Vtn.
• Consequently, N2 is turned on and out goes low. The feedback transistor P1 pulls A
further up to Vdd to cut off P2 completely. in2 and B stay at Vdd-Vtn.
18
Level Converter with Low-Vt Device
$\frac{E_{new}}{E_{full}} = \left( \frac{REF}{V_{dd}} \right)^2$
19
Gated Clocks
[Figure: gated-clock A>B comparator. Labels: MSB registers (REG) and MSB comparator clocked by CLK; registers and comparator for bits 0 to N-2 driven by a conditionally switched (gated) CLK; logic block; comparator for bits 0-(N-1)]
Low Power Through Circuit Design
• Low-Power Logic Styles: CMOS Versus Pass-Transistor Logic
by Zimmermann and Fichtner
• Power savings through proper choice of logic styles
– Switching Capacitance
– Transition Activity
– Short Circuit Currents
• Power dissipation of various logic styles needs to be analyzed
21
Circuit Design Styles
• Nonclocked Logic
– CMOS, Pseudo-NMOS, Differential Cascode Voltage Switch
(DCVS), Pass-Transistor
• Clocked Logic
– Domino, Differential Current Switch Logic (DCSL)
22
Complementary CMOS - Advantages
• Simple monotonic gates can be realized very efficiently with
only a few transistors, one signal inversion level, few circuit
nodes
– Area, power, and delay are reduced
• Robustness against voltage scaling and transistor sizing
• Input signals are connected to gate inputs only
23
Complementary CMOS - Disadvantages
• Large PMOS transistors
– Area, Power, Delay increase
• Series transistors in the output stage
– Weak output driving capability
– Delay increases
24
Pseudo-NMOS Logic
• Reduced complexity of logic and hence, lower capacitance
and faster speed
• Ratioed logic, better suited for large fan-in design
• Static current → Power dissipation is high
25
Performance of Pseudo-nMOS
Size, W/Lp | Logic 0 voltage | Logic 0 static power | Delay 0→1
4          | 0.693 V         | 564 μW               | 14 ps
2          | 0.273 V         | 298 μW               | 56 ps
1          | 0.133 V         | 160 μW               | 123 ps
0.5        | 0.064 V         | 80 μW                | 268 ps
0.25       | 0.031 V         | 41 μW                | 569 ps
J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated
Circuits, Upper Saddle River, New Jersey: Pearson Education, 2003.
26
Negative Aspects of Pseudo-nMOS
• Output 0 state is ratioed logic.
• Faster gates mean higher static power.
• Low static power means slow gates.
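The trade-off can be checked directly against the pseudo-nMOS performance table. The sketch below uses the table values as given; the product of static power and delay is an added illustration, not a figure from the source.

```python
# (W/Lp, logic-0 voltage in V, logic-0 static power in uW, 0->1 delay in ps)
rows = [
    (4.0,  0.693, 564.0, 14.0),
    (2.0,  0.273, 298.0, 56.0),
    (1.0,  0.133, 160.0, 123.0),
    (0.5,  0.064, 80.0,  268.0),
    (0.25, 0.031, 41.0,  569.0),
]

powers = [p for _, _, p, _ in rows]
delays = [d for _, _, _, d in rows]

for size, v_low, p_uw, d_ps in rows:
    # 1 uW * 1 ps = 1e-18 J = 1e-3 fJ
    product_fj = p_uw * d_ps * 1e-3
    print(f"W/Lp = {size:4}: V_low = {v_low:.3f} V, P_static = {p_uw:5.0f} uW, "
          f"delay = {d_ps:4.0f} ps, P_static * delay = {product_fj:6.2f} fJ")
```

Shrinking the pMOS load lowers both the logic-0 voltage and the static power, but every step slows the 0→1 transition.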
27
DCVS Logic
• No static power dissipation
• Speed advantage of ratioed logic
• Has larger area and switched capacitances
28
Pass-Transistor Logic Styles
• One pass-transistor network is sufficient to perform the logic
operation
– Fewer transistors, smaller input loads
• Threshold Voltage Drop
– Swing restoration circuit required
• Multiplexer Structure
– Dual Rail Logic required
29
Complementary Pass-Transistor Logic (CPL)
• Small input loads → Power and delay are reduced
• Efficient XOR and MUX implementation
• Good output drive
• Cross-coupled pull-up → Large short-circuit current
• Substantial number of nodes
• Inefficient realization of simple gates
30
Double Pass-Transistor Logic (DPL)
• Both PMOS and NMOS logic networks are used in parallel
→ Full swing on the output signals
• Number of transistors and the number of nodes are quite high
→ Substantial capacitive load
31
Swing Restored Pass-Transistor Logic (SRPL)
• Derived from CPL; output inverters are cross-coupled to a latch structure
→ Swing restoration and output buffering at the same time
• Transistor sizing is difficult, poor output driving capability
• Slow switching
• Large short-circuit current
32
Single-Rail Pass-Transistor Logic (LEAP)
• Only single NMOS networks are required
→ Area, power, and delay decrease
• Swing restoration only works for $V_{dd} > V_{tn} + |V_{tp}|$
→ Robustness at low voltages is not guaranteed
33
Comparisons between CMOS and Pass-Transistor
• Pass-transistor logic is often claimed to be the low-power logic style
– All the comparisons were based on the full adder
implementation
• Not representative
• Full adders have limited importance even in arithmetic circuits
34
Comparisons between CMOS and PL
• CPL performs better than CMOS for the full adder
implementation
• For multiplexers and other monotonic gates, CMOS
outperforms the others
• For XOR, CPL is faster, but its power-delay product is higher
• CPL provides the best performance among all pass-transistor
design styles
35
Domino Logic
• Nonratioed logic – sizing of pMOS transistor is not important
for output levels.
• Higher speed
• Only implements noninverting logic gates
• Best suited for large fan-in gates
• Switching activity is high
• Lower noise immunity
• Large clock load
36
Logic Activity
• Probability of 0 → 1 transition:
– Static CMOS, p0 p1 = p0(1 – p0)
– Dynamic CMOS, p0
• Example: 2-input NOR gate with inputs at p1 = 0.5 each,
giving an output with p1 = 0.25 and p0 = 0.75
– Static CMOS, $P_{dyn} = 0.1875 \, C_L V_{DD}^2 f_{CK}$
– Dynamic CMOS, $P_{dyn} = 0.75 \, C_L V_{DD}^2 f_{CK}$
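The NOR example can be verified with a few lines. This is a sketch assuming statistically independent inputs; the function names are illustrative.

```python
def nor_output_p1(input_p1s: list[float]) -> float:
    """Probability that a NOR output is 1: every input must be 0."""
    p = 1.0
    for p1 in input_p1s:
        p *= 1.0 - p1
    return p


def static_activity(p1_out: float) -> float:
    """Static CMOS: probability of a 0 -> 1 output transition, p0 * p1."""
    return (1.0 - p1_out) * p1_out


def dynamic_activity(p1_out: float) -> float:
    """Dynamic CMOS: the precharged output switches whenever it evaluates to 0."""
    return 1.0 - p1_out


p1_out = nor_output_p1([0.5, 0.5])
print(f"output p1 = {p1_out}, p0 = {1.0 - p1_out}")
print(f"static CMOS activity  = {static_activity(p1_out)}")
print(f"dynamic CMOS activity = {dynamic_activity(p1_out)}")
```

The two activity factors multiply CL·VDD^2·fCK to give the dynamic power figures quoted for the static and dynamic NOR gates.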
37
Selecting a Logic Style
• Static CMOS: most reliable and predictable, reasonable in power
and speed, voltage scaling and device sizing are well understood.
• Pass-transistor logic: beneficial for multiplexer and XOR dominated
circuits like adders, etc.
• For large fan-in gates, static CMOS is inefficient; a choice can be
made between pseudo-nMOS, dynamic CMOS and domino CMOS.
38