VLSIpresentation

Low Power SRAM
VLSI Final Presentation
Stephen Durant
Ryan Kruba
Matt Restivo
Voravit Vorapitat
High Level Architecture
[Figure: block diagram. ADDR drives a 5:32 block-enable decoder; each SRAM block has inputs READ, WRITE, ADDR, DATA, and BLK ENABLE and output OUT.]
Output Buffering
[Figure: block diagram. Labels: SRAM block, sense amp, BLK_EN0 to BLK_EN3, 2:4 decoder on ADDR 13:12, ADDR 14, Out.]
Block Level Architecture
[Figure: block diagram. Labels: BLK_EN, CLK, Pulse Gen 1, Precharge, 6:64 decoder, ADDR, SRAM block, Pulse Gen 2, Delay, sense amps (SA), Write.]
Input Gating
[Figure: block diagram. Labels: READ, WRITE, ADDR, DATA, register, buffer, ADDR 14:13, SRAM block x8.]
Word Line Pulse
[Figure: WL pulse waveform.]
• Pulse the WL to reduce the drop in bit line voltage during a read
• Size the inverters to create the minimum WL pulse length
• The minimum WL pulse is the shortest pulse for which the sense amp can still execute a read
Sense Amp Enabling

• Sense amp enabled after the WL pulse to maximize differential current
  • Wordline pulse generator clocks a second pulse generator to ensure proper SA timing
• SAE signal and precharge signal are separate to allow outputs to hold to the end of the clock cycle
Sense Amp
[Figure: sense amp schematic. Labels: VDD, SAE, BL, BLB.]
• Size the three NMOS transistors to control:
  • Bit line voltage drop
  • Delay
Gate Length Vs. Bit Line Voltage Drop
Using a 5 V Vdd and allowing OutB to drop to 4 V minimum
[Figure: plot of bit line voltage drop (axis 0 to 2) versus gate length (axis 0 to 7).]
Delay from SAE to Out
[Figure: plot of time in ns (axis 0 to 2.5) versus gate length (axis 0 to 7).]
• From 50% SAE high to 50% Out low
• Same parameters as the bit line voltage graph
Memory Partitioning

• 32 blocks × 256 rows × 128 columns
  • Balance between idle block power savings and peripheral circuitry
  • Resulting block aspect ratio is relatively square to limit maximum WL/BL capacitances
  • WL partitioning and four words/row to reduce power
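As a quick check of the partitioning above, the sketch below computes the total capacity and the number of address bits each level of the hierarchy implies. The function and key names are illustrative only, and the word width is derived from the slide figures (128 columns at four words per row) rather than stated directly in the deck.

```python
# Figures from the slide: 32 blocks x 256 rows x 128 columns, 4 words per row.
BLOCKS, ROWS, COLUMNS, WORDS_PER_ROW = 32, 256, 128, 4


def bits_needed(n):
    """Address bits needed to select one of n items (n a power of two)."""
    return n.bit_length() - 1


def partition_summary(blocks=BLOCKS, rows=ROWS, columns=COLUMNS,
                      words_per_row=WORDS_PER_ROW):
    """Return capacity and address-bit counts for a blocked SRAM array."""
    return {
        "word_bits": columns // words_per_row,       # bits per word
        "total_bits": blocks * rows * columns,       # total storage
        "block_bits": bits_needed(blocks),           # selects one block
        "row_bits": bits_needed(rows),               # selects one word line
        "word_sel_bits": bits_needed(words_per_row), # word within a row
    }


summary = partition_summary()
addr_bits = (summary["block_bits"] + summary["row_bits"]
             + summary["word_sel_bits"])
print(summary, "address bits:", addr_bits)
```

With the slide's numbers this gives 15 word-address bits, which is consistent with ADDR 14 being the highest address bit shown in the diagrams.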
Simulation Model

• Multiple voltage sources to accurately measure energy
  • Wordline, active column, inactive column, and peripheral
• E_total = E_WL + 32 E_act + 96 E_inact + E_peripheral
[Figure: simulation schematic. Labels: sense amps, bitcells, dummy cells, VSS.]
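The energy bookkeeping above can be written out as a small sketch. The 32 active and 96 inactive columns correspond to a 32-bit access in a 128-column block; the component energies passed in below are placeholder numbers chosen for illustration, not simulation results from this project.

```python
def total_energy(e_wl, e_act, e_inact, e_peripheral,
                 columns=128, active_columns=32):
    """E_total = E_WL + 32*E_act + 96*E_inact + E_peripheral (with defaults)."""
    inactive_columns = columns - active_columns   # 96 for the slide's block
    return (e_wl
            + active_columns * e_act
            + inactive_columns * e_inact
            + e_peripheral)


# Placeholder per-component energies in pJ (hypothetical values).
example = total_energy(e_wl=2.0, e_act=1.0, e_inact=0.5, e_peripheral=3.0)
print(f"E_total = {example:.1f} pJ")
```

Measuring each component on its own supply is what makes this sum possible: one simulated column of each kind is scaled by the number of columns it represents.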
Low Power Techniques
Optimal Signal Order for Energy
Goal: make the WL pulse as short as possible.
Read
• SAE must be asserted only after the WL pulse ends.
Write
• The WL pulse must start only after BL or BLB has completely discharged.
[Figure: timing diagram of CLK, BL, WL, and SAE across Write '0', Read, Write '1', Read cycles.]
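The two ordering rules above can be expressed as simple checks on edge times, for example when inspecting simulated waveforms. This is only a sketch: the function names and the edge times are hypothetical and are not taken from the design.

```python
def read_order_ok(wl_fall, sae_rise):
    """Read: SAE may be asserted only after the WL pulse has ended."""
    return sae_rise > wl_fall


def write_order_ok(bl_discharged, wl_rise):
    """Write: WL pulse may start only after BL or BLB is fully discharged."""
    return wl_rise > bl_discharged


# Hypothetical edge times in ns, measured from the rising clock edge.
print(read_order_ok(wl_fall=1.2, sae_rise=1.3))        # SAE follows WL
print(write_order_ok(bl_discharged=0.8, wl_rise=0.6))  # WL starts too early
```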
Lower Vdd
Energy = C_eff · Vdd^2 (rail to rail)
- Expected quadratic energy reduction
Energy = C_eff · Vdd · ∆V (BL/BLB during read)
- ∆V should scale down, but may not scale as fast as Vdd, so we expect between linear and quadratic energy reduction.
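To make the linear-versus-quadratic range concrete, the sketch below evaluates both energy expressions when Vdd is halved. The capacitance and voltage values are arbitrary assumptions chosen only to show the ratios; the two ∆V cases (fixed, and scaling in proportion to Vdd) bracket the expected behavior.

```python
def rail_to_rail_energy(c_eff, vdd):
    """Energy = C_eff * Vdd^2 for a full-swing node."""
    return c_eff * vdd ** 2


def bitline_read_energy(c_eff, vdd, delta_v):
    """Energy = C_eff * Vdd * dV for a bit line that swings by dV only."""
    return c_eff * vdd * delta_v


C_EFF = 1.0  # arbitrary units; only ratios matter here (assumed value)

# Halve Vdd from 2.0 V to 1.0 V.
rail_ratio = rail_to_rail_energy(C_EFF, 1.0) / rail_to_rail_energy(C_EFF, 2.0)

# Bounding assumptions for dV: held fixed, or scaled in proportion to Vdd.
fixed_dv_ratio = (bitline_read_energy(C_EFF, 1.0, 0.2)
                  / bitline_read_energy(C_EFF, 2.0, 0.2))
scaled_dv_ratio = (bitline_read_energy(C_EFF, 1.0, 0.1)
                   / bitline_read_energy(C_EFF, 2.0, 0.2))

print(rail_ratio, fixed_dv_ratio, scaled_dv_ratio)
```

Halving Vdd cuts rail-to-rail energy to a quarter, while the bit line energy falls to somewhere between a half (fixed ∆V) and a quarter (∆V proportional to Vdd).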
Simulation Result for 1 bit
[Figure: Energy vs Vdd for 1 bit read/write. Energy in pJ (axis 0 to 6) versus Vdd in V (axis 0 to 3); curves: WL, Write, Read, DRead.]
Note: The Read/Write/DRead energy shown here is BL energy only.
[Figure: Energy vs Vdd for 32 bit read/write. Energy in pJ (axis 0 to 250) versus Vdd in V (axis 0 to 3); curves: TotalWrite, TotalRead, Total Average. Annotation: How far should we go?]
E_Read32 = E_WL + 32 E_Read + 96 E_DRead
E_Write32 = E_WL + 32 E_Write + 96 E_DRead
Clock Gating
• Try to reduce the capacitance that high-activity signals have to drive.
  • Example: the WL pulse has to drive 256 2-input NAND gates!
Level 0: EffLoad = 256
Level 1: EffLoad = 128 + 2
Even Further
Level 2: EffLoad = 64 + 4
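The effective-load numbers above follow a simple pattern, sketched below. The general formula is an assumption inferred from the three slide values (256, 128+2, 64+4); level 3 in the loop is an extrapolation of that pattern and does not appear on the slide.

```python
def effective_load(level, total_gates=256):
    """Gates driven by the WL pulse for a given clock-gating level.

    Level 0 drives all gates directly. At level k >= 1 the pulse drives
    total_gates / 2**k row gates in the selected group plus 2**k gating
    gates, matching the slide values 128+2 (level 1) and 64+4 (level 2).
    """
    if level == 0:
        return total_gates
    groups = 2 ** level
    return total_gates // groups + groups


for level in range(4):
    print(level, effective_load(level))
```

Each added level halves the number of row gates the pulse sees but doubles the number of gating gates, so the benefit of each further level shrinks.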
Simulation Result
[Figure: Energy vs Clock Gating Level. Energy in pJ (axis 0 to 25) versus level of clock gating (axis 0 to 3.5).]
Some notes about clock gating
• It acts like a decoder. In our design we chose level 2 clock gating for the WL pulse, so we no longer need an 8-to-256 decoder; we only need the 6-to-64 decoder.
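The decoder saving can be stated as a one-line calculation. This sketch assumes each gating level absorbs exactly one row-address bit, which is consistent with the 8-to-256 and 6-to-64 figures above; the function name is illustrative.

```python
def row_decoder_size(row_addr_bits=8, gating_level=2):
    """Return (input bits, outputs) of the row decoder left after gating.

    Assumes each gating level absorbs one row-address bit, so an
    8-to-256 decoder becomes 6-to-64 at level 2.
    """
    bits = row_addr_bits - gating_level
    return bits, 2 ** bits


print(row_decoder_size())      # level 2 gating, as chosen in the design
print(row_decoder_size(8, 0))  # no gating
```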