Transcript BL(1)

ECE 300
Advanced VLSI Design
Fall 2006
Lecture 19: Memories
Yunsi Fei
[Adapted from Jan Rabaey et al’s Digital Integrated
Circuits ©2002, PSU Irwin & Vijay © 2002, and
Princeton Wayne Wolf’s Modern VLSI Design © 2002 ]
Digital Integrated Circuits Chpt. 12
Review: Basic Building Blocks

Datapath
– Execution units
» Adder, multiplier, divider, shifter, etc.
– Register file and pipeline registers
– Multiplexers, decoders

Control
– Finite state machines (PLA, ROM, random logic)

Interconnect
– Switches, arbiters, buses

Memory
– Caches (SRAMs), TLBs, DRAMs, buffers
A Typical Memory Hierarchy

By taking advantage of the principle of locality:
– Present the user with as much memory as is available in the cheapest
technology.
– Provide access at the speed offered by the fastest technology.
[Figure: memory hierarchy, from the datapath outward]
On-Chip Components: Control; Datapath (RegFile); ITLB, DTLB; Instr Cache, Data Cache; eDRAM
Further levels: Second Level Cache (SRAM); Main Memory (DRAM); Secondary Memory (Disk)
Speed (ns), fastest to slowest level: .1’s, 1’s, 10’s, 100’s, 1,000’s
Size (bytes), smallest to largest level: 100’s, K’s, 10K’s, M’s, T’s
Cost: highest (nearest the datapath) to lowest (disk)
Semiconductor Memories
RWM (read-write memory)
– Random Access: SRAM (cache, register file), DRAM
– Non-Random Access: FIFO/LIFO, Shift Register, CAM
NVRWM (nonvolatile read-write memory)
– EPROM, E2PROM, FLASH
ROM (read-only memory)
– Mask-programmed
– Electrically-programmed (PROM)
Growth in DRAM Chip Capacity
[Figure: Kbit capacity (log scale, 10 to 1,000,000) vs. year of introduction (1980 to 2000); labeled data points: 64, 256, 1,000, 4,000, 16,000, 64,000, and 256,000 Kbit]
1D Memory Architecture
[Figure: two arrays of n words (Word 0 to Word n-1), each m bits wide, built from storage cells and connected to an Input/Output port]
Left: select signals S0 to Sn-1 drive the words directly
– n words → n select signals
Right: a decoder with address inputs A0 to Ak-1 generates S0 to Sn-1
– Decoder reduces # of inputs
– k = log2 n
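As a quick check of the relation k = log2 n, the sketch below computes how many address bits a decoder needs for a given number of words. The function name `address_bits` is illustrative and not from the slides, and the sketch assumes n is a power of two.

```python
def address_bits(n_words: int) -> int:
    """Return k such that 2**k == n_words (k = log2 n)."""
    if n_words < 1 or n_words & (n_words - 1):
        raise ValueError("n_words must be a positive power of two")
    return n_words.bit_length() - 1


# Without a decoder, n words need n select signals;
# with a decoder, only k address inputs are needed.
for n in (4, 1024, 1 << 20):
    print(f"{n} words: {n} select signals vs. {address_bits(n)} address bits")
```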
2D Memory Architecture
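[Figure: array of storage (RAM) cells at the crossings of word lines and bit lines]
– Row address Aj, Aj+1, ..., Ak-1 selects one of 2^(k-j) word lines
– Each row is m∙2^j bits wide
– Column address A0, A1, ..., Aj-1 drives the Column Decoder, which selects the appropriate word from the memory row
– Sense Amplifiers amplify the bit line swing
– Read/Write Circuits connect to Input/Output (m bits)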
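The row/column split of the address can be sketched as follows. This is a minimal illustration, assuming the low j bits (A0 to Aj-1) form the column address and the remaining k-j bits form the row address; the function name `split_address` and the example values of k and j are hypothetical.

```python
def split_address(addr: int, k: int, j: int) -> tuple:
    """Split a k-bit word address into (row, column).

    Column bits are A0..Aj-1 (low bits); row bits are Aj..Ak-1 (high bits).
    """
    if not 0 <= j <= k:
        raise ValueError("need 0 <= j <= k")
    if not 0 <= addr < (1 << k):
        raise ValueError("address out of range")
    column = addr & ((1 << j) - 1)  # selects one of 2**j words in the row
    row = addr >> j                 # selects one of 2**(k-j) word lines
    return row, column


# Example: k = 10 address bits, j = 3 column bits
# gives 2**7 = 128 word lines and 2**3 = 8 words per row.
print(split_address(0b1010011_101, k=10, j=3))
```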
3D Memory Architecture
[Figure: memory divided into blocks that share an Input/Output port (m bits)]
Advantages:
1. Shorter word and/or bit lines
2. Block address activates only 1 block, saving power
Precharged MOS NOR ROM
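[Figure: 4x4 precharged MOS NOR ROM with a precharge device to Vdd on each bit line, word lines WL(0) to WL(3), and GND rows shared between word line pairs]
– Signal transitions shown: precharge 0 → 1, WL(1) 0 → 1
– Values shown on the bit lines: BL(0): 1, 0; BL(1): 1, 1; BL(2): 1, 1; BL(3): 1, 0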
MOS NOR ROM Layout
[Figure: layout with word lines WL(0) to WL(3) in Polysilicon, GND rows in diffusion, and bit lines BL(0) to BL(3) in Metal1 on top of diffusion]
Basic cell: 10λ x 7λ
Only 1 layer (contact mask) is used to program the memory array, so
programming of the ROM can be delayed to one of the last process steps.
Transient Model for NOR ROM
[Figure: word line WL (poly) modeled as a distributed line with rword and cword per cell; bit line BL (metal1) modeled with capacitance Cbit and a precharge device]
Word line parasitics
Resistance/cell: 35 Ω
Wire capacitance/cell: 0.65 fF
Gate capacitance/cell: 5.10 fF
Bit line parasitics
Resistance/cell: 0.15 Ω
Wire capacitance/cell: 0.83 fF
Drain capacitance/cell: 2.60 fF
Propagation Delay of NOR ROM

Word line delay
– Delay of a distributed rc-line containing M cells
$t_{word} = 0.38\,(r_{word} \times c_{word})\,M^2$
= 20 nsec for $M = 512$

Bit line delay
– Assuming min size pull-down and 3*min size pull-up with reduced
swing bit lines (5V to 2.5V)
$C_{bit} = 1.7$ pF and $I_{avHL} = 0.36$ mA, so
$t_{HL} = t_{LH} = 5.9$ nsec
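The two delay figures can be reproduced with a few lines of arithmetic. The sketch below uses the per-cell word line parasitics from the transient model slide (35 Ω, 0.65 fF wire plus 5.10 fF gate) and the quoted Cbit and average current. One value is an assumption rather than something the slide states: the bit line delay is taken at half of the 2.5 V swing (1.25 V), which is what makes C·ΔV/I come out at the quoted 5.9 nsec.

```python
# Per-cell word line parasitics from the transient model.
R_WORD = 35.0                     # ohms per cell
C_WORD = (0.65 + 5.10) * 1e-15    # wire + gate capacitance per cell, in farads
M = 512                           # cells along the word line

# Distributed rc line: t_word = 0.38 * (r_word * c_word) * M^2
t_word = 0.38 * R_WORD * C_WORD * M**2

C_BIT = 1.7e-12                   # farads, as quoted on the slide
I_AV = 0.36e-3                    # amperes, as quoted on the slide
DELTA_V = 1.25                    # volts; ASSUMPTION: half of the 2.5 V swing

# Constant-current discharge: t = C * dV / I
t_bit = C_BIT * DELTA_V / I_AV

print(f"t_word = {t_word * 1e9:.1f} ns")
print(f"t_bit  = {t_bit * 1e9:.1f} ns")
```

The quadratic dependence on M is why the word line, not the bit line, dominates here.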
Read-Write Memories (RAMs)

Static – SRAM
– data is stored as long as supply is applied
– large cells (6 fets/cell) – so fewer bits/chip
– fast – so used where speed is important (e.g., caches)
– differential outputs (output BL and !BL)
– use sense amps for performance
– compatible with CMOS technology

Dynamic – DRAM
– periodic refresh required
– small cells (1 to 3 fets/cell) – so more bits/chip
– slower – so used for main memories
– single ended output (output BL only)
– need sense amps for correct operation
– not typically compatible with CMOS technology
Memory Timing Definitions
[Figure: timing diagram of the Read, Write, and Data signals, marking the Read Cycle, Read Access, Write Cycle, Write Setup, Write Hold, and Data Valid intervals]
4x4 SRAM Memory
[Figure: 4x4 array of SRAM cells storing 2 bit words; A1 and A2 select one of WL[0] to WL[3], and A0 drives the Column Decoder; each column has a BL/!BL pair with bit line precharge, sense amplifiers, and write circuitry; a clocking and control block generates read, precharge, and enable]
2D Memory Configuration
[Figure: memory array organization with two Sense Amps blocks]
Decreasing Word Line Delay
Drive the word line from both sides
[Figure: polysilicon word line WL with a driver at each end, fed by a metal word line]
Use a metal bypass
[Figure: polysilicon word line WL in parallel with a metal bypass]
Use silicides
Decreasing Bit Line Delay (and Energy)

Reduce the bit line voltage swing
– need sense amp for each column to sense/restore signal

Isolate memory cells from the bit lines after sensing (to
prevent the cells from changing the bit line voltage further) - pulsed word line
– generation of word line pulses very critical
» too short - sense amp operation may fail
» too long - power efficiency degraded (because bit line swing size depends on
duration of the word line pulse)
– use feedback signal from bit lines

Isolate sense amps from bit lines after sensing (to prevent bit
lines from having large voltage swings) - bit line isolation
Pulsed Word Line Feedback Signal
[Figure: Read, Word line, Bit lines, and Dummy bit lines (10% populated) producing a Complete signal]
Dummy column
– height set to 10% of a regular column and its cells are tied to a fixed value
– capacitance is only 10% of a regular column
Pulsed Word Line Timing
[Figure: waveforms of Read, Complete, Word line, Bit line (ΔV = 0.1Vdd), and Dummy bit line (ΔV = Vdd)]
Dummy bit lines have reached full swing and trigger pulse shut off when regular bit
lines reach 10% swing
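The 10% figure follows from a simple constant-current picture: a cell that discharges a line of capacitance C with current I produces a swing of I·t/C after time t. The sketch below assumes the same cell current on the dummy and regular columns; the supply, capacitance, and current values are hypothetical and chosen only for illustration.

```python
VDD = 2.5                    # volts, hypothetical supply
C_REGULAR = 1.0e-12          # farads, hypothetical regular bit line capacitance
C_DUMMY = 0.1 * C_REGULAR    # dummy column has 10% of the capacitance
I_CELL = 0.2e-3              # amperes, hypothetical cell current (same on both columns)

# Time for the dummy bit line to reach full swing (dV = Vdd).
t_complete = C_DUMMY * VDD / I_CELL

# Swing on a regular bit line at that same moment.
dv_regular = I_CELL * t_complete / C_REGULAR

print(f"regular bit line swing = {dv_regular / VDD:.0%} of Vdd")
```

Under these assumptions the result does not depend on the chosen current or capacitance, only on the 10% capacitance ratio.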
Bit Line Isolation
[Figure: bit lines (ΔV = 0.1Vdd) connected through isolate devices to the sense amplifier, controlled by Read and sense; sense amplifier outputs swing ΔV = Vdd]
6-transistor SRAM Cell
[Figure: cross-coupled inverters M1/M2 and M3/M4 storing !Q and Q; pass transistors M5 and M6, gated by WL, connect !Q to !BL and Q to BL]
SRAM Cell Analysis (Read)
[Figure: 6T cell during a read with WL=1, !Q=0, Q=1, and !BL=1, BL=1 (each bit line loaded by Cbit); devices shown: M1, M4, M5, M6]
Read-disturb (read-upset): must carefully limit the allowed voltage
rise on !Q to a value that prevents the read-upset condition from
occurring while simultaneously maintaining acceptable circuit
speed and area constraints
SRAM Cell Analysis (Read)
Cell Ratio (CR) = $(W_{M1}/L_{M1})/(W_{M5}/L_{M5})$
$V_{!Q} = \dfrac{(V_{dd} - V_{Tn})\left(1 + CR \pm \sqrt{CR(1 + CR)}\right)}{1 + CR}$
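To see how the cell ratio limits the voltage rise on !Q, the sketch below evaluates the expression numerically. It assumes the form V!Q = (Vdd - VTn)(1 + CR - sqrt(CR(1 + CR)))/(1 + CR), taking the root that stays below Vdd - VTn, and uses Vdd = 2.5 V and VTn = 0.5 V as defaults. The function name `read_voltage_rise` is illustrative.

```python
import math


def read_voltage_rise(cr: float, vdd: float = 2.5, vtn: float = 0.5) -> float:
    """Voltage on !Q during a read, using the root below Vdd - VTn (an assumption)."""
    if cr <= 0:
        raise ValueError("cell ratio must be positive")
    return (vdd - vtn) * (1 + cr - math.sqrt(cr * (1 + cr))) / (1 + cr)


for cr in (0.5, 1.0, 1.5, 2.0):
    print(f"CR = {cr:.1f}: V(!Q) = {read_voltage_rise(cr):.2f} V")
```

In this model a larger cell ratio (a stronger pull-down M1 relative to the pass transistor M5) gives a smaller rise on !Q.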
Read Voltages Ratios
[Figure: Voltage Rise on !Q (0 to 1.2 V) vs. Cell Ratio (CR) (0 to 2.4), for Vdd = 2.5V and VTn = 0.5V]
SRAM Cell Analysis (Write)
[Figure: 6T cell during a write with WL=1, !Q=0, Q=1, and !BL=1, BL=0; devices shown: M1, M4, M5, M6]
Pullup Ratio (PR) = $(W_{M4}/L_{M4})/(W_{M6}/L_{M6})$
$V_Q = (V_{dd} - V_{Tn}) \pm \sqrt{(V_{dd} - V_{Tn})^2 - (\mu_p/\mu_n)(PR)(V_{dd} - V_{Tn} - V_{Tp})^2}$
Write Voltages Ratios
[Figure: Write Voltage (VQ) (0 to 1 V) vs. Pullup Ratio (PR) (0 to 2.4), for Vdd = 2.5V, |VTp| = 0.5V, and μp/μn = 0.5]
Cell Sizing


Keeping cell size minimized is critical for large caches
Minimum sized pull down fets (M1 and M3)
– Requires minimum width and longer than minimum channel length
pass transistors (M5 and M6) to ensure proper CR
– But sizing the pass transistors this way increases the capacitive load on the word
lines and limits the current discharged on the bit lines, both of which
can adversely affect the speed of the read cycle

Minimum width and length pass transistors
– Boost the width of the pull downs (M1 and M3)
– Reduces the loading on the word lines and increases the storage
capacitance in the cell – both are good! – but cell size may be slightly
larger
6T-SRAM Layout
[Figure: layout with VDD and GND rails, pull-ups M2 and M4, pull-downs M1 and M3, pass transistors M5 and M6, storage nodes Q and !Q, bit lines BL and !BL, and word line WL]
Multiple Read/Write Port Cell
[Figure: 6T cell (M1 to M6, storing Q and !Q) with a second port: word lines WL1 and WL2, bit line pairs BL1/!BL1 and BL2/!BL2, and additional transistors M7 and M8]
4x4 DRAM Memory
[Figure: 4x4 array of DRAM cells storing 2 bit words; A1 and A2 select one of WL[0] to WL[3], and A0 drives the Column Decoder; single-ended bit lines BL[0] to BL[3] with bit line precharge, sense amplifiers, and write circuitry; a clocking, control, and refresh block generates read, precharge, and enable]
3-Transistor DRAM Cell
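[Figure: 3T cell with write word line WWL, read word line RWL, bit lines BL1 and BL2, transistors M1, M2, and M3, and storage capacitance Cs at node X; waveforms of WWL, RWL, X, BL1, and BL2 for a write followed by a read, with the levels Vdd and Vdd-Vt and a swing ΔV marked]
No constraints on device sizes (ratioless)
Reads are non-destructive
Value stored at node X when writing a “1” is VWWL - Vtn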
3T-DRAM Layout
[Figure: layout showing BL2, BL1, GND, RWL, WWL, and transistors M1, M2, and M3]
1-Transistor DRAM Cell
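[Figure: 1T cell with access transistor M1 gated by WL, connecting storage capacitor Cs (node X) to bit line BL with capacitance CBL; waveforms of WL, X, and BL for a write of “1” and a read of “1”, with the levels Vdd, Vdd-Vt, and Vdd/2 and the sensing phase marked]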
Write: Cs is charged (or discharged) by asserting WL and BL
Read: Charge redistribution occurs between CBL and Cs
Read is destructive, so must refresh after read
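The charge redistribution read can be put into numbers with the usual charge-sharing relation, ΔV = (V_X - V_pre) · Cs / (Cs + CBL), where V_pre is the bit line precharge voltage. The sketch below is an illustration under stated assumptions: the bit line is precharged to Vdd/2, a stored “1” sits at Vdd - Vt, and the values of Vdd, Vt, Cs, and CBL are hypothetical. The function name `bitline_swing` is also illustrative.

```python
def bitline_swing(v_x: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit line voltage change after Cs shares its charge with CBL."""
    return (v_x - v_pre) * c_s / (c_s + c_bl)


VDD, VT = 2.5, 0.5             # volts, hypothetical
C_S, C_BL = 30e-15, 300e-15    # farads, hypothetical

v_pre = VDD / 2                # bit line precharged to Vdd/2
dv_one = bitline_swing(VDD - VT, v_pre, C_S, C_BL)   # cell stores a "1" at Vdd - Vt
dv_zero = bitline_swing(0.0, v_pre, C_S, C_BL)       # cell stores a "0"

print(f"reading 1: {dv_one * 1e3:+.0f} mV, reading 0: {dv_zero * 1e3:+.0f} mV")
```

Because CBL is much larger than Cs, the swing is only a small fraction of the supply, which is why each bit line needs a sense amplifier.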
1-T DRAM Cell
[Figure: (a) cross-section and (b) layout of the 1-T DRAM cell, showing the metal (M1) word line, polysilicon gate, diffused n+ bit line, field oxide and SiO2, and a capacitor formed between the polysilicon plate and an inversion layer induced by plate bias]
Uses polysilicon-diffusion capacitance
Expensive in area
DRAM Cell Observations
DRAM memory cells are single ended (complicates the design of the sense amp)
1T cell requires a sense amp for each bit line due to charge redistribution read
1T cell read is destructive; refresh must follow to restore data
1T cell requires an extra capacitor that must be explicitly included in the design
A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than Vdd)