Lecture 7—Memory Sub

ELEC 516 VLSI System Design and
Design Automation Spring 2010
Lecture 7 - Memory subsystem Design
Reading Assignment:
Chapter 10 of Rabaey
Chapter 8.3 of Weste
Note: some of the figures in this slide set are adapted from the slide set
of "Digital Integrated Circuits" by Rabaey et al., Copyright 2003
ELEC516/10 Lecture 7
Memory Element
• A register can be used as a storage element. It is fast, but its
area cost is too high for large amounts of storage.
• Dedicated memory arrays are therefore commonly used to store
large amounts of temporary data.
• Memories can be made smaller and faster by taking advantage
of analog design techniques.
• Types of memory (by read-write operation)
– Read-write memory
– Non-volatile read-write memory (read-mostly memory)
– Read-only memory
• Types of memory (by access method)
– Random access memory
– Serial access memory
– Content-addressable memory
• Another classification: by number of read/write ports
Semiconductor Memory Classification
• RWM (read-write memory): volatile memory, data is lost when the supply voltage is turned off
– Random access: SRAM, DRAM
– Non-random access: FIFO, LIFO, Shift Register, CAM
• NVRWM (non-volatile read-write memory): EPROM, E2PROM, FLASH
• ROM (read-only memory): Mask-Programmed, Programmable (PROM)
• RAM is not a very representative acronym because ROM & NVRAM are also random access memories.
• Video applications may require other types of access.
Memory Architecture: Decoders
Array-Structured Memory Architecture
[Figure: array-structured memory. Address bits AK … AL-1 feed the row decoder, which drives 2^(L-K) word lines; address bits A0 … AK-1 feed the column decoder. Each row of storage cells holds M·2^K bits, read out over the bit lines through the sense amplifiers / drivers to an M-bit input-output.]
• Problem with one word per row: ASPECT RATIO, HEIGHT >> WIDTH. The resulting design would be extremely slow (Ex: for 1M words of 8 bits, the height is about 128,000 times larger than the width). Keep the aspect ratio close to 2:1.
• Multiple words are therefore stored in a single row, and a column decoder is required to select the appropriate word.
• The sense amplifiers amplify the bit-line swing to rail-to-rail amplitude. Reducing the voltage swing on the bit lines reduces both the propagation delay and the power consumption.
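The row/column split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_address` and `array_shape` are hypothetical, and the low K address bits are assumed to go to the column decoder, as in the figure labels.

```python
def split_address(addr: int, total_bits: int, column_bits: int) -> tuple[int, int]:
    """Split an L-bit word address into (row, column) indices.

    Assumption: the low K bits (A0..AK-1) go to the column decoder and
    the remaining L-K bits (AK..AL-1) go to the row decoder.
    """
    if not 0 <= addr < (1 << total_bits):
        raise ValueError("address out of range")
    column = addr & ((1 << column_bits) - 1)  # low K bits
    row = addr >> column_bits                 # high L-K bits
    return row, column


def array_shape(total_bits: int, column_bits: int, word_bits: int) -> tuple[int, int]:
    """Return (rows, bit columns) = (2**(L-K), M * 2**K)."""
    rows = 1 << (total_bits - column_bits)
    bit_columns = word_bits * (1 << column_bits)
    return rows, bit_columns


# 1M words of 8 bits: L = 20, M = 8.
rows, cols = array_shape(20, 0, 8)    # one word per row
print(rows, cols, rows // cols)       # very tall array
rows, cols = array_shape(20, 10, 8)   # K = 10: 1024 words share a row
print(rows, cols)
print(split_address(2563, 12, 4))
```

With K = 0 the array is 2^20 rows by 8 columns, which is the extreme aspect ratio the slide warns about; choosing K = 10 folds it to 1024 rows by 8192 columns.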
Hierarchical Memory Architecture
[Figure: hierarchical memory. The row, column, and block addresses select a word inside one of several blocks; a block selector, control circuitry, and a global amplifier/driver connect the blocks to the I/O over a global data bus.]
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings
Block Diagram of 4 Mbit SRAM
[Figure: block diagram of a 4 Mbit SRAM [Hirose90], built from 128 K array blocks (block labels run up to Block 31). Labeled elements: clock generator; X-, Y-, and Z-address buffers; pre-decoder and block selector; global, subglobal, and local row decoders; bit line load; transfer gate; column decoder; sense amplifier and write driver; CS, WE buffer; I/O buffer; x1/x4 controller.]
Row address (X), Column address (Y) and block address (Z).
Content-Addressable Memory
[Figure: CAM organization. A CAM array of 2^9 words x 64 bits with 2^9 validity bits, an address decoder, and a priority encoder; comparand and mask registers; I/O buffers for data (64 bits) and commands; control logic with a 9-bit R/W address.]
Instead of using an address to locate the data, a CAM takes a data word as input, in a query-style format. When the input matches a data word stored in the
memory, a Match flag is raised.
Memory Timing: Definitions
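[Figure: timing waveforms for the READ, WRITE, and DATA signals, marking the read cycle, read access time, write cycle, write access time, and the points where data is valid (read) and data is written (write).]
Four important timing parameters are defined:
read access time, write access time, read cycle time, and write cycle time.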
Memory Timing: Approaches
• DRAM timing: multiplexed addressing
– The lower and upper halves of the address (MSB … LSB) are presented sequentially on the same address bus, as a row address and a column address.
– RAS-CAS timing. RAS: row address strobe. CAS: column address strobe.
• SRAM timing: self-timed
– An address transition initiates the memory operation.
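A minimal sketch of multiplexed addressing, assuming the row address is the upper half of the full address and is presented first. The function name `multiplex_address` and the (strobe, value) representation are illustrative assumptions, not part of the slides.

```python
def multiplex_address(addr: int, row_bits: int, col_bits: int) -> list[tuple[str, int]]:
    """Return the sequence of (strobe, value) pairs seen on the shared address bus.

    Assumption: the row address is the upper half and is latched with RAS;
    the column address is the lower half and is latched with CAS.
    """
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range")
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return [("RAS", row), ("CAS", col)]


# A 10-bit address sent over a 5-bit bus in two steps.
for strobe, value in multiplex_address(0b1010100101, 5, 5):
    print(strobe, format(value, "05b"))
```

The benefit is that the bus (and pin count) only needs to be as wide as the larger of the two halves, at the cost of a two-step access.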
Random Access Memory
• Random access memory has an access time that is independent of the
physical location of the data. This contrasts with serial-access memories, which have some latency associated
with the reading or writing of a particular datum, and with
content-addressable memories.
• Both read-only memory (ROM) and read/write memory (RAM)
are random access memories. They can be further divided
into static-load, synchronous, and asynchronous types.
Synchronous RAMs or ROMs require a clock edge to
enable operation. The address to a synchronous memory
only needs to be valid for a certain setup and hold window
around the clock edge.
• The memory cells used in RAMs can be further divided
into static and dynamic structures. Static structures use
some form of latched storage, while dynamic structures
use charge stored on a capacitor.
Read-Only Memories
• Programs for processors with fixed applications,
such as washing machines, calculators, and game
machines, need only be read once they have been
developed and debugged.
– One transistor per bit of storage
• The content of a ROM is fixed during manufacturing
(mask defined)
– Leading to small and fast implementations.
• Many approaches can be used:
– Diode based, NMOS or PMOS based, pull-up versus
pull-down.
Read-Only Memory Cells
• Static memory structure in which the state is retained indefinitely,
even without power.
[Figure: three ROM cell styles (diode ROM, MOS ROM 1, MOS ROM 2), each drawn storing a 1 and a 0, with bit line BL, word line WL, and the VDD or GND connection.]
• Diode ROM: the bit line is not isolated from the word line. The current to charge the BL must be provided through the word line and its drivers.
• MOS ROM: the bit line is isolated from the word line. The word line is only responsible for charging and discharging the word line capacitance. Isolation comes at the expense of larger silicon area (GND bus).
Memory core - ROM
• One implementation - NOR array, (pseudo-NMOS)
[Figure: 4x4 NOR ROM array with word lines WL[0]–WL[3], bit lines BL[0]–BL[3], pull-up devices to VDD, and GND buses.]
• The supply rail must be distributed throughout the array. The overhead is reduced by sharing GND buses between neighboring cells.
• Pseudo-NMOS is faster, easier to design, and requires
no timing. However, it draws DC power. The DC
power can be significantly reduced by turning the
pull-ups on according to the column address
decoding.
MOS NOR ROM Layout
[Figure: layout of a 4x4 MOS NOR ROM with word lines WL0–WL3, bit lines BL0–BL3, GND and VDD rails. Cell size 9.5λ x 7λ. Layers shown: polysilicon, metal1 (connection to BL), diffusion, metal1 on diffusion.]
• Programming using the active layer only: transistors are selectively added where needed.
MOS NOR ROM Layout
[Figure: layout of a 4x4 MOS NOR ROM with word lines WL0–WL3, bit lines BL0–BL3, GND and VDD rails. Cell size 11λ x 7λ, with some space lost. Layers shown: polysilicon, metal1, diffusion, metal1 on diffusion.]
• Programming using the contact layer only: contacts are selectively added where needed.
MOS NOR ROM Layout
• The active-layer implementation results in an area saving of about 15%
compared to the contact implementation.
• Contact programming has the advantage that the contact
layer is a later step in the manufacturing process.
• Wafers can then be prefabricated up to the contact mask and
stockpiled. The remaining steps can be executed quickly.
• Turnaround time is shorter using the contact approach.
• The choice between the two strategies depends on:
– Size/cost/performance
– Turnaround time
• In both approaches, a large part of the cell is devoted to
the bit line contact and ground connection.
• One way to avoid that is to use a NAND configuration.
MOS NAND ROM
[Figure: 4x4 MOS NAND ROM with pull-up devices to VDD on bit lines BL[0]–BL[3] and word lines WL[0]–WL[3] gating series transistor chains.]
• Fewer contacts and no GND bus: more compact memory cell.
• All word lines are high by default, with the exception of the selected row.
MOS NAND ROM Layout
[Figure: layout of a 4x4 MOS NAND ROM with word lines WL[0]–WL[3] and bit lines BL[0]–BL[3]. Cell size 8λ x 7λ. Layers shown: polysilicon, diffusion, metal1 on diffusion. Labels mark a short-circuited transistor and a transistor with no short.]
• Programming using the metal-1 layer only.
• No contact to VDD or GND necessary; drastically reduced cell size.
• 15% gain in area compared to NOR ROM.
• Loss in performance compared to NOR ROM.
NAND ROM Layout
[Figure: NAND ROM layout programmed with threshold-altering implants. Cell size 5λ x 6λ. Layers shown: polysilicon, metal1 on diffusion, threshold-altering implant.]
• Programming using implants only. This option may not always be available to designers.
• The implant lowers the threshold voltage of the MOS transistor so that it is always ON.
• No short is required, and hence no contact.
• The memory is more than 2 times smaller than the NOR ROM.
NOR/NAND ROM sizing
• The analysis applied to the pseudo-NMOS inverter is used.
• The question is: when the pull-down (PD) transistor is ON, what is VOL?
• For a NOR ROM, assuming a 0.25 µm process and a 2.5 V supply,
VOL = 1.5 V (a 1 V swing, to keep the NMOS small) requires a
PMOS/NMOS ratio of 2.62.
• For a NAND ROM, due to the chaining, the value of VOL is a
function of both the size of the memory (number of rows) and
the programming.
• The worst case occurs when all bits in a column are set to 1, which
means N transistors are in series in the PD.
• The PD chain can then be replaced by a single transistor that is N times longer.
• For an (8x8) array, (W/L)p = 0.49; for a (512x512) array, however,
(W/L)p = 0.0077.
• NAND ROMs are very rarely used for arrays with more than 8 or
16 rows.
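The two (W/L)p values quoted above are consistent with a simple inverse scaling in the number of rows. The sketch below assumes that the series chain behaves like one transistor N times longer, so the pull-up ratio needed for a fixed VOL shrinks as 1/N. The helper name `nand_pullup_ratio` is hypothetical, and the 8-row reference point is the one quoted on the slide.

```python
def nand_pullup_ratio(rows: int, ref_rows: int = 8, ref_ratio: float = 0.49) -> float:
    """Estimate the pull-up (W/L)p for a NAND ROM column with `rows` series devices.

    Assumption: N series pull-downs act like one device N times longer,
    so the required pull-up ratio scales as 1/N from the reference point.
    """
    if rows <= 0:
        raise ValueError("rows must be positive")
    return ref_ratio * ref_rows / rows


for n in (8, 16, 64, 512):
    print(n, round(nand_pullup_ratio(n), 4))
```

For 512 rows this gives roughly 0.0077, matching the slide, which illustrates why such a weak (and therefore slow) pull-up makes large NAND arrays impractical.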
Equivalent Transient Model for MOS NOR ROM
[Figure: model for the NOR ROM. A distributed word line WL (rword, cword per cell) drives the cell transistor, which discharges a bit line BL with lumped capacitance Cbit pulled up to VDD.]
• Word line parasitics
– Wire capacitance and gate capacitance
– Wire resistance (polysilicon)
• Bit line parasitics
– Resistance not dominant (metal)
– Drain and gate-drain capacitance
Equivalent Transient Model for MOS NAND ROM
[Figure: model for the NAND ROM. A distributed word line WL (rword, cword per cell) and a distributed bit line BL (rbit, cbit per cell) with load capacitance CL pulled up to VDD.]
• Word line parasitics
– Similar to NOR ROM
• Bit line parasitics
– Resistance of cascaded transistors dominates
– Drain/source and complete gate capacitance
Decreasing Word Line Delay
[Figure: two ways to reduce word line delay. (a) Driving the polysilicon word line from both sides through a metal word line. (b) Using a metal bypass that taps the polysilicon word line every K cells.]
Precharged MOS NOR ROM
[Figure: precharged 4x4 MOS NOR ROM. Precharge devices gated by the clock φpre connect the bit lines BL[0]–BL[3] to VDD; word lines WL[0]–WL[3] share GND buses.]
• The clock can be used to turn on the PMOS precharge devices only when necessary.
• The PMOS precharge device can be made as large as necessary,
but the clock driver becomes harder to design.
Low Power ROM Design (NOR-array)
• Sources of power dissipation
– Decoder
– ROM core
• Pre-charging of the bit-line
• Evaluation of the bit-line
– Control
• Generate internal signals, e.g. precharge, read
– Drivers
• Column multiplexer and driver select which
column is being read and drive the data bus
Power Dissipation of ROM
• 2K x 18 bit ROM in 0.6 µm technology at 3.3 V, clocked at
100 MHz

Block       Power (mW)   Percentage (%)
Decoder     0.06         2.1
ROM core    2.24         89
Control     0.18         7.2
Driver      0.05         1.7

• Reasons for the ROM core to be the main power consumer:
– Large capacitance at the bit lines
• Drain capacitance of the transistors
• More than 18 bit lines are switched per access due to
the bit line multiplexing
Low Power technique: Architecture
• Hierarchical word line
– Divide the memory into different blocks
– Only the bit cells of the desired block are accessed
• Selective pre-charge
– Only the bit lines that will be accessed are pre-charged
• Minimization of non-zero terms
– Zero terms do not switch bit lines and reduce the
capacitance in both the bit lines and the row lines
– Inverted ROM
• Invert the ROM core contents so that the number of ones is
reduced.
– Inverted row
• A given row is inverted if more than half of the bits are
non-zero terms.
• An additional bit indicates whether the row is inverted.
• Example (the additional bit is the leading one):
10111011 -> 1 01000100
00000111 -> 0 00000111
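The inverted-row scheme above can be sketched as follows. The function names `encode_row` and `decode_row` are hypothetical, and placing the flag as the leading bit is an assumption taken from the slide's two example rows.

```python
def _invert(bits: str) -> str:
    """Flip every bit of a '0'/'1' string."""
    return "".join("1" if b == "0" else "0" for b in bits)


def encode_row(bits: str) -> str:
    """Store a row inverted when more than half of its bits are ones.

    The returned string carries one additional leading flag bit:
    '1' means the row was inverted, '0' means it was stored as-is.
    """
    if bits.count("1") > len(bits) / 2:
        return "1" + _invert(bits)
    return "0" + bits


def decode_row(stored: str) -> str:
    """Recover the original row from its stored form."""
    flag, data = stored[0], stored[1:]
    return _invert(data) if flag == "1" else data


for row in ("10111011", "00000111"):
    stored = encode_row(row)
    print(row, "->", stored, "->", decode_row(stored))
```

Each stored row then contains at most half non-zero data terms, at the cost of one extra column and an inversion stage on read-out.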
NonVolatile Read-Write Memories
• NVRW memories are virtually identical to the ROM structure.
• They consist of an array of transistors arranged in a grid.
• The memory is programmed by disabling or enabling the devices.
• In a ROM this is done via mask-level alterations.
• In a NVRW memory, a modified transistor that permits its
threshold to be altered electrically is used instead.
• The modified threshold is retained indefinitely, even when the
supply voltage is turned OFF: the memory is non-volatile.
• To reprogram the memory, the programmed values must be
erased, and a new programming round can then be started.
• The reprogramming procedure is an order of magnitude slower
than the reading operation.
• The method of erasing is the main differentiating factor between
the various classes of reprogrammable NVRWM.
• The floating gate is at the heart of the majority of NVRWM.
Non-Volatile Memories
The floating-gate transistor (FAMOS)
[Figure: device cross-section (source and drain n+ regions in a p substrate, with the floating gate and the gate separated by oxide layers of thickness tox) and schematic symbol (terminals G, D, S).]
• An extra polysilicon strip is inserted between the gate and the channel.
• The strip is isolated and is called a floating gate.
• Doubling the oxide thickness results in reduced transconductance as
well as increased threshold voltage: undesirable properties.
• Most interesting property: the threshold voltage is programmable.
• Applying a high voltage (above 10 V) between the source and the gate-drain terminals creates a high electrical field and causes avalanche injection.
• Electrons acquire high energy and get trapped on the floating gate.
Floating-Gate Transistor Programming
[Figure: three steps of floating-gate programming, with voltage labels 20 V, 10 V, 5 V, 0 V, -5 V, and -2.5 V on the gate, floating gate, source (S), and drain (D). Panel titles: avalanche injection; removing the programming voltage leaves charge trapped; programming results in a higher VT.]
• FAMOS: floating-gate avalanche-injection MOS.
• Removing the voltage leaves the induced negative charge in place, and
results in a negative voltage on the intermediate gate.
• This translates into an effective increase in threshold voltage,
which is typically 7 V.
• A higher voltage is then needed to turn ON the device; a 5 V gate-to-source
voltage is not sufficient, so the transistor stays OFF.
A “Programmable-Threshold” Transistor
• The threshold voltage of the programmed device is higher.
• By tailoring the impurity profiles, today's technologies use 12.5 V instead of 25 V.
• An EPROM is erased by shining UV light on it, which renders the oxide
slightly conductive by generating electron-hole pairs in the material.
• EPROMs are low cost and attractive in applications not requiring
frequent reprogramming. However, EPROMs do suffer from:
– A slow erasure process (a few seconds to minutes)
– Reliability issues, since Vth varies with repeated programming cycles
– High power during programming
– An "off-system" erasure process
FLOTOX EEPROM
EEPROM avoids "off-system" erasure, which is labor-intensive.
[Figure: FLOTOX transistor cross-section (gate, floating gate, source and drain n+ regions in a p substrate, 20–30 nm oxide with a 10 nm tunneling region) and the Fowler-Nordheim I-V characteristic (current I versus VGD, conducting beyond about -10 V and 10 V).]
• FLOTOX (floating-gate tunneling oxide) resembles the FAMOS device, except
that a portion of the dielectric separating the floating gate from the
channel and the drain is reduced in thickness to about 10 nm.
• At about 10 V, electrons travel to and from the floating gate through a
mechanism called Fowler-Nordheim tunneling.
• Erasing is hence achieved by reversing the voltage applied during the
writing process.
EEPROM Cell
[Figure: two-transistor EEPROM cell connected to BL, WL, and VDD.]
• Absolute threshold control is hard.
• Removing too much charge from the floating gate results
in a depletion device (always ON).
• The programmed Vth depends on the initial voltage as well as the applied
voltage. For better control, a 2-transistor cell is used.
• EEPROMs pack fewer bits, as they require more silicon area due to:
– One extra transistor + extra area due to the tunneling oxide.
• While EEPROMs are more expensive than EPROMs:
– EEPROMs last longer, as they support up to 10^5 erase/write cycles.
Flash EEPROM
Flash EEPROM combines the density of the EPROM with the flexibility of the EEPROM.
[Figure: flash cell cross-section with control gate, floating gate, thin tunneling oxide, n+ source and n+ drain in a p-substrate; programming takes place at the drain side and erasure at the source side. Many other options exist.]
• FLASH EEPROM uses the avalanche hot-electron-injection approach
to program the devices.
• Erasure is performed using Fowler-Nordheim tunneling.
• The advantage is that the extra transistor required for the EEPROM is
not needed.
Basic Operations in a NOR Flash Memory: Erase
• 0 V is applied to the gate, combined with a high voltage on the source.
• Electrons, if any, on the floating gate are ejected to the source by
tunneling, and all cells are erased simultaneously.
Basic Operations in a NOR Flash Memory: Write
• A high voltage pulse is applied to the gate. If a "1" is applied to the
drain at that time, hot electrons are generated and injected onto the
floating gate, raising the threshold so that the cell is programmed as off.
• In practice, a pulse of 1-10 µs must be applied.
Basic Operations in a NOR Flash Memory: Read
• 5 V is applied to the row selected for read-out, which causes
a conditional discharge of the bit line.
Cross-sections of NVM cells
[Figure: cross-sections of a Flash cell and an EPROM cell. Courtesy Intel.]
• FLASH memories are very suitable for applications requiring large
storage densities, fast erasure and programming, and fast serial
access.
NAND Flash Memory
[Figure: NAND flash unit cell, showing the word line (poly), gate, ONO dielectric, floating gate (FG), gate oxide, and source line (diffusion layer). Courtesy Toshiba.]
Read-Write Memories (RAM)
• STATIC (SRAM)
– Data stored as long as supply is applied
– Large (6 transistors/cell)
– Fast
– Differential
• DYNAMIC (DRAM)
– Periodic refresh required
– Small (1-3 transistors/cell)
– Slower
– Single ended
Static Random-Access Memory(SRAM)
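• Static RAM cells can be viewed as variations on the designs
used for latches and flip-flops. The most commonly used in
ASIC memories is the 6-transistor, cross-coupled inverter
circuit.
[Figure: 6T SRAM cell. Cross-coupled inverters M1-M2 and M3-M4 between VDD and ground hold Q and its complement; access transistors M5 and M6, gated by the word line WL, connect the cell to the two bit lines.]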
More on SRAM
• Static 6-transistor design is the commonly-used design since it
involves the least amount of detailed circuit design and
process knowledge and is the safest with respect to noise and
other effects that may be hard to estimate before silicon is
available.
• Reliable operation of the cell, however, imposes some sizing
constraints.
• In contrast to the ROM cells, in the 6-transistor structure the
two bit lines transferring both the stored signal and its inverse
are required.
• This improves the noise margins during both read and write
operations.
SRAM - read operation
• Assume the bit lines are at some value and that the
word line is asserted. The figure shows the pull-up
and pull-down paths of the corresponding bit lines through the memory cell.
[Figure: memory cell storing 1 and 0 with the word line asserted, showing the pull-up path and the pull-down path to the bit and -bit lines.]
Sizing of transistor for SRAM read
• Assume both bit lines are pre-charged to Vdd. The
read cycle is started by asserting the word line.
[Figure: 6T cell during a read, with one storage node at 0 and the other at 1, access transistors M5 and M6, pull-down M1, pull-up M4, and both bit lines precharged to VDD with capacitance Cbit.]
• The bit-line capacitance includes the diffusion capacitance
of the read/write control pass transistors connected to
the wire, plus the wire capacitance. For large memories,
it is usually large and in the pF range.
Sizing of transistor for SRAM read (II)
• The value of BL does not drop instantaneously but
stays at the precharged value VDD upon enabling of
the READ operation.
• The combination M5-M1 then forms a saturated-load
NMOS inverter. The dc value of Q must
stay below the switching point of the inverter M2-M4. If not, this could toggle the cross-coupled
inverter pair and destroy the value stored in the cell.
• The boundary constraints on the device sizes can be
derived as follows. Q must remain below 0.5 Vdd in
order not to switch the other inverter, so the boundary
condition is Q = 0.5 Vdd.
CMOS SRAM Analysis (Read)
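[Figure: 6T cell during a read, with storage nodes at 0 and 1, transistors M1, M4, M5, M6, word line WL, and both bit lines precharged to VDD with capacitance Cbit.]
• We need to study the voltage rise ΔV as a function of the cell ratio CR, where
CR = (W/L)1/(W/L)5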
CMOS SRAM Analysis (Read)
[Figure: plot of voltage rise (V) versus cell ratio (CR), with CR from 0 to 3 and voltage rise from 0 to 1.2 V.]
• M1 needs to be strong enough that the read does not flip the state of Q' from "0" to "1".
• CR needs to be greater than 1.2.
• The pass transistor M5 can be minimum size, while the width of the pull-down NMOS
needs to be increased by a factor of 1.2 (in this technology).
• Careful simulations across different process corners are a must.
SRAM - read operation (cont.)
• Because an n-channel transistor is poor at passing a one
and the p-channel transistors in the RAM cell are
generally small, pulling up the bit line takes
longer. A pre-charge technique can be used to speed
up the pull-up.
SRAM - read operation (cont.)
• The bit and -bit lines are precharged to VDD before the word
line is allowed to go high. One of the cell's inverters will
have its output at 1, and the other at 0, depending on
the value stored. The RAM inverter will attempt to pull
down the bit or -bit line. The bit-line pull-up circuit may
use p-channel transistors to precharge each bit line.
• In this example, the sense amplifier is an inverter that
forms a single-ended sense amplifier. The sense time is
roughly the time it takes one RAM cell pull-down and
access transistor to reach the inverter threshold. To
optimize speed, one might set the inverter threshold
above the VDD midpoint, but below an adequate noise
margin down from the VDD rail.
SRAM - read operation (cont.)
• Alternatively, one can precharge the bit lines with n-channel transistors, which results in the bit lines
being precharged to an n threshold down from VDD.
SRAM - read operation (cont.)
• This can dramatically improve the speed of the RAM
cell access. In addition, it reduces power dissipation
because the bit lines are not charged all the way to the supply
voltage.
• The key aspect of the precharged RAM read cycle is
the timing relationship between RAM addresses, the
precharge pulse, and the enabling of the row
decoder. If the word-line assertion precedes the end
of the precharge cycle, the RAM cells on the active
word line will see both bit lines pulled high and the
RAM cells may flip state.
SRAM - write operation
• Apply voltages to the RAM cell such that it will flip state
SRAM - write operation (cont.)
[Figure: SRAM write circuit with write transistors N1 and N2 and word-selection transistors N3 and N4; the cell nodes are shown at 0 and 1.]
• First the write transistors
(N1, N2) are enabled, and then the
word-selection transistors
(N3, N4) are asserted.
• During a WRITE cycle where a
one is to be written, node -Cell has
to be pulled below the RAM-cell
inverter threshold and, at the
same time, node Cell has to be
pulled above the RAM-cell
inverter threshold. The n-transistors
ND, N1 and N3 have to pull Pbit
below the inverter threshold. In
addition, N5 has to be pulled low
by N1 and ND. On the other hand,
PD, N2 and N4 have to pull Nbit as
high as possible.
Transistor sizing for SRAM (write)
• Assume that a 1 is stored in the cell (Q = 1). A 0 is
written in the cell by setting BL to 0 and the complementary bit line to 1.
[Figure: 6T cell during a write, with Q = 1 and Q' = 0, transistors M1, M4, M5, M6, word line WL, BL = 0 on the Q side and the complementary bit line at 1.]
• The cell starts to switch when node Q is pulled
below the switching threshold of the cross-coupled
inverter, which is assumed to be Vdd/2. The complementary node Q' must
be raised above Vdd/2.
CMOS SRAM Analysis (Write)
[Figure: 6T cell during a write, with Q = 1 and Q' = 0, transistors M1, M4, M5, M6, word line WL, BL = 0 on the Q side and the complementary bit line at 1.]
• The Q' side of the cell cannot be pulled
high enough to ensure the writing
of a 1, due to the sizing constraints
imposed by read stability.
• The new value of the cell has to be
written through transistor M6.
• M6 needs to be stronger than the
pull-up PMOS in order to impose a
zero at Q and write the new value.
• Pull-up ratio: PR = (W/L)4/(W/L)6
CMOS SRAM Analysis (Write)
• If we wish to pull the node below Vtn, the pull-up ratio has
to be below 1.8 (for this technology).
• Extensive simulations need to be carried out.
6T-SRAM — Layout
[Figure: layout of the 6T SRAM cell, showing M1–M6, the storage nodes, the VDD and GND rails, the two bit lines, and the word line WL.]
• Substantial area is consumed
not only by the 6 transistors but
mainly by the routing.
• Signal routing and connections are needed
to two bit lines, a word line, and
both supply rails.
• The need for PMOS transistors
substantially increases the area,
as an N-well is required (because of
the minimum spacing requirement for
the well).
Resistance-load SRAM Cell
• The resistive-load SRAM, or 4T SRAM cell, replaces the two PMOS transistors by
resistive loads, and the wiring is simplified.
• Size is reduced by one third compared to the 6T cell.
[Figure: 4T SRAM cell with load resistors RL to VDD, pull-down transistors M1 and M2, access transistors M3 and M4 gated by WL, and the two bit lines.]
• Since the bit lines are externally precharged, the loads are not
involved in the pull-up process.
• There is no penalty in having a very large resistive pull-up.
• Ex: for a 1 Mbit memory with 10 kΩ resistances in a 2.5 V process,
I = 0.25 mA per cell, i.e. about 250 A and 625 W in total (standby)!
• Very large resistances (in the TΩ range) are
obtained using undoped polysilicon.
• Static power dissipation: want RL large.
• Bit lines are precharged to VDD to address the tp problem.
Performance of SRAM cell
• The read operation is the critical one. It requires the
(dis)charging of the large bit-line capacitance
through the small transistors of the selected cell.
• The WRITE time is dominated by the propagation
delay of the cross-coupled inverter pair, since the
drivers that bring BL and its complement to the desired values
can be large.
3-Transistor DRAM Cell
[Figure: 3T DRAM cell with storage node X and capacitance CS, transistors M1, M2, and M3, write word line WWL, read word line RWL, and bit lines BL1 and BL2. Waveforms: X rises to VDD-VT after a write with BL1 at VDD; BL2 drops by ΔV from VDD-VT during a read.]
• The cell stores data on the gate of the storage transistor. Separate
read and write control lines are used.
• Multiple read ports may be added easily by adding read
transistors. In addition, separate or merged read and write data
busses may be used.
• This cell and the other dynamic cells have to be refreshed
to retain the contents of the memory (refresh cycle: 1-4 msec).
• No constraints on device ratios
• Reads are non-destructive
• Value stored at node X when writing a "1" = VWWL - VTn
Properties of the 3T-cell
• In contrast to the SRAM cell, no constraints exist on
the device ratios. This is a common property of
dynamic circuits. The choice of device sizes is
solely based on performance and reliability
considerations.
• In contrast to other DRAM cells, reading the 3T-cell
contents is non-destructive, i.e. the data value
stored in the cell is not affected by a read.
• The value stored on the storage node X when writing
a "1" equals VWWL - VTN. This threshold loss reduces
the current flowing through M2 during a read
operation and increases the read access time. To
prevent this, some designs bootstrap the word line
voltage, i.e. raise VWWL to a value higher than VDD.
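The effect of bootstrapping on the stored level can be illustrated with a first-order sketch. It assumes the write transistor passes the bit-line level up to one threshold below its gate voltage and ignores the body effect. The function name and the numeric values (VDD = 2.5 V, VTN = 0.4 V, boosted word line at 3.0 V) are hypothetical.

```python
def stored_high_level(v_bitline: float, v_wwl: float, v_tn: float) -> float:
    """First-order level stored on node X when writing a '1'.

    Assumption: an NMOS pass transistor passes at most (gate voltage - VTN),
    and never more than the bit-line voltage; body effect is ignored.
    """
    return min(v_bitline, v_wwl - v_tn)


VDD, VTN = 2.5, 0.4  # hypothetical values, for illustration only

plain = stored_high_level(VDD, VDD, VTN)    # word line driven to VDD
boosted = stored_high_level(VDD, 3.0, VTN)  # word line bootstrapped above VDD
print(f"plain: {plain:.2f} V, bootstrapped: {boosted:.2f} V")
```

With the word line at VDD the node only reaches VDD - VTN, while a boost of at least one threshold restores the full VDD level and hence the read current through M2.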
3T-DRAM — Layout
[Figure: layout of the 3T DRAM cell, showing M1, M2, M3, storage node X, bit lines BL1 and BL2, GND, WWL, and RWL, next to the cell schematic.]
• The total area of the cell is 576 λ², compared to 1092 λ²
for the SRAM cell. A further reduction in size can be
obtained by sharing buses with neighboring cells.
• The area reduction is mainly due to the elimination of contacts and
devices.
1-Transistor DRAM Cell
[Figure: 1T DRAM cell (access transistor M1, storage capacitance CS, bit-line capacitance CBL, word line WL, bit line BL) with waveforms for Write "1" and Read "1": X charges to VDD-VT during the write; BL, at VDD/2 before the read, moves by ΔV before sensing.]
• During a WRITE, the data value is placed on the bit line BL and
the word line WL is raised. The cell capacitance is either
charged or discharged.
• Before a READ, the bit line is precharged to a voltage Vpre.
• During a READ, charge redistribution takes place between
the bit line and the storage capacitance. This results in a
voltage change on the bit line, the direction of which
determines the value of the data stored:
$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \, \frac{C_S}{C_S + C_{BL}}$$
Charge distribution
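• Charge conservation across the read pulse (charges after the read pulse = charges before the read pulse) gives:
$$C_S V_{BL} + C_{BL} V_{BL} = C_S V_{BIT} + C_{BL} V_{PRE}$$
• Subtracting $C_S V_{PRE}$ from both sides:
$$C_S V_{BL} + C_{BL} V_{BL} - C_S V_{PRE} = C_S V_{BIT} + C_{BL} V_{PRE} - C_S V_{PRE}$$
• Collecting terms:
$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \, \frac{C_S}{C_S + C_{BL}}$$
where $C_S / (C_S + C_{BL})$ is the charge transfer ratio.
[Figure: 1T cell with M1, CS, CBL, word line WL, and bit line BL.]
• For reliable operation, Cs needs to be quite high (high
capacitance values are needed to improve the charge transfer ratio).
• The charge transfer ratio ranges between 1% and 10%. ΔV needs amplification.
• One difference between the 1T and 3T structures is the need for amplification.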
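The charge-redistribution relation above is easy to evaluate numerically. The sketch below is only an illustration: the function name `bitline_swing` and the component values (CS = 50 fF, CBL = 1 pF, Vpre = 1.25 V, stored levels of 0 V and 1.9 V) are hypothetical and are not taken from the slides.

```python
def bitline_swing(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit-line voltage change after charge redistribution.

    Implements dV = (V_BIT - V_PRE) * C_S / (C_S + C_BL).
    """
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


C_S = 50e-15    # storage capacitance, hypothetical
C_BL = 1e-12    # bit-line capacitance, hypothetical
V_PRE = 1.25    # precharge voltage, hypothetical

ratio = C_S / (C_S + C_BL)
dv_zero = bitline_swing(0.0, V_PRE, C_S, C_BL)  # cell stored a "0"
dv_one = bitline_swing(1.9, V_PRE, C_S, C_BL)   # cell stored a "1"
print(f"charge transfer ratio: {ratio:.1%}")
print(f"dV(0) = {dv_zero * 1e3:.1f} mV, dV(1) = {dv_one * 1e3:.1f} mV")
```

The sign of the result gives the stored value, and the small magnitude (tens of millivolts for these assumed values) shows why a sense amplifier is needed on every bit line.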
1-Transistor DRAM Cell
• VBL is the potential of the bit line after the charge
redistribution, and Vbit is the initial voltage over the
cell capacitance CS.
• As the cell capacitance is normally one or two orders of
magnitude smaller than the bit-line capacitance, this voltage change
is very small (about 250 mV). Amplification is needed if
functionality is to be achieved.
DRAM Cell Observations
• 1T DRAM requires a sense amplifier for each bit line, due
to charge-redistribution read-out.
• DRAM memory cells are single ended, in contrast to
SRAM cells.
• The read-out of the 1T DRAM cell is destructive; read
and refresh operations are necessary for correct
operation.
• Unlike the 3T cell, the 1T cell requires the presence of an extra
capacitance that must be explicitly included in the design.
• When writing a "1" into a DRAM cell, a threshold voltage
is lost. This charge loss can be circumvented by
bootstrapping the word lines to a higher value than VDD.
Sense Amp Operation
[Figure: bit-line voltage VBL versus time t during read-out. Starting from VPRE, the bit line moves by ΔV(1) once the word line is activated; after the sense amplifier is activated, it is driven to V(1) or V(0), the levels imposed by the sense amplifier.]
• The read-out is destructive, i.e. the amount of charge stored in
the cell is modified during the read operation.
• After a read operation, the charge must be restored.
• Read and refresh operations are intertwined in 1T DRAM.
• Typically, the output of the sense amplifier is imposed on the bit line
during read-out.
1-T DRAM Cell
[Figure: 1T DRAM cell. Cross-section: metal word line, SiO2, field oxide, n+ regions, polysilicon gate of M1, and a polysilicon plate with an inversion layer induced by the plate bias forming the capacitor. Layout: diffused bit line, polysilicon gate, polysilicon plate, M1 word line, capacitor.]
• Uses polysilicon-diffusion capacitance
• Expensive in area
SEM of poly-diffusion capacitor 1T-DRAM
Advanced 1T DRAM Cells
[Figure: two advanced 1T DRAM cells. Trench cell: cell plate Si, capacitor insulator, refilling poly, storage node poly, 2nd field oxide, Si substrate. Stacked-capacitor cell: word line, insulating layer, cell plate, capacitor dielectric layer, transfer gate, isolation, storage electrode.]
Content-Addressable Memory (CAM)
• The CAM examines a data word
and compares it with
internally stored data. If any
stored data word matches
the input data word, the CAM
signals that there is a match.
These match signals can be
passed as word lines to a RAM
to enable a specific data word
to be output. This structure may
be used as a translation look-aside buffer in the virtual
memory lookup in a
microprocessor.
Static CAM Memory Cell
[Figure: static CAM memory cell and array. Each CAM cell (transistors M1–M9, storage nodes S and its complement, internal node int) connects to a pair of bit lines, a word line, and a wired-NOR match line shared by all cells of a word.]
• The memory cell may be written and read in the conventional
manner. Writes are used to store the match data in the cells,
whereas reads are used for testing purposes.
• A MATCH operation proceeds by placing the data to be matched
on the bit lines but not asserting the word line.
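A behavioral sketch of the MATCH operation, assuming each word has its own wired-NOR match line that stays high only when no bit of the stored word differs from the comparand. The function name `match_lines` and the example contents are hypothetical.

```python
def match_lines(stored_words: list[int], comparand: int, width: int) -> list[bool]:
    """Return one match flag per stored word.

    Models a wired-NOR match line: any mismatching bit (stored XOR comparand)
    pulls the line low, so the flag is True only for an exact match.
    """
    mask = (1 << width) - 1
    return [((word ^ comparand) & mask) == 0 for word in stored_words]


cam_contents = [0b1010, 0b0111, 0b1010, 0b0000]  # hypothetical 4-bit entries
flags = match_lines(cam_contents, 0b1010, width=4)
print(flags)

# The match flags can act as word lines for a companion RAM.
ram = ["A", "B", "C", "D"]
print([data for data, hit in zip(ram, flags) if hit])
```

In hardware all words are compared in parallel in a single cycle; the list comprehension here only mimics that result sequentially.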
CAM in Cache Memory
[Figure: CAM in a cache memory. A CAM array holds the tags and an SRAM array holds the data. Labeled elements: address decoder, input drivers, hit logic, sense amps / input drivers; signals: Address, Tag, Hit, R/W, Data.]
CAM array circuit