Transcript Lecture8

8 Memory Subsystem
Contents
1. Classification
2. Architectures
3. Circuits
1) SRAM
2) DRAM
3) Address decoders
4) Sense Amplifier
4. PLA
5. Gate Matrix
6. ROM
8.1
1. Classification
 RWM(Read-Write Memory)

Random Access : SRAM, DRAM

Sequential Access : FIFO, Stack(LIFO)

Content Access : CAM(Associative Memory)
 NVRWM(Nonvolatile RWM)

EPROM

E²PROM (EEPROM)

FLASH
 ROM

Mask Programmed

OTP(One-Time Programmable) ; PROM
8.2
2. Architectures
 1-dimensional memory : N (words) × M (bits/word)

A decoder reduces the number of select wires from N to log2(N) address lines.
8.3
 2-dimensional array structure uses column decoder to make the chip
square.
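A minimal sketch of how a word address might be split between the row decoder and the column decoder so that the cell array comes out roughly square. The function name, the even bit split, and the choice of the low-order bits for the column are assumptions for illustration, not taken from the slide.

```python
def split_address(addr, addr_bits, bits_per_word):
    """Split a word address into (row, column) for a near-square array.

    Assumptions: bits_per_word is a power of two, and the low-order
    address bits select the word (column group) within a row.
    """
    log2_m = bits_per_word.bit_length() - 1
    # Aim for about as many rows as bit lines: half of log2(total cells).
    row_bits = min(addr_bits, (addr_bits + log2_m) // 2)
    col_bits = addr_bits - row_bits
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return row, col

# 1024 words x 8 bits: 6 row bits (64 rows), 4 column bits (16 words per row).
row, col = split_address(726, addr_bits=10, bits_per_word=8)
print(row, col)
```

With these example numbers the array has 64 rows and 128 bit lines, instead of 1024 rows of 8 cells in the 1-dimensional organization.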
8.4
 Hierarchical memory architecture using block address

Block address is used to activate only one block.
Other blocks(nonactive) are put in power-saving mode.
8.5
 Architecture of large memory
8.6
 Basic organization for a 4K SRAM(1989 Philips research).
8.7
 Schematic circuit diagram of 64K SRAM(Hitachi 1982).
8.8
 Another schematic of SRAM(column grouping).
SRAM chip block diagram
8.9
 Design Considerations

All clocks, such as those for bit line precharge and sense amp enable, are generated by an
internal clock generator that is triggered by a circuit detecting transitions on the address, CS,
WE and similar signals (this limits power consumption).

2-stage row address decoding : WL driver decodes A1.

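The sense amp can be placed either before or after the column switch.
Placed before : a very simple SA must be used so that it fits the column cell pitch.
Placed after : a relatively complex SA can be used (but the SA input capacitance increases).
 The figure above is a compromise: the columns (for 1024 columns in a by-4 organization) are first
divided into 4 large groups, each of these is divided into 16 subgroups, and one SA serves the
16 columns of each subgroup.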
8.10
3. Circuits
 Address decoders

Single stage(10-to-1024) decoder
i) # of transistors =
(20 per 10-input NAND) × 1024 = 20,480
ii) Large fanout requirement on
the buffers generating the Xi's.
iii) Series-connected transistors limit
the discharge time.
8.11
 Predecoded scheme
i) Group 2 bits and predecode the word using 2-bit segments ;
(X9, X8), (X7, X6), …. (X1, X0)
ii) 2nd-stage decoder logic
# of transistors :
(10 per 5-input NAND) × 1024 + predecoders ≈ 12,000
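The transistor counts for the two schemes can be reproduced with a short calculation. This is only a sketch: it assumes static CMOS NAND gates with two transistors per input, and it counts only the word-line NANDs, leaving out the predecoders and buffers that make up the rest of the total on the slide. The function names are illustrative.

```python
def single_stage_transistors(addr_bits):
    """One NAND per word line, each with addr_bits inputs (2 transistors per input)."""
    return 2 * addr_bits * (1 << addr_bits)

def second_stage_transistors(addr_bits, group=2):
    """Final-stage NANDs only, after predecoding the address in `group`-bit segments."""
    fan_in = addr_bits // group
    return 2 * fan_in * (1 << addr_bits)

print(single_stage_transistors(10))   # 20480
print(second_stage_transistors(10))   # 10240
```

Halving the fan-in of each word-line gate also shortens the series transistor stack, which is what limits the discharge time in the single-stage decoder.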
8.12
 Divided Word Line architecture
Global word line selects a block, while the local line is used to activate a word
line within the selected block.
8.13
 Hierarchical word decoding logic
8.14
 Row decoder circuits
(Complementary AND, pseudo NMOS, cascade NAND)
8.15
 Typical Symbolic Layout Style of row decoders
8.16
 Various other decoder circuits(Power saving, Decoder-powered)
8.17
 Tree style column decoder
8.18
 Sense Amplifier for SRAM
 Voltage gain of a single differential stage: $A_v = g_m \cdot r_o$
gm : transconductance (current/voltage transducer gain) of M1, M2
ro : output impedance ($= r_{o,M1} \parallel r_{o,M2}$)
 For Av to be large, M1 and P1 (and likewise M2 and P2) must all be in the saturation region
(because in saturation $g_m = \partial I_D / \partial V_{GS}$ is large and ro is also large).
 Therefore, precharging point X to $V_{DD}/2$ helps shorten the response time and
enlarge the signal swing.
8.19
 Connecting two single-ended amps symmetrically increases the voltage gain.
(The next stage can be a latch, another double-ended amp stage, or
an output buffer with differential inputs.)
8.20
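 SRAM sense amp precharged to $V_{DD}/2$
 A circuit that charges the SA output node to $V_{DD}/2$ so that
the SA operates in its high-gain region.
Interval 1 : V1 is precharged to VDD; power-down state.
Interval 2 : when the WL is accessed, V1 is precharged to $V_{DD}/2$.
Interval 3 : once a voltage difference develops between BL and $\overline{BL}$,
the high-gain SA operates and the pass gate that acts as
column decoder/switch turns on;
the signal is passed to the data output bus.
Interval 4 : power-down state.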
8.21
 Static power is consumed
during interval 2.
8.22
 SRAM circuit before sense Amp.
8.23
 Evolution of SRAM cells
i) 6- and 4-transistor SRAM cells
8.24
ii) Dual-port/double-ended access and dual-port/single access
8.25
iii) Content-addressable memory cell
8.26
 Evolution of DRAM cells
(a) basic bistable flip-flop without load
(b) 2C-2D(C:control lines, D:data lines)
8.27
(c) 1C-2D
(d) 2C-1D scheme
8.28
(e) 1C-1D
(f) 1C-1D(industry standard DRAM)
8.29
 DRAM read cycle
8.30
8.31
8.32
 Dummy word line scheme
8.33
8.34
 DRAM differential sense amp with dummy cell structure
8.35
 Cross-coupled Latch
Assume nodes 1 & 2 are precharged, and node 2 begins to drop.
When clk turns on, node 3 is pulled down. n2 turns on strongly, leaving n1 off.
Note) The layout of the cross-coupled transistor pair must be symmetric.
(Effect of threshold voltage mismatch)
8.36
 Charge transfer-based Circuit
8.37
 Charge-transfer Circuit(cont’d)
 Operation Sequence
 As clk goes high, nodes 1 & 2 are precharged:
$V_1 = V_{ref} - V_{th,n2}$, $V_2 = \min(V_{DD}, V_{clk} - V_{th,n3}) > V_{ref}$
 n3 turns off.
 Cell (n1, Cc) is selected (assume Vc was '0').
Due to charge sharing between Cc & Clarge, V1 becomes
$$V_1 = \frac{(V_{ref} - V_{th}) C_{large} + V_c C_c}{C_c + C_{large}}$$
$$\Rightarrow \Delta V_1 = (V_{ref} - V_{th}) - V_1 = \frac{(V_{ref} - V_{th} - V_c) C_c}{C_c + C_{large}}$$
 n2 is turned on until $\Delta Q = \Delta V_1 (C_c + C_{large})$ is transferred from
Cout, i.e., until V1 reaches Vref − Vth.
 The voltage drop at node 2 due to charge transfer is
$$\Delta V_2 = \frac{\Delta Q}{C_{out}} = \frac{C_c + C_{large}}{C_{out}} \Delta V_1 = \frac{C_c}{C_{out}} (V_{ref} - V_{th} - V_c)$$
$(C_c + C_{large}) / C_{out}$ : amplification factor, so $\Delta V_2 \gg \Delta V_1$ when $C_{out} \ll C_c + C_{large}$.
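The charge-sharing and charge-transfer steps of this operation sequence can be checked numerically. The sketch below follows the sequence step by step; the function name and the component values (50 fF cell, 1 pF bit-line capacitance, 100 fF output node, Vref = 2.5 V, Vth = 0.5 V) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def charge_transfer(v_ref, v_th, v_c, c_c, c_large, c_out):
    """Return (dV1, dV2) for one charge-transfer read.

    dV1: dip on node 1 right after the cell shares charge with C_large.
    dV2: drop on node 2 once n2 has restored node 1 to v_ref - v_th.
    """
    v1_pre = v_ref - v_th                                   # precharged level of node 1
    v1_shared = (v1_pre * c_large + v_c * c_c) / (c_c + c_large)
    dv1 = v1_pre - v1_shared
    dq = dv1 * (c_c + c_large)                              # charge supplied by C_out
    dv2 = dq / c_out
    return dv1, dv2

dv1, dv2 = charge_transfer(v_ref=2.5, v_th=0.5, v_c=0.0,
                           c_c=50e-15, c_large=1e-12, c_out=100e-15)
print(f"dV1 = {dv1:.4f} V, dV2 = {dv2:.4f} V, gain = {dv2 / dv1:.1f}")
```

Under these assumed values the dip on node 1 is under 0.1 V while node 2 drops by about 1 V, which illustrates why a small output capacitance gives a large amplification factor.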
8.38
 Sense amplifier for single-transistor DRAM cells.
dummy cell (Cd = Cc), dummy bit line
complete symmetry
8.39
 Operation
1. Before precharge, BL and DBL are both at $V_{DD}/2$.*
Precharging (n1, n2 on) pulls nodes 1 and 2 up.
Then n1 and n2 are turned off.
2. Cc and Cd are selected, and charge transfer (assume $V_{C_c} = 0$) makes
the voltage at node 1 drop further than the voltage at node 2
(because Cd had been charged to $V_{DD}/2$).*
3. Clk1 goes high, so n4 turns on and n5 turns off (V1 goes to Vss).
n7 conducts again, BL is discharged to Vss, and
Cc is restored to '0'.
4. Sel is set to '0' to isolate Cc, then clk2 is turned on to bring BL and DBL to
$V_{DD}/2$. After that, seld is set to '0' so that Cd holds $V_{DD}/2$, and n3 is
turned off.
(The case where '1' is stored in Cc works in a similar way.)
8.40
SRAM SA circuit using column SAs and a main SA
(one per column)
(multiplexed among n columns)
8.41
(input signal)
(with column SA)
8.42
(without column SA)
8.43
 Resistive-load SRAM cells

Undoped polysilicon as resistors
with R  1 /

Just enough current ($10^{-12}$ A) to
compensate for a leakage current
of $10^{-15}$ A

BL and $\overline{BL}$ are precharged to VDD,
which avoids slow charging of the bit lines
through the high-resistance loads.
8.44
 TFT SRAM cell


Instead of traditional PMOS devices, the pull-up transistors are realized by PMOS
TFTs (thin-film transistors) on top of the cell structure.
ON current : $10^{-8}$ A, OFF current : $10^{-13}$ A
                            Complementary CMOS       Resistive Load           TFT cell
Number of transistors       6                        4                        4 (+2 TFT)
Cell size                   58.2 µm² (0.7 µm rule)   40.8 µm² (0.7 µm rule)   41.1 µm² (0.8 µm rule)
Standby current (per cell)  10^-15 A                 10^-12 A                 10^-13 A
8.45
 Bipolar SRAM cells :

Very fast SRAMs are necessary for cache & microcode memory in high-speed computers.

SBD(Schottky Barrier Diode) bipolar SRAM
8.46
 3-T DRAM cell :

Obtained by removing the loads to get the 4-T DRAM cell, and then
removing the redundant complementary pull-down device

Separate Read Word line(RWL) & Write word line(WWL)

Refreshing by writing the inverted BL2 signal onto BL1.
8.47
 1-T DRAM cell :
$$\Delta V = V_{BL} - V_{PRECH} = (V_{BIT} - V_{PRECH}) \frac{C_C}{C_C + C_{BL}}$$
$\frac{C_C}{C_C + C_{BL}}$ : charge transfer ratio
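To get a feel for how small the read signal of a 1-T cell is, the charge-transfer ratio can be evaluated for some example numbers. The values below (50 fF cell, 1 pF bit line, 1.25 V precharge, stored levels of 2.5 V and 0 V) are hypothetical, and the function name is illustrative.

```python
def dram_read_swing(v_bit, v_prech, c_cell, c_bl):
    """Bit-line voltage change when a 1-T cell is connected to a precharged bit line."""
    ratio = c_cell / (c_cell + c_bl)      # charge transfer ratio
    return (v_bit - v_prech) * ratio

for stored, v_bit in (("1", 2.5), ("0", 0.0)):
    dv = dram_read_swing(v_bit, v_prech=1.25, c_cell=50e-15, c_bl=1e-12)
    print(f"stored '{stored}': dV = {dv * 1e3:+.1f} mV")
```

With a ratio of about 5%, the bit line moves by only a few tens of millivolts in either direction, which is why a sense amplifier is required on every read.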
8.48
 1-T DRAM cell structure :
8.49
 Trench capacitor type & Stacked-capacitor type
8.50
 NOR-type address decoder
8.51
 NAND-type address decoder
8.52
 Reducing coupling noise between WL & BL : Folded bit line.
8.53
 Reducing coupling noise between a BL & its neighboring bit lines :
Transposed bit line
$$\Delta V_{cross} = 2 \, \frac{C_{cross}}{C_{cross} + C_{BL}} \, V_{swing}$$
$\Delta V_{cross}$ : worst-case variation on each bit line.
$V_{swing}$ : signal swing on bit line.
8.54
4. PLA(Programmable Logic Array)
 Generally two classes exist for implementing control logic functions.

Multi-level logic through logic optimization on random logic

Regular structure type, i.e.,
ROM : firmware, mask-programmable
PLA : Customized logic to remove unnecessary
Product(AND) terms and sum(OR) terms.
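A PLA can be modeled as two small tables: an AND plane listing which inputs each product term needs, and an OR plane listing which product terms each output sums. The sketch below is a behavioral model only; the data layout and names are assumptions for illustration, and the example personality implements F = ab + cd.

```python
from itertools import product

def eval_pla(and_plane, or_plane, inputs):
    """Evaluate a two-level AND-OR PLA.

    and_plane: one dict per product term, mapping input index -> required
               value (0 or 1); inputs not listed are don't-cares.
    or_plane:  one list per output, holding the product-term indices it sums.
    """
    products = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    return [int(any(products[p] for p in terms)) for terms in or_plane]

# Hypothetical personality for F = ab + cd (inputs ordered a, b, c, d).
AND_PLANE = [{0: 1, 1: 1}, {2: 1, 3: 1}]
OR_PLANE = [[0, 1]]

for a, b, c, d in product((0, 1), repeat=4):
    assert eval_pla(AND_PLANE, OR_PLANE, (a, b, c, d)) == [(a & b) | (c & d)]
```

Only two product lines are needed here, whereas a ROM with four address inputs would store all 16 words; this is the saving the PLA entry above refers to.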
8.55
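 Sum-of-products form, F = ab + cd
i) NAND-NAND PLA
$F = a b + c d = \overline{\overline{a b} \cdot \overline{c d}}$
[Figure: NAND-NAND gate network and the corresponding PLA with AND plane and OR plane; inputs a, b, c, d; output F]
Such a 2-level Boolean expression can be viewed as two decoders connected back to back.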
8.56
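ii) NOR-NOR PLA
$\overline{F} = \overline{a b + c d} = (\bar{a} + \bar{b}) (\bar{c} + \bar{d})$
$\overline{F} = \overline{\overline{\bar{a} + \bar{b}} + \overline{\bar{c} + \bar{d}}}$
[Figure: NOR-NOR gate network and the corresponding PLA; inputs a, b, c, d; output F]
NOR-NOR is faster, but requires more area
(about 30% more) than NAND-NAND.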
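Both two-level forms can be checked against F = ab + cd with an exhaustive truth table. In this sketch the NOR-NOR version is driven with complemented inputs and followed by an output inverter; that arrangement is an assumption made for illustration, as are the helper names.

```python
from itertools import product

def nand(*xs):
    return int(not all(xs))

def nor(*xs):
    return int(not any(xs))

def f_nand_nand(a, b, c, d):
    # First NAND plane forms the complemented products, second NAND sums them.
    return nand(nand(a, b), nand(c, d))

def f_nor_nor(a, b, c, d):
    # First NOR plane on complemented inputs forms ab and cd;
    # second NOR plane yields F-bar, and a final inverter restores F.
    f_bar = nor(nor(1 - a, 1 - b), nor(1 - c, 1 - d))
    return 1 - f_bar

for a, b, c, d in product((0, 1), repeat=4):
    ref = (a & b) | (c & d)
    assert f_nand_nand(a, b, c, d) == ref
    assert f_nor_nor(a, b, c, d) == ref
print("Both forms match F = ab + cd on all 16 input combinations")
```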
8.57
 Various ways for decoding
(NOR-type decoder)
(NAND-type decoder)
NOR : fast
NAND : compact
Legend : diffusion, polysilicon, metal
8.58
 Complementary-type decoder (CMOS-like)

Low power consumption

large area
8.59
 MOS ROM vs. MOS PLA
8.60
P5  x  y  z  x  y  z
P7  x  y  z  x  y  z
P4  x  y  z  x  y  z
f2  P5  P7  P2
f3  P4  P5  P7  P4  P2
 P2  P5  P7  x  z
(PLA) (ROM)
8.61
 Various Programmable Logic Devices(PLD’s)
FSM(Finite State Machine)
: PLA with latched feedback
FPLA(Field-Programmable PLA)
8.62
 PAL(Programmable Array Logic)
= FPLA where the OR array is not
programmable and the AND array is field
programmable.
 ROM : (single) mask programmable
PLA : (multiple) mask programmable
FPLA : field programmable, bulky
PAL : field programmable, less bulky
8.63
MGA
(Multilevel Gate Array)
8.64
Associative Logic
Matrix
8.65
 Pseudo-NMOS PLA
8.66
 Dynamic NMOS PLA
NOR type
NAND type
T1 : product line precharge, input latch in
T2 : sum line precharge
T3 : product line evaluate
T4 : sum line evaluate, output latch out
8.67
 Dynamic CMOS PLA(2-phase) - I
8.68
T1 : product line precharge, latch input
T2 : product line evaluate, T2’:sum line precharge
T3 : sum line evaluate
T4 : latch output
 In the dummy row, one transistor of every TR pair is always 'ON', so the row
behaves as if it carried a large capacitance C; the Vx waveform is therefore
equivalent to a delayed copy of the φ1 waveform.
8.69
 Dynamic CMOS PLA - II
T1 : product line precharge, latch input/output (master-slave scheme)
T2 : product line evaluate
T3 : AND-OR plane connect, sum line evaluate
T4 : sum line evaluate
8.70
 Dynamic CMOS PLA - V (NORA type)
AND plane : NMOS
OR plane : PMOS
[Figure: clock waveforms T1 and T2]
T2 (= low) : p-line precharge, s-line predischarge, latch input
T1 (= high) : p-line and s-line evaluate, latch output
8.71
 Decoded PLA

partition input variables into multiple groups
8.72
 PLA folding(row & column folding)

row folding : partition the inputs into
two groups such that one can find
an ordering of the rows (product lines)
in which one input group is fed from
below while the other input group
is fed from the top.
8.73
 PPL(Programmable Path Logic)
merging of AND and OR plane.
Do=1 if
Ao  C  K  0
Ao  C  K  0
OR
i.e., D o  (A o  C  A o  C)  K ; two-level Boolean eq.
8.74
 Associative Logic Array(subset of Storage Logic Array)
y1  D  C
y 2  D  x1  x 2  C
y 3  x1  C
Ex.
x1  y1  y 2  y 3  D  C  x1  C
x 2  y 2  y 3  y 4  x 1  x 2  x1  C
y 4  x1  x 2
8.75
 MGA(Multiple Gate Array, or Multi-level PLA)
8.76
 MGA with three associative logic matrices
8.77
5. Gate Matrix
 Use regularly-spaced polysilicon lines for both gate electrode and
interconnect.
(a) : the NMOS transistors and the channel are separated
(b) : each transistor is placed on the polysilicon grid
(c) : a group of transistors connected in series (or in parallel)
is placed in one row and connected.
8.78
 Rule
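1. Polysilicon runs vertically at regular spacing.
2. A series connection between transistors in adjacent columns of the same row is made
by diffusion butting.
3. Metal is used for parallel connections, series connections of non-adjacent transistors,
and connections between gates; it runs both horizontally and vertically.
4. Transistors exist only on polysilicon columns.
5. Diffusion wires may run vertically (for a short distance) between polysilicon
grid lines.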
8.79
 Static CMOS layout in Gate Matrix
 L(f,h) is realizable if h is realizable.
 h is realizable if every (vertical) diffusion
run it generates is legal.
8.80
 Automation of Gate Matrix Layout : Ref. O. Wing et al., “Gate Matrix
Layout”, IEEE Trans. on CAD, Vol. 4, July 1985
Find a function f (gate assignment)
: assign each transistor gate and output terminal to a column (transistor gates
connected to the same node must be assigned to the same column)
Find a function h (net assignment)
: assign each net (a segment of horizontal metal line) to a row.
Find a layout L(f,h) which is realizable* and has the minimum number of rows.
8.81
 Problem Formulation for Gate Matrix Optimization
8.82
 Example : CMOS Half-Adder Circuit
8.83
Net Representation(Case I)
Gate    Nets
1       N1, N2
2       N1, N3
3       N4
4       N2, N4
5       N1, N2, N3, N5
6       N3, N4
7       N5
Net Representation(Case II)
Problem Statement ;
Given a set of nets which connect
at gates, find a permutation of
gates and an assignment of nets to
tracks, such that the number of
tracks is minimized.
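The problem statement can be tried out on the Case I net list by brute force. In the sketch below, each net is treated as a closed interval from its leftmost to its rightmost gate column, and the number of tracks for a given gate order is taken as the maximum number of nets crossing any column (the usual interval-packing bound). Exhaustive search over all 5040 gate orders is an illustrative choice for a 7-gate example, not the heuristic of the cited paper, and the function names are hypothetical.

```python
from itertools import permutations

# Case I net representation: gate -> nets connected at that gate.
GATE_NETS = {
    1: ["N1", "N2"], 2: ["N1", "N3"], 3: ["N4"], 4: ["N2", "N4"],
    5: ["N1", "N2", "N3", "N5"], 6: ["N3", "N4"], 7: ["N5"],
}

def tracks_needed(order, gate_nets):
    """Maximum number of nets crossing any column for a given gate order."""
    pos = {gate: i for i, gate in enumerate(order)}
    spans = {}
    for gate, nets in gate_nets.items():
        for net in nets:
            lo, hi = spans.get(net, (pos[gate], pos[gate]))
            spans[net] = (min(lo, pos[gate]), max(hi, pos[gate]))
    return max(sum(lo <= col <= hi for lo, hi in spans.values())
               for col in range(len(order)))

def best_order(gate_nets):
    """Exhaustively search gate permutations for the fewest tracks."""
    return min(permutations(gate_nets), key=lambda o: tracks_needed(o, gate_nets))

natural = tuple(sorted(GATE_NETS))
best = best_order(GATE_NETS)
print("natural order:", tracks_needed(natural, GATE_NETS), "tracks")
print("best order   :", best, tracks_needed(best, GATE_NETS), "tracks")
```

Gate 5 alone connects four nets, so no ordering can use fewer than four tracks; the search shows how close a good permutation gets to that bound.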
8.84
6. ROM(Read Only Memory)
 ROM cells

Diode cell : consumes large power from WL

Transistor(BJT) cell : consumes less current(IB vs. IC)

MOSFET cell :
8.85
 Sharing supply voltage lines and mirroring cells
8.86
 NOR ROM with contact programming
8.87
 NOR ROM with Vth-raising implant or thick-oxide implants.
8.88
 NAND ROM
8.89
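Before Writing
a Paper
To write a paper, choose one of
two paths. Either pick a technical
field that is highly practical today or
that will have a large impact in your time,
or pursue outstanding academic and
theoretical excellence.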
8.90
The Secret of Success
Start something hard.
Then you will become serious about it.
As long as you do not back down,
you will succeed.
8.91