The Memory Hierarchy

The Basic Memory Element - The Flip-Flop
• Up until now we have treated memory elements as black boxes.
The basic memory element is called the flip-flop or latch. It is
composed of 2 cross-coupled NOR gates with inputs R (reset) and
S (set) and outputs Q and !Q. Its truth table is:
R S | Q !Q
0 0 | Q !Q (the previous value is held)
1 0 | 0 1
0 1 | 1 0
1 1 | unstable
[Figure: two cross-coupled NOR gates with inputs R and S and outputs Q and !Q]
• We can see that the flip-flop "stores" the contained value
as long as its inputs are 0. The problem is that the flip-flop
isn't stable.
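The truth table above can be reproduced with a small gate-level sketch. This is only an illustration: the function names `nor` and `sr_latch` are made up here, and both gates are assumed to update simultaneously, which is a simplification of real gate delays.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1


def sr_latch(r, s, q, nq):
    """Settle a cross-coupled NOR latch from state (q, nq) with inputs R, S.

    Returns the stable (Q, !Q) pair, or None if the outputs keep
    changing (assumption: both gates switch at the same time).
    """
    for _ in range(8):
        new_q, new_nq = nor(r, nq), nor(s, q)
        if (new_q, new_nq) == (q, nq):
            return q, nq
        q, nq = new_q, new_nq
    return None


print(sr_latch(0, 1, 0, 1))  # set: Q becomes 1
print(sr_latch(1, 0, 1, 0))  # reset: Q becomes 0
print(sr_latch(0, 0, 1, 0))  # hold: the stored value is kept
print(sr_latch(0, 0, 0, 0))  # released after R = S = 1: never settles
```

With R = S = 1 both outputs are forced to 0, and when both inputs then return to 0 this idealized model oscillates, which is one way to picture the "unstable" row.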
The D Flip-Flop
• Adding 2 AND gates to the flip-flop turns it into a stable
memory element. The 2 inputs are C (clock) and D (data).
When C is asserted (set to 1) the ff is open, the value of D
is stored in it, and Q becomes D. When C is deasserted
the ff is closed and Q is the value of whatever was stored
the last time the ff was open.
• This type of flip-flop opens on the rising edge of the clock
and stays open while the clock is high, as the clock diagram
shows.
[Figure: D flip-flop circuit with inputs C and D and outputs Q and !Q, followed by a clock diagram of D, C and Q]
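The open/closed behaviour described above can be sketched behaviourally. The class name `DLatch` and its `step` method are hypothetical, and the model ignores gate delays.

```python
class DLatch:
    """Behavioural model of the clocked D memory element (no gate delays)."""

    def __init__(self):
        self.q = 0  # stored value, assumed to start at 0

    def step(self, c, d):
        """Apply clock C and data D, then return Q."""
        if c:           # C asserted: the element is open and Q becomes D
            self.q = d
        return self.q   # C deasserted: Q keeps the last stored value


latch = DLatch()
for c, d in [(1, 1), (0, 0), (0, 1), (1, 0)]:
    print(f"C={c} D={d} -> Q={latch.step(c, d)}")
```

Q follows D only while C is 1; changes on D while C is 0 are ignored.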
The Master-Slave Flip-Flop
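• In order to implement a falling-edge-triggered ff we use 2 D
flip-flops connected in series, called a master-slave flip-flop.
The master is open while the clock is high; the slave receives
the inverted clock and is open while the clock is low. Now the
value of Q changes on the falling edge of the clock.
[Figure: master D latch feeding a slave D latch, with inputs D and C and outputs Q and !Q, followed by a clock diagram of D, C and Q]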
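A minimal behavioural sketch of the master-slave arrangement, assuming the usual wiring in which the slave is clocked by the inverse of C. The class name and the `step` interface are hypothetical.

```python
class MasterSlaveFlipFlop:
    """Two latches in series; Q changes only when C falls (sketch)."""

    def __init__(self):
        self.master = 0  # value held by the master latch
        self.q = 0       # value held by the slave latch (the output Q)

    def step(self, c, d):
        """Apply clock C and data D, then return Q."""
        if c:
            self.master = d       # master open, slave closed
        else:
            self.q = self.master  # slave open (inverted clock), master closed
        return self.q


ff = MasterSlaveFlipFlop()
for c, d in [(1, 1), (0, 0), (0, 1), (1, 0), (0, 1)]:
    print(f"C={c} D={d} -> Q={ff.step(c, d)}")
```

Q takes the value that D had just before the clock fell, and ignores D while the clock is low.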
The Register File
• The register file has to enable us to read 1 or 2 registers
and write into 1 register. The read part is easy: we use 2 MUXs
to choose the registers that are read.
• For the write part we use a decoder: given a number i
it asserts the ith line and deasserts all other lines. This line
together with the write signal provides the C input to each register.
[Figure: register file with inputs Read register number 1, Read register number 2, Write register, Write data and Write, and outputs Read data 1 and Read data 2]
[Figure: read ports, where two MUXs controlled by Read register number 1 and Read register number 2 select among the registers and drive Read data 1 and Read data 2]
[Figure: write port, where a decoder on Register number combined with Write drives the C input of each register, and Register data drives every D input]
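The read and write paths can be sketched as follows. `RegisterFile`, `decode` and the method names are made up for this illustration, and a Python list stands in for the registers.

```python
def decode(n, i):
    """Decoder: assert line i and deassert all other lines."""
    return [1 if k == i else 0 for k in range(n)]


class RegisterFile:
    def __init__(self, n):
        self.regs = [0] * n

    def read(self, reg1, reg2):
        """Two read ports: each index plays the role of a MUX select."""
        return self.regs[reg1], self.regs[reg2]

    def write(self, reg, data, write):
        """One write port: C of register k is (decoder line k AND write)."""
        for k, line in enumerate(decode(len(self.regs), reg)):
            c = line & write
            if c:
                self.regs[k] = data  # D input is the shared write data


rf = RegisterFile(4)
rf.write(2, 99, write=1)
rf.write(1, 7, write=0)   # write signal deasserted: nothing is stored
print(rf.read(2, 1))
```

Only the register whose decoder line and write signal are both 1 sees an asserted C input, so only that register stores the data.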
SRAM (Static Random Access Memory)
• Memories larger than register files are built with
SRAMs. An SRAM chip is a circuit that stores many memory
elements and enables access to one at a time. SRAMs have a fixed
access time to any memory element.
• A 256K x 1 SRAM can hold 256K entries, each of which is 1 bit
wide. Thus it will have 18 address lines (256K = 2^18), 1 data input
line and 1 data output line. A 32K x 8 SRAM has the same total
number of bits but has 15 address lines (32K = 2^15), 8 input and
8 output lines. The number of possible addresses is called the
height and the number of bits in each location is called the width.
Typical access times are 5-25 ns.
[Figure: 32K x 8 SRAM with a 15-bit Address input, Chip select, Output enable, Write enable, 8-bit Din[7-0] and 8-bit Dout[7-0]]
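The pin counts follow directly from the height and width. A short sketch of the arithmetic, where `sram_pins` is a hypothetical helper and the height is assumed to be a power of two:

```python
def sram_pins(height, width):
    """Address and data line counts for a height x width SRAM.

    Assumes height is a power of two, so the number of address
    lines is log2(height).
    """
    address_lines = (height - 1).bit_length()
    return {
        "address_lines": address_lines,
        "data_in": width,
        "data_out": width,
        "total_bits": height * width,
    }


K = 1024
print(sram_pins(256 * K, 1))
print(sram_pins(32 * K, 8))
```

Both chips hold 256K bits in total, but the wider chip needs fewer address lines and more data lines.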
The Three-State Buffer
• To build large memories we would need giant MUXs in order to
read their contents (look at the register-file diagram). For a 64K x 1
SRAM we would need a 64K-to-1 MUX.
• Instead, large memories are implemented by using a shared output
line called a bit-line. To allow multiple sources to drive a single line
we use the three-state buffer (or tri-state buffer). A three-state buffer
has 2 inputs: a data signal and an Output enable. The single
output is the data signal if the Output enable is set (1). If the
Output enable is 0 the output of the buffer is in a
high-impedance state, which enables
other three-state buffers to set the bit-line.
• A 4-way MUX is composed of 4 three-state buffers.
[Figure: four three-state buffers with inputs Data 0 to Data 3 and enables Select 0 to Select 3, all driving one shared Output line]
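A rough software analogy for the shared bit-line, using `None` to stand for the high-impedance state. The names `HIGH_Z`, `tri_state`, `bit_line` and `mux4` are invented for this sketch, and raising an error when two buffers drive the line is a modelling choice, not something stated in the slides.

```python
HIGH_Z = None  # stands for the high-impedance (undriven) state


def tri_state(data, enable):
    """Three-state buffer: pass data when enabled, otherwise drive nothing."""
    return data if enable else HIGH_Z


def bit_line(outputs):
    """Resolve a shared line: at most one buffer may drive it."""
    driven = [v for v in outputs if v is not HIGH_Z]
    if len(driven) > 1:
        raise ValueError("more than one buffer is driving the bit-line")
    return driven[0] if driven else HIGH_Z


def mux4(data, select):
    """4-way MUX built from 4 three-state buffers with one-hot enables."""
    return bit_line([tri_state(d, i == select) for i, d in enumerate(data)])


print(mux4([0, 1, 1, 0], 2))
```

Because exactly one select line is asserted, exactly one buffer drives the line and the others stay out of the way.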
A 4 x 2 SRAM
[Figure: a 4 x 2 SRAM. A 2-to-4 decoder on Address selects one of rows 0 to 3. Each row has two D latches fed by Din[1] and Din[0], with C inputs gated by Write enable, and each latch output Q drives Dout[1] or Dout[0] through an Enable-controlled three-state buffer.]
A 32K x 8 SRAM
• We still need a large decoder and a large number of word lines (the
lines used to enable the individual cells).
• The solution is to organize memories as rectangular arrays
and use a 2-step decoding process: the 1st decoder uses
Address[14-6] to select 1 of the 512 rows in each of the eight
512 x 64 arrays. Then a set of MUXs uses Address[5-0] to select 1
bit out of the 64 bits read from each array.
[Figure: a 9-to-512 decoder on Address[14-6] drives eight 512 x 64 SRAM arrays; eight MUXs controlled by Address[5-0] produce Dout7 to Dout0]
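The 2-step decoding can be sketched as an address split followed by a lookup. The function names and the nested-list layout `arrays[bit][row][column]` are assumptions made for this illustration.

```python
def split_address(addr):
    """Split a 15-bit address into (row, column) for the 512 x 64 arrays."""
    assert 0 <= addr < 32 * 1024
    row = addr >> 6      # Address[14-6] goes to the 9-to-512 decoder
    col = addr & 0x3F    # Address[5-0] goes to the MUX selects
    return row, col


def read_word(arrays, addr):
    """Read 8 bits: one bit from each of the eight arrays.

    arrays[b][row] is assumed to hold the 64 bits of one row of the
    array that feeds output bit b.
    """
    row, col = split_address(addr)
    return [arrays[b][row][col] for b in range(8)]


arrays = [[[0] * 64 for _ in range(512)] for _ in range(8)]
arrays[3][1][1] = 1               # set bit 3 of the word at address 65
print(split_address(65))
print(read_word(arrays, 65))
```

The decoder now needs only 512 outputs instead of 32K, and each MUX chooses among 64 bits.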
DRAM (Dynamic Random Access Memory)
• In SRAM the value stored in a cell is kept on a pair of inverting
gates as long as the power is on. In Dynamic RAM the value kept
in a cell is stored as a charge in a capacitor. A single
transistor is used to access this value.
• Because DRAMs use 1 transistor per bit they are cheaper and
denser than SRAMs, which use 4-6 transistors per bit.
• In DRAM the charge must be refreshed by reading it and writing
it back again. This is why it is called dynamic RAM.
• The charge can be kept for several milliseconds which is close to a
million clock cycles.
• If we had to refresh each bit we would have no time left to read or
write data.
• Fortunately DRAMs use a 2-level decoding structure which
enables refreshing an entire row at once. Typically, refresh
operations take 1-2% of the active cycles of the DRAM.
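A back-of-the-envelope sketch shows why row-at-a-time refresh matters. All three numbers are assumptions chosen only for illustration: a 2048 x 2048 array, 60 ns per refresh operation, and a refresh period of 8 ms. The helper `refresh_fraction` is hypothetical.

```python
def refresh_fraction(units, cycle_ns, period_ms):
    """Fraction of time spent refreshing `units` items once per period,
    with one item refreshed per cycle."""
    return units * cycle_ns * 1e-9 / (period_ms * 1e-3)


ROWS = 2048            # assumed array: 2048 rows of 2048 bits
BITS = ROWS * 2048
CYCLE_NS = 60          # assumed time for one refresh operation
PERIOD_MS = 8          # assumed time the charge can be kept

per_row = refresh_fraction(ROWS, CYCLE_NS, PERIOD_MS)
per_bit = refresh_fraction(BITS, CYCLE_NS, PERIOD_MS)
print(f"row at a time: {per_row:.2%} of the time")
print(f"bit at a time: {per_bit:.0%} of the time")
```

Under these assumptions refreshing by row costs about 1.5% of the time, while refreshing bit by bit would need far more than 100% of the available time.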
A 4M x 1 DRAM
• DRAMs use a 2-level decoder: a row access followed by a
column access. The row access chooses a row and stores it in a set
of column latches. The column access uses a MUX to select the
data from the row. To save pins the same address
lines are used for both the row and column addresses: a pair of
signals, RAS (Row Access Strobe) and CAS (Column Access
Strobe), is used to signal whether a row or column address is being
supplied.
• Refresh is performed by reading
a row and writing the values back.
• In 1997, typical DRAM access
times are 60-110 ns. The cost per
bit makes DRAM ideal for main
memory. The speed of SRAM
makes it ideal for caches.
[Figure: Address[10-0] feeds an 11-to-2048 row decoder for a 2048 x 2048 array; the selected row goes to the column latches and a MUX produces Dout]
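The shared address pins can be illustrated by splitting a 22-bit address into two 11-bit halves sent one after the other. Which half serves as the row address is an assumption here (the high-order bits), and `dram_address_cycles` is a hypothetical name.

```python
def dram_address_cycles(addr):
    """Return the two values driven on Address[10-0] for a 4M x 1 DRAM.

    Assumption: the high 11 bits are the row address (sent with RAS)
    and the low 11 bits are the column address (sent with CAS).
    """
    assert 0 <= addr < 4 * 1024 * 1024
    row = (addr >> 11) & 0x7FF
    col = addr & 0x7FF
    return [("RAS", row), ("CAS", col)]


print(dram_address_cycles(2048 + 5))
```

Eleven pins carry the full 22-bit address in two steps, at the cost of a second address cycle.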
EDO and SDRAMs
• A 4M x1 DRAM reads 2048 bits on every row-access and throws
away 2047 bits. It is possible to provide higher bandwidth by
allowing a column address change without a row address change,
resulting in an access to bits in the column latches. This is called
page-mode or static-column mode.
• Nibble-mode RAMs internally create the next 3 column addresses,
thus providing 4 bits (a nibble) for every row access.
• EDO (Extended Data Out) RAMs are the latest version of the
page-mode style. They are becoming standard and have an access time
of 25 ns.
• SDRAMs (Synchronous DRAMs) and SSRAMs enable reading
or writing a burst of data from sequential addresses in a row or in
a column. SDRAMs and SSRAMs are used mainly in caches
where blocks of data are transferred.
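The benefit of page mode can be sketched by counting how many row accesses a sequence of addresses needs when the column latches keep the last row open. The function name is made up, and treating the high-order address bits as the row is an assumption.

```python
def count_row_accesses(addresses, column_bits=11):
    """Count row accesses in page mode: a new row access is needed
    only when the row part of the address changes."""
    open_row = None
    row_accesses = 0
    for addr in addresses:
        row = addr >> column_bits   # assumed: high-order bits select the row
        if row != open_row:
            row_accesses += 1
            open_row = row
    return row_accesses


print(count_row_accesses(range(2048)))       # one whole row, bit by bit
print(count_row_accesses([0, 2048, 0]))      # alternating rows
```

Sequential accesses within one row need a single row access; without page mode each of them would need its own.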
Error Detection and Correction
• Because of potential data corruption in large memories,
most computers use some sort of error-checking code.
• One simple code that is used is a parity code. The
number of 1s in a word is counted. The word has odd
parity if the number of 1s is odd and even parity if the
number of 1s is even. The parity bit is written into
memory along with the word (1 for odd, 0 for even).
• When the word is read so is the parity bit. If the parity
stored doesn't match the parity computed from the word
that was read, an error is signaled.
• A 1-bit parity code can detect only a 1-bit error (more
precisely, any odd number of flipped bits). An even number
of flipped bits goes undetected.
• Parity is only an error-detection scheme. There are more
complex error correction codes (ECC) which can correct
errors in several bits.
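The parity scheme described above can be sketched in a few lines. The function names are hypothetical, and a Python integer stands in for the memory word.

```python
def parity_bit(word):
    """1 if the word has an odd number of 1s, 0 if the number is even."""
    return bin(word).count("1") % 2


def store(word):
    """Write the word together with its parity bit."""
    return word, parity_bit(word)


def check(word, stored_parity):
    """True if the word read back agrees with the stored parity bit."""
    return parity_bit(word) == stored_parity


word, p = store(0b1011)
print(check(word, p))            # no error
print(check(word ^ 0b0100, p))   # one flipped bit: mismatch, error signaled
print(check(word ^ 0b0110, p))   # two flipped bits: parity still matches
```

The last line shows the limitation noted above: two flipped bits leave the parity unchanged, so the error is not detected.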
