DRAM - KTU - Kompiuterių katedra

COMPUTER
ARCHITECTURE
(P175B125)
Assoc.Prof. Stasys Maciulevičius
Computer Dept.
[email protected]
DRAM cell
Column (bit) line
Row (word) line
• Storing one bit in a dynamic memory cell needs only one transistor
(a static memory cell has 6-8 transistors).
• To reduce the number of chip pins, the address has traditionally been
transferred in two steps: first the higher bits (the row address), then
the column address.
• This results in a greater number of cycles per access.
• Information is stored as charge on a capacitor within the integrated
circuit. Since real capacitors leak charge, the information eventually
fades unless the capacitor charge is refreshed periodically.
• DRAM is approximately 10 times slower than SRAM.
2009-2013
©S.Maciulevičius
Structure of 1 M DRAM chip
[Figure: block diagram of a 1 M DRAM chip. Address inputs A0-A8 feed a row
address buffer (strobed by RAS#, with the No. 1 clock generator) and a column
address buffer (strobed by CAS#, with the No. 2 clock generator). Other
blocks: refresh controller, refresh counter, row decoder, column decoder,
amplifiers and write control, DRAM array (matrix) 512 × 512 × 4, and I/O
control and data buffers (OE#, WE#, data D0-D3).]
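The two-step addressing of such a chip can be sketched in a few lines of Python. This is an illustrative model only: the names `split_address` and `capacity_bits` are hypothetical, and treating the upper 9 bits as the row address is an assumption that follows the "higher bits first" description on the DRAM cell slide.

```python
ROW_BITS = 9  # A0-A8, latched when RAS# is asserted
COL_BITS = 9  # A0-A8 again, latched when CAS# is asserted


def split_address(cell_address):
    """Split an 18-bit cell address into (row, column) for a 512 x 512 array."""
    if not 0 <= cell_address < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 512 x 512 array")
    row = cell_address >> COL_BITS               # higher bits: sent first
    col = cell_address & ((1 << COL_BITS) - 1)   # lower bits: sent second
    return row, col


def capacity_bits(rows=512, cols=512, width=4):
    """Total capacity of the array in bits."""
    return rows * cols * width


print(split_address((5 << 9) | 7))
print(capacity_bits())
```

Nine shared address pins are enough because each half of the address is sent in its own step; the price is the extra cycle per access mentioned earlier.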
DRAM roadmap
[Figure: timeline from 1987 to 2000 showing the succession of DRAM types
(Ordinary, FPM, EDO, BEDO, SDRAM, then RDRAM, DDR, DDR 2, ...) together with
the growth of chip capacity: 1M, 4M, 16M, 64M, 256M.]
Ordinary DRAM
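[Timing diagram: signals RAS#, CAS#, Addr, Data. The address lines carry
Row 1, Col. 1, then Row 2, Col. 2; Data 1 and Data 2 follow the respective
column addresses.]
Every access is individual: each one needs both a row and a column address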
Fast Page Mode (FPM) DRAM
[Timing diagram: signals RAS#, CAS#, Addr, Data. After Row 1 the address
lines carry Col. 1, Col. 2, Col. 3; Data 1, Data 2, Data 3 follow.]
• For successive reads or writes within the same row, only CAS# has to be
repeated
• When CAS# goes high, the data output lines go to the high-impedance (Z) state
Extended Data Output RAM
(EDO RAM)
[Timing diagram: signals RAS#, CAS#, Addr, Data. After Row 1 the address
lines carry Col. 1, Col. 2, Col. 3; Data 1, Data 2, Data 3 follow.]
• To transfer a burst, CAS# has to be repeated
• It differs from FPM in that a new access cycle can be started while the
data output of the previous cycle is kept active
• Therefore, it can achieve a shorter cycle period (higher frequency)
Burst Extended Data Output RAM
(BEDO)
[Timing diagram: signals RAS#, CAS#, Addr, Data. After Row 1 and Col. 1 the
chip outputs the burst D10, D11, D12, D13; Col. 2 starts the next burst,
beginning with D20.]
Burst Extended Data Output RAM
(BEDO)
• A pipeline stage was added, allowing the page-access cycle to be divided
into two components
• An address counter was added on the chip to keep track of the next address
• Access is faster than with traditional EDO (by up to 50% for large blocks
of data)
• It can process four memory addresses in one burst, with a best case of
5-1-1-1, versus 5-2-2-2 for EDO RAM
Synchronous Dynamic RAM
(SDRAM)
• Traditional DRAM has an asynchronous interface, which means that it
responds as quickly as possible to changes in its control inputs
• SDRAM has a synchronous interface, meaning that it waits for a clock
signal before responding to control inputs and is therefore synchronized
with the computer's system bus
• All SDRAM devices are designed to work in burst mode, transferring one
portion of data each clock. The burst length is programmable: 1, 2, 4, 8 or
a full page (e.g. 256)
• It can transfer four portions in one burst, with a best case of 5-1-1-1
Synchronous Dynamic RAM
(SDRAM)
This is realized by adding registers (latches) that capture the address,
data and control signals on the clock:
[Figure: a DRAM core with clocked registers on its address, data and
control-signal paths.]
SDRAM
• The clock drives an internal finite state machine that pipelines incoming
commands
• Pipelining means that the chip can accept a new command before it has
finished processing the previous one. E.g., in a pipelined read, the
requested data appears a fixed number of clock cycles after the read
command, and additional commands can be sent during this time
• Two conventions are used to indicate SDRAM speed:
• Minimal interval between adjacent portions of a burst (8 ns, 7 ns, 6 ns, etc.)
• Bus frequency (100 MHz, 133 MHz, etc.). The clock period is the reciprocal
of the frequency: a 100 MHz bus has a 10-ns period and typically used 8-ns
parts, while a 133 MHz bus has a 7.5-ns period
• Do not forget that the first portion can have significant latency!
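The relation between the two speed conventions is simple arithmetic, sketched below. The function names `clock_period_ns` and `max_bus_mhz` are hypothetical; the only fact used is that period and frequency are reciprocals.

```python
def clock_period_ns(bus_mhz):
    """Clock period in nanoseconds for a given bus frequency in MHz."""
    return 1000.0 / bus_mhz


def max_bus_mhz(rated_ns):
    """Highest bus frequency (MHz) that a part rated at `rated_ns` can follow."""
    return 1000.0 / rated_ns


for mhz in (66, 100, 133):
    print(f"{mhz} MHz bus -> period {clock_period_ns(mhz):.2f} ns")

for ns in (8, 7, 6):
    print(f"{ns}-ns part -> up to {max_bus_mhz(ns):.1f} MHz")
```

A part is usable on a bus only if its rated interval does not exceed the bus clock period.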
Timing of PC100 SDRAM
[Timing diagram: an Instr line carrying row (Row) and column (Col) commands
and a Data line carrying writes (W) and reads (R); idle slots between
transfers are marked as bubbles.]
• 2-cycle addressing
• Bubbles increase latency and decrease bandwidth
Timing of PC100 SDRAM
This diagram should be read with attention to two issues driven by SDRAM
technology:
• On the PC platform, unbuffered SDRAM DIMMs require so-called '2-cycle
addressing': the row and column addresses are held on the bus for two
cycles. This is necessary when the board has several DIMM slots. With only
one DIMM, one cycle is sufficient
• Changing the address (selecting another column, for reading as well as
for writing) requires a small pause ('bubbles')
Compare classical DRAMs
Type      Standard bus speeds, MHz   Access rate   DRAM access time
Ordinary  4.77 - 40                  5-5-5-5       80-150 ns
FPM       16 - 66                    5-3-3-3       60-80 ns
EDO       33 - 75                    5-2-2-2       50-60 ns
BEDO      60 - 100                   5-1-1-1       50-60 ns
SDRAM     60 - 100+                  5-1-1-1       7-15 ns
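The access-rate patterns in this comparison can be turned into total cycle counts for a four-transfer burst. This is a small sketch: the helper names `burst_cycles` and `burst_time_ns` are hypothetical, and the bus frequency passed to `burst_time_ns` is an arbitrary example value.

```python
ACCESS_RATES = {
    "Ordinary": "5-5-5-5",
    "FPM": "5-3-3-3",
    "EDO": "5-2-2-2",
    "BEDO": "5-1-1-1",
    "SDRAM": "5-1-1-1",
}


def burst_cycles(pattern):
    """Total bus cycles for a burst described as 'a-b-c-d'."""
    return sum(int(part) for part in pattern.split("-"))


def burst_time_ns(pattern, bus_mhz):
    """Burst duration in nanoseconds at the given bus frequency."""
    return burst_cycles(pattern) * 1000.0 / bus_mhz


for name, pattern in ACCESS_RATES.items():
    print(f"{name:8s} {pattern}: {burst_cycles(pattern)} cycles")
```

The first access costs the same 5 cycles for every type; the gain of the later types comes entirely from the shorter follow-up transfers.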
New DRAM types
While the DRAM types above may be considered relatively classic, newer
types of DRAM have been developed in recent years and are used in computers:
• DDR SDRAM - Double Data Rate SDRAM
• DDR2 SDRAM - twice the data rate of DDR at the same core frequency
• DDR3 SDRAM - four times the data rate of DDR at the same core frequency
DDR SDRAM
DDR (Double Data Rate) SDRAM achieves nearly twice the bandwidth of the
preceding single data rate (SDR) SDRAM by transferring data on both the
rising and falling edges of the clock signal
Bandwidth:
• 1st generation: with a bus frequency of 100 MHz, DDR SDRAM gives a
maximum transfer rate of 1600 MB/s (= 100 × 2 × 8 B)
• later: 3.2 GB/s (= 200 × 2 × 8 B; frequency of 200 MHz)
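The bandwidth figures follow from one multiplication, sketched here. The function name `peak_transfer_rate_mb_s` is hypothetical; the 8-byte bus width is the 64-bit DIMM data bus used in the formula above.

```python
def peak_transfer_rate_mb_s(bus_clock_mhz, transfers_per_clock=2, bus_width_bytes=8):
    """Peak transfer rate in MB/s: clock x transfers per clock x bus width."""
    return bus_clock_mhz * transfers_per_clock * bus_width_bytes


print(peak_transfer_rate_mb_s(100))                          # DDR at 100 MHz
print(peak_transfer_rate_mb_s(200))                          # DDR at 200 MHz
print(peak_transfer_rate_mb_s(100, transfers_per_clock=1))   # SDR at 100 MHz
```

Setting `transfers_per_clock=1` gives the single data rate case, which makes the factor-of-two gain of DDR explicit.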
DDR SDRAM
DDR read operations can be explained using this simplified scheme:
[Figure: the memory array delivers 2n bits, split between two n-bit data
registers; the register outputs feed inputs D0 and D1 of a multiplexer
(MUX), whose output Q is n bits wide.]
DDR modules
Some DDR modules are specified here:
Standard name  Mem. clock (MHz)  Cycle time (ns)  I/O bus clock (MHz)  Data transf. per sec (Mln)  Module name  Peak transfer rate (MB/s)
DDR-200        100               10               100                  200                         PC-1600      1600
DDR-266        133               7.5              133                  266                         PC-2100      2100
DDR-333        166               6                166                  333                         PC-2700      2700
DDR-400        200               5                200                  400                         PC-3200      3200
DDR2
The DDR2 core performs read and write operations at the same frequency as
DDR or SDR SDRAM
However:
• The I/O buffers operate at double that frequency
• The bus that connects the core and the buffers is twice as wide
Therefore the data are multiplexed and transmitted at double frequency over
a bus of normal width
Thus, the core of DDR2-533 works at the same frequency (133 MHz) as DDR-266
or PC133 SDRAM
DDR2 SDRAM read operation
[Figure: the memory array delivers 4n bits, split between four n-bit data
registers; the register outputs feed the inputs of a multiplexer (MUX),
whose output Q is n bits wide.]
DDR2 modules
Some DDR2 modules are specified here:
Standard name  Mem. clock (MHz)  Cycle time (ns)  I/O bus clock (MHz)  Data transf. per sec (Mln)  Module name  Peak transfer rate (MB/s)
DDR2-400       100               10               200                  400                         PC2-3200     3200
DDR2-533       133               7.5              266                  533                         PC2-4300     4266
DDR2-667       166               6                333                  667                         PC2-5300     5333
DDR2-800       200               5                400                  800                         PC2-6400     6400
DDR2-1066      266               3.75             533                  1066                        PC2-8500     8533
DDR and DDR 2
Latency measured in clock periods increases, but data are transferred faster
SDRAM, DDR, and DDR 2
As you can see, all parts of SDRAM operate at the basic (core) frequency,
and data is transmitted once per clock
DDR parts operate at the basic (core) frequency, while data is transmitted
twice per clock
DDR 2 output buffers operate at double the core frequency, and data is
transmitted twice per buffer clock (four times per core clock)
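The same relations can be written as a small lookup. This is a sketch with hypothetical names (`TRANSFERS_PER_CORE_CLOCK`, `data_rate`, `io_clock_mhz`); the DDR3 factor of 8 is taken from the DDR3 slide of this deck (core 100 MHz, data output rate 800 MHz).

```python
# Data transfers per core clock cycle for each generation.
TRANSFERS_PER_CORE_CLOCK = {"SDRAM": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def data_rate(generation, core_mhz):
    """Data transfers per second (in millions) for a given core clock."""
    return core_mhz * TRANSFERS_PER_CORE_CLOCK[generation]


def io_clock_mhz(generation, core_mhz):
    """I/O buffer clock: DDR-type parts move two transfers per buffer clock."""
    if generation == "SDRAM":
        return core_mhz
    return data_rate(generation, core_mhz) // 2


for gen in TRANSFERS_PER_CORE_CLOCK:
    print(gen, data_rate(gen, 100), io_clock_mhz(gen, 100))
```

With a fixed 100 MHz core, each generation doubles the data rate of the previous one purely by widening the internal fetch and speeding up the buffers.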
DDR3
[Figure: DDR3 data path. Memory core (cell array) at a core frequency of
100 MHz; data output buffers at a buffer frequency of 400 MHz; data output
rate 800 MHz.]
Benefits of DDR3
• First of all, lower energy consumption (by 40%) compared to the popular
DDR2, due to the reduced supply voltage: 1.5 V for DDR3, versus 1.8 V for
DDR2 and 2.5 V for DDR
• Higher working speed: the DDR3 data rate ranges from 800 MHz to 1600 MHz
(clock frequency 400 MHz to 800 MHz), while the DDR2 data rate ranges from
400 MHz to 1066 MHz (clock frequency 200 MHz to 533 MHz), and DDR reaches
only 200 MHz to 600 MHz
• DDR3 drawback: increased latency (in clock periods)
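The contribution of the supply voltage alone can be estimated roughly. The sketch below assumes that dynamic power scales with the square of the supply voltage, with capacitance and frequency held equal; this is a simplification, the function name `dynamic_power_saving` is hypothetical, and actual savings also depend on factors the estimate ignores.

```python
SUPPLY_VOLTAGE = {"DDR": 2.5, "DDR2": 1.8, "DDR3": 1.5}


def dynamic_power_saving(new, old):
    """Fractional saving if dynamic power scales as V squared (assumption)."""
    ratio = SUPPLY_VOLTAGE[new] / SUPPLY_VOLTAGE[old]
    return 1.0 - ratio ** 2


print(f"DDR3 vs DDR2: {dynamic_power_saving('DDR3', 'DDR2'):.1%}")
print(f"DDR3 vs DDR:  {dynamic_power_saving('DDR3', 'DDR'):.1%}")
```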
DDR3 modules
Some DDR3 modules are specified here:
Standard name  Mem. clock (MHz)  Cycle time (ns)  I/O bus clock (MHz)  Data transf. per sec (Mln)  Module name  Peak transfer rate (MB/s)
DDR3-800       100               10               400                  800                         PC3-6400     6400
DDR3-1066      133               7.5              533                  1066                        PC3-8500     8533
DDR3-1333      166               6                667                  1333                        PC3-10600    10667
DDR3-1600      200               5                800                  1600                        PC3-12800    12800
DDR → DDR2 → DDR3 (market)
DDR4
• DDR4 is the next evolution in DRAM, bringing even higher performance and
more robust control features while improving energy economy
Feature/Option          DDR3                               DDR4
Voltage (core and I/O)  1.5 V                              1.2 V
Data rate (Mb/s)        800, 1066, 1333, 1600, 1866, 2133  1600, 1866, 2133, 2400, 2667, 3200
Densities               512Mb-8Gb                          2Gb-16Gb
Internal banks          8                                  16
Increasing DRAM speed
DDR timing
The main DDR DRAM timing parameters are:
• tRCD - RAS to CAS delay: the number of clock cycles needed between a row
address strobe and a column address strobe
• tCL - CAS latency: the number of clock cycles required to access a
specific column of data
• tRP - RAS precharge: the number of clock cycles needed to close one row
of memory and open another
• tRAS - active to precharge delay: the minimum number of clock cycles a row
must stay open between its activation and its precharge
E.g., "DDR2-800 5-5-5-15" shows the values of these four parameters (in the
order tCL-tRCD-tRP-tRAS)
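Because the parameters are given in clock cycles, comparing modules requires converting them to nanoseconds. A minimal sketch, assuming the usual tCL-tRCD-tRP-tRAS order and a clock at half the data rate; the function name `timings_ns` is hypothetical.

```python
def timings_ns(data_rate, timing):
    """Convert a timing string such as '5-5-5-15' from cycles to nanoseconds.

    data_rate is the number after the dash in the module name (e.g. 800 for
    DDR2-800); the clock runs at half of it because data moves on both edges.
    """
    clock_mhz = data_rate / 2
    period_ns = 1000.0 / clock_mhz
    names = ("tCL", "tRCD", "tRP", "tRAS")
    cycles = [float(value) for value in timing.split("-")]
    return {name: count * period_ns for name, count in zip(names, cycles)}


print(timings_ns(800, "5-5-5-15"))
print(timings_ns(400, "3-3-3-8"))
```

The second call uses an example DDR-400 timing chosen for illustration: its CAS latency of 3 cycles lasts longer in nanoseconds than the 5 cycles of DDR2-800, which is why latency should be compared in time as well as in cycles.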
DDR timing
Typical values of these parameters for DDR
chips:
• RAS to CAS Delay: 2, 3, 4;
• CAS Latency: 2.0, 2.5, 3.0;
• RAS Precharge: 2, 3, 4
SPD
In accordance with JEDEC standards, each module must carry a small special
ROM chip called the SPD (Serial Presence Detect), which holds information
about the memory module:
• configuration and type
• timing
• producer (manufacturer code)
• serial number
• production date
• other information
Total ROM size is 128 bytes
SPD
E.g., the CPU-Z utility extracts such information from the SPD:
DRAM refresh
• Memory refresh is the process of periodically
reading information from an area of computer
memory, and immediately rewriting the read
information to the same area with no modifications
• Each memory refresh cycle refreshes a
succeeding area of memory
• Classic asynchronous DRAM is refreshed by
opening each row in turn
• For convenience, the refresh counter is
incorporated into RAM chips
DRAM refresh
• In CAS-before-RAS (CBR) refresh, the CAS# line is driven low before RAS#;
the DRAM then ignores the address inputs and uses an internal counter to
select the row to open (refresh)
• Hidden refresh allows refresh cycles to take place in memory banks not
being used by the CPU at the time, instead of or together with the normal
refresh cycles
• Refresh period: Tref was 2 ms in the first DRAMs; now it is 64 ms or even
128 ms
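The refresh period fixes how often a single row refresh must be issued, and the internal counter decides which row it hits. The sketch below is illustrative: the names `refresh_interval_us` and `RefreshCounter` are hypothetical, and the row count of 8192 is an assumed example value, not a figure from these slides.

```python
def refresh_interval_us(tref_ms, rows):
    """Average time between row refreshes so that every row is hit within Tref."""
    return tref_ms * 1000.0 / rows


class RefreshCounter:
    """Internal row counter as used by CAS-before-RAS refresh."""

    def __init__(self, rows):
        self.rows = rows
        self.next_row = 0

    def refresh(self):
        """Return the row refreshed by this cycle and advance the counter."""
        row = self.next_row
        self.next_row = (self.next_row + 1) % self.rows  # wrap after last row
        return row


print(refresh_interval_us(64, 8192))  # assumed 8192 rows, Tref = 64 ms
counter = RefreshCounter(rows=4)
print([counter.refresh() for _ in range(6)])
```

Since the counter lives on the chip, the controller only has to issue refresh cycles at the right pace; it never supplies a row address.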
Memory controller
The memory controller is a digital circuit which
manages the flow of data going to and from the
main memory:
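[Figure: the CPU connects to the DRAM controller through data (D), address
(A), read (Rd) and write (Wr) lines; the DRAM controller connects to the
DRAM through data (D), address (A), RAS#, CAS#, WE# and OE# lines.]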
Memory controller
• It can be a separate chip or integrated into another chip
• Computers using Intel microprocessors traditionally had the memory
controller implemented in the motherboard's northbridge (the "northern"
part of the chipset)
• AMD's Athlon 64 and Opteron processors and the Intel Core i7 have the
memory controller on the microprocessor die to reduce memory latency. This
also restricts which DRAM types can be used
Memory controller in chipset
• Computers using Intel Core 2 (Duo and Quad) microprocessors had the
memory controller implemented in the motherboard's northbridge (e.g., in
the P45 MCH, Memory Controller Hub):
Memory controller in Core i7
[Figure: Core i7 with its Integrated Memory Controller marked.]
DRAM modules
SIPP – Single In-Line Pin Package
• 30 pins
• used in some 286-based computers
• pins often bent or broke during installation
SIMM – Single In-Line Memory Module
• "short" (90 mm): 30 pins, 8 bits of data
• "long" (108 mm): 72 pins, 4 bytes of data
• 32 bits, 36 bits (with parity), ECC-36 and ECC-40 (with an error-correcting
code)
• some with PD (Presence Detect, indicates size: 4, 8, 16, 32 MB)
DIMM – Dual In-Line Memory Module
• 133.35 mm: 168-244 pins, 8 bytes
• 64-bit word (ordinary), 72 or 80 bits (with parity or error-correcting code)
SIMM modules
SDRAM module
DDR modules
[Figure: comparison of memory modules for desktop PCs (DIMM).]
Registered memory modules
• Registered (also called buffered) memory modules have a register between
the DRAM chips and the system's memory controller
• They place less electrical load on the memory controller and allow a
single system to remain stable with more memory modules than it could
otherwise support
• There is a performance penalty for using registered memory. Each read or
write is buffered for one cycle between the memory bus and the DRAM, so
registered RAM can be thought of as running one clock cycle behind the
equivalent unregistered DRAM
Registered memory modules
FB-DIMM
• Fully Buffered DIMM (FB-DIMM) is a memory technology that can be used to
increase the reliability and density of memory systems
• Conventionally, data lines from the memory controller have to be
connected to the data lines of every DRAM module
• The fully buffered DIMM architecture introduces an advanced memory buffer
(AMB) between the memory controller and the memory module
FB-DIMM
FB-DIMM
• FB-DIMM uses 10 bit lanes carrying commands and data from the processor
to memory and 14 bit lanes carrying data from memory to the processor
• Each bit is carried over a differential pair (the signal and its
inverse), clocked at 12 times the basic memory clock rate, i.e. 6 times the
double-pumped data rate
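The lane speed implied by these multipliers can be checked with a short calculation. The sketch uses hypothetical function names, and the raw link figure ignores any framing or error-checking overhead, so it is an upper bound and not a usable bandwidth.

```python
SOUTHBOUND_LANES = 10  # processor -> memory (commands and write data)
NORTHBOUND_LANES = 14  # memory -> processor (read data)


def lane_rate_mbit_s(memory_clock_mhz):
    """Per-lane bit rate: 12 times the basic memory clock."""
    return 12 * memory_clock_mhz


def lane_rate_from_data_rate(data_rate):
    """Same rate expressed as 6 times the double-pumped data rate."""
    return 6 * data_rate


def raw_link_rate_mbit_s(lanes, memory_clock_mhz):
    """Raw aggregate bit rate of one direction, before protocol overhead."""
    return lanes * lane_rate_mbit_s(memory_clock_mhz)


print(lane_rate_mbit_s(400))                        # DDR2-800: 400 MHz clock
print(raw_link_rate_mbit_s(NORTHBOUND_LANES, 400))
```

A per-lane rate of several Gbit/s is what moves the design into the frequency range that makes board layout and heat dissipation hard.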
FB-DIMM
• While Fully Buffered DIMM was originally a good idea, the industry soon
found that it had implementation problems
• First, the serial input frequency has to be 4 times higher than the
memory clock frequency. This puts it into the microwave frequency range and
opens a whole new set of technical difficulties
• The higher serial input frequency also increases heat generation to an
unacceptable level
• Engineers soon announced an alternative approach, the LRDIMM
LRDIMM
• LRDIMM (Load-Reduced Dual In-line Memory Module) uses a buffer chip in
place of the register to help minimize loading; it can increase overall
server memory capacity and speed
• It is pin-compatible with existing DDR3 DIMM sockets, and LRDIMM is a
JEDEC standard
• An LRDIMM can contain 72 modern 40-nm 4-gigabit DDR3 SDRAM chips
• A dual-processor server can have at most 16 ordinary DIMMs, but as many
as 24 LRDIMMs
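The capacity implied by 72 chips of 4 gigabits each can be worked out as follows. The sketch assumes a 72-bit-wide ECC module in which 64 of every 72 bits hold data; that split and the function names are assumptions for illustration, not figures from the slides.

```python
def raw_capacity_gbit(chips=72, chip_gbit=4):
    """Total storage on the module, including any check bits."""
    return chips * chip_gbit


def data_capacity_gbyte(chips=72, chip_gbit=4, data_bits=64, total_bits=72):
    """Usable data capacity in GB, assuming a 64/72 data-to-total ratio (ECC)."""
    data_gbit = raw_capacity_gbit(chips, chip_gbit) * data_bits // total_bits
    return data_gbit // 8


print(raw_capacity_gbit())
print(data_capacity_gbyte())
print(24 * data_capacity_gbyte())  # 24 such modules in one server
```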
LRDIMM and FBDIMM