Memory - computing.northampton.ac.uk


Modern Computer Architecture
My_notes
www.computing.northampton.ac.uk/~brian
Indicative Content:
1. Introduction
classification of computer architectures
architectural concepts
performance measurement and comparison
trends in technology
2. Instruction Set Architectures
memory addressing and operands
operations
control flow
3. Pipelining
implementation
hazards and solutions
4. Instruction-Level Parallelism (ILP)
concept
dynamic scheduling and out-of-order execution
branch prediction
speculation
limitations of ILP
5. Data-Level Parallelism (DLP)
SIMD principle
SIMD operations
SIMD programming
6. Thread-Level Parallelism (TLP)
simultaneous multi-threading
multi-core architectures and symmetric multiprocessors
cache coherence protocols
7. Vector Processors
architecture
programming
8. Distributed Shared Memory
directory-based cache coherence
9. Distributed Memory Computers (DMC)
bus architectures
static and dynamic Interconnection networks
10. Alternative Architectures
System on Chip architectures (SoC)
asynchronous processor designs
FPGAs
11. Memory Hierarchy
design
performance
Read Only Memory (ROM)
•Bipolar: Mask ROMs, PROMs
•MOS: Mask ROMs, PROMs, EPROMs, EEPROMs, Flash
•ROM – mask programmed
•PROM – fusible links
•EPROM – UV erasable
•EEPROM – electrically erasable
•Flash – block erasable (e.g. used to hold the BIOS)
Random Access Memory (RAM)
•Dynamic: needs refresh, cheap-ish
•Static: fast, expensive
•Fast Page Mode (FPM)
•Extended Data Out (EDO)
•Burst EDO (BEDO)
•Synchronous DRAM (SDRAM)
•Double Data Rate SDRAM (DDR SDRAM)
•DIMM/SIMM
DRAM Cell
Capacitor_model
During a read operation, one of the row select lines is brought
high by decoding the row address (low-order address bits). The
activated row select line turns on the switch transistors for all
cells in the selected row. This causes the refresh (sense) amplifier
associated with each column to sense the voltage level on the
corresponding capacitor and interpret it as a 0 or a 1. If the level is
more than 50 percent of full charge, it is read as a 1; otherwise it is
read as a 0. The column address (high-order address bits) enables one
cell in the selected row for the output. The read cycle is actually a
read/write cycle. If a '1' is read, the cell is re-written with a '1' to
recharge it. If a '0' is read, no recharge is necessary.
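The read/write behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: each capacitor is represented by a charge fraction between 0.0 and 1.0, and the names `THRESHOLD`, `read_row` and `read_bit` are made up for this illustration.

```python
THRESHOLD = 0.5  # assumed: fraction of full charge, per the 50 percent rule


def read_row(charges):
    """Sense every cell in the selected row and restore the 1s.

    charges: list of floats in [0.0, 1.0], one per capacitor in the row.
    The list is updated in place because a DRAM read is really a read/write.
    """
    bits = []
    for i, level in enumerate(charges):
        bit = 1 if level > THRESHOLD else 0
        bits.append(bit)
        if bit == 1:
            charges[i] = 1.0  # a cell read as 1 is recharged to full
        # a cell read as 0 is left alone: no recharge needed
    return bits


def read_bit(charges, column):
    """The column address picks one sensed cell from the row for output."""
    return read_row(charges)[column]


row = [0.9, 0.2, 0.55, 0.0]   # partially leaked charges (made-up values)
print(read_row(row))          # [1, 0, 1, 0]
print(row)                    # [1.0, 0.2, 1.0, 0.0]
```

Note how the leaky cell at 0.55 is still read as a 1 and is topped up again, which is the reason refresh has to happen before the charge drops below the threshold.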
•Access Time
•Memory Cycle Time
•Transfer Rate – 1/(Cycle time)
•Parity
•Error Checking
Dynamic RAM Organisation
[Figure: 12-bit row address into a 12-bit row address latch, clocked by RAS (row address strobe); 12-bit column address into a 12-bit column address latch, clocked by CAS (column address strobe); 4096 x 4096 bit memory array; I/O control]
DRAM Layout
DRAM Read Cycle
DRAM Organisation
[Figure: 4-bit DRAM module made up of Chip 1, Chip 2, Chip 3 and Chip 4]
•Precharge delays
•Precharge delays plus row access times (tRAC)
Access time
1. The row address is placed on the address bus.
2. The /RAS pin is activated, placing the row address into the Row Address Latch.
3. The Row Address Decoder selects the row to be sent to the sense amplifiers.
4. The Write Enable line is deactivated, i.e. this is a memory read.
5. The column address is multiplexed onto the address pins.
6. The /CAS pin is activated, placing the column address into the Column Address Latch.
7. /CAS also acts as Output Enable, so once /CAS has stabilised the sense amplifiers can output the desired data.
8. /RAS and /CAS are deactivated.
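The address multiplexing in the steps above can be illustrated with a short sketch. It assumes the 4096 x 4096 array from the organisation diagram (12 row bits, 12 column bits) and follows these notes in taking the row from the low-order bits and the column from the high-order bits; real controllers may map the bits differently. The function names and the sparse `memory` dictionary are hypothetical.

```python
ROW_BITS = 12  # 4096 rows
COL_BITS = 12  # 4096 columns


def split_address(addr):
    """Split a 24-bit cell address into (row, column) for the shared pins."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 4096 x 4096 array")
    row = addr & ((1 << ROW_BITS) - 1)   # low-order bits, sent first with /RAS
    col = addr >> ROW_BITS               # high-order bits, sent next with /CAS
    return row, col


def read_cycle(memory, addr):
    """Walk through the read steps for one bit.

    memory: dict mapping (row, column) to a stored bit; missing cells read as 0.
    """
    row, col = split_address(addr)
    row_latch = row            # steps 1-2: /RAS latches the row address
    col_latch = col            # steps 5-6: /CAS latches the column address
    return memory.get((row_latch, col_latch), 0)   # step 7: data is output


row, col = split_address(0xABC123)
print(hex(row), hex(col))      # 0x123 0xabc
```

Because the same 12 pins carry both halves of the address, the chip needs only half as many address pins as a non-multiplexed design, at the cost of the two-step /RAS then /CAS sequence.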
Asynchronous Latency
•Access Time – the time between the address being placed on the bus and the
data appearing on the data bus
•Cycle Time – the time between successive read operations
Minimising these two is the goal of the memory designer, with access
time being the main target, in order to increase bus speed.
•A 2GHz processor can do more in 70ns than a 400MHz processor
•Need to insert wait states
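A quick calculation shows why the faster processor loses more work to the same DRAM. This is a minimal sketch: `cycles_elapsed` is a made-up helper, and the 70 ns figure is simply the access time quoted above.

```python
import math


def cycles_elapsed(time_ns, clock_mhz):
    """Whole processor clock cycles that pass while waiting time_ns."""
    # cycles = time (ns) * frequency (MHz) / 1000
    return math.ceil(time_ns * clock_mhz / 1000)


for clock_mhz in (400, 2000):
    print(clock_mhz, "MHz:", cycles_elapsed(70, clock_mhz), "cycles in 70 ns")
# 400 MHz: 28 cycles in 70 ns
# 2000 MHz: 140 cycles in 70 ns
```

The same 70 ns access costs the 2 GHz processor five times as many cycles, all of which have to be filled with wait states unless a cache hides the latency.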
Fast Page Mode
•Activate /RAS once, then perform 4 /CAS cycles
•Removes the repeated /RAS delays
•The initial read takes 6 cycles, whereas each of the next 3 takes only 3 cycles
•This is called 6-3-3-3 DRAM using x-y-y-y notation
•However, you cannot activate the next column address
until the data from the previous read is gone
Note – with Extended Data Out (EDO), the next column address can be activated before the data from the previous read has gone
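The saving from Fast Page Mode follows directly from the x-y-y-y notation. The sketch below adds up the cycles for a four-word burst and compares it with paying the full first-access cost every time; the helper names are invented for this example.

```python
def burst_cycles(timing):
    """Total cycles for a burst written in x-y-y-y notation, e.g. '6-3-3-3'."""
    return sum(int(part) for part in timing.split("-"))


def cycles_without_page_mode(timing):
    """Cost if every access paid the full first-access (x) time."""
    parts = [int(part) for part in timing.split("-")]
    return parts[0] * len(parts)


print(burst_cycles("6-3-3-3"))              # 15
print(cycles_without_page_mode("6-3-3-3"))  # 24
```

So for four reads from the same row, 6-3-3-3 Fast Page Mode needs 15 cycles rather than 24.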
Synchronous Dynamic RAM (SDRAM)
Features of SDRAM
•Controlled with commands
  •Activate, read, write etc.
•Multiple bank configuration
  •Can precharge one bank while reading/writing to another
•Control through a Mode Register
  •Can set burst length, CAS latency etc.
•Synchronous operation
  •Latches each control signal at the rising edge of the basic clock
  •Synchronised with the system clock
•Selectable CAS latency
•Selectable burst length
  •The number of words that can be input or output continuously
Clock 1: ACTIVATE the row by asserting /CS and /RAS. The
row address is placed on the address bus to determine which row
to activate.
Clock 3: READ the required column from the activated row by
asserting /CAS while placing the column's address on the address
bus.
Clocks 5-10: The data from the selected row and column goes out
onto the Data Bus, followed by a BURST of other
columns, the order of which depends on which BURST MODE
has been set.
Memory Latencies
Data sheets quote figures such as 9-9-9-24 (2T) for a memory module.
What do the numbers mean? They refer to CAS-tRCD-tRP-tRAS and CMD (respectively), and the values
are measured in clock cycles.
•CAS Latency (1st number) – the delay between the CAS
signal and the availability of valid data on the data pins. The
lower the latency, the better the performance.
•tRCD (2nd number) – RAS to CAS delay: the delay between
activating a row and issuing a column command to it. When memory is
accessed sequentially, the row is already active and tRCD
will not have much impact. However, if memory is not
accessed in a linear fashion, the currently active row must be
deactivated and then a new row selected/activated.
•tRP (3rd number) – row precharge time: the time required to
close the active row so that another can be opened, i.e. to switch
between rows. In conjunction with tRCD, the time (in clock cycles)
required to switch rows and select the next cell for reading, writing
or refreshing is therefore tRP + tRCD.
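•tRAS (4th number) – row active time: the minimum number of clock
cycles a row must stay open between the Activate command and the
following Precharge. Because it overlaps with tRCD and the CAS latency,
tRAS has little effect on overall system performance, but it could
affect system stability if set incorrectly.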
•Command Rate (CMD) – the time needed between the chip
select signal and the point at which commands can be issued to the
RAM module IC. Typically this is either 1 clock (1T) or 2 clocks (2T).
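To see how these numbers combine, the sketch below works out the read latency in clock cycles for three common cases. It is a simplified model under stated assumptions: the command rate is ignored, tRAS is assumed to be already satisfied when a row has to be closed, and the `Timings` class, the row-state names and the 1000 MHz clock in the usage are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    cl: int     # CAS latency
    trcd: int   # RAS to CAS delay
    trp: int    # row precharge time
    tras: int   # row active time (not used in the simple sums below)


def parse_timings(spec):
    """Turn a data-sheet string such as '9-9-9-24' into a Timings record."""
    cl, trcd, trp, tras = (int(part) for part in spec.split("-"))
    return Timings(cl, trcd, trp, tras)


def read_latency_cycles(t, row_state):
    """Cycles from issuing the first command until the first data word."""
    if row_state == "hit":        # wanted row is already open
        return t.cl
    if row_state == "closed":     # no row open: activate, then read
        return t.trcd + t.cl
    if row_state == "conflict":   # another row open: precharge, activate, read
        return t.trp + t.trcd + t.cl
    raise ValueError("row_state must be 'hit', 'closed' or 'conflict'")


def cycles_to_ns(cycles, clock_mhz):
    return cycles * 1000.0 / clock_mhz


t = parse_timings("9-9-9-24")
for state in ("hit", "closed", "conflict"):
    print(state, read_latency_cycles(t, state))
# hit 9
# closed 18
# conflict 27
```

With a hypothetical 1000 MHz memory clock, `cycles_to_ns(27, 1000)` gives 27 ns for the worst case, three times the cost of a read that hits the open row.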
Bank Interleaving
•SDRAM divides memory into two to four banks so that more data can be
accessed simultaneously; this is known as interleaving.
•Using a notebook analogy, two-way interleaving is like
dividing each page in a notebook into two parts and
having two assistants each retrieve a different part of
the page.
•Even though each assistant must take a break (be
refreshed), breaks are staggered so that at least one
assistant is working at all times.
•Therefore, they retrieve the data much faster than a single
assistant could get the same data from one whole page,
especially since no data can be accessed while a single
assistant takes a break.
•This allows the processor to initiate a new memory access
before the previous access completes and results in
continuous data flow.
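The benefit of interleaving can be shown with a small simulation. The model is deliberately simple and its parameters are assumptions, not figures from these notes: consecutive addresses are spread across banks (bank = address mod number of banks), each bank is busy for a fixed number of cycles per access, and at most one new access can be started per cycle.

```python
def total_cycles(n_accesses, n_banks, busy_cycles):
    """Cycles needed to complete n_accesses sequential reads."""
    bank_free_at = [0] * n_banks   # cycle at which each bank is next available
    next_issue = 0                 # earliest cycle the next access may start
    finish = 0
    for address in range(n_accesses):
        bank = address % n_banks               # low-order interleaving
        start = max(next_issue, bank_free_at[bank])
        finish = start + busy_cycles
        bank_free_at[bank] = finish            # the bank is busy until then
        next_issue = start + 1                 # one new access per cycle
    return finish


for banks in (1, 2, 4):
    print(banks, "bank(s):", total_cycles(8, banks, busy_cycles=4), "cycles")
# 1 bank(s): 32 cycles
# 2 bank(s): 17 cycles
# 4 bank(s): 11 cycles
```

With four banks and a four-cycle busy time, a new access can start every cycle, which is the continuous data flow described above.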
Memory Interleaving
Comparison of ordinary and interleaved memory access
•An: Row Address
•Bn: Column Address
•D: Data Readout
Bandwidth
•Memory voltage – originally 5 volts. However, as
cell geometries decreased, memory circuitry became
smaller and more sensitive. Today, computer memory
components can operate at as low as 1.5 volts, which allows
them to run faster and consume less power.
•Bandwidth – the bandwidth capacity of the memory bus
increases with its width (in bits) and its frequency (in
MHz). By transferring 8 bytes (64 bits) at a time and
running at 100 MHz, SDRAM increases memory
bandwidth to 800 MB/s.
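The 800 MB/s figure is just bus width multiplied by clock frequency, as this short sketch shows. The function name and the `transfers_per_clock` parameter are made up for the example; the parameter is 1 for plain SDRAM.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in MB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


print(peak_bandwidth_mb_s(64, 100))   # 800
```

This is a peak figure: it assumes a transfer on every clock, which only happens during a burst.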
Error Correction and Detection
•Simple parity checking can only detect single-bit errors (more generally, an odd number of flipped bits); it cannot correct them.
•ECC uses a special algorithm to generate values called check bits.
The algorithm adds the check bits together to calculate a checksum,
which it stores with the data. When data is read from memory, the
algorithm re-calculates the checksum and compares it with the
checksum of the written data.
•If the checksums are equal, then the data is valid and operation
continues.
•If they are different, the data has an error and the ECC memory logic
isolates the error and reports it to the system. In the case of a single-bit
error, the ECC memory logic can correct the error and output the
corrected data so that the system continues to operate.
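The difference between detecting and correcting an error can be sketched in code. The notes do not say which algorithm the ECC logic uses, so the example below swaps in a textbook Hamming(7,4) code purely as an illustration; real ECC modules typically protect 64-bit words with a wider code. All function names are invented for this sketch.

```python
def parity_bit(bits):
    """Even-parity bit: chosen so the total number of 1s is even."""
    return sum(bits) % 2


def parity_ok(bits_with_parity):
    return sum(bits_with_parity) % 2 == 0


def hamming_encode(data):
    """Encode 4 data bits as 7 bits: positions 1, 2 and 4 hold check bits."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_correct(code):
    """Return (corrected code, error position); position 0 means no error."""
    c = [0] + list(code)                      # 1-indexed for readability
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    position = s1 + 2 * s2 + 4 * s4           # syndrome names the bad bit
    if position:
        c[position] ^= 1                      # flip it back
    return c[1:], position


word = [1, 0, 1, 1]

# Parity: a single flipped bit is detected, but not located.
stored = word + [parity_bit(word)]
stored[0] ^= 1
print(parity_ok(stored))                      # False

# Hamming(7,4): a single flipped bit is located and corrected.
code = hamming_encode(word)
damaged = list(code)
damaged[4] ^= 1                               # corrupt position 5
fixed, position = hamming_correct(damaged)
print(position, fixed == code)                # 5 True
```

Parity needs one extra bit per word but can only raise an alarm; the Hamming code spends three extra bits on four data bits and in return can repair any single-bit error.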
Double Data Rate DDR
There are presently three generations of DDR memories:
1. DDR1 memory, with a maximum rated speed of 400 MT/s (often quoted
as "400 MHz"; the bus clock itself runs at 200 MHz) and a
64-bit (8-byte) data bus, is now becoming obsolete and is not
being produced in massive quantities.
2. DDR2 memory is the second generation of DDR memory.
DDR2 starts at a speed of 400 MT/s.
3. DDR3 is the third generation of DDR memory. DDR3 memory
provides a 30% reduction in power consumption compared to
DDR2 modules, due to DDR3's 1.5 V supply voltage. The main
benefit of DDR3 comes from the higher bandwidth made possible
by DDR3's 8-burst-deep prefetch buffer, in contrast to DDR2's
4-burst-deep or DDR's 2-burst-deep prefetch buffer.
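The effect of the prefetch depth can be shown with a rough calculation. The sketch assumes the simplified relationship data rate = prefetch depth x memory core clock, a 64-bit (8-byte) data bus, and an illustrative 200 MHz core clock for all three generations; the names `PREFETCH_DEPTH`, `data_rate_mt_s` and `peak_bandwidth_mb_s` are made up for the example.

```python
PREFETCH_DEPTH = {"DDR": 2, "DDR2": 4, "DDR3": 8}   # words fetched per core access


def data_rate_mt_s(generation, core_clock_mhz):
    """Transfers per second (in millions) on the external data bus."""
    return PREFETCH_DEPTH[generation] * core_clock_mhz


def peak_bandwidth_mb_s(rate_mt_s, bus_bytes=8):
    """Peak bandwidth in MB/s for a bus that moves bus_bytes per transfer."""
    return rate_mt_s * bus_bytes


CORE_CLOCK_MHZ = 200   # assumed, for illustration only
for generation in ("DDR", "DDR2", "DDR3"):
    rate = data_rate_mt_s(generation, CORE_CLOCK_MHZ)
    print(generation, rate, "MT/s,", peak_bandwidth_mb_s(rate), "MB/s")
# DDR 400 MT/s, 3200 MB/s
# DDR2 800 MT/s, 6400 MB/s
# DDR3 1600 MT/s, 12800 MB/s
```

The memory cells themselves run no faster in this model; each generation simply fetches more words per core access and streams them out over a faster external bus.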