Transcript uP Memory
ECE 371
Microprocessors
Chapter 1
Microprocessor Memory
Herbert G. Mayer, PSU
Status 10/12/2015
For use at CCUT Fall 2015
Parts gratefully taken with permission from Eric Krause @ PSU
1
Syllabus
Introduction
Latency and Bandwidth
Memory Hierarchy
Memory Low-Level View
Bibliography
2
Introduction
Key modules of any microprocessor (µP) are:
1. Central Processing Unit AKA CPU, includes ALU, Register
File, pc
2. Memory (AKA Main Memory)
3. Caches; L1 and sometimes L2 integrated on same silicon
die, but not logically part of CPU
4. Data-, address-, and control bus connecting CPU and
memory; AKA System Bus
5. Peripherals and their controllers, connected to system bus
6. IO devices and controller, connected to system bus; some
exceptions: e.g. DMA
7. Branch Prediction unit; invisible to API
Vast speed differences between CPU and memory
Great speed disparity between various controllers
Very slow are those accepting manual (human) input
3
Introduction: Generic µP
Generic Microprocessor System with Memory, Controllers
4
Introduction: Generic µP
Generic µP above shows coarse resolution of CPU
Leaves out caches and branch prediction, necessary
only to increase processing and data access speed
Lists program counter (PC, AKA instruction pointer)
separately from register file
And views all components other than CPU as “hanging
off” the central system bus
In reality, often multiple buses are used, with varying
speeds, width, functions, etc.
Some buses are proprietary, in order to maximize
transmission speed (latency, throughput)
Other buses for commodity peripherals (disks, thumb
drives, printers etc.) have standardized interfaces, at
times with much lower speeds
5
Introduction: Abstract µP
Abstract Microprocessor System with Memory, Controllers
6
Introduction: Abstract µP
Abstract µP above also exhibits coarse resolution of
complete system
But highlights multiple buses for control, addresses,
and actual data transmitted between memory and CPU
Only the box “Input and Output” is even more
abstracted than in earlier pictures
There are many ways of depicting the same idea,
depending on what detail is omitted from a more
complete µP system
See yet another model on the next page, where
memory is further partitioned into logical subsections
7
Introduction: Another Abstract µP
8
Latency and Bandwidth
Most system components, specifically memory,
can be characterized by fundamental metrics:
1. Latency
2. Bandwidth
The unit of latency is time t; latency is often
measured in milliseconds, microseconds, or nanoseconds
The unit of bandwidth is number of data units
per time t; for example, bandwidth can be
gigabytes per second
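A minimal sketch of how the two metrics combine, assuming a simple first-order model in which a transfer costs one latency plus size divided by bandwidth. The function name and the sample numbers are illustrative assumptions, not measurements:

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: wait one latency, then stream at full bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Hypothetical device: 50 ns latency, 2 GB/s bandwidth (assumed values)
LATENCY_S = 50e-9
BANDWIDTH_BYTES_PER_S = 2e9

small = transfer_time_s(8, LATENCY_S, BANDWIDTH_BYTES_PER_S)          # one 8-byte word
large = transfer_time_s(1_000_000, LATENCY_S, BANDWIDTH_BYTES_PER_S)  # 1 MB block

print(f"8 B transfer:  {small * 1e9:.1f} ns")
print(f"1 MB transfer: {large * 1e6:.1f} us")
```

In this model the small transfer is dominated by latency and the large one by bandwidth, which is why both metrics are needed to characterize a component.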
9
Latency
Latency is the time elapsed between issuing a
specific system request and receiving the response
Def: memory latency is the time between 1.) issuing
a memory access –by executing a ld or st
instruction– and 2.) the time a next instruction can
be initiated
Measured in units of seconds * 10^-3, 10^-6, or 10^-9 (ms, µs, ns)
Careful: the memory latency definition does not
specify the second point as being the time the
memory access completes
This distinction alludes to speculative execution,
discussed later
10
Latency
Latency is critical, as a µP generally stalls –ignoring
speculative execution for the moment– if its
memory subsystem fails to respond to an access
request swiftly
High memory latency is undesired, since it
increases the time of program completion
Low latency is attractive, as it shortens that time
Within one selected technology, latency cannot
be improved much by spending more resources
If latency must be improved for a µP system,
generally some other memory access technology
should be selected; often more expensive
11
Bandwidth
Bandwidth characterizes data throughput of a system
For example, bandwidth of a memory subsystem is the
number of information units transmitted per unit of time
Often that information unit is the byte, an addressable
composite of 8 bits
On large mainframes unit of information is the word,
e.g. 60-bit words on Cyber systems, but these are not
microprocessors
High-performance µP can overlap multiple memory
requests to increase bandwidth, without changing
latency
Bandwidth can, up to some technology limit, be
improved by higher technology cost, i.e. spending
more $ for wider buses or other resources
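A hedged sketch of why overlapping requests raises bandwidth while leaving latency unchanged. It assumes an idealized memory that can keep any number of requests in flight (ignoring the technology limit mentioned above); the request size and latency are assumed values:

```python
def effective_bandwidth(bytes_per_request, latency_s, outstanding_requests):
    """Bytes per second when several requests overlap in flight.

    Each individual request still takes latency_s to complete.
    """
    return outstanding_requests * bytes_per_request / latency_s

REQUEST_BYTES = 64   # assumed transfer unit per request
LATENCY_S = 50e-9    # assumed latency of a single request

one_in_flight = effective_bandwidth(REQUEST_BYTES, LATENCY_S, 1)
eight_in_flight = effective_bandwidth(REQUEST_BYTES, LATENCY_S, 8)

print(f"1 request in flight:  {one_in_flight / 1e9:.2f} GB/s")
print(f"8 requests in flight: {eight_in_flight / 1e9:.2f} GB/s")
```

The per-request latency is the same in both cases; only the number of requests completed per unit of time changes.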
12
Memory Hierarchy
Conventional to show memory in block diagrams as
one logical block
In reality there are several types of memory, with
differing attributes such as speed, size, cost, lifetime
etc.
One of those speed levels is cache memory, with the
best speed attribute!
Why would an architect not design main memory solely
with cache memory technology?
Rhetorical question: total cost would be prohibitive, yet
it would indeed speed up average memory accesses
dramatically
During the evolution of computer technology, the
speed discrepancy between processor speed and
memory access speed has grown worse!
13
Trend of Memory Speed
[Figure: performance (vertical axis) over time (horizontal axis) for
CPU and DRAM. Annotations: multilevel caches; Intel® Pentium II
Processor, out-of-order execution (instruction level), ~30%;
Intel® Xeon™ Processor, Hyperthreading Technology (thread level), ~30%]
CPU speed increases over time.
Memory access speed also increases over time,
but more slowly than CPU speed, hence the gap widens!
14
Ideal Memory
Has: unbounded capacity, to store any data set and
any program code
Exploits: infinite bandwidth, to move such large data
sets to/from the µP
Is: persistent, i.e. bits of information retain their
value between power cycles
Exhibits: no latency, so that the µP never has reason
to stall
Costs: little, so that $ investment for very large
main memory does not dominate µP system expense
15
Memory Hierarchy
Memory pyramid shows HW resources that hold data
sorted in decreasing order of speed, top to bottom:
1. Registers, internal to µP, small in number, except on
newer architectures, such as Itanium; fastest!
2. L1 cache, often on chip, few tens of kB
3. L2 cache, on chip on newer µP, many tens of kB
4. L3 cache, common on servers, generally off-chip
5. Main memory, can be smaller than logical address
space; see VMM
6. SSD, known as solid state disk (or solid state drive),
is a storage device w/o moving parts; so disk is not to
be interpreted literally
7. Old-fashioned removable disk, with rotating
magnetic storage
8. Back-up tape or disk; slowest
16
Memory Hierarchy
17
Memory Hierarchy
Various hardware resources hold information to be
processed by the CPU, specifically by a component
of the ALU
Ideally, all such data are present in HW registers
Since register to register operations often can be
completed in a single cycle
And on a superscalar µP sometimes multiple
instructions can be executed in a single cycle
Alas! There are only a few registers available, thus data
also need to reside elsewhere
Generally, that is main memory
Until shortly before the advent of 64-bit computing,
physical memories were sufficiently large to hold all
data
18
Memory Hierarchy (from Wikipedia)
19
486 Memory Organization
20
Memory Characteristics
Memory Characteristics by Technology
#  Type of Memory     Technology          Bandwidth   Latency
1  register           SRAM, multiported   200+ GB/s   250+ ps
2  L1 cache on chip   SRAM                400+ GB/s   300+ ps
3  L2 cache on chip   SRAM                20 GB/s     2+ ns
4  L3 cache off chip  SRAM                10 GB/s     10+ ns
5  main memory        DRAM                2+ GB/s     50+ ns
6  SSD drive          flash               20+ MB/s    100+ ns
7  rotating disk      magnetic material   10+ MB/s    10+ ms
8  tape               magnetic material   few MB/s    >1 s
21
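The latency column explains why a small, fast cache in front of a slow main memory pays off. A minimal sketch using the standard average memory access time model (hit time plus miss rate times miss penalty), with the L1 and main memory latencies listed above (300 ps and 50 ns). The 5% miss rate is an assumed value for illustration only:

```python
def average_access_time_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time for one cache level in front of memory."""
    return hit_time_ns + miss_rate * miss_penalty_ns

L1_LATENCY_NS = 0.3        # 300 ps, L1 cache latency from the table
MEMORY_LATENCY_NS = 50.0   # main memory latency from the table
MISS_RATE = 0.05           # assumption: 5% of accesses miss in L1

with_cache = average_access_time_ns(L1_LATENCY_NS, MISS_RATE, MEMORY_LATENCY_NS)

print(f"without cache: {MEMORY_LATENCY_NS:.1f} ns per access")
print(f"with L1 cache: {with_cache:.1f} ns per access on average")
```

Under these assumptions the average access is far closer to cache speed than to main memory speed, although only a small fraction of the data lives in the expensive technology.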
Memory Low-Level View
(Section taken from Eric Krause PSU ECE Dept.)
22
Memory Terminology
Latch (one FF) stores 1 bit
Register stores a full machine word
Memory devices store >> 1 words; on 32-bit architectures it is
feasible to have all of memory as real, physical memory
General usage of reading memory:
enable device/memory
supply address of word on address lines
addressed word arrives on data lines
Common Terms
Word size = number of bits of natural computing unit, e.g. integer
Capacity = 2^(address bits)
Bus = parallel lines connecting memory and µP
Volatility = memory contents are lost on power-off
ROM = Read Only Memory
RAM = Random Access Memory
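The capacity rule above can be sketched in a few lines. The function names are illustrative; the word count follows directly from the number of address bits:

```python
def capacity_words(address_bits):
    """Number of addressable words: 2 raised to the number of address bits."""
    return 1 << address_bits

def capacity_bits(address_bits, word_bits):
    """Total storage in bits for a device with the given word width."""
    return capacity_words(address_bits) * word_bits

print(capacity_words(12))          # 4096 words (4 K)
print(capacity_words(15))          # 32768 words (32 K)
print(capacity_bits(15, 8) // 8)   # 32768 bytes for 8-bit words
print(capacity_words(32))          # 4294967296 addressable units with 32 address bits
```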
23
Memory Types
Types of memory:
MROM: Mask-programmed during manufacturing
PROM: Programmed by user, by blowing fuses
EPROM: Electrically programmable by user, erased by
exposing to ultraviolet light
EEPROM: Electrically Erasable Programmable ROM; can
be programmed or erased by user, one byte at a time
Flash EEPROM: A type of EEPROM programmable in
blocks rather than single bytes
Synchronous Flash EEPROM: Synchronous version of
the above
Memory attributes
All are non-volatile
Writes are significantly slower than reads!
Asynchronous (except for Synchronous Flash EEPROM)
24
Words of Memory
ROM: Read Only Memory
1. How many words can be addressed? 2^(address bits) = 2^15 = 32 K
2. What is the width of these words? 8 bits
3. How is it activated/turned on? By a specific signal (chip enable)
4. Why does it have 3-state outputs? To allow multiple
such devices to be connected to the same bus
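A simplified software model of why 3-state outputs let several devices share one bus. This is a sketch, not a description of any specific part: the class and function names are hypothetical, and a disabled device is modeled as driving nothing (high impedance) rather than 0 or 1:

```python
HIGH_Z = None  # models an output that is not driving the bus

class Rom:
    """Toy ROM with a chip-enable input and 3-state data outputs."""

    def __init__(self, contents):
        self.contents = contents

    def output(self, chip_enable, address):
        """Drive the addressed word if enabled, else float the outputs."""
        if not chip_enable:
            return HIGH_Z
        return self.contents[address]

def read_bus(devices, selected, address):
    """Enable exactly one device; the bus carries the single driven value."""
    driven = [dev.output(i == selected, address) for i, dev in enumerate(devices)]
    values = [v for v in driven if v is not HIGH_Z]
    if len(values) != 1:
        raise RuntimeError("bus contention or floating bus")
    return values[0]

rom0 = Rom([0x11, 0x22])
rom1 = Rom([0xAA, 0xBB])

print(hex(read_bus([rom0, rom1], 0, 1)))  # 0x22, driven by rom0
print(hex(read_bus([rom0, rom1], 1, 0)))  # 0xaa, driven by rom1
```

Both devices see the same address, but only the enabled one drives the data lines, so their outputs never conflict.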
25
ROM Timing
Pins are: power, clock, address, output, CE and OE
26
ROM Timing
Assuming Address is asserted when supplied to inputs
There will be propagation delay before output appears
Called Address Access Time tACC, i.e. time to wait for valid
data available on outputs after address is applied
Assuming it was already powered up and outputs were
enabled
A key parameter when using memory
If address is already present on address inputs when CE#
is asserted, it takes some time, tCE, to power on
tOE is the time for output buffers to be turned on
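A hedged sketch of how the three parameters interact: data are valid only once all three delays have elapsed, each counted from the moment its own input was applied. The numeric values below are assumed for illustration and are not taken from any data sheet:

```python
def data_valid_time_ns(t_address, t_chip_enable, t_output_enable,
                       t_acc, t_ce, t_oe):
    """Earliest time at which the outputs carry valid data.

    The first three arguments are the times at which the address, CE#,
    and OE# are applied; the last three are the device delays.
    """
    return max(t_address + t_acc,
               t_chip_enable + t_ce,
               t_output_enable + t_oe)

# Assumed device delays in ns (hypothetical values)
T_ACC, T_CE, T_OE = 70, 70, 30

# All inputs applied at t = 0: the slowest path (tACC or tCE) decides
all_at_once = data_valid_time_ns(0, 0, 0, T_ACC, T_CE, T_OE)

# OE# asserted late, at t = 60 ns: now tOE decides
late_oe = data_valid_time_ns(0, 0, 60, T_ACC, T_CE, T_OE)

print(all_at_once)  # 70
print(late_oe)      # 90
```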
27
ROM Timing
These parameters are major limiting factors for the
performance of a microprocessor and pose
important design considerations
What is happening on the data lines before the
output is valid? See chevrons; bus is in unknown
state!
Individual outputs may be low or high
What do valid data look like? Or invalid data?
How do we know if we clock in invalid data? We
generally don’t!
28
RAM Timing
Note that ROMs are also randomly addressable
Random access memory (RAM) can be read,
written, is volatile
Read and write speeds are equal
SRAM: has very fast access times, uses 6
transistors/bit. Data are static while powered
DRAM: Slower access, but only 1 transistor (plus 1 capacitor)
per bit. Data must be refreshed regularly while powered
SDRAM: Pipelined, synchronous DRAM
DDR SDRAM: Double Data Rate SDRAM
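A minimal sketch of what "double data rate" means for peak bandwidth: data are transferred on both clock edges, so two transfers occur per clock cycle instead of one. The clock frequency and bus width are assumed example values:

```python
def peak_bandwidth_bytes_per_s(clock_hz, bus_width_bits, transfers_per_clock):
    """Peak transfer rate of a synchronous memory interface."""
    return clock_hz * transfers_per_clock * bus_width_bits // 8

CLOCK_HZ = 100_000_000   # assumed 100 MHz memory clock
BUS_WIDTH_BITS = 64      # assumed 64-bit data bus

sdr = peak_bandwidth_bytes_per_s(CLOCK_HZ, BUS_WIDTH_BITS, 1)  # single data rate
ddr = peak_bandwidth_bytes_per_s(CLOCK_HZ, BUS_WIDTH_BITS, 2)  # double data rate

print(sdr)  # 800000000 bytes/s
print(ddr)  # 1600000000 bytes/s
```

These are peak figures only; sustained bandwidth is lower because of refresh and access latency.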
29
RAM Timing
RAM: Random Access Memory, here 12
address lines:
How many words can be addressed? 2^12 = 4 K
What is the width of these words? 8 bits
It is asynchronous!
Benefit of synchronous would be: here the µP
must hold the address stable the whole time
it waits for data to arrive; only then can the µP
move to the next address. The µP cannot start
generating the next address until the current
operation is complete; this is wasted time,
especially as processor speeds increase
With a synchronous device, the µP presents the
address at the inputs for the duration of the HOLD
time, which in modern devices can be 0,
and clocks it in. Then the µP can do something
else, enabling parallel operations
Pipelined SRAM can do this:
30
Pipelined SRAM
31
Pipelined SRAM
Actual Data Sheet for pipelined SRAM, shows:
input registers
output registers
address register
enable register (control signals)
control logic: see the gates on left
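The registers listed above are what allow a new access to start before the previous one has finished. A rough timing sketch under assumed numbers (the access time, cycle time, and stage count are hypothetical, and the model ignores stalls):

```python
def unpipelined_time_ns(n_reads, access_ns):
    """Each read must complete before the next address can be applied."""
    return n_reads * access_ns

def pipelined_time_ns(n_reads, cycle_ns, pipeline_stages):
    """A new read starts every cycle; the first result appears after
    pipeline_stages cycles, then one result arrives per cycle."""
    return (pipeline_stages + n_reads - 1) * cycle_ns

N_READS = 8
ACCESS_NS = 10        # assumed asynchronous access time
CYCLE_NS = 5          # assumed clock period of the pipelined device
PIPELINE_STAGES = 2   # assumed: address register, then output register

slow = unpipelined_time_ns(N_READS, ACCESS_NS)
fast = pipelined_time_ns(N_READS, CYCLE_NS, PIPELINE_STAGES)

print(slow)  # 80
print(fast)  # 45
```

With these numbers the latency of a single read is the same in both cases (10 ns), but the pipelined device finishes a burst of reads much sooner.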
32
Bibliography
1. Shen, John Paul, and Mikko H. Lipasti: Modern
Processor Design, Fundamentals of Superscalar
Processors, McGraw Hill, ©2005
2. http://forums.amd.com/forum/messageview.cfm?catid=11&threadid=29382&enterthread=y
3. http://www.ece.umd.edu/~blj/papers/hpca2006.pdf
4. Kilburn, T., et al: “One-level storage system”, IRE
Transactions, EC-11, 2, 1962, p. 223-235
33