Transcript uP Memory

ECE 371
Microprocessors
Chapter 1
Microprocessor Memory
Herbert G. Mayer, PSU
Status 10/12/2015
For use at CCUT Fall 2015
Parts gratefully taken with permission from Eric Krause @ PSU
Syllabus
Introduction
Latency and Bandwidth
Memory Hierarchy
Memory Low-Level View
Bibliography
Introduction
 Key modules of any microprocessor (µP) are:
1. Central Processing Unit AKA CPU, includes ALU, Register
File, PC
2. Memory (AKA Main Memory)
3. Caches; L1 and sometimes L2 integrated on same silicon
die, but not logically part of CPU
4. Data-, address-, and control bus connecting CPU and
memory; AKA System Bus
5. Peripherals and their controllers, connected to system bus
6. IO devices and controller, connected to system bus; some
exceptions: e.g. DMA
7. Branch Prediction unit; invisible to API
 Vast speed differences between CPU and memory
 Great speed disparity between various controllers
 Very slow are those accepting manual (human) input
Introduction: Generic µP
Generic Microprocessor System with Memory, Controllers
Introduction: Generic µP
 Generic µP above shows coarse resolution of CPU
 Leaves out caches and branch prediction, necessary
only to increase processing and data access speed
 Lists program counter (PC, AKA instruction pointer)
separately from register file
 And views all components other than CPU as “hanging
off” the central system bus
 In reality, often multiple buses are used, with varying
speeds, width, functions, etc.
 Some buses are proprietary, in order to maximize
transmission speed (latency, throughput)
 Other buses for commodity peripherals (disks, thumb
drives, printers etc.) have standardized interfaces, at
times with much lower speeds
Introduction: Abstract µP
Abstract Microprocessor System with Memory, Controllers
Introduction: Abstract µP
 Abstract µP above also exhibits coarse resolution of
complete system
 But highlights multiple buses for control, addresses,
and actual data transmitted between memory and CPU
 Only the box “Input and Output” is even more
abstracted than in earlier pictures
 There are many ways of depicting the same idea,
depending on what detail is omitted from a more
complete µP system
 See yet another model on the next page, where
memory is further partitioned into logical subsections
Introduction: Another Abstract µP
Latency and Bandwidth
 Most system components, specifically memory,
can be characterized by fundamental metrics:
1. Latency
2. Bandwidth
 The unit of latency is time t; latency is often
measured in milliseconds, microseconds, or nanoseconds
 The unit of bandwidth is number of data units
per time t; for example, bandwidth can be
gigabytes per second
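As a rough illustration of how the two metrics combine, the time to move a block of data can be approximated as latency plus size divided by bandwidth. This is a simplified sketch; the function name and the sample values (50 ns, 2 GB/s) are assumptions chosen for illustration only.

```python
def transfer_time_s(size_bytes: float, latency_s: float,
                    bandwidth_bytes_per_s: float) -> float:
    """Approximate time to fetch size_bytes: fixed latency plus streaming time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Assumed sample values, not measurements: 50 ns latency, 2 GB/s bandwidth.
LATENCY_S = 50e-9
BANDWIDTH_BYTES_PER_S = 2e9

for size in (8, 64, 4096, 1 << 20):
    t = transfer_time_s(size, LATENCY_S, BANDWIDTH_BYTES_PER_S)
    print(f"{size:>8} bytes: {t * 1e9:12.1f} ns")
```

Under this model small transfers are dominated by latency, large transfers by bandwidth.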
Latency
 Latency is the time elapsed between issuing a
specific system request and receiving the response
 Def: memory latency is the time between 1.) issuing
a memory access –by executing a ld or st
instruction– and 2.) the time a next instruction can
be initiated
 Measured in seconds * 10^-3, 10^-6, or 10^-9 (ms, µs, or ns)
 Careful: the memory latency definition does not
specify the second point as being the time the
memory access completes
 This distinction alludes to speculative execution,
discussed later
Latency
 Latency is critical, as a µP generally stalls –ignoring
speculative execution for the moment– if its
memory subsystem fails to respond to an access
request swiftly
 High memory latency is undesired, since it
increases the time of program completion
 Low latency is attractive, as it shortens that time
 Within one selected technology, latency cannot
widely be improved by spending more resources
 If latency must be improved for a µP system,
generally some other memory access technology
should be selected; often more expensive
Bandwidth
 Bandwidth characterizes data throughput of a system
 For example, bandwidth of a memory subsystem is
the number of information units transmitted per unit of time
 Often that information unit is the byte, an addressable
composite of 8 bits
 On large mainframes unit of information is the word,
e.g. 60-bit words on Cyber systems, but these are not
microprocessors 
 High-performance µP can overlap multiple memory
requests to increase bandwidth, without changing
latency
 Bandwidth can, up to some technology limit, be
improved by higher technology cost, i.e. spending
more $ for wider buses or other resources
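A minimal sketch of the point about overlapping requests: if a µP keeps several independent requests in flight, sustained throughput grows with the number of outstanding requests, while each single request still takes the full latency. The model ignores bus and bank limits; the 64-byte request size and the 50 ns latency are assumed values.

```python
def sustained_bandwidth(outstanding: int, bytes_per_request: int,
                        latency_s: float) -> float:
    """Bytes per second when `outstanding` requests overlap perfectly."""
    return outstanding * bytes_per_request / latency_s


LATENCY_S = 50e-9    # assumed latency per request; overlap does not change it
REQUEST_BYTES = 64   # assumed number of bytes returned per request

for n in (1, 2, 4, 8):
    gb_per_s = sustained_bandwidth(n, REQUEST_BYTES, LATENCY_S) / 1e9
    print(f"{n} outstanding: {gb_per_s:5.2f} GB/s, "
          f"latency still {LATENCY_S * 1e9:.0f} ns")
```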
Memory Hierarchy
 It is conventional to show memory in block diagrams as
one logical block
 In reality there are several types of memory, with
differing attributes such as speed, size, cost, lifetime
etc.
 One of those speed levels is cache memory, with the
best speed attribute!
 Why would an architect not design main memory solely
with cache memory technology?
 Rhetorical question: total cost would be prohibitive, yet
it would indeed speed up average memory accesses
dramatically
 During the evolution of computer technology, the
speed discrepancy between processor speed and
memory access speed has grown worse!
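The trade-off behind the rhetorical question can be made concrete with the usual average memory access time estimate: hit time plus miss rate times miss penalty. The sketch assumes a single cache level in front of main memory; the latencies are assumed round numbers and the hit rates are purely illustrative.

```python
def amat_ns(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for one cache level in front of main memory."""
    return hit_time_ns + miss_rate * miss_penalty_ns


CACHE_HIT_NS = 0.3   # assumed cache hit time
MEMORY_NS = 50.0     # assumed main memory latency, used as the miss penalty

for hit_rate in (0.90, 0.95, 0.99):
    t = amat_ns(CACHE_HIT_NS, 1.0 - hit_rate, MEMORY_NS)
    print(f"hit rate {hit_rate:.0%}: average access about {t:.2f} ns")
```

Even a small, expensive cache pulls the average access time far below the main memory latency, which is why a hierarchy is built instead of one large fast memory.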
Trend of Memory Speed
[Figure: performance (vertical axis) over time (horizontal axis), with curves for CPU and DRAM. Labels: Intel® Pentium II Processor: Out of Order Execution (instruction level, ~30%); Intel® Xeon™ Processor: Hyperthreading Technology (thread level, ~30%); Multilevel Caches.]
CPU speed increases over time.
Memory access speed also increases over time,
but more slowly than CPU speed, hence the gap widens!
Ideal Memory
 Has: unbounded capacity, to store any data set and
any program code
 Exploits: infinite bandwidth, to move such large data
sets to/from the µP
 Is: persistent, i.e. bits of information retain their
value between power cycles
 Exhibits: no latency, so that the µP never has reason
to stall
 Costs: Low cost, so that $ investment for very large
main memory does not dominate µP system expense
Memory Hierarchy
Memory pyramid shows HW resources that hold data
sorted in decreasing order of speed, top to bottom:
1. Registers, internal to µP, small in number, except on
newer architectures, such as Itanium; fastest!
2. L1 cache, often on chip, few tens of kB
3. L2 cache, on chip on newer µP, many tens of kB
4. L3 cache, common on servers, generally off-chip
5. Main memory, can be smaller than logical address
space; see VMM
6. SSD, known as solid state disk; AKA RAM disk;
a storage device w/o moving parts, so "disk" is not to
be interpreted literally
7. Old-fashioned removable disk, with rotating
magnetic storage
8. Back-up tape or disk; slowest
Memory Hierarchy
 Various hardware resources hold information to be
processed by the CPU, specifically by a component
of the ALU
 Ideally, all such data are present in HW registers
 Since register to register operations often can be
completed in a single cycle
 And on a superscalar µP sometimes multiple
instructions can be executed in a single cycle
 Alas! There are only few registers available, thus data
also need to reside elsewhere
 Generally, that is main memory
 Until shortly before the advent of 64-bit computing,
physical memories were sufficiently large to hold all
data
Memory Hierarchy (from Wikipedia)
486 Memory Organization
Memory Characteristics
Memory Characteristics by Technology
#  Type of Memory     Technology         Bandwidth    Latency
1  register           SRAM multiported   200+ GB/s    250+ ps
2  L1 cache on chip   SRAM               400+ GB/s    300+ ps
3  L2 cache on chip   SRAM               20 GB/s      2+ ns
4  L3 cache off chip  SRAM               10 GB/s      10+ ns
5  main memory        DRAM               2+ GB/s      50+ ns
6  SSD drive          SRAM               20+ MB/s     100+ ns
7  rotating disk      magnetic material  10+ GHB/s    10+ ms
8  tape               magnetic material  few MB/s     >1 s
Memory Low-Level View
(Section taken from Eric Krause PSU ECE Dept.)
Memory Terminology
Latch (one FF) stores 1 bit
Register stores a full machine word
Memory devices store >> 1 words; on 32-bit architectures it is
feasible to have all of memory as real, physical memory
General usage of reading memory:
1. enable device/memory
2. supply address of word on address lines
3. addressed word arrives on data lines
Common Terms
Word size = number of bits of natural computing unit, e.g. integer
Capacity = 2^(address bits) words
Bus = parallel lines connecting memory and µP
Volatility = memory contents are lost on power-off
ROM = Read Only Memory
RAM = Random Access Memory
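The capacity rule can be checked with a few lines of arithmetic. This is a small sketch; the function names and the sample address widths (12, 15, and 32 bits with 8-bit words) are chosen here for illustration.

```python
def words_addressable(address_bits: int) -> int:
    """Number of distinct words selectable with the given address width."""
    return 1 << address_bits


def capacity_bytes(address_bits: int, word_bits: int) -> int:
    """Total capacity in bytes for a memory of 2^address_bits words."""
    return words_addressable(address_bits) * word_bits // 8


for bits in (12, 15, 32):
    print(f"{bits:>2} address bits, 8-bit words: "
          f"{words_addressable(bits):>13,} words, "
          f"{capacity_bytes(bits, 8):>13,} bytes")
```

With 32 address bits and byte-wide words this gives 4 GiB, the size that makes it feasible to back a complete 32-bit address space with physical memory.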
Memory Types
 Types of memory:
MROM: Mask-programmed during manufacturing
PROM: Programmed by user, by blowing fuses
EPROM: Electrically programmable by user, erased by
exposing to ultraviolet light
EEPROM: Electrically Erasable Programmable ROM; can
be programmed or erased by user, one byte at a time
Flash EEPROM: A type of EEPROM programmable in
blocks rather than single bytes
Synchronous Flash EEPROM: Synchronous version of
the above
 Memory attributes:
All are non-volatile
Writes are significantly slower than reads!
Asynchronous (except for Synchronous Flash EEPROM)
Words of Memory
ROM: Read Only Memory
1. How many words can be
addressed? 2^(address bits) = 2^15
2. What is the width of these
words? 8 bits
3. How is it activated/turned on?
Specific signal
4. Why does it have 3-state
outputs? To allow multiple
such devices to be connected to the same bus
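The role of the 3-state outputs can be sketched with a toy model: a disabled device leaves the bus undriven (modeled as None), so several devices can share the same data lines as long as only one is enabled at a time. The class, the helper names, and the two-ROM address map below are hypothetical and only serve the illustration.

```python
from typing import List, Optional


class Rom:
    """Toy ROM with chip-enable (ce) and output-enable (oe) inputs."""

    def __init__(self, contents: bytes):
        self.contents = contents

    def read(self, address: int, ce: bool, oe: bool) -> Optional[int]:
        # Unless both enables are asserted, the outputs are high-impedance,
        # modeled here as None.
        if ce and oe:
            return self.contents[address]
        return None


def bus_value(outputs: List[Optional[int]]) -> Optional[int]:
    """Resolve a shared data bus; at most one device may drive it."""
    drivers = [v for v in outputs if v is not None]
    if len(drivers) > 1:
        raise RuntimeError("bus contention: more than one device drives the bus")
    return drivers[0] if drivers else None


# Two hypothetical 32K x 8 ROMs; address bit 15 decides which chip is enabled.
rom_low = Rom(bytes([0x11]) * 32768)
rom_high = Rom(bytes([0x22]) * 32768)


def read_byte(address: int) -> Optional[int]:
    select_high = bool(address & 0x8000)
    offset = address & 0x7FFF
    return bus_value([
        rom_low.read(offset, ce=not select_high, oe=True),
        rom_high.read(offset, ce=select_high, oe=True),
    ])


print(hex(read_byte(0x0004)), hex(read_byte(0x8004)))  # 0x11 0x22
```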
ROM Timing
Pins are: power, clock, address, output, CE and OE
ROM Timing
 Assuming the address is asserted when supplied to the inputs
 There will be a propagation delay before the output appears
 Called Address Access Time tACC, i.e. time to wait for valid
data available on outputs after address is applied
 Assuming it was already powered up and outputs were
enabled
 A key parameter when using memory
 If address is already present on address inputs when CE#
is asserted, it takes some time, tCE, to power on
 tOE is the time for output buffers to be turned on
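One common way to read these three parameters together: the outputs are valid once all three delays have elapsed, each counted from its own triggering event. The sketch below is a simplified model under that assumption; the function name and the timing values (70 ns, 70 ns, 35 ns) are hypothetical, not taken from a data sheet.

```python
def data_valid_ns(addr_at: float, ce_at: float, oe_at: float,
                  t_acc: float, t_ce: float, t_oe: float) -> float:
    """Earliest time (ns) at which valid data can be expected on the outputs."""
    return max(addr_at + t_acc, ce_at + t_ce, oe_at + t_oe)


# Hypothetical device parameters in ns.
T_ACC, T_CE, T_OE = 70.0, 70.0, 35.0

# Address, CE#, and OE# all applied at t = 0: limited by tACC and tCE.
print(data_valid_ns(0.0, 0.0, 0.0, T_ACC, T_CE, T_OE))

# OE# asserted late, at t = 50 ns: now tOE determines when data is valid.
print(data_valid_ns(0.0, 0.0, 50.0, T_ACC, T_CE, T_OE))
```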
ROM Timing
 These parameters are major limiting factors for the
performance of a microprocessor and pose
important design considerations
 What is happening on the data lines before the
output is valid? See chevrons; bus is in unknown
state!
 Individual outputs may be low or high
 What do valid data look like? Or invalid data?
 How do we know if we clock in invalid data? We
generally don’t!
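Because the µP cannot tell valid from invalid data, the designer has to guarantee that data are sampled late enough. A simplified sketch of that calculation follows, assuming a read cycle of fixed length that can be stretched by whole clock cycles (wait states); the function name and all timing values are hypothetical.

```python
import math


def wait_states(t_acc_ns: float, t_setup_ns: float,
                t_clk_ns: float, base_cycles: int) -> int:
    """Extra clock cycles needed so data are valid before the µP samples them."""
    cycles_needed = math.ceil((t_acc_ns + t_setup_ns) / t_clk_ns)
    return max(0, cycles_needed - base_cycles)


T_ACC_NS = 70.0    # assumed address access time of the memory
T_SETUP_NS = 5.0   # assumed data setup time at the µP
BASE_CYCLES = 2    # assumed length of a read cycle without wait states

for clock_mhz in (10, 25, 50, 100):
    t_clk_ns = 1000.0 / clock_mhz
    n = wait_states(T_ACC_NS, T_SETUP_NS, t_clk_ns, BASE_CYCLES)
    print(f"{clock_mhz:>3} MHz clock: {n} wait state(s)")
```

The same memory that needs no wait states at a slow clock costs several at a fast one, which is how the access time limits µP performance.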
RAM Timing
 Note that ROMs are also randomly addressable
 Random access memory (RAM) can be read and
written, and is volatile
 Read and write speeds are equal
 SRAM: has very fast access times, uses 6
transistors/bit. Data are static while powered
 DRAM: Slower access, but only 1 transistor/bit.
Data must be refreshed regularly while powered
 SDRAM: Pipelined, synchronous DRAM
 DDR SDRAM: Double Data Rate SDRAM
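The effect of double data rate on peak bandwidth can be sketched as clock rate times transfers per clock times bus width. DDR transfers data on both clock edges, so it moves two words per clock. The 200 MHz clock and 64-bit bus below are assumed example values.

```python
def peak_bandwidth_bytes_per_s(clock_hz: float, bus_width_bits: int,
                               transfers_per_clock: int) -> float:
    """Theoretical peak transfer rate of a synchronous memory interface."""
    return clock_hz * transfers_per_clock * bus_width_bits / 8


CLOCK_HZ = 200e6   # assumed memory clock
BUS_BITS = 64      # assumed data bus width

sdr = peak_bandwidth_bytes_per_s(CLOCK_HZ, BUS_BITS, 1)  # one transfer per clock
ddr = peak_bandwidth_bytes_per_s(CLOCK_HZ, BUS_BITS, 2)  # both clock edges
print(f"SDR SDRAM: {sdr / 1e9:.1f} GB/s, DDR SDRAM: {ddr / 1e9:.1f} GB/s")
```

These are peak figures only; latency, refresh, and bank conflicts reduce what a real system sustains.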
RAM Timing
RAM: Random Access Memory, here with 12
address lines:
How many words can be addressed? 4k
What is the width of these words? 8 bits
It is asynchronous!
Benefit of synchronous would be: here the µP
needs to hold the address the whole time
while waiting for data to arrive; only then can the µP
move to the next address; the µP cannot start
generating the next address until the current
operation is complete; wasted time,
especially as processor speeds increase
With a synchronous device, the µP presents the address
at the device inputs only for the duration of the HOLD
time, which in modern devices can be 0,
and clocks it in. Then the µP can do something
else, enabling parallel operations
Pipelined SRAM can do this:
Pipelined SRAM
Actual data sheet for pipelined SRAM shows:
input registers
output registers
address register
enable register (control signals)
control logic: see the gates on left
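The benefit of the input and output registers can be sketched by counting clock cycles. The model assumes each access takes a fixed number of cycles and that a pipelined device accepts one new address per clock once the pipeline is filled; the function names and the 3-cycle access time are hypothetical.

```python
def unpipelined_cycles(accesses: int, cycles_per_access: int) -> int:
    """Each access must finish before the next address can be issued."""
    return accesses * cycles_per_access


def pipelined_cycles(accesses: int, cycles_per_access: int) -> int:
    """A new address is clocked in every cycle; results emerge one per cycle."""
    if accesses == 0:
        return 0
    return cycles_per_access + (accesses - 1)


CYCLES_PER_ACCESS = 3  # assumed access time in clock cycles

for n in (1, 10, 100):
    print(f"{n:>3} accesses: {unpipelined_cycles(n, CYCLES_PER_ACCESS):>3} cycles "
          f"unpipelined, {pipelined_cycles(n, CYCLES_PER_ACCESS):>3} cycles pipelined")
```

The latency of a single access is unchanged; only the throughput of a stream of accesses improves.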
Bibliography
1. Shen, John Paul, and Mikko H. Lipasti: Modern
Processor Design, Fundamentals of Superscalar
Processors, McGraw Hill, ©2005
2. http://forums.amd.com/forum/messageview.cfm?catid=11&threadid=29382&enterthread=y
3. http://www.ece.umd.edu/~blj/papers/hpca2006.pdf
4. Kilburn, T., et al: “One-level storage systems”, IRE
Transactions, EC-11, 2, 1962, p. 223-235