Chapter 12 Introduction The Memory Hierarchy


1
COMPUTER SYSTEMS ARCHITECTURE
A NETWORKING APPROACH
CHAPTER 12 INTRODUCTION
THE MEMORY HIERARCHY
CS 147
Nathaniel Gilbert
Levels of Performance – You Get What You Pay For
2
Recall:
 Dynamic Random Access Memory (DRAM)
  Capacitors to store state (0 or 1)
  Periodically refreshed
  Relatively cheap
 Static Random Access Memory (SRAM)
  Transistors to store state
  Doesn’t need to be refreshed, faster, and uses less power than DRAM
  More expensive than DRAM
Levels of Performance cont.
3
[Table of memory levels with prices given in pounds; currently, one pound is about 2 US dollars. R = removable media.]
Levels of Performance cont.
4
 Storage Hierarchy – fastest CPU registers at the top, slowest tape drives at the bottom
 Pre-fetching – data transferred between layers is usually larger than requested, in anticipation that the extra blocks of data will be used
Localization of Access – exploiting
repetition
5



 Computers tend to access the same locality of memory repeatedly.
 This is partly because programmers organize data in clusters, and compilers attempt to organize code efficiently.
 This localization can be exploited by the memory hierarchy.
Localization of Access cont.
6

Exploiting localization of memory access
 Keep related data in smaller groups (try not to store all input and output in a single array when reading from/writing to disk)
 Only the portion of data the CPU is using should be loaded into faster memory.
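The effect of access order on locality can be seen in a short C sketch (the array name and sizes are illustrative, not from the slides): sweeping a 2-D array row by row touches adjacent addresses, so each cache line loaded is used fully, while sweeping column by column jumps COLS * sizeof(int) bytes on every access.

```c
#include <stddef.h>

#define ROWS 1024
#define COLS 1024

static int grid[ROWS][COLS];

/* Fill the array so the sums below are non-trivial. */
void fill_grid(void) {
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = 1;
}

/* Row-major sweep: consecutive accesses touch adjacent addresses,
   so every byte of each loaded cache line is used. */
long sum_row_major(void) {
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major sweep: successive accesses are COLS * sizeof(int)
   bytes apart, so most accesses can miss the cache. */
long sum_col_major(void) {
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Both functions return the same total, but on typical hardware the row-major sweep runs noticeably faster, purely because of locality.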
Localization of Access cont.
7
The following code was used by the author to demonstrate cache action (exploiting localization of memory access).
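The author's listing is not reproduced in this transcript; the sketch below is one way such a cache demonstration could look in C (the function name, sizes, and use of clock() are assumptions, not the original code). It sweeps a buffer repeatedly while keeping the total work constant, so the elapsed ticks jump once the buffer no longer fits in cache.

```c
#include <stdlib.h>
#include <time.h>

/* Sweep a `bytes`-sized buffer repeatedly until a fixed total amount
   of data has been touched, and return the elapsed clock ticks.
   Once `bytes` exceeds the cache size, each pass spills to main
   memory and the tick count rises sharply.
   Returns (clock_t)-1 if the buffer cannot be allocated. */
clock_t sweep_ticks(size_t bytes) {
    const size_t total = (size_t)64 * 1024 * 1024;  /* constant work: 64 Mbyte */
    char *buf = malloc(bytes);
    if (buf == NULL)
        return (clock_t)-1;
    for (size_t i = 0; i < bytes; i++)  /* touch once so the pages exist */
        buf[i] = (char)i;

    volatile char sink = 0;  /* keeps the loop from being optimized away */
    clock_t start = clock();
    for (size_t done = 0; done < total; done += bytes)
        for (size_t i = 0; i < bytes; i++)
            sink ^= buf[i];
    clock_t ticks = clock() - start;

    free(buf);
    return ticks;
}
```

Calling sweep_ticks for buffer sizes of 64, 128, 256, 512 kbyte and so on, on a machine with a 256 kbyte cache, should show the tick count roughly doubling once the buffer exceeds the cache.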
Localization of Access cont.
8

On a Sun workstation (200 MHz CPU, 256 Mbyte main memory, 256 kbyte cache, 4 Gbyte local hard drive), the output was:
(Time is in system clock ticks.)
Localization of Access cont.
9

The reason for the doubling of time is the movement of data up and down the memory hierarchy.
 The array is moved between levels in blocks because the 256 kbytes of cache memory cannot hold the whole object.
Instruction and Data Caches – Matching Memory to CPU Speed
10



 A 2 GHz Pentium CPU accesses program memory on average every 0.5 ns just for fetching instructions
 DDR DRAM responds within 10 ns; if the CPU only used DRAM, it would result in a 20x loss in speed
 This is where using SRAM (cache) comes into play
 Downfalls of cache:
  Misses (if the desired code is not in the memory segment) may take longer because the memory has to be reloaded
  Negative cache – (depending on architecture) where negative results (failures) are stored
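The cost of misses can be made concrete with the standard average-access-time formula (the formula and the example miss rate are textbook conventions, not figures from the slides): every access pays the cache hit time, and the fraction of accesses that miss additionally pays the DRAM penalty.

```c
/* Average memory access time in nanoseconds:
   every access pays `hit_ns`; a fraction `miss_rate` of accesses
   additionally pays `miss_penalty_ns` for the trip to DRAM. */
double avg_access_ns(double hit_ns, double miss_penalty_ns, double miss_rate) {
    return hit_ns + miss_rate * miss_penalty_ns;
}
```

With the slide's figures (0.5 ns cache, 10 ns DRAM) and an assumed 2% miss rate, this gives 0.5 + 0.02 × 10 = 0.7 ns — far closer to the 0.5 ns ideal than to the 20x slowdown of DRAM alone.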
Instruction and Data Caches cont.
11



Cache is built from SRAM chips and is ideally made to match the system clock speed of the CPU
 The Cache Controller Unit (CCU) and cache memory are inserted between the CPU and the main memory
 Level 1 and Level 2 cache differ by placement
  Level 1 is on the CPU chip.
  Level 2 was generally located off the CPU chip and was slowed down by the system bus. Intel successfully integrated a 128 kbyte L2 cache memory onto the CPU and continues to offer integrated chips.
Instruction and Data Caches cont.
12

Generic System Architecture
 Level 1 is the microprocessor with three forms of cache:
  D-cache – (Data) fast buffer containing application data
  I-cache – (Instruction) speeds up fetching executable instructions
  TLB – (Translation Lookaside Buffer) stores a map of translated virtual page addresses
 Level 2 is unified cache
 Memory – DRAM
 The CPU and register file reside in Level 1
  Register file – small amount of memory closest to the CPU where data is manipulated
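The TLB's role as a small map of translations can be sketched as a direct-mapped lookup table in C (entry count, page size, and function names are illustrative assumptions, not a real processor's layout): the low bits of the virtual page number select a slot, a tag compare decides hit or miss, and on a miss the translation would be found by a page-table walk and installed.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16
#define PAGE_SHIFT  12  /* assumed 4 kbyte pages */

/* One TLB slot: a cached virtual-to-physical page translation. */
struct tlb_entry {
    bool     valid;
    uint64_t vpage;  /* virtual page number (the tag) */
    uint64_t ppage;  /* physical page frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Look up a virtual address. On a hit, write the translated physical
   address and return true; on a miss, return false (the caller would
   walk the page table). Direct-mapped: the low bits of the virtual
   page number pick the slot. */
bool tlb_lookup(uint64_t vaddr, uint64_t *paddr) {
    uint64_t vpage = vaddr >> PAGE_SHIFT;
    struct tlb_entry *e = &tlb[vpage % TLB_ENTRIES];
    if (e->valid && e->vpage == vpage) {
        uint64_t offset = vaddr & (((uint64_t)1 << PAGE_SHIFT) - 1);
        *paddr = (e->ppage << PAGE_SHIFT) | offset;
        return true;
    }
    return false;
}

/* Install a translation, e.g. after a page-table walk on a miss. */
void tlb_insert(uint64_t vaddr, uint64_t pframe) {
    uint64_t vpage = vaddr >> PAGE_SHIFT;
    struct tlb_entry *e = &tlb[vpage % TLB_ENTRIES];
    e->valid = true;
    e->vpage = vpage;
    e->ppage = pframe;
}
```

A first lookup of an address misses; after inserting the translation, the same lookup hits and returns the physical frame combined with the page offset.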
Thank You
13