Lecture Notes 5

Lecture 5: Memory Performance
Types of Memory
Registers
L1 cache
L2 cache
L3 cache
Main Memory
Local Secondary Storage (local disks)
Remote Secondary Storage (distributed file systems, web servers)
Memory Hierarchy
Random Access Memory (RAM)


DRAM (Dynamic RAM)
• Must be refreshed periodically
• 1 transistor per bit
• Unavailable when it is being refreshed
• Slower
• Less expensive
SRAM (Static RAM)
• Does not require periodic refreshes
• 5-6 transistors per bit
• Faster and more complex
• More expensive
Processor-Memory Problem

Processors issue instructions roughly every nanosecond.

DRAM can be accessed roughly every 100 nanoseconds.

The gap is growing:
• Processors are getting faster by 60% per year
• DRAM is getting faster by 7% per year
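As a rough worked example of these figures, the sketch below compounds the two growth rates to estimate how quickly the gap widens. The function name `gap_growth` and its parameters are illustrative, and the calculation assumes both rates stay constant from year to year.

```python
ISSUE_NS = 1          # approximate time between instructions (from the slide)
DRAM_ACCESS_NS = 100  # approximate DRAM access time (from the slide)


def gap_growth(years, cpu_rate=0.60, dram_rate=0.07):
    """Factor by which the processor/DRAM speed ratio grows after `years`.

    Assumes the 60% and 7% yearly improvements hold every year.
    """
    return ((1 + cpu_rate) / (1 + dram_rate)) ** years


# Instruction slots that pass while one DRAM access completes.
slots_per_access = DRAM_ACCESS_NS / ISSUE_NS
print(f"instruction slots per DRAM access: {slots_per_access:.0f}")

for years in (1, 5, 10):
    print(f"after {years:2d} years the gap has grown {gap_growth(years):.1f}x")
```

Under these assumptions the ratio grows by about 1.5x each year (1.60 / 1.07), so it compounds quickly.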
Locality of Reference
The principle of locality is the tendency of a program to
reference data items that are near other recently
referenced data items, or that were recently
referenced themselves.
Programs with good locality run faster than programs with poor locality.
Locality has two distinct forms:

Temporal Locality: A memory location that is
referenced once is likely to be referenced again
multiple times in the near future.

Spatial Locality: If a memory location is
referenced once, the program is likely to
reference a nearby location in the near future.
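
Both forms can be seen in a simple summation loop. The sketch below is illustrative only: the name `sum_vector` is not from the lecture, and the `array` module is used so the elements sit in one contiguous block, as they would in a C array.

```python
from array import array


def sum_vector(v):
    total = 0                # touched on every iteration: temporal locality
    for i in range(len(v)):
        total += v[i]        # v[0], v[1], v[2], ...: spatial locality
    return total


v = array("i", range(8))     # contiguous block of C ints
print(sum_vector(v))
```

The accumulator `total` is referenced repeatedly, while the elements of `v` are each referenced once but in address order.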
Cache Performance

Thrashing: The cache repeatedly loads and evicts the same cache blocks.

Padding: Extra bytes added at the end of an array, which can shift the arrays that follow it so that they map to different cache sets.
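
The effect of thrashing and padding can be sketched with a toy simulation. Everything here is an assumption made for illustration: a direct-mapped cache with two 16-byte blocks, two arrays of eight 4-byte floats, and a dot-product loop that alternates between them. The helper names `count_misses` and `dot_product_trace` are hypothetical.

```python
def count_misses(addresses, block_size=16, num_sets=2):
    """Simulate a direct-mapped cache and return the number of misses."""
    cached_tag = [None] * num_sets           # one cache line per set
    misses = 0
    for addr in addresses:
        block = addr // block_size
        set_index = block % num_sets
        tag = block // num_sets
        if cached_tag[set_index] != tag:     # miss: load block, evict old one
            misses += 1
            cached_tag[set_index] = tag
    return misses


def dot_product_trace(x_base, y_base, n=8, elem_size=4):
    """Addresses touched by: for i in range(n): sum += x[i] * y[i]."""
    trace = []
    for i in range(n):
        trace.append(x_base + i * elem_size)
        trace.append(y_base + i * elem_size)
    return trace


# x occupies bytes 0..31, y starts right after it: x[i] and y[i] share a set.
thrashing_misses = count_misses(dot_product_trace(0, 32))

# 16 bytes of padding after x move y to byte 48: x[i] and y[i] use different sets.
padded_misses = count_misses(dot_product_trace(0, 48))

print(thrashing_misses, padded_misses)
```

In this model the unpadded layout misses on all 16 accesses, because each load of `x[i]` evicts the block holding `y[i]` and vice versa. With the padding, only the first access to each block misses, for 4 misses in total.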

Intel Core i7

Read throughput (read bandwidth): The rate at which a
program reads data from memory, measured in MB/s.
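
A minimal sketch of how such a measurement is structured: time a pass over a buffer and divide the bytes read by the elapsed time. The function name `read_throughput_mb_s` is hypothetical, 1 MB is taken as 10^6 bytes, and in Python the interpreter overhead dominates, so the numbers illustrate the method rather than the hardware's actual bandwidth.

```python
import time


def read_throughput_mb_s(buf, stride=1):
    """Read every `stride`-th byte of `buf` and return the rate in MB/s."""
    start = time.perf_counter()
    checksum = sum(buf[::stride])             # forces the reads to happen
    elapsed = time.perf_counter() - start
    bytes_read = len(range(0, len(buf), stride))
    # Guard against a zero interval on very coarse timers.
    return bytes_read / max(elapsed, 1e-9) / 1e6


data = bytearray(1 << 22)                     # 4 MiB working set
for stride in (1, 4, 16):
    print(f"stride {stride:2d}: {read_throughput_mb_s(data, stride):8.1f} MB/s")
```

Varying the working-set size and the stride, as the loop hints at, is the usual way to see how throughput depends on temporal and spatial locality.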
Writing cache-friendly code:
1. Focus on the inner loops, where most of the computation and memory accesses occur.
2. Maximize spatial locality by reading data sequentially with stride 1.
• A stride-1 reference pattern is good because data is stored in caches as contiguous blocks.
3. Maximize temporal locality by using data as often as possible once it has been read from memory.
• Repeated references to local variables are good because the compiler can cache them in the register file.
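
The benefit of stride-1 access can be sketched with a toy model. The assumptions are all illustrative: a 16x16 array of 4-byte integers stored in row-major order, and a direct-mapped cache with four 64-byte lines. The names `count_misses` and `address` are hypothetical.

```python
def count_misses(addresses, block_size=64, num_lines=4):
    """Count misses in a small direct-mapped cache (a toy model)."""
    lines = [None] * num_lines               # block number held by each line
    misses = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_lines
        if lines[index] != block:            # miss: replace the line's block
            misses += 1
            lines[index] = block
    return misses


ROWS, COLS, ELEM = 16, 16, 4                 # 16x16 array of 4-byte ints


def address(i, j):
    """Byte offset of element [i][j] in a row-major layout."""
    return (i * COLS + j) * ELEM


# Inner loop over j: consecutive addresses (stride 1).
row_order = [address(i, j) for i in range(ROWS) for j in range(COLS)]
# Inner loop over i: jumps a whole row each step (stride COLS).
col_order = [address(i, j) for j in range(COLS) for i in range(ROWS)]

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)
print(row_misses, col_misses)
```

In this model the row-wise walk misses 16 times, once per 64-byte block, while the column-wise walk misses on all 256 accesses because each block is evicted before it is reused.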

Matrix Multiply Performance
Loop orderings compared (each pair shares the same innermost loop):
• jki, kji
• ijk, jik
• kij, ikj
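
One ordering from each pair is sketched below to show how the innermost loop determines the access pattern. The comments assume a row-major layout as in C; Python nested lists are used only to show the loop structure, and the function names are illustrative.

```python
def matmul_ijk(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0                      # accumulator can live in a register
            for k in range(n):
                # a: along row i (stride 1); b: down column j (stride n)
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def matmul_kij(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            r = a[i][k]                      # fixed during the inner loop
            for j in range(n):
                # b and c: both along a row (stride 1)
                c[i][j] += r * b[k][j]
    return c


def matmul_jki(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            r = b[k][j]                      # fixed during the inner loop
            for i in range(n):
                # a and c: both down a column (stride n)
                c[i][j] += a[i][k] * r
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_ijk(a, b, 2))
print(matmul_kij(a, b, 2) == matmul_jki(a, b, 2))
```

All three compute the same product. What differs is the stride of the innermost loop: kij walks both arrays with stride 1, ijk mixes one stride-1 and one stride-n access, and jki walks both arrays with stride n.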
Memory Interleaving
Virtual Memory