
Lecture 1: Introduction
CprE 585 Advanced Computer Architecture, Fall 2004
Zhao Zhang
Traditional “Computer Architecture”
“The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logical design, and the physical implementation.”

Gene Amdahl, IBM Journal of R&D, April 1964
Contemporary “Computer Architecture”
Instruction set architecture (ISA): the program-visible instruction set
  Instruction format, memory addressing modes, architectural registers, byte order (endianness), alignment, …
  Examples: RISC, CISC, VLIW, EPIC
Organization: high-level aspects of a computer’s design
  Pipeline structure, instruction scheduling, caches, memory, disks, buses, etc.
Implementation: the specifics of a machine
  Logic design, packaging technology
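Byte order and alignment are ISA attributes that programs can observe directly. The sketch below, with hypothetical function names, shows one way C code can detect the host's byte order and read a little-endian 32-bit value portably. Using memcpy and byte-wise shifts avoids depending on the host's endianness or on unaligned loads being allowed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte of a multi-byte
   integer at the lowest address (little-endian), otherwise 0. */
int host_is_little_endian(void) {
    uint32_t word = 0x01020304u;
    unsigned char first_byte;
    memcpy(&first_byte, &word, 1);   /* inspect the byte at the lowest address */
    return first_byte == 0x04;
}

/* Assembles a 32-bit value from 4 bytes stored in little-endian order.
   Byte-wise shifts give the same result on any host byte order and never
   perform a possibly misaligned 4-byte load. */
uint32_t load_le32(const unsigned char bytes[4]) {
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

For example, the bytes 0x78, 0x56, 0x34, 0x12 decode to 0x12345678 regardless of whether the host is little- or big-endian.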
Fundamentals
ISA design principles and performance evaluation
The impacts of technology trends and market factors
Performance evaluation methodologies
High Performance Computer Architecture
Given a huge number of transistors, how can we run programs as fast as possible?
Sequential programs
Parallel and multiprogrammed workloads
Instruction Level Parallelism
Sequential program performance:
Execution time = #instructions × CPI × cycle time
Pipelining works well for sequential programs
But a single-issue pipeline can at best reach CPI = 1.0 (CPI ≥ 1.0)
Pipeline hazards cause stalls that degrade performance further
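The performance equation above can be evaluated directly. This is a minimal sketch with hypothetical function names. It models the effective CPI of a single-issue pipeline as the ideal 1.0 plus average stall cycles per instruction, which shows why hazards push CPI above the 1.0 floor.

```c
#include <assert.h>

/* Execution time (seconds) = instruction count x CPI x cycle time (seconds). */
double execution_time(double inst_count, double cpi, double cycle_time_s) {
    assert(inst_count >= 0 && cpi > 0 && cycle_time_s > 0);
    return inst_count * cpi * cycle_time_s;
}

/* Effective CPI of a single-issue pipeline: the ideal CPI of 1.0 plus the
   average number of stall cycles per instruction caused by hazards. */
double pipelined_cpi(double stall_cycles_per_inst) {
    assert(stall_cycles_per_inst >= 0);
    return 1.0 + stall_cycles_per_inst;
}
```

With illustrative numbers, 10^9 instructions at a 1 ns cycle take about 1 s with no stalls, and about 1.5 s if hazards add an average of 0.5 stall cycles per instruction.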
Multi-issue Pipeline
Naïve extension to multi-issue
[Figure: naïve multi-issue pipeline; several instructions pass through the IF, ID, EX, MEM, and WB stages side by side in the same cycles]
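A multi-issue pipeline breaks the CPI ≥ 1.0 limit only in the ideal case of no hazards. The sketch below, with a hypothetical function name, counts cycles for n independent instructions in an ideal pipeline with a given number of stages that issues width instructions per cycle. The first group fills the pipeline, and each later group completes one cycle after the previous one.

```c
#include <assert.h>

/* Cycles to run n independent instructions (no hazards) through an ideal
   pipeline with `stages` stages that issues `width` instructions per cycle. */
long ideal_pipeline_cycles(long stages, long width, long n) {
    assert(stages > 0 && width > 0 && n > 0);
    long groups = (n + width - 1) / width;   /* ceil(n / width) issue groups */
    return (stages - 1) + groups;            /* fill time + one cycle per group */
}
```

For 100 instructions on a five-stage pipeline, a single-issue machine needs 104 cycles (CPI 1.04), while an ideal 4-wide machine needs 29 cycles (CPI 0.29). Real programs fall short of this because of the data and control hazards discussed next.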
Pipeline Efficiency
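for (i=0; i<N; i++)
    X[i] = a*X[i];

// let R3=&X[0], R4=&X[N], and F0=a
LOOP: L.D    F2, 0(R3)      // load X[i]
      MUL.D  F2, F2, F0     // F2 = a * X[i]
      S.D    F2, 0(R3)      // store X[i]
      DADDUI R3, R3, #8     // advance to the next double
      BNE    R3, R4, LOOP   // repeat until R3 reaches &X[N]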
How much parallelism exists in the program?
What’s the problem with the naïve multi-issue pipeline?
Data hazards
Control hazards
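One way to see how much parallelism the loop has: each iteration reads and writes only X[i], so iterations are independent of one another. The sketch below, with hypothetical function names, unrolls the loop by 4 in C. The four statements in the unrolled body have no data dependences among themselves, so a wide enough machine could execute them together. Only the index update is a loop-carried dependence.

```c
#include <assert.h>
#include <stddef.h>

/* The original loop from the slide. */
void scale(double *x, size_t n, double a) {
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        x[i] = a * x[i];
}

/* The same computation unrolled by 4; the four body statements are
   mutually independent. */
void scale_unrolled4(double *x, size_t n, double a) {
    assert(x != NULL || n == 0);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     = a * x[i];
        x[i + 1] = a * x[i + 1];
        x[i + 2] = a * x[i + 2];
        x[i + 3] = a * x[i + 3];
    }
    for (; i < n; i++)   /* leftover iterations when n is not a multiple of 4 */
        x[i] = a * x[i];
}
```

Both versions produce identical results. Unrolling only exposes the parallelism; a naïve multi-issue pipeline still needs a way to find and schedule the independent instructions.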
How to Exploit ILP?
Find independent instructions through dependence analysis
Hardware approaches => dynamically scheduled superscalar
  Most commonly used today: Intel Pentium, AMD, Sun UltraSPARC, and MIPS families
Software approaches => (1) statically scheduled superscalar, or (2) VLIW
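Dependence analysis can be sketched in a few lines. The C code below uses a hypothetical, simplified instruction record (one destination, up to two source registers) and finds true (read-after-write) dependences in straight-line code by linking each source register to the nearest earlier instruction that writes it. The sample body is one iteration of the X[i] = a*X[i] loop written in MIPS64 mnemonics. It ignores memory dependences and anti/output dependences, which real schedulers must also handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical instruction record; NULL means "no register". */
struct inst {
    const char *text;
    const char *dst;
    const char *src[2];
};

/* Records (producer, consumer) index pairs for RAW dependences into edges,
   up to max_edges, and returns the total number found. */
size_t find_raw(const struct inst *code, size_t n,
                size_t edges[][2], size_t max_edges) {
    size_t count = 0;
    for (size_t j = 0; j < n; j++) {
        for (int s = 0; s < 2; s++) {
            const char *reg = code[j].src[s];
            if (reg == NULL) continue;
            for (size_t i = j; i-- > 0;) {           /* scan backwards from j */
                if (code[i].dst != NULL && strcmp(code[i].dst, reg) == 0) {
                    if (count < max_edges) {
                        edges[count][0] = i;
                        edges[count][1] = j;
                    }
                    count++;
                    break;                           /* nearest writer only */
                }
            }
        }
    }
    return count;
}

/* One iteration of the loop body from the slide. */
const struct inst loop_body[5] = {
    {"L.D    F2, 0(R3)",    "F2", {"R3", NULL}},
    {"MUL.D  F2, F2, F0",   "F2", {"F2", "F0"}},
    {"S.D    F2, 0(R3)",    NULL, {"F2", "R3"}},
    {"DADDUI R3, R3, #8",   "R3", {"R3", NULL}},
    {"BNE    R3, R4, LOOP", NULL, {"R3", "R4"}},
};
```

Within one iteration this finds three RAW edges: load to multiply, multiply to store, and address increment to branch. The load/multiply/store chain is serial, which is why parallelism has to come from overlapping different iterations.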
Dynamically Scheduled Superscalar
Important features:
Multi-issue and deep pipelining
Dynamic scheduling
Speculative execution
Branch prediction
Memory dependence speculation
Non-blocking caches
High-bandwidth caches
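As one concrete instance of branch prediction, the sketch below implements a table of 2-bit saturating counters indexed by branch address. This is a common textbook scheme, not necessarily the predictor used by any processor named above. The table size and the assumption of 4-byte instructions in the indexing are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BHT_ENTRIES 1024   /* hypothetical table size */

/* Each entry is a 2-bit saturating counter: 0 and 1 predict not taken,
   2 and 3 predict taken. All counters start at 0. */
static uint8_t bht[BHT_ENTRIES];

/* Drop the low 2 bits (assumes 4-byte instructions), then index the table. */
static unsigned bht_index(uint64_t pc) {
    return (unsigned)((pc >> 2) % BHT_ENTRIES);
}

bool predict_taken(uint64_t pc) {
    return bht[bht_index(pc)] >= 2;
}

/* After the branch resolves, move the counter one step toward the actual
   outcome, saturating at 0 and 3, so a single mispredicted iteration does
   not flip a strongly biased prediction. */
void update_predictor(uint64_t pc, bool taken) {
    uint8_t *c = &bht[bht_index(pc)];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}
```

For a loop branch such as BNE in the earlier example, the counter quickly saturates at "strongly taken", and the single not-taken outcome at loop exit costs one misprediction without changing the next prediction.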
Dynamically Scheduled Superscalar
Challenges: Complexity!!!
Key issues:
Understand why it is correct
  Know the dependences
  We will prove that dynamic execution is “correct”
Understand how it delivers high performance
  We will see some unconventional designs
  We will use Verilog and simulation to aid understanding
Keep the big picture in view
Memory System Performance
A typical memory hierarchy today (levels further down are bigger; levels further up are faster):
Proc/Regs
L1-Cache
L2-Cache
L3-Cache (optional)
Memory
Disk, Tape, etc.
Here we focus on L1/L2/L3 caches, virtual
memory and main memory
Memory System Performance
Memory stall CPI
= Misses per instruction × Miss penalty
= % Memory instructions × Miss rate × Miss penalty
Assume 20% of instructions access memory, a 2% miss rate, and a 400-cycle miss penalty. What is the memory stall CPI?
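The formula above translates directly into code. This is a small sketch with a hypothetical function name. Like the slide's formula, it counts only misses from data memory instructions and ignores instruction-fetch misses.

```c
#include <assert.h>

/* Memory stall cycles per instruction
   = (fraction of instructions that access memory) x miss rate x miss penalty. */
double memory_stall_cpi(double mem_inst_fraction, double miss_rate,
                        double miss_penalty_cycles) {
    assert(mem_inst_fraction >= 0 && miss_rate >= 0 && miss_penalty_cycles >= 0);
    return mem_inst_fraction * miss_rate * miss_penalty_cycles;
}
```

With the slide's numbers, memory_stall_cpi(0.20, 0.02, 400.0) gives 1.6 (up to floating-point rounding): 0.20 × 0.02 = 0.004 misses per instruction, each costing 400 cycles. That is more than the ideal pipeline CPI of 1.0 by itself.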
Cache Design
Many applications are memory-bound
  CPU speed increases quickly; memory speed cannot keep up
Cache hierarchy: exploits program locality
  Basic principles of cache design
  Hardware cache optimizations
  Application cache optimizations
  Prefetching techniques
We will also cover virtual memory
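A classic application-level cache optimization is choosing a loop order that matches the data layout. C stores 2-D arrays row by row, so the two functions below (hypothetical names, with an arbitrary 256 × 256 size) compute the same sum with different spatial locality. The row-major walk touches consecutive addresses, so each fetched cache block is fully used. The column-major walk jumps COLS ints between accesses and, for large arrays, may touch a new block on almost every access.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 256
#define COLS 256

/* Unit-stride traversal: good spatial locality. */
long sum_row_major(int m[ROWS][COLS]) {
    long sum = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += m[i][j];
    return sum;
}

/* Stride of COLS ints between consecutive accesses: poor spatial locality. */
long sum_column_major(int m[ROWS][COLS]) {
    long sum = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += m[i][j];
    return sum;
}
```

Swapping the loops (loop interchange) does not change the result. The actual speed difference depends on the cache and block sizes of the machine, so it should be measured rather than assumed.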
High Performance Storage Systems
What limits the performance of web servers? Storage!
Storage technology trends
RAID: Redundant Array of Inexpensive Disks
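Several RAID levels (3, 4, and 5) protect data with a parity block, which is the bytewise XOR of the data blocks in a stripe. The sketch below, with hypothetical names and a deliberately tiny block size, computes parity and then rebuilds a failed disk's block from the parity and the surviving blocks. Real arrays also handle striping layout, rotating parity placement (RAID 5), and small-write updates, which are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8   /* hypothetical block size, tiny for illustration */

/* parity[b] = XOR of byte b across all data blocks in the stripe. */
void compute_parity(uint8_t data[][BLOCK_SIZE], size_t ndisks,
                    uint8_t parity[BLOCK_SIZE]) {
    for (size_t b = 0; b < BLOCK_SIZE; b++) {
        uint8_t p = 0;
        for (size_t d = 0; d < ndisks; d++)
            p ^= data[d][b];
        parity[b] = p;
    }
}

/* Rebuilds the block of disk `failed` without reading it: XOR the parity
   with every surviving data block. */
void rebuild_block(uint8_t data[][BLOCK_SIZE], size_t ndisks, size_t failed,
                   const uint8_t parity[BLOCK_SIZE], uint8_t out[BLOCK_SIZE]) {
    assert(failed < ndisks);
    for (size_t b = 0; b < BLOCK_SIZE; b++) {
        uint8_t v = parity[b];
        for (size_t d = 0; d < ndisks; d++)
            if (d != failed)
                v ^= data[d][b];
        out[b] = v;
    }
}
```

Because x XOR x = 0, XORing the parity with all surviving blocks cancels them and leaves exactly the missing block. This is why one parity disk can tolerate the loss of any single disk in the stripe.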
Multiprocessor Systems
Must exploit thread-level parallelism for further performance improvement
Shared-memory multiprocessors: cooperating programs see the same memory address space
How to build them?
  Cache coherence
  Memory consistency
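To show what memory consistency means to software, here is a minimal message-passing sketch. It uses C11 atomics and POSIX threads, which are our choice for illustration; the lecture does not prescribe an API. One thread writes ordinary shared data and then sets a flag with a release store. The other thread spins on the flag with an acquire load and then reads the data. The release/acquire pairing guarantees the reader sees 42. With plain, unordered accesses, a weakly ordered machine or compiler could let it see the stale value. Cache coherence keeps each location's value consistent, and the consistency model governs the ordering across locations. Function names are hypothetical. Older toolchains may need the -pthread flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int shared_data;      /* ordinary (non-atomic) shared variable */
static atomic_int ready;     /* flag that publishes shared_data */

static void *producer(void *arg) {
    (void)arg;
    shared_data = 42;                                        /* write the payload first */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* then publish it */
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    /* Spin until the flag is set; the acquire load pairs with the release
       store, so the earlier write to shared_data is visible afterwards. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    *out = shared_data;
    return NULL;
}

/* Runs one producer and one consumer thread; returns what the consumer read. */
int run_message_passing(void) {
    pthread_t p, c;
    int result = -1;
    shared_data = 0;
    atomic_store(&ready, 0);
    int rc = pthread_create(&c, NULL, consumer, &result);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return result;
}
```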
Other Topics
VLIW basics and modern VLIW processors
Simultaneous multithreading and chip-level multiprocessing
Low-power processor design
Circuit issues in high-performance processors
Other selected topics
Why Study Computer Architecture
As a hardware designer/researcher – know how to design processors, caches, storage, graphics, interconnects, and so on
As a system designer – know how to build a computer system using the best components available
As a software designer – know how to get the best performance from the hardware