Lecture 1: Introduction
CprE 581 Computer Systems
Architecture, Fall 2005
Zhao Zhang
Traditional “Computer Architecture”
The term architecture is used here to
describe the attributes of a system as
seen by the programmer, i.e., the
conceptual structure and functional
behavior as distinct from the
organization of the data flow and
controls, the logic design, and the
physical implementation.

Gene Amdahl, IBM Journal R&D, April
1964
Contemporary “Computer Architecture”
Instruction set architecture
Microarchitecture:


Pipeline structures
Cache memories
Implementations

Logic design and synthesis
Fundamentals
Technology trends
Performance evaluation methodologies
Instruction Set Architecture
Technology Drives for High-Performance
VLSI technology: faster transistors and
larger transistor budget
CPU Performance
For sequential program:
CPU time = #Inst × CPI × Clock cycle time
To improve performance:
Shorten the clock cycle time (faster clock)
Reduce #Inst
Reduce CPI, i.e., increase IPC
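The CPU time equation above can be sketched as a small C helper. The function and parameter names are hypothetical, chosen only for illustration; the formula itself is the one on the slide.

```c
#include <stdint.h>

/* CPU time in seconds = instruction count x CPI x clock cycle time.
   Names are illustrative assumptions, not from the lecture. */
double cpu_time_seconds(uint64_t inst_count, double cpi, double cycle_time_s)
{
    return (double)inst_count * cpi * cycle_time_s;
}

/* IPC is the reciprocal of CPI, so reducing CPI increases IPC. */
double ipc_from_cpi(double cpi)
{
    return 1.0 / cpi;
}
```

As a made-up example, a program of one billion instructions with a CPI of 1.5 and a 0.5 ns clock cycle would take 1e9 × 1.5 × 0.5e-9 = 0.75 seconds. Halving any one of the three factors halves the CPU time.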
How to use one billion transistors?
Bit-level parallelism

Move from 32-bit to 64-bit
Instruction-level parallelism


Deep pipeline
Execute multiple instructions per cycle
Program locality

Large caches, more branch prediction
resources
Thread-level parallelism
Instruction-Level Parallelism
Pipeline + Multi-issue
[Figure: pipeline diagram of five instructions overlapping in the IF, ID, EX, MEM, and WB stages]
Instruction-Level Parallelism
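for (i = 0; i < N; i++)
    X[i] = a * X[i];

// Let R3 = &X[0], R4 = &X[N], and F0 = a
LOOP:
    L.D    F2, 0(R3)      // load X[i]
    MUL.D  F2, F2, F0     // multiply by a
    S.D    F2, 0(R3)      // store the result back to X[i]
    DADDI  R3, R3, 8      // advance to the next double (8 bytes)
    BNE    R3, R4, LOOP   // repeat until R3 reaches &X[N]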
Which instructions can execute in
parallel?
How should those instructions be
scheduled?
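One way to approach these questions: within a single iteration the load, multiply, and store form a chain through the same floating-point register, so they cannot overlap with each other, but different iterations touch different elements of X and are independent. The sketch below shows one possible software approach, loop unrolling by four, which places independent multiplies side by side. The function names and the unroll factor are assumptions for illustration, not something the lecture prescribes.

```c
#include <stddef.h>

/* Original loop: one multiply per iteration. */
void scale(double *x, size_t n, double a)
{
    for (size_t i = 0; i < n; i++)
        x[i] = a * x[i];
}

/* Unrolled by four (an arbitrary choice): the four statements in the body
   touch different elements, so none depends on another's result. */
void scale_unrolled4(double *x, size_t n, double a)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     = a * x[i];
        x[i + 1] = a * x[i + 1];
        x[i + 2] = a * x[i + 2];
        x[i + 3] = a * x[i + 3];
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        x[i] = a * x[i];
}
```

Both functions compute the same result; the unrolled version only makes the independent work visible to a compiler or a multi-issue pipeline.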
Instruction-Level Parallelism
Find independent instructions through
dependence analysis
Hardware approaches => Dynamically
scheduled superscalar

Most commonly used today: Intel Pentium,
AMD, Sun UltraSparc, and MIPS families
Software approaches => (1) Statically
scheduled superscalar, or (2) VLIW
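Register dependence analysis, whether done in hardware or by a compiler, comes down to comparing the registers each instruction reads and writes. The following is a minimal sketch under simplifying assumptions: the `Instr` record, the integer register numbering, and the function names are hypothetical, and memory dependences are ignored.

```c
#include <stdbool.h>

/* Hypothetical instruction record: one destination and up to two source
   registers, identified by small integers; NO_REG marks an unused slot. */
enum { NO_REG = -1 };

typedef struct {
    int dest;
    int src1;
    int src2;
} Instr;

/* RAW (true dependence): the later instruction reads what the earlier writes. */
bool raw_dep(Instr earlier, Instr later)
{
    return earlier.dest != NO_REG &&
           (later.src1 == earlier.dest || later.src2 == earlier.dest);
}

/* WAR (anti-dependence): the later instruction overwrites a register
   that the earlier one reads. */
bool war_dep(Instr earlier, Instr later)
{
    return later.dest != NO_REG &&
           (earlier.src1 == later.dest || earlier.src2 == later.dest);
}

/* WAW (output dependence): both instructions write the same register. */
bool waw_dep(Instr earlier, Instr later)
{
    return earlier.dest != NO_REG && earlier.dest == later.dest;
}

/* No register dependence in either sense: the pair is a candidate for
   reordering or issuing in the same cycle. */
bool independent(Instr earlier, Instr later)
{
    return !raw_dep(earlier, later) &&
           !war_dep(earlier, later) &&
           !waw_dep(earlier, later);
}
```

Applied to the loop shown earlier, the multiply has a RAW dependence on the load, while the multiply and the address increment share no registers and could issue together.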
Modern Superscalar Processors
Examples: Intel Pentium, IBM
Power/PowerPC, Sun UltraSparc, SGI
MIPS …



Multi-issue and Deep pipelining
Dynamic scheduling and speculative
execution
High bandwidth L1 caches and large L2/L3
caches
Modern Superscalar Processor
Challenges: Complexity!!!


How to approach it:
Understand how it brings high
performance
Will see some unusual designs
Will use Verilog and simulation to help
understanding
Keep the big picture in mind
Modern Superscalar Processor
Maintain register data flow


Register renaming
Instruction scheduling
Maintain control flow


Branch prediction
Speculative execution and recovery
Maintain memory data flow


Load and store queues
Memory dependence speculation
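As one concrete illustration of the branch prediction item above, here is a sketch of a two-bit saturating counter, a common textbook scheme. The slides do not say which predictors the course covers, so the choice of scheme and all names below are assumptions.

```c
#include <stdbool.h>

/* Two-bit saturating counter: states 0 and 1 predict not taken,
   states 2 and 3 predict taken. */
typedef struct {
    unsigned char counter;
} Predictor2Bit;

bool predict_taken(const Predictor2Bit *p)
{
    return p->counter >= 2;
}

/* Move one step toward the actual outcome, saturating at 0 and 3. */
void train(Predictor2Bit *p, bool taken)
{
    if (taken) {
        if (p->counter < 3)
            p->counter++;
    } else {
        if (p->counter > 0)
            p->counter--;
    }
}
```

Because the counter must be wrong twice in a row before the prediction flips, a loop branch that is taken many times and then falls through once is mispredicted only at the exit, not again on re-entry.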
Memory System Performance
Memory Stall CPI
= Misses per instruction × Miss penalty
= Fraction of memory instructions × Miss rate × Miss penalty
Assume 20% memory instructions, a 2% miss rate, and a
400-cycle miss penalty. What is the memory
stall CPI?
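The question above can be checked with a one-line C helper. The function and parameter names are hypothetical; the formula is the one given on the slide.

```c
/* Memory stall CPI = fraction of memory instructions x miss rate
   x miss penalty (in cycles). Names are illustrative assumptions. */
double memory_stall_cpi(double mem_inst_fraction,
                        double miss_rate,
                        double miss_penalty_cycles)
{
    return mem_inst_fraction * miss_rate * miss_penalty_cycles;
}
```

With the slide's numbers, `memory_stall_cpi(0.20, 0.02, 400.0)` works out to 0.20 × 0.02 = 0.004 misses per instruction, times 400 cycles, giving a memory stall CPI of 1.6.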
Memory System Performance
A typical memory hierarchy today:
Proc/Regs
L1-Cache
L2-Cache
L3-Cache (optional)
Memory
Disk, Tape, etc.
(Levels get faster toward the top and bigger toward the bottom.)
Here we focus on L1/L2/L3 caches, virtual
memory and main memory
Cache Design
Many applications are memory-bound

CPU speed increases fast; memory speed cannot
keep up
Cache hierarchy: exploits program locality




Basic principles of cache designs
Hardware cache optimizations
Application cache optimizations
Prefetching techniques
Also talk about virtual memory
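As a small illustration of program locality and of an application-level cache optimization, the sketch below sums the same matrix in two loop orders. The matrix size and function names are assumptions chosen for illustration; actual speed differences depend on the machine and are not claimed here.

```c
#include <stddef.h>

enum { ROWS = 64, COLS = 64 };   /* hypothetical sizes */

/* Row-major traversal: consecutive accesses are adjacent in memory,
   so each cache line fetched is fully used before moving on. */
double sum_row_major(double m[ROWS][COLS])
{
    double sum = 0.0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += m[i][j];
    return sum;
}

/* Column-major traversal: consecutive accesses are COLS doubles apart,
   so spatial locality is poor for a C (row-major) array. */
double sum_col_major(double m[ROWS][COLS])
{
    double sum = 0.0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += m[i][j];
    return sum;
}
```

Both functions visit every element exactly once; only the access order differs. Interchanging the loops so that the inner loop walks along a row is a typical application-level change aimed at better cache behavior.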
High Performance Storage Systems
What limits the performance of web
servers? Storage!
Storage technology trends
RAID: Redundant array of inexpensive
disks
Multiprocessor Systems
Must exploit thread-level parallelism for
further performance improvement
Shared-memory multiprocessors: Cooperating
programs see the same memory address space
How to build them?


Cache coherence
Memory consistency
Emerging Techniques
Low power design
Multicore and multithreaded
processors
Secure processor
Reliable design
Why Study Computer Architecture
As a hardware designer/researcher – know how
to design processors, caches, storage,
graphics, interconnects, and so on
As a system designer – know how to build a
computer system using the best components
available
As a software designer – know how to get the
best performance from the hardware
Class Web Site
www.ece.iastate.edu/~zzhang/cpre585/
Syllabus
Schedule
Homework assignments
Readings
WebCT: Grades, Assignments and Discussions