CIS 570
Advanced Computer Systems
University of Massachusetts Dartmouth
Instructor: Dr. Michael Geiger
Fall 2008
Lecture 1: Fundamentals of Computer Design
Outline
Syllabus & course policies
Changes in computer architecture
What is computer architecture?
Design principles
Syllabus notes
Course web site (still under construction): http://www.cis.umassd.edu/~mgeiger/cis570/f08.htm
TA: To be determined
My info:
Office: Science & Engineering, 221C
Office hours: M 1:30-2:30, T 2-3:30, Th 2:30-4
E-mail: [email protected]
Course text: Hennessy & Patterson’s
Computer Architecture: A Quantitative
Approach, 4th ed.
Course objectives
To understand the operation of modern
microprocessors at an architectural level.
To understand the operation of memory and
I/O subsystems and their relation to overall
system performance.
To understand the benefits of
multiprocessor systems and the difficulties
in designing and utilizing them.
To gain familiarity with simulation
techniques used in research in computer
architecture.
Course policies
Prereqs: CIS 273 & 370 or equivalent
Academic honesty
All work individual unless explicitly stated
otherwise (e.g., final projects)
May discuss concepts (e.g., how does Tomasulo’s
algorithm work) but not solutions
Plagiarism is also considered cheating
Any assignment or portion of an assignment
violating this policy will receive a grade of 0
More severe or repeat infractions may incur
additional penalties, up to and including a failing
grade in the class
Grading policies
Assignment breakdown:
Problem sets: 20%
Simulation exercises: 10%
Research project (including report &
presentation): 20%
Midterm exam: 15%
Final exam: 25%
Quizzes & participation: 10%
Late assignments: 10% penalty per day
Topic schedule
Computer design fundamentals
Basic ISA review
Architectural simulation
Uniprocessor systems
Advanced pipelining—exploiting ILP & TLP
Memory hierarchy design
Storage & I/O
Multiprocessor systems
Memory in multiprocessors
Synchronization
Interconnection networks
Changes in computer architecture
Old Conventional Wisdom: Power is free, transistors expensive
New Conventional Wisdom: the "power wall" (power expensive, transistors free)
(Can put more on a chip than you can afford to turn on)
Old CW: Sufficiently increase instruction-level parallelism via compilers and innovation (out-of-order execution, speculation, VLIW, ...)
New CW: the "ILP wall" (law of diminishing returns on more HW for ILP)
Old CW: Multiplies are slow, memory access is fast
New CW: the "memory wall" (memory slow, multiplies fast: ~200 clock cycles to DRAM memory, 4 clocks for a multiply)
Old CW: Uniprocessor performance 2X / 1.5 yrs
New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
Uniprocessor performance now 2X / 5(?) yrs
Sea change in chip design: multiple "cores" (2X processors per chip / ~2 years)
More, simpler processors are more power efficient
Uniprocessor performance
[Figure: log-scale plot of performance relative to the VAX-11/780, 1978-2006. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006]
VAX: 25%/year, 1978 to 1986
RISC + x86: 52%/year, 1986 to 2002
RISC + x86: ??%/year, 2002 to present
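As a rough sanity check, the compounding below is my own back-of-the-envelope illustration: the 25%/year and 52%/year rates come from the figure, but the absolute multipliers it prints are just arithmetic, not numbers from the text.

```c
#include <stdio.h>
#include <math.h>

/* Compound the per-year improvement rates quoted above.
 * Baseline: VAX-11/780 = 1.0 in 1978. */
int main(void) {
    double perf = 1.0;
    perf *= pow(1.25, 1986 - 1978);   /* 25%/year, 1978-1986 */
    printf("1986: ~%.0fx\n", perf);   /* ~6x */
    perf *= pow(1.52, 2002 - 1986);   /* 52%/year, 1986-2002 */
    printf("2002: ~%.0fx\n", perf);   /* ~4800x */
    return 0;
}
```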
Chip design changes
Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip
RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip
A 125 mm² chip in 0.065 micron CMOS could hold 2312 copies of RISC II + FPU + Icache + Dcache
From ILP to TLP & DLP
(Almost) all microprocessor companies moving to multiprocessor systems
Embedded domain is the lone holdout
Single processors gain performance by exploiting instruction-level parallelism (ILP)
Multiprocessors exploit either:
Thread-level parallelism (TLP), or
Data-level parallelism (DLP)
What's the problem?
From ILP to TLP & DLP (cont.)
We've got tons of infrastructure for single-processor systems
Algorithms, languages, compilers, operating systems, architectures, etc.
These don't exactly scale well
Multiprocessor design: not as simple as creating a chip with 1000 CPUs
Task scheduling/division
Communication
Memory issues
Even moving a program from 1 to 2 CPUs is extremely difficult
Not strictly computer architecture, but it can't happen without architects
CIS 570 Approach
How are we going to address this change?
Start by going through single-processor systems
Study ILP and ways to exploit it
Delve into memory hierarchies for single processors
Talk about storage and I/O systems
We may touch on embedded systems at this point
Then, we'll look at multiprocessor systems
Discuss TLP and DLP
Talk about how multiprocessors affect memory design
Cover interconnection networks
What is computer architecture?
[Diagram: software on top, the instruction set in the middle, hardware below]
Classical view: instruction set architecture (ISA)
Boundary between hardware and software
Provides abstraction at both high level and low level
ISA vs. Computer Architecture
Modern issues aren't in instruction set design
"Architecture is dead" ... or is it?
Computer architecture now encompasses a larger range of technical issues
Modern view: ISA + design of computer organization & hardware to meet goals and functional requirements
Organization: high-level view of system
Hardware: specifics of a given system
Function of complete system now the issue
The roles of computer architecture
… as David Patterson sees it, anyway
Other fields borrow ideas from architecture
Anticipate and exploit advances in technology
Develop well-defined, thoroughly tested interfaces
Quantitative comparisons to determine when goals are reached
Quantitative principles of design
Goals and requirements
What goals might we want to meet?
Performance
Power
Price
Dependability
We'll talk about how to quantify these as needed throughout the semester
Primarily focus on performance (both uniprocessor & multiprocessor systems) and dependability (mostly storage systems)
Design principles
1. Take advantage of parallelism
2. Principle of locality
3. Focus on the common case
4. Amdahl's Law
5. Generalized processor performance
1. Take advantage of parallelism
Increase throughput of server computers via multiple processors or multiple disks
Detailed HW design:
Carry-lookahead adders use parallelism to speed up computing sums from linear to logarithmic in the number of bits per operand
Multiple memory banks searched in parallel in set-associative caches
Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence (see the sketch after this slide)
Not every instruction depends on its immediate predecessor, so executing instructions completely or partially in parallel is possible
Classic 5-stage pipeline:
1) Instruction Fetch (Ifetch)
2) Register Read (Reg)
3) Execute (ALU)
4) Data Memory Access (Dmem)
5) Register Write (Reg)
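A minimal sketch of ideal pipeline timing, my own illustration rather than anything from the slides: with k stages and no stalls, n instructions finish in k + n - 1 cycles instead of k * n, so the speedup approaches k as n grows.

```c
#include <stdio.h>

int main(void) {
    const int  k = 5;          /* classic 5-stage pipeline */
    const long n = 1000000;    /* instructions in the sequence */

    long unpipelined = (long)k * n;   /* one instruction at a time: k cycles each */
    long pipelined   = k + n - 1;     /* fill the pipe, then one finishes per cycle */

    printf("ideal speedup = %.3fx\n", (double)unpipelined / pipelined);
    return 0;   /* prints ~5x, the pipeline depth */
}
```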
2. Principle of locality
The Principle of Locality:
Programs access a relatively small portion of the address space at any instant of time
Two different types of locality:
Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access; see the example after this slide)
For the last 30 years, HW has relied on locality for memory performance
Guiding principle behind caches
To some degree, guides instruction execution, too (90/10 rule)
[Diagram: processor (P), cache ($), memory (MEM)]
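To make spatial locality concrete, here is a small C illustration of my own (not from the slides): traversing a 2-D array in row-major order touches consecutive addresses and plays to the cache, while column-major order strides across whole rows and does not.

```c
#include <stddef.h>

#define N 1024
static double a[N][N];   /* C stores this array row by row */

/* Good spatial locality: inner loop walks consecutive addresses. */
double sum_row_major(void) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Poor spatial locality: inner loop strides N*sizeof(double) bytes,
 * touching a new cache line (and eventually a new page) each step. */
double sum_col_major(void) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}
```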
3. Focus on the common case
In making a design trade-off, favor the frequent case over the
infrequent case
E.g., Instruction fetch and decode unit used more frequently than
multiplier, so optimize it 1st
E.g., If database server has 50 disks / processor, storage
dependability dominates system dependability, so optimize it 1st
Frequent case is often simpler and can be done faster than the
infrequent case
E.g., overflow is rare when adding 2 numbers, so improve
performance by optimizing more common case of no overflow
May slow down overflow, but overall performance improved by
optimizing for the normal case
Determining the frequent case, and how much performance improves by making that case faster, leads to Amdahl's Law
4. Amdahl’s Law
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

Best you could ever hope to do:

$$\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$
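The formulas above translate directly into code. The 40%/10x numbers below are hypothetical values chosen purely for illustration:

```c
#include <stdio.h>

/* Amdahl's Law: overall speedup when a fraction f of execution
 * time is sped up by a factor s. */
double speedup_overall(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void) {
    double f = 0.40;   /* hypothetical: 40% of execution time is enhanced */
    double s = 10.0;   /* hypothetical: that portion runs 10x faster */
    printf("overall speedup: %.2fx\n", speedup_overall(f, s)); /* 1.56x */
    printf("upper bound:     %.2fx\n", 1.0 / (1.0 - f));       /* 1.67x */
    return 0;
}
```

Note how close 1.56x is to the 1.67x bound: once the enhanced fraction is fast, the unenhanced 60% dominates no matter how large s gets.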
5. Processor performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

What affects each term:

              Inst Count    CPI    Clock Rate
Program           X          X
Compiler          X         (X)
Inst. Set         X          X
Organization                 X         X
Technology                             X
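Plugging hypothetical numbers into the equation (the instruction count, CPI, and clock rate below are made up for illustration, not taken from the slides):

```c
#include <stdio.h>

int main(void) {
    double inst_count = 1e9;   /* instructions per program (hypothetical) */
    double cpi        = 1.5;   /* average cycles per instruction (hypothetical) */
    double clock_rate = 2e9;   /* cycles/second, so seconds/cycle = 1/clock_rate */

    double cpu_time = inst_count * cpi / clock_rate;
    printf("CPU time = %.3f seconds\n", cpu_time);   /* 0.750 s */
    return 0;
}
```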
Next week
Review of ISAs (Appendix B)
Review of pipelining basics (Appendix A)
Discussion of architectural simulation
Acknowledgements
This lecture borrows heavily from David
Patterson’s lecture slides for EECS 252:
Graduate Computer Architecture, at the
University of California, Berkeley
Many figures and other information are taken from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th ed., unless otherwise noted