Computer Architecture

Download Report

Transcript Computer Architecture

Computer Architecture
Chapter 1
Fundamentals
Prof. Jerry Breecher
CSCI 240
Fall 2001
Chapter 1 - Fundamentals
1
Introduction
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
Chapter 1 - Fundamentals
2
Art and
Architecture
What’s the difference
between
Art
and
Architecture?
Lyonel Feininger,
Marktkirche in Halle
Chapter 1 - Fundamentals
3
Art and Architecture
Notre Dame
de Paris
What’s the difference between Art and Architecture?
Chapter 1 - Fundamentals
4
What’s Computer Architecture?
The attributes of a [computing] system as seen by the
programmer, i.e., the conceptual structure and functional
behavior, as distinct from the organization of the data
flows and controls the logic design, and the physical
implementation.
Amdahl, Blaaw, and Brooks, 1964
SOFTWARE
Chapter 1 - Fundamentals
5
What’s Computer Architecture?
• 1950s to 1960s: Computer Architecture Course
Computer Arithmetic.
• 1970s to mid 1980s: Computer Architecture Course
Instruction Set Design, especially ISA appropriate for
compilers. (What we’ll do in Chapter 2)
• 1990s to 2000s: Computer Architecture Course
Design of CPU, memory system, I/O system,
Multiprocessors. (All evolving at a tremendous rate!)
Chapter 1 - Fundamentals
6
The Task of a
Computer Designer
1.1 Introduction
1.2 The Task of a Computer
Designer
1.3 Technology and Computer
Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting
Performance
1.6 Quantitative Principles of
Computer Design
1.7 Putting It All Together: The
Concept of Memory
Hierarchy
Implementation
Complexity
Evaluate Existing
Systems for
Bottlenecks
Benchmarks
Technology
Trends
Implement Next
Generation System
Simulate New
Designs and
Organizations
Workloads
Chapter 1 - Fundamentals
7
Technology and
Computer Usage Trends
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage
Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer
Design
1.7 Putting It All Together: The Concept of
Memory Hierarchy
When building a Cathedral numerous
very practical considerations need to
be taken into account:
• available materials
• worker skills
• willingness of the client to pay the
price.
Similarly, Computer Architecture is about
working within constraints:
• What will the market buy?
• Cost/Performance
• Tradeoffs in materials and processes
Chapter 1 - Fundamentals
8
Trends
Gordon Moore (Founder of Intel) observed in 1965 that the number of
transistors that could be crammed on a chip doubles every year.
This has CONTINUED to be true since then.
Transistors Per Chip
1.E+08
Pentium 3
Pentium Pro
1.E+07
Pentium II
Power PC G3
Pentium
486
1.E+06
Power PC 601
386
80286
1.E+05
8086
1.E+04
4004
1.E+03
1970
1975
1980
1985
1990
Chapter 1 - Fundamentals
1995
2000
2005
9
Trends
Processor performance, as measured by the SPEC benchmark has
also risen dramatically.
5000
Alpha 6/833
4000
3000
2000
Chapter 1 - Fundamentals
2000
99
98
97
96
95
94
93
92
91
IBM
RS/
6000
90
89
88
0
Sun MIPS
-4/
M
260 2000
87
1000
DEC Alpha 5/500
DEC
AXP/
DEC Alpha 21264/600
500 DEC Alpha 4/266
10
Trends
Memory Capacity (and Cost) have changed dramatically in the last 20
years.
size
1000000000
year
1980
1983
1986
1989
1992
1996
2000
100000000
Bits
10000000
1000000
100000
10000
1000
1970
1975
1980
1985
1990
1995
size(Mb)
cyc time
0.0625 250 ns
0.25
220 ns
1
190 ns
4
165 ns
16
145 ns
64
120 ns
256
100 ns
2000
Year
Chapter 1 - Fundamentals
11
Trends
Based on SPEED, the CPU has increased dramatically, but memory
and disk have increased only a little. This has led to dramatic
changed in architecture, Operating Systems, and Programming
practices.
Capacity
Speed (latency)
Logic
2x in 3 years
2x in 3 years
DRAM
4x in 3 years
2x in 10 years
Disk
4x in 3 years
2x in 10 years
Chapter 1 - Fundamentals
12
Measuring And
Reporting Performance
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage
Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer
Design
1.7 Putting It All Together: The Concept of
Memory Hierarchy
This section talks about:
1. Metrics – how do we describe
in a numerical way the
performance of a computer?
2. What tools do we use to find
those metrics?
Chapter 1 - Fundamentals
13
Metrics
Plane
DC to Paris
Speed
Passengers
Throughput
(pmph)
Boeing 747
6.5 hours
610 mph
470
286,700
BAD/Sud
Concodre
3 hours
1350 mph
132
178,200
• Time to run the task (ExTime)
– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns …
(Performance)
– Throughput, bandwidth
Chapter 1 - Fundamentals
14
Metrics - Comparisons
"X is n times faster than Y" means
ExTime(Y)
--------ExTime(X)
=
Performance(X)
--------------Performance(Y)
Speed of Concorde vs. Boeing 747
Throughput of Boeing 747 vs. Concorde
Chapter 1 - Fundamentals
15
Metrics - Comparisons
Pat has developed a new product, "rabbit" about which she wishes to determine
performance. There is special interest in comparing the new product, rabbit to the
old product, turtle, since the product was rewritten for performance reasons. (Pat
had used Performance Engineering techniques and thus knew that rabbit was
"about twice as fast" as turtle.) The measurements showed:
Performance Comparisons
Product
Turtle
Rabbit
Transactions / second
30
60
Seconds/ transaction
0.0333
0.0166
Seconds to process transaction
3
1
Which of the following statements reflect the performance comparison of rabbit and
turtle?
o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
Chapter 1 - Fundamentals
16
Metrics - Throughput
Application
Answers per month
Operations per second
Programming
Language
Compiler
ISA
(millions) of Instructions per second: MIPS
(millions) of (FP) operations per second: MFLOP/s
Datapath
Control
Function Units
Transistors Wires Pins
Megabytes per second
Cycles per second (clock rate)
Chapter 1 - Fundamentals
17
Methods For Predicting
Performance
• Benchmarks, Traces, Mixes
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
– ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental “Laws”/Principles
Chapter 1 - Fundamentals
18
Benchmarks
SPEC: System Performance Evaluation
Cooperative
•
•
•
First Round 1989
– 10 programs yielding a single number (“SPECmarks”)
Second Round 1992
– SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
• Compiler Flags unlimited. March 93 of DEC 4000 Model 610:
spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=
memcpy(b,a,c)”
wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
Third Round 1995
– new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating
point)
– “benchmarks useful for 3 years”
– Single flag setting for all programs: SPECint_base95, SPECfp_base95
Chapter 1 - Fundamentals
19
Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):
Program
164.gzip
175.vpr
176.gcc
181.mcf
186.crafty
197.parser
252.eon
253.perlbmk
254.gap
255.vortex
256.bzip2
300.twolf
Language
C
C
C
C
C
C
C++
C
C
C
C
C
What Is It
Compression
FPGA Circuit Placement and Routing
C Programming Language Compiler
Combinatorial Optimization
Game Playing: Chess
Word Processing
Computer Visualization
PERL Programming Language
Group Theory, Interpreter
Object-oriented Database
Compression
Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
Chapter 1 - Fundamentals
20
Benchmarks
CFP2000 (Floating Point Component of SPEC
CPU2000):
Program
168.wupwise
171.swim
172.mgrid
173.applu
177.mesa
178.galgel
179.art
183.equake
187.facerec
188.ammp
189.lucas
191.fma3d
200.sixtrack
301.apsi
Language
Fortran 77
Fortran 77
Fortran 77
Fortran 77
C
Fortran 90
C
C
Fortran 90
C
Fortran 90
Fortran 90
Fortran 77
Fortran 77
What Is It
Physics / Quantum Chromodynamics
Shallow Water Modeling
Multi-grid Solver: 3D Potential Field
Parabolic / Elliptic Differential Equations
3-D Graphics Library
Computational Fluid Dynamics
Image Recognition / Neural Networks
Seismic Wave Propagation Simulation
Image Processing: Face Recognition
Computational Chemistry
Number Theory / Primality Testing
Finite-element Crash Simulation
High Energy Physics Accelerator Design
Meteorology: Pollutant Distribution
http://www.spec.org/osg/cpu2000/CFP2000/
Chapter 1 - Fundamentals
21
Sample Results For
SpecINT2000
Benchmarks
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
Benchmarks
Base
Base
Base
Ref Time Run Time Ratio
164.gzip
1400
175.vpr
1400
176.gcc
1100
181.mcf
1800
186.crafty
1000
197.parser
1800
252.eon
1300
253.perlbmk
1800
254.gap
1100
255.vortex
1900
256.bzip2
1500
300.twolf
3000
SPECint_base2000
SPECint2000
277
419
275
621
191
500
267
302
249
268
389
784
505*
334*
399*
290*
522*
360*
486*
596*
442*
710*
386*
382*
438
Peak
Peak
Peak
Ref Time Run Time Ratio
1400
1400
1100
1800
1000
1800
1300
1800
1100
1900
1500
3000
270
417
272
619
191
499
267
302
248
264
375
776
518*
336*
405*
291*
523*
361*
486*
596*
443*
719*
400*
387*
Intel OR840(1 GHz
Pentium III processor)
442
Chapter 1 - Fundamentals
22
Benchmarks
Performance Evaluation
• “For better or worse, benchmarks shape a field”
• Good products created when have:
– Good benchmarks
– Good ways to summarize performance
• Given sales is a function in part of performance relative to
competition, investment in improving product as reported by
performance summary
• If benchmarks/summary inadequate, then choose between
improving product for real programs vs. improving product to get
more sales;
Sales almost always wins!
• Execution time is the measure of computer performance!
Chapter 1 - Fundamentals
23
Benchmarks
How to Summarize Performance
Management would like to have one number.
Technical people want more:
1.
They want to have evidence of reproducibility – there should be enough
information so that you or someone else can repeat the experiment.
2.
There should be consistency when doing the measurements multiple
times.
How would you report these results?
Computer A
Computer B
Computer C
Program P1 (secs)
1
10
20
Program P2 (secs)
1000
100
20
Total Time (secs)
1001
110
40
Chapter 1 - Fundamentals
24
Quantitative Principles
of Computer Design
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage
Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer
Design
1.7 Putting It All Together: The Concept of
Memory Hierarchy
Make the common case fast.
Amdahl’s Law:
Relates total speedup of a
system to the speedup of some
portion of that system.
Chapter 1 - Fundamentals
25
Quantitative
Design
Amdahl's Law
Speedup due to enhancement E:
Speedup( E ) 
Execution _ Time _ Without _ Enhancement
Performance _ With _ Enhancement

Execution _ Time _ With _ Enhancement
Performance _ Without _ Enhancement
This fraction enhanced
Suppose that enhancement E accelerates a fraction F
of the task by a factor S, and the remainder of the
task is unaffected
Chapter 1 - Fundamentals
26
Quantitative
Design
Amdahl's Law
ExTimenew = ExTimeold x (1 - Fractionenhanced) + Fractionenhanced
Speedupenhanced
Speedupoverall =
ExTimeold
ExTimenew
1
=
(1 - Fractionenhanced) + Fractionenhanced
Speedupenhanced
This fraction enhanced
ExTimeold
ExTimenew
Chapter 1 - Fundamentals
27
Quantitative
Design
Amdahl's Law
• Floating point instructions improved to run 2X; but only
10% of actual instructions are FP
ExTimenew = ExTimeold x (0.9 + .1/2) = 0.95 x ExTimeold
Speedupoverall =
1
0.95
Chapter 1 - Fundamentals
=
1.053
28
Quantitative
Design
Cycles Per
Instruction
CPI = (CPU Time * Clock Rate) / Instruction Count
= Cycles / Instruction Count
n
CPU _ Time  Cycle _ Time *  CPIi * I i
i 1
“Instruction Frequency”
n
CPI   CPIi * Fi
i 1
where
Number of
instructions of
type I.
Fi 
Ii
Instruction _ Count
Invest Resources where time is Spent!
Chapter 1 - Fundamentals
29
Quantitative
Design
Cycles Per
Instruction
Suppose we have a machine where we can count the frequency with which
instructions are executed. We also know how many cycles it takes for
each instruction type.
Base Machine (Reg / Reg)
Op
Freq Cycles CPI(i)
ALU
50%
1
.5
Load
20%
2
.4
Store
10%
2
.2
Branch
20%
2
.4
Total CPI
1.5
(% Time)
(33%)
(27%)
(13%)
(27%)
How do we get CPI(I)?
How do we get %time?
Chapter 1 - Fundamentals
30
Quantitative
Design
Locality of
Reference
Programs access a relatively small portion of the address space at
any instant of time.
There are two different types of locality:
Temporal Locality (locality in time): If an item is referenced, it will
tend to be referenced again soon (loops, reuse, etc.)
Spatial Locality (locality in space/location): If an item is referenced,
items whose addresses are close by tend to be referenced soon
(straight line code, array access, etc.)
Chapter 1 - Fundamentals
31
The Concept of
Memory Hierarchy
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage
Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer
Design
1.7 Putting It All Together: The Concept of
Memory Hierarchy
Fast memory is expensive.
Slow memory is cheap.
The goal is to minimize the
price/performance for a
particular price point.
Chapter 1 - Fundamentals
32
Memory Hierarchy
Registers
Level 1
cache
Level 2
Cache
Memory
Disk
Typical
Size
4 - 64
<16K bytes
<2 Mbytes
<16
Gigabytes
>5
Gigabytes
Access
Time
1 nsec
3 nsec
15 nsec
150 nsec
5,000,000
nsec
Bandwidth
(in MB/sec)
10,000 –
50,000
2000 - 5000
500 - 1000
500 - 1000
100
Managed
By
Compiler
Hardware
Hardware
OS
OS/User
Chapter 1 - Fundamentals
33
Memory Hierarchy
• Hit: data appears in some block in the upper level (example:
Block X)
– Hit Rate: the fraction of memory access found in the upper level
– Hit Time: Time to access the upper level which consists of
RAM access time + Time to determine hit/miss
• Miss: data needs to be retrieve from a block in the lower level
(Block Y)
– Miss Rate = 1 - (Hit Rate)
– Miss Penalty: Time to replace a block in the upper level +
Time to deliver the block the processor
• Hit Time << Miss Penalty (500 instructions on 21264!)
Chapter 1 - Fundamentals
34
Memory Hierarchy
Registers
Level 1
cache
Level 2
Cache
Memory
Disk
What is the cost of executing a program if:
• Stores are free (there’s a write pipe)
• Loads are 20% of all instructions
• 80% of loads hit (are found) in the Level 1 cache
• 97 of loads hit in the Level 2 cache.
Chapter 1 - Fundamentals
35
Wrap Up
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy
Chapter 1 - Fundamentals
36