Chap.2 Performance
Download
Report
Transcript Chap.2 Performance
Gordon Moore
Gordon Moore,
cofounder of Intel
摩爾定律
1965:
2 x trans. per chip/year
After 1970:
2 x trans. per chip/1.5year
Growth in CPU transistor count
Consequences of Moore’s law
Cost of a chip remains unchanged during the
growth of in density => cost down
Electrical path length is shortened =>
increase operating speed
Computer becomes smaller
Reduction in power
More circuitry on each chip => fewer interchip connections => more reliable
Chap.4 The Role of Performance
Jen-Chang Liu, Spring 2005
Hardware performance is often key to the
effectiveness of an entire system of
hardware and software.
What do we mean by saying one computer
has better performance than another?
Example: performance of
airplanes
Performance of a hardware
system
What do we mean by better performance?
Fast speed ?
Response time (execution time): the time
between the start and completion of a task 完成
工作所需的時間
Throughput : the total amount of work done in a
given time 單位時間完成的工作
Ex. multi-user system
Performance measure
Quantitative relation of performance and execution time
on machine x:
1
Performance X =
Execution time x
* Relative performance: Machine A is n times faster than B
Performance
Performance
A
B
=n=
Execution time
Execution time
B
A
Ex. machine A runs a program in 10 sec.,
machine B runs a program in 15 sec.,
Performance
Performance
A
B
Execution time
=
Execution time
B
A
15
= 1.5
=
10
Problem with previous definition
of performance
The definition of execution time
How about multiple tasks run concurrently?
Use which programs to evaluate the
performance of a computer ?
Execution Time ?
執行時間的定義
The total time to complete a task – response
time, elapsed time 使用者觀點
In a timeshared system, such as Unix, a
processor work on several programs
Including disk access, memory access, I/O, OS
overhead…
Program A swap Prog. B
I/O Program A
Response time for A
CPU time
CPU execution time
Does not include waiting for I/O, running other
programs
不含 I/O, 執行其他程式時間
CPU exec. time = user CPU time + system CPU time
user CPU time
CPU time spent in the program
system CPU time
CPU time spent in the OS about our program
Example : CPU time
Unix command : time
90.7u 12.9s 2:39 65%
user CPU
system CPU
elapsed time
90.7+12.9
159
= 0.65
We will discuss CPU performance, i.e. user CPU time in the
following discussion
Unit of time
Seconds
Clock cycle
Ex. Clock cycle time = 2ns
Clock rate =
1
2x10-6
= 500 MHz
CPU time
CPU clock cycles
Clock cycle time
for a program = for a program x
=
Instructions
Average clock cycle Clock cycle
x
for a program x
per instruction
time
(CPI)
Example 1
Machine A,B has the same ISA, for the same
program
Machine A: clock cycle = 1ns, CPI = 2
Machine B: clock cycle = 2ns, CPI = 1.2
CPU timeA = Inst. count x CPI x clock cycle time
=
I
x 2 x 1
= 2I
CPU timeB = I x 1.2 x 2 = 2.4 I
Performance
Performance
A
B
=
Execution time
Execution time
B
A
2.4I
= 1.2
=
2I
A is 1.2 times faster than B
Example 2
Instruction class
CPI
A
1
B
2
C
3
Compiler generate 2 different code sequences
A
B
C
Total inst.
faster?
5
Code 1:
2
1
2
6
Code 2:
4
1
1
CPU clock cycle1 = 2x1 + 1x2 + 2x3 = 10 cycles
CPU clock cycle2 = 4x1 + 1x2 + 1x3 = 9 cycles
faster
Short conclusion
Computer Performance
software
hardware
Response time
I/O, other prog.s
Instruction
count
CPU time
CPI
Clock cycle
length
How to optimize them in a hardware design?
Problem with previous definition
of performance
The definition of execution time
How about multiple tasks run concurrently?
Use which programs to evaluate the
performance of a computer ?
Choose programs to evaluate
performance
Benchmarks: programs chosen to measure
performance
SPEC (System Performance Evaluation
Cooperative) suit of benchmarks
Started in 1989
http://open.specbench.org/
SPEC95 in textbook is retired…
SPECx contains a set of benchmark programs
SPEC – money…
SPEC95 benchmarks
Integer benchmarks
written in C
floating-pt benchmarks
written in Fortran 77
Summarize performance
Which is faster?
Computer A Computer B
Program 1(sec)
1
10
Program 2(sec)
1000
100
Total time(sec)
1001
110
Performance
Performance
B
A
=
Execution time
A
Execution time
B
=
1001
110
= 9.1
* Assume the programs occur in equal probability.
SPEC ratio
The execution time of a benchmark program
is normalized (compared to a baseline
system)
SPEC ratio =
Exec. Time on Sun SPARCstation 10/40
Exec. Time on the measured machine
SPECint95, SPECfp95
SPECint95 = geometric mean of SPEC ratios
Example: SPECint95 for
Pentium and Pentium Pro
10
1 Performance
improvement
9
8
1
SPECint
7
6
2 Clock rate x2
SPECint x 1.7 ?
5
4
3
2
2
1
0
50
100
150
Clock rate (MHz)
200
250
Pentium
Pentium Pro
Amdahl’s law in computing
CPU time
CPU clock cycles
for a program = for a program x
Clock rate
2 => CPU time 2
1
Clock rate
as in previous example
部分的改進
* Improvement of one aspect of a machine does not increase
performance by the same ratio
* Ex. The bottleneck in the memory system does not improve
Exec. time affected by improve.
Exec. time
Exec. time
=
+
Amount of improvement
after improve.
unaffected
Example: Amdahl’s law
A program takes 100s to run
20% multiplication, 50% memory op., 30%
others
What’s the speed up for
Multiply speed 4
Memory access 2
100
=1.18
20/4 + 50 + 30
100
=1.33
20 + 50/2 + 30
MIPS as a measurement (not
good…)
MIPS = Million Instructions Per Second
MIPS=
Instruction count
Execution time x 106
High MIPS => faster ?
Pitfalls:
MIPS cannot be used to compare computers with
different instruction sets => inst. count differs
MIPS varies between programs on the same
computer => no single MIPS for a machine
Example: MIPS ?
Example: 500 MHz machine
2 compilers for the same source program:
Inst. Count(x109) for each inst. class
B
C
A
Code 1
5
1
1
Code 2
10
1
1
Instruction class
A
B
C
CPI
1
2
3
Example: MIPS?
(5x1+1x2+1x3)x109 cycles
Exec. Time1 =
= 20 sec.
6
500x10 cycles/sec
(10x1+1x2+1x3)x109 cycles
Exec. Time2 =
= 30 sec.
500x106 cycles/sec
Exec. time1
<
Inst. count
MIPS1 = Exec timex106
Exec. time2
=
(5+1+1)x109
=350
6
20x10
9
(10+1+1)x10
MIPS2 =
=400
6
30x10
MIPS1
<
MIPS2