Chap.2 Performance

Download Report

Transcript Chap.2 Performance

Gordon Moore
Gordon Moore,
cofounder of Intel
摩爾定律
1965:
2 x trans. per chip/year
After 1970:
2 x trans. per chip/1.5year
Growth in CPU transistor count
Consequences of Moore’s law

Cost of a chip remains unchanged during the
growth of in density => cost down
Electrical path length is shortened =>
increase operating speed
Computer becomes smaller

Reduction in power



More circuitry on each chip => fewer interchip connections => more reliable
Chap.4 The Role of Performance
Jen-Chang Liu, Spring 2005


Hardware performance is often key to the
effectiveness of an entire system of
hardware and software.
What do we mean by saying one computer
has better performance than another?
Example: performance of
airplanes
Performance of a hardware
system

What do we mean by better performance?



Fast speed ?
Response time (execution time): the time
between the start and completion of a task 完成
工作所需的時間
Throughput : the total amount of work done in a
given time 單位時間完成的工作

Ex. multi-user system
Performance measure
Quantitative relation of performance and execution time
on machine x:
1
Performance X =
Execution time x
* Relative performance: Machine A is n times faster than B
Performance
Performance
A
B
=n=
Execution time
Execution time
B
A
Ex. machine A runs a program in 10 sec.,
machine B runs a program in 15 sec.,
Performance
Performance
A
B
Execution time
=
Execution time
B
A
15
= 1.5
=
10
Problem with previous definition
of performance

The definition of execution time


How about multiple tasks run concurrently?
Use which programs to evaluate the
performance of a computer ?
Execution Time ?

執行時間的定義
The total time to complete a task – response
time, elapsed time 使用者觀點


In a timeshared system, such as Unix, a
processor work on several programs
Including disk access, memory access, I/O, OS
overhead…
Program A swap Prog. B
I/O Program A
Response time for A
CPU time

CPU execution time

Does not include waiting for I/O, running other
programs
不含 I/O, 執行其他程式時間
CPU exec. time = user CPU time + system CPU time

user CPU time


CPU time spent in the program
system CPU time

CPU time spent in the OS about our program
Example : CPU time

Unix command : time
90.7u 12.9s 2:39 65%
user CPU
system CPU
elapsed time
90.7+12.9
159
= 0.65
We will discuss CPU performance, i.e. user CPU time in the
following discussion
Unit of time


Seconds
Clock cycle

Ex. Clock cycle time = 2ns
Clock rate =
1
2x10-6
= 500 MHz
CPU time
CPU clock cycles
Clock cycle time
for a program = for a program x
=
Instructions
Average clock cycle Clock cycle
x
for a program x
per instruction
time
(CPI)
Example 1

Machine A,B has the same ISA, for the same
program


Machine A: clock cycle = 1ns, CPI = 2
Machine B: clock cycle = 2ns, CPI = 1.2
CPU timeA = Inst. count x CPI x clock cycle time
=
I
x 2 x 1
= 2I
CPU timeB = I x 1.2 x 2 = 2.4 I
Performance
Performance
A
B
=
Execution time
Execution time
B
A
2.4I
= 1.2
=
2I
A is 1.2 times faster than B
Example 2
Instruction class
CPI
A
1
B
2
C
3
Compiler generate 2 different code sequences
A
B
C
Total inst.
faster?
5
Code 1:
2
1
2
6
Code 2:
4
1
1
CPU clock cycle1 = 2x1 + 1x2 + 2x3 = 10 cycles
CPU clock cycle2 = 4x1 + 1x2 + 1x3 = 9 cycles
faster
Short conclusion
Computer Performance
software
hardware
Response time
I/O, other prog.s
Instruction
count
CPU time
CPI
Clock cycle
length
How to optimize them in a hardware design?
Problem with previous definition
of performance

The definition of execution time


How about multiple tasks run concurrently?
Use which programs to evaluate the
performance of a computer ?
Choose programs to evaluate
performance


Benchmarks: programs chosen to measure
performance
SPEC (System Performance Evaluation
Cooperative) suit of benchmarks




Started in 1989
http://open.specbench.org/
SPEC95 in textbook is retired…
SPECx contains a set of benchmark programs
SPEC – money…
SPEC95 benchmarks
Integer benchmarks
written in C
floating-pt benchmarks
written in Fortran 77
Summarize performance

Which is faster?
Computer A Computer B
Program 1(sec)
1
10
Program 2(sec)
1000
100
Total time(sec)
1001
110
Performance
Performance
B
A
=
Execution time
A
Execution time
B
=
1001
110
= 9.1
* Assume the programs occur in equal probability.
SPEC ratio

The execution time of a benchmark program
is normalized (compared to a baseline
system)
SPEC ratio =

Exec. Time on Sun SPARCstation 10/40
Exec. Time on the measured machine
SPECint95, SPECfp95
SPECint95 = geometric mean of SPEC ratios
Example: SPECint95 for
Pentium and Pentium Pro
10
1 Performance
improvement
9
8
1
SPECint
7
6
2 Clock rate x2
SPECint x 1.7 ?
5
4
3
2
2
1
0
50
100
150
Clock rate (MHz)
200
250
Pentium
Pentium Pro
Amdahl’s law in computing
CPU time
CPU clock cycles
for a program = for a program x
Clock rate
2 => CPU time 2
1
Clock rate
as in previous example
部分的改進
* Improvement of one aspect of a machine does not increase
performance by the same ratio
* Ex. The bottleneck in the memory system does not improve
Exec. time affected by improve.
Exec. time
Exec. time
=
+
Amount of improvement
after improve.
unaffected
Example: Amdahl’s law



A program takes 100s to run
20% multiplication, 50% memory op., 30%
others
What’s the speed up for
Multiply speed 4
Memory access 2
100
=1.18
20/4 + 50 + 30
100
=1.33
20 + 50/2 + 30
MIPS as a measurement (not
good…)

MIPS = Million Instructions Per Second
MIPS=
Instruction count
Execution time x 106
High MIPS => faster ?

Pitfalls:


MIPS cannot be used to compare computers with
different instruction sets => inst. count differs
MIPS varies between programs on the same
computer => no single MIPS for a machine
Example: MIPS ?

Example: 500 MHz machine
2 compilers for the same source program:
Inst. Count(x109) for each inst. class
B
C
A
Code 1
5
1
1
Code 2
10
1
1
Instruction class
A
B
C
CPI
1
2
3
Example: MIPS?
(5x1+1x2+1x3)x109 cycles
Exec. Time1 =
= 20 sec.
6
500x10 cycles/sec
(10x1+1x2+1x3)x109 cycles
Exec. Time2 =
= 30 sec.
500x106 cycles/sec
Exec. time1
<
Inst. count
MIPS1 = Exec timex106
Exec. time2
=
(5+1+1)x109
=350
6
20x10
9
(10+1+1)x10
MIPS2 =
=400
6
30x10
MIPS1
<
MIPS2