Transcript Document

Lecture 2: Performance Measurement
Performance Evaluation

The primary duty of software developers is to create functionally correct programs.

Performance evaluation is the part of software development that leads to well-performing programs.
Performance Analysis Cycle
[Cycle: Code Development → functionally complete and correct program → Measure → Analyze → Modify / Tune → complete, correct and well-performing program → Usage]

Have an optimization phase, just like the testing and debugging phases.
Goals of Performance Analysis
The goal of performance analysis is to provide
quantitative information about the performance
of a computer system
Goals of Performance Analysis
• Compare alternatives
  • When purchasing a new computer system, to provide quantitative information
• Determine the impact of a feature
  • When designing a new system or upgrading, to provide before-and-after comparisons
• System tuning
  • To find the parameters that produce the best overall performance
• Identify relative performance
  • To quantify the performance relative to previous generations
• Performance debugging
  • To identify performance problems and correct them
• Set expectations
  • To determine the expected capabilities of the next generation
Performance Evaluation
Performance Evaluation steps:
1. Measurement / Prediction
   • What to measure? How to measure?
   • Modeling for prediction
     • Simulation
     • Analytical Modeling
2. Analysis & Reporting
   • Performance metrics
Performance Measurement
Interval Timers
• Hardware Timers
• Software Timers
Performance Measurement
Hardware Timers
[Figure: a clock with period Tc drives an n-bit counter; the counter is connected to the processor-memory bus]
• The counter value is read from a memory location
• Time is calculated as Time = (x2 - x1) x Tc, where x1 and x2 are the first and second counter readings
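As an illustration only (not from the lecture), a minimal sketch of this style of measurement on x86, assuming the time-stamp counter increments at a known constant rate; the period TC_SECONDS below is a made-up placeholder value.

#include <stdio.h>
#include <x86intrin.h>            /* __rdtsc(): read the CPU's time-stamp counter */

#define TC_SECONDS (1.0 / 3.0e9)  /* assumed counter period (3 GHz); placeholder value */

static void event_being_timed(void) {
    volatile double s = 0.0;
    for (int i = 0; i < 1000000; i++) s += i * 0.5;   /* dummy work */
}

int main(void) {
    unsigned long long x1 = __rdtsc();    /* first counter reading */
    event_being_timed();
    unsigned long long x2 = __rdtsc();    /* second counter reading */
    printf("Time = %g s\n", (double)(x2 - x1) * TC_SECONDS);   /* (x2 - x1) x Tc */
    return 0;
}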
Performance Measurement
Software Timers
• Interrupt-based
[Figure: a clock with period Tc passes through a prescaler, producing period T'c; the prescaled clock drives a counter wired to the processor interrupt input]
• When the interrupt occurs, the interrupt-service routine increments the timer value, which is read by a program
• Time is calculated as Time = (x2 - x1) x T'c
Performance Measurement
Timer Rollover

Occurs when an n-bit counter undergoes a transition from its maximum value 2^n - 1 to zero.

T'c     32-bit counter   64-bit counter
10 ns   42 s             5850 years
1 µs    1.2 hours        0.5 million years
1 ms    49 days          0.5 x 10^9 years

There is a trade-off between roll over time and accuracy.
Timers
Solution:
1. Use a 64-bit integer (rolls over after more than half a million years)
2. Have the timer return two values:
   • one represents seconds
   • one represents microseconds since the last second
   With 32-bit values, the roll over is over 100 years.
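For example, the POSIX gettimeofday() call uses exactly this two-value scheme; a minimal sketch (illustration, not lecture code) of combining the two values into one floating-point timestamp:

#include <stdio.h>
#include <sys/time.h>    /* gettimeofday(), struct timeval */

/* Return the current time as seconds (with microsecond resolution) in a double. */
static double now_seconds(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);             /* tv_sec: seconds, tv_usec: microseconds */
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

int main(void) {
    double t0 = now_seconds();
    /* < code segment being timed > */
    double t1 = now_seconds();
    printf("elapsed = %13.7f s\n", t1 - t0);
    return 0;
}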
Performance Measurement
Interval Timers
T0 ← Read current time
Event_being_timed();
T1 ← Read current time
Time for the event is T1 - T0
Performance Measurement
Timer Overhead
[Timeline: initiate read_time; after T1 the current time is read; after a further T2 the event begins; the event runs for T3; the event ends and read_time is initiated again; after T4 the current time is read]

Measured time: Tm = T2 + T3 + T4
Desired measurement: Te = Tm - (T2 + T4) = Tm - (T1 + T2), since T1 = T4
Timer overhead: Tovhd = T1 + T2
Te should be 100-1000 times greater than Tovhd.
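One common way to estimate the timer overhead in practice is to call the timer twice back to back; a hedged sketch using the now_seconds() helper introduced above (my own illustration, not the lecture's code):

#include <stdio.h>
#include <sys/time.h>

static double now_seconds(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

int main(void) {
    /* Back-to-back timer calls: the measured difference approximates T1 + T2. */
    double overhead = 1.0e9;
    for (int i = 0; i < 1000; i++) {
        double t0 = now_seconds();
        double t1 = now_seconds();
        if (t1 - t0 < overhead) overhead = t1 - t0;   /* keep the minimum observed */
    }
    printf("approximate timer overhead Tovhd = %13.9f s\n", overhead);
    return 0;
}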
Performance Measurement
Timer Resolution
Resolution is the smallest change that can be detected by an interval timer.
nT’c < Te < (n+1)T’c
If Tc is large relative to the event being measured, it may be
impossible to measure the duration of the event.
Performance Measurement
Measuring Short Intervals
Te < Tc
[Figure: when the event time Te is shorter than the clock period Tc, a single measurement yields either 1 count (a clock edge falls inside the event) or 0 counts (no clock edge falls inside the event)]
Performance Measurement
Measuring Short Intervals
Solution: Repeat the measurement n times.
• Average execution time: T'e = (m x Tc) / n, where m is the number of 1s measured
• Average execution time: T'e = (Tt / n) - h, where Tt is the total execution time of the n repetitions and h is the per-repetition overhead
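A minimal sketch of the second variant (total time of n repetitions divided by n), again using the assumed now_seconds() helper; the repetition overhead h is ignored here for brevity:

#include <stdio.h>
#include <sys/time.h>

static double now_seconds(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

static void short_event(void) {
    volatile int x = 0;
    for (int i = 0; i < 100; i++) x += i;   /* event much shorter than the timer resolution */
}

int main(void) {
    const int n = 1000000;
    double t0 = now_seconds();
    for (int i = 0; i < n; i++)
        short_event();
    double Tt = now_seconds() - t0;          /* total time of n repetitions */
    printf("average time per event T'e = %13.9f s\n", Tt / n);
    return 0;
}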
Performance Measurement
Time
• Elapsed time / wall-clock time / response time
  • Latency to complete a task, including disk access, memory access, I/O, operating system overhead, and everything else (includes time consumed by other programs in a time-sharing system)
• CPU time
  • The time the CPU is computing, not including I/O time or waiting time
  • User time / user CPU time: CPU time spent in the program
  • System time / system CPU time: CPU time spent in the operating system performing tasks requested by the program
Performance Measurement
UNIX time command
90.7u 12.9s 2:39 65%
• 90.7u - user time
• 12.9s - system time
• 2:39 - elapsed time
• 65% - CPU time as a percentage of elapsed time
Drawbacks:
• Resolution is in milliseconds
• Different sections of the code cannot be timed
Timers
A timer is a function, subroutine or program that can be used to return the amount of time spent in a section of code.

t0 = timer();
...
< code segment >
...
t1 = timer();
time = t1 - t0;

zero = 0.0;
t0 = timer(&zero);
...
< code segment >
...
t1 = timer(&t0);
time = t1;
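The second calling convention above (timer(&t) returning the time elapsed since the value stored in t) follows the style used in Wadleigh and Crawford; a possible implementation on top of gettimeofday() is sketched below. This is my own sketch of such an interface, not the book's code.

#include <sys/time.h>

/* timer(&t): store the current time in *t and return the time elapsed since
 * the value previously held in *t, so timer(&zero) with zero = 0.0 simply
 * returns the current (absolute) time. */
double timer(double *t)
{
    struct timeval tv;
    double old = *t;
    gettimeofday(&tv, NULL);
    *t = (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
    return *t - old;
}

With this definition, t0 = timer(&zero) returns an absolute timestamp, and t1 = timer(&t0) returns the elapsed time directly, matching the code fragments on the following slides.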
Timers
Read Wadleigh, Crawford pg 130-136 for:
time, clock, gettimeofday, etc.
Timers
Measuring Timer Resolution

main() {
    . . .
    zero = 0.0;
    t0 = timer(&zero);
    t1 = 0.0;
    j = 0;
    while (t1 == 0.0) {
        j++;
        zero = 0.0;
        t0 = timer(&zero);
        foo(j);
        t1 = timer(&t0);
    }
    printf("It took %d iterations for a nonzero time\n", j);
    if (j == 1)
        printf("timer resolution <= %13.7f seconds\n", t1);
    else
        printf("timer resolution is %13.7f seconds\n", t1);
    . . .
}

foo(n) {
    i = 0;
    for (j = 0; j < n; j++)
        i++;
    return (i);
}
Timers
Measuring Timer Resolution
Using clock():
It took 682 iterations for a nonzero time
timer resolution is 0.0200000 seconds
Using times():
It took 720 iterations for a nonzero time
timer resolution is 0.0200000 seconds
Using getrusage():
It took 7374 iterations for a nonzero time
timer resolution is 0.0002700 seconds
Timers
Spin Loops
 For codes that take less time to run than the resolution of the timer.
 The first call to a function may require an inordinate amount of time; therefore the minimum of all the times may be desired.

main() {
    . . .
    zero = 0.0;
    t2 = 100000.0;
    for (j = 0; j < n; j++) {
        t0 = timer(&zero);
        foo(j);
        t1 = timer(&t0);
        t2 = min(t2, t1);
    }
    t2 = t2 / n;
    printf("Minimum time is %13.7f seconds\n", t2);
    . . .
}

foo(n) {
    . . .
    < code segment >
}
Profilers
 A profiler automatically inserts timing calls into an application in order to collect timing information about it.
 It is used to identify the portions of the program that consume the largest fraction of the total execution time.
 It may also be used to find system-level bottlenecks in a multitasking system.
 Profilers may alter the timing of a program's execution.
Profilers
 Data collection techniques
  • Sampling-based
    • This type of profiler uses a predefined clock; at every multiple of this clock tick the program is interrupted and its state information is recorded.
    • They give a statistical profile of the program behavior.
    • They may miss some important events.
  • Event-based
    • Events are defined (e.g. entry into a subroutine) and data about these events are collected.
    • The collected information shows the exact execution frequencies.
    • They have a substantial amount of run-time overhead and memory requirements.
 Information kept
  • Trace-based: the profiler keeps all the information it collects.
  • Reductionist: only statistical information is collected.
Performance Evaluation
Performance Evaluation steps:
1. Measurement / Prediction
   • What to measure? How to measure?
   • Modeling for prediction
     • Simulation
     • Analytical Modeling
       • Queuing Theory
2. Analysis & Reporting
   • Performance metrics
Predicting Performance
 The performance of simple kernels can be predicted to a high degree
 Theoretical performance and peak performance must be close
 It is preferred that the measured performance is over 80% of the theoretical peak performance
Performance Evaluation
Performance Evaluation steps:
1. Measurement / Prediction
   • What to measure? How to measure?
   • Modeling for prediction
     • Simulation
     • Analytical Modeling
       • Queuing Theory
2. Analysis & Reporting
   • Performance metrics
Performance Metrics
Measurable characteristics of a computer system:
• Count of an event
• Duration of a time interval
• Size of a parameter
• Rate: operations executed per second
Performance Metrics
Clock Speed
Clock speed/frequency (f): the rate of clock pulses (ex: 1 GHz)
Cycle time (Tc): time between two clock pulses (Tc = 1/f)
Performance Metrics
Instruction Execution Rate
Cycles per Instruction (CPI):

CPI = [ Σ (CPIi x Ii) ] / [ Σ Ii ],  i = 1 … n

CPIi: number of cycles required for instruction type i
Ii: number of executed instructions of type i

CPI:
 is an average
 depends on the design of the micro-architecture (hardwired/microprogrammed, pipelined)
Number of instructions:
 is the number of instructions executed at run time
 depends on
  • instruction set architecture (ISA)
  • compiler
Performance Metrics
CPU Performance
CPU time of a program:
T = (instructions / program) x (cycles / instruction) x (time / cycle)
T = instruction count x CPI x (1 / f)
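A small worked example with made-up numbers (not from the lecture), combining the CPI and CPU-time formulas:

#include <stdio.h>

int main(void) {
    /* Hypothetical instruction mix: counts and per-class CPI values. */
    double I[]     = { 50e6, 30e6, 20e6 };   /* ALU, load/store, branch instructions executed */
    double CPI_i[] = { 1.0, 2.0, 1.5 };
    double f = 1.0e9;                        /* 1 GHz clock */

    double cycles = 0.0, icount = 0.0;
    for (int i = 0; i < 3; i++) {
        cycles += CPI_i[i] * I[i];
        icount += I[i];
    }
    double CPI = cycles / icount;            /* = 140e6 / 100e6 = 1.40 */
    double T = icount * CPI * (1.0 / f);     /* = 100e6 x 1.40 / 1e9 = 0.14 s */
    printf("CPI = %.2f, CPU time = %.3f s\n", CPI, T);
    return 0;
}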
Performance Metrics
CPU Performance
Drawbacks:
 In modern computers, no program runs without some operating system running on the hardware
 Comparing performance between machines with different operating systems will be unfair
Performance Metrics
Execution time
• Elapsed time / wall-clock time / response time
  • Latency to complete a task, including disk access, memory access, I/O, operating system overhead, and everything else (includes time consumed by other programs in a time-sharing system)
• CPU time
  • The time the CPU is computing, not including I/O time or waiting time
  • User time / user CPU time: CPU time spent in the program
  • System time / system CPU time: CPU time spent in the operating system performing tasks requested by the program
Performance Metrics
Performance Comparison
PerformanceX = 1 / Execution timeX

Relative performance:
Performance Ratio = PerformanceX / PerformanceY = Execution timeY / Execution timeX
Performance Metrics
Relative Performance
• If the workload consists of more than one program, the total execution time may be used.
• If there is more than one machine to be compared, one of them must be selected as the reference.
Performance Metrics
Throughput
• Total amount of work done in a given time
• Measured in tasks per time unit
• Can be used for
  • Operating system performance
  • Pipeline performance
  • Multiprocessor performance
Performance Metrics
 MIPS (Million instructions per second)

MIPS = Instruction count / (Execution time x 10^6) = Clock rate / (CPI x 10^6)

• Includes both integer and floating-point performance
• The number of instructions in a program varies between different computers
• The number of instructions varies between different programs on the same computer
Performance Metrics
 MFLOPS (Million floating-point operations per second)
• Gives the performance of only the floating-point operations
• Different mixes of integer and floating-point operations may have different execution times:
  • Integer and floating-point units work independently
  • Instruction and data caches provide instructions and data concurrently
Performance Metrics
 Utilization
Utilization = Busy time / Total time

 Speciality ratio
Speciality ratio = Maximum performance / Minimum performance
• A speciality ratio of 1 indicates a general-purpose system
Performance Metrics
 Asymptotic and Half performance
• r∞ - asymptotic performance
• n1/2 - half-performance length
T = t (n + n1/2), where t = 1/r∞ and n1/2 = t0/t (the slope of T versus n is 1/r∞ = t)
[Figure: T versus n is a straight line: T = t0 at n = 0, T = 2t0 at n = n1/2, and the line crosses T = 0 at n = -n1/2]
Performance Metrics
Speedup
PerformanceX = 1 / Execution timeX = 1 / TX

Speedup2,1 = Performance2 / Performance1 = T1 / T2

• Expresses how much faster system 2 is than system 1
• Calculated directly from execution times
Performance Metrics
Relative Change
PerformanceX = 1 / Execution timeX = 1 / TX

Relative change2,1 = (Performance2 - Performance1) / Performance1 = (T1 - T2) / T2 = Speedup2,1 - 1

• Expresses the performance of system 2 relative to system 1
Performance Metrics
Statistical Analysis
• Used to compare performance
• Workload consists of many programs
• Depends on the nature of the data as well as the distribution of the test results
Performance Metrics
Indices of Central Tendency
Used to summarize multiple measurements
• Mean
• Median
• Mode
Performance Metrics
• Mean (average)
Arithmetic mean = (Σ xi) / n, 1 ≤ i ≤ n
Gives equal weight to all measurements

Measurement   Execution time
X1            10
X2            20
X3            15
X4            18
X5            16
Mean          15.8
Performance Metrics
• Median
1. Order all n measurements
2. The middle value is the median. If n is even, the median is the mean of the middle 2 values.
Using the median instead of the mean reduces the skewing effect of outliers.

Measurement   Execution time        Sorted:  Measurement   Execution time
X1            10                             X1            10
X2            20                             X3            15
X3            15                             X5            16
X4            18                             X4            18
X5            16                             X2            20
X6            200                            X6            200
Mean          46.5

Median = (X4 + X5) / 2 = (18 + 16) / 2 = 17
Performance Metrics
• Mode
Mode is the value that occurs most frequently.

Measurement   Execution time
X1            10
X2            20
X3            36
X4            20
X5            20
X6            20
Mode = 20

• If all values occur only once, there is no mode
• If several values occur with the same highest frequency, there are several modes
Mean, Median, Mode
Mean
 Incorporates information from the entire set of measured values
 Sensitive to outliers
Median and Mode
 Less sensitive to outliers
 Do not effectively use all the information
Performance Metrics
Arithmetic mean (average)
Arithmetic mean = (Σ xi) / n, 1 ≤ i ≤ n
• May be misleading if the data are skewed or scattered

          MA      MB      MC
Prog1     50      100     500
Prog2     400     800     800
Prog3     5550    5100    4700
Average   2000    2000    2000
Performance Metrics
Weighted average
Weighted average = Σ wi . xi, 1 ≤ i ≤ n
• The weight is the frequency of each program in daily processing

          Weight   MA      MB      MC
Prog1     60%      50      100     500
Prog2     30%      400     800     800
Prog3     10%      5550    5100    4700
Average            705     810     1010

• Results may change with a different set of execution frequencies
Performance Metrics
Geometric mean
Geometric mean = ( Π xi )^(1/n), 1 ≤ i ≤ n
• Results are stated in relation to the performance of a reference machine (here MB)

          MA      Normalized   MB (reference)   Normalized   MC      Normalized
Prog1     50      2            100              1            500     0.2
Prog2     400     2            800              1            800     1
Prog3     5550    0.92         5100             1            4700    1.085
Geometric mean of the normalized values:
                  1.54                          1                    0.60

• Results are consistent no matter which system is chosen as the reference
Performance Metrics
Harmonic mean
Harmonic mean = n / (Σ 1/xi), 1 ≤ i ≤ n
• Used to compare performance results that are expressed as a rate (e.g. operations per second, throughput, etc.)
• The slowest rates have the greatest influence on the result
• It identifies areas where performance can be improved
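As an illustration (not lecture code), the three means computed side by side for a set of rate measurements; the sample values are made up:

#include <stdio.h>
#include <math.h>

int main(void) {
    double rate[] = { 120.0, 80.0, 200.0, 95.0 };   /* hypothetical MFLOP/s results */
    int n = sizeof(rate) / sizeof(rate[0]);

    double sum = 0.0, logsum = 0.0, invsum = 0.0;
    for (int i = 0; i < n; i++) {
        sum    += rate[i];
        logsum += log(rate[i]);        /* product via logs, for the geometric mean */
        invsum += 1.0 / rate[i];
    }
    printf("arithmetic mean = %.2f\n", sum / n);
    printf("geometric mean  = %.2f\n", exp(logsum / n));
    printf("harmonic mean   = %.2f\n", (double)n / invsum);
    return 0;
}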
Performance Metrics
Characteristics of a good performance metric
• If time values are averaged, the resulting mean value must be directly proportional to the total time.
• If rate values are averaged, the resulting mean value must be inversely proportional to the total time.
Performance Metrics
Ex:
• n benchmark programs
• Ti is the execution time of program i
• F floating-point operations in each program
• Mi = F / Ti is the execution rate of program i (MFLOP/s)

Arithmetic mean of the times:
TA = (1/n) Σ Ti, i = 1 … n
• TA is directly proportional to the total execution time

Arithmetic mean of the rates:
MA = (1/n) Σ Mi = (F/n) Σ (1/Ti), i = 1 … n
• MA is not inversely proportional to the total execution time
• Inappropriate for summarizing rates
Performance Metrics
Harmonic mean
TH = n / (Σ 1/Ti), i = 1 … n
• TH is not directly proportional to the total execution time
• Inappropriate for summarizing execution times

MH = n / (Σ 1/Mi) = nF / (Σ Ti), i = 1 … n
• MH is inversely proportional to the total execution time
• Appropriate for summarizing rates
Performance Metrics
Ex
Performance Metrics
Geometric mean
TG = ( Π Ti )^(1/n), i = 1 … n
• TG is not directly proportional to the total execution time
• Inappropriate for summarizing execution times

MG = ( Π Mi )^(1/n) = ( Π F/Ti )^(1/n), i = 1 … n
• MG is not inversely proportional to the total execution time
• Inappropriate for summarizing rates
Performance Metrics
Geometric mean
                                      M1        M2        M3
Prog1                                 417       244       134
Prog2                                 83        70        70
Prog3                                 66        153       135
Prog4                                 39499     33527     66000
Prog5                                 772       368       369
Geometric mean (normalized wrt M1)    1.0       0.86      0.84
Geometric mean (normalized wrt M2)    1.17      1.0       0.99
Rank (by geometric mean)              3         2         1
Total time                            40787     34362     66798
Arithmetic mean                       8157      6872      13342
Rank (by total time)                  2         1         3

• The geometric mean produces a consistent ordering of the systems, but it is the wrong ordering
Performance Metrics
Histogram
• Used to display the distribution (variability) of a set of measured values
• First find the minimum and maximum values, then divide the range into b subranges, called cells

Histogram of message sizes:
Message size (kbytes)   Network A   Network B
0 < xi ≤ 5              11          39
5 < xi ≤ 10             27          25
10 < xi ≤ 15            41          18
15 < xi ≤ 20            32          5
20 < xi ≤ 25            21          19
25 < xi ≤ 30            12          42
30 < xi ≤ 35            4           0
Performance Metrics
Index of Dispersion
An index of dispersion is used to compare the spread of measurements around the mean value.
• The range is the simplest index of dispersion:
  Rmax = max(xi) - min(xi)
• The range is sensitive to a few extreme values
Performance Metrics
Index of Dispersion
• Maximum of the absolute values of the differences of each measurement from the mean:
  Dmax = max |xi - x̄|
• It is also sensitive to extreme values
Performance Metrics
Index of Dispersion
• Sample variance:
  s² = Σ (xi - x̄)² / (n - 1), i = 1 … n
  Requires 2 passes through the data: first to calculate x̄ and then s².
  s² = [ n Σ xi² - (Σ xi)² ] / [ n (n - 1) ]
  Requires only 1 pass.
Performance Metrics
Index of Dispersion
• Standard deviation:
  s = √s² = √[ Σ (xi - x̄)² / (n - 1) ]
• Coefficient of variation (COV): normalizes the standard deviation with respect to the mean:
  COV = s / x̄
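A minimal sketch of the one-pass formulas in code; the sample data are made up for illustration:

#include <stdio.h>
#include <math.h>

/* One-pass computation of mean, sample variance, standard deviation and COV. */
int main(void) {
    double x[] = { 10.0, 20.0, 15.0, 18.0, 16.0 };
    int n = sizeof(x) / sizeof(x[0]);

    double sum = 0.0, sumsq = 0.0;
    for (int i = 0; i < n; i++) {
        sum   += x[i];
        sumsq += x[i] * x[i];
    }
    double mean = sum / n;
    double var  = (n * sumsq - sum * sum) / ((double)n * (n - 1));  /* one-pass variance */
    double s    = sqrt(var);
    printf("mean = %.2f, s^2 = %.2f, s = %.2f, COV = %.3f\n", mean, var, s, s / mean);
    return 0;
}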
Performance Evaluation
Methods
 Benchmarking
 Monitoring
 Analytical Modeling
 Queuing Theory
Benchmarking
A benchmark is a program that is run on a computer to measure its performance and compare it with that of other machines.
 The best benchmark is the users' workload – the mixture of programs and operating system commands that users run on a machine.
   Not practical
 Standard benchmarks are used instead
Benchmarking
Types of Benchmarks

Synthetic benchmarks

Toy benchmarks

Microbenchmarks

Program Kernels

Real Applications
Benchmarking
Synthetic benchmarks
Artificially created benchmark programs that represent the average frequency of operations (instruction mix) of a large set of programs
• Whetstone benchmark
• Dhrystone benchmark
• Rhealstone benchmark
Benchmarking
Synthetic benchmarks
• Whetstone benchmark
  • First written in Algol 60 in 1972; today Fortran, C/C++ and Java versions are available
  • Represents the workload of numerical applications
  • Measures floating-point arithmetic performance
  • Unit is Millions of Whetstone Instructions Per Second (MWIPS)
  • Shortcomings:
    • Does not represent constructs in modern languages, such as pointers, etc.
    • Does not consider cache effects
Benchmarking
Synthetic benchmarks
• Dhrystone benchmark
  • First written in Ada in 1984; today a C version is available
  • Represents the workload of system software: statistics were collected on operating systems, compilers, editors and a few numerical programs
  • Measures integer and string performance; no floating-point operations
  • Unit is the number of program iteration completions per second
  • Shortcomings:
    • Does not represent real-life programs
    • Compiler optimization overstates system performance
    • Small code that may fit in the instruction cache
Benchmarking
Synthetic benchmarks
• Rhealstone benchmark
  • For multi-tasking real-time systems
  • Factors are:
    • Task switching time
    • Preemption time
    • Interrupt latency time
    • Semaphore shuffling time
    • Deadlock breaking time
    • Datagram throughput time
  • Metric is Rhealstones per second: Σ wi . (1/ti), i = 1 … 6
Benchmarking
Toy benchmarks
10-100 lines of code whose result is known before running the toy program
• Quick sort
• Sieve of Eratosthenes (finds prime numbers)
http://upload.wikimedia.org/wikipedia/commons/8/8c/New_Animation_Sieve_of_Eratosthenes.gif

func sieve( var N )
    var PrimeArray as array of size N
    initialize PrimeArray to all true
    for i from 2 to N
        for each j from i + 1 to N, where i divides j
            set PrimeArray( j ) = false
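For reference, a small runnable C version of the same idea; this is my own sketch, and it keeps the slide's approach of marking the multiples of every i, with N chosen arbitrarily:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
    const int N = 50;
    bool *prime = malloc((N + 1) * sizeof(bool));
    for (int k = 0; k <= N; k++) prime[k] = true;

    for (int i = 2; i <= N; i++)
        for (int j = i + i; j <= N; j += i)   /* all j > i that i divides */
            prime[j] = false;                 /* mark j as non-prime */

    for (int k = 2; k <= N; k++)
        if (prime[k]) printf("%d ", k);       /* the survivors are the primes */
    printf("\n");
    free(prime);
    return 0;
}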
Benchmarking
Microbenchmarks
Small, specially designed programs used to test some specific function of a system (e.g. floating-point execution, I/O subsystem, processor-memory interface, etc.)
• Provide values for important parameters of a system
• Characterize the maximum performance if the overall performance is limited by that single component
Benchmarking
Kernels
Key pieces of code from real applications.
• LINPACK and BLAS
• Livermore Loops
• NAS
Benchmarking
Kernels
• LINPACK and BLAS Libraries
  • LINPACK - linear algebra package
    • Measures floating-point computing power
    • Solves a system of linear equations Ax=b with Gaussian elimination
    • Metric is MFLOP/s
    • DAXPY is the most time-consuming routine
    • Used as the measure for the TOP500 list
  • BLAS - Basic Linear Algebra Subprograms
    • LINPACK makes use of the BLAS library
Benchmarking
Kernels
• LINPACK and BLAS Libraries
  • SAXPY - Scalar Alpha X Plus Y
    • Y = a X + Y, where X and Y are vectors and a is a scalar
    • SAXPY for single precision, DAXPY for double precision
    • Generic implementation:
      for (int i = m; i < n; i++) {
          y[i] = a * x[i] + y[i];
      }
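Putting the pieces of this lecture together, a hedged sketch of timing the DAXPY kernel and reporting MFLOP/s with gettimeofday(); the vector length, repetition count and the 2 flops/element accounting are my own choices, not from the lecture:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

static double now_seconds(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

int main(void) {
    const int n = 1000000, reps = 100;
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));
    double a = 3.0;
    for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    double t0 = now_seconds();
    for (int r = 0; r < reps; r++)
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];          /* DAXPY: 1 multiply + 1 add per element */
    double t = now_seconds() - t0;

    double flops = 2.0 * n * (double)reps;   /* 2 floating-point operations per element */
    printf("time = %.3f s, rate = %.1f MFLOP/s\n", t, flops / t / 1.0e6);
    free(x); free(y);
    return 0;
}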
Benchmarking
Kernels
• Livermore Loops
  • Developed at LLNL
  • Originally in Fortran, now also in C
  • 24 numerical application kernels, such as:
    • hydrodynamics fragment,
    • incomplete Cholesky conjugate gradient,
    • inner product,
    • banded linear systems solution, tridiagonal linear systems solution,
    • general linear recurrence equations,
    • first sum, first difference,
    • 2-D particle in a cell, 1-D particle in a cell,
    • Monte Carlo search,
    • location of a first array minimum, etc.
  • Metrics are the arithmetic, geometric and harmonic means of CPU rates
Benchmarking
Kernels
• NAS Parallel Benchmarks
  • Developed at the NASA Advanced Supercomputing division
  • Paper-and-pencil benchmarks
  • 11 benchmarks, such as:
    • Discrete Poisson equation,
    • Conjugate gradient,
    • Fast Fourier Transform,
    • Bucket sort,
    • Embarrassingly parallel,
    • Nonlinear PDE solution,
    • Data traffic, etc.
Benchmarking
Real Applications
Programs that are run by many users
• C compiler
• Text processing software
• Frequently used user applications
• Modified scripts used to measure particular aspects of system performance, such as interactive behavior or multiuser behavior
Benchmarking
Benchmark Suites
 Desktop Benchmarks
  • SPEC benchmark suite
 Server Benchmarks
  • SPEC benchmark suite
  • TPC
 Embedded Benchmarks
  • EEMBC
Benchmarking
SPEC Benchmark Suite
 Desktop Benchmarks
  • CPU-intensive
    • SPEC CPU2000
      • 11 integer (CINT2000) and 14 floating-point (CFP2000) benchmarks
      • Real application programs:
        • C compiler
        • Finite element modeling
        • Fluid dynamics, etc.
  • Graphics-intensive
    • SPECviewperf
      • Measures rendering performance using OpenGL
    • SPECapc
      • Pro/Engineer - 3D rendering with solid models
      • Solid/Works - 3D CAD/CAM design tool, CPU-intensive and I/O-intensive tests
      • Unigraphics - solid modeling for an aircraft design
 Server Benchmarks
  • SPECWeb - for web servers
  • SPECSFS - for NFS performance, throughput-oriented
Benchmarking
TPC Benchmark Suite
 Server benchmark
 Transaction processing (TP) benchmarks
 Real applications
  • TPC-C: simulates a complex query environment
  • TPC-H: ad hoc decision support
  • TPC-R: business decision support system where users run a standard set of queries
  • TPC-W: business-oriented transactional web server
 Measures performance in transactions per second; throughput performance is measured only when the response time limit is met
 Allows cost-performance comparisons
Benchmarking
EEMBC Benchmarks
 For embedded computing systems
 34 benchmarks from 5 different application classes:
  • Automotive/industrial
  • Consumer
  • Networking
  • Office automation
  • Telecommunications
Benchmarking
Benchmarking Strategies

Fixed-computation benchmarks

Fixed-time benchmarks

Variable-computation and variable-time benchmarks
Benchmarking
Fixed-Computation benchmarks
W: fixed workload (number of instructions, number of floating-point operations, etc.)
T: measured execution time
R: speed, R = W / T

Compare with Speedup = R1 / R2 = (W / T1) / (W / T2) = T2 / T1
Benchmarking
Fixed-Computation benchmarks
Amdahl’s Law
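No formula is given on this slide; for reference, the usual statement of Amdahl's law for a fixed workload is Speedup = 1 / ((1 - f) + f / s), where f is the fraction of the computation that benefits from an enhancement and s is the speedup of that fraction, so the overall speedup is limited by the unenhanced part (1 - f).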
Benchmarking
Fixed-Time benchmarks
On a faster system, a larger workload can be processed in the same amount of time.
T: fixed execution time
W: workload
R: speed, R = W / T

Compare with Sizeup = R1 / R2 = (W1 / T) / (W2 / T) = W1 / W2
Benchmarking
Fixed-Time benchmarks
Scaled Speedup
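Again no formula is given on the slide; the scaled-speedup (Gustafson) form usually quoted for this fixed-time case is Scaled speedup = s + (1 - s) N, where s is the serial fraction measured on the parallel system and N is the number of processors.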
Benchmarking
Variable-Computation and Variable-Time benchmarks
In this type of benchmark, the quality of the solution is improved.
Q: quality of the solution
T: execution time
Metric: quality improvements per second = Q / T
Quality of Measurement
Characteristics of a measurement tool (timer)

Accuracy: Absolute difference of a measured value and
the corresponding standard reference value (such as the
duration of a second).

Precision: Reliability of the measurements made with the
tool. Highly precise measurements are tightly clustered
around a single value.

Resolution: Smallest incremental change that can be
detected. Ex: interval between clock ticks
Quality of Measurement
[Figure: a distribution of measured values - precision is how tightly the measurements cluster around their mean value; accuracy is how far the mean value lies from the true value]
Quality of Measurement
The uncertainties in the measurements are called
errors or noise
Sources of errors:
 Accuracy, precision, resolution of the measurement tool
 Time required to read and store the current time value
 Time-sharing among multiple programs
 Processing of interrupts
 Cache misses, page faults
Quality of Measurement
Types of errors:
 Systematic errors
  • Are the result of some experimental mistake
  • Usually constant across all measurements
  • Ex: temperature may affect the clock period
 Random errors
  • Unpredictable, nondeterministic
  • Affect the precision of measurement
  • Ex: timer resolution ±T, which affects measurements with equal probability
Quality of Measurement
Experimental measurements follow a Gaussian (normal) distribution.
Ex (pg 48):
x: measured value
±E: random error
Two sources of error, each having 50% probability:

Error 1   Error 2   Measured value   Probability
-E        -E        x - 2E           1/4
-E        +E        x                1/4
+E        -E        x                1/4
+E        +E        x + 2E           1/4

The actual value of x is measured only half of the time.
Confidence Intervals
Used to find a range of values that has a given probability of including the actual value.
Case 1: number of measurements is large (n ≥ 30)
{x1, x2, … xn} - samples, Gaussian distribution
m - mean
s - standard deviation
Confidence interval: [c1, c2]
Confidence level: (1 - α) × 100%
Pr[ c1 ≤ x ≤ c2 ] = 1 - α
Pr[ x < c1 ] = Pr[ x > c2 ] = α/2
Confidence Intervals
Case 1: number of measurements is large (n ≥ 30)
Confidence interval: [c1, c2]
c1 = x̄ - z(1-α/2) s / √n
c2 = x̄ + z(1-α/2) s / √n
x̄ - sample mean
s - standard deviation
z(1-α/2) is obtained from a precomputed table
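A minimal sketch of Case 1 in code; the z value 1.96 (95% confidence level) and the sample data are assumptions for illustration:

#include <stdio.h>
#include <math.h>

int main(void) {
    double x[] = { 8.0, 7.5, 9.1, 8.6, 7.9, 8.4, 9.0, 8.2, 7.7, 8.8 };  /* made-up samples */
    int n = sizeof(x) / sizeof(x[0]);          /* in practice n >= 30 for Case 1 */

    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += x[i];
    double mean = sum / n;

    double ss = 0.0;
    for (int i = 0; i < n; i++) ss += (x[i] - mean) * (x[i] - mean);
    double s = sqrt(ss / (n - 1));             /* sample standard deviation */

    double z = 1.96;                           /* z(1-a/2) for a 95% confidence level */
    double half = z * s / sqrt((double)n);
    printf("95%% confidence interval: [%.3f, %.3f]\n", mean - half, mean + half);
    return 0;
}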
Confidence Intervals
Case 2: number of measurements is small (n < 30)
Sample variance s² can vary significantly, so the t distribution is used:
c1 = x̄ - t(1-α/2; n-1) s / √n
c2 = x̄ + t(1-α/2; n-1) s / √n
x̄ - sample mean
s - standard deviation
t(1-α/2; n-1) is obtained from a precomputed table
Confidence Intervals
Ex (pg 51): number of measurements is small (n < 30)
A 90% confidence interval means that there is a 90% chance that the actual mean is within that interval.
Confidence Intervals
Confidence level   Interval
90%                c1 = 6.5, c2 = 9.4
95%                c1 = 6.1, c2 = 9.7
99%                c1 = 5.3, c2 = 10.6
Wider interval → less precise knowledge about the mean
Confidence Intervals
Determining the Number of Measurements Needed
(c1, c2) = ((1 - e) x̄, (1 + e) x̄)
c1 = (1 - e) x̄ = x̄ - z(1-α/2) s / √n
n = ( z(1-α/2) s / (e x̄) )²
Confidence Intervals
Determining the Number of Measurements Needed
Estimating s:
1. Make a small number of measurements.
2. Estimate the standard deviation s.
3. Calculate n = ( z(1-α/2) s / (e x̄) )².
4. Make n measurements.
Confidence Intervals
Ex:
Pg 53
Confidence Intervals
Confidence Intervals for Proportions
Used when we are interested in the number of times an event occurs.
Binomial distribution: if np ≥ 10, it approximates a Gaussian distribution with mean p and variance p(1-p)/n.
c1 = p - z(1-α/2) √( p(1-p) / n )
c2 = p + z(1-α/2) √( p(1-p) / n )
n - total events recorded
m - number of times the desired outcome occurs
p = m/n is the sample proportion
Confidence Intervals
Confidence Intervals for Proportions
Determining the number of measurements needed:
(1 - e) p = p - z(1-α/2) √( p(1-p) / n )
n = ( z(1-α/2) )² p (1-p) / (p e)²
Confidence Intervals
Ex:
Pg 55
Comparing Alternatives
Three different cases:
 Before-and-after comparison
 Comparison of non-corresponding (unpaired) measurements
 Comparisons involving proportions
Comparing Alternatives
Before-and-after comparison
Used to determine whether some change made to a system has a statistically significant impact on its performance.
1. Find a confidence interval for the mean of the differences of the paired observations.
2. If this interval includes 0, then the measured differences are not statistically significant.
Comparing Alternatives
Before-and-after comparison
Before measurements: b1, … bn
After measurements: a1, … an
Differences: d1 = a1 - b1, d2 = a2 - b2, …
c1 = d̄ - z(1-α/2) sd / √n
c2 = d̄ + z(1-α/2) sd / √n
d̄ - arithmetic mean of the differences
sd - standard deviation of the differences
n ≥ 30
Comparing Alternatives
Before-and-after comparison
Ex:
pg 65
Comparing Alternatives
Non-corresponding Measurements
There is no direct correspondence between pairs of measurements.
1. First system: make n1 measurements, find x̄1 and s1
2. Second system: make n2 measurements, find x̄2 and s2
3. Calculate the difference of the means and the standard deviation of the difference of the means:
   x̄ = x̄1 - x̄2
   sx = √( s1²/n1 + s2²/n2 )
4. If the confidence interval includes 0, then there is no significant difference
Comparing Alternatives
Non-corresponding Measurements
c1 = x̄ - z(1-α/2) sx
c2 = x̄ + z(1-α/2) sx
when n1 ≥ 30 and n2 ≥ 30
Comparing Alternatives
Non-corresponding Measurements
When n1 < 30 or n2 < 30:
c1 = x̄ - t(1-α/2; ndf) sx
c2 = x̄ + t(1-α/2; ndf) sx

ndf = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)² / (n1 - 1) + (s2²/n2)² / (n2 - 1) ]
Comparing Alternatives
Non-corresponding Measurements
Ex: pg 67
Comparing Alternatives
Comparing Proportions
m1 is the number of times the event occurs in system 1 out of a total of n1 events measured.
p1 = m1 / n1
p2 = m2 / n2
If m1 > 10 and m2 > 10, then the distributions approximate normal distributions with
means p1 and p2
and variances p1(1-p1)/n1 and p2(1-p2)/n2
Comparing Alternatives
Comparing Proportions
Confidence interval:
c1 = p - z(1-α/2) sp
c2 = p + z(1-α/2) sp
where p = p1 - p2
Standard deviation:
sp = √( p1(1-p1)/n1 + p2(1-p2)/n2 )
Comparing Alternatives
Comparing more than Two Alternatives
Timers
Roll Over
 Suppose a timer returns 32-bit integer data and measures microseconds.
 It rolls over after 2^32 microseconds (≈ 1.2 hours).
 Timers that measure milliseconds and use 32-bit data roll over after 2^32 milliseconds (≈ 49 days).
 There is a trade-off between roll over time and accuracy.
Performance Evaluation
Performance Evaluation steps:
1. Measurement / Prediction
   • What to measure? How to measure?
   • Modeling for prediction
2. Analysis & Reporting
   • Performance metrics