Transcript Document

What Do All of These Means
Mean?

Indices of central tendency




Other means




Sample mean
Median
Mode
Arithmetic
Harmonic
Geometric
Quantifying variability
Copyright 2004 David J. Lilja
1
Why mean values?

Desire to reduce performance to a single
number

Makes comparisons easy



Mine Apple is faster than your Cray!
People like a measure of “typical” performance
Leads to all sorts of crazy ways for
summarizing data


X = f(10 parts A, 25 parts B, 13 parts C, …)
X then represents “typical” performance?!
Copyright 2004 David J. Lilja
2
The Problem

Performance is multidimensional





CPU time
I/O time
Network time
Interactions of various components
Etc, etc
Copyright 2004 David J. Lilja
3
The Problem

Systems are often specialized



Performs great on application type X
Performs lousy on anything else
Potentially a wide range of execution times
on one system using different benchmark
programs
Copyright 2004 David J. Lilja
4
The Problem


Nevertheless, people still want a single
number answer!
How to (correctly) summarize a wide range of
measurements with a single value?
Copyright 2004 David J. Lilja
5
Index of Central Tendency



Tries to capture “center” of a distribution of
values
Use this “center” to summarize overall
behavior
Not recommended for real information, but …

You will be pressured to provide mean values


Understand how to choose the best type for the
circumstance
Be able to detect bad results from others
Copyright 2004 David J. Lilja
6
Indices of Central Tendency

Sample mean


Sample median


Common “average”
½ of the values are above, ½ below
Mode

Most common
Copyright 2004 David J. Lilja
7
Indices of Central Tendency

“Sample” implies that



Values are measured from a random process on
discrete random variable X
Value computed is only an approximation of
true mean value of underlying process
True mean value cannot actually be known

Would require infinite number of measurements
Copyright 2004 David J. Lilja
8
Sample mean

Expected value of X = E[X]



“First moment” of X
xi = values measured
pi = Pr(X = xi) = Pr(we measure xi)
n
E[ X ]   xi pi
i 1
Copyright 2004 David J. Lilja
9
Sample mean

Without additional information, assume



pi = constant = 1/n
n = number of measurements
Arithmetic mean

Common “average”
1 n
x   xi
n i 1
Copyright 2004 David J. Lilja
10
Potential Problem with Means



Sample mean gives equal weight to all
measurements
Outliers can have a large influence on the
computed mean value
Distorts our intuition about the central
tendency of the measured values
Copyright 2004 David J. Lilja
11
Potential Problem with Means
Mean
Mean
Copyright 2004 David J. Lilja
12
Median

Index of central tendency with



Sort n measurements
If n is odd



½ of the values larger, ½ smaller
Median = middle value
Else, median = mean of two middle values
Reduces skewing effect of outliers on the
value of the index
Copyright 2004 David J. Lilja
13
Example

Measured values: 10, 20, 15, 18, 16



Obtain one more measurement: 200



Mean = 15.8
Median = 16
Mean = 46.5
Median = ½ (16 + 18) = 17
Median give more intuitive sense of central
tendency
Copyright 2004 David J. Lilja
14
Potential Problem with Means
Median
Mean
Mean
Median
Copyright 2004 David J. Lilja
15
Mode



Value that occurs most often
May not exist
May not be unique

E.g. “bi-modal” distribution

Two values occur with same frequency
Copyright 2004 David J. Lilja
16
Mean, Median, or Mode?

Mean



Median



If the sum of all values is meaningful
Incorporates all available information
Intuitive sense of central tendency with outliers
What is “typical” of a set of values?
Mode

When data can be grouped into distinct types,
categories (categorical data)
Copyright 2004 David J. Lilja
17
Mean, Median, or Mode?







Size of messages sent on a network
Number of cache hits
Execution time
MFLOPS, MIPS
Bandwidth
Speedup
Cost
Copyright 2004 David J. Lilja
18
Yet Even More Means!




Arithmetic
Harmonic?
Geometric?
Which one should be used when?
Copyright 2004 David J. Lilja
19
Arithmetic mean
n
1
x A   xi
n i 1
Copyright 2004 David J. Lilja
20
Harmonic mean
xH 
n
1
i 1 x
i
n
Copyright 2004 David J. Lilja
21
Geometric mean
xG  n x1 x2  xi  xn
1/ n


   xi 
 i 1 
n
Copyright 2004 David J. Lilja
22
Which mean to use?

Mean value must still conform to
characteristics of a good performance metric







Linear
Reliable
Repeatable
Easy to use
Consistent
Independent
Best measure of performance still is
execution time
Copyright 2004 David J. Lilja
23
What makes a good mean?

Time–based mean (e.g. seconds)



Rate–based mean (e.g. operations/sec)



Should be directly proportional to total weighted
time
If time doubles, mean value should double
Should be inversely proportional to total weighted
time
If time doubles, mean value should reduce by
half
Which means satisfy these criteria?
Copyright 2004 David J. Lilja
24
Assumptions

Measured execution times of n benchmark
programs


Total work performed by each benchmark is
constant



Ti, i = 1, 2, …, n
F = # operations performed
Relax this assumption later
Execution rate = Mi = F / Ti
Copyright 2004 David J. Lilja
25
Arithmetic mean for times
Produces a mean value
that is directly
proportional to total
time
→ Correct mean to
summarize execution
time

Copyright 2004 David J. Lilja
1 n
TA   Ti
n i 1
26
Arithmetic mean for rates


Produces a mean value that
is proportional to sum of
inverse of times
But we want inversely
proportional to sum of times
1 n
M A   Mi
n i 1
n
F / Ti

n
i 1
F

n
Copyright 2004 David J. Lilja
n
1

i 1 Ti
27
Arithmetic mean for rates
Produces a mean value that
is proportional to sum of
inverse of times
 But we want inversely
proportional to sum of times
→ Arithmetic mean is not
appropriate for summarizing
rates

Copyright 2004 David J. Lilja
1 n
M A   Mi
n i 1
n
F / Ti

n
i 1
F

n
n
1

i 1 Ti
28
Harmonic mean for times

Not directly
proportional to sum of
times
Copyright 2004 David J. Lilja
TH 
n
1
i1 T
i
n
29
Harmonic mean for times
Not directly proportional
to sum of times
→ Harmonic mean is not
appropriate for
summarizing times

Copyright 2004 David J. Lilja
TH 
n
1
i1 T
i
n
30
Harmonic mean for rates

Produces
(total number of ops)
÷ (sum execution times)
Inversely proportional
to total execution time
→ Harmonic mean is
appropriate to
summarize rates

Copyright 2004 David J. Lilja
MH 

n
1
i 1 M
i
n
n
Ti
i 1 F
Fn
n


n
T
i 1 i
31
Harmonic mean for rates
109
FLOPs
321
130
Sec
MFLOPS
5
1
1
1
1 
 1






 405 367 405 419 388
 396
405 M H 
436
160
367
284
115
405
601
252
419
482
187
388
844 109
MH 
 396
2124
Copyright 2004 David J. Lilja
32
Geometric mean




Correct mean for averaging normalized
values, right?
Used to compute SPECmark
Good when averaging measurements with
wide range of values, right?
Maintains consistent relationships when
comparing normalized values

Independent of basis used to normalize
Copyright 2004 David J. Lilja
33
Geometric mean with times
System 1
Geo mean
Rank
System 2
System 3
417
244
134
83
70
70
66
153
135
39,449
33,527
66,000
772
368
369
587
503
499
3
2
1
Copyright 2004 David J. Lilja
34
Geometric mean normalized to
System 1
System 1
Geo mean
Rank
System 2
System 3
1.0
0.59
0.32
1.0
0.84
0.85
1.0
2.32
2.05
1.0
0.85
1.67
1.0
0.48
0.45
1.0
0.86
0.84
3
2
1
Copyright 2004 David J. Lilja
35
Geometric mean normalized to
System 2
System 1
Geo mean
Rank
System 2
System 3
1.71
1.0
0.55
1.19
1.0
1.0
0.43
1.0
0.88
1.18
1.0
1.97
2.10
1.0
1.0
1.17
1.0
0.99
3
2
1
Copyright 2004 David J. Lilja
36
Total execution times
Total
Arith mean
Rank
System 1
417
83
66
39,449
772
40,787
8157
2
System 2
244
70
153
33,527
368
34,362
6872
1
Copyright 2004 David J. Lilja
System 3
134
70
135
66,000
369
66,798
13,342
3
37
What’s going on here?!
System 1
Geo mean wrt 1
Rank
Geo mean wrt 2
Rank
Arith mean
Rank
System 2
System 3
1.0
0.86
0.84
3
2
1
1.17
1.0
0.99
3
2
1
8157
6872
13,342
2
1
3
Copyright 2004 David J. Lilja
38
Geometric mean for times

Not directly proportional
to sum of times
1/ n


TG    Ti 
 i 1 
n
Copyright 2004 David J. Lilja
39
Geometric mean for times
Not directly proportional
to sum of times
→ Geometric mean is not
appropriate for
summarizing times

Copyright 2004 David J. Lilja
1/ n


TG    Ti 
 i 1 
n
40
Geometric mean for rates

Not inversely
proportional to sum of
times
1/ n


TG    M i 
 i 1

n
1/ n
 n F
   
 i 1 Ti 
Copyright 2004 David J. Lilja
41
Geometric mean for rates
Not inversely
proportional to sum of
times
→ Geometric mean is not
appropriate for
summarizing rates

Copyright 2004 David J. Lilja
1/ n


TG    M i 
 i 1

n
1/ n
 n F
   
 i 1 Ti 
42
Summary of Means

Avoid means if possible


Arithmetic



When sum of raw values has physical meaning
Use for summarizing times (not rates)
Harmonic


Loses information
Use for summarizing rates (not times)
Geometric mean

Not useful when time is best measure of perf
Copyright 2004 David J. Lilja
43
Geometric mean

Does provide consistent rankings



Independent of basis for normalization
But can be consistently wrong!
Value can be computed

But has no physical meaning
Copyright 2004 David J. Lilja
44
Normalization

Averaging normalized values doesn’t make
sense mathematically



Gives a number
But the number has no physical meaning
First compute the mean

Then normalize
Copyright 2004 David J. Lilja
45
Weighted means
n
w
i 1
i
1

n
x A   wi xi
i 1
xH 
1

Standard definition of
mean assumes all
measurements are
equally important
Instead, choose
weights to represent
relative importance of
measurement i
wi
i 1 x
i
n
Copyright 2004 David J. Lilja
46
Quantifying variability




Means hide information about variability
How “spread out” are the values?
How much spread relative to the mean?
What is the shape of the distribution of
values?
Copyright 2004 David J. Lilja
47
Histograms
40
35
40
35
30
25



30
25
20
20
15
15
10
10
5
5
0
0
Similar mean values
Widely different distributions
How to capture this variability in one number?
Copyright 2004 David J. Lilja
48
Index of Dispersion


Quantifies how “spread out” measurements
are
Range


Maximum distance from the mean


(max value) – (min value)
Max of | xi – mean |
Neither efficiently incorporates all available
information
Copyright 2004 David J. Lilja
49
Sample Variance

i1 xi  x 
n
s2 
n 1
n x 
n


2
2
i 1 i
 x 
n(n  1)
2
n
i 1 i
Second moment of
random variable X
Second form good for
calculating “on-the-fly”


One pass through data
(n-1) degrees of
freedom
Copyright 2004 David J. Lilja
50
Sample Variance



Gives “units-squared”
Hard to compare to mean
Use standard deviation, s


s = square root of variance
Units = same as mean
Copyright 2004 David J. Lilja
51
Coefficient of Variation (COV)


s
COV 
x
Dimensionless
Compares relative size
of variation to mean
value
Copyright 2004 David J. Lilja
52
Important Points

“Average” metrics are dangerous




Hides multidimensional aspects of performance
Hides variability in group of measurements
But often forced to report a “typical” value
Know what others are telling you!
Copyright 2004 David J. Lilja
53
Important Points

Sample mean



Sample median


First moment of random process
Run-of-the-mill average
Middle value
Sample mode

Most common value
Copyright 2004 David J. Lilja
54
Important Points

Arithmetic mean


Harmonic mean


Use to summarize rates
Geometric mean


Use to summarize times
Don’t use for times or rates
Variance, standard deviation, coefficient of
variation

Quantify variability
Copyright 2004 David J. Lilja
55