#### Transcript Lies, Damn Lies, and Statistics

```
Don't Compare Averages
WEA 2005
May 10 – May 13, Santorini Island, Greece
Holger Bast
Max-Planck-Institut für Informatik (MPII)
Saarbrücken, Germany
joint work with Ingmar Weber
Two famous quotes
There are three kinds of lies: lies, damn
lies, and statistics
Benjamin Disraeli, 1804 – 1881
(reported by Mark Twain)
Never believe any statistics you haven't
forged yourself
Winston Churchill, 1874 – 1965
A typical figure
[Figure: Y-axis: some cost measure; X-axis: input size; two curves labeled "Theirs" and "Ours", with the values 4 and 3 marked; each point represents an average over a number of iterations]
Changing the cost measure ...
▪ … by a monotone function, say from c to 2^c
[Figure: the marked averages are 4 and 3 on the c axis, and 15 and 10 on the 2^c axis]
This is from authentic data!
No deep mathematics here
▪ Even for strictly monotone f
– certainly E f(X) ≠ f(E X) in general
– but also E X ≤ E Y does not in general imply E f(X) ≤ E f(Y)
▪ Example
– X : 4 , 4 → average 4
– Y : 1 , 5 → average 3
– 2^X : 2^4 , 2^4 → average 16
– 2^Y : 2^1 , 2^5 → average 17
Examples of multiple cost measures
▪ Language modeling
– for a given probability distribution p_1, …, p_n
– find distribution q_1, …, q_n from a constrained class that
• minimizes cross-entropy Σ p_i log (p_i/q_i)
• minimizes perplexity Π (p_i/q_i)^(p_i) = 2^(cross-entropy)
▪ Algorithm A uses algorithm B as a subroutine
– B produces result of average quality q
– complexity of A depends on, say, q^2
Can this also happen with error bars?
▪ error bars for c don't overlap, yet reversal for f(c)?
[Figure: error bars of the two averages under c and under f(c)]
Yes, this can also happen!
Can this also happen with error bars?
▪ complete reversal with error bars?
[Figure: error bars under c and under f(c)]
[Figure: in a complete reversal, the bar end E X – δ X lies above E Y + δ Y under c, while E f(X) + δ f(X) lies below E f(Y) – δ f(Y) under f(c)]
δ Z = E |Z – E Z| (absolute deviation)
≤ σ Z = sqrt E (Z – E Z)^2 (standard deviation)
Theorem: complete reversal can never happen!
if E X – δ X ≥ E Y + δ Y
then E f(X) + δ f(X) ≥ E f(Y) – δ f(Y)
if only one of the four δ is dropped,
the theorem no longer holds in general
Our first proof
The canonical proof
1. The medians M X and M Y do commute with f …
Prob(X ≤ M X) = ½ = Prob( f(X) ≤ f(M X) )
→ f(M X) = M f(X) and f(M Y) = M f(Y)
2. … and hence cannot reverse their order
M X ≤ M Y → f(M X) ≤ f(M Y) (because f is monotone)
→ M f(X) ≤ M f(Y) (because M and f commute)
3. Expectation and median are related as
▪ |E X – M X| ≤ δ X = E |X – E X|
▪ |E Y – M Y| ≤ δ Y = E |Y – E Y|
(nothing new, but hardly any computer scientist seems to know it)
The canonical proof
▪ now assume this would happen:
E X – δ X ≥ E Y + δ Y, yet E f(X) + δ f(X) < E f(Y) – δ f(Y)
then M Y ≤ M X (by step 3)
yet M f(Y) > M f(X) (by step 3)
which contradicts the fact that the medians cannot reverse
Conclusion
▪ Average comparison is a deceptive thing
– even with error bars!
▪ There are more effects of this kind …
– e.g. non-overlapping error bars are not statistically significant for a particular order of the expectations (or medians)
– e.g. for normally distributed X, Y, Prob( X + δ X ≤ Y – δ Y | E X > E Y ) is up to 8%
Better always look at the complete histogram
and at least check maximum and minimum
Ευχαριστώ! (Thank you!)
```
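
The small example from the slides (X : 4, 4 and Y : 1, 5 under the cost change c → 2^c) can be checked numerically. The sketch below is only an illustration, not code from the talk: the helper names `abs_dev` and `f` are made up here, and `median_low` is used as the sample median on the assumption that a median which is always an actual data point is the right stand-in for M, since that is what makes it commute with a monotone increasing f on a finite sample.

```python
from statistics import mean, median_low


def abs_dev(z):
    """Mean absolute deviation of a sample: delta Z = E|Z - E Z|."""
    m = mean(z)
    return mean([abs(v - m) for v in z])


def f(c):
    """Monotone change of cost measure used on the slides: c -> 2^c."""
    return 2 ** c


# The two samples from the slides (hypothetical variable names).
X = [4, 4]
Y = [1, 5]
fX = [f(v) for v in X]  # [16, 16]
fY = [f(v) for v in Y]  # [2, 32]

# Averages: X beats Y under c (4 vs 3), but loses under 2^c (16 vs 17).
print("E X =", mean(X), " E Y =", mean(Y))
print("E f(X) =", mean(fX), " E f(Y) =", mean(fY))

# Step 1 of the canonical proof: the median commutes with monotone f.
print("M f(Y) =", median_low(fY), " f(M Y) =", f(median_low(Y)))

# Step 3: |E Z - M Z| <= delta Z, checked on all four samples.
for name, z in [("X", X), ("Y", Y), ("f(X)", fX), ("f(Y)", fY)]:
    gap = abs(mean(z) - median_low(z))
    print(name, "|E - M| =", gap, "<= delta =", abs_dev(z))
```

The averages swap their order while the medians do not, which is the whole argument in miniature. Note that `statistics.median` would average the two middle values of an even-sized sample and therefore would not commute with f in general, which is why `median_low` was chosen for this sketch.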