#### Transcript Lies, Damn Lies, and Statistics

##### Don't Compare Averages

WEA 2005, May 10 – May 13, Santorini Island, Greece

Holger Bast, Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany

Joint work with Ingmar Weber

##### Two famous quotes

"There are three kinds of lies: lies, damn lies, and statistics." Benjamin Disraeli, 1804 – 1881 (reported by Mark Twain)

"Never believe any statistics you haven't forged yourself." Winston Churchill, 1874 – 1965

##### A typical figure

[Figure: two curves labeled "Theirs" and "Ours". X-axis: input size. Y-axis: some cost measure. Each point represents an average over a number of iterations.]

##### Changing the cost measure ...

... by a monotone function, say from $c$ to $2^c$.

[Figure: the same measurements plotted on the $c$ scale and on the $2^c$ scale.]

This is from authentic data!

##### No deep mathematics here

Even for strictly monotone $f$:

- certainly $E f(X) \neq f(E X)$ in general
- but also $E X \le E Y$ does not in general imply $E f(X) \le E f(Y)$

Example:

- $X$: 4, 4 → average 4
- $Y$: 1, 5 → average 3
- $2^X$: $2^4$, $2^4$ → average 16
- $2^Y$: $2^1$, $2^5$ → average 17

##### Examples of multiple cost measures

Language modeling:

- for a given probability distribution $p_1, \dots, p_n$
- find a distribution $q_1, \dots, q_n$ from a constrained class that
  - minimizes cross-entropy $\sum_i p_i \log (p_i / q_i)$
  - minimizes perplexity $\prod_i (p_i / q_i)^{p_i} = 2^{\text{cross-entropy}}$

Algorithm A uses algorithm B as a subroutine:

- B produces a result of average quality $q$
- the complexity of A depends on, say, $q^2$

##### Can this also happen with error bars?

Error bars for $c$ don't overlap, yet reversal for $f(c)$?

[Figure: error bars on the $c$ scale and on the $f(c)$ scale.]

Yes, this can also happen!

Complete reversal with error bars?

[Figure: on the $c$ scale, $E X - \delta X$ lies above $E Y + \delta Y$; on the $f(c)$ scale, $E f(Y) - \delta f(Y)$ lies above $E f(X) + \delta f(X)$.]

Here $\delta Z = E\,|Z - E Z|$ is the absolute deviation and $\sigma Z = \sqrt{E (Z - E Z)^2}$ is the standard deviation, with $\delta Z \le \sigma Z$.

Theorem: complete reversal can never happen! If $E X - \delta X \ge E Y + \delta Y$, then $E f(X) + \delta f(X) \ge E f(Y) - \delta f(Y)$.
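The two-point example from the "No deep mathematics here" slide, and the claim that a complete reversal of error bars never happens, can be checked numerically. The following is a minimal sketch, not code from the talk: it assumes finite samples with equal weights, uses the slide's transformation $c \to 2^c$ as `f`, and the helper names (`avg`, `abs_dev`, `complete_reversal`, `count_complete_reversals`) as well as the random search over small integer samples are hypothetical choices made for illustration. Exact fractions are used so that borderline cases are not decided by rounding.

```python
import random
from fractions import Fraction

def avg(values):
    """Exact average E Z of a finite, equally weighted sample."""
    return sum(Fraction(v) for v in values) / len(values)

def abs_dev(values):
    """Absolute deviation delta Z = E|Z - E Z| of a finite sample."""
    m = avg(values)
    return avg([abs(Fraction(v) - m) for v in values])

def f(c):
    """Monotone change of cost measure used on the slides: c -> 2^c."""
    return 2 ** c

def complete_reversal(xs, ys):
    """True if X's bar lies above Y's bar, yet f(X)'s bar lies strictly below f(Y)'s."""
    fxs, fys = [f(v) for v in xs], [f(v) for v in ys]
    bars_separated = avg(xs) - abs_dev(xs) >= avg(ys) + abs_dev(ys)
    bars_reversed = avg(fxs) + abs_dev(fxs) < avg(fys) - abs_dev(fys)
    return bars_separated and bars_reversed

def count_complete_reversals(trials=2000, size=3, seed=0):
    """Random search over small integer samples (an assumption, not from the talk)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.randint(0, 8) for _ in range(size)]
        ys = [rng.randint(0, 8) for _ in range(size)]
        hits += complete_reversal(xs, ys)
    return hits

X, Y = [4, 4], [1, 5]
print(avg(X), avg(Y))                                  # 4 3
print(avg([f(v) for v in X]), avg([f(v) for v in Y]))  # 16 17
print(count_complete_reversals())
```

The first two lines of output reproduce the reversal of the averages (4 > 3 on the $c$ scale, 16 < 17 on the $2^c$ scale). The last line counts how many random sample pairs show a complete reversal of the error bars; a random search is of course only a sanity check, not a proof.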
If only one of the four $\delta$ is dropped, the theorem no longer holds in general.

##### Our first proof

[No text survives from this slide.]

##### The canonical proof

1. The medians $M X$ and $M Y$ do commute with $f$: $\text{Prob}(X \le M X) = \tfrac{1}{2} = \text{Prob}(f(X) \le f(M X))$, so $f(M X) = M f(X)$ and $f(M Y) = M f(Y)$.
2. ... and hence cannot reverse their order: $M X \le M Y$ implies $f(M X) \le f(M Y)$ (because $f$ is monotone), which implies $M f(X) \le M f(Y)$ (because $M$ and $f$ commute).
3. Expectation and median are related as $|E X - M X| \le \delta X = E\,|X - E X|$ and $|E Y - M Y| \le \delta Y = E\,|Y - E Y|$. Nothing new, but hardly any computer scientist seems to know it.

Now assume a complete reversal would happen:

[Figure: on the $c$ scale, $E X - \delta X$ lies above $E Y + \delta Y$; on the $f(c)$ scale, $E f(Y) - \delta f(Y)$ lies above $E f(X) + \delta f(X)$.]

Then $M Y \le M X$, yet $M f(Y) > M f(X)$, which contradicts the fact that the medians cannot reverse.

##### Conclusion

Average comparison is a deceptive thing, even with error bars!

There are more effects of this kind:

- e.g. non-overlapping error bars are not statistically significant for a particular order of the expectations (or medians)
- e.g. for normally distributed $X$, $Y$, $\text{Prob}(X + \delta X \le Y - \delta Y \mid E X > E Y)$ is up to 8%

Better always look at the complete histogram, and at least check maximum and minimum.

Ευχαριστώ! (Thank you!)
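The two facts that carry the canonical proof (the median commutes with a monotone $f$, and the expectation is within one absolute deviation of the median) can be illustrated on a small sample. This is a sketch under stated assumptions: the sample values are made up, `f` is the $c \to 2^c$ transformation from the earlier slides, the helper names `avg` and `abs_dev` are hypothetical, and the lower median from Python's `statistics.median_low` stands in for $M$.

```python
from fractions import Fraction
from statistics import median_low

def avg(values):
    """Exact average E Z of a finite, equally weighted sample."""
    return sum(Fraction(v) for v in values) / len(values)

def abs_dev(values):
    """Absolute deviation delta Z = E|Z - E Z| of a finite sample."""
    m = avg(values)
    return avg([abs(Fraction(v) - m) for v in values])

def f(c):
    """Monotone transformation c -> 2^c (the example used on the slides)."""
    return 2 ** c

sample = [1, 2, 2, 7, 9]            # made-up data for illustration
f_sample = [f(v) for v in sample]

m = median_low(sample)              # median on the original scale
m_f = median_low(f_sample)          # median on the transformed scale

# Step 1 of the proof: f(M X) = M f(X).
print(f(m) == m_f)

# Step 3 of the proof: |E X - M X| <= delta X, on both scales.
print(abs(avg(sample) - m) <= abs_dev(sample))
print(abs(avg(f_sample) - m_f) <= abs_dev(f_sample))
```

The average does not commute with `f` in this way, which is why the averages can reverse while the medians cannot.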