Transcript Statistics

Statistics
Large Systems

• Macroscopic systems involve large numbers of particles.
  • Microscopic determinism
  • Macroscopic phenomena
• The basis is in the mechanics of the individual molecules.
  • Classical and quantum
• Consider 1 g of He as an ideal gas.
  • N = 1.5 × 10²³ atoms
  • Use only position and momentum: 3 + 3 = 6 coordinates per atom
  • Total of 9 × 10²³ variables, requiring about 4 × 10⁹ PB of storage
• Find the total kinetic energy, with $K = (p_x^2 + p_y^2 + p_z^2)/2m$ per atom.
  • About 100 operations per collision
  • At 100 GFlops, one set of collisions takes 9 × 10¹⁴ s ≈ 3 × 10⁷ yr
• Statistical thermodynamics provides the bridge between the microscopic and macroscopic levels.
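As a sanity check on these back-of-envelope figures, a minimal Python sketch; the 4-bytes-per-variable storage and one-collision-update-per-coordinate assumptions are mine, chosen because they reproduce the slide's numbers:

```python
# Back-of-envelope cost of tracking 1 g of He atom by atom.
avogadro = 6.022e23
n_atoms = (1.0 / 4.0) * avogadro      # 1 g / ~4 g/mol -> ~1.5e23 atoms
n_vars = 6 * n_atoms                  # x, y, z, px, py, pz per atom

bytes_total = 4 * n_vars              # assume 4-byte floats
print(f"variables: {n_vars:.1e}")                 # ~9e23
print(f"storage:   {bytes_total / 1e15:.1e} PB")  # ~4e9 PB

ops = 100 * n_vars                    # ~100 ops per collision, one per variable
seconds = ops / 100e9                 # at 100 GFlops
print(f"time: {seconds:.1e} s = {seconds / 3.15e7:.1e} yr")  # ~9e14 s ~ 3e7 yr
```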
Ensemble

• Computing time averages for large systems is infeasible.
• Imagine a large number of similar systems.
  • Prepared identically
  • Independent
• This ensemble of systems can be used to derive the theoretical properties of a single system.
Probability

• A probability is often stated before the fact.
  • A priori assertion: theoretical
  • 50% probability of heads on a coin toss
• Probability can also reflect the statistics of many events.
  • 25% probability that 10 coins show 5 heads
  • Fluctuations, where a trial is not exactly 50% heads
• Probability can be used after the fact to describe a measurement.
  • A posteriori assertion: experimental
  • Fraction of coins that were heads in a series of samples
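A quick simulation contrasting the two senses of probability; the trial count and seed are arbitrary choices:

```python
import random
from math import comb

# A priori: C(10,5)/2^10, the theoretical chance of 5 heads in 10 tosses.
p_theory = comb(10, 5) / 2**10        # 0.246, the ~25% quoted above

# A posteriori: the observed fraction over many simulated trials.
random.seed(1)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if sum(random.randint(0, 1) for _ in range(10)) == 5)
print(f"a priori:     {p_theory:.3f}")       # 0.246
print(f"a posteriori: {hits / trials:.3f}")  # close to 0.246
```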
Head Count
trial   #heads   trial   #heads
  1       5       11       5
  2       8       12       1
  3       6       13       5
  4       5       14       5
  5       6       15       6
  6       6       16       6
  7       1       17       2
  8       5       18       4
  9       7       19       6
 10       4       20       6

• Take a set of experimental trials.
  • N: number of trials
  • n: number of values (bins)
  • i: a specific trial (1 … N)
  • j: a specific value (1 … n)
• Use 10 coins and 20 trials.
Distribution

[Histogram: distribution function f(x) vs x for the coin data, x = 0 … 10]

• Sorting the trials by value forms a distribution.
• The distribution function f counts the occurrences in a bin:

$f(x) = \sum_{i=1}^{N} \delta(x_i - x)$

• The mean is a measure of the center of the distribution.
  • Mathematical average:

$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$

  • Coin distribution ⟨x⟩ = 4.95
• Median: the midway value
  • Coin median = 5
• Mode: the most frequent value
  • Coin mode = 6
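A minimal sketch computing the distribution function and the three central measures from the Head Count data:

```python
from collections import Counter
from statistics import mean, median, mode

# Coin-toss data from the Head Count table (trials 1-20).
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]

f = Counter(heads)                # distribution function: counts per bin
print([f[x] for x in range(11)])  # [0, 2, 1, 0, 2, 6, 7, 1, 1, 0, 0]
print(mean(heads))                # 4.95
print(median(heads))              # 5.0
print(mode(heads))                # 6
```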
Probability Distribution

[Histogram: probability distribution P(x) vs x for the coin data]

• The distribution function has a sum equal to the number of trials N.
• A probability distribution P normalizes the distribution function by N.
  • Sum is 1
• The mean can be expressed in terms of the probability:

$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n} x_j\, \delta(x_i - x_j)$

$\bar{x} = \sum_{j=1}^{n} P_j x_j$
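The same data normalized to a probability distribution; a short check that it sums to 1 and reproduces the mean:

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)

# P_j = f_j / N: normalize the distribution function by N.
P = {x: heads.count(x) / N for x in range(11)}
print(sum(P.values()))           # 1.0
print(sum(P[x] * x for x in P))  # 4.95, the mean via sum_j P_j x_j
```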
Subsample
(Coin-toss data repeated from the Head Count table.)
• Subsamples of the data may differ in their central value.
  • First five trials: 5, 8, 6, 5, 6
  • Mean 6.0
  • Median 6
  • Mode 5 and 6, not unique
• Experimental probability depends on the sample.
• Theoretical probability predicts the result for an infinitely large sample.
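A quick check of the first-five-trials subsample using Python's statistics module:

```python
from statistics import mean, median, multimode

first_five = [5, 8, 6, 5, 6]   # trials 1-5 from the table
print(mean(first_five))        # 6.0
print(median(first_five))      # 6
print(multimode(first_five))   # [5, 6]: mode not unique
```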
Deviation

• Individual trials differ from the mean.
• The deviation is the difference between a trial and the mean:

$\Delta x_i = x_i - \bar{x}$

• The mean deviation is zero:

$\overline{\Delta x} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}) = \frac{1}{N}\sum_{i=1}^{N} x_i - \bar{x} = 0$

• The fluctuation is the mean of the squared deviations.
  • The fluctuation is the variance: the standard deviation squared.

$\overline{\Delta x^2} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \overline{x^2} - \bar{x}^2$
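A short check that the two forms of the variance agree on the coin data:

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]
N = len(heads)
xbar = sum(heads) / N

# Variance two ways: mean squared deviation, and <x^2> - <x>^2.
var1 = sum((x - xbar) ** 2 for x in heads) / N
var2 = sum(x * x for x in heads) / N - xbar ** 2
print(xbar, var1, var2)   # 4.95 3.1475 3.1475
```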
Correlation

• Events may not be random, but related to other events.
  • Time measured by trial
• The correlation function measures the mean of the product of related deviations:

$C_k = \frac{1}{N-k}\sum_{i=1}^{N-k}(x_i - \bar{x})(x_{i+k} - \bar{x}) = \overline{\Delta x_i\, \Delta x_{i+k}}$

$C_k = \frac{1}{N-k}\sum_{i=1}^{N-k} x_i x_{i+k} - \bar{x}^2$

  • Autocorrelation: C₀
• Different variables can be correlated:

$C_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y}) = \overline{\Delta x\, \Delta y}$
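A minimal sketch of the second form of C_k, applied to the coin data; it reproduces the values quoted on the next slide:

```python
heads = [5, 8, 6, 5, 6, 6, 1, 5, 7, 4,
         5, 1, 5, 5, 6, 6, 2, 4, 6, 6]

def autocorr(x, k):
    """C_k = (1/(N-k)) * sum_i x_i x_{i+k} - xbar^2."""
    N = len(x)
    xbar = sum(x) / N
    return sum(x[i] * x[i + k] for i in range(N - k)) / (N - k) - xbar ** 2

c0, c1 = autocorr(heads, 0), autocorr(heads, 1)
print(c0, c1, c1 / c0)   # 3.1475  -0.345  -0.11
```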
Independent Trials
(Coin-toss data repeated from the Head Count table.)
• Autocorrelation within a sample is the variance.
  • Coin experiment: C₀ = 3.147
• Nearest-neighbor correlation tests for randomness.
  • Coin experiment: C₁ = -0.345
  • Much less than C₀
  • Ratio C₁ / C₀ = -0.11
• Periodic systems have a peak in Cₜ at some period t.
Correlation Measure

• Independent trials should peak strongly at 0.
  • No connection to subsequent events
  • No periodic behavior
• "This sample autocorrelation plot shows that the time series is not random, but rather has a high degree of autocorrelation between adjacent and near-adjacent observations." (nist.gov)
Continuous Distribution

• Data that is continuously distributed is treated with an integral.
  • Probability still normalized to 1:

$N = \int dx\, f(x), \qquad P(x) = \frac{f(x)}{N}$

• The mean and variance are given by the moments.
  • First moment: mean
  • Second moment: variance

$\bar{x} = \int dx\, P(x)\, x$

$C_0 = \int dx\, P(x)\, x^2 - \bar{x}^2$

• Correlation uses a time integral:

$C(\tau) = \int dt\, \Delta x(t)\, \Delta x(t + \tau)$
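A numerical check of the moment integrals on an assumed continuous distribution; the Gaussian with m = 2, s = 0.5 is my example, not from the slides:

```python
import math

m, s = 2.0, 0.5
def P(x):
    # Normalized Gaussian probability density.
    return math.exp(-(x - m) ** 2 / (2 * s ** 2)) / (math.sqrt(2 * math.pi) * s)

# Simple Riemann sum over a range wide enough to capture the density.
dx = 0.001
xs = [i * dx - 5 for i in range(10001)]       # -5 ... 5
norm = sum(P(x) for x in xs) * dx             # zeroth moment
mean = sum(P(x) * x for x in xs) * dx         # first moment
var = sum(P(x) * x * x for x in xs) * dx - mean ** 2  # second moment - mean^2
print(norm, mean, var)   # ~1.0, ~2.0, ~0.25
```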
Joint Probability

[Venn diagram: overlapping sets A and B, with C = A ∩ B]

• The probabilities of two systems may be related.
• The intersection A ∩ B indicates that both conditions are true.
  • Independent probability → P(A ∩ B) = P(A)P(B)
• The union A ∪ B indicates that either condition is true.
  • P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
  • P(A) + P(B), if exclusive
Joint Tosses

 x    P(x)
 0    0
 1    0.10
 2    0.05
 3    0
 4    0.10
 5    0.30
 6    0.35
 7    0.05
 8    0.05
 9    0
10    0

• Define two classes from the coin toss experiment.
  • A = {x < 5}
  • B = {2 < x < 8}
• Individual probabilities are a union of discrete bins.
  • P(A) = 0.25, P(B) = 0.80
  • P(A ∪ B) = 0.95
• Dependent sets don't follow the product rule.
  • P(A ∩ B) = 0.10 ≠ P(A)P(B)
Conditional Probability

[Venn diagram: C = A | B, the part of A lying within B]

• The probability of an occurrence within a subset is a conditional probability.
  • Probability with respect to the subset:
  • P(A | B) = P(A ∩ B) / P(B)
• Use the same subsets for the coin toss example.
  • P(A | B) = 0.10 / 0.80 ≈ 0.13
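A short check of the joint and conditional probabilities from the last two slides:

```python
# P(x) table from the Joint Tosses slide.
P = {0: 0, 1: 0.10, 2: 0.05, 3: 0, 4: 0.10, 5: 0.30,
     6: 0.35, 7: 0.05, 8: 0.05, 9: 0, 10: 0}

A = {x for x in P if x < 5}          # A = {x < 5}
B = {x for x in P if 2 < x < 8}      # B = {2 < x < 8}

pA = sum(P[x] for x in A)            # 0.25
pB = sum(P[x] for x in B)            # 0.80
p_union = sum(P[x] for x in A | B)   # 0.95
p_inter = sum(P[x] for x in A & B)   # 0.10, not pA * pB = 0.20
print(pA, pB, p_union, p_inter, p_inter / pB)  # last value: P(A|B) = 0.125
```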
Combinatorics

• The probability that n specific occurrences happen is the product of the individual probabilities.
  • Other events don't matter.
  • Separate probability for negative events
• Exactly n specific events happen, each with probability p:

$P = p^n$

• No events happen except the specific events:

$P = q^{N-n}$

• An arbitrary choice of events requires permutations: select n arbitrary events from a pool of N identical types.

$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$
Binomial Distribution

• Treat events as a Bernoulli process with discrete trials (mathworld.wolfram.com).
  • N separate trials
  • Trials independent
  • Binary outcome for each trial
  • Probability the same for all trials
• The general form is the binomial distribution.
  • Terms are the same as in the binomial expansion
  • Probabilities normalized

$P_n = \binom{N}{n} p^n q^{N-n}$

$\sum_{n=0}^{N} P_n = (p + q)^N = 1$
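A sketch evaluating P_n for the 10-coin case (p = q = 1/2) and checking normalization:

```python
from math import comb

# Binomial distribution for N tosses of a fair coin.
N, p = 10, 0.5
q = 1 - p
P = [comb(N, n) * p**n * q**(N - n) for n in range(N + 1)]
print(P[5])    # 0.246: the ~25% chance of 5 heads quoted earlier
print(sum(P))  # 1.0: normalization, (p + q)^N = 1
```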
Mean and Standard Deviation

• The mean m of the binomial distribution:

$m = \sum_{n=0}^{N} n P_n = \sum_{n=0}^{N} n \binom{N}{n} p^n q^{N-n}$

• Consider an arbitrary x, differentiate, and set x = 1:

$(px + q)^N = \sum_{n=0}^{N} \binom{N}{n} p^n x^n q^{N-n}$

$Np(px + q)^{N-1} = \sum_{n=0}^{N} n x^{n-1} P_n$

$Np = \sum_{n=0}^{N} n P_n = m$

• The standard deviation s of the binomial distribution:

$s^2 = \sum_{n=0}^{N} (n - m)^2 P_n$

$s^2 = \sum (n^2 P_n - 2mn P_n + m^2 P_n)$

$s^2 = \sum n^2 P_n - 2m \sum n P_n + m^2 \sum P_n$

$s^2 = [N(N-1)p^2 + m] - 2m^2 + m^2$

$s^2 = N^2 p^2 - Np^2 + Np - N^2 p^2$

$s^2 = Np(1 - p) = Npq$
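A numerical check of m = Np and s² = Npq by direct summation, using the same 10-coin example:

```python
from math import comb

N, p = 10, 0.5
q = 1 - p
P = [comb(N, n) * p**n * q**(N - n) for n in range(N + 1)]

# Mean and variance computed directly from the distribution.
m = sum(n * P[n] for n in range(N + 1))
s2 = sum((n - m) ** 2 * P[n] for n in range(N + 1))
print(m, N * p)       # 5.0 5.0
print(s2, N * p * q)  # 2.5 2.5
```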
Poisson Distribution

• Many processes are marked by rare occurrences.
  • Large N, small n, small p:

$\binom{N}{n} = \frac{N!}{n!\,(N-n)!} \approx \frac{N^n}{n!}$

$q^{N-n} \approx q^N = (1 - p)^N = 1 - Np + \frac{N(N-1)}{2!}p^2 - \cdots \approx 1 - Np + \frac{(Np)^2}{2!} - \cdots = e^{-Np}$

• This is the Poisson distribution.
  • The probability depends on only one parameter, Np.
  • Normalized when summed from n = 0 to ∞:

$P_n = \binom{N}{n} p^n q^{N-n} \approx \frac{(Np)^n}{n!} e^{-Np}$
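A comparison of the exact binomial with its Poisson limit; N = 1000 and p = 0.002 are assumed example values, chosen to give Np = 2:

```python
from math import comb, exp, factorial

# Rare events: binomial with large N, small p vs Poisson with the same Np.
N, p = 1000, 0.002
mu = N * p
for n in range(6):
    binom = comb(N, n) * p**n * (1 - p)**(N - n)
    poisson = mu**n * exp(-mu) / factorial(n)
    print(n, round(binom, 4), round(poisson, 4))
# The two columns agree to about three decimal places.
```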
Poisson Properties

• The mean and standard deviation are simply related.
  • Mean m = Np; variance s² = m, so s = √m.
• Unlike the binomial distribution, the Poisson function has values for n > N.
Poisson Away From Zero

• The Poisson distribution is based on the mean m = Np.
  • Assumed N ≫ 1, N ≫ n
• Now assume that n ≫ 1, m is large, and Pₙ ≫ 0 only over a narrow range.
• This generates a normal, or Gaussian, distribution.

Let x = n - m:

$P_x = \frac{m^{m+x} e^{-m}}{(m+x)!} = \frac{m^m\, m^x\, e^{-m}}{m!\,[(m+x)!/m!]}$

Use Stirling's formula, $m! \approx \sqrt{2\pi m}\, m^m e^{-m}$:

$P_x = \frac{m^x}{\sqrt{2\pi m}\,[(m+1)\cdots(m+x)]} = \frac{1}{\sqrt{2\pi m}\,[(1 + 1/m)\cdots(1 + x/m)]}$

$P_x \approx \frac{1}{\sqrt{2\pi m}\,[(e^{1/m})\cdots(e^{x/m})]} \approx \frac{e^{-x^2/2m}}{\sqrt{2\pi m}}$
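A check of the Gaussian approximation against the exact Poisson; the mean m = 100 is an assumed example value:

```python
from math import exp, factorial, pi, sqrt

m = 100
for n in (80, 90, 100, 110, 120):
    poisson = m**n * exp(-m) / factorial(n)   # exact Poisson P_n
    x = n - m
    gauss = exp(-x * x / (2 * m)) / sqrt(2 * pi * m)  # Gaussian approximation
    print(n, f"{poisson:.5f}", f"{gauss:.5f}")
# The Gaussian tracks the Poisson closely near the mean.
```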
Normal Distribution

[Plot: normal distribution P(x) vs x, centered on the mean m]

• The full normal distribution separates the mean m and standard deviation s parameters:

$f(x) = \frac{1}{\sqrt{2\pi}\, s}\, e^{-(x-m)^2/2s^2}$

• Tables provide the integral of the distribution function.
• Useful benchmarks:
  • P(|x - m| < 1s) = 0.683
  • P(|x - m| < 2s) = 0.954
  • P(|x - m| < 3s) = 0.997
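The benchmark probabilities follow from the error function, since P(|x - m| < ks) = erf(k/√2); a one-line check with math.erf:

```python
from math import erf, sqrt

# Probability of falling within k standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(erf(k / sqrt(2)), 3))   # 0.683, 0.954, 0.997
```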