Probability and Statistics, part II


Binomial Probability Distribution
For the binomial distribution P is the probability of m successes out of N trials.
Here p is the probability of a success and q = 1 - p is the probability of a failure (there are only two choices in a binomial process).
Tossing a coin N times and asking for m heads is a binomial process.
The binomial coefficient keeps track of the number of ways (“combinations”) we can get the desired outcome.
2 heads in 4 tosses: HHTT, HTHT, HTTH, THHT, THTH, TTHH
P(m, N, p) = \frac{N!}{m!\,(N-m)!}\, p^m q^{N-m}

binomial coefficient: C_{N,m} = \binom{N}{m} = \frac{N!}{m!\,(N-m)!}

Does this formula make sense, e.g. if we sum over all possibilities do we get 1?
To show that this distribution is normalized properly, first remember the Binomial Theorem:

(a + b)^k = \sum_{l=0}^{k} \binom{k}{l} a^{k-l} b^l

For this example a = q = 1 - p and b = p, and (by definition) a + b = 1:

\sum_{m=0}^{N} P(m, N, p) = \sum_{m=0}^{N} \binom{N}{m} p^m q^{N-m} = (p + q)^N = 1

Thus the distribution is normalized properly.
What is the mean of this distribution?

\mu = \frac{\sum_{m=0}^{N} m\, P(m, N, p)}{\sum_{m=0}^{N} P(m, N, p)} = \sum_{m=0}^{N} m\, P(m, N, p) = \sum_{m=0}^{N} m \binom{N}{m} p^m q^{N-m}
A cute way of evaluating the above sum is to take the derivative:
\frac{\partial}{\partial p}\left[\sum_{m=0}^{N} \binom{N}{m} p^m (1-p)^{N-m}\right] = \frac{\partial}{\partial p}(1) = 0

\sum_{m=0}^{N} \binom{N}{m} \left[\, m p^{m-1} (1-p)^{N-m} - p^m (N-m)(1-p)^{N-m-1} \right] = 0

\sum_{m=0}^{N} \binom{N}{m} m p^{m-1} (1-p)^{N-m} = \sum_{m=0}^{N} \binom{N}{m} p^m (N-m)(1-p)^{N-m-1}

p^{-1} \sum_{m=0}^{N} \binom{N}{m} m p^m (1-p)^{N-m} = (1-p)^{-1}\left[\, N \sum_{m=0}^{N} \binom{N}{m} p^m (1-p)^{N-m} - \sum_{m=0}^{N} \binom{N}{m} m p^m (1-p)^{N-m} \right]

p^{-1}\mu = (1-p)^{-1}\left[\, N(1) - \mu \,\right]

\mu = Np
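Both results are easy to verify numerically. Here is a minimal Python sketch (the function name and the choice N = 10, p = 0.3 are just illustrative) that builds the binomial probabilities, then checks that they sum to 1 and that the mean comes out to Np:

```python
from math import comb

def binomial_pmf(m, N, p):
    """P(m, N, p) = C(N, m) * p^m * q^(N - m), with q = 1 - p."""
    q = 1.0 - p
    return comb(N, m) * p**m * q**(N - m)

N, p = 10, 0.3
probs = [binomial_pmf(m, N, p) for m in range(N + 1)]

print(sum(probs))                               # normalization: ~1.0
print(sum(m * P for m, P in enumerate(probs)))  # mean: ~N*p = 3.0
```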
Binomial Probability Distribution
What’s the variance of a binomial distribution?
Using a trick similar to the one used for the average we find:
\sigma^2 = \frac{\sum_{m=0}^{N} (m - \mu)^2\, P(m, N, p)}{\sum_{m=0}^{N} P(m, N, p)} = Npq
Detection efficiency and its “error”:
Suppose you observed m special events (or successes) in a sample of N events. The measured probability (sometimes called "efficiency") for a special event to occur is ε = m/N. What is the error (standard deviation σ_ε) in ε? Since N is a fixed quantity it is plausible (we will show it soon) that the error in ε is related to the error (standard deviation σ_m) in m by:
σ_ε = σ_m / N
This leads to:
σ_ε = σ_m / N = √(Npq)/N = √(Nε(1-ε))/N = √(ε(1-ε)/N)
This is sometimes called the "error on the efficiency".
Thus you want to have a sample (N) as large as possible to reduce the uncertainty in the probability measurement!
Note: σ_ε, the "error in the efficiency", → 0 as ε → 0 or ε → 1.
(This is NOT a gaussian σ, so don't stick it into a Gaussian pdf to calculate probability.)
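A short sketch of this in practice, assuming a toy experiment in which each of N events is independently "special" with some true probability (the names and numbers below are illustrative):

```python
import random
from math import sqrt

random.seed(1)
N, p_true = 1000, 0.25

# Count how many of the N events are "special".
m = sum(1 for _ in range(N) if random.random() < p_true)

eps = m / N                              # measured efficiency
sigma_eps = sqrt(eps * (1 - eps) / N)    # error on the efficiency

print(f"efficiency = {eps:.3f} +/- {sigma_eps:.3f}")
```

Doubling N shrinks σ_ε by roughly 1/√2, which is why a large sample is so valuable.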
Binomial Probability Distributions
When a γ-ray goes through material there is a chance that it will convert into an electron-positron pair, γ → e+e-.
Let's assume the probability for conversion is 10%.
If 100 γ's go through this material, on average how many will convert to e+e-?
μ = Np = 100(0.1) = 10 conversions
Consider the case where the γ's come from π0's. π0 → γγ most (98.8%) of the time.
We can ask the following:
What is the probability that both γ's will convert?
P(2) = Probability of 2/2 = (0.1)^2 = 0.01 = 1%
What is the probability that exactly one will convert?
P(1) = Probability of 1/2 = [2!/(1!1!)](0.1)^1(0.9)^1 = 18%
What is the probability that neither γ will convert?
P(0) = Probability of 0/2 = [2!/(0!2!)](0.1)^0(0.9)^2 = 81%
Note: P(2) + P(1) + P(0) = 100%
Finally, the probability of at least one conversion is:
P(≥1) = 1 - P(0) = 19%
Poisson Probability Distribution
Another important discrete distribution is the Poisson distribution. Consider the following conditions:
a) p is very small and approaches 0. For example, suppose we had a 100-sided die instead of a 6-sided die. Here p = 1/100 instead of 1/6. Suppose we had a 1000-sided die, p = 1/1000, etc.
b) N is very large, it approaches ∞. For example, instead of throwing 2 dice, we could throw 100 or 1000 dice.
c) The product Np is finite.
Classic examples: radioactive decay, the number of Prussian soldiers kicked to death by horses per year per army corps, quality control and failure-rate predictions.
A good example of the above conditions occurs when one considers radioactive decay.
Suppose we have 25 mg of an element. This is ≈ 10^20 atoms.
Suppose the lifetime (τ) of this element is 10^12 years ≈ 5x10^19 seconds.
The probability for a given nucleus to decay in one second is 1/τ = 2x10^-20 /sec.
For this example:
N = 10^20 (very large), p = 2x10^-20 (very small), Np = 2 (finite!)
We can derive an expression for the Poisson distribution by taking the appropriate limits of the binomial distribution.
P(m, N, p) = \frac{N!}{m!\,(N-m)!}\, p^m q^{N-m}

Using condition b) (N >> m) we obtain:

\frac{N!}{(N-m)!} = N(N-1)\cdots(N-m+1) \approx N^m

q^{N-m} = (1-p)^{N-m} = 1 - p(N-m) + \frac{p^2 (N-m)(N-m-1)}{2!} - \cdots \approx 1 - pN + \frac{(pN)^2}{2!} - \cdots \approx e^{-pN}

Putting this all together we obtain:

P(m, N, p) \to \frac{N^m p^m e^{-pN}}{m!} = \frac{e^{-\mu}\mu^m}{m!}

Here we've let μ = pN.
It is easy to show that:
μ = Np = mean of a Poisson distribution
σ^2 = Np = μ = variance of a Poisson distribution.
Note: m is always an integer ≥ 0; however, μ does not have to be an integer.
In a counting experiment, if you observe m events the estimate is μ = m ± √m.
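The limiting process can also be checked numerically. A minimal sketch comparing a binomial with large N and small p against a Poisson with the same μ = Np (the numbers are chosen only for illustration):

```python
from math import comb, exp, factorial

def binomial_pmf(m, N, p):
    return comb(N, m) * p**m * (1 - p)**(N - m)

def poisson_pmf(m, mu):
    return mu**m * exp(-mu) / factorial(m)

mu = 2.0
for N in (10, 100, 10000):            # N grows while Np = mu stays fixed
    p = mu / N
    max_diff = max(abs(binomial_pmf(m, N, p) - poisson_pmf(m, mu)) for m in range(10))
    print(N, max_diff)                # the difference shrinks as N grows
```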
Poisson Probability Distribution
Radioactivity Example:
a) What’s the probability of zero decays in one second if the average = 2 decays/sec?
P(0, 2) = \frac{e^{-2}\, 2^0}{0!} = \frac{e^{-2}\cdot 1}{1} = e^{-2} \approx 0.135 = 13.5\%
b) What’s the probability of more than one decay in one second if the average = 2 decays/sec?
P(>1, 2) = 1 - P(0, 2) - P(1, 2) = 1 - \frac{e^{-2}\, 2^0}{0!} - \frac{e^{-2}\, 2^1}{1!} = 1 - e^{-2} - 2e^{-2} \approx 0.594 = 59.4\%
c) Estimate the most probable number of decays/sec.
We want:

\frac{\partial}{\partial m} P(m, \mu)\Big|_{m = m^*} = 0

To solve this problem it is convenient to maximize ln P(m, μ) instead of P(m, μ).
\ln P(m, \mu) = \ln\!\left(\frac{e^{-\mu}\mu^m}{m!}\right) = -\mu + m\ln\mu - \ln m!

In order to handle the factorial when we take the derivative we use Stirling's Approximation:

\ln(m!) \approx m\ln(m) - m

(For example, ln 10! = 15.10 while 10 ln 10 - 10 = 13.03, off by about 14%; ln 50! = 148.48 while 50 ln 50 - 50 = 145.60, off by about 1.9%.)

\frac{\partial}{\partial m}\ln P(m, \mu)\Big|_{m=m^*} = \frac{\partial}{\partial m}\left(-\mu + m\ln\mu - \ln m!\right)\Big|_{m=m^*} \approx \frac{\partial}{\partial m}\left(-\mu + m\ln\mu - m\ln m + m\right)\Big|_{m=m^*} = \ln\mu - \ln m^* - 1 + 1 = 0

m^* = \mu
In this example the most probable value for m is just the average of the distribution. Therefore if you observed m events in an experiment, the error on m is σ ≈ √m.
Caution: The above derivation is only approximate since we used Stirling's Approximation, which is only valid for large m. Another subtle point is that strictly speaking m can only take on integer values while μ is not restricted to be an integer.
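The accuracy of Stirling's approximation (the 14% and 1.9% figures quoted above) is easy to check with the standard library, since ln m! = lgamma(m + 1); a quick sketch:

```python
from math import lgamma, log

for m in (10, 50, 1000):
    exact = lgamma(m + 1)          # ln(m!)
    approx = m * log(m) - m        # Stirling's approximation
    print(m, exact, approx, f"{abs(exact - approx) / exact:.1%}")
```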
[Figure: Comparison of binomial and Poisson distributions with mean μ = 1 — probability vs m for a binomial with N = 3, p = 1/3, a binomial with N = 10, p = 0.1, and a Poisson with μ = 1. Not much difference between them here!]
Poisson Probability Distribution
Counting the number of cosmic rays that pass through a detector in a 15 sec interval:
The data are compared with a Poisson using the measured average number of cosmic rays passing through the detector in eighty-one 15 sec intervals (μ = 5.4).
Error bars are (usually) calculated using √ni (ni = number in a bin). Why?
Assume we have N total counts and the probability to fall in bin i is pi.
For a given bin we have a binomial distribution (you're either in or out).
The expected average number in a given bin is Npi and the variance is Npi(1-pi) = ni(1-pi).
If we have a lot of bins then the probability of an event falling into any one bin is small, so (1-pi) ≈ 1 and the error bar is ≈ √ni.
[Figure: Histogram of the number of occurrences vs the number of cosmic rays counted in a 15 sec interval, compared with a Poisson with μ = 5.4.]

number of cosmic rays in a 15 sec interval: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
number of occurrences: 0, 2, 9, 11, 8, 10, 17, 6, 8, 6, 3, 0, 0, 1

In our example the largest pi = 17/81 = 0.21, so the largest correction is (1 - 0.21)^{1/2} = 0.88.
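A sketch of how such a comparison might be coded, treating each bin as Poisson and using √ni error bars (the occurrences list is the table above):

```python
from math import exp, factorial, sqrt

# occurrences of 0, 1, 2, ..., 13 cosmic rays in eighty-one 15 sec intervals
occurrences = [0, 2, 9, 11, 8, 10, 17, 6, 8, 6, 3, 0, 0, 1]
N_intervals = sum(occurrences)                                     # 81
mu = sum(m * n for m, n in enumerate(occurrences)) / N_intervals   # ~5.4

for m, n in enumerate(occurrences):
    expected = N_intervals * mu**m * exp(-mu) / factorial(m)  # Poisson prediction
    error = sqrt(n)                                           # usual sqrt(n_i) error bar
    print(m, n, round(expected, 1), round(error, 1))
```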
Gaussian Probability Distribution
The Gaussian probability distribution (or “bell shaped curve” or Normal
distribution) is perhaps the most used distribution in all of science. Unlike the
binomial and Poisson distributions, the Gaussian is a continuous distribution. It is
given by:
p(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}

with μ = mean of distribution (also at the same place as mode and median)
σ^2 = variance of distribution
y is a continuous variable (-∞ < y < ∞)
The probability (P) of y being in the range [a, b] is given by an integral:
P(a \le y \le b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy
Since this integral cannot be evaluated in closed form for arbitrary a and b (at least
no one's figured out how to do it in the last couple of hundred years) the values of
the integral have to be looked up in a table.
The total area under the curve is normalized to one.
In terms of the probability integral we have:
P(-\infty \le y \le \infty) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy = 1
Quite often we talk about a measurement being a certain number of standard deviations (σ) away from the mean (μ) of the Gaussian.
We can associate a probability with a measurement lying more than nσ from the mean, |y - μ| > nσ, just by calculating the area outside of this region.
n      Prob. of exceeding ±nσ
0.67   0.5
1      0.32
2      0.05
3      0.003
4      0.00006

It is very unlikely (< 0.3%) that a measurement taken at random from a gaussian pdf will be more than 3σ from the true mean of the distribution.
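The entries in this table follow from the complementary error function; a minimal sketch:

```python
from math import erfc, sqrt

def prob_exceeding(n):
    """Probability that a Gaussian measurement lies more than n sigma from the mean."""
    return erfc(n / sqrt(2.0))

for n in (0.67, 1, 2, 3, 4):
    print(n, prob_exceeding(n))   # ~0.50, 0.32, 0.046, 0.0027, 0.000063
```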
Central Limit Theorem
Why is the gaussian pdf so important ?
“Things that are the result of the addition of lots of small effects tend to become Gaussian”
The above is a crude statement of the Central Limit Theorem:
A more exact statement is:
Let Y1, Y2, ... Yn be an infinite sequence of independent random variables each with the same probability distribution. Suppose that the mean (μ) and variance (σ^2) of this distribution are both finite. Then for any numbers a and b:
\lim_{n\to\infty} P\!\left(a \le \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2}\, dy

(Actually, the Y's can be from different pdf's!)
Thus the C.L.T. tells us that under a wide range of circumstances the probability
distribution that describes the sum of random variables tends towards a Gaussian
distribution as the number of terms in the sum → ∞.
Alternatively, in terms of the sample mean \bar{Y} = (Y_1 + Y_2 + \cdots + Y_n)/n:

\lim_{n\to\infty} P\!\left(a \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le b\right) = \lim_{n\to\infty} P\!\left(a \le \frac{\bar{Y} - \mu}{\sigma_m} \le b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2}\, dy

Note: σ_m = σ/√n is sometimes called "the error in the mean" (more on that later).
For the CLT to be valid:
The μ and σ of the pdf must be finite.
No one term in the sum should dominate the sum.
Central Limit Theorem
Best illustration of the CLT.
a) Take 12 numbers (ri) from your computer’s random number generator
b) add them together
c) Subtract 6
d) get a number that is from a gaussian pdf !
The computer's random number generator gives numbers distributed uniformly in the interval [0,1].
A uniform pdf in the interval [0,1] has μ = 1/2 and σ^2 = 1/12.
Take Y = \sum_{i=1}^{12} r_i, so n = 12, μ = 1/2, σ^2 = 1/12, nμ = 6 and σ√n = 1. The CLT then gives:

P\!\left(a \le \frac{\sum_{i=1}^{12} r_i - n\mu}{\sigma\sqrt{n}} \le b\right) = P\!\left(a \le \frac{\sum_{i=1}^{12} r_i - 12(1/2)}{\sqrt{1/12}\,\sqrt{12}} \le b\right) = P\!\left(a \le \sum_{i=1}^{12} r_i - 6 \le b\right)

With a = -6 and b = +6:

P\!\left(-6 \le \sum_{i=1}^{12} r_i - 6 \le +6\right) = \frac{1}{\sqrt{2\pi}} \int_{-6}^{+6} e^{-y^2/2}\, dy

Thus the sum of 12 uniform random numbers minus 6 is distributed as if it came from a gaussian pdf with μ = 0 and σ = 1.
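A minimal Monte Carlo sketch of this recipe: add 12 uniform random numbers, subtract 6, and check that the result has mean ≈ 0, standard deviation ≈ 1, and Gaussian-like tails:

```python
import random
from math import sqrt

random.seed(0)
samples = [sum(random.random() for _ in range(12)) - 6 for _ in range(5000)]

mean = sum(samples) / len(samples)
sigma = sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
tail = sum(abs(x) > 3 for x in samples) / len(samples)   # ~0.003 for a true Gaussian

print(mean, sigma, tail)
```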
[Figure: Histograms of A) 5000 random numbers; B) 5000 pairs (r1 + r2) of random numbers; C) 5000 triplets (r1 + r2 + r3) of random numbers; D) 5000 12-plets (r1 + ... + r12) of random numbers; E) 5000 12-plets (r1 + ... + r12 - 6) of random numbers compared with a Gaussian with μ = 0 and σ = 1. In this case, 12 is close to ∞!]
Central Limit Theorem
Example: An electromagnetic calorimeter is being made out of a sandwich of lead and plastic scintillator. There are 25 pieces of lead and 25 pieces of plastic, and each piece is nominally 1 cm thick. The spec on the thickness is ±0.5 mm and the machining error is uniform in [-0.5, 0.5] mm. The calorimeter has to fit inside an opening of 50.1 cm. What is the probability that it won't fit?
Since the machining errors come from a uniform distribution with a well defined mean and variance, the Central Limit Theorem is applicable:

\lim_{n\to\infty} P\!\left(a \le \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}y^2}\, dy

The upper limit corresponds to many large machining errors, all +0.5 mm:

b = \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} = \frac{50(0.5) - 50(0)}{\sqrt{1/12}\,\sqrt{50}} = 12.2

The lower limit corresponds to a total machining error of 1 mm, the stack just filling the opening:

a = \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} = \frac{1 - 50(0)}{\sqrt{1/12}\,\sqrt{50}} = 0.49

The probability for the stack to be thicker than 50.1 cm is:

P = \frac{1}{\sqrt{2\pi}} \int_{0.49}^{12.2} e^{-\frac{1}{2}y^2}\, dy \approx 0.31

There's a 31% chance the calorimeter won't fit inside the box! (and a 100% chance someone will get fired if it doesn't fit inside the box…)
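The 31% can be cross-checked with a direct Monte Carlo of the 50 machining errors (a sketch; errors in mm, uniform in [-0.5, 0.5], and the stack fails to fit when the total error exceeds the 1 mm of slack):

```python
import random

random.seed(0)
n_trials, n_pieces, slack_mm = 200_000, 50, 1.0

n_too_thick = sum(
    1 for _ in range(n_trials)
    if sum(random.uniform(-0.5, 0.5) for _ in range(n_pieces)) > slack_mm
)
print(n_too_thick / n_trials)   # ~0.31
```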
When Doesn’t the Central Limit Theorem Apply?
Case I) PDF does not have a well defined mean or variance.
The Breit-Wigner distribution does not have a well defined variance!
BW(m) = \frac{1}{2\pi}\,\frac{\Gamma}{(m - m_0)^2 + (\Gamma/2)^2}

Describes the shape of a resonance, e.g. the K*.

normalized: \int_{-\infty}^{\infty} BW(m)\, dm = 1

well defined average: \int_{-\infty}^{\infty} m\, BW(m)\, dm = m_0

undefined variance since: \int_{-\infty}^{\infty} m^2\, BW(m)\, dm = \infty
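One way to see this in practice: draw samples from a Breit-Wigner (Cauchy) pdf and watch the sample variance refuse to settle down as the sample grows. A sketch (the m0 and Γ values are just illustrative, roughly the K*(892)):

```python
import math
import random

random.seed(0)
m0, gamma = 0.892, 0.051        # illustrative resonance mass and width (GeV)

def bw_sample():
    """Draw one value from a Breit-Wigner (Cauchy) pdf by inverting its CDF."""
    u = random.random()
    return m0 + (gamma / 2.0) * math.tan(math.pi * (u - 0.5))

for n in (10**3, 10**5, 10**6):
    xs = [bw_sample() for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    print(n, var)               # the sample variance does not converge as n grows
```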
Case II) Physical process where one term in the sum dominates the sum.
i) Multiple scattering: as a charged particle moves through material it undergoes
many elastic ("Rutherford") scatterings. Most scatterings produce small angular deflections
(dσ/dΩ ~ θ^-4) but every once in a while a scattering produces a very large deflection.
If we neglect the large scatterings, the angle θ_plane is gaussian distributed.
The mean θ depends on the material thickness and the particle's charge and momentum.
ii) The spread in range of a stopping particle (straggling).
A small number of collisions where the particle loses a lot
of its energy dominates the sum.
iii) Energy loss of a charged particle going through a gas.
Described by a “Landau” distribution (very long “tail”).