Transcript Experiments

Probability and Statistics
Many of the processes involved in the detection of particles are statistical in nature:
Number of ion pairs created when a proton goes through 1 cm of gas
Energy lost by an electron going through 1 mm of lead
The understanding and interpretation of all experimental data depend
on statistical and probabilistic concepts:
“The result of the experiment was inconclusive so we had to use statistics”
how do we extract the best value of a quantity from a set of measurements?
how do we decide if our experiment is consistent/inconsistent with a given theory?
how do we decide if our experiment is internally consistent?
how do we decide if our experiment is consistent with other experiments?
how do we decide if we have a signal (i.e. evidence for a new particle)?
Our pentaquark example from the SPring-8 (LEPS) experiment:
signal: 19 events
Significance: 4.6σ (assuming Gaussian statistics, the probability for a 4.6σ effect is ~4×10⁻⁶)
K⁺n = Θ₅⁺ = dudus̄
What are the authors trying to say here?
If this bump is accidental, then the accident rate is about 4 in a million (roughly 1 in 250,000).
Or
If I repeated the experiment 250,000 times I would expect to get a bump this big or bigger about once.
What do the authors want you to think?
Since the accident rate is so low it must not be an accident, therefore it is physics!
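The quoted probability can be checked with a few lines of Python. This is a minimal sketch, assuming the significance is converted with a two-sided Gaussian tail (the slide does not say which convention was used); the function name is made up for illustration.

```python
import math

def two_sided_gaussian_tail(n_sigma: float) -> float:
    """Probability that a Gaussian variable lies more than n_sigma from its mean (both tails)."""
    return math.erfc(n_sigma / math.sqrt(2.0))

p_46 = two_sided_gaussian_tail(4.6)   # the LEPS significance quoted above
print(f"4.6 sigma: p = {p_46:.2e}, i.e. about 1 in {1.0 / p_46:,.0f}")
```

A one-sided convention would give half this probability, so the "1 in N" statement depends on which tail definition is meant.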
880.P20 Winter 2006
Richard Kass
Probability & Statistics & Reality
Sometimes it is not a question of statistical significance!
Again, the pentaquark state Θ⁺(1540) gives a great example:
Consider the CLAS experiment at JLAB:
2003/4: report a 7.8σ effect (probability ~6×10⁻¹⁵ according to MATHEMATICA)
2005: report NO signal! (better experiment)
[Figure: CLAS 2003/2004 and CLAS 2005 (g11) results]
What size signal should we expect?
Lesson: This is not a statistics issue, but one of experiment design and implementation.
How do we define Probability?
Definition of probability by example (“empirical”):
(One can find 4 definitions of probability: mathematical, empirical, objective, and subjective; see STATISTICS by Barlow.)
Suppose we have N trials and a specified event occurs r times.
example: the trial could be rolling a die and the event could be rolling a 6.
define probability (P) of an event (E) occurring as:
P(E) = r/N when N → ∞
examples:
six-sided die: P(6) = 1/6
for an honest die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6 (“objective” probability)
coin toss:
P(heads) = P(tails) = 0.5
P(heads) should approach 0.5 the more times you toss the coin.
For a single coin toss we can never get P(heads) = 0.5, since the observed fraction of heads is either 0 or 1!
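The empirical definition P(E) = r/N can be tried out numerically. The sketch below assumes Python's built-in generator stands in for an honest die; the function name and the seed are arbitrary choices, not part of the lecture.

```python
import random

def empirical_probability(n_trials: int, seed: int = 12345) -> float:
    """Roll a fair six-sided die n_trials times and return r/N for the event 'rolled a 6'."""
    rng = random.Random(seed)
    r = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return r / n_trials

# r/N should wander toward 1/6 as N grows
for n in (10, 1_000, 100_000):
    print(n, empirical_probability(n))
```

For small N the ratio can be far from 1/6; only the large-N limit defines the probability.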
By definition probability (P) is a non-negative real number bounded by 0 ≤ P ≤ 1 (“mathematical” probability)
if P = 0 then the event never occurs
if P = 1 then the event always occurs
Let A and B be subsets of S, then P(A) ≥ 0, P(B) ≥ 0
example: A = {1,2,3}, B = {1,3,5}
intersection: A∩B = {1,3}
union: A∪B = {1,2,3,5}
Events are independent if: P(A∩B) = P(A)P(B)
Coin tosses are independent events: the result of the next toss does not depend on the previous toss.
Events are mutually exclusive (disjoint) if: P(A∩B) = 0 or P(A∪B) = P(A) + P(B)
In tossing a coin we either get a head or a tail.
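The sets A = {1,2,3} and B = {1,3,5} from above can be used to test the independence and mutual-exclusivity conditions directly. This sketch assumes the sample space S is the six faces of an honest die, which the slide does not state explicitly.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}      # assumed sample space: faces of a fair die
A = {1, 2, 3}
B = {1, 3, 5}

def prob(event: set) -> Fraction:
    """Probability of an event when every outcome in S is equally likely."""
    return Fraction(len(event & S), len(S))

p_and = prob(A & B)          # intersection {1, 3}
p_or = prob(A | B)           # union {1, 2, 3, 5}
independent = (p_and == prob(A) * prob(B))
mutually_exclusive = (p_and == 0)
print(p_and, p_or, independent, mutually_exclusive)
```

Under this assumption A and B are neither independent nor mutually exclusive, because P(A∩B) = 1/3 while P(A)P(B) = 1/4.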
Sum (or integral) of all probabilities, if they are mutually exclusive, must = 1:
$\sum_i P(x_i) = 1$ or $\int p(x)\,dx = 1$, where p(x) = probability density function (pdf)
Two Types of Probability
Probability can be a discrete or a continuous variable.
Discrete probability: P can have certain values only.
examples:
tossing a six-sided die: P(x_i) = P_i, here x_i = 1, 2, 3, 4, 5, 6 and P_i = 1/6 for all x_i.
tossing a coin: only 2 choices, heads or tails.
NOTATION: x_i is called a random variable.
For both of the above discrete examples (and in general), when we sum over all mutually exclusive possibilities:
$\sum_i P(x_i) = 1$
Continuous probability: P can be any number between 0 and 1.
define a “probability density function”, pdf, f(x):
$f(x)\,dx = dP(x \le a \le x + dx)$ with a a continuous variable
Probability for x to be in the range a ≤ x ≤ b is:
$P(a \le x \le b) = \int_a^b f(x)\,dx$ (Probability = “area under the curve”)
Just like the discrete case the sum of all probabilities must equal 1:
$\int_{-\infty}^{+\infty} f(x)\,dx = 1$
We say that f(x) is normalized to one.
Probability for x to be exactly some number is zero since:
$\int_{x=a}^{x=a} f(x)\,dx = 0$
Note: in the above example the pdf depends on only 1 variable, x. In general, the pdf can depend on many
variables, i.e. f=f(x,y,z,..). In these cases the probability is calculated from a multi-dimensional integration.
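The "area under the curve" statements can be illustrated numerically. The pdf used here, f(x) = 2x on [0, 1], is a hypothetical example chosen only because its integrals are easy to check by hand; the midpoint-rule integrator is likewise just one simple choice.

```python
def f(x: float) -> float:
    """Hypothetical pdf for illustration: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(func, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule estimate of the integral of func from a to b."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

total = integrate(f, 0.0, 1.0)       # normalization: should be 1
p_range = integrate(f, 0.2, 0.5)     # P(0.2 <= x <= 0.5) = 0.5**2 - 0.2**2
p_point = integrate(f, 0.3, 0.3)     # zero-width range: probability of an exact value
print(total, p_range, p_point)
```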
Some Common Probability Distributions
Examples of some common P(x)’s and f(x)’s:
Discrete = P(x): binomial, Poisson
Continuous = f(x): uniform (i.e. = constant), Gaussian, exponential, chi square
How do we describe a probability distribution?
mean, mode, median, and variance
For a continuous distribution these quantities are defined by:
Mean (average): $\mu = \int_{-\infty}^{+\infty} x f(x)\,dx$
Mode (most probable): $\left.\frac{\partial f(x)}{\partial x}\right|_{x=a} = 0$
Median (50% point): $0.5 = \int_{-\infty}^{a} f(x)\,dx$
Variance (width of distribution): $\sigma^2 = \int_{-\infty}^{+\infty} f(x)(x-\mu)^2\,dx$
For a discrete distribution the mean and variance are defined by:
$\mu = \sum_{i=1}^{n} x_i / n$
$\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 / n$
Some Continuous Probability Distributions
Remember: Probability is the area under these curves!
For many pdfs the integral cannot be done in closed form; use a table to calculate the probability.
For a Gaussian pdf the mean, mode, and median are all at the same x.
For many pdfs the mean, mode, and median are in different places.
[Figure: chi-square distribution]
[Figure: Student t distribution; ν = 1 is the Cauchy (Breit-Wigner) distribution, ν = ∞ is the Gaussian]
Uniform distribution and Random Numbers
What is a uniform probability distribution: p(x)?
p(x) = constant (c) for a ≤ x ≤ b
p(x) = zero everywhere else
Therefore p(x1)dx1 = p(x2)dx2 if dx1 = dx2 ⇒ equal intervals give equal probabilities
For a uniform distribution with a = 0, b = 1 we have p(x) = 1:
$\int_0^1 p(x)\,dx = 1 \Rightarrow \int_0^1 c\,dx = c\int_0^1 dx = c = 1$
What is a random number generator?
It returns a number picked at random from a uniform distribution with limits [0,1]
All major computer languages (FORTRAN, C) come with a random number generator.
FORTRAN: RAN(iseed)
The following FORTRAN program generates 5 random numbers:
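      iseed = 12345          ! seed for the generator
      do i = 1, 5
         y = ran(iseed)      ! RAN is a vendor (DEC/VAX) extension, not standard FORTRAN
         type *, y           ! TYPE is also a DEC extension; standard FORTRAN uses PRINT *, y
      enddo
      end
Output:
0.1985246
0.8978736
0.2382888
0.3679854
0.3817045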
If we generate “a lot” of random numbers, all equal intervals should contain
(approximately) the same amount of numbers. For example:
generate: 10⁶ random numbers
expect: ~10⁵ numbers in [0.0, 0.1]
~10⁵ numbers in [0.45, 0.55]
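The equal-interval claim is easy to test. The sketch below uses Python's `random` module rather than the FORTRAN RAN routine, so the individual numbers will differ from those listed above; the seed is an arbitrary choice.

```python
import random

rng = random.Random(12345)            # fixed seed so the run is repeatable
n_total = 1_000_000
n_low = 0                             # count in [0.0, 0.1)
n_mid = 0                             # count in [0.45, 0.55)
for _ in range(n_total):
    y = rng.random()                  # uniform in [0, 1)
    if y < 0.1:
        n_low += 1
    elif 0.45 <= y < 0.55:
        n_mid += 1
print(n_low, n_mid)
```

Both counts should come out near 10⁵, with fluctuations of a few hundred, since each interval has width 0.1.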
Uniform Random Numbers & Monte Carlo
The uniform pdf is the basis of all Monte Carlo Calculations!
The Monte Carlo method is commonly used to simulate experiments.
A google search yielded 3,960,000 hits for “Monte Carlo Method”
First reference (http://www.geocities.com/CollegePark/Quad/2435/history.html):
“The Monte Carlo method provides approximate solutions to a variety of mathematical problems by
performing statistical sampling experiments on a computer. The method applies to problems with no
probabilistic content as well as to those with inherent probabilistic structure. Among all numerical methods
that rely on N-point evaluations in M-dimensional space to produce an approximate solution, the Monte
Carlo method has absolute error of estimate that decreases as N superscript -1/2 whereas, in the absence of
exploitable special structure all others have errors that decrease as N superscript -1/M at best.”
“The method is called after the city in the Monaco principality, because of a roulette, a simple random
number generator. The name and the systematic development of Monte Carlo methods dates from about
1944.”
“The real use of Monte Carlo methods as a research tool stems from work on the atomic bomb during the
second world war. This work involved a direct simulation of the probabilistic problems concerned with
random neutron diffusion in fissile material”
Basically, it is a way to do really complicated integrals!
A certain molecule always has a rectangular shape.
However, the length of each side varies uniformly between 0.5 and 1 Å (so each side has pdf = 2 on this interval).
Calculate the probability that the area of a molecule is ≥ 0.5 Å²:
$P = \int_{0.5}^{1} \int_{0.5/x}^{1} 2 \cdot 2 \,dy\,dx$
Suppose I want the volume to be ≥ 0.5 Å³?
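The molecule problem is a natural Monte Carlo exercise. The sketch below assumes the sides are independent and that "area is 0.5 Å²" means area ≥ 0.5 Å², as the integration limits suggest; for two sides the double integral evaluates to 2(1 − ln 2), which the simulation can be compared against. The function name and sample size are illustrative choices.

```python
import math
import random

def product_probability(n_sides: int, threshold: float = 0.5,
                        n_samples: int = 200_000, seed: int = 12345) -> float:
    """Fraction of trials in which the product of n_sides uniform(0.5, 1) lengths is >= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _trial in range(n_samples):
        product = 1.0
        for _side in range(n_sides):
            product *= 0.5 + 0.5 * rng.random()   # a + (b - a) * RAN with a = 0.5, b = 1
        if product >= threshold:
            hits += 1
    return hits / n_samples

p_area = product_probability(2)           # rectangle: compare with the integral
p_volume = product_probability(3)         # box: no closed form attempted here
exact_area = 2.0 * (1.0 - math.log(2.0))  # value of the double integral
print(p_area, exact_area, p_volume)
```

Going from area to volume only changes `n_sides`, which is the practical appeal of the method when the integral limits become awkward.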
Uniform Random Numbers & Monte Carlo
Given the uniform pdf we can generate all other pdfs!
For example: if RAN is uniform in (0,1) then a+(b-a)*RAN is uniform in (a,b)
Suppose we want to generate random numbers according to a pdf = p(x) starting
from our uniform random number pdf (= r, with r = 1 on (0,1)).
Let r0 = random number uniform in (0,1); we want to find the x that satisfies:
$\int_0^{r_0} r\,dx' = \int_0^{x} p(x')\,dx' \;\Rightarrow\; r_0 = \int_0^{x} p(x')\,dx'$ (“inversion”)
Whether or not we can actually invert this equation depends on whether or not
p(x) can be integrated in closed form.
Works for exponential pdf: $p(x) = \frac{e^{-x/\lambda}}{\lambda} \Rightarrow \int_0^x p(x')\,dx' = 1 - e^{-x/\lambda} \Rightarrow x = -\lambda \ln(1 - r_0)$
Does not work for a Gaussian, since its integral cannot be done in closed form.
BUT there are lots of other clever ways of generating pdfs, e.g. Gaussian: $g = \sin(2\pi r_1)\,(-2\ln r_2)^{1/2}$
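Both recipes above can be sketched in a few lines. The scale parameter name `lam`, the seed, and the sample size are assumptions made for this illustration; the Gaussian formula is the one quoted above, with r1 and r2 two independent uniform random numbers.

```python
import math
import random

rng = random.Random(12345)

def sample_exponential(lam: float) -> float:
    """Inversion: solve r0 = 1 - exp(-x/lam) for x, with r0 uniform in [0, 1)."""
    r0 = rng.random()
    return -lam * math.log(1.0 - r0)

def sample_gaussian() -> float:
    """Unit Gaussian from two uniform numbers: g = sin(2*pi*r1) * sqrt(-2 ln r2)."""
    r1 = rng.random()
    r2 = 1.0 - rng.random()          # in (0, 1], which avoids log(0)
    return math.sin(2.0 * math.pi * r1) * math.sqrt(-2.0 * math.log(r2))

n = 100_000
exp_mean = sum(sample_exponential(2.0) for _ in range(n)) / n
gauss = [sample_gaussian() for _ in range(n)]
gauss_mean = sum(gauss) / n
gauss_var = sum(g * g for g in gauss) / n - gauss_mean ** 2
print(exp_mean, gauss_mean, gauss_var)
```

The exponential sample mean should approach the scale parameter (2.0 here), and the Gaussian samples should have mean near 0 and variance near 1.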
When all else fails, can use “acceptance-rejection”:
[Figure: p(x) versus x, scaled so that the maximum is 1]
1) normalize p(x): p(x)/pmax, so that the max. of the function = 1
2) pick a random number to represent x and calculate p(x)/pmax
3) pick another random number = y
4) if y < p(x)/pmax accept x, otherwise reject
5) repeat 2)-4) lots of times…
Problem: this algorithm can be very inefficient.
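The five steps above can be sketched as follows. The target pdf p(x) = 2x on [0, 1] is a hypothetical example (its mean is 2/3 and half of all candidates should be rejected), and the function name is made up for this illustration.

```python
import random

def accept_reject(pdf, p_max: float, n_accept: int, seed: int = 12345):
    """Return n_accept samples on [0, 1] distributed like pdf, plus the acceptance efficiency."""
    rng = random.Random(seed)
    samples = []
    tries = 0
    while len(samples) < n_accept:
        tries += 1
        x = rng.random()                 # step 2: candidate x
        y = rng.random()                 # step 3: second random number
        if y < pdf(x) / p_max:           # step 4: accept, otherwise reject
            samples.append(x)
    return samples, n_accept / tries

samples, efficiency = accept_reject(lambda x: 2.0 * x, p_max=2.0, n_accept=50_000)
mean = sum(samples) / len(samples)
print(mean, efficiency)
```

The efficiency equals the area under p(x)/pmax, so a sharply peaked pdf wastes most of the random numbers, which is the inefficiency noted above.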
Discrete Probability Distributions
Calculation of mean and variance:
example: a discrete data set consisting of three numbers: {1, 2, 3}
average (μ) is just:
$\mu = \sum_{i=1}^{n} \frac{x_i}{n} = \frac{1+2+3}{3} = 2$
Complication: suppose some measurements are more precise than others.
Let each measurement x_i have a weight w_i associated with it, then:
$\mu = \sum_{i=1}^{n} x_i w_i \Big/ \sum_{i=1}^{n} w_i$ (“weighted average”)
Real life example: 3 measurements of the branching fraction of B⁻ → D⁰K*⁻
variance (σ²) or average squared deviation from the mean is:
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$
σ is called the standard deviation
rewrite the above expression by expanding the summations:
$\sigma^2 = \frac{1}{n}\left[\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} \mu^2 - 2\mu\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n} x_i^2 + \mu^2 - 2\mu^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2$
This is sometimes written as: <x²> - <x>², with <> the average of whatever is in the brackets.
The variance describes the width of the pdf!
Note: The n in the denominator would be n-1 if we determined the average (μ) from the data itself.
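The definitions of mean, variance, and weighted average translate directly into code. In this sketch the data set {1, 2, 3} is the one from the example above, while the weights passed to `weighted_mean` are hypothetical values chosen only to show the effect of unequal precision.

```python
def mean(xs):
    """Plain average: sum of x_i divided by n."""
    return sum(xs) / len(xs)

def variance(xs, mu_from_data: bool = False):
    """Average squared deviation from the mean; divides by n-1 when mu_from_data is True."""
    mu = mean(xs)
    n = len(xs)
    return sum((x - mu) ** 2 for x in xs) / (n - 1 if mu_from_data else n)

def weighted_mean(xs, ws):
    """Weighted average: sum of x_i * w_i divided by the sum of the weights."""
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

data = [1, 2, 3]
mu = mean(data)
var = variance(data)
var_alt = mean([x * x for x in data]) - mu ** 2      # <x^2> - <x>^2
wmean = weighted_mean(data, [1.0, 1.0, 2.0])         # hypothetical weights
print(mu, var, var_alt, variance(data, True), wmean)
```

The two variance expressions agree, and giving the third measurement twice the weight pulls the average from 2 toward 3.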
Discrete & Continuous Probability Distributions
Using the definition of σ² from above we have for our example of {1,2,3}:
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2 = 4.67 - 2^2 = 0.67$
The case where the measurements have different weights is more complicated:
$\sigma^2 = \sum_{i=1}^{n} w_i (x_i - \mu)^2 \Big/ \sum_{i=1}^{n} w_i = \sum_{i=1}^{n} w_i x_i^2 \Big/ \sum_{i=1}^{n} w_i - \mu^2$
Here μ is the weighted mean.
If we calculated μ from the data, σ² gets multiplied by a factor n/(n-1).

Example: a continuous probability distribution, f(x) = c sin²x for 0 ≤ x ≤ 2π, c = constant
This “pdf” has two modes!
It has the same mean and median, but they differ from the mode(s).
f(x) = sin²x is not a true pdf since it is not normalized!
f(x) = (1/π) sin²x is a normalized pdf (c = 1/π).
Note: $\int_0^{2\pi} \sin^2 x\,dx = \pi$
mean: $\mu = \int_0^{2\pi} x \sin^2 x\,dx \Big/ \int_0^{2\pi} \sin^2 x\,dx = \pi$
mode: $\frac{\partial}{\partial x}\sin^2 x = 0 \Rightarrow x = \frac{\pi}{2}, \frac{3\pi}{2}$
median: $\int_0^{a} \sin^2 x\,dx \Big/ \int_0^{2\pi} \sin^2 x\,dx = \frac{1}{2} \Rightarrow a = \pi$
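The normalization, mean, median, and modes of the sin²x example can be cross-checked numerically. This is a rough sketch using a simple midpoint rule; the helper names are illustrative and the step count is an arbitrary choice.

```python
import math

def pdf(x: float) -> float:
    """Normalized pdf from the example: f(x) = sin^2(x) / pi on [0, 2*pi]."""
    return math.sin(x) ** 2 / math.pi

def integrate(func, a: float, b: float, n: int = 20_000) -> float:
    """Midpoint-rule estimate of the integral of func from a to b."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

norm = integrate(pdf, 0.0, 2.0 * math.pi)                      # should be 1
mu = integrate(lambda x: x * pdf(x), 0.0, 2.0 * math.pi)       # mean, should be pi
cdf_at_pi = integrate(pdf, 0.0, math.pi)                       # should be 0.5, so median = pi
print(norm, mu, cdf_at_pi, pdf(math.pi / 2), pdf(3 * math.pi / 2))
```

The pdf vanishes at the mean and median (x = π) and peaks at the two modes, which shows how different these three summaries can be.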
Probability, Set Theory and Stuff
The relationships and results from set theory are essential to the understanding of probability.
Below are some definitions and examples that illustrate the connection between set theory,
probability and statistics.
We define an experiment as a process that generates “observations” and a sample space (S)
as the set of all possible outcomes from the experiment:
simple event: only one possible outcome
compound event: more than one outcome
As an example of simple and compound events consider particles (e.g. protons, neutrons) made
of u (“up”), d (“down”), and s (“strange”) quarks. The u quark has electric charge (Q) =2/3|e|
(e=charge of electron) while the d and s quarks have charge =-1/3|e|.
Let the experiment be the ways we combine 3 quarks to make a Q=0, 1, or 2 state.
Event A: Q=0 {ssu, ddu, sdu} (note: a neutron is a ddu state)
Event B: Q=1 {suu, duu} (note: a proton is a duu state)
Event C: Q=2 {uuu}
For this example events A and B are compound while event C is simple.
The following definitions from set theory are used all the time in the discussion of probability.
Let A and B be events in a sample space S.
Union: The union of A & B (A∪B) is the event consisting of all outcomes in A or B.
Intersection: The intersection of A & B (A∩B) is the event consisting of all outcomes in A and B.
Complement: The complement of A (A´) is the set of outcomes in S not contained in A.
Mutually exclusive: If A & B have no outcomes in common they are mutually exclusive.
Probability, Set Theory and Stuff
Returning to our example of particles containing 3 quarks (“baryons”):
The event consisting of charged particles with Q=1,2 is the union of B and C: B∪C
The events A, B, C are mutually exclusive since they do not have any particles in common.
A common and useful way to visualize union, intersection, and mutually exclusive is
to use a Venn diagram of sets A and B defined in space S:
[Figure: four Venn diagrams in space S: sets A & B; A∩B, the intersection of A & B; A∪B, the union of A & B; A & B mutually exclusive]
The axioms of probabilities (P):
a) For any event A, P(A)≥0. (no negative probabilities allowed)
b) P(S)=1.
c) If A1, A2, …, An is a collection of mutually exclusive events then: $P(A_1 \cup A_2 \cup \dots \cup A_n) = \sum_{i=1}^{n} P(A_i)$
(the collection can be infinite (n=∞))
From the above axioms we can prove the following useful propositions:
a) For any event A: P(A)=1-P(A´)
b) If A & B are mutually exclusive then P(A∩B)=0
c) For any two events A & B: P(A∪B)=P(A)+P(B)-P(A∩B)
(items b, c are “obvious” from their Venn diagrams)
Probability, Set Theory and Stuff
Example: Everyone likes pizza.
Assume the probability of having pizza for lunch is 40%, the probability of having
pizza for dinner is 70%, and the probability of having pizza for lunch and dinner is
30%. Also, this person always skips breakfast. We can recast this example using:
P(A)= probability of having pizza for lunch =40%
P(B)= probability of having pizza for dinner = 70%
P(A∩B) = 30% (pizza for lunch and dinner)
1) What is the probability that pizza is eaten at least once a day?
The key words are “at least once”; this means we want the union of A & B:
P(A∪B) = P(A)+P(B)-P(A∩B) = 0.4+0.7-0.3 = 0.8 (prop. c)
2) What is the probability that pizza is not eaten on a given day?
Not eating pizza (Z´) is the complement of eating pizza (Z), so P(Z)+P(Z´)=1
P(Z´) = 1-P(Z) = 1-0.8 = 0.2 (prop. a)
3) What is the probability that pizza is only eaten once a day?
This can be visualized by looking at the Venn diagram and realizing we need to exclude the
overlap (intersection) region:
P(A∪B)-P(A∩B) = 0.8-0.3 = 0.5
[Figure: Venn diagram of “pizza for lunch” and “pizza for dinner”]
The non-overlapping blue area is pizza for lunch, no pizza for dinner.
The non-overlapping red area is pizza for dinner, no pizza for lunch.
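The three pizza answers follow from the propositions in a couple of lines. The sketch below only restates the arithmetic above; the variable names are illustrative.

```python
p_lunch = 0.40     # P(A)
p_dinner = 0.70    # P(B)
p_both = 0.30      # P(A and B)

p_at_least_once = p_lunch + p_dinner - p_both      # union, prop. c
p_never = 1.0 - p_at_least_once                    # complement, prop. a
p_exactly_once = p_at_least_once - p_both          # union minus the overlap

# the two non-overlapping regions of the Venn diagram
p_lunch_only = p_lunch - p_both
p_dinner_only = p_dinner - p_both
print(p_at_least_once, p_never, p_exactly_once, p_lunch_only, p_dinner_only)
```

The "exactly once" answer is also the sum of the two non-overlapping regions, which is a useful consistency check.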
Conditional Probability
Frequently we must calculate a probability assuming something else has occurred.
This is called conditional probability.
Here’s an example of conditional probability:
Suppose a day of the week is chosen at random. The probability the day is Thursday is 1/7.
P(Thursday)=1/7
Suppose we also know the day is a weekday. Now the probability is conditional, =1/5.
P(Thursday|weekday)=1/5
the notation is: probability of it being Thursday given that it is a weekday
Formally, we define the conditional probability of A given B has occurred as:
P(A|B) = P(A∩B)/P(B)
We can use this definition to calculate the intersection of A and B:
P(A∩B) = P(A|B)P(B)
For the case where the Ai’s are both mutually exclusive and exhaustive we have:
$P(B) = P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \dots + P(B|A_n)P(A_n) = \sum_{i=1}^{n} P(B|A_i)P(A_i)$
For our example let B = the day is a Thursday, A1 = weekday, A2 = weekend, then:
P(Thursday) = P(Thursday|weekday)P(weekday) + P(Thursday|weekend)P(weekend)
P(Thursday) = (1/5)(5/7) + (0)(2/7) = 1/7
Bayes’s Theorem
Bayes’s Theorem relates conditional probabilities. It is widely used
in many areas of the physical and social sciences.
Let A1, A2, …, An be a collection of mutually exclusive and exhaustive events with
P(Ai) > 0 for all i.
Then for any other event B with P(B) > 0 we have:
$P(A_i|B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{n} P(B|A_j)P(A_j)}$
In the day-of-the-week example, P(Ai|B) is the probability it is a weekday given it is Thursday,
and P(B|Ai) is the probability it is Thursday given it is a weekday.
We call: P(Aj) the a priori probability of Aj occurring
P(Aj|B) the posterior probability that Aj will occur given that B has occurred
P(B|Aj) the likelihood
Independence has a special meaning in probability:
Events A and B are said to be independent if P(A|B)=P(A)
Using the definition of conditional probability A and B are independent iff:
P(A∩B) = P(A)P(B)
Independence Example
Let’s consider a situation where we are trying to determine the number of
events (N) we have in our data sample. This is a “classic” problem that comes in
many different situations:
ancient times: scanning emulsion or bubble/spark chamber photos
modern times: using computer algorithms to find rare events (top, beyond, GKZ, B→ππ, etc.)
Consider the case where “scan 1” finds N1 events and “scan 2” finds N2 events.
The number of events found by both scans is N12.
What can we say about the total number of events, N?
$P(1) = \frac{N_1}{N}, \quad P(2) = \frac{N_2}{N}, \quad P(1 \cap 2) = \frac{N_{12}}{N}$
[Figure: Venn diagram showing P(1), P(2), and their overlap P(1∩2)]
If the scans are independent then: P(1∩2) = P(1)P(2)
$P(1)P(2) = P(1 \cap 2) \Rightarrow \frac{N_1}{N}\frac{N_2}{N} = \frac{N_{12}}{N} \Rightarrow N = \frac{N_1 N_2}{N_{12}}$
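The double-scan estimate is a one-line formula, but it helps to see it with numbers. The scan counts below are hypothetical (the lecture gives none), and the function name is made up for this sketch; the result is only valid if the two scans really are independent.

```python
def estimate_total(n1: int, n2: int, n12: int) -> float:
    """Estimate the true number of events N = N1 * N2 / N12 from two independent scans."""
    if n12 == 0:
        raise ValueError("no events found by both scans; N cannot be estimated")
    return n1 * n2 / n12

# hypothetical scan results
n1, n2, n12 = 90, 80, 72
n_total = estimate_total(n1, n2, n12)
eff1 = n1 / n_total      # P(1): efficiency of scan 1
eff2 = n2 / n_total      # P(2): efficiency of scan 2
print(n_total, eff1, eff2)
```

With these numbers the scans have efficiencies of 90% and 80%, and the overlap fraction N12/N equals their product, as independence requires.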
Example of Bayes’s Theorem
While Bayes’s theorem is very useful in physics, perhaps the best
illustration of its use is in medical statistics, especially drug testing.
Assume a certain drug test:
gives a positive result 97% of the time when the drug is present:
P(positive test|drug present)=0.97
gives a positive result 0.4% of the time if the drug is not present (“false positive”)
P(positive test|drug not present)=0.004
Let’s assume that the drug is present in 0.5% of the population (1 out of 200 people).
P(drug present)=0.005
P(drug not present)=1-P(drug present)=0.995
What is the probability that the drug is not present given that you have a positive test?
P(drug not present|positive test) = ????
Bayes’s Theorem gives:
$P(\text{drug not present}\,|\,\text{test positive}) = \frac{P(\text{test positive}\,|\,\text{drug not present})\,P(\text{drug not present})}{P(\text{test positive}\,|\,\text{drug not present})\,P(\text{drug not present}) + P(\text{test positive}\,|\,\text{drug present})\,P(\text{drug present})}$
$P(\text{drug not present}\,|\,\text{test positive}) = \frac{(0.004)(1-0.005)}{(0.004)(1-0.005) + (0.97)(0.005)} = 0.45$
Thus there is a 45% chance that you are drug free even though the test came back positive!
The real life consequence of this large probability is that drug tests are often administered twice!
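The drug-test calculation, and the benefit of testing twice, can be sketched as below. The numbers are the ones given above; treating repeated tests as independent (so that their probabilities multiply) is an assumption of this sketch, not something the lecture states.

```python
def p_drug_free_given_positives(n_tests: int = 1,
                                prevalence: float = 0.005,
                                p_pos_present: float = 0.97,
                                p_pos_absent: float = 0.004) -> float:
    """Bayes's theorem: P(drug not present | n_tests positive results in a row)."""
    free = (p_pos_absent ** n_tests) * (1.0 - prevalence)   # false positives
    user = (p_pos_present ** n_tests) * prevalence          # true positives
    return free / (free + user)

one_test = p_drug_free_given_positives(1)
two_tests = p_drug_free_given_positives(2)    # assumes the two tests are independent
print(one_test, two_tests)
```

Under that assumption, a second positive result drops the chance of being drug free from about 45% to well under 1%.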
Another Example of Bayes’s Theorem
Assume we are trying to separate pions from kaons using (e.g.) a Cerenkov counter (CC)
99% of the time when a pion goes through our CC we get a signal: P(signal|pion) = 0.99
5.5% of the time when a kaon goes through our CC we get a signal: P(signal|kaon) = 0.055
Suppose we are interested in looking for τ⁻ → π⁻ην
This decay of the τ-lepton has never been observed; it is an example of a 2nd class current! (wrong “G-parity”)
Theory says BF(τ⁻ → π⁻ην) = 1.5×10⁻⁵ (ranges from 1.2-1.6×10⁻⁵).
There is a similar decay τ⁻ → K⁻ην that has been measured with BF(τ⁻ → K⁻ην) = 27×10⁻⁵
$P(\tau \to \pi\eta\nu) = \frac{BF(\tau \to \pi\eta\nu)}{BF(\tau \to K\eta\nu) + BF(\tau \to \pi\eta\nu)} = \frac{1.5\times10^{-5}}{27\times10^{-5} + 1.5\times10^{-5}} = 0.05$
$P(\tau \to K\eta\nu) = \frac{BF(\tau \to K\eta\nu)}{BF(\tau \to K\eta\nu) + BF(\tau \to \pi\eta\nu)} = \frac{27\times10^{-5}}{27\times10^{-5} + 1.5\times10^{-5}} = 0.95$
Suppose we have an event where our CC gives us a signal.
What is the probability that the event is τ⁻ → π⁻ην?
$P(\tau \to \pi\eta\nu\,|\,\text{signal}) = \frac{P(\text{signal}|\pi)\,P(\tau \to \pi\eta\nu)}{P(\text{signal}|K)\,P(\tau \to K\eta\nu) + P(\text{signal}|\pi)\,P(\tau \to \pi\eta\nu)}$
$P(\tau \to \pi\eta\nu\,|\,\text{signal}) = \frac{(0.99)(0.05)}{(0.055)(0.95) + (0.99)(0.05)} = 0.49$
Thus there is a ~49% chance that an event with a CC signal is τ⁻ → π⁻ην.
Bayes’s Theorem Continued
How can we improve the signal to noise (S/N)?
Suppose we add (or use) another independent particle detector where
90% of the time when a pion goes through this detector we get a signal
15% of the time when a kaon goes through this detector we get a signal
This is not such a great detector. But it really helps:
P(signal|pion) = (0.99)(0.9) = 0.89
P(signal|kaon) = (0.055)(0.15) = 0.00825
$P(\tau \to \pi\eta\nu\,|\,\text{signal}) = \frac{(0.89)(0.05)}{(0.00825)(0.95) + (0.89)(0.05)} = 0.85$
Thus there is now a ~85% chance that an event with a signal in both detectors is τ⁻ → π⁻ην.
Adding particle ID has improved our S/N:
No particle ID: S/N = 1.5/27 = 0.06
One PID detector: S/N = 0.49/0.51 = 0.96
Two PID detectors: S/N = 0.85/0.15 = 5.7
$S/N = \frac{P(\text{signal}|\pi)\,P(\tau \to \pi\eta\nu)}{P(\text{signal}|K)\,P(\tau \to K\eta\nu)} = (\text{detector})(\text{physics})$
It is usually more practical to build two moderately good PID detectors than one
super high quality PID detector.
In BaBar we do PID with a Cerenkov counter and the drift chamber.
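The one-detector and two-detector numbers can be reproduced with a short sketch. It uses the rounded prior probabilities 0.05 and 0.95 from the slides, so the results match the quoted 0.49 and 0.85; using the unrounded branching fractions would shift them slightly. Multiplying the two detectors' efficiencies assumes they respond independently, and the function names are illustrative.

```python
def p_pion_given_signal(eff_pi: float, eff_k: float, prior_pi: float = 0.05) -> float:
    """Bayes's theorem: probability that an event with a PID signal is the pion mode."""
    prior_k = 1.0 - prior_pi
    numerator = eff_pi * prior_pi
    return numerator / (numerator + eff_k * prior_k)

def signal_to_noise(p_signal: float) -> float:
    """S/N defined as P(pion mode | signal) / P(kaon mode | signal)."""
    return p_signal / (1.0 - p_signal)

one_det = p_pion_given_signal(0.99, 0.055)                 # Cerenkov counter only
two_det = p_pion_given_signal(0.99 * 0.90, 0.055 * 0.15)   # both detectors fire (assumed independent)
print(one_det, signal_to_noise(one_det))
print(two_det, signal_to_noise(two_det))
```

The second detector rejects kaons by only a factor of about 7, yet S/N improves by roughly a factor of 6, which is the point made above about combining moderately good detectors.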
Accuracy and Precision
Accuracy: The accuracy of an experiment refers to how close the experimental measurement
is to the true value of the quantity being measured.
Precision: This refers to how well the experimental result has been determined, without
regard to the true value of the quantity being measured.
Just because an experiment is precise it does not mean it is accurate!!
example: measurements of the neutron lifetime over the years:
[Figure: various measurements of the neutron lifetime over the years. The size of each bar reflects the precision of the experiment.]
There is a steady increase in the precision of the neutron lifetime, but are any of these
measurements accurate?
Measurement Errors (or Uncertainties)
Use results from probability and statistics as a way of calculating how “good” a measurement is.
most common quality indicator:
relative precision = [uncertainty of measurement]/measurement
example: we measure a table to be 10 inches with uncertainty of 1 inch.
relative precision = 1/10 = 0.1 or 10% (% relative precision)
Uncertainty in measurement is usually the square root of the variance:
σ = standard deviation
σ is usually calculated using the technique of “propagation of errors”.
However this σ is not what most people think it is! We will discuss this in more detail soon.
Statistical and Systematic Errors
Results from experiments are often presented as:
N ± XX ± YY
N: value of quantity measured (or determined) by experiment.
XX: statistical error, usually assumed to be from a Gaussian distribution.
With the assumption of Gaussian statistics we can say (calculate) something about
how well our experiment agrees with other experiments and/or theories.
Expect ~ 68% chance that the true value is between N - XX and N + XX.
YY: systematic error. Hard to estimate, distribution of errors usually not known.
Examples: mass of proton = 0.9382769 ± 0.0000027 GeV (only statistical error given)
mass of W boson = 80.8 ± 1.5 ± 2.4 GeV (both statistical and systematic error given)
Measurement Errors (or Uncertainties)
What’s the difference between statistical and systematic errors?
Statistical errors are “random” in the sense that if we repeat the measurement enough times:
XX → 0
Example: the error in the mean σ_m:
If we repeat a measurement n times and each measurement has uncertainty σ then: $\sigma_m = \sigma/\sqrt{n}$
Systematic errors do not → 0 with repetition.
examples of sources of systematic errors:
voltmeter not calibrated properly
a ruler that is not the length we think it is (a meter stick might really be < 1 meter!)
Because of systematic errors, an experimental result can be precise, but not accurate!
How do we combine systematic and statistical errors to get one estimate of precision?
Can be a problem!
two choices:
σ_tot = XX + YY (add them linearly)
σ_tot = (XX² + YY²)^(1/2) (add them in quadrature)
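The two ways of combining errors, and the 1/√n behaviour of the error in the mean, can be compared numerically. The W boson errors (1.5 and 2.4 GeV) are taken from the example quoted earlier; the function names are made up for this sketch.

```python
import math

def combine_linear(stat: float, syst: float) -> float:
    """Total error if statistical and systematic errors are added linearly."""
    return stat + syst

def combine_quadrature(stat: float, syst: float) -> float:
    """Total error if the two are added in quadrature: sqrt(stat^2 + syst^2)."""
    return math.hypot(stat, syst)

def error_on_mean(sigma: float, n: int) -> float:
    """Statistical error in the mean of n measurements, each with uncertainty sigma."""
    return sigma / math.sqrt(n)

stat, syst = 1.5, 2.4                      # GeV, from the W mass example
linear = combine_linear(stat, syst)
quadrature = combine_quadrature(stat, syst)
print(linear, quadrature, error_on_mean(1.0, 100))
```

Linear addition always gives the larger (more conservative) total; quadrature is appropriate only if the two errors are uncorrelated.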
We will discuss a detailed averaging procedure next week………...
Some other ways of quoting experimental results
lower limit: “the mass of particle X is > 100 GeV”
upper limit: “the mass of particle X is < 100 GeV”
asymmetric errors: mass of particle X = $100^{+4}_{-3}$ GeV