Transcript lecture 6

Selected Topics in Particle Physics
Avner Soffer
Spring 2007
Lecture 6
Administrative stuff
• Projects status
• Other homework problems:
– Open questions in HW #1 (questions about the Quantum Universe) and
HW #3 (difference between D mixing and Bs mixing analyses) – we
will go over them when we return from break
• The plan for the next few weeks:
– Statistics (with as many real examples as possible)
– Root and RooFit
– Practicing statistics and analysis techniques
• Lecture on Tuesday, April 10 (Mimona) instead of Monday
(Passover break)?
Why do we use statistics in EPP?
• Scientific claims need to be based on solid mathematics
– How confident are we of the result? What is the probability that we are wrong?
– Especially important when working at the frontier of knowledge:
extraordinary claims require extraordinary proof
• Proving something with high certainty is usually expensive
– Many first measurements are made with marginal certainty
• Statistical standards:
– “Evidence”
– “Observation”
Probability
• Set S (sample space)
• Subset A ⊂ S
• The probability P(A) is a real number that satisfies the axioms:
  1. P(A) ≥ 0
  2. If A and B are disjoint subsets, i.e., A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
  3. P(S) = 1
Derived properties
• P(!A) = 1 − P(A), where !A = S − A
• P(A ∪ !A) = 1
• 0 ≤ P(A) ≤ 1
• P(null set) = 0
• If A ⊂ B, then P(B) ≥ P(A)
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
More definitions
• Subsets A and B are independent if P(A ∩ B) = P(A) P(B)
• A random variable x is a variable that has a specific value for each element of the set
• An element may have more than one random variable: x = {x1, …, xn}
Interpretation of Probability in data analysis
• Limiting relative frequency:
  – Elements of the sample space S = possible outcomes of a repeatable measurement
  – The probability of a particular outcome e (= element of S) is
    P(e) = \lim_{n \to \infty} \frac{\text{number of occurrences of outcome } e \text{ in } n \text{ measurements}}{n}
    (note that the single element e belongs to a subset with one element = an elementary subset)
  – A non-elementary subset A corresponds to an occurrence of any of the outcomes in the subset, with probability
    P(A) = \sum_{e \in A} P(e)
Example 1
• Element e = D mixing parameter y’ measured to be 0.01
• Subset A = y’ measured to be in range [0.005, 0.015]
• P(A) = fraction of experiments in which y’ is measured in [0.005, 0.015],
given that its true value is 0.002
Example 2
• e = (x’2, y’) measured to be (-0.0002, 0.01)
• A = (x’2, y’) measured to be anywhere outside the brown (“4σ”) contour
• P(A) = fraction of experiments in which (x’2, y’)
are measured outside the contour,
given that their true values are the measured ones
Example 3
• e = error on CP-violating parameter q- measured to be 42
• A = q- error measured to be 42 or greater
• P(A) = fraction of experiments in which the q- error is measured to be 42
or greater
About the relative frequency interpretation
• Straightforward when measurements are repeatable:
  – Particle collisions in an experiment
  – Radioactive decays of identical nuclei
  – (physical laws don’t change)
• Also works when measurements are repeatable only in principle:
  – Measurement of the D mixing parameters using all the data we will ever have
  – Measurement of the average height of all humans
Probability density functions
• Outcome of an experiment is a continuous random variable x
– Applies to most measurements in particle physics
• Define:
the probability density function (PDF) f(x), such that
f(x) dx = probability to observe x in [x, x+dx]
= fraction of experiments in which x will be measured in [x, x+dx]
• To satisfy axiom 3, P(S) = 1, normalize the PDF:
  \int_S f(x)\, dx = 1
The PDF and finite number of observations
• A set of n_meas measurements x_m (m = 1…n_meas) can be presented as a histogram:
  n_b (b = 1…n_bins) = number of measurements for which x falls in bin b
• n_b / n_meas = probability for a measurement to be in bin b
  – Σ_b n_b / n_meas = 1
• n_b / (n_meas Δx_b) = (discrete) probability density function
• Continuum limit (infinite number of observations, infinitely fine binning):
  f(x) = \lim_{n_{meas} \to \infty,\ \Delta x_b \to 0} \frac{n_b}{n_{meas}\, \Delta x_b}
Cumulative distribution
• The cumulative distribution of f(x) is
  F(x) = \int_{-\infty}^{x} f(x')\, dx'
• Alternatively: F(x) = probability to obtain a measurement whose value is < x
• f(x) = dF(x)/dx (for differentiable F(x))
• The α-point x_α is the value of x such that F(x_α) = α, where 0 ≤ α ≤ 1. Or: x_α = F⁻¹(α)
• Median = x_½ = value of x such that F(x_½) = ½
• Mode = x_mode such that f(x_mode) > f(all other values of x)
  – may not be useful or unique if f(x) has multiple local maxima
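A numerical sketch of F(x) and the α-point (assuming an exponential PDF f(x) = e^(-x), so F(x) = 1 − e^(-x); the sample size is an arbitrary choice):

# Empirical cumulative distribution and the α-point x_α = F^{-1}(α).
import numpy as np

rng = np.random.default_rng(seed=3)
x = np.sort(rng.exponential(scale=1.0, size=200_000))
F = np.arange(1, x.size + 1) / x.size          # empirical F(x) at the sorted points

def alpha_point(alpha):
    """Smallest x with F(x) >= alpha (empirical F^{-1})."""
    return x[np.searchsorted(F, alpha)]

print("median x_1/2 ~", alpha_point(0.5), " exact ln 2 =", np.log(2))
print("90% point x_0.9 ~", alpha_point(0.9), " exact ln 10 =", np.log(10))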
Extension to multi-variable PDFs
• For f(x), x = {x1, …, xn}, the α-point turns into an α-contour of dimension n−1
• Marginal PDFs:
  – f_x(x) = ∫ f(x,y) dy
  – f_y(y) = ∫ f(x,y) dx
• x and y are independent variables if
f(x,y) = fx(x) fy(y)
– Also called uncorrelated variables
Functions of random variables
• a(x) is a continuous function of random variable x, which has PDF f(x)
– E.g., a = x2, a = log(x), etc.
• What is the PDF g(a)?
• Require equal probabilities in corresponding infinitesimal regions:
  g(a) da = f(x) dx
  ⟹ g(a) = f(x(a)) |dx/da|
  – The absolute value keeps the PDF positive
  – Assumes a(x) can be inverted
  [Sketch: an interval dx on the x axis maps onto the interval da on the a axis]
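A numerical check of g(a) = f(x(a)) |dx/da| (a toy case, not from the slides: a = x² with x exponentially distributed, so the inverse is unique):

# Here f(x) = exp(-x) for x >= 0, so x(a) = sqrt(a), |dx/da| = 1/(2 sqrt(a)),
# and the transformed PDF is g(a) = exp(-sqrt(a)) / (2 sqrt(a)).
import numpy as np

rng = np.random.default_rng(seed=4)
x = rng.exponential(scale=1.0, size=500_000)
a = x**2
counts, edges = np.histogram(a, bins=np.linspace(0.1, 4.0, 40))
centers = 0.5 * (edges[:-1] + edges[1:])
g_numeric = counts / (a.size * np.diff(edges))        # histogram estimate of g(a)
g_exact = np.exp(-np.sqrt(centers)) / (2 * np.sqrt(centers))
print(np.round(g_numeric[:5], 3))
print(np.round(g_exact[:5], 3))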
Example
• The CP-violation phases α, β, γ are not measured directly. We measure cos φ, sin φ, or sin 2φ, and then transform to the phases.
Multiple-valued x(a)
• If a(x) is not uniquely invertible, we need to add up the different contributions:
  g(a)\, da = \int_{dS} f(x')\, dx'
  where dS(a) is the sum of the regions (e.g., dx1 and dx2) that map onto [a, a+da]
• For a(x) uniquely invertible we recover the previous result:
  \int_{x(a)}^{x(a+da)} f(x')\, dx' = \int_{x(a)}^{x(a) + |dx/da|\, da} f(x')\, dx' = f(x(a)) \left|\frac{dx}{da}\right| da
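A toy numerical sketch of the multiple-valued case (not from the slides): a = x² with Gaussian x, where both branches x = ±√a contribute:

# g(a) = [f(+sqrt(a)) + f(-sqrt(a))] / (2 sqrt(a)) = exp(-a/2) / sqrt(2*pi*a)
import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.normal(size=500_000)
a = x**2
counts, edges = np.histogram(a, bins=np.linspace(0.1, 4.0, 40))
centers = 0.5 * (edges[:-1] + edges[1:])
g_numeric = counts / (a.size * np.diff(edges))
g_exact = np.exp(-centers / 2) / np.sqrt(2 * np.pi * centers)
print("max relative difference:", np.max(np.abs(g_numeric - g_exact) / g_exact))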
Functions of multiple random variables
• What is g(a) for a(x), x = {x1, …, xn}?
  g(a)\, da = \int_{dS} f(x)\, dx_1 \cdots dx_n
  where dS is the hypersurface in x-space that encloses [a, a+da]
• Example: z = xy. What is f(z), given g(x) and h(y)?
  f(z)\, dz = \int_{dS} g(x)\, h(y)\, dx\, dy = \int_{-\infty}^{\infty} g(x)\, dx \int_{z/|x|}^{(z+dz)/|x|} h(y)\, dy = \int_{-\infty}^{\infty} g(x)\, h(z/x)\, \frac{dz}{|x|}\, dx
  So
  f(z) = \int_{-\infty}^{\infty} \frac{g(x)\, h(z/x)}{|x|}\, dx = \int_{-\infty}^{\infty} \frac{g(z/y)\, h(y)}{|y|}\, dy
• f(z) is the Mellin convolution of g(x) and h(y)
• Another example: z = x + y
  f(z)\, dz = \int_{dS} g(x)\, h(y)\, dx\, dy = \int_{-\infty}^{\infty} g(x)\, dx \int_{z-x}^{z-x+dz} h(y)\, dy = \int_{-\infty}^{\infty} g(x)\, h(z-x)\, dz\, dx
  So
  f(z) = \int_{-\infty}^{\infty} g(x)\, h(z-x)\, dx = \int_{-\infty}^{\infty} g(z-y)\, h(y)\, dy
• f(z) is the familiar Fourier convolution of g(x) and h(y).
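A quick numerical check of the sum/convolution statement (toy Gaussians with arbitrary means and widths, not from the slides): the convolution of two Gaussians is a Gaussian with added means and variances, which the sampled z = x + y reproduces.

import numpy as np

rng = np.random.default_rng(seed=6)
mu1, s1, mu2, s2 = 1.0, 0.5, -2.0, 1.2
x = rng.normal(mu1, s1, size=500_000)
y = rng.normal(mu2, s2, size=500_000)
z = x + y                                   # distributed as the convolution of g and h
print("mean of z:", z.mean(), " expected:", mu1 + mu2)
print("std  of z:", z.std(),  " expected:", np.hypot(s1, s2))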
• Recall from the D mixing analysis: the measured decay time t is the true decay time t′ (distribution P(t′)) plus a random detector error Δt (distribution r(Δt)):
  P(t) \propto e^{-\Gamma t} \left[ R + \sqrt{R}\, y' \left|\frac{q_D}{p_D}\right| \Gamma t + \frac{x'^2 + y'^2}{4} \left|\frac{q_D}{p_D}\right|^2 (\Gamma t)^2 \right]
  In reality we observe
  F(t) = \int P(t')\, r(t - t')\, dt'
Multiple functions of multiple random variables
• g(a_1, …, a_n) = f(x_1, …, x_n) |J|, where the Jacobian is
  J = \begin{vmatrix} \partial x_1/\partial a_1 & \partial x_1/\partial a_2 & \cdots \\ \partial x_2/\partial a_1 & \partial x_2/\partial a_2 & \cdots \\ \vdots & \vdots & \ddots \end{vmatrix}
• To determine the marginal distribution g_i(a_i), integrate g(a_1, …, a_n) over the a_j (j ≠ i) variables
Expectation values
• The expectation value of a random variable x distributed according to the
PDF f(x):
  E[x] = \langle x \rangle = \int_{-\infty}^{\infty} x\, f(x)\, dx
• Also called the population mean μ
• E[x] is the most commonly used location parameter (others are the α-point x_α and the mode)
• The expectation value of a function a(x) is
  E[a] = \int_{-\infty}^{\infty} a\, g(a)\, da = \int_{-\infty}^{\infty} a(x)\, f(x)\, dx
Moments
• The nth algebraic moment of f(x):
  E[x^n] = \int_{-\infty}^{\infty} x^n f(x)\, dx \equiv \mu'_n
  – Note that the population mean μ is the special case μ′_1
• The nth central moment:
  E[(x - E[x])^n] = \int_{-\infty}^{\infty} (x - \mu)^n f(x)\, dx \equiv \mu_n
• In particular,
  E[(x - E[x])^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx = \mu_2 \equiv V[x]
  is the population variance of f(x)
• The standard deviation
  \sigma = \sqrt{V[x]}
  gives an idea of the spread of f(x)
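A small numpy sketch of these definitions (toy exponential PDF, for which E[x] = 1, V[x] = 1, E[x²] = 2; the sample size is arbitrary):

import numpy as np

rng = np.random.default_rng(seed=7)
x = rng.exponential(scale=1.0, size=1_000_000)
mu = x.mean()                        # E[x] (population mean estimate)
var = np.mean((x - mu) ** 2)         # second central moment = V[x]
print("mean:", mu, " variance:", var, " sigma:", np.sqrt(var))
print("second algebraic moment E[x^2]:", np.mean(x**2), " exact: 2")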
Mean and variance of functions
• Take a function of many random variables: a(x). Then
  E[a(x)] = \mu_a = \int_{-\infty}^{\infty} a\, g(a)\, da = \int_{-\infty}^{\infty} a(x)\, f(x)\, dx_1 \cdots dx_n
  V[a(x)] = \sigma_a^2 = E[(a - \mu_a)^2] = \int_{-\infty}^{\infty} (a - \mu_a)^2 g(a)\, da = \int_{-\infty}^{\infty} (a - \mu_a)^2 f(x)\, dx_1 \cdots dx_n
Covariance
• For two random variables x, y, the covariance cov[x,y] or V_xy is
  V_{xy} = E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} xy\, f(x,y)\, dx\, dy - \mu_x \mu_y
• For two functions a(x), b(x), the covariance is
  V_{ab} = E[(a - \mu_a)(b - \mu_b)] = E[ab] - \mu_a \mu_b = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} ab\, g(a,b)\, da\, db - \mu_a \mu_b
         = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} a(x)\, b(x)\, f(x)\, dx_1 \cdots dx_n - \mu_a \mu_b
• Note that V_ab = V_ba and V_aa = σ_a²
• The dimensionless correlation coefficient is ρ_ab = V_ab / (σ_a σ_b)
  – Note that −1 ≤ ρ_ab ≤ 1
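A short numpy illustration of the covariance and correlation coefficient (toy construction, not from the slides: two correlated Gaussians built from independent ones, with an arbitrary input ρ):

import numpy as np

rng = np.random.default_rng(seed=8)
rho = 0.7
u, v = rng.normal(size=(2, 500_000))        # independent standard Gaussians
x = u
y = rho * u + np.sqrt(1 - rho**2) * v       # correlated with x by construction
V = np.cov(x, y)                            # 2x2 covariance matrix (V_xx, V_xy; V_yx, V_yy)
print("covariance matrix:\n", V)
print("correlation coefficient:", np.corrcoef(x, y)[0, 1], " (input rho =", rho, ")")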
Understanding covariance and correlation
• V_xy = E[(x − μ_x)(y − μ_y)] is the expectation value of the product of the deviations from the means.
• If having x > μ_x increases the probability of having y > μ_y, then V_xy > 0: x and y are positively correlated.
• If having x > μ_x increases the probability of having y < μ_y, then V_xy < 0: x and y are negatively correlated or anti-correlated.
• For independent variables (defined as f(x,y) = f_x(x) f_y(y)), we find E[xy] = E[x] E[y] = μ_x μ_y, so V_xy = 0.
• Does V_xy = 0 necessarily mean that the variables are independent?...
Covariance and correlation
• …No. For example, if x is distributed symmetrically about zero and y = x², then V_xy = 0 even though y is completely determined by x (see the sketch below).
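A minimal numpy check of that counterexample (toy sample size, not from the slides):

# y = x^2 with x ~ N(0,1): dependent variables, yet cov[x,y] = E[x^3] - E[x]E[x^2] = 0.
import numpy as np

rng = np.random.default_rng(seed=9)
x = rng.normal(size=1_000_000)
y = x**2
print("cov[x,y] ~", np.cov(x, y)[0, 1])     # -> 0 within statistical fluctuations
print("but f(x,y) != f_x(x) f_y(y): y is fixed once x is known")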
Propagation of errors
• Take n random variables x with unknown PDF f(x), but with E[x] and Vij
known (or estimated)
• Take the function y(x). What are E[y] and V[y]?
– Remember: we don’t know f(x).
• Expand y(x) to first order about the mean μ = E[x]:
  y(x) \approx y(\mu) + \sum_{i=1}^{n} \left[\frac{\partial y}{\partial x_i}\right]_{x=\mu} (x_i - \mu_i)
• Then
  E[y(x)] \approx y(\mu), \quad \text{since } E[x_i - \mu_i] = 0
  E[y^2(x)] \approx y^2(\mu) + 2\, y(\mu) \sum_{i=1}^{n} \left[\frac{\partial y}{\partial x_i}\right]_{x=\mu} E[x_i - \mu_i] + E\!\left[ \left( \sum_{i=1}^{n} \left[\frac{\partial y}{\partial x_i}\right]_{x=\mu} (x_i - \mu_i) \right) \left( \sum_{j=1}^{n} \left[\frac{\partial y}{\partial x_j}\right]_{x=\mu} (x_j - \mu_j) \right) \right]
• Evaluate the last term:
  E\!\left[ \sum_{i=1}^{n} \left[\frac{\partial y}{\partial x_i}\right]_{x=\mu} (x_i - \mu_i) \sum_{j=1}^{n} \left[\frac{\partial y}{\partial x_j}\right]_{x=\mu} (x_j - \mu_j) \right] = \sum_{i,j=1}^{n} \left[\frac{\partial y}{\partial x_i} \frac{\partial y}{\partial x_j}\right]_{x=\mu} \mathrm{cov}[x_i, x_j]
• So:
  \sigma_y^2 = E[y^2(x)] - y^2(\mu) = \sum_{i,j=1}^{n} \left[\frac{\partial y}{\partial x_i} \frac{\partial y}{\partial x_j}\right]_{x=\mu} \mathrm{cov}[x_i, x_j]
• Similarly, for m functions y_1(x), …, y_m(x):
  \mathrm{cov}[y_k, y_l] = \sum_{i,j=1}^{n} \left[\frac{\partial y_k}{\partial x_i} \frac{\partial y_l}{\partial x_j}\right]_{x=\mu} \mathrm{cov}[x_i, x_j]
• In matrix notation: U = D V Dᵀ, where
  D_{ki} = \left[\frac{\partial y_k}{\partial x_i}\right]_{x=\mu}
Why is this “error propagation”?
Because we often estimate errors from covariances
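A small numerical sketch of σ_y² = D V Dᵀ (the means, covariance matrix, and the function y = x1·x2 are arbitrary toy choices, not from the slides), compared with the spread of y in a toy Monte Carlo:

import numpy as np

rng = np.random.default_rng(seed=10)
mu = np.array([2.0, 3.0])
V = np.array([[0.04, 0.012],                 # sigma_1 = 0.2, sigma_2 = 0.1, rho = 0.6
              [0.012, 0.01]])

# Derivative matrix D_ki = dy_k/dx_i at x = mu (a single function y = x1*x2 here)
D = np.array([[mu[1], mu[0]]])
var_prop = (D @ V @ D.T)[0, 0]               # propagated variance

# Toy Monte Carlo check: sample x around mu with covariance V and compute y directly
x = rng.multivariate_normal(mu, V, size=1_000_000)
y = x[:, 0] * x[:, 1]
print("propagated sigma_y :", np.sqrt(var_prop))
print("Monte Carlo sigma_y:", y.std())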
Special cases
For y  x1  x2 :
For y  x1 x2 :
s y2  s 12  s 22  2V12
s y2
y2

s 12
x12

s 22
x22
2
V12
x1 x2
For uncorrelat ed xi x j :
2
 y 
2
s
i

i 1  xi  x  m
n
s y2   
 y y 
cov[ yk , yl ]    k l  s i2
i 1  xi xi  x  m
n
Note: These formulae don’t work if y is significantly non-linear
within a distance si around the mean m
Orthogonal transformation of variables
• It is often useful to work in variables in which the covariance matrix is diagonal: cov[y_i, y_j] = σ_i² δ_ij
• This can always be achieved with a linear transformation:
  y_i = \sum_{j=1}^{n} A_{ij} x_j
  where the rows of the transformation matrix A_ij are the eigenvectors of cov[x_i, x_j]
• Then σ_i² are the eigenvalues of cov[x_i, x_j]
Visualize for 2 dimensions
• Recall the definition of the correlation coefficient ρ_ab = V_ab / (σ_a σ_b). So we can write
  \mathrm{cov}[x_1, x_2] = V = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}
• The eigenvalues λ of V are found by solving |V − λI| = 0:
  \lambda_{1,2} = \frac{1}{2} \left[ \sigma_1^2 + \sigma_2^2 \pm \sqrt{ (\sigma_1^2 + \sigma_2^2)^2 - 4 (1 - \rho^2) \sigma_1^2 \sigma_2^2 } \right]
• The transformation matrix is a simple rotation:
  A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
  with
  \theta = \frac{1}{2} \tan^{-1}\!\left( \frac{2 \rho \sigma_1 \sigma_2}{\sigma_1^2 - \sigma_2^2} \right)
  (the rows of A are eigenvectors 1 and 2)
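A numpy sketch of the diagonalization (the σ values are arbitrary; ρ = −0.94 is the D-mixing value quoted on the next slide; the closed-form angle from the slide is printed alongside for reference):

import numpy as np

s1, s2, rho = 2.0, 1.0, -0.94
V = np.array([[s1**2,        rho * s1 * s2],
              [rho * s1 * s2, s2**2       ]])

eigvals, eigvecs = np.linalg.eigh(V)       # columns of eigvecs are the eigenvectors
A = eigvecs.T                              # rows = eigenvectors, so y = A x is uncorrelated
print("eigenvalues (sigma_i^2 of y):", eigvals)
print("A V A^T:\n", np.round(A @ V @ A.T, 12))   # diagonal, as claimed

theta = 0.5 * np.arctan2(2 * rho * s1 * s2, s1**2 - s2**2)
print("rotation angle theta from the closed form:", theta)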
More on linear variable transformations
• The uncorrelated variables yi have a simpler covariance matrix, but may
not always correspond to physically interesting quantities
• E.g., in D mixing, x′² and y′ have a very high correlation coefficient of ρ = −0.94
• But they are the physically interesting variables…