Basic principles of probability theory


Some standard univariate probability distributions
• Characteristic, moment generating, cumulant generating functions
• Discrete distributions
• Continuous distributions
• Some distributions associated with normal
• References
Characteristic, moment generating and cumulant generating functions
The characteristic function is defined as the expectation of e^{itx}:

C(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx

The moment generating function is defined as the expectation of e^{tx}:

M(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx

Moments can be calculated by differentiating M(t) and evaluating the derivative at t = 0:

E(x^n) = \left.\frac{d^n}{dt^n} M(t)\right|_{t=0}
The cumulant generating function is defined as the natural logarithm of the characteristic function:

K(t) = \log C(t)
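As a small numeric sketch (not part of the original slides), moments can be pulled out of a moment generating function by finite differences at t = 0. The binomial MGF (p e^t + 1 - p)^n, derived on a later slide, is used here as the test case:

```python
import math

def mgf_binomial(t, n, p):
    """MGF of Binomial(n, p): M(t) = (p e^t + 1 - p)^n."""
    return (p * math.exp(t) + 1 - p) ** n

def moment(mgf, k, h=1e-3):
    """Approximate E[X^k] = d^k/dt^k M(t) at t = 0 with a central finite difference."""
    return sum(
        (-1) ** j * math.comb(k, j) * mgf((k / 2 - j) * h) for j in range(k + 1)
    ) / h**k

n, p = 10, 0.3
m1 = moment(lambda t: mgf_binomial(t, n, p), 1)  # E[X] = np = 3
m2 = moment(lambda t: mgf_binomial(t, n, p), 2)  # E[X^2] = np(1-p) + (np)^2 = 11.1
```

The central-difference stencil has O(h^2) error, so the numeric moments agree with np and np(1-p) + (np)^2 to several digits.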
Discrete distributions: Binomial
Suppose we carry out an experiment whose result is either "success" or "failure". The probability of "success" is p, so the probability of "failure" is q = 1 - p. We repeat the experiment n times. What is the probability of k successes?
n k
n!
n k
p(k )  P( X  k )    p (1  p) 
p k (1  p)nk
k!(n  k )!
k 
Characteristic function:
C(t )  ( pe(it )  1  p)n
Moment generating function:
M (t )  ( pe(t )  1  p)n
Exercise: find the first and second moments.
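A short illustrative check (not from the slides): evaluating the binomial pmf directly and summing confirms that the pmf is normalised and that the first two moments give E(X) = np and Var(X) = np(1-p):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
total = sum(binom_pmf(k, n, p) for k in range(n + 1))      # pmf sums to 1
m1 = sum(k * binom_pmf(k, n, p) for k in range(n + 1))     # E[X] = np
m2 = sum(k**2 * binom_pmf(k, n, p) for k in range(n + 1))  # E[X^2]
variance = m2 - m1**2                                      # np(1 - p)
```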
Discrete distributions: Poisson
When the number of trials n is large, the probability of success p is small, and np tends to a finite limit λ, the binomial distribution converges to the Poisson distribution:

p(k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots, \quad \lambda > 0
The Poisson distribution describes the distribution of the number of occurrences of a rare event in a short period. It is used, for example, in counting statistics to describe the number of registered photons.
Characteristic function is:
C (t )  e( (e(it )  1))
What is the first moment?
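A quick numeric sketch (illustrative, not part of the slides) of both claims above: the first moment of the Poisson comes out as λ, and Binomial(n, λ/n) approaches Poisson(λ) as n grows:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

lam = 2.0
# first moment of the Poisson: sum over a long enough range, tail is negligible
m1 = sum(k * poisson_pmf(k, lam) for k in range(60))
# Binomial(n, lam/n) with large n is close to Poisson(lam)
approx = binom_pmf(3, 1000, lam / 1000)
exact = poisson_pmf(3, lam)
```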
Discrete distributions: Negative Binomial
Consider an experiment where the probability of "success" is p and the probability of "failure" is q = 1 - p. We carry out the experiment until the k-th success and want to find the probability of j failures. (This is called sequential sampling: sampling is carried out until some stopping rule is satisfied.) If there are j failures, then the number of trials is k + j and the last trial is a success. The probability of exactly j failures is:
p(j) = P(X = j) = \binom{k+j-1}{j} p^{k-1} q^{j}\, p = \binom{k+j-1}{j} p^{k} q^{j}, \quad j = 0, 1, 2, \ldots
It is called negative binomial because the coefficients come from the negative binomial series p^{-k} = (1-q)^{-k}.
Characteristic function is:
C(t )  pk (1  qe(it ))k
What is the moment generating function? What is the first moment?
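A small illustrative check (not from the slides): summing the pmf above confirms normalisation, and the first moment comes out as kq/p, which you can compare with your own derivation:

```python
import math

def negbin_pmf(j, k, p):
    """P(X = j) = C(k + j - 1, j) p^k q^j: j failures before the k-th success."""
    return math.comb(k + j - 1, j) * p**k * (1 - p) ** j

k, p = 3, 0.4
total = sum(negbin_pmf(j, k, p) for j in range(500))      # pmf sums to 1
m1 = sum(j * negbin_pmf(j, k, p) for j in range(500))     # first moment kq/p = 4.5
```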
Continuous distributions: uniform
The simplest continuous distribution is the uniform, with density:

f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}
The distribution function is:

F(x) = \begin{cases} 0 & x \le a \\ \frac{x-a}{b-a} & a < x \le b \\ 1 & x > b \end{cases}
Moments and other properties are calculated easily.
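The density and distribution function above translate directly into code; a minimal sketch (illustrative, not part of the slides):

```python
def uniform_pdf(x, a, b):
    """Density 1/(b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """F(x): 0 below a, (x - a)/(b - a) on (a, b], 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 5.0
midpoint_cdf = uniform_cdf((a + b) / 2, a, b)  # 0.5 at the midpoint
```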
Continuous distributions: exponential
Density of the exponential distribution has the form:

f(x) = \lambda e^{-\lambda x}, \quad 0 \le x < \infty
This distribution has two origins.
1) Maximum entropy. If we know that a random variable is non-negative and we know its first moment, 1/\lambda, then the maximum entropy distribution has the exponential form.
2) Poisson-type random processes. If the number of events j(x) occurring during the time interval (0; x] has a Poisson distribution with mean \lambda x, then the time elapsed until the first event has the exponential distribution. Let T_r denote the time elapsed until the r-th event:

P(j(x) < r) = P(T_r > x) = 1 - F_r(x)

Putting r = 1 gives P(T_1 > x) = e^{-\lambda x}. Since P(T_1 > x) = 1 - F_1(x), differentiating with respect to x yields the exponential density.
This distribution, together with the Poisson, is widely used in reliability studies, life testing, etc.
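The exponential distribution is easy to sample by inverse-transform sampling (a standard technique, not mentioned on the slides): if U ~ Uniform(0, 1], then -ln(U)/λ is exponential with rate λ. A minimal sketch, checking that the sample mean approaches the first moment 1/λ:

```python
import math
import random

random.seed(0)

def sample_exponential(lam):
    """Inverse-transform sampling: -ln(U)/lam with U ~ Uniform(0, 1]."""
    u = 1.0 - random.random()  # shift [0, 1) to (0, 1] so log never sees zero
    return -math.log(u) / lam

lam = 2.0
n = 200_000
mean = sum(sample_exponential(lam) for _ in range(n)) / n  # close to 1/lam = 0.5
```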
Continuous distributions: Gamma
The Gamma distribution can be considered a generalisation of the exponential distribution. It has the form:

f_r(x) = \frac{\lambda^r x^{r-1} e^{-\lambda x}}{(r-1)!}, \quad 0 \le x < \infty

It is the density of the time x elapsing before r events happen.
Characteristic function of this distribution is:
C(t) = \left(1 - \frac{it}{\lambda}\right)^{-r}
This distribution is widely used in many applications. One application is generating prior probabilities for the sample variance; for this the inverse Gamma distribution is used (changing the variable to y = 1/x gives the inverse Gamma). The Gamma distribution can also be generalised to non-integer values of r by replacing (r-1)! with \Gamma(r).
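An illustrative sketch (not from the slides) of two facts about the Gamma density with integer shape: it reduces to the exponential density at r = 1, and it integrates to 1:

```python
import math

def gamma_pdf(x, r, lam):
    """Gamma density lam^r x^(r-1) e^(-lam x) / (r - 1)! for integer shape r."""
    return lam**r * x ** (r - 1) * math.exp(-lam * x) / math.factorial(r - 1)

# r = 1 reduces to the exponential density lam * e^(-lam x)
check = gamma_pdf(0.7, 1, 2.0)

# the density integrates to 1 (midpoint rule on [0, 20]; the tail beyond is negligible)
r, lam = 3, 2.0
h = 1e-3
total = sum(gamma_pdf((i + 0.5) * h, r, lam) * h for i in range(20000))
```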
Continuous distributions: Normal
Perhaps the most popular and widely used continuous distribution is the normal distribution. The main reason is that a random variable is often the sum of many random variables, and according to the central limit theorem, under some conditions (for example: the random variables are independent, and their first and second moments exist and are finite), the distribution of the sum converges to the normal distribution.
Density of the normal distribution has the form
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
Another remarkable fact is that if we know only the mean and variance, then the maximum entropy distribution is the normal.
Its characteristic function is:
C(t )  e(it  t 2 2 )
Exponential family
The exponential family of distributions has the form

f(x) = e^{A(\theta) B(x) + E(x) + D(\theta)}

Many distributions are special cases of this family.
The natural exponential family of distributions is the subclass with B(x) = x:

f(x) = e^{A(\theta) x + E(x) + D(\theta)}

where A(\theta) is the natural parameter.
Using the fact that the distribution must be normalised to 1, the characteristic function of the natural exponential family with natural parameter A(\theta) = \theta can be derived to be:

C(t) = e^{D(\theta) - D(\theta + it)}

Try to derive it. Hint: use the normalisation to find D(\theta), then substitute into the expression for the characteristic function.
This family is used for fitting generalised linear models.
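The suggested derivation can be sketched in a few lines, using only the fact that f integrates to 1 (with natural parameter A(θ) = θ):

```latex
\text{Normalisation: } 1 = \int e^{\theta x + E(x) + D(\theta)}\,dx
  \;\Rightarrow\; e^{-D(\theta)} = \int e^{\theta x + E(x)}\,dx .
\\[4pt]
C(t) = \int e^{itx} f(x)\,dx
     = e^{D(\theta)} \int e^{(\theta + it)x + E(x)}\,dx
     = e^{D(\theta)}\, e^{-D(\theta + it)}
     = e^{D(\theta) - D(\theta + it)} .
```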
Continuous distributions: χ²
Normal variables are called standardized if their mean is 0 and variance is 1.
The distribution of the sum of the squares of n standardized normal random variables is χ² with n degrees of freedom.
The density function is:

f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, e^{-x/2}\, x^{n/2 - 1}, \quad 0 \le x < \infty
If there are p linear restraints on the random variables, then the degrees of freedom become n - p.
The characteristic function for this distribution is:

C(t) = (1 - 2it)^{-n/2}
χ² is widely used in statistics, for example in tests of the goodness of fit of a model to experiment.
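The sum-of-squares construction can be checked by simulation; a minimal sketch (illustrative, not from the slides), using the facts that a χ² variable with n degrees of freedom has mean n and variance 2n:

```python
import random

random.seed(2)

# sum of squares of n standardized normal variables, repeated many times
n, trials = 5, 100_000
chi2 = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]
mean = sum(chi2) / trials                          # close to n
var = sum((c - mean) ** 2 for c in chi2) / trials  # close to 2n
```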
Continuous distributions: t and F-distributions
Two more distributions are closely related to the normal distribution. One of them is Student's t-distribution. It is used to test whether the mean value of a sample is significantly different from 0; a similar application is testing the difference between the means of two samples. The ratio of a standardised normal random variable to the square root of an independent χ² random variable divided by its degrees of freedom has the t-distribution.
Fisher's F-distribution is the distribution of the ratio of the variances of two different samples. It is used to test whether the variances of two samples are different. One important application is in ANOVA.
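The ratio construction of the t-distribution can be sketched by simulation (illustrative, not from the slides); the samples are symmetric about 0, so their mean should be near 0:

```python
import math
import random

random.seed(3)

def t_sample(n):
    """Z / sqrt(V/n): Z standard normal, V an independent chi-square with n d.o.f."""
    z = random.gauss(0.0, 1.0)
    v = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return z / math.sqrt(v / n)

n, trials = 10, 50_000
ts = [t_sample(n) for _ in range(trials)]
mean = sum(ts) / trials  # close to 0 by symmetry (the mean exists for n > 1)
```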
References
Johnson, N.L. & Kotz, S. (1969, 1970, 1972). Distributions in Statistics. I: Discrete Distributions; II, III: Continuous Univariate Distributions; IV: Continuous Multivariate Distributions. Houghton Mifflin, New York.
Mardia, K.V. & Jupp, P.E. (2000). Directional Statistics. John Wiley & Sons.