
Probability & Statistics
Lecture 8
1. Review: Probability Density Function (pdf); Cumulative Distribution Function (cdf)
2. Relation between the Binomial, Poisson and Normal distributions
3. Measures of Central Tendency: Expectation, Variance, higher moments
4. Work in groups, Mathematica
Topic 1.
Probability distribution function,
also called the “cumulative distribution function” (CDF)
<This part is transferred from Lect. 7; equation numbers start with 7.>
1. Continuous random variable
From the “outside”, random distributions are well described by the probability
distribution function (we will use CDF for short) F(x) defined as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy    (7.13)
This formula can also be rewritten in the following very useful form:
P(a ≤ X ≤ b) = F(b) − F(a)    (7.14)
To see what the distribution functions look like, we return to our examples.
a. The uniform distribution
ax b
1
,
 b a
(7.15):f ( x )  
0

otherwise
Before solving the problems involving integration, please read the following file:
Self-Test and About the integrals
Using the definition (7.13) and Mathematica, try to find F(x) for the
uniform distribution. Prove that
F(x) = 0 for x < a;  F(x) = (x − a)/(b − a) for a ≤ x ≤ b;  F(x) = 1 for x > b.
Draw the CDF for several a and b. Consider an important special case,
a = 0, b = 1. How is it related to the spinner problem? To the balanced die?
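A minimal Mathematica sketch of this exercise (the names fUni and FUni are mine, not from the course notebooks):

  fUni[x_, a_, b_] := Piecewise[{{1/(b - a), a <= x <= b}}, 0]   (* uniform pdf (7.15) *)
  (* CDF from the definition (7.13); the assumption a < b helps Integrate simplify *)
  FUni[x_, a_, b_] := Integrate[fUni[y, a, b], {y, -Infinity, x}, Assumptions -> a < b]
  FUni[x, 0, 1]                                      (* the spinner case: 0, x, or 1 *)
  Plot[{FUni[x, 0, 1], FUni[x, 1, 3]}, {x, -1, 4}]   (* CDFs for several a, b *)
  CDF[UniformDistribution[{0, 1}], x]                (* built-in cross-check *)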
b. The exponential distribution (7.17):  f(x) = λ e^{−λx} for x ≥ 0, and f(x) = 0 otherwise
c. Γ (Gamma) and chi-square distributions
We already introduced the two-parameter Gamma distribution when
discussing maximum likelihood. Its pdf is
f(x) = x^{α−1} e^{−x/β} / (β^α Γ[α]) for x > 0, and f(x) = 0 otherwise    (7.18)
α > 0 and β > 0 are the parameters. At α = 1, f(x) turns into the
exponential distribution (7.17) with λ = 1/β.
Please plot f(x) (7.18) and the corresponding CDF for α = 1, 2, 4 and β = 1.
A special case of the Gamma distribution, with α = ν/2 and β = 2,
describes the so-called chi-square distribution with ν degrees of
freedom. This distribution is very important in many statistical
applications, e.g. for hypothesis testing.
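A possible Mathematica sketch for this exercise, using the built-in GammaDistribution (its pdf coincides with (7.18), with α the shape and β the scale):

  Plot[Evaluate@Table[PDF[GammaDistribution[α, 1], x], {α, {1, 2, 4}}],
    {x, 0, 10}, PlotLegends -> {"α=1", "α=2", "α=4"}]          (* pdf (7.18), β = 1 *)
  Plot[Evaluate@Table[CDF[GammaDistribution[α, 1], x], {α, {1, 2, 4}}],
    {x, 0, 10}, PlotLegends -> {"α=1", "α=2", "α=4"}]          (* corresponding CDFs *)
  (* chi-square with ν degrees of freedom is the special case α = ν/2, β = 2 *)
  Simplify[PDF[GammaDistribution[ν/2, 2], x] == PDF[ChiSquareDistribution[ν], x],
    x > 0 && ν > 0]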
d. The standard and general normal distributions
fsnd(x) = (2π)^{−1/2} exp(−x²/2)    (7.19)
fnd(x) = (2πσ²)^{−1/2} exp[−(x − μ)²/(2σ²)]    (7.20′)
Using Mathematica and Eq. (7.13), find F[x] for the SND and the ND.
Use the NIntegrate[f[t], {t, −∞, x}] and Plot[…] functions.
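For example (a sketch; the names fsnd, Fsnd, fnd, Fnd are introduced here for the exercise):

  fsnd[t_] := Exp[-t^2/2]/Sqrt[2 Pi]                          (* SND pdf (7.19) *)
  Fsnd[x_?NumericQ] := NIntegrate[fsnd[t], {t, -Infinity, x}]
  Plot[Fsnd[x], {x, -4, 4}]
  fnd[t_, μ_, σ_] := Exp[-(t - μ)^2/(2 σ^2)]/Sqrt[2 Pi σ^2]   (* ND pdf (7.20') *)
  Fnd[x_?NumericQ, μ_, σ_] := NIntegrate[fnd[t, μ, σ], {t, -Infinity, x}]
  Plot[Fnd[x, 1, 2], {x, -6, 8}]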
2. CDF for discrete random variables
For discrete variables the integration is substituted by summation:
F(x) = P(X ≤ x) = Σ_{x_i ≤ x} p(x_i)    (7.21)
It is clear from this formula that if X takes only a finite number of values,
the distribution function looks like a stairway.
[Figure: the stairway CDF F(x) of a discrete random variable, with steps of heights p(x1), p(x2), p(x3), p(x4) at the points x1, x2, x3, x4, rising to 1.]
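A short Mathematica illustration of such a stairway, using the balanced die (the built-in DiscreteUniformDistribution is a convenience here, not part of the lecture notebooks):

  die = DiscreteUniformDistribution[{1, 6}];
  Plot[CDF[die, x], {x, 0, 7}, Exclusions -> None]     (* six steps of height 1/6 *)
  (* the same CDF directly from Eq. (7.21): add p(x_i) = 1/6 over all x_i <= x *)
  Fdie[x_] := Total[Table[1/6, {i, 1, Min[6, Floor[x]]}]]
  Fdie[3.5]                                            (* gives 1/2 *)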
Topic 2
Relation between binomial and normal distributions.
If n is large and if neither p nor q=1-p are too close to zero, the
binomial distribution can be closely approximated by the normal
distribution with
μ = np and σ = (npq)^{1/2}.
p(X, p, n) ≈ (2π npq)^{−1/2} e^{−(X − np)²/(2npq)}    (8.2)
Here p is the probability function for the binomial distribution and X is
the random variable (the number of successes)
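A quick Mathematica check of (8.2) (the values n = 100 and p = 0.3 are just an illustrative choice):

  n = 100; p = 0.3; q = 1 - p;
  normApprox[x_] := Exp[-(x - n p)^2/(2 n p q)]/Sqrt[2 Pi n p q]    (* r.h.s. of (8.2) *)
  Show[
    DiscretePlot[PDF[BinomialDistribution[n, p], k], {k, 0, n}],    (* exact binomial pmf *)
    Plot[normApprox[x], {x, 0, n}]
  ]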
Let’s now open
Lect8/Mathematica/Lect8_BinPoisNormal.nb and run
some examples.
Topic 3: Measures of Central Tendency
When a large collection of data is considered, as in a census or in a massive
blood test, we are usually interested not in individual numbers but rather in
certain descriptive numbers such as the expected value and the variance.
Definition of mathematical expectation
(a) Discrete random variables
A very important concept in probability and statistics is the mathematical
expectation or expected value, or briefly the expectation of a random variable.
For a discrete random variable X having the possible values x1,x2, …xn
E(X) = x1 p(x1) + x2 p(x2) + … + xn p(xn) = Σ_{j=1}^{n} x_j p(x_j)    (8.3)
This can also be rewritten as
E(X) = Σ_x x p(x)    (8.3′)
where the summation on the right-hand side is taken over all points of the sample space.
A special case of (8.3) is the uniform distribution where the probabilities are equal:
x  x ... xn
E( X )  1 2 n
(8.4)
Example
What is the expectation for the balanced die?
Roll one die and let X be the number that appears. P(X = x) = 1/6,
x = 1, 2, 3, 4, 5, 6.
E(X) = (1/6)(1 + 2 + … + 6) = (1/6)·[6(6 + 1)/2] = 3.5.
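The same computation in one line of Mathematica (a sketch, not part of the course notebooks):

  Sum[j*(1/6), {j, 1, 6}]                        (* gives 7/2 *)
  Mean[DiscreteUniformDistribution[{1, 6}]]      (* built-in cross-check, also 7/2 *)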
Expectation for a function of the random variable
Functions of random variables are very important in
various applications. We will only briefly mention this important
concept. A simple example is the kinetic energy mv²/2, which is a
quadratic function of the random variable v, the velocity of a
particle.
Theorem Let X be a discrete random variable with probability
function p(X) and g(X) be a real-valued function of X. Then the
expected value of g(X) is given by
E[g(X)] = Σ_{j=1}^{n} g(x_j) p(x_j)    (8.3″)
b. Continuous random variable
For a continuous random variable X having density function f(x), the
expectation of X is defined as
E(X) = ∫_{−∞}^{∞} x f(x) dx    (8.5)
The expectation of X is often called the mean of X and is denoted by μX, or
simply μ. If g(X) is a function of the random variable X, then its expected value can
be found as
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx    (8.6)
Some Examples (check the results at home)
1. Suppose that a game is played with a single fair die. Assume that the
player wins $20 if X = 2 (the face 2 is up), $40 if X = 4, and loses $30 if X = 6.
In addition, he neither wins nor loses if any other face turns up.
Find the expected sum of money to be won.
2. The density function of a random variable X is given by
f(x)=x/2 for 0<x<2, and 0 otherwise. Find the expected value of X.
3. The same with f (x) = 4x3 for 0<x<1, 0 otherwise.
Some Examples (answers)
1. Suppose that a game is played with a single fair die. Assume that the
player wins $20 if X = 2 (the face 2 is up), $40 if X = 4, and loses $30 if X = 6.
In addition, he neither wins nor loses if any other face turns up.
Find the expected sum of money to be won.
Solution:

X ($):   20     40     −30     0
p:       1/6    1/6    1/6     1/2

E(X) = Σ_{j=1}^{n} x_j p(x_j) = 20·(1/6) + 40·(1/6) − 30·(1/6) + 0·(1/2) = 5
Example 2 The density function of a random variable X is given by
f(x)=x/2 for 0<x<2, and 0 otherwise. Find the expected value of X.
Solution
Let’s use Eq. (8.5), E(X) = ∫_{−∞}^{∞} x f(x) dx, and Mathematica:
The answer: 4/3
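A minimal sketch of the Mathematica step (the actual notebook cell is not reproduced in this transcript):

  Integrate[x*(x/2), {x, 0, 2}]        (* Example 2: gives 4/3 *)
  Integrate[x*(4 x^3), {x, 0, 1}]      (* Example 3, same recipe: should give 4/5 *)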
The interpretation of the expected value
In statistics, one is frequently concerned with average values.
Average = (sum of entries) / (number of entries)
For any finite experiment, where the number of entries is small
or moderate, the average of the outcomes is not predictable.
However, we will eventually prove that that the average will
usually be close to E(X) if we repeat the experiment a large
number of times.
Let us illustrate these observations using the
Lect8_AverageVsExpectation.nb
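Since that notebook is not reproduced in this transcript, here is a minimal stand-in sketch of the same idea: the running average of repeated die rolls settling near E(X) = 3.5.

  rolls = RandomInteger[{1, 6}, 10000];                     (* 10000 throws of a balanced die *)
  runningAvg = N[Accumulate[rolls]/Range[10000]];           (* average after 1, 2, ..., 10000 throws *)
  ListLinePlot[runningAvg, PlotRange -> {1, 6},
    Epilog -> {Dashed, Line[{{0, 3.5}, {10000, 3.5}}]}]     (* dashed line at E(X) *)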
Expectation of a function of random variable
(see for details G&S, 6.1, pp. 229-230. Note that they use notation m(x)
(instead of p(x) ) for the probability function)
This discussion can be considered a “proof” of the equation (8.3’’).
Suppose that f(X) is a real-valued function of the discrete random variable X.
How to determine its expected value?
Let’s consider Example 6.6 of G&S, where X describes the outcomes of
tossing three coins, and Y, a function of X, is the number of “runs”.
A sequence of identical consecutive outcomes (such as HH, HHH,
or TT) is counted as one run (see 6.6 for details).
The three-coin experiment can be described in terms of X and Y by the following
table:
X      Y
HHH    1
HHT    2
HTH    3
HTT    2
THH    2
THT    3
TTH    2
TTT    1

Notice that this table indeed describes Y as a function of X,
mapping every possible outcome x into one of the real numbers 1, 2, or 3.
The fact that the x are “three-letter words” should not be confusing:
they can equally well be written as numbers. Using, for instance, the convention
H -> 1 and T -> 2, we get HHH -> 111, THT -> 212, etc.

To define the expected value of Y we can build a
probability function p(Y). It can be done with the help of
the same table, by grouping the values of X
corresponding to a common Y-value and adding
their probabilities:
p(Y = 1) = 2/8 = 1/4;  p(Y = 2) = 1/2;  p(Y = 3) = 1/4.

Now we can easily find the expected value of Y:
E(Y) = 1·(1/4) + 2·(1/2) + 3·(1/4) = 2.
X      Y
HHH    1
HHT    2
HTH    3
HTT    2
THH    2
THT    3
TTH    2
TTT    1
But what if we did not group the values of X with a
common Y-value, and instead simply multiplied each
value of Y(X) by the corresponding probability of X? In
other words, we multiply each value from the right
column by the probability of the corresponding outcome
from the left column (= 1/8):
1·(1/8) + 2·(1/8) + 3·(1/8) + 2·(1/8) + 2·(1/8) + 3·(1/8) + 2·(1/8) + 1·(1/8) = 16/8 = 2.
This illustrates the following general rule:
If X and Y are two random variables, and Y can be
written as a function of X, then one can compute the
expected value of Y using the distribution function
of X.
In other words,
E(f(X)) = Σ_x f(x) p(x)    (8.3″)
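Both ways of computing E(Y) are easy to reproduce in Mathematica (a sketch; Split and Characters are used to count the runs):

  outcomes = {"HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT"};
  runs = Length[Split[Characters[#]]] & /@ outcomes;   (* Y for each outcome: {1,2,3,2,2,3,2,1} *)
  Total[runs]/8                                        (* each outcome has p = 1/8; gives 2 *)
  probs = Count[runs, #]/8 & /@ {1, 2, 3}              (* grouped version: {1/4, 1/2, 1/4} *)
  {1, 2, 3}.probs                                      (* E(Y) = 2 again *)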
The sum of two random variables and its expected value
Sometimes, the sum of random variables is easy to define. For instance, the
definition is straightforward for dice or coins:
one simply adds up the corresponding numbers (0 or 1 for the coins, 1 through
6 for the dice).
But what if X = a coin and Y = a die? What does X + Y mean in this case? In such a
case it is reasonable to define the “joint” random variable Z = {X, Y} whose
outcomes are the ordered pairs {x, y}, where x ∈ {0, 1} and y ∈ {1, 2, 3, 4, 5, 6}.
Each outcome, such as {0,5} or {1,4}, now has probability p = 1/12 and all
the characteristics of the distribution can be easily obtained (see Example 6.7
from G&S, p. 230):
E(Z) = (1/12)({0,1} + {0,2} + … + {0,6}) + (1/12)({1,1} + {1,2} + … + {1,6}) =
(1/12){0, 21} + (1/12){6, 21} = {1/2, 3.5},
so E(X + Y) = 1/2 + 3.5 = 4 = E(X) + E(Y).
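The same bookkeeping in Mathematica (a sketch using Tuples to enumerate the 12 equally likely pairs):

  pairs = Tuples[{{0, 1}, Range[6]}];   (* {coin, die} outcomes, each with p = 1/12 *)
  Mean[pairs]                           (* componentwise expectation: {1/2, 7/2} *)
  Mean[Total /@ pairs]                  (* E(X + Y) = 4 = E(X) + E(Y) *)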
Theorem: Let X and Y be random variables with finite expected values. Then,
E(X+Y)= E(X) + E(Y) ; E(X-Y)= E(X) - E(Y)
(8.7)
and if c is any constant, then
E(cX)=cE(X)
(8.8)
The proof is quite simple and can be found in Grinstead & Snell, p. 231.
Let’s discuss (8.7).
We can consider the random variable X + Y as the result of applying the function
f(x, y) = x + y to the joint random variable (X, Y). Then, according to (8.3″), we
have:
E(X + Y) = Σ_i Σ_j (x_i + y_j) p(x_i, y_j) = Σ_i Σ_j x_i p(x_i, y_j) + Σ_i Σ_j y_j p(x_i, y_j)
= Σ_i x_i p(x_i) + Σ_j y_j p(y_j) = E(X) + E(Y)
Here we used the conditions
Σ_j p(x_i, y_j) = p(x_i)   and   Σ_i p(x_i, y_j) = p(y_j)
In plain language, these conditions mean that the
probability that X = x_i while Y takes any of its allowed
values (i.e., regardless of the value of Y) is given by the
probability function of X, and vice versa.
An Important Comment:
In the derivation of (8.7) it is not assumed that the summands
are mutually independent.
The fact that the expectations add, whether or not the summands
are mutually independent, is sometimes referred to as the
First Fundamental Mystery of Probability.
b. The variance and standard deviation.
While the expectation of a random variable describes its “center”, the
variance describes how strongly the distribution is localized near its center.
In other words, the variance (or standard deviation) is a measure of the
dispersion, or scatter, of the values of the random variable about the mean
value. Let’s start with an example: the normal distribution, for which Var(X) = σ².
Let us now define the variance and have some practice
Var(X) = E[(X − μ)²]    (8.6)
The positive square root of the variance is called the “standard deviation” and
is given by
σX = √Var(X) = √E[(X − μ)²]    (8.7)
Where no confusion can result, the index in standard deviation is
dropped.
Remember:
Var = σ².
Discrete distribution
σX² = Σ_{j=1}^{n} (x_j − μ)² p(x_j)    (8.8)
Continuous distribution
σX² = ∫_{−∞}^{∞} (x − μ)² f(x) dx    (8.9)
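As a quick exercise with (8.8), here is a sketch computing the variance of the balanced die (with μ = 7/2 from the earlier example):

  μ = 7/2;
  Sum[(j - μ)^2*(1/6), {j, 1, 6}]                   (* gives 35/12 *)
  Variance[DiscreteUniformDistribution[{1, 6}]]     (* built-in cross-check *)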
Some theorems on variance
Their proofs can be found in G&S, 6.2
Theorem 1.
σ² = E[(X − μ)²] = E[X²] − 2μE[X] + μ² = E[X²] − (E[X])²
Theorem 2. If c is any constant, Var(cX) = c²Var(X).
Theorem 3. If X and Y are independent random variables, then
Var(X + Y) = Var(X) + Var(Y)
Var(X − Y) = Var(X) + Var(Y)
Let’s prove Theorem 3.
Theorem 3. If X and Y are independent random variables, then
Var(X + Y) = Var(X) + Var(Y)
Var(X − Y) = Var(X) + Var(Y)
Proof
We will use the fact that for independent X and Y, E(XY) = E(X)E(Y). It can
be proven using the definition of independence: p(x, y) = p(x)p(y).
Let E(X) = a and E(Y) = b.
Var(X + Y) = (according to Theorem 1) E[(X + Y)²] − (a + b)² = E(X²) + E(2XY) +
E(Y²) − a² − 2ab − b² =
= (E(X²) − a²) + (E(Y²) − b²) = (using Theorem 1 once again) Var(X) + Var(Y).
Var(X − Y) = E[(X − Y)²] − (a − b)² = E(X²) − E(2XY) + E(Y²) − a² + 2ab − b² =
= (E(X²) − a²) + (E(Y²) − b²) = Var(X) + Var(Y).
As promised, we used that E(2XY) = 2E(X)E(Y) = 2ab.
Working in groups
1. Expectation and Variance
(open Lect8/Lect8_ClassPract_ExpectVar_forStudents.nb)
(1) Find the expectation and variance for the Poisson
(discrete) and Exponential (continuous)
distributions.
(2) Find (a) the expectation, (b) the variance and (c) the
standard deviation of the sum obtained in tossing
two fair dice.
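Once the group results are obtained, they can be cross-checked against Mathematica's built-in distributions; a sketch (λ is kept symbolic):

  {Mean[PoissonDistribution[λ]], Variance[PoissonDistribution[λ]]}           (* both equal λ *)
  {Mean[ExponentialDistribution[λ]], Variance[ExponentialDistribution[λ]]}   (* 1/λ and 1/λ^2 *)
  (* task (2): enumerate the 36 equally likely outcomes of two dice *)
  sums = Total /@ Tuples[Range[6], 2];
  m = Mean[sums]                     (* expectation *)
  v = Total[(sums - m)^2]/36         (* variance via (8.8); note Variance[sums] would give the sample variance *)
  Sqrt[v]                            (* standard deviation *)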