Central Limit Theorem


Probability and Statistical Inference (9th Edition)
Chapter 5 (Part 2/2): Distributions of Functions of Random Variables
November 25, 2015
Outline
5.5 Random Functions Associated with Normal Distributions
5.6 The Central Limit Theorem
5.7 Approximations for Discrete Distributions
5.8 Chebyshev’s Inequality and Convergence in Probability
Random Functions Associated with Normal Distributions

Theorem: Assume that X1, X2, …, Xn are independent random variables with distributions N(μ1, σ1²), N(μ2, σ2²), …, N(μn, σn²), respectively. Then

$$Y = \sum_{i=1}^{n} c_i X_i \quad \text{is} \quad N\!\left(\sum_{i=1}^{n} c_i \mu_i,\ \sum_{i=1}^{n} c_i^2 \sigma_i^2\right)$$
Proof:
Recall the mgf of N(μ, σ²) is M(t) = exp(μt + σ²t²/2). Then

$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(c_i t) = \prod_{i=1}^{n} \exp\!\left(\mu_i c_i t + \frac{\sigma_i^2 c_i^2 t^2}{2}\right) = \exp\!\left(t \sum_{i=1}^{n} c_i \mu_i + \frac{t^2}{2} \sum_{i=1}^{n} c_i^2 \sigma_i^2\right)$$

Therefore, Y is N(Σ ciμi, Σ ci²σi²).
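As a quick numerical companion (not from the slides), the theorem can be checked by simulation; the coefficients and distribution parameters below are arbitrary illustrative choices.

```python
import random

# Hypothetical example: Y = 2*X1 - 3*X2 with X1 ~ N(1, 4), X2 ~ N(-2, 9).
# By the theorem, Y is N(2*1 - 3*(-2), 2^2*4 + (-3)^2*9) = N(8, 97).
random.seed(1)
trials = 200_000
samples = [2 * random.gauss(1, 2) - 3 * random.gauss(-2, 3) for _ in range(trials)]

mean_y = sum(samples) / trials
var_y = sum((y - mean_y) ** 2 for y in samples) / (trials - 1)
print(mean_y, var_y)  # should be close to 8 and 97
```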
• Example 1: If X1 and X2 are independent normal random variables N(µ1, σ1²) and N(µ2, σ2²), respectively, then X1 + X2 is N(µ1 + µ2, σ1² + σ2²), and X1 − X2 is N(µ1 − µ2, σ1² + σ2²).
• Example 2: If X1, X2, …, Xn is a random sample from a normal distribution N(μ, σ²), then the sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{is} \quad N(\mu, \sigma^2/n)$$

• Proof:

$$M_{\bar{X}}(t) = \left[M_X\!\left(\frac{t}{n}\right)\right]^{n} = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2n}\right), \quad \text{since } M_X(t) = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$$

Therefore, X̄ is N(μ, σ²/n).
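A simulation sketch (not from the slides) confirms that the sample mean has variance σ²/n; the choice N(3, 4) and the sample sizes are arbitrary illustrative values.

```python
import random

# Empirical check that the sample mean of n draws from N(3, 4)
# has variance sigma^2 / n = 4 / n. The parameters mu = 3, sigma = 2
# and the sample sizes are arbitrary illustrative choices.
random.seed(2)

def sample_mean(n):
    return sum(random.gauss(3, 2) for _ in range(n)) / n

results = {}
for n in (4, 16, 64):
    means = [sample_mean(n) for _ in range(20_000)]
    m = sum(means) / len(means)
    results[n] = sum((x - m) ** 2 for x in means) / (len(means) - 1)
    print(n, round(results[n], 4), 4 / n)  # empirical vs. theoretical variance
```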
• One important implication of the distribution of X̄ is that it has a greater probability of falling in an interval containing μ than does a single observation Xk.
• The larger the sample size n, the smaller the variance of the sample mean.
• The “mean” is a constant, but the “sample mean” is a random variable.
• For example, assume that X1, X2, …, Xn is a random sample from the N(50, 16) distribution. Then X̄ is N(50, 16/n). The following figure shows the pdf of X̄ for different values of n.

[Figure: pdf of X̄ for samples from N(50, 16), for several values of n]
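To make the figure's point concrete, a short sketch (not from the slides): since X̄ is N(50, 16/n), the probability that X̄ lands within 1 unit of μ = 50 can be computed directly from the normal cdf and grows with n.

```python
import math

# P(49 < X-bar < 51) for samples of size n from N(50, 16),
# using X-bar ~ N(50, 16/n).
def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

probs = {}
for n in (1, 4, 16, 64):
    sd = 4 / math.sqrt(n)                   # standard deviation of X-bar
    probs[n] = phi(1 / sd) - phi(-1 / sd)   # P(49 < X-bar < 51)
    print(n, round(probs[n], 4))
```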
Recall:
• Let Z1, Z2, …, Zn be i.i.d. N(0,1). Then W = Z1² + Z2² + … + Zn² is χ²(n).
• Let X1, X2, …, Xn be independent chi-square random variables with k1, k2, …, kn degrees of freedom, i.e., χ²(k1), χ²(k2), …, χ²(kn), respectively. Then Y = X1 + X2 + … + Xn is χ²(k1 + k2 + … + kn).
• Theorem: Let X1, X2, …, Xn be a random sample from the N(μ, σ²) distribution. The sample mean and sample variance are given by

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

Then,
(a) X̄ and S² are independent
(b) (n − 1)S²/σ² is χ²(n − 1)
• We will accept (a) without proving it.
• Proof of (b): Adding and subtracting X̄ inside each square gives

$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$$

(the cross term vanishes because Σ(Xi − X̄) = 0). The left-hand side is χ²(n), and the second term on the right is the square of an N(0,1) random variable, hence χ²(1). By (a), the two terms on the right are independent, so their mgfs multiply, and

$$\frac{(1-2t)^{-n/2}}{(1-2t)^{-1/2}} = (1-2t)^{-(n-1)/2}$$

is the mgf of (n − 1)S²/σ². Therefore, (n − 1)S²/σ² is χ²(n − 1).
• It is interesting to observe that

$$U = \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \ \text{is} \ \chi^2(n) \quad \text{and} \quad W = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} \ \text{is} \ \chi^2(n-1)$$

• That is, when the actual mean is replaced by the sample mean, one degree of freedom is lost.
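A simulation sketch (not from the slides) makes the lost degree of freedom visible: with n = 5 observations, E[U] = 5 but E[W] = 4. The parameters μ = 10, σ = 2 are arbitrary illustrative choices.

```python
import random

# With n = 5 draws from N(mu, sigma^2), U (true mean) averages to n = 5
# while W (sample mean) averages to n - 1 = 4: one degree of freedom lost.
random.seed(3)
mu, sigma, n, reps = 10.0, 2.0, 5, 40_000

u_total = w_total = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    u_total += sum((x - mu) ** 2 for x in xs) / sigma ** 2
    w_total += sum((x - xbar) ** 2 for x in xs) / sigma ** 2

u_mean = u_total / reps
w_mean = w_total / reps
print(u_mean, w_mean)  # close to n = 5 and n - 1 = 4
```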
Central Limit Theorem

• It is useful to first review some related theorems.
• Theorem (Sample Mean): Let X1, X2, …, Xn be a sequence of i.i.d. random variables with mean μ and variance σ². Then the sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

is a random variable with mean μ and variance σ²/n.
• Theorem (Strong Law of Large Numbers): Let X1, X2, …, Xn be a sequence of i.i.d. random variables with mean μ. Then, with probability 1, X̄ → μ as n → ∞. That is,

$$P\!\left(\lim_{n\to\infty} \bar{X} = \mu\right) = 1$$

(The sample mean converges almost surely, or converges with probability 1, to the expected value.)
• This theorem holds for any distribution of the Xi’s.
• This is one of the most well-known results in probability theory.
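A small sketch (not from the slides) of the Strong Law in action: the running sample mean of i.i.d. U(0, 1) draws settles toward the true mean μ = 1/2 as n grows.

```python
import random

# Running sample mean of i.i.d. U(0, 1) draws; it drifts toward mu = 0.5.
random.seed(4)
total = 0.0
for i in range(1, 100_001):
    total += random.random()
    if i in (10, 1_000, 100_000):
        print(i, total / i)

running_mean = total / 100_000
```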
• Theorem (Central Limit Theorem): Let X1, X2, …, Xn be a sequence of i.i.d. random variables with mean μ and variance σ². Then the distribution of

$$W_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$

is N(0,1) as n → ∞. That is,

$$\lim_{n\to\infty} P(W_n \le w) = \Phi(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$

(convergence in distribution)
• While X̄ − μ tends to “degenerate” to zero (Strong Law of Large Numbers), the factor √n/σ in √n(X̄ − μ)/σ “spreads out” the probability enough to prevent this degeneration.
• One observation that helps make sense of this result is that, in the case of the normal distribution (i.e., X1, X2, …, Xn are i.i.d. normal), X̄ is N(μ, σ²/n).
• Hence, (X̄ − μ)/(σ/√n) is (exactly) N(0,1) for each positive value of n.
• Thus, in the limit, the distribution must also be N(0,1).
• The powerful fact is that this theorem holds for any distribution of the Xi’s.
• It explains the remarkable fact that the empirical frequencies of so many natural “populations” exhibit a bell-shaped (i.e., normal) curve.
• The term “central limit theorem” traces back to George Pólya, who first used the term in 1920 in the title of a paper. Pólya referred to the theorem as “central” due to its importance in probability theory.
• The Central Limit Theorem and the Strong Law of Large Numbers are the two fundamental theorems of probability.
• Example 1 (Normal Approximation to the Uniform Sum Distribution (a.k.a. the Irwin–Hall Distribution)): Let Xi, i = 1, 2, …, be i.i.d. U(0,1). Compare the graph of the pdf of Y = X1 + X2 + … + Xn with the graph of the N(n(1/2), n(1/12)) pdf.

[Figures: pdf of Y vs. the approximating normal pdf, for n = 2 and n = 4]
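The comparison in the figures can be reproduced pointwise (a sketch, not from the slides): the Irwin–Hall pdf has the closed form f(y) = (1/(n−1)!) Σ_{k≤y} (−1)^k C(n,k)(y−k)^{n−1}, which can be evaluated next to the approximating normal pdf.

```python
import math

def irwin_hall_pdf(y, n):
    """pdf of the sum of n i.i.d. U(0,1) random variables, for 0 <= y <= n."""
    if not 0 <= y <= n:
        return 0.0
    s = sum((-1) ** k * math.comb(n, k) * (y - k) ** (n - 1)
            for k in range(int(y) + 1))
    return s / math.factorial(n - 1)

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Compare the exact pdf of Y with the N(n/2, n/12) pdf for n = 4.
n = 4
for y in (1.0, 2.0, 3.0):
    print(y, irwin_hall_pdf(y, n), normal_pdf(y, n / 2, n / 12))
```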
• Example 2 (Normal Approximation to the Uniform Sum Distribution (a.k.a. the Irwin–Hall Distribution)): Let Xi, i = 1, 2, …, 10, be i.i.d. U(0,1). Estimate P(X1 + X2 + … + X10 > 7).
• Solution: With E(Y) = 10(1/2) = 5 and Var(Y) = 10(1/12) = 5/6, and by the central limit theorem,

$$P(Y > 7) = P\!\left(\frac{Y - 5}{\sqrt{5/6}} > \frac{7 - 5}{\sqrt{5/6}}\right) \approx 1 - \Phi(2.19) \approx 0.0143$$
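A Monte Carlo sketch (not from the slides) checks this normal approximation against direct simulation of the sum of ten uniforms.

```python
import math
import random

# Monte Carlo check of P(X1 + ... + X10 > 7) against the CLT
# approximation 1 - Phi(2/sqrt(5/6)) ≈ 0.0143.
random.seed(5)
trials = 200_000
hits = sum(sum(random.random() for _ in range(10)) > 7 for _ in range(trials))
mc_estimate = hits / trials

z = (7 - 5) / math.sqrt(5 / 6)
normal_approx = 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)
print(mc_estimate, normal_approx)
```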
• Example 3 (Normal Approximation to the Chi-Square Distribution): Let X1, X2, …, Xn be i.i.d. N(0,1). Then Y = X1² + X2² + … + Xn² is chi-square with n degrees of freedom, with E(Y) = n and Var(Y) = 2n.
• Recall the pdf of Y is

$$f(y) = \frac{1}{\Gamma(n/2)\, 2^{n/2}}\, y^{n/2 - 1} e^{-y/2}, \quad 0 < y < \infty$$

• Let W = (Y − n)/√(2n). The pdf of W follows by the change of variables y = n + w√(2n).
• Compare the pdf of W and the pdf of N(0,1):

[Figures: pdf of W vs. the N(0,1) pdf, for n = 20 and n = 100]
Approximations for Discrete Distributions

• The beauty of the central limit theorem is that it holds regardless of the underlying distribution (even discrete).
• Example 4 (Normal Approximation to the Binomial Distribution): Let X1, X2, …, Xn be a random sample from a Bernoulli distribution with μ = p and σ² = p(1 − p). Then Y = X1 + X2 + … + Xn is binomial b(n, p). The central limit theorem states that

$$\frac{Y - np}{\sqrt{np(1-p)}}$$

is N(0,1) as n approaches infinity.
• Thus, if n is sufficiently large, the distribution of Y is approximately N(np, np(1 − p)), and the probabilities for the binomial distribution b(n, p) can be approximated with this normal distribution, i.e., with the half-unit continuity correction,

$$P(Y \le k) \approx \Phi\!\left(\frac{k + 1/2 - np}{\sqrt{np(1-p)}}\right)$$

for sufficiently large n.
• Consider n = 10, p = 1/2, i.e., Y ~ b(10, 1/2). Then, by the CLT, Y can be approximated by the normal distribution with mean 10(1/2) = 5 and variance 10(1/2)(1/2) = 5/2. Compare the pmf of Y and the pdf of N(5, 5/2):

[Figure: pmf of b(10, 1/2) vs. pdf of N(5, 5/2)]
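A numerical sketch (not from the slides) for this example: the exact binomial probability P(Y ≤ 6) for Y ~ b(10, 1/2) against the normal approximation with a half-unit continuity correction.

```python
import math

# Exact P(Y <= 6) for Y ~ b(10, 1/2) vs. the continuity-corrected
# normal approximation Phi((6.5 - np) / sqrt(np(1-p))).
def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 10, 0.5
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(7))
approx = phi((6.5 - n * p) / math.sqrt(n * p * (1 - p)))
print(exact, approx)  # exact is 848/1024 = 0.828125; approx is close
```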
• Example 5 (Normal Approximation to the Poisson Distribution):
• Recall the Poisson pmf

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

where the parameter λ is both the mean and the variance of the distribution.
• A Poisson random variable counts the number of discrete occurrences (sometimes called “events” or “arrivals”) that take place during a time interval of given length.
• A random variable having a Poisson distribution with mean 20 can be thought of as the sum Y of the observations of a random sample of size 20 from a Poisson distribution with mean 1. Thus,

$$W = \frac{Y - 20}{\sqrt{20}}$$

has a distribution that is approximately N(0,1), and the distribution of Y is approximately N(20, 20).
• Compare the pmf of Y and the pdf of N(20, 20):

[Figure: pmf of Poisson(20) vs. pdf of N(20, 20)]
Markov’s Inequality

• Theorem (Markov’s Inequality): If X is a continuous random variable that takes only nonnegative values, then for any a > 0,

$$P(X \ge a) \le \frac{E(X)}{a}$$

• The inequality is valid for all distributions (discrete or continuous).
• Proof (continuous case):

$$E(X) = \int_0^{\infty} x f(x)\, dx \ge \int_a^{\infty} x f(x)\, dx \ge \int_a^{\infty} a f(x)\, dx = a\, P(X \ge a)$$

Dividing both sides by a gives the result.
• Intuition behind Markov’s inequality, using a fair die (discrete) example: let X be the outcome of one roll, so E(X) = 7/2. Then, for instance, P(X ≥ 4) ≤ (7/2)/4 = 7/8, while the actual probability is 1/2. The bound is loose, but it uses only the mean of the distribution.
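The die example above can be tabulated exactly (a sketch, not from the slides): Markov's bound E(X)/a next to the true tail probability P(X ≥ a) for every threshold a.

```python
from fractions import Fraction

# Markov's bound vs. exact tail probabilities for one roll of a
# fair die, where E(X) = 7/2.
mean = Fraction(7, 2)
rows = []
for a in range(1, 7):
    exact = Fraction(6 - a + 1, 6)   # P(X >= a) for a fair die
    bound = mean / a                 # Markov: E(X) / a
    rows.append((a, exact, bound))
    print(a, exact, bound)
```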
Chebyshev’s Inequality

• Theorem (Chebyshev’s Inequality): If X is a continuous random variable with mean μ and variance σ², then for any k > 0,

$$P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$$

• The inequality is valid for all distributions (discrete or continuous) for which the standard deviation exists.
• Proof: Since (X − μ)² is a nonnegative random variable, we can apply Markov’s inequality (with a = k²) to obtain

$$P\!\left((X - \mu)^2 \ge k^2\right) \le \frac{E\!\left[(X - \mu)^2\right]}{k^2}$$

Thus, since (X − μ)² ≥ k² if and only if |X − μ| ≥ k,

$$P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$$
Chebyshev’s Inequality (Another Form)

• Chebyshev’s inequality (another form): for any k > 0,

$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$

• Chebyshev’s inequality states that the probability that X differs from its mean by at least k standard deviations is less than or equal to 1/k².
• It follows that the probability that X differs from its mean by less than k standard deviations is at least 1 − 1/k²:

$$P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$$
• The importance of Markov’s and Chebyshev’s inequalities is that they enable us to derive (sometimes loose but still useful) bounds on probabilities when only the mean, or both the mean and the variance, of the probability distribution are known.
• Example 1: If it is known that X has a mean of 25 and a variance of 16 (so σ = 4), then a lower bound for P(17 < X < 33) is given by

$$P(17 < X < 33) = P(|X - 25| < 8) = P(|X - 25| < 2\sigma) \ge 1 - \frac{1}{2^2} = \frac{3}{4}$$

and an upper bound for P(|X − 25| ≥ 12) is

$$P(|X - 25| \ge 12) = P(|X - 25| \ge 3\sigma) \le \frac{1}{3^2} = \frac{1}{9}$$

• The results hold for any distribution with mean 25 and variance 16.
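A closing sketch (not from the slides): the Chebyshev bounds of Example 1 compared with the actual probabilities for one particular distribution with that mean and variance, here N(25, 16), computed from the normal cdf.

```python
import math

# Chebyshev's bounds vs. actual probabilities when X happens to be N(25, 16).
def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 25, 4
p_inside = phi((33 - mu) / sigma) - phi((17 - mu) / sigma)  # P(17 < X < 33)
p_tail = 2 * (1 - phi(12 / sigma))                          # P(|X - 25| >= 12)

print(p_inside, ">=", 1 - 1 / 2 ** 2)  # actual 0.9545 vs. bound 0.75
print(p_tail, "<=", 1 / 3 ** 2)        # actual 0.0027 vs. bound 1/9
```

For the normal distribution the bounds are quite loose; Chebyshev's strength is that they hold for every distribution with this mean and variance.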