Limits and the Law of Large Numbers

Lecture XIV
• Let ω represent the entire random sequence {Z_t}. As discussed last time, our interest typically centers on the averages of this sequence:

$$ b_n = \frac{1}{n} \sum_{t=1}^{n} Z_t $$
• Definition 2.9: Let {b_n(ω)} be a sequence of real-valued random variables. We say that b_n(ω) converges almost surely to b, written

$$ b_n \xrightarrow{a.s.} b, $$

if and only if there exists a real number b such that

$$ P\left[\omega : b_n(\omega) \rightarrow b\right] = 1. $$
• The probability measure P describes the distribution of ω and determines the joint distribution function for the entire sequence {Z_t}.

• Other common terminology is that b_n(ω) converges to b with probability 1 (w.p.1) or that b_n(ω) is strongly consistent for b.
• Example 2.10: Let

$$ \bar{Z}_n = \frac{1}{n} \sum_{t=1}^{n} Z_t, $$

where {Z_t} is a sequence of independently and identically distributed (i.i.d.) random variables with E(Z_t) = μ < ∞. Then

$$ \bar{Z}_n \xrightarrow{a.s.} \mu $$

by the Kolmogorov strong law of large numbers (Theorem 3.1).
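As an added illustration (not part of the original lecture), here is a minimal simulation sketch of Example 2.10; the exponential distribution with mean μ = 2 is an arbitrary choice for the demonstration:

```python
import numpy as np

# Minimal sketch of Example 2.10, assuming an arbitrary i.i.d.
# distribution: exponential draws with mean mu = 2.
rng = np.random.default_rng(42)
mu, n = 2.0, 100_000

z = rng.exponential(scale=mu, size=n)
running_mean = np.cumsum(z) / np.arange(1, n + 1)

# Along (almost) every sample path, the running mean settles at mu.
for n_check in (100, 10_000, 100_000):
    print(n_check, running_mean[n_check - 1])
```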
• Proposition 2.11: Given g: Rᵏ → Rˡ (k, l < ∞) and any sequence {b_n} such that

$$ b_n \xrightarrow{a.s.} b, $$

where b_n and b are k × 1 vectors, if g is continuous at b, then

$$ g(b_n) \xrightarrow{a.s.} g(b). $$
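A small added sketch of the proposition, using the hypothetical continuous map g(b) = exp(b) applied to a converging sample mean:

```python
import numpy as np

# Sketch of Proposition 2.11 with the continuous map g(b) = exp(b):
# since the running mean converges a.s. to b, g(mean) converges to g(b).
rng = np.random.default_rng(8)
b = 0.5
z = rng.normal(loc=b, size=1_000_000)
for n in (100, 10_000, 1_000_000):
    print(n, np.exp(z[:n].mean()), np.exp(b))
```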
• Theorem 2.12: Suppose
(i) y = Xb₀ + ε;
(ii) X′ε/n → 0 a.s.;
(iii) X′X/n → M a.s., with M finite and positive definite.
Then $\hat{b}_n$ exists a.s. for all n sufficiently large, and $\hat{b}_n \xrightarrow{a.s.} b_0$.
• Proof: Since X′X/n → M a.s., it follows from Proposition 2.11 that det(X′X/n) → det(M) a.s. Because M is positive definite by (iii), det(M) > 0. It follows that det(X′X/n) > 0 a.s. for all n sufficiently large, so (X′X/n)⁻¹ exists a.s. for all n sufficiently large. Hence

$$ \hat{b}_n = \left(\frac{X'X}{n}\right)^{-1} \frac{X'y}{n}. $$

• In addition,

$$ \hat{b}_n = b_0 + \left(\frac{X'X}{n}\right)^{-1} \frac{X'\varepsilon}{n}. $$

• It follows from Proposition 2.11 that

$$ \hat{b}_n \xrightarrow{a.s.} b_0 + M^{-1} \cdot 0 = b_0. $$
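The following added simulation sketch illustrates Theorem 2.12 under assumed conditions (i.i.d. standard normal regressors and errors, a made-up coefficient vector); the OLS estimates settle at b₀ as n grows:

```python
import numpy as np

# Sketch of Theorem 2.12 under assumed conditions: i.i.d. standard
# normal regressors and errors, so X'e/n -> 0 and X'X/n -> M a.s.
rng = np.random.default_rng(0)
b0 = np.array([1.0, -2.0, 0.5])  # hypothetical true coefficients

for n in (100, 10_000, 1_000_000):
    X = rng.normal(size=(n, 3))
    e = rng.normal(size=n)
    y = X @ b0 + e
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y
    print(n, b_hat)
```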
• A weaker stochastic convergence concept is that of convergence in probability.

• Definition 2.23: Let {b_n(ω)} be a sequence of real-valued random variables. If there exists a real number b such that for every ε > 0,

$$ P\left[\omega : \left|b_n(\omega) - b\right| < \varepsilon\right] \rightarrow 1 $$

as n → ∞, then b_n(ω) converges in probability to b.
• Almost sure convergence takes into account the joint distribution of the entire sequence {Z_t}, but with convergence in probability we only need to be concerned with the joint distribution of those elements that appear in b_n(ω).

• Convergence in probability is also referred to as weak consistency.
• Theorem 2.24: Let {b_n(ω)} be a sequence of random variables. If

$$ b_n \xrightarrow{a.s.} b, \quad \text{then} \quad b_n \xrightarrow{p} b. $$

Conversely, if b_n converges in probability to b, then there exists a subsequence {b_{n_j}} such that

$$ b_{n_j} \xrightarrow{a.s.} b. $$
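A quick added Monte Carlo sketch of Definition 2.23; the Uniform(0, 1) distribution and the value ε = 0.01 are arbitrary assumptions:

```python
import numpy as np

# Sketch of Definition 2.23: estimate P[|mean - mu| >= eps] by Monte
# Carlo, assuming Uniform(0, 1) draws so that mu = 0.5.
rng = np.random.default_rng(1)
mu, eps, reps = 0.5, 0.01, 2_000

for n in (100, 1_000, 10_000):
    means = rng.uniform(size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - mu) >= eps))  # shrinks toward zero
```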
• Definition 2.37: Let {b_n(ω)} be a sequence of real-valued random variables. If there exists a real number b such that

$$ E\left|b_n(\omega) - b\right|^r \rightarrow 0 $$

as n → ∞ for some r > 0, then b_n(ω) converges in the r-th mean to b, written

$$ b_n \xrightarrow{r.m.} b. $$
• Proposition 2.38 (Jensen's Inequality): Let g: R¹ → R¹ be a convex function on an interval B ⊂ R¹ and let Z be a random variable such that P[Z ∈ B] = 1. Then g(E(Z)) ≤ E(g(Z)). If g is concave on B, then g(E(Z)) ≥ E(g(Z)).
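A one-line added numeric check of Jensen's inequality with the convex function g(z) = z²:

```python
import numpy as np

# Check Jensen's inequality for the convex g(z) = z**2: g(E[Z]) <= E[g(Z)].
rng = np.random.default_rng(2)
z = rng.normal(loc=1.0, scale=3.0, size=100_000)
print(np.mean(z) ** 2, np.mean(z ** 2))  # roughly 1 versus roughly 10
```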
• Proposition 2.41 (Generalized Chebyshev Inequality): Let Z be a random variable such that E|Z|^r < ∞, r > 0. Then for every ε > 0,

$$ P\left[\left|Z\right| \geq \varepsilon\right] \leq \frac{E\left|Z\right|^r}{\varepsilon^r}. $$
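An added simulation sketch comparing the empirical tail probability with the Chebyshev bound; the Student-t distribution and r = 2 are arbitrary choices:

```python
import numpy as np

# Empirical check of the generalized Chebyshev inequality with r = 2,
# assuming Z follows a Student-t distribution with 5 degrees of freedom.
rng = np.random.default_rng(3)
z = rng.standard_t(df=5, size=1_000_000)
r = 2
for eps in (1.0, 2.0, 4.0):
    tail = np.mean(np.abs(z) >= eps)
    bound = np.mean(np.abs(z) ** r) / eps ** r
    print(eps, tail, bound)  # tail never exceeds bound
```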
• Theorem 2.42: If $b_n(\omega) \xrightarrow{r.m.} b$ for some r > 0, then $b_n(\omega) \xrightarrow{p} b$. (This follows by applying the generalized Chebyshev inequality to b_n(ω) − b.)
• Proposition 3.0: Given restrictions on the dependence, heterogeneity, and moments of a sequence of random variables {Z_t},

$$ \bar{Z}_n - \bar{\mu}_n \xrightarrow{a.s.} 0, $$

where

$$ \bar{Z}_n = \frac{1}{n} \sum_{t=1}^{n} Z_t \quad \text{and} \quad \bar{\mu}_n = E\left(\bar{Z}_n\right). $$
• Theorem 3.1 (Kolmogorov): Let {Z_t} be a sequence of i.i.d. random variables. Then

$$ \bar{Z}_n \xrightarrow{a.s.} \mu $$

if and only if E|Z_t| < ∞ and E(Z_t) = μ.
• This result is consistent with Theorem 6.2.1 (Khinchine): Let {X_i} be independent and identically distributed (i.i.d.) with E[X_i] = μ. Then

$$ \bar{X}_n \xrightarrow{p} \mu. $$
• Proposition 3.4 (Hölder's Inequality): If p > 1 and 1/p + 1/q = 1, and if E|Y|^p < ∞ and E|Z|^q < ∞, then

$$ E\left|YZ\right| \leq \left[E\left|Y\right|^p\right]^{1/p} \left[E\left|Z\right|^q\right]^{1/q}. $$

• If p = q = 2, we have the Cauchy-Schwarz inequality:

$$ E\left|YZ\right| \leq \left[E\left(Y^2\right)\right]^{1/2} \left[E\left(Z^2\right)\right]^{1/2}. $$
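An added numeric sanity check of the Cauchy-Schwarz case on simulated correlated normals:

```python
import numpy as np

# Check E|YZ| <= [E(Y^2)]^{1/2} [E(Z^2)]^{1/2} on correlated normals.
rng = np.random.default_rng(4)
y = rng.normal(size=500_000)
z = 0.7 * y + rng.normal(size=500_000)  # correlated with y by construction
lhs = np.mean(np.abs(y * z))
rhs = np.sqrt(np.mean(y ** 2)) * np.sqrt(np.mean(z ** 2))
print(lhs, rhs)  # lhs <= rhs
```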
• Under the traditional assumptions of the linear model (fixed regressors and normally distributed error terms), $\hat{b}_n$ is distributed multivariate normal with

$$ E\left(\hat{b}_n\right) = b_0 \quad \text{and} \quad V\left(\hat{b}_n\right) = \sigma_0^2 \left(X'X\right)^{-1} $$

for any sample size n.
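An added simulation sketch of this exact finite-sample result; the design matrix, coefficients, and σ₀ are made up for the demonstration:

```python
import numpy as np

# Fixed regressors and normal errors: the Monte Carlo covariance of the
# OLS estimates should match the exact formula sigma0^2 * (X'X)^{-1}.
rng = np.random.default_rng(5)
n, sigma0 = 50, 1.5
X = rng.normal(size=(n, 2))          # held fixed across replications
b0 = np.array([1.0, -1.0])           # hypothetical true coefficients
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.array([XtX_inv @ X.T @ (X @ b0 + rng.normal(scale=sigma0, size=n))
                  for _ in range(20_000)])
print(np.cov(draws.T))               # empirical covariance of b-hat
print(sigma0 ** 2 * XtX_inv)         # theoretical covariance
```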
• However, when the sample size becomes large, the distribution of $\hat{b}_n$ is approximately normal under some general conditions.
• Definition 4.1: Let {b_n} be a sequence of random finite-dimensional vectors with joint distribution functions {F_n}. If F_n(z) → F(z) as n → ∞ for every continuity point z, where F is the distribution function of a random variable Z, then b_n converges in distribution to the random variable Z, denoted

$$ b_n \xrightarrow{d} Z. $$
• Other ways of stating this concept are that b_n converges in law to Z,

$$ b_n \xrightarrow{L} Z, $$

or that b_n is asymptotically distributed as F,

$$ b_n \overset{A}{\sim} F. $$

In this case, F is called the limiting distribution of b_n.
• Example 4.3: Let {Z_t} be an i.i.d. sequence of random variables with mean μ and variance σ² < ∞. Define

$$ b_n = \frac{\bar{Z}_n - E\left(\bar{Z}_n\right)}{\left[V\left(\bar{Z}_n\right)\right]^{1/2}} = \sigma^{-1} n^{-1/2} \sum_{t=1}^{n} \left(Z_t - \mu\right). $$

Then by the Lindeberg-Levy central limit theorem (Theorem 6.2.2),

$$ b_n \overset{A}{\sim} N(0, 1). $$
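An added simulation sketch of Example 4.3; the exponential distribution is an arbitrary, deliberately skewed choice:

```python
import numpy as np

# Sketch of Example 4.3: standardized means of skewed exponential(1)
# draws (mu = sigma = 1) are approximately N(0, 1) for large n.
rng = np.random.default_rng(6)
n, reps = 1_000, 50_000

z = rng.exponential(size=(reps, n))
b_n = np.sqrt(n) * (z.mean(axis=1) - 1.0) / 1.0
print(np.mean(np.abs(b_n) <= 1.96))  # close to 0.95, the N(0,1) coverage
```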
• Theorem 6.2.2 (Lindeberg-Levy): Let {X_i} be i.i.d. with E[X_i] = μ and V(X_i) = σ². Then the standardized mean

$$ Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \mu\right)}{\sigma} \xrightarrow{d} N(0, 1). $$
• Definition 4.8: Let Z be a k × 1 random vector with distribution function F. The characteristic function of Z is defined as

$$ f(\lambda) = E\left[\exp\left(i \lambda' Z\right)\right], $$

where i² = −1 and λ is a k × 1 real vector.
• Example 4.10: Let Z ~ N(μ, σ²). Then

$$ f(\lambda) = \exp\left(i \lambda \mu - \frac{\lambda^2 \sigma^2}{2}\right). $$

• The proof follows from the derivation of the moment generating function in Lecture VII.
• Specifically, note the similarity between the definition of the moment generating function and the characteristic function:

$$ M_X(t) = E\left[\exp(tX)\right], \qquad f(\lambda) = E\left[\exp(i \lambda Z)\right]. $$
• Theorem 4.11 (Uniqueness Theorem): Two distribution functions are identical if and only if their characteristic functions are identical.

• Note that we have a similar theorem for moment generating functions.
• Proof of Lindeberg-Levy: First define f(λ) as the characteristic function of Z_t − μ and let f_n(λ) be the characteristic function of

$$ \frac{\sqrt{n}\left(\bar{Z}_n - \mu\right)}{\sigma} = \sigma^{-1} n^{-1/2} \sum_{t=1}^{n} \left(Z_t - \mu\right). $$

Because the Z_t are independent, the characteristic function of the sum is the product of the individual characteristic functions, so

$$ f_n(\lambda) = \left[f\left(\frac{\lambda}{\sigma \sqrt{n}}\right)\right]^n, \qquad \ln f_n(\lambda) = n \ln f\left(\frac{\lambda}{\sigma \sqrt{n}}\right). $$
Taking a second-order Taylor series expansion of f(λ) around λ = 0 gives

$$ f(\lambda) = 1 - \frac{\sigma^2 \lambda^2}{2} + o\left(\lambda^2\right). $$

Thus,

$$ \ln f_n(\lambda) = n \ln\left[1 - \frac{\lambda^2}{2n} + o\left(\frac{\lambda^2}{n}\right)\right] \rightarrow -\frac{\lambda^2}{2} $$

as n → ∞.
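An added numeric check of this limit:

```python
import numpy as np

# Numeric check of the key limit: n * ln(1 - lam**2 / (2*n)) -> -lam**2 / 2.
lam = 1.5
for n in (10, 1_000, 100_000):
    print(n, n * np.log(1 - lam ** 2 / (2 * n)), -lam ** 2 / 2)
```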
• Thus f_n(λ) → exp(−λ²/2), which is the characteristic function of the standard normal. By the Uniqueness Theorem, the distribution of the standardized sample mean therefore converges to that of the standard normal.