Transcript Lecture 6

Outline of lecture 6
 Convergence concepts
 Definitions and relations between convergence concepts
 Sufficient conditions for almost sure convergence
 Convergence via transforms
 The law of large numbers and the central limit theorem
Probability theory 2008
Coin-tossing: relative frequency of heads
1
0.9
Relative frequency of heads
0.8
0.7
Convergence of
each trajectory?
0.6
Series1
0.5
Series2
Series3
Convergence in
probability?
0.4
0.3
0.2
0.1
0
0
50
100
150
200
250
300
350
Probability theory 2008
400
Convergence to a constant
The sequence {Xn} of random variables converges almost
surely to the constant c iff
P({ ; Xn()  c as n   }) = 1
The sequence {Xn} of random variables converges in
probability to the constant c iff, for all  > 0,
P({ ; | Xn() – c| > })  0 as n  
Probability theory 2008
An (artificial) example
Let X1, X2,… be a sequence of independent binary random variables such
that
P(Xn = 1) = 1/n and P(Xn = 0) = 1 – 1/n
Does Xn converge to 0 in probability?
Does Xn converge to 0 almost surely?
Common exception set?
Probability theory 2008
The law of large numbers for random variables
with finite variance
Let {Xn} be a sequence of independent and identically distributed random
variables with mean  and variance 2, and set
Sn = X1 + … + Xn
Then
P(|
Sn
  |   )  0 as n  , for all   0
n
Proof: Assume that  = 0. Then
.
P(|
Sn
|  ) 
n
Var (
Sn
)
n
2
Probability theory 2008
Convergence to a random variable: definitions
The sequence {Xn} of random variables converges almost surely to the
random variable X iff
P({ ; Xn()  X() as n   }) = 1
a.s.
Notation: X 
X as n  
n
The sequence {Xn} of random variables converges in probability to the
random variable X iff, for all  > 0,
P({ ; | Xn() – X()| > })  0 as n  
Notation:
Probability theory 2008
p
Xn 

X as n  
Convergence to a random variable: an example
Assume that the concentration of NO in air is continuously
recorded and let Xt, be the concentration at time t.
Consider the random variables:
Y  max 0  t 1 X t
Yn  max  X 0 , X 1/ n , X 2 / n , ... , X 1 
Does Yn converge to Y in probability?
Does Yn converge to Y almost surely?
Probability theory 2008
Convergence in distribution: an example
Let Xn  Bin(n, c/n). Then the distribution of Xn converges to
a Po(c) distribution as n  
Binomial and Poisson distributions (n = 20, c p== 0.1)
0.1)
0.30
Probability
0.25
0.20
Binomial
Poisson
0.15
0.10
0.05
.
0.00
0
1
2
3
4
5
6
7
Probability theory 2008
8
9
10
Convergence in distribution and in norm
The sequence Xn converges in distribution to the random variable X as
n   iff
FX n ( x)  FX ( x) as n  
for all x where FX(x) is continuous.
Notation:
d
Xn 

X as n  
The sequence Xn converges in quadratic mean to the random variable X
as n   iff
E | X n  X |2  0 as n  
Notation:
Probability theory 2008
2
Xn 

X as n  
Relations between the convergence concepts
Almost sure
convergence
Convergence
in probability
Convergence
in r-mean
Probability theory 2008
Convergence
in distribution
Convergence in probability implies
convergence in distribution
Observe that, for all  > 0,
P X n  x 
 PX n  x | X n  X |   
 PX n  x | X n  X |   
Probability theory 2008
Convergence almost surely convergence in r-mean
Consider a branching process in which the offspring distribution has mean 1.
Does to converge to zero almost surely?
Does it converge to zero in quadratic mean?
Let X1, X2,… be a sequence of independent random variables such that
P(Xn = n2) = 1/n2 and P(Xn = 0) = 1 – 1/n2
Does Xn converge to 0 in probability?
Does Xn converge to 0 almost surely?
Does Xn converge to 0 in quadratic mean?
Probability theory 2008
Relations between different types of
convergence to a constant
Almost sure
convergence
Convergence
in probability
Convergence
in r-mean
Probability theory 2008
Convergence
in distribution
Convergence via generating functions
Let X, X1, X2, … be a sequence of nonnegative, integervalued random variables, and suppose that
g X n (t )  g X (t ) as n  
Then
d
Xn 

X as n  
Is the limit function of a sequence of
generating functions a generating
function?
Probability theory 2008
Convergence via moment generating functions
Let X, X1, X2, … be a sequence of random variables, and
suppose that
 X (t )   X (t ) as n  , for | t |  h
n
Then
d
Xn 

X as n  
Is the limit function of a sequence of
moment generating functions a
moment generating function?
Probability theory 2008
Convergence via characteristic functions
Let X, X1, X2, … be a sequence of random variables, and
suppose that
 X (t )   X (t ) as n  , for    t  
n
Then
d
Xn 

X as n  
Is the limit function of a sequence of
characteristic functions a
characteristic function?
Probability theory 2008
Convergence to a constant
via characteristic functions
Let X1, X2, … be a sequence of random variables, and
suppose that
 X (t )  eitc as n  
n
Then
p
Xn 

c as n  
Probability theory 2008
The law of large numbers
(for variables with finite expectation)
Let {Xn} be a sequence of independent and identically
distributed random variables with expectation , and set
Sn = X1 + … + Xn
Then
.
Sn
p
Xn 


 as n  
n
Probability theory 2008
The strong law of large numbers
(for variables with finite expectation)
Let {Xn} be a sequence of independent and identically
distributed random variables with expectation , and set
Sn = X1 + … + Xn
Then
Xn 
.
S n a.s.
  as n  
n
Probability theory 2008
The central limit theorem
Let {Xn} be a sequence of independent and identically distributed random
variables with mean  and variance 2, and set
Sn = X1 + … + Xn
Then
S n  n d

 N (0 ,1) as n  
 n
Proof: If  = 0, we get
t
t   t
t 

 Sn (t )   Sn ( )   X ( )   1   o( ) 
n 
n 
n   2n
n
n
.
Probability theory 2008
2
2
n
Rate of convergence in the central limit theorem
E| X |3
sup x | FSn n ( x)  ( x) |  0.7975 3
 n
 n
Example: XU(0,1)
.
E| X |3
1/ 4
8.3
0.7975 3
 0.7975

 n
1 /(12 12 ) n
n
Probability theory 2008
Sums of exponentially distributed random variables
gamma(10;1)
N(10;sqr(10))
0.14
1.20
0.12
1.00
Cumulative distribution function
Probability density
gamma(10;1)
0.10
0.08
0.06
0.04
0.02
N(10;sqr(10))
0.80
0.60
0.40
0.20
0.00
0.00
0
5
10
15
20
25
0
Probability theory 2008
5
10
15
20
25
Convergence of empirical distribution functions
Fn ( x) 
# observations  x
n
d
n ( Fn ( x)  F ( x)) 

N (0, F ( x)(1  F ( x) )
Proof: Write Fn(x) as a sum of indicator functions
Bootstrap techniques: The original distribution is replaced with the empirical distribution
Probability theory 2008
Resampling techniques
- the bootstrap method
Resampled data
Observed data
62
90
22
41
34
67
88 79
39
73
58
x1* , x2* , ...60
, xN*
Sampling with
replacement
58
90 88 79
41
22
44
70 60
34
41
44
70 60
85
85
x
88
Probability theory 2008
x1* , x2* , ..., x N*
Characteristics of infinite sequences of events
Let {An, n = 1, 2, …} be a sequence of events, and define

A*  lim inf n An  

A
m
n 1 m  n

A  lim sup n An  
*

A
m
n 1 m  n
Example: Consider a queueing system and let
An = {the queueing system is empty at time n}
Probability theory 2008
The probability that an event occurs infinitely often
- Borel-Cantelli’s first lemma
Let {An, n = 1, 2, …} be an arbitrary sequence of events. Then

 P( A )  
n 1
n
 P( An i.o.)  0
Is the converse true?
Example: Consider a queueing system and let
An = {the queueing system is empty at time n}
Probability theory 2008
The probability that an event occurs infinitely often
- Borel-Cantelli’s second lemma
Let {An, n = 1, 2, …} be a sequence of independent events. Then

 P( A )  
n 1
n
 P( An i.o.)  1
Probability theory 2008
Necessary and sufficient conditions for almost sure
convergence of independent random variables
Let X1, X2, … be a sequence of independent random variables. Then
a.s.
X n 
0 as n   

 P(| X
n 1
Probability theory 2008
n
|  )  
Exercises: Chapter VI
6.1, 6.4, 6.5, 6.8, 6.9, 6.14, 6.17, 6.24 6.30, 6.31
Probability theory 2008