ch2 (Review_of_Probability)


Review of Probability
1
Probability Theory:

Many techniques in speech processing
require the manipulation of probabilities
and statistics.

The two principal application areas we will
encounter are:
 Statistical pattern recognition.
 Modeling of linear systems.
2
Events:

It is customary to refer to the probability of
an event.

An event is a certain set of possible
outcomes of an experiment or trial.

Outcomes are assumed to be mutually
exclusive and, taken together, to cover all
possibilities.
3
Axioms of Probability:

To any event A we can assign a number,
P(A), which satisfies the following axioms:
 P(A)≥0.
 P(S)=1.
 If A and B are mutually exclusive, then
P(A+B) = P(A) + P(B).

The number P(A) is called the probability
of A.
4
Axioms of Probability (some consequences):

Some immediate consequences:
 If $\bar{A}$ is the complement of A, then $(A \cup \bar{A}) = S$ and $P(\bar{A}) = 1 - P(A)$.
 $P(\emptyset)$, the probability of the impossible event, is 0.
 $P(A) \le 1$.

If two events A and B are not mutually exclusive, we can show that
 $P(A+B) = P(A) + P(B) - P(AB)$.
5
Conditional Probability:

The conditional probability of an event A,
given that event B has occurred, is defined as:
$$P(A \mid B) = \frac{P(AB)}{P(B)}$$

We can infer P(B|A) by means of Bayes'
theorem:
$$P(B \mid A) = P(A \mid B)\,\frac{P(B)}{P(A)}$$
6
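A minimal Python sketch (not part of the original slides) of Bayes' theorem in action. The event names and probabilities below are invented purely for illustration.

```python
# Hypothetical events, chosen only to illustrate Bayes' theorem:
# B = "speaker is female", A = "pitch above 160 Hz" (numbers made up).
P_B = 0.5          # prior P(B)
P_A_given_B = 0.8  # likelihood P(A|B)
P_A_given_notB = 0.1

# Total probability: P(A) = P(A|B)P(B) + P(A|~B)P(~B)
P_A = P_A_given_B * P_B + P_A_given_notB * (1 - P_B)

# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)
P_B_given_A = P_A_given_B * P_B / P_A
print(f"P(A) = {P_A:.3f}, P(B|A) = {P_B_given_A:.3f}")
```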
Independence:
Events A and B may have nothing to do
with each other; such events are said to be
independent.
 Two events are independent if
P(AB) = P(A)P(B).
 Note that independence does not imply mutual exclusivity,
nor does mutual exclusivity imply independence.
 From the definition of conditional probability, for independent events:
$$P(A \mid B) = P(A)$$
$$P(B \mid A) = P(B)$$
$$P(A + B) = P(A) + P(B) - P(A)P(B)$$
7
Independence:

Three events A, B and C are independent
if and only if:
 $P(AB) = P(A)P(B)$
 $P(AC) = P(A)P(C)$
 $P(BC) = P(B)P(C)$
 $P(ABC) = P(A)P(B)P(C)$
8
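A small Python sketch (not from the slides) showing why the triple-product condition is listed separately from the pairwise ones: with two fair bits, the events below are pairwise independent but the triple product fails.

```python
from itertools import product

outcomes = list(product([0, 1], repeat=2))   # four equally likely outcomes (x, y)
P = 1 / len(outcomes)

def prob(event):
    """Probability of an event given as a predicate over (x, y)."""
    return sum(P for xy in outcomes if event(xy))

A = lambda xy: xy[0] == 1          # first bit is 1
B = lambda xy: xy[1] == 1          # second bit is 1
C = lambda xy: xy[0] ^ xy[1] == 1  # bits differ (XOR)

# Every pair satisfies P(XY) = P(X)P(Y) ...
for X, Y in [(A, B), (A, C), (B, C)]:
    assert abs(prob(lambda xy: X(xy) and Y(xy)) - prob(X) * prob(Y)) < 1e-12

# ... but the triple product fails: P(ABC) = 0, while P(A)P(B)P(C) = 1/8.
print(prob(lambda xy: A(xy) and B(xy) and C(xy)), prob(A) * prob(B) * prob(C))
```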
Random Variables:
A random variable is a number chosen at
random as the outcome of an experiment.
 Random variables may be real or complex
and may be discrete or continuous.

 In speech processing, the random variables we
encounter are most often real and discrete.

We can characterize a random variable by
its probability distribution or by its
probability density function (pdf).
9
Random Variables (distribution function):

The distribution function for a random
variable y is the probability that y does not
exceed some value u,
$$F_y(u) = P(y \le u)$$

and
$$P(u < y \le v) = F_y(v) - F_y(u)$$
10
Random Variables (probability density function):

The probability density function is the
derivative of the distribution:
$$f_y(u) = \frac{d}{du} F_y(u)$$

and,
$$P(u < y \le v) = \int_u^v f_y(y)\,dy$$
$$F_y(\infty) = 1$$
$$\int_{-\infty}^{\infty} f_y(y)\,dy = 1$$
11
Random Variables (expected value):
We can also characterize a random
variable by its statistics.
 The expected value of g(x) is written
E{g(x)} or <g(x)> and defined as


Continuous random variable:
$$\langle g(x) \rangle = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx$$

Discrete random variable:
$$\langle g(x) \rangle = \sum_x g(x)\,p(x)$$
12
Random Variables (moments):
The statistics of greatest interest are the
moments of p(x).
 The kth moment of p(x) is the expected
value of $x^k$.

 For a discrete random variable:
$$m_k = \langle x^k \rangle = \sum_x x^k\,p(x)$$
13
Random Variables (mean & variance):

The first moment, $m_1$, is the mean of x.
 Continuous:
$$\bar{x} = \int_{-\infty}^{\infty} x\,f(x)\,dx$$
 Discrete:
$$\mu = \bar{x} = \langle x \rangle = \sum_x x\,p(x)$$

The second central moment, also known
as the variance of p(x), is given by
$$\sigma^2 = \sum_x (x - \bar{x})^2\,p(x) = m_2 - \bar{x}^2$$
14
Random Variables …:

To estimate the statistics of a random
variable, we repeat the experiment which
generates the variable a large number of
times.
 If the experiment is run N times, then each
value x will occur approximately Np(x) times; thus
$$\hat{m}_k = \frac{1}{N} \sum_{i=1}^{N} x_i^k$$
$$\hat{\mu} = \hat{\bar{x}} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
15
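A minimal Python sketch of these sample-moment estimators (not from the slides). The simulated "experiment" is an assumed fair six-sided die; any discrete random variable would do.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.integers(1, 7, size=N)           # N repetitions of the experiment

m1_hat = x.mean()                        # (1/N) * sum(x_i)      -> estimated mean
m2_hat = np.mean(x.astype(float) ** 2)   # (1/N) * sum(x_i ** 2) -> estimated 2nd moment

# Exact values for a fair die: mean = 3.5, second moment = 91/6
print(m1_hat, m2_hat, 91 / 6)
```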
Random Variables (Uniform density):

A random variable has a uniform density
on the interval (a, b) if :
 0,

 Fx ( x)  ( x  a) /(b  a),
1,



xa
a xb
xb
1 /(b  a), a  x  b
f x ( x)  
otherwise
0,
1
 
(b  a ) 2
12
2
16
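A quick empirical check (a Python sketch, not from the slides) that a uniform density on (a, b) has variance (b - a)^2 / 12, using the sample estimators from the previous slide.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 5.0
x = rng.uniform(a, b, size=200_000)   # samples from the uniform density on (a, b)

print(x.var())                        # sample variance
print((b - a) ** 2 / 12)              # theoretical value: 0.75 for (2, 5)
```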
Random Variables (Gaussian density):

The gaussian, or normal, density function
is given by:
$$n(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2 / 2\sigma^2}$$
17
Random Variables (…Gaussian density):

The distribution function of a normal
variable is:
$$N(x; \mu, \sigma) = \int_{-\infty}^{x} n(u; \mu, \sigma)\,du$$

If we define the error function as
$$\operatorname{erf}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du$$

Thus,
$$N(x; \mu, \sigma) = \operatorname{erf}\!\left(\frac{x - \mu}{\sigma}\right)$$
18
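A Python sketch (not from the slides) computing N(x; mu, sigma). Note the slide's erf is the standard normal distribution function; the library function math.erf uses the conventional definition, so the two are related by Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).

```python
import math

def normal_cdf(x, mu, sigma):
    """N(x; mu, sigma) as defined on the slide, via the conventional math.erf."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# One standard deviation above the mean -> about 0.8413
print(normal_cdf(1.0, 0.0, 1.0))
```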
Two Random Variables:

If two random variables x and y are to be
considered together, they can be described in
terms of their joint probability density f(x, y) or,
for discrete variables, p(x, y).

Two random variables are independent if
$$p(x, y) = p(x)\,p(y)$$
19
Two Random Variables(…Continue):

Given a function g(x, y), its expected
value is defined as:
 Continuous:
$$\langle g(x, y) \rangle = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$$
 Discrete:
$$\langle g(x, y) \rangle = \sum_{x, y} g(x, y)\,p(x, y)$$

And the joint moment for two discrete random variables is:
$$m_{ij} = \sum_{x, y} x^i y^j\,p(x, y)$$
20
Two Random Variables(…Continue):

Moments are estimated in practice by averaging
repeated measurements:
$$\hat{m}_{ij} = \frac{1}{N} \sum_{n=1}^{N} x_n^i\,y_n^j$$

 A measure of the dependence of two random
variables is their correlation; the correlation of
two variables is their joint second moment:
$$m_{11} = \langle xy \rangle = \sum_{x, y} xy\,p(x, y)$$
21
Two Random Variables(…Continue):

The joint second central moment of x, y is
their covariance:
$$\lambda_{xy} = \langle (x - \bar{x})(y - \bar{y}) \rangle = m_{11} - \bar{x}\,\bar{y}$$

 If x and y are independent, then their covariance is zero.
 The correlation coefficient of x and y is
their covariance normalized to their
standard deviations:
$$r_{xy} = \frac{\lambda_{xy}}{\sigma_x \sigma_y}$$
22
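A Python sketch (not from the slides) estimating the covariance and correlation coefficient from paired samples, following the definitions above. The generating model for y is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50_000
x = rng.normal(size=N)
y = 0.6 * x + 0.8 * rng.normal(size=N)    # y depends partly on x (example model)

m11 = np.mean(x * y)                      # joint second moment <xy>
cov = m11 - x.mean() * y.mean()           # covariance: m11 - x_bar * y_bar
r = cov / (x.std() * y.std())             # correlation coefficient

print(cov, r)                             # r should come out close to 0.6
```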
Two Random Variables(…Gaussian Random Variable):

Two random variables x and y are jointly
gaussian if their density function is :
$$n(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}} \exp\left\{ -\frac{1}{2(1 - r^2)} \left[ \frac{x^2}{\sigma_x^2} - \frac{2r\,xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} \right] \right\}$$
 Where
$$r = r_{xy} = \frac{\lambda_{xy}}{\sigma_x\sigma_y}$$
(The density is written here for zero-mean x and y.)
23
Two Random Variables(…Sum of Random Variable):

The expected value of the sum of two
random variables is:
$$\langle x + y \rangle = \langle x \rangle + \langle y \rangle$$

 This is true whether x and y are independent or not.
 And we also have:
$$\langle cx \rangle = c\,\langle x \rangle$$
$$\Big\langle \sum_i x_i \Big\rangle = \sum_i \langle x_i \rangle$$
24
Two Random Variables(…Sum of Random Variable):

The variance of the sum of two independent
random variables is:
$$\sigma^2_{x+y} = \sigma^2_x + \sigma^2_y$$

If two random variables are independent, the
probability density of their sum is the convolution
of the densities of the individual variables:
 Continuous:
$$f_{x+y}(z) = \int_{-\infty}^{\infty} f_x(u)\,f_y(z - u)\,du$$
 Discrete:
$$p_{x+y}(z) = \sum_{u=-\infty}^{\infty} p_x(u)\,p_y(z - u)$$
25
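A Python sketch (not from the slides) of the discrete case: the pmf of the sum of two independent fair dice obtained by convolving the individual pmfs.

```python
import numpy as np

px = np.full(6, 1 / 6)          # fair die, values 1..6
py = np.full(6, 1 / 6)

p_sum = np.convolve(px, py)     # pmf of the sum, values 2..12
for z, p in enumerate(p_sum, start=2):
    print(z, round(p, 4))       # peaks at z = 7 with probability 6/36
```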
Central Limit Theorem

Central Limit Theorem (informal
paraphrase):
If many independent random variables are
summed, the probability density function
(pdf) of the sum tends toward the gaussian
density, no matter what their individual
densities are.
26
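A Python sketch of the central limit theorem (not from the slides): sums of i.i.d. uniform variables, individually far from gaussian, behave like a gaussian with mean n/2 and variance n/12.

```python
import numpy as np

rng = np.random.default_rng(3)
n_terms, n_trials = 30, 100_000
s = rng.uniform(0.0, 1.0, size=(n_trials, n_terms)).sum(axis=1)

# Empirical mean/variance vs. the values the CLT gaussian predicts.
print(s.mean(), n_terms / 2)
print(s.var(), n_terms / 12)

# Fraction of sums within one standard deviation of the mean: close to 0.683
# for a gaussian density.
print(np.mean(np.abs(s - s.mean()) < s.std()))
```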
Multivariate Normal Density

The normal density function can be generalized
to any number of random variables.
 Let x be the random vector $\mathrm{Col}[X_1, X_2, \ldots, X_n]$.
$$N(\mathbf{x}) = (2\pi)^{-n/2}\,|R|^{-1/2} \exp\left\{ -\tfrac{1}{2} Q(\mathbf{x} - \bar{\mathbf{x}}) \right\}$$
 Where
$$Q(\mathbf{x} - \bar{\mathbf{x}}) = (\mathbf{x} - \bar{\mathbf{x}})^T R^{-1} (\mathbf{x} - \bar{\mathbf{x}})$$
 The matrix R is the covariance matrix of x
(R is positive-definite):
$$R = \langle (\mathbf{x} - \bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}})^T \rangle$$
27
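A Python sketch (not from the slides) evaluating the multivariate normal density directly from a mean vector and covariance matrix R; the numeric values below are example data only.

```python
import numpy as np

def mvn_density(x, mean, R):
    """N(x) = (2*pi)^(-n/2) |R|^(-1/2) exp(-0.5 * Q(x - mean))."""
    x = np.asarray(x, dtype=float)
    d = x - mean
    n = len(x)
    Q = d @ np.linalg.inv(R) @ d                      # quadratic form Q(x - x_bar)
    norm = (2 * np.pi) ** (-n / 2) * np.linalg.det(R) ** (-0.5)
    return norm * np.exp(-0.5 * Q)

mean = np.array([0.0, 1.0])
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])                            # positive-definite covariance
print(mvn_density([0.5, 1.5], mean, R))
```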
Random Functions:
A random function is one arising as the
outcome of an experiment.
 Random functions need not necessarily be
functions of time, but in all cases of interest
to us they will be.
 A discrete stochastic process is
characterized by a family of probability densities
of the form
$$p(x_1, x_2, x_3, \ldots, x_n;\, t_1, t_2, t_3, \ldots, t_n)$$
28
Random Functions:

If the individual values of the random
signal are independent, then
$$p(x_1, x_2, \ldots, x_n;\, t_1, t_2, \ldots, t_n) = p(x_1, t_1)\,p(x_2, t_2) \cdots p(x_n, t_n)$$

If these individual probability densities are
all the same, then we have a sequence of
independent, identically distributed
samples (i.i.d.).
29
mean & autocorrelation

The mean is the expected value of x(t):
$$\bar{x}(t) = \langle x(t) \rangle = \sum_x x\,p(x, t)$$

The autocorrelation function is the
expected value of the product $x(t_1)\,x(t_2)$:
$$r(t_1, t_2) = \langle x(t_1)\,x(t_2) \rangle = \sum_{x_1, x_2} x_1 x_2\,p(x_1, x_2;\, t_1, t_2)$$
30
ensemble & time average

Mean and autocorrelation can be determined in
two ways:
 The experiment can be repeated many times
and the average taken over all these
functions. Such an average is called an
ensemble average.
 Take any one of these functions as being
representative of the ensemble and find the
average from a number of samples of this one
function. This is called a time average.
31
ergodic & stationary

If the time average and ensemble average
of a random function are the same, it is
said to be ergodic.

A random function is said to be stationary
if its statistics do not change as a function
of time.

Any ergodic function is also stationary.
32
ergodic & stationary

For a stationary signal we have:
$$\bar{x}(t) = \bar{x}$$
$$p(x_1, x_2;\, t_1, t_2) = p(x_1, x_2, \tau), \qquad \tau = t_2 - t_1$$
 And the autocorrelation function is:
$$r(\tau) = \sum_{x_1, x_2} x_1 x_2\,p(x_1, x_2, \tau)$$
33
ergodic & stationary

When x(t) is ergodic, its mean and
autocorrelation are:
$$\bar{x} = \lim_{N \to \infty} \frac{1}{2N} \sum_{t=-N}^{N} x(t)$$
$$r(\tau) = \langle x(t)\,x(t + \tau) \rangle = \lim_{N \to \infty} \frac{1}{2N} \sum_{t=-N}^{N} x(t)\,x(t + \tau)$$
34
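A Python sketch (not from the slides) of the time-average estimates above, using a finite record of N samples of an assumed ergodic signal in place of the limit.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
x = rng.normal(size=N)                       # one realization of the process

mean_hat = x.mean()                          # time-average estimate of the mean

def autocorr(x, tau):
    """r(tau) estimated as the average of x(t) * x(t + tau) over the record."""
    return np.mean(x[:len(x) - tau] * x[tau:])

print(mean_hat)
print([round(autocorr(x, tau), 3) for tau in range(4)])  # ~[1, 0, 0, 0] here
```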
cross-correlation

The cross-correlation of two ergodic
random functions is:
$$r_{xy}(\tau) = \langle x(t)\,y(t + \tau) \rangle = \lim_{N \to \infty} \frac{1}{2N} \sum_{t=-N}^{N} x(t)\,y(t + \tau)$$
The subscript xy indicates a cross-correlation.
35
Random Functions (power & cross spectral density):

The Fourier transform of r ( ) (the
autocorrelation function of an ergodic
random function) is called the power

spectral density of x(t) :
 j
S ( )   r ( )e
  

The cross-spectral density of two ergodic
random function is :
S xy ( ) 

r


 
xy
( )e
 j
36
Random Functions (…power density):

For ergodic signal x(t), r ( ) can be written
as:
r ( )  x( )  x( )
 Then
from elementary Fourier transform properties,
S ( )  X ( ) X ( )
 X ( ) X  ( )
| X ( ) |
2
37
Random Functions (White Noise):

If all values of a random signal are
uncorrelated,
$$r(\tau) = \sigma^2\,\delta(\tau)$$
 then this random function is called white noise.
 The power spectrum of white noise is constant:
$$S(\omega) = \sigma^2$$

White noise is a mixture of all frequencies.
38
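A Python sketch (not from the slides): white noise has r(tau) ~ sigma^2 * delta(tau) and a roughly flat power spectrum. Here the spectrum is estimated directly from the signal by averaging periodograms over several records.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma = 2.0
x = rng.normal(scale=sigma, size=(200, 1024))        # 200 records of white noise

r0 = np.mean(x * x)                                  # r(0) -> sigma^2 = 4
r1 = np.mean(x[:, :-1] * x[:, 1:])                   # r(1) -> 0

# Averaged periodogram: |X(w)|^2 / N, averaged over the 200 records.
S = np.mean(np.abs(np.fft.rfft(x, axis=1)) ** 2, axis=0) / x.shape[1]
print(r0, r1)
print(S.min(), S.max())     # roughly constant near sigma^2 across frequency
```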
Random Signals in Linear Systems:

Let T[·] represent the linear operation; then
$$\langle T[x(t)] \rangle = T[\langle x(t) \rangle]$$

Given a system with impulse response h(n),
$$\langle y(n) \rangle = \langle x(n) * h(n) \rangle = \langle x(n) \rangle * h(n)$$

A stationary signal applied to a linear system
yields a stationary output:
$$r_{yy}(\tau) = r_{xx}(\tau) * h(\tau) * h(-\tau)$$
$$S_{yy}(\omega) = S_{xx}(\omega)\,|H(\omega)|^2$$
39
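A Python sketch (not from the slides) checking the spectral relation above: white noise (so S_xx is approximately 1) is passed through an arbitrary example FIR filter, and the output spectrum is compared with |H(w)|^2.

```python
import numpy as np

rng = np.random.default_rng(6)
h = np.array([0.5, 0.3, 0.2])                       # example impulse response
x = rng.normal(size=(200, 4096))                    # white noise records: S_xx ~ 1
y = np.apply_along_axis(lambda r: np.convolve(r, h, mode="same"), 1, x)

nfft = x.shape[1]
Syy = np.mean(np.abs(np.fft.rfft(y, axis=1)) ** 2, axis=0) / nfft
H = np.fft.rfft(h, nfft)                            # filter frequency response

# Largest deviation between the estimated S_yy and |H|^2; small compared with
# the peak value |H(0)|^2 = 1 (residual error is estimation noise plus the
# edge effects of the finite convolution).
print(np.max(np.abs(Syy - np.abs(H) ** 2)))
```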