Probability and Statistical Review


Random Variable and Probability Distribution
Outline of Lecture
• Random Variables
– Discrete Random Variables.
– Continuous Random Variables.
• Probability Distribution Functions.
– Discrete and Continuous.
– PDF and PMF.
• Expectation of Random Variables.
• Propagation through Linear and Nonlinear Models.
• Multivariate Probability Density Functions.
• Some Important Probability Distribution Functions.
Random Variables
• A random variable is a function that associates a numerical value with each outcome of an experiment.
– Function values are real numbers and depend on "chance".
• The function that assigns a value to each outcome is fixed and deterministic.
– The randomness is due to the underlying randomness of the argument of the function X.
– If we roll a pair of dice, then the sum of the two face values is a random variable (see the sketch below).
• Random variables can be Discrete or Continuous.
– Discrete: countable range.
– Continuous: uncountable range.
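A minimal Python sketch of this idea (illustrative, not from the slides): the mapping X below is fixed and deterministic, and only its argument, the outcome, is random.

```python
# The random variable is a fixed function; the randomness lives in the outcome.
import random

def X(outcome):
    """Random variable: sum of the two face values of a dice roll."""
    die1, die2 = outcome
    return die1 + die2

outcome = (random.randint(1, 6), random.randint(1, 6))  # the random part
print(outcome, "->", X(outcome))  # e.g. (3, 5) -> 8
```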
Discrete Random Variables
• A random variable X and the corresponding distribution are said to be discrete if the number of values for which X has non-zero probability is finite.
• Probability Mass Function of X:
$f(x) = \begin{cases} p_j & \text{when } x = x_j \\ 0 & \text{otherwise} \end{cases}$
• Probability Distribution Function of X:
$F(x) = P(X \le x)$
• Properties of the Distribution Function:
– Monotonically increasing.
– Right continuous.
– $0 \le F(x) \le 1$
– $P(a < X \le b) = F(b) - F(a)$
Examples
• Let X denote the number of heads when a biased coin with probability of heads p is tossed twice.
– X can take the value 0, 1, or 2 (sketched below).
$F(x) = \begin{cases} 0 & x < 0 \\ (1-p)^2 & 0 \le x < 1 \\ (1-p)^2 + 2p(1-p) & 1 \le x < 2 \\ 1 & x \ge 2 \end{cases}$
• Let X denote the random variable that is equal to the sum of two fair dice.
– The random variable can take any integral value between 2 and 12.
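A short Python sketch of the first example (the value p = 0.3 is an assumed choice): the distribution function is a right-continuous step function.

```python
# PMF and CDF of X = number of heads in two tosses of a biased coin.
p = 0.3  # assumed probability of heads

pmf = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}

def F(x):
    """Distribution function F(x) = P(X <= x): a right-continuous step function."""
    return sum(prob for value, prob in pmf.items() if value <= x)

for x in [-0.5, 0, 0.5, 1, 1.5, 2, 3]:
    print(f"F({x}) = {F(x):.2f}")  # steps 0 -> 0.49 -> 0.91 -> 1.00
```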
Continuous Random Variables and Distributions
• X is a continuous random variable if there exists a non-negative function f(x) defined on the real line having the property that
$P(X \le x) = F(x) = \int_{-\infty}^{x} f(y)\,dy, \qquad F'(x) = f(x)$
• The integrand f(y) is called a probability density function.
• Properties:
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
$P(a \le X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$
Continuous Random Variables and Distributions
• The probability that a continuous random variable will assume any particular value is zero:
$P(X = a) = \int_{a}^{a} f(x)\,dx = 0$
• It does not mean that the event will never occur.
– It occurs infrequently, and its relative frequency converges to zero.
– f(a) large ⇒ probability mass is very dense.
– f(a) small ⇒ probability mass is not very dense.
• f(a) is a measure of how likely it is that the random variable will be near a (checked numerically below):
$P(a-\epsilon \le X \le a+\epsilon) = \int_{a-\epsilon}^{a+\epsilon} f(x)\,dx \approx 2\epsilon f(a)$
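A quick numerical check of this approximation for a standard normal density (an illustrative choice; scipy assumed):

```python
# Compare P(a-eps <= X <= a+eps) with 2*eps*f(a) for a standard normal X.
from scipy.stats import norm

a, eps = 0.5, 0.01
exact = norm.cdf(a + eps) - norm.cdf(a - eps)   # F(a+eps) - F(a-eps)
approx = 2 * eps * norm.pdf(a)                  # 2*eps*f(a)
print(exact, approx)  # both ~0.00704
```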
Difference Between PDF and PMF
• A probability density function does not define a probability but a probability density.
– To obtain a probability we must integrate it over an interval.
• A probability mass function gives a true probability.
– It does not need to be integrated to obtain a probability.
• A probability distribution function is either continuous or has a jump discontinuity. Consider:
1) $P(a < X < b)$  2) $P(a < X \le b)$  3) $P(a \le X < b)$  4) $P(a \le X \le b)$
– Are they equal?
Statistical Characterization of Random Variables
• Recall, a random variable denotes the numerical attribute assigned to an outcome of an experiment.
• We cannot be certain which value of X will be observed on a particular trial.
• Will the average of all the values be the same for two different sets of trials?
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \qquad \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$
• Recall, probability is approximately equal to relative frequency.
– Approximately $n p_1$ of the $x_i$'s have value $u_1$, so
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \approx \frac{n p_1 u_1 + \cdots + n p_m u_m}{n} = \sum_i u_i p_i$
Statistical Characterization of Random Variables
• Expected Value:
– The expected value of a discrete random variable x is found by multiplying each value of the random variable by its probability and then summing over all values of x.
$E[x] = \sum_x x\,P(x) = \sum_x x\,f(x)$
– The expected value is equivalent to the center-of-mass concept:
$\bar{r} = \frac{\sum_i r_i m_i}{\sum_i m_i}$
– That is why it is also called the first moment.
– A body is perfectly balanced about its center of mass.
• The expected value of x is the "balancing point" for the probability mass function of x.
– The expected value is equal to the point of symmetry in the case of a symmetric pmf/pdf.
Statistical Characterization of Random Variables
• Law of the Unconscious Statistician (LOTUS): we can take the expectation of any function of a random variable. For y = g(x):
$E[y] = \sum_y y\,f(y) = \sum_x g(x)\,f(x)$
• This balance point is the value expected for g(x) over all possible repetitions of the experiment involving the random variable x.
• The expected value of a continuous random variable with density function f(x) is given by
$E(x) = \int_{-\infty}^{\infty} x\,f(x)\,dx$
Example
• Let us assume that we have agreed to pay $1 for each dot showing when a pair of dice is thrown. We are interested in knowing how much we would lose on average.
| Values of x | Frequency | Probability Function | Probability Distribution Function |
|---|---|---|---|
| 2 | 1 | P(x=2) = 1/36 | P(x≤2) = 1/36 |
| 3 | 2 | P(x=3) = 2/36 | P(x≤3) = 3/36 |
| 4 | 3 | P(x=4) = 3/36 | P(x≤4) = 6/36 |
| 5 | 4 | P(x=5) = 4/36 | P(x≤5) = 10/36 |
| 6 | 5 | P(x=6) = 5/36 | P(x≤6) = 15/36 |
| 7 | 6 | P(x=7) = 6/36 | P(x≤7) = 21/36 |
| 8 | 5 | P(x=8) = 5/36 | P(x≤8) = 26/36 |
| 9 | 4 | P(x=9) = 4/36 | P(x≤9) = 30/36 |
| 10 | 3 | P(x=10) = 3/36 | P(x≤10) = 33/36 |
| 11 | 2 | P(x=11) = 2/36 | P(x≤11) = 35/36 |
| 12 | 1 | P(x=12) = 1/36 | P(x≤12) = 1 |
| Sum | 36 | 1.00 | |

• Average amount we pay = (($2×1) + ($3×2) + … + ($12×1)) / 36 = $7
• E(x) = $2(1/36) + $3(2/36) + … + $12(1/36) = $7
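The same computation as a Python sketch (the pmf construction below is an illustrative helper, not from the slides):

```python
# Expected value of the dice-sum random variable from the table above.
from fractions import Fraction

# Frequencies run 1, 2, ..., 6, ..., 2, 1 for sums 2..12.
pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
E_x = sum(s * prob for s, prob in pmf.items())
print(E_x)  # 7
```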
Example (Continued)
• Let us assume that we had agreed to pay an amount equal to the square of the sum of the dots showing on a throw of the dice.
– What would the average loss be this time?
• Will it be ($7)² = $49.00?
• Actually, we are now interested in calculating E[x²].
– E[x²] = ($2)²(1/36) + … + ($12)²(1/36) = $54.83 ≠ $49
– This result also emphasizes that (E[x])² ≠ E[x²].
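Continuing the sketch for E[x²] with the same dice-sum pmf:

```python
# E[x^2] differs from (E[x])^2 for the dice-sum random variable.
from fractions import Fraction

pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
E_x = sum(s * prob for s, prob in pmf.items())
E_x2 = sum(s**2 * prob for s, prob in pmf.items())
print(float(E_x2))      # 54.833..., not 49
print(float(E_x) ** 2)  # (E[x])^2 = 49.0
```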
Expectation Rules
• Rule 1: E[k] = k, where k is a constant.
• Rule 2: E[kx] = kE[x].
• Rule 3: E[x ± y] = E[x] ± E[y].
• Rule 4: If x and y are independent, E[xy] = E[x]E[y].
• Rule 5: V[k] = 0, where k is a constant.
• Rule 6: V[kx] = k²V[x].
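A Monte Carlo spot-check of these rules (a numpy sketch with assumed distributions):

```python
# Empirical means/variances should match the expectation rules closely.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, 1_000_000)
y = rng.normal(-1.0, 3.0, 1_000_000)  # drawn independently of x

print(np.mean(3 * x), 3 * np.mean(x))           # Rule 2: E[kx] = kE[x]
print(np.mean(x + y), np.mean(x) + np.mean(y))  # Rule 3: E[x+y] = E[x]+E[y]
print(np.mean(x * y), np.mean(x) * np.mean(y))  # Rule 4 (independence)
print(np.var(3 * x), 9 * np.var(x))             # Rule 6: V[kx] = k^2 V[x]
```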
Variance of Random Variable
• The variance of a random variable x is defined as
$V(x) = \sigma^2 = E[(x-\mu)^2]$
$V(x) = E[x^2 - 2\mu x + \mu^2] = E[x^2] - 2(E[x])^2 + (E[x])^2 = E[x^2] - (E[x])^2$
• This result is also known as the "Parallel Axis Theorem".
Propagation of moments and density function through linear models
• y = ax + b
– Given: μ = E[x] and σ² = V[x]
– To find: E[y] and V[y]
E[y] = E[ax] + E[b] = aE[x] + b = aμ + b
V[y] = V[ax] + V[b] = a²V[x] + 0 = a²σ²
• Let us define
$z = \frac{x - \mu}{\sigma}$
Here, a = 1/σ and b = −μ/σ. Therefore, E[z] = 0 and V[z] = 1.
z is generally known as the "standardized variable".
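A numpy sketch of standardization (μ = 10 and σ = 4 are assumed values):

```python
# Standardizing a sample drives its mean to ~0 and variance to ~1.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(10.0, 4.0, 100_000)  # mu = 10, sigma = 4 (assumed)
z = (x - x.mean()) / x.std()        # z = (x - mu) / sigma
print(z.mean(), z.var())            # ~0.0, ~1.0
```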
Propagation of moments and density function through non-linear models
• If x is a random variable with probability density function p(x) and y = f(x) is a one-to-one transformation that is differentiable for all x, then the probability density function of y is given by
– p(y) = p(x)|J|⁻¹, for all x given by x = f⁻¹(y)
– where |J| is the determinant of the Jacobian matrix J.
• Example:
Let $y = ax^2$ and $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\!\left(-x^2 / 2\sigma_x^2\right)$
NOTE: for each value of y there are two values of x, so
$p(y) = \frac{1}{\sigma_x \sqrt{2\pi a y}} \exp\!\left(-y / 2a\sigma_x^2\right), \quad y \ge 0$
and p(y) = 0 otherwise.
We can also show that $E(y) = a\sigma_x^2$ and $V(y) = 2a^2\sigma_x^4$.
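A Monte Carlo check of this example (a and σ_x below are assumed values):

```python
# Propagate x ~ N(0, sigma_x^2) through y = a*x^2 and compare moments
# against E[y] = a*sigma_x^2 and V[y] = 2*a^2*sigma_x^4.
import numpy as np

a, sigma_x = 2.0, 1.5
x = np.random.default_rng(2).normal(0.0, sigma_x, 2_000_000)
y = a * x**2

print(y.mean(), a * sigma_x**2)        # ~4.5 both
print(y.var(), 2 * a**2 * sigma_x**4)  # ~40.5 both
```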
Random Vectors
• One random variable depicts one physical phenomenon.
– e.g., a web server.
• A random vector is just an extension of a random variable:
– A vector random variable X is a function that assigns a vector of real numbers to each outcome in the sample space.
– e.g., Sample Space = set of people.
– Random vector = [X = weight, Y = height of a person].
• A random point (X, Y) has more information than X or Y alone.
– It describes the joint behavior of X and Y.
• The joint probability distribution function:
$F(x, y) = P(\{X \le x\} \cap \{Y \le y\})$
• What happens as $x \to -\infty$, $x \to \infty$, $y \to -\infty$, $y \to \infty$?
Random Vectors
• Joint Probability Functions:
– Joint Probability Distribution Function:
$F(\mathbf{X}) = P[\{X_1 \le x_1\} \cap \{X_2 \le x_2\} \cap \cdots \cap \{X_n \le x_n\}]$
– Joint Probability Density Function:
$f(\mathbf{x}) = \frac{\partial^n F(\mathbf{X})}{\partial X_1 \partial X_2 \cdots \partial X_n}$
• Marginal Probability Functions: a marginal probability function is obtained by integrating out the variables that are of no interest.
$F(x) = \sum_y P(x, y) \quad \text{or} \quad \int_{y=-\infty}^{\infty} f(x, y)\,dy$
Multivariate Expectations
$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$
What about g(X, Y) = X + Y? The marginal density is obtained from the joint density:
$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$
so that
$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\,dx = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\, f_{X,Y}(x, y)\,dy\,dx$
$E(Y) = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y\, f_{X,Y}(x, y)\,dx\,dy$
$E(g(X)) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x)\, f_{X,Y}(x, y)\,dy\,dx$
$E(h(Y)) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} h(y)\, f_{X,Y}(x, y)\,dx\,dy$
$E(g(X, Y)) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\,dx\,dy$
Multivariate Expectations
• Mean Vector:
$E[\mathbf{x}] = [\,E[x_1]\;\; E[x_2]\;\; \ldots\;\; E[x_n]\,]$
• The expected value of g(x₁, x₂, …, xₙ) is given by
$E[g(\mathbf{x})] = \sum_{x_n}\sum_{x_{n-1}}\cdots\sum_{x_1} g(\mathbf{x})\, f(\mathbf{x}) \quad \text{or} \quad \int_{x_n}\!\int_{x_{n-1}}\!\cdots\!\int_{x_1} g(\mathbf{x})\, f(\mathbf{x})\,d\mathbf{x}$
• Covariance Matrix:
$\mathrm{cov}[\mathbf{x}] = P = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T$
where $S = E[\mathbf{x}\mathbf{x}^T]$ is known as the autocorrelation matrix.
NOTE:
$P = \begin{bmatrix} \sigma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_n \end{bmatrix} \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \sigma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_n \end{bmatrix}$
where the middle factor R is the correlation matrix.
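A numpy sketch (not from the slides) that estimates P from samples and verifies the factorization P = D R D, where D = diag(σ₁, …, σₙ):

```python
# Estimate covariance and correlation matrices and check P = D R D.
import numpy as np

rng = np.random.default_rng(3)
samples = rng.multivariate_normal([0, 0], [[4.0, 1.2], [1.2, 1.0]], 500_000)

P = np.cov(samples, rowvar=False)       # covariance matrix
R = np.corrcoef(samples, rowvar=False)  # correlation matrix
D = np.diag(np.sqrt(np.diag(P)))        # diagonal of standard deviations

print(np.allclose(P, D @ R @ D))        # True: P = D R D
```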
Covariance Matrix
• The covariance matrix indicates the tendency of each pair of dimensions in a random vector to vary together, i.e., to "co-vary".
• Properties of the covariance matrix:
– The covariance matrix is square.
– The covariance matrix is positive semi-definite, i.e., xᵀPx ≥ 0 (and positive definite, xᵀPx > 0, when no component is a deterministic linear combination of the others).
– The covariance matrix is symmetric, i.e., P = Pᵀ.
– If xᵢ and xⱼ tend to increase together, then Pᵢⱼ > 0.
– If xᵢ and xⱼ are uncorrelated, then Pᵢⱼ = 0.
Independent Variables
• Recall, two random variables are said to be independent if knowing the value of one tells you nothing about the other variable.
– The joint probability density function is the product of the marginal probability density functions.
– Cov(X, Y) = 0 if X and Y are independent.
– E(XY) = E(X)E(Y).
• Two variables are said to be uncorrelated if Cov(X, Y) = 0.
– Independent variables are uncorrelated, but the converse is not true.
• Cov(X, Y) = 0 ⇒ the defining integral equals zero.
– It tells us that the distribution is balanced in some way, but says nothing about the distribution's values.
– Example: (X, Y) uniformly distributed on the unit circle (see the sketch below).
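A numerical version of the circle example (numpy assumed): the coordinates are uncorrelated yet clearly dependent.

```python
# Points uniform on the unit circle: cov ~ 0, but X^2 + Y^2 = 1 always.
import numpy as np

theta = np.random.default_rng(4).uniform(0, 2 * np.pi, 1_000_000)
x, y = np.cos(theta), np.sin(theta)

print(np.cov(x, y)[0, 1])              # ~0: uncorrelated
print(np.allclose(x**2 + y**2, 1.0))   # True: knowing X pins down |Y| exactly
```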
Gaussian or Normal Distribution
• The normal distribution is the most widely known and used distribution in the field of statistics.
– Many natural phenomena can be approximated by the Normal distribution.
• Central Limit Theorem:
– The central limit theorem states that, given a distribution with mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with mean μ and variance σ²/N as the sample size N increases (see the sketch below).
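A small simulation sketch of the CLT using a skewed exponential distribution (an illustrative choice):

```python
# Means of N exponential draws (mean 1, variance 1) have variance ~ 1/N
# and look increasingly normal as N grows.
import numpy as np

rng = np.random.default_rng(5)
for N in [1, 10, 100]:
    means = rng.exponential(scale=1.0, size=(100_000, N)).mean(axis=1)
    print(N, means.mean(), means.var(), 1.0 / N)  # var ~ sigma^2/N = 1/N
```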
• Normal Density Function:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
(Bell curve symmetric about μ, with inflection points at μ±σ; the standardized peak height is $1/\sqrt{2\pi} \approx 0.399$.)
Multivariate Normal Distribution
• Multivariate Gaussian Density Function:
$f(\mathbf{X}) = \frac{1}{(2\pi)^{n/2}\,|R|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{X}-\boldsymbol{\mu})^T R^{-1}(\mathbf{X}-\boldsymbol{\mu})\right]$
• How do we find an equal-probability surface?
$\frac{1}{2}(\mathbf{X}-\boldsymbol{\mu})^T R^{-1}(\mathbf{X}-\boldsymbol{\mu}) = \text{constant}$
• Moreover, one is interested in the probability that x lies inside the quadratic hypersurface.
– For example, what is the probability of lying inside the 1-σ ellipsoid?
Diagonalize $R = C\,\Sigma\,C^T$ with $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$, and let
$\mathbf{Y} = C(\mathbf{X}-\boldsymbol{\mu}), \qquad z_i = \frac{Y_i}{\sigma_i}$
Then the required probability is
$P\!\left(\sum_i z_i^2 \le c^2\right) = \int_V f(z)\,dV$
where the boundary of V is the ellipsoid $z_1^2 + z_2^2 + \cdots + z_n^2 = c^2$.
Multivariate Normal Distribution
• Yᵢ represents coordinates in the Cartesian principal-axis system, and σᵢ² is the variance along the i-th principal axis.
• The probability of lying inside the 1σ, 2σ, or 3σ ellipsoid decreases as the dimensionality increases, an instance of the "Curse of Dimensionality":

| n \ c | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 0.683 | 0.955 | 0.997 |
| 2 | 0.394 | 0.865 | 0.989 |
| 3 | 0.200 | 0.739 | 0.971 |
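These entries follow from the fact that Σᵢ zᵢ² is chi-square distributed with n degrees of freedom, so the probability is P(χ²ₙ ≤ c²); a short scipy check:

```python
# Reproduce the ellipsoid-probability table via the chi-square CDF.
from scipy.stats import chi2

for n in [1, 2, 3]:
    print(n, [round(chi2.cdf(c**2, df=n), 3) for c in [1, 2, 3]])
# 1 [0.683, 0.954, 0.997]
# 2 [0.394, 0.865, 0.989]   (matches the table up to rounding)
# 3 [0.199, 0.739, 0.971]
```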
Summary of Probability Distribution Functions
Discrete:

| Probability Distribution | Parameters | Characteristics | Probability Function | Mean | Variance |
|---|---|---|---|---|---|
| Binomial | $0 \le p \le 1$; $n = 0, 1, 2, \ldots$ | Skewed unless p = 0.5 | $^{n}C_x\, p^x q^{n-x}$ | $np$ | $npq$ |
| Hypergeometric | $M = 0, \ldots, N$; $N = 0, 1, 2, \ldots$; $n = 0, \ldots, N$ | Skewed | $\frac{^{M}C_x\; ^{N-M}C_{n-x}}{^{N}C_n}$ | $\frac{nM}{N}$ | $\frac{nM(N-M)(N-n)}{N^2(N-1)}$ |
| Poisson | $\lambda > 0$ | Skewed positively | $\frac{\lambda^x e^{-\lambda}}{x!}$ | $\lambda$ | $\lambda$ |

Continuous:

| Probability Distribution | Parameters | Characteristics | Probability Function | Mean | Variance |
|---|---|---|---|---|---|
| Normal | $-\infty < \mu < \infty$ and $\sigma > 0$ | Symmetric about μ | $\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$ |
| Standardized Normal | | Symmetric about zero | $\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$ | $0$ | $1$ |
| Exponential | $\lambda > 0$ | Skewed positively | $\lambda e^{-\lambda x}$ | $1/\lambda$ | $1/\lambda^2$ |

A distribution is skewed if it has most of its values either to the right or to the left of its mean.
Properties of Estimators
• Unbiasedness
– On average, the value of the parameter being estimated is equal to the true value:
$E[\hat{x}] = x$
• Efficiency
– Has a relatively small variance.
– The values of the parameters being estimated should not vary much between samples.
• Sufficiency
– Uses as much as possible of the information available from the samples.
• Consistency
– As the sample size increases, the estimated value approaches the true value.
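A small numpy illustration of unbiasedness (parameters assumed, not from the slides): the 1/n sample variance is biased low, while the 1/(n−1) version is unbiased on average.

```python
# Compare the biased (ddof=0) and unbiased (ddof=1) variance estimators
# over many small samples from N(0, 9).
import numpy as np

rng = np.random.default_rng(6)
n = 5
samples = rng.normal(0.0, 3.0, size=(200_000, n))  # true variance = 9

print(samples.var(axis=1, ddof=0).mean())  # ~7.2 = (n-1)/n * 9: biased low
print(samples.var(axis=1, ddof=1).mean())  # ~9.0: unbiased
```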