Probability and Statistical Review
Random Variable and Probability Distribution
Outline of Lecture
Random Variable
– Discrete Random Variable.
– Continuous Random Variables.
Probability Distribution Function.
– Discrete and Continuous.
– PDF and PMF.
Expectation of Random Variables.
Propagation through Linear and Nonlinear model.
Multivariate Probability Density Functions.
Some Important Probability Distribution Functions.
Random Variables
A random variable is a function that associates a
numerical value with each outcome of an experiment.
– Function values are real numbers and depend on “chance”.
The function that assigns value to each outcome is
fixed and deterministic.
– The randomness is due to the underlying randomness of the
argument of the function X.
– If we roll a pair of dice then the sum of two face values is a
random variable.
Random variables can be discrete or continuous.
– Discrete: Countable Range.
– Continuous: Uncountable Range.
Discrete Random Variables
A random variable X and the corresponding
distribution are said to be discrete if the set of
values for which X has non-zero probability is finite
or countably infinite.
Probability Mass Function of X:
$$f(x) = \begin{cases} p_j & \text{when } x = x_j \\ 0 & \text{otherwise} \end{cases}$$
Probability Distribution Function of X:
$$F(x) = P(X \le x)$$
Properties of Distribution Function:
– Monotonically non-decreasing.
– Right continuous.
– $0 \le F(x) \le 1$
– $P(a < X \le b) = F(b) - F(a)$
Examples
Let X denote the number of heads when a biased coin with probability
of heads p is tossed twice.
– X can take the value 0, 1 or 2.
$$F(x) = \begin{cases} 0 & x < 0 \\ (1-p)^2 & 0 \le x < 1 \\ (1-p)^2 + 2p(1-p) & 1 \le x < 2 \\ 1 & x \ge 2 \end{cases}$$
Let X denote the random variable equal to the sum of two fair
dice.
– The random variable can take any integer value between 2 and 12.
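The coin example can be checked with a short sketch. The bias p = 1/3 and the helper names `heads_pmf` and `cdf` are illustrative assumptions, not part of the lecture; the code only builds the probability mass function and sums it to obtain F(x).

```python
from fractions import Fraction

def heads_pmf(p):
    """PMF of X = number of heads in two independent tosses with P(head) = p."""
    q = 1 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}

def cdf(pmf, x):
    """F(x) = P(X <= x): add the mass at every support point not exceeding x."""
    return sum(mass for value, mass in pmf.items() if value <= x)

p = Fraction(1, 3)  # illustrative bias (assumption)
pmf = heads_pmf(p)
for x in (-0.5, 0, 0.5, 1, 1.5, 2, 3):
    print(x, cdf(pmf, x))
```

F(x) stays flat between the support points 0, 1, 2 and jumps by the mass at each of them, which is the right-continuous staircase described by the properties of the distribution function.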
Continuous Random Variables and Distributions
X is a continuous random variable if there exists a
non-negative function f(x), defined on the real line, having
the property that
$$P(X \le x) = F(x) = \int_{-\infty}^{x} f(y)\,dy, \qquad F'(x) = f(x)$$
The integrand f(y) is called a probability density
function.
Properties:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
$$P(a \le X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$$
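The two properties can be checked numerically. This is a minimal sketch that assumes an exponential density with rate 2 (an illustrative choice) and a hand-written trapezoidal rule; `RATE`, `f`, `F` and `integrate` are hypothetical names.

```python
import math

RATE = 2.0  # illustrative rate parameter (assumption)

def f(x):
    """Exponential density: RATE * exp(-RATE * x) for x >= 0, else 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def F(x):
    """Closed-form distribution function of the same density."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def integrate(func, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h

a, b = 0.5, 1.5
area = integrate(f, a, b)              # integral of f over [a, b]
total_mass = integrate(f, 0.0, 20.0)   # the tail beyond 20 is negligible here
print(area, F(b) - F(a))
print(total_mass)
```

The integral of the density over [a, b] should match F(b) - F(a), and the total mass should be close to 1.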
Continuous Random Variables and Distributions
The probability that a continuous random variable will assume any
particular value is zero.
$$P(X = a) = \int_{a}^{a} f(x)\,dx = 0$$
This does not mean that the event will never occur.
– It occurs infrequently, and its relative frequency converges to zero.
– f(a) large ⇒ probability mass is very dense near a.
– f(a) small ⇒ probability mass is not very dense near a.
f(a) is a measure of how likely it is that the random variable will be
near a.
$$P(a - \epsilon \le X \le a + \epsilon) = \int_{a-\epsilon}^{a+\epsilon} f(x)\,dx \approx 2\epsilon f(a)$$
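A quick numerical look at the approximation of the probability of being near a by 2ε f(a). The sketch assumes a standard normal density and the point a = 0.5; both are illustrative choices.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

a = 0.5  # illustrative point (assumption)
results = {}
for eps in (0.5, 0.1, 0.01):
    exact = normal_cdf(a + eps) - normal_cdf(a - eps)
    approx = 2.0 * eps * normal_pdf(a)
    results[eps] = (exact, approx)
    print(eps, exact, approx)
```

As ε shrinks, the exact interval probability and 2ε f(a) agree to more and more digits, which is the sense in which f(a) measures how likely X is to fall near a.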
Difference Between PDF and PMF
A probability density function does not define a probability but a
probability density.
– To obtain a probability we must integrate it over an interval.
A probability mass function gives the true probability.
– It does not need to be integrated to obtain the probability.
A probability distribution function is either continuous or has a jump
discontinuity.
1) $P(a \le X \le b)$   3) $P(a \le X < b)$
2) $P(a < X \le b)$   4) $P(a < X < b)$
– Are they equal?
Statistical Characterization of Random Variables
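Recall that a random variable denotes the numerical attribute assigned
to an outcome of an experiment.
We cannot be certain which value of X will be observed on a
particular trial.
Will the average of all the values be the same for two different sets of
trials?
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$
Recall that probability is approximately equal to relative frequency.
– Approximately $n p_1$ of the $x_i$'s have value $u_1$, and so on up to $n p_m$ with value $u_m$.
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \approx \frac{n p_1 u_1 + \cdots + n p_m u_m}{n} = \sum_{i=1}^{m} u_i p_i$$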
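The relative-frequency argument can be tried on simulated rolls of a single fair die. The sketch below is an illustration under assumed settings (seed 0, 100,000 rolls); it groups equal outcomes and compares the sample average with the sum of values times probabilities.

```python
import random
from collections import Counter

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 100_000
rolls = [rng.randint(1, 6) for _ in range(n)]

sample_mean = sum(rolls) / n

# Group equal outcomes: value u_i appears n_i times, and n_i / n is close to p_i.
counts = Counter(rolls)
grouped_mean = sum(u * (count / n) for u, count in counts.items())

# Sum of u_i * p_i for a fair die.
expected = sum(u * (1 / 6) for u in range(1, 7))

print(sample_mean, grouped_mean, expected)
```

The grouped form equals the plain average up to rounding, and both settle near 3.5 as n grows.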
Statistical Characterization of Random Variables
Expected Value:
– The expected value of a discrete random variable x
is found by multiplying each value of the random
variable by its probability and then summing over all
values of x.
$$E[x] = \sum_{x} x P(x) = \sum_{x} x f(x)$$
– Expected value is equivalent to the center of mass concept.
$$\bar{r} = \frac{\sum_i m_i r_i}{\sum_i m_i}$$
– That is why it is also called the first moment.
– A body is perfectly balanced about its center of mass.
The expected value of x is the “balancing point” for the
probability mass function of x.
– The expected value is equal to the point of symmetry in the case of a
symmetric pmf/pdf.
Statistical Characterization of Random Variables
Law of the Unconscious Statistician (LOTUS): We can
take the expectation of any function of a random
variable.
$$E[y] = E[g(x)] = \sum_{y} y f_y(y) = \sum_{x} g(x) f_x(x)$$
This balance point is the value expected for g(x) for
all possible repetitions of the experiment involving the
random variable x.
The expected value of a continuous random variable with density function f(x)
is given by
$$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx$$
Example
Let us assume that we have agreed to pay $1 for
each dot showing when a pair of dice is thrown. We
are interested in knowing, how much we would lose
on the average?
Values of x | Frequency | Values of Probability Function | Probability Distribution Function
----------- | --------- | ------------------------------ | ---------------------------------
2           | 1         | P(x=2) = 1/36                  | P(x≤2) = 1/36
3           | 2         | P(x=3) = 2/36                  | P(x≤3) = 3/36
4           | 3         | P(x=4) = 3/36                  | P(x≤4) = 6/36
5           | 4         | P(x=5) = 4/36                  | P(x≤5) = 10/36
6           | 5         | P(x=6) = 5/36                  | P(x≤6) = 15/36
7           | 6         | P(x=7) = 6/36                  | P(x≤7) = 21/36
8           | 5         | P(x=8) = 5/36                  | P(x≤8) = 26/36
9           | 4         | P(x=9) = 4/36                  | P(x≤9) = 30/36
10          | 3         | P(x=10) = 3/36                 | P(x≤10) = 33/36
11          | 2         | P(x=11) = 2/36                 | P(x≤11) = 35/36
12          | 1         | P(x=12) = 1/36                 | P(x≤12) = 1
Sum         | 36        | 1.00                           |
Average amount we pay =
(($2×1)+($3×2)+…+($12×1))/36 = $7
E(x) = $2(1/36)+$3(2/36)+…+$12(1/36) = $7
Example (Continued)
Let us assume that we had agreed to pay an amount
equal to the square of the sum of the dots showing
on a throw of the dice.
– What would be the average loss this time?
Will it be ($7)^2 = $49.00?
Actually, now we are interested in calculating E[x^2].
– E[x^2] = ($2)^2(1/36)+…+($12)^2(1/36) = $54.83 ≠ $49
– This result also emphasizes that (E[x])^2 ≠ E[x^2].
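The two dice results can be reproduced exactly with rational arithmetic. The sketch below enumerates the 36 outcomes; the helper name `expect` is an illustrative assumption and simply applies E[g(x)] as the sum of g(x) f(x).

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice.
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    s = d1 + d2
    pmf[s] = pmf.get(s, 0) + Fraction(1, 36)

def expect(g, pmf):
    """E[g(x)] = sum over x of g(x) * f(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, pmf)          # average payment per throw
mean_sq = expect(lambda x: x * x, pmf)   # average payment when paying x squared

print(mean, mean_sq, float(mean_sq), mean ** 2)
```

The mean comes out as 7 and E[x^2] as 329/6, about 54.83, which is larger than 7^2 = 49.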
Expectation Rules
Rule 1: $E[k] = k$, where k is a constant.
Rule 2: $E[kx] = kE[x]$.
Rule 3: $E[x \pm y] = E[x] \pm E[y]$.
Rule 4: If x and y are independent,
$E[xy] = E[x]E[y]$.
Rule 5: $V[k] = 0$, where k is a constant.
Rule 6: $V[kx] = k^2 V[x]$.
Variance of Random Variable
The variance of a random variable x, with $\mu = E[x]$, is defined as
$$V(x) = \sigma^2 = E[(x - \mu)^2]$$
$$V(x) = E[x^2 - 2\mu x + \mu^2] = E[x^2] - 2(E[x])^2 + (E[x])^2 = E[x^2] - (E[x])^2$$
This result is also known as the “Parallel Axis Theorem”.
Propagation of moments and density function through
linear models
$y = ax + b$
– Given: $\mu = E[x]$ and $\sigma^2 = V[x]$
– To find: $E[y]$ and $V[y]$
$$E[y] = E[ax] + E[b] = aE[x] + b = a\mu + b$$
$$V[y] = V[ax] + V[b] = a^2 V[x] + 0 = a^2 \sigma^2$$
Let us define
$$z = \frac{x - \mu}{\sigma}$$
Here, $a = 1/\sigma$ and $b = -\mu/\sigma$.
Therefore, $E[z] = 0$ and $V[z] = 1$.
z is generally known as the “standardized variable”.
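The linear propagation rules can be verified on a small discrete example. The sketch assumes x is a single fair die and uses hypothetical coefficients a = 3, b = -2; `transform` is an illustrative helper that pushes a PMF through a function.

```python
from fractions import Fraction
import math

# PMF of a single fair die (illustrative choice of x).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def transform(pmf, func):
    """PMF of func(x); masses of equal images are merged."""
    out = {}
    for x, p in pmf.items():
        y = func(x)
        out[y] = out.get(y, 0) + p
    return out

mu, var = mean(pmf), variance(pmf)
a, b = 3, -2                                   # hypothetical coefficients
pmf_y = transform(pmf, lambda x: a * x + b)    # y = a x + b

sigma = math.sqrt(var)
pmf_z = transform(pmf, lambda x: (x - float(mu)) / sigma)   # standardized variable

print(mean(pmf_y), a * mu + b)
print(variance(pmf_y), a * a * var)
print(mean(pmf_z), variance(pmf_z))
```

The exact fractions confirm the mean and variance rules for y, and the standardized variable has mean 0 and variance 1 up to floating-point rounding.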
Propagation of moments and density function through
non-linear models
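If x is a random variable with probability density function p(x) and y =
f(x) is a one-to-one transformation that is differentiable for all x, then the
probability density function of y is given by
– $p(y) = p(x)\,|J|^{-1}$, for all x given by $x = f^{-1}(y)$
– where $|J|$ is the absolute value of the determinant of the Jacobian matrix J of f.
Example:
Let $y = a x^2$ with $a > 0$ and
$$p(x) = \frac{1}{\sigma_x \sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma_x^2}\right)$$
NOTE: for each value of y there are two values of x, so the contributions of both branches are added.
$$p(y) = \frac{1}{\sigma_x \sqrt{2\pi a y}} \exp\left(-\frac{y}{2 a \sigma_x^2}\right), \quad y > 0$$
and
p(y) = 0, otherwise
We can also show that
$$E(y) = a\sigma_x^2 \quad \text{and} \quad V(y) = 2a^2\sigma_x^4$$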
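A Monte Carlo sketch of the quadratic example follows. The values a = 2, σ_x = 1.5, the seed and the sample size are assumptions chosen for illustration; the code compares sampled moments with the stated mean and variance of y, and a sampled bin probability with the derived density.

```python
import math
import random

a, sigma_x = 2.0, 1.5            # illustrative values (assumptions)
rng = random.Random(42)          # fixed seed so the run is repeatable
n = 200_000

# Draw x ~ N(0, sigma_x^2) and push every sample through y = a x^2.
ys = [a * rng.gauss(0.0, sigma_x) ** 2 for _ in range(n)]
mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n

def p_y(y):
    """Density of y = a x^2 from the change-of-variables formula (a > 0)."""
    if y <= 0:
        return 0.0
    return math.exp(-y / (2 * a * sigma_x ** 2)) / (sigma_x * math.sqrt(2 * math.pi * a * y))

# Compare the sampled fraction in [lo, hi) with the integral of p_y (midpoint rule).
lo, hi, steps = 1.0, 2.0, 1000
h = (hi - lo) / steps
prob_from_density = sum(p_y(lo + (i + 0.5) * h) for i in range(steps)) * h
prob_from_samples = sum(lo <= y < hi for y in ys) / n

print(mean_y, a * sigma_x ** 2)
print(var_y, 2 * a ** 2 * sigma_x ** 4)
print(prob_from_samples, prob_from_density)
```

Sampling error keeps the agreement approximate; more samples tighten it.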
Random Variables
One random variable describes one physical phenomenon.
– Web server.
A random vector is just an extension of a random variable.
– A vector random variable X is a function that assigns a vector of
real numbers to each outcome in the sample space.
– e.g. Sample space = set of people.
– Random vector = [X = weight, Y = height of a person].
A random point (X,Y) has more information than X or Y.
– It describes the joint behavior of X and Y.
The joint probability distribution function:
$$F(x, y) = P(\{X \le x\} \cap \{Y \le y\})$$
What happens as $x \to \infty$, $x \to -\infty$, $y \to \infty$, $y \to -\infty$?
Random Vectors
Joint Probability Functions:
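– Joint Probability Distribution Function:
$$F(\mathbf{x}) = P[\{X_1 \le x_1\} \cap \{X_2 \le x_2\} \cap \cdots \cap \{X_n \le x_n\}]$$
– Joint Probability Density Function:
$$f(\mathbf{x}) = \frac{\partial^n F(\mathbf{x})}{\partial x_1\, \partial x_2 \cdots \partial x_n}$$
Marginal Probability Functions: Marginal probability
functions are obtained by summing or integrating out the
variables that are of no interest.
$$f_X(x) = \sum_{y} P(x, y) \quad \text{or} \quad f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$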
Multivariate Expectations
$$E(X) = \int x f_X(x)\,dx$$
What about $g(X, Y) = X + Y$?
$$f_X(x) = \int f_{X,Y}(x, y)\,dy$$
$$E(X) = \int x f_X(x)\,dx = \iint x f_{X,Y}(x, y)\,dy\,dx$$
$$E(Y) = \int y f_Y(y)\,dy = \iint y f_{X,Y}(x, y)\,dx\,dy$$
$$E(g(X)) = \int g(x) f_X(x)\,dx = \iint g(x) f_{X,Y}(x, y)\,dy\,dx$$
$$E(h(Y)) = \iint h(y) f_{X,Y}(x, y)\,dx\,dy$$
$$E(g(X, Y)) = \iint g(x, y) f_{X,Y}(x, y)\,dx\,dy$$
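For a discrete pair the integrals become sums, which makes marginals and joint expectations easy to check by hand. The joint table below is hypothetical (values chosen only so that they sum to one), and `marginal` and `expect` are illustrative helper names.

```python
from fractions import Fraction

# Hypothetical joint PMF P(x, y) on a 2 x 2 grid (values chosen for illustration).
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 2),
}

def marginal(joint, axis):
    """Sum out the other variable: axis=0 gives f_X, axis=1 gives f_Y."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out

def expect(g, joint):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * P(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

f_x = marginal(joint, 0)
f_y = marginal(joint, 1)
e_x = sum(x * p for x, p in f_x.items())
e_y = sum(y * p for y, p in f_y.items())
e_sum = expect(lambda x, y: x + y, joint)

print(f_x, f_y)
print(e_x, e_y, e_sum)
```

Here E[X + Y] computed from the joint table equals E[X] + E[Y] computed from the marginals, even though X and Y in this table are not independent.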
Multivariate Expectations
Mean Vector:
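$$\mu = E[\mathbf{x}] = [\,E[x_1]\ \ E[x_2]\ \ \cdots\ \ E[x_n]\,]^T$$
Expected value of $g(x_1, x_2, \ldots, x_n)$ is given by
$$E[g(\mathbf{x})] = \sum_{x_n} \sum_{x_{n-1}} \cdots \sum_{x_1} g(\mathbf{x}) f(\mathbf{x}) \quad \text{or} \quad \int_{x_n} \int_{x_{n-1}} \cdots \int_{x_1} g(\mathbf{x}) f(\mathbf{x})\,d\mathbf{x}$$
Covariance Matrix:
$$\operatorname{cov}[\mathbf{x}] = P = E[(\mathbf{x} - \mu)(\mathbf{x} - \mu)^T] = E[\mathbf{x}\mathbf{x}^T] - \mu\mu^T$$
where $S = E[\mathbf{x}\mathbf{x}^T]$ is known as the autocorrelation matrix.
NOTE:
$$P = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{bmatrix} R \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{bmatrix}, \qquad R = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{bmatrix}$$
R is the correlation matrix.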
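The factorization of the covariance matrix into standard deviations and correlations can be sketched in plain Python. The 3×3 matrix P below is hypothetical; the code extracts the standard deviations, forms R, and rebuilds P entry by entry.

```python
import math

# Hypothetical 3x3 covariance matrix P (symmetric, chosen for illustration).
P = [
    [4.0, 1.2, -0.8],
    [1.2, 1.0, 0.3],
    [-0.8, 0.3, 2.25],
]

n = len(P)
sigma = [math.sqrt(P[i][i]) for i in range(n)]   # standard deviations

# Correlation matrix: R_ij = P_ij / (sigma_i * sigma_j), so R_ii = 1.
R = [[P[i][j] / (sigma[i] * sigma[j]) for j in range(n)] for i in range(n)]

# Rebuild P = D R D with D = diag(sigma); entrywise this is sigma_i * R_ij * sigma_j.
P_rebuilt = [[sigma[i] * R[i][j] * sigma[j] for j in range(n)] for i in range(n)]

for row in R:
    print([round(v, 4) for v in row])
```

The diagonal of R is 1 and every off-diagonal entry lies between -1 and 1, while P_rebuilt reproduces P.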
Covariance Matrix
Covariance matrix indicates the tendency of each
pair of dimensions in random vector to vary together
i.e. “co-vary”.
Properties of covariance matrix:
– Covariance matrix is square.
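– Covariance matrix is always positive semi-definite, i.e. $x^T P x \ge 0$ for every vector x (positive definite, $x^T P x > 0$ for $x \ne 0$, when no component is an exact linear combination of the others).
– Covariance matrix is symmetric, i.e. $P = P^T$.
– If $x_i$ and $x_j$ tend to increase together then $P_{ij} > 0$.
– If $x_i$ and $x_j$ are uncorrelated then $P_{ij} = 0$.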
Independent Variables
Recall, two random variables are said to be independent
if knowing values of one tells you nothing about the other
variable.
– Joint probability density function is product of the marginal
probability density functions.
– Cov(X,Y)=0 if X and Y are independent.
– E(XY)=E(X)E(Y).
Two variables are said to be uncorrelated if cov(X,Y)=0.
– Independent variables are uncorrelated, but the converse is not
true.
Cov(X,Y) = 0 ⇒ the integral $\iint (x - \mu_X)(y - \mu_Y) f_{X,Y}(x, y)\,dx\,dy = 0$.
– It tells us that the distribution is balanced in some way but says
nothing about the distribution values.
– Example: (X,Y) uniformly distributed on the unit circle.
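The unit-circle example can be made concrete. As an assumption for illustration, the sketch replaces the uniform distribution on the circle by 360 equally spaced points, then computes the covariance of X and Y and the covariance of their squares.

```python
import math

# Equally spaced points on the unit circle stand in for a uniform distribution.
n = 360
xs = [math.cos(2 * math.pi * k / n) for k in range(n)]
ys = [math.sin(2 * math.pi * k / n) for k in range(n)]

mean_x = sum(xs) / n
mean_y = sum(ys) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Dependence: Y is tied to X through x^2 + y^2 = 1, so the squares are correlated.
x2 = [x * x for x in xs]
y2 = [y * y for y in ys]
cov_sq = sum(u * v for u, v in zip(x2, y2)) / n - (sum(x2) / n) * (sum(y2) / n)

print(cov_xy, cov_sq)
```

The covariance of X and Y is zero up to rounding, yet the covariance of X² and Y² is -1/8. Independent variables would give zero for both, so X and Y are uncorrelated but not independent.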
Gaussian or Normal Distribution
The normal distribution is the most widely known and used distribution
in the field of statistics.
– Many natural phenomena can be approximated by Normal
distribution.
Central Limit Theorem:
– The central limit theorem states that, given a distribution with a mean
$\mu$ and variance $\sigma^2$, the sampling distribution of the mean approaches
a normal distribution with a mean $\mu$ and a variance $\sigma^2/N$ as N, the
sample size, increases.
Normal Density Function:
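$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
[Figure: normal density curve; peak label 0.399; x-axis ticks at μ−2σ, μ−σ, μ+σ, μ+2σ]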
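The central limit theorem can be illustrated without sampling by convolving the distribution of a fair die with itself. The sketch below assumes N = 30 dice; `sum_pmf` and `normal_pdf` are illustrative helper names.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def sum_pmf(n_dice):
    """Exact PMF of the sum of n_dice fair dice, built by repeated convolution."""
    pmf = {0: 1.0}
    for _ in range(n_dice):
        new = {}
        for total, p in pmf.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0.0) + p / 6
        pmf = new
    return pmf

mu, var = 3.5, 35 / 12     # mean and variance of one fair die
N = 30                     # sample size (illustrative)
pmf = sum_pmf(N)

# Mean and variance of the sample mean (sum / N).
mean_of_mean = sum((s / N) * p for s, p in pmf.items())
var_of_mean = sum((s / N - mean_of_mean) ** 2 * p for s, p in pmf.items())

# Largest gap between the exact PMF of the sum and the normal density with
# mean N*mu and variance N*var (sums are one unit apart, so no rescaling is needed).
gap = max(abs(p - normal_pdf(s, N * mu, math.sqrt(N * var))) for s, p in pmf.items())

print(mean_of_mean, var_of_mean, var / N)
print(gap, normal_pdf(0.0))
```

The sample mean keeps the mean 3.5, its variance is σ²/N, and the exact probabilities already sit close to the normal curve at N = 30. The last printed value is the peak of the standard normal density, about 0.399.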
Multivariate Normal Distribution
Multivariate Gaussian Density Function:
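$$f(\mathbf{X}) = \frac{1}{(2\pi)^{n/2}\, |R|^{1/2}}\, e^{-\frac{1}{2} (\mathbf{X} - \mu)^T R^{-1} (\mathbf{X} - \mu)}$$
Here R denotes the covariance matrix of X.
How to find the equal probability surface?
$$\frac{1}{2} (\mathbf{X} - \mu)^T R^{-1} (\mathbf{X} - \mu) = \text{constant}$$
Moreover, one is interested in finding the probability that x lies inside
the quadratic hypersurface.
– For example, what is the probability of lying inside the 1-σ ellipsoid?
$$R^{-1} = C \Sigma C^T, \qquad \Sigma = \begin{bmatrix} \frac{1}{\sigma_1^2} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2^2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_n^2} \end{bmatrix}$$
$$\mathbf{Y} = C^T(\mathbf{X} - \mu), \qquad z_i = \frac{Y_i}{\sigma_i}$$
$$z_1^2 + z_2^2 + \cdots + z_n^2 \le c^2$$
$$P\left(\sum_i z_i^2 \le c^2\right) = \int_V f(z)\,dV$$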
Multivariate Normal Distribution
$Y_i$ represents coordinates based on the Cartesian
principal axis system and $\sigma_i^2$ is the variance along the
principal axes.
Probability of lying inside 1σ,2σ or 3σ ellipsoid
decreases with increase in dimensionality.
n\c | 1     | 2     | 3
--- | ----- | ----- | -----
1   | 0.683 | 0.955 | 0.997
2   | 0.394 | 0.865 | 0.989
3   | 0.200 | 0.739 | 0.971
Curse of Dimensionality
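The table entries can be recomputed from closed forms. With independent standard normal z_i, the sum of squares follows a chi-square distribution with n degrees of freedom; the sketch below uses the known closed-form expressions for n = 1, 2, 3, and `prob_inside` is an illustrative helper name.

```python
import math

def prob_inside(n, c):
    """P(z_1^2 + ... + z_n^2 <= c^2) for independent standard normal z_i, n in {1, 2, 3}."""
    if n == 1:
        return math.erf(c / math.sqrt(2))
    if n == 2:
        return 1 - math.exp(-c * c / 2)
    if n == 3:
        return math.erf(c / math.sqrt(2)) - math.sqrt(2 / math.pi) * c * math.exp(-c * c / 2)
    raise ValueError("closed forms are given here only for n = 1, 2, 3")

for n in (1, 2, 3):
    print(n, [round(prob_inside(n, c), 4) for c in (1, 2, 3)])
```

The computed values agree with the table to within about 0.002, and for a fixed c the probability drops as n grows.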
Summary of Probability Distribution Functions
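Probability Distribution | Parameters | Characteristics | Probability Function | Mean | Variance
------------------------ | ---------- | --------------- | -------------------- | ---- | --------
Discrete | | | | |
Binomial | $0 \le p \le 1$ and $n = 0, 1, 2, \ldots$ | Skewed unless p = 0.5 | ${}^{n}C_{x}\, p^x q^{n-x}$ | $np$ | $npq$
Hypergeometric | $M = 0, \ldots, N$; $N = 0, 1, 2, \ldots$; $n = 0, \ldots, N$ | Skewed | ${}^{M}C_{x}\; {}^{N-M}C_{n-x} \,/\, {}^{N}C_{n}$ | $nM/N$ | $\frac{nM(N-M)(N-n)}{N^2(N-1)}$
Poisson | $\lambda > 0$ | Skewed positively | $\lambda^x e^{-\lambda} / x!$ | $\lambda$ | $\lambda$
Continuous | | | | |
Normal | $-\infty < \mu < \infty$ and $\sigma > 0$ | Symmetric about $\mu$ | $\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$
Standardized Normal | | Symmetric about zero | $\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ | 0 | 1
Exponential | $\lambda > 0$ | Skewed positively | $\lambda e^{-\lambda x}$ | $1/\lambda$ | $1/\lambda^2$
Here $q = 1 - p$.
A distribution is skewed if it has most of its values either to the right or to the left of its mean.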
Properties of Estimators
Unbiasedness
– On average, the estimate equals the true value of the parameter
being estimated.
$$E[\hat{x}] = x$$
Efficiency
– Has a relatively small variance.
– The estimated parameter values should vary little from
sample to sample.
Sufficiency
– Uses as much of the information available in the
samples as possible.
Consistency
– As the sample size increases, the estimated value
approaches the true value.
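Unbiasedness can be checked exactly on a toy problem. The sketch below assumes samples of size 3 from a fair die and enumerates all 216 equally likely samples; it compares the sample mean and two variance estimators (dividing by n - 1 and by n) with the true values. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
mu = Fraction(7, 2)                       # true mean of a fair die
var = Fraction(35, 12)                    # true variance of a fair die
n = 3                                     # sample size (illustrative)

samples = list(product(faces, repeat=n))  # all 6**n equally likely samples
weight = Fraction(1, len(samples))

def sample_mean(s):
    return Fraction(sum(s), len(s))

def sample_var(s, ddof):
    """Sum of squared deviations divided by (len(s) - ddof)."""
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - ddof)

e_mean = sum(sample_mean(s) * weight for s in samples)
e_var_unbiased = sum(sample_var(s, 1) * weight for s in samples)   # divide by n - 1
e_var_biased = sum(sample_var(s, 0) * weight for s in samples)     # divide by n
var_of_mean = sum((sample_mean(s) - mu) ** 2 * weight for s in samples)

print(e_mean, e_var_unbiased, e_var_biased, var_of_mean)
```

The sample mean and the n - 1 variance estimator average exactly to the true values, while dividing by n underestimates the variance by the factor (n - 1)/n. The variance of the sample mean is σ²/n, so it shrinks as the sample size grows, in line with consistency.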