Probability and Statistical Review
Lecture 1
Manoranjan Majji
Lecture slides and notes available online:
Visit http://dnc.tamu.edu/Class Notes/AERO626/index.php
Probability and Statistical Review
Probability
– Motivating Example
– Definition of Probability
– Axioms of Probability
– Conditional Probability
– Bayes’s Theorem
– Random Variables (Discrete and Continuous)
– Expectation of Random Variables
– Multivariate Density Functions
Basic Probability Concepts
Probabilities are numbers assigned to events that indicate “how likely” it is that an event will occur when a random experiment is performed.
– The statement “E has probability P(E)” then means that if we perform the experiment very often, it is practically certain that the relative frequency of E is approximately equal to P(E).
What do we mean by Relative Frequency?
– The relative frequency is at least equal to 0 and at most equal to 1.
$$0 \le P(E) \le 1$$
– Frequency Function: It shows how the values of the samples are
distributed.
$$f(x) = \begin{cases} f_j & \text{when } x = x_j \\ 0 & \text{for any value } x \text{ not appearing in the sample} \end{cases}$$
– Sample Distribution function:
$$F(x) = \sum_{t \le x} f(t)$$
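To make these definitions concrete, here is a minimal Python sketch (the sample values are made up for illustration) that computes the relative-frequency function f(x) and the sample distribution function F(x):

```python
from collections import Counter

# Hypothetical sample of n = 10 observed values (illustration only)
sample = [2, 3, 3, 5, 2, 7, 3, 5, 2, 3]
n = len(sample)

# Relative-frequency function: f(x_j) = (count of x_j) / n
f = {x: count / n for x, count in Counter(sample).items()}

# Sample distribution function: F(x) = sum of f(t) over all t <= x
def F(x):
    return sum(fj for xj, fj in f.items() if xj <= x)

print(f)      # e.g. f(3) = 4/10 = 0.4
print(F(3))   # f(2) + f(3) = 0.3 + 0.4 = 0.7
```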
Basic Probability Concepts
The frequency function characterizes a given sample in detail.
– We can compute some numbers that characterize certain properties
of the sample.
– Sample Mean:
$$\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j = \frac{1}{n}\sum_{j=1}^{m} x_j\, n f(x_j) = \sum_{j=1}^{m} x_j\, f(x_j)$$
– Sample Variance:
$$s^2 = \frac{1}{n}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2 = \sum_{j=1}^{m}\left(x_j - \bar{x}\right)^2 f(x_j)$$
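Continuing the same illustrative sample, a short sketch checking that the direct formulas and the frequency-function forms of the sample mean and variance agree:

```python
from collections import Counter

# Same hypothetical sample as above (illustration only)
sample = [2, 3, 3, 5, 2, 7, 3, 5, 2, 3]
n = len(sample)
f = {x: c / n for x, c in Counter(sample).items()}

# Direct computation
mean = sum(sample) / n
var = sum((x - mean) ** 2 for x in sample) / n

# Equivalent computation through the frequency function
mean_f = sum(xj * fj for xj, fj in f.items())
var_f = sum((xj - mean_f) ** 2 * fj for xj, fj in f.items())

print(mean, var)       # 3.5, 2.45
print(mean_f, var_f)   # identical, as the identities above require
```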
Useful Definitions
Random Experiment or Random Observation:
– It is performed according to a set of rules that determines the
performance completely.
– It can be repeated arbitrarily often.
– The result of each performance depends on “chance” (that is, on influences which we cannot control) and can therefore not be uniquely predicted.
The result of a single performance of the experiment is called the outcome of that experiment.
The set of all possible outcomes of an experiment is called the
sample space of the experiment.
In most practical problems, we are not interested in the
individual outcomes of the experiment but in whether an
outcome belongs to a certain set of outcomes. Such a set is called an “Event”.
Useful Definitions
Impossible Event: An event containing no elements; it is denoted by $\emptyset$.
Mutually Exclusive or Disjoint Events: $A \cap B = \emptyset$
Example: Let us consider the rolling of a die.
Sample Space: $S = \{1, 2, 3, 4, 5, 6\}$
E: the event that the die turns up an even number $= \{2, 4, 6\}$
O: the event that the die turns up an odd number $= \{1, 3, 5\}$
$E \cap O = \emptyset$
E and O are mutually exclusive events.
Axioms of Probability
Property 1: $0 \le P(E) \le 1$.
Property 2: $P(S) = 1$.
Property 3: $P(E^c) = 1 - P(E)$.
Property 4: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Property 5: if $E_1, E_2, \ldots, E_n$ are mutually exclusive events, then
$$P(E_1 \cup E_2 \cup \cdots \cup E_n) = P(E_1) + P(E_2) + \cdots + P(E_n)$$
Conditional Probability
The probability of an event B under the condition that an event
A occurs is given by
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
– $P(B \mid A)$ is called the conditional probability of B given A.
[Figure: Venn diagram of events A and B, with their intersection labeled AB]
– In this case, event A serves as a new sample space and event B
becomes AB.
– A and B are called independent events if
$$P(B \mid A) = P(B), \qquad P(A \mid B) = P(A), \qquad P(A \cap B) = P(A)\,P(B)$$
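As a quick numerical illustration (not from the slides), take the die example from earlier, with A the even outcomes and B the outcomes greater than 3:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # die shows an even number
B = {4, 5, 6}   # die shows more than 3

def prob(E):
    """Probability of event E under equally likely outcomes."""
    return Fraction(len(E), len(S))

p_B_given_A = prob(A & B) / prob(A)   # P(B|A) = P(A ∩ B) / P(A)
print(p_B_given_A)                    # 2/3: within the new sample space A,
                                      # the outcomes belonging to B are {4, 6}
```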
Theorem of Total Probability.
[Figure: sample space S partitioned into mutually exclusive events B1, B2, ..., Bn, with event A overlapping the partition]
Let $B_1, B_2, \ldots, B_n$ be mutually exclusive events such that
$$\bigcup_{i=1}^{n} B_i = S$$
The probability of an event A can be represented as:
$$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_n)$$
and, therefore,
$$P(A) = P(A \mid B_1)P(B_1) + \cdots + P(A \mid B_n)P(B_n) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$$
Bayes’s Theorem
Let us assume there are m mutually exclusive states of nature (classes) labeled $j$ ($j = 1, 2, \ldots, m$).
Let P(x) be the probability that an observation assumes the specific value x.
Definitions:
– Prior Probability: $P(j)$.
– Posterior Probability: $P(j \mid x)$ (of class j given observation x).
– Likelihood: $P(x \mid j)$ (conditional probability of observation x given class j).
Bayes’s Theorem gives the relationship between the m prior probabilities $P(j)$, the m likelihoods $P(x \mid j)$, and one posterior probability of interest:
$$P(j \mid x) = \frac{P(j)\,P(x \mid j)}{\sum_{k=1}^{m} P(k)\,P(x \mid k)}$$
Exercise
Consider a clinical problem where we have to decide if a patient has one particular rare disease on the basis of an imperfect medical test.
– 1 in 1000 people have the rare disease.
– The test shows positive 99% of the time when a person has the disease.
– The test shows positive 2% of the time when a person does not have the disease.
What is the probability that a person actually has the disease, given that the test is positive?
Let $A_1$ = person has the disease, $A_2$ = person does not, and $B$ = test is positive. Then
$$P(A_1 \mid B) = \frac{P(B \mid A_1)P(A_1)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2)} = \frac{0.99 \times 0.001}{(0.99 \times 0.001) + (0.02 \times 0.999)} \approx 0.047$$
Exercise (continued…)
$P(A_1 \mid B) = 0.047 = 4.7\%$
– seems counter-intuitive… WHY?
Most positive tests arise from testing error rather than from people actually having the disease.
– From prior 0.001 to posterior 0.047.
The disease is rare and the test is only marginally reliable.
NOTE: if the disease were not so rare (e.g., a 25% incidence rate), then we would get a good diagnosis.
– $P(A_1 \mid B) = 0.94$.
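A short sketch reproducing both numbers in this exercise; the function and variable names are ours, and the 25% variant from the NOTE is included as a check:

```python
def posterior(prior, p_pos_given_disease=0.99, p_pos_given_healthy=0.02):
    """P(disease | positive test) via Bayes's theorem."""
    evidence = (p_pos_given_disease * prior
                + p_pos_given_healthy * (1 - prior))
    return p_pos_given_disease * prior / evidence

print(round(posterior(0.001), 3))   # 0.047 -- rare disease
print(round(posterior(0.25), 2))    # 0.94  -- 25% incidence rate
```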
Random Variables
A random variable X (also called a stochastic variable) is a function whose values are real numbers and depend on “chance”. More precisely, it is a function X which has the following properties:
– X is defined on the sample space S of the experiment, and its values are real numbers.
The function that assigns a value to each outcome is fixed and deterministic.
– The randomness is due to the underlying randomness of the argument of the function X.
Random variables can be discrete or continuous.
Discrete Random Variables
A random variable X and the corresponding distribution are said
to be discrete, if the number of values for which X has non-zero
probability is finite.
Probability Mass Function of X:
$$f(x) = \begin{cases} p_j & \text{when } x = x_j \\ 0 & \text{otherwise} \end{cases}$$
Probability Distribution Function of x:
$$F(x) = P(X \le x)$$
Properties of Distribution Function:
$$0 \le F(x) \le 1$$
$$P(a < x \le b) = F(b) - F(a)$$
Continuous Random Variables and Distributions
A random variable X and the corresponding distribution are said to be continuous if the distribution function $F(x) = P(X \le x)$ of X can be represented in integral form:
$$F(x) = \int_{-\infty}^{x} f(y)\,dy$$
The integrand f(y) is called a probability density function.
$$F'(x) = f(x)$$
Properties:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
$$P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$$
Statistical Characterization of Random Variables
Expected Value:
– The expected value of a discrete random variable x is found by multiplying each value of the random variable by its probability and then summing over all values of x.
Expected value of x: $E[x] = \sum_x x\,P(x) = \sum_x x\,f(x)$
– The expected value of x is the “balancing point” for the probability mass function of x. That is, it is the arithmetic mean.
– We can take an expectation of any function of a random variable.
Expected value of g(x): $E[g(x)] = \sum_x g(x)\,f(x)$
– This balance point is the value expected for g(x) for all possible
repetitions of the experiment involving the random variable x.
– The expected value for a continuous density function f(x) is given by
$$E(x) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$
Illustration of Expectation
A lottery has two schemes; the first scheme has two outcomes (denoted by 1 and 2) and the second has three (denoted by 1, 2 and 3). It is agreed that the participant in the first scheme gets $1 if the outcome is 1 and $2 if the outcome is 2. The participant in the second scheme gets $3 if the outcome is 1, -$2 if the outcome is 2, and $3 if the outcome is 3. The probabilities of each outcome are listed as follows.
p(1, 1) = 0.1; p(1, 2) = 0.2; p(1, 3) = 0.3
p(2, 1) = 0.2; p(2, 2) = 0.1; p(2, 3) = 0.1
Help the investor decide which scheme to prefer. [Bryson]
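One way to work this exercise, assuming p(i, j) is read as the joint probability that the first scheme yields outcome i and the second yields outcome j (the six listed values do sum to 1 under this reading), is to compute each scheme’s expected payoff:

```python
# Joint probabilities p[(i, j)] as listed on the slide (assumed interpretation:
# i = outcome of scheme 1, j = outcome of scheme 2)
p = {(1, 1): 0.1, (1, 2): 0.2, (1, 3): 0.3,
     (2, 1): 0.2, (2, 2): 0.1, (2, 3): 0.1}

payoff1 = {1: 1, 2: 2}            # scheme 1: $1 or $2
payoff2 = {1: 3, 2: -2, 3: 3}     # scheme 2: $3, -$2, or $3

E1 = sum(payoff1[i] * pij for (i, j), pij in p.items())
E2 = sum(payoff2[j] * pij for (i, j), pij in p.items())
print(E1, E2)   # 1.4 vs 1.5
```

Under this interpretation the second scheme is marginally preferable in expectation ($1.5 vs. $1.4).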
Example
Let us assume that we have agreed to pay $1 for each dot showing when a pair of dice is thrown. We are interested in knowing how much we would lose on average.
Values of x   Frequency   Probability Function   Probability Distribution Function
2             1           P(x=2)  = 1/36         P(x≤2)  = 1/36
3             2           P(x=3)  = 2/36         P(x≤3)  = 3/36
4             3           P(x=4)  = 3/36         P(x≤4)  = 6/36
5             4           P(x=5)  = 4/36         P(x≤5)  = 10/36
6             5           P(x=6)  = 5/36         P(x≤6)  = 15/36
7             6           P(x=7)  = 6/36         P(x≤7)  = 21/36
8             5           P(x=8)  = 5/36         P(x≤8)  = 26/36
9             4           P(x=9)  = 4/36         P(x≤9)  = 30/36
10            3           P(x=10) = 3/36         P(x≤10) = 33/36
11            2           P(x=11) = 2/36         P(x≤11) = 35/36
12            1           P(x=12) = 1/36         P(x≤12) = 1
Sum           36          1.00
Average amount we pay = (($2 × 1) + ($3 × 2) + … + ($12 × 1)) / 36 = $7
E(x) = $2(1/36) + $3(2/36) + … + $12(1/36) = $7
Example (continued…)
Let us assume that we had agreed to pay an amount equal to the square of the sum of the dots showing on a throw of the dice.
– What would be the average loss this time?
Will it be $(\$7)^2 = \$49.00$?
Actually, now we are interested in calculating $E[x^2]$.
– $E[x^2] = (\$2)^2(1/36) + \cdots + (\$12)^2(1/36) = \$54.83 \ne \$49$
– This result also emphasizes that $(E[x])^2 \ne E[x^2]$.
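A sketch reproducing both expectations by enumerating the 36 equally likely outcomes of the pair of dice:

```python
from itertools import product
from fractions import Fraction

sums = [a + b for a, b in product(range(1, 7), repeat=2)]   # 36 outcomes
P = Fraction(1, 36)

E_x  = sum(s * P for s in sums)        # 7
E_x2 = sum(s**2 * P for s in sums)     # 1974/36 ~= 54.83
print(E_x, float(E_x2))                # 7, 54.83...
print(float(E_x2 - E_x**2))            # 35/6 ~= 5.83, not 0:
                                       # (E[x])^2 = 49 differs from E[x^2]
```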
Variance of Random Variable
The variance of a random variable x is defined as
$$V(x) = \sigma^2 = E[(x - \mu)^2]$$
$$V(x) = E[x^2 - 2\mu x + \mu^2] = E[x^2] - 2(E[x])^2 + (E[x])^2 = E[x^2] - (E[x])^2$$
This result is also known as “Parallel Axis Theorem”
Expectation Rules
Rule 1: $E[k] = k$, where k is a constant.
Rule 2: $E[kx] = kE[x]$.
Rule 3: $E[x \pm y] = E[x] \pm E[y]$.
Rule 4: If x and y are independent, $E[xy] = E[x]E[y]$.
Rule 5: $V[k] = 0$, where k is a constant.
Rule 6: $V[kx] = k^2 V[x]$.
Propagation of moments and density function through
linear models
$y = ax + b$
– Given: $\mu = E[x]$ and $\sigma^2 = V[x]$
– To find: $E[y]$ and $V[y]$
$$E[y] = E[ax] + E[b] = aE[x] + b = a\mu + b$$
$$V[y] = V[ax] + V[b] = a^2 V[x] + 0 = a^2\sigma^2$$
Let us define
$$z = \frac{x - \mu}{\sigma}$$
Here, $a = 1/\sigma$ and $b = -\mu/\sigma$.
Therefore, $E[z] = 0$ and $V[z] = 1$.
z is generally known as a “standardized variable”.
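A minimal Monte Carlo check of these propagation rules (the normal distribution and the particular values of μ, σ, a, b are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0
a, b = 4.0, -1.0

x = rng.normal(mu, sigma, size=1_000_000)
y = a * x + b                        # linear model
z = (x - mu) / sigma                 # standardized variable

print(y.mean(), a * mu + b)          # ~7.0  (E[y] = a*mu + b)
print(y.var(),  a**2 * sigma**2)     # ~144  (V[y] = a^2 * sigma^2)
print(z.mean(), z.var())             # ~0, ~1
```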
Propagation of moments and density function through
non-linear models
If x is a random variable with probability density function p(x) and y = f(x) is a one-to-one transformation that is differentiable for all x, then the probability density function of y is given by
– $p(y) = p(x)\,|J|^{-1}$, for all x given by $x = f^{-1}(y)$
– where $|J|$ is the determinant of the Jacobian matrix J.
Example:
Let $y = ax^2$ and
$$p(x) = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left(-x^2 / 2\sigma_x^2\right)$$
NOTE: for each value of y there are two values of x (so the contributions of both roots are summed).
$$p(y) = \frac{1}{\sqrt{2\pi\sigma_x^2\, a y}} \exp\left(-y / 2a\sigma_x^2\right), \quad y > 0$$
and p(y) = 0, otherwise.
We can also show that
$$E(y) = a\sigma_x^2 \quad \text{and} \quad V(y) = 2a^2\sigma_x^4$$
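A Monte Carlo check of the transformed moments (the values of a and σx are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
a, sigma_x = 2.0, 1.5

x = rng.normal(0.0, sigma_x, size=2_000_000)
y = a * x**2                              # non-linear model

print(y.mean(), a * sigma_x**2)           # E[y] = a*sigma_x^2  = 4.5
print(y.var(),  2 * a**2 * sigma_x**4)    # V[y] = 2*a^2*sigma_x^4 = 40.5
```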
Random Vectors
Just an extension of the random variable concept.
– A vector random variable X is a function that assigns a vector of real numbers to each outcome in the sample space.
Joint Probability Functions:
– Joint Probability Distribution Function:
$$F(\mathbf{X}) = P[\{X_1 \le x_1\} \cap \{X_2 \le x_2\} \cap \cdots \cap \{X_n \le x_n\}]$$
– Joint Probability Density Function:
$$f(\mathbf{x}) = \frac{\partial^n F(\mathbf{X})}{\partial X_1\,\partial X_2 \cdots \partial X_n}$$
Marginal Probability Functions: marginal probability functions are obtained by summing or integrating out the variables that are of no interest.
$$f(x) = \sum_{y} P(x, y) \quad \text{or} \quad f(x) = \int_{y} f(x, y)\,dy$$
Multivariate Expectations
Mean Vector:
$$E[\mathbf{x}] = \begin{bmatrix} E[x_1] & E[x_2] & \cdots & E[x_n] \end{bmatrix}$$
The expected value of $g(x_1, x_2, \ldots, x_n)$ is given by
$$E[g(\mathbf{x})] = \sum_{x_n} \cdots \sum_{x_1} g(\mathbf{x})\, f(\mathbf{x}) \quad \text{or} \quad \int_{x_n} \cdots \int_{x_1} g(\mathbf{x})\, f(\mathbf{x})\,d\mathbf{x}$$
Covariance Matrix:
$$\operatorname{cov}[\mathbf{x}] = P = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T$$
where $S = E[\mathbf{x}\mathbf{x}^T]$ is known as the autocorrelation matrix.
NOTE:
$$P = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_n \end{bmatrix} \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & & \rho_{2n} \\ \vdots & & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_n \end{bmatrix}$$
where R is the correlation matrix.
Covariance Matrix
The covariance matrix indicates the tendency of each pair of dimensions in a random vector to vary together, i.e. to “co-vary”.
Properties of the covariance matrix:
– The covariance matrix is square.
– The covariance matrix is positive semidefinite, i.e. $x^T P x \ge 0$ (and positive definite, $x^T P x > 0$, when no linear combination of the components has zero variance).
– The covariance matrix is symmetric, i.e. $P = P^T$.
– If $x_i$ and $x_j$ tend to increase together, then $P_{ij} > 0$.
– If $x_i$ and $x_j$ are uncorrelated, then $P_{ij} = 0$.
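A sketch illustrating the decomposition of P into standard deviations and the correlation matrix R, together with the listed properties, using an arbitrarily chosen 2-D Gaussian:

```python
import numpy as np

rng = np.random.default_rng(2)
P_true = np.array([[4.0, 1.2],
                   [1.2, 1.0]])          # arbitrary covariance for illustration
x = rng.multivariate_normal([0, 0], P_true, size=500_000)

P = np.cov(x, rowvar=False)              # sample covariance matrix
sigma = np.sqrt(np.diag(P))              # standard deviations
R = P / np.outer(sigma, sigma)           # correlation matrix

print(np.allclose(P, P.T))                                   # symmetric
print(np.all(np.linalg.eigvalsh(P) > 0))                     # positive definite
print(np.allclose(np.diag(sigma) @ R @ np.diag(sigma), P))   # P = D R D
```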
Probability Distribution Function
There are many situations in statistics that involve the same type of probability functions.
– It is not necessary to derive these results over and over again in each special case with different numbers.
We can avoid this tedious process by recognizing the similarities between certain types of apparently unique experiments, and then merely matching a given case to a general formula.
Examples:
– Toss a coin: Head or Tail
– Take an exam: Pass or Fail
– Analyze the stock market: Up or Down
So all of the above processes can be distinguished by only two events, “Success” and “Failure”.
Binomial Distribution
The binomial distribution plays an important role in experiments involving repeated independent trials, each with just two possible outcomes.
– Independent trials means the result of one trial cannot influence the result of other trials.
– Repeated trials means the probability of “success” or “failure” does not change from trial to trial.
In the binomial distribution, we are interested in the probability of receiving a certain number of successes.
Let us assume that we have n independent trials, each trial having the same probability of success, say p.
– probability of failure: q = 1 − p
Let us say we are interested in determining the probability of x successes in n trials.
– Find the probability of any one occurrence of this type and then multiply this value by the number of possible occurrences.
Binomial Distribution
One of the possible occurrences is:
$$\underbrace{SS \ldots S}_{x \text{ times}}\ \underbrace{FF \ldots F}_{n-x \text{ times}}$$
The joint probability of this particular sequence is given by $p^x q^{n-x}$.
NOTE: $p^x q^{n-x}$ represents the probability not only of our one arrangement but of any possible arrangement of x successes and n − x failures.
How many arrangements of x successes and n − x failures are possible?
$${}^nC_x = \frac{n!}{x!(n-x)!}$$
$$P(x \text{ successes in } n \text{ trials}) = {}^nC_x\, p^x q^{n-x}$$
The binomial distribution is discrete in nature, since x and n can take only discrete (integer) values.
Mean and Variance of Binomial Distribution
Mean:
$$E[x_{\text{binomial}}] = np$$
Variance:
$$V[x_{\text{binomial}}] = \sigma^2 = E[x^2] - (E[x])^2 = npq$$
Example:
A football executive claims that 90% of viewers watch football over baseball on a concurrent telecast.
An advertising agency claims that the viewers for each are 50%.
Who is right?
We did a survey of 25 households and found that in 10 of them the games were being viewed, with the following breakdown:
Viewing Football: 7
Viewing Baseball: 3
Which of the two reports is correct?
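One simple way to weigh the two claims (our approach, not prescribed by the slide) is to treat the 10 viewing households as 10 independent trials and compare the binomial likelihood of the observed 7 football viewers under each claimed p:

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of observing 7 football viewers in 10 viewing households
print(binom_pmf(7, 10, 0.9))   # ~0.057  (executive's claim, p = 0.9)
print(binom_pmf(7, 10, 0.5))   # ~0.117  (agency's claim,  p = 0.5)
# The data are about twice as likely under the agency's 50% claim.
```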
Hypergeometric Distribution
The binomial distribution is important in sampling with replacement, but many practical problems involve sampling without replacement.
– In that case the hypergeometric distribution can be used to obtain the precise probability.
$$f(x) = \frac{{}^{M}C_x\; {}^{N-M}C_{n-x}}{{}^{N}C_n}, \qquad \mu = \frac{nM}{N}, \qquad \sigma^2 = \frac{nM(N-M)(N-n)}{N^2(N-1)}$$
– Example: We want to pick two apples from a box containing 15 apples, 5 of which are rotten. Find the probability function for the number of rotten apples in our sample.
Without Replacement:
$$f(x) = \frac{{}^{5}C_x\; {}^{10}C_{2-x}}{{}^{15}C_2}$$
With Replacement:
$$f(x) = {}^{2}C_x \left(\frac{5}{15}\right)^x \left(\frac{10}{15}\right)^{2-x}$$
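A sketch evaluating both probability functions for the apple example:

```python
from math import comb
from fractions import Fraction

N, M, n = 15, 5, 2   # 15 apples, 5 rotten, sample of 2

for x in range(n + 1):
    hyper = Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))
    binom = comb(n, x) * Fraction(M, N)**x * Fraction(N - M, N)**(n - x)
    print(x, hyper, binom)
# x=0: 9/21 vs 4/9;  x=1: 10/21 vs 4/9;  x=2: 2/21 vs 1/9
```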
Poisson Distribution
The Poisson distribution is one of the most important discrete distributions.
– It was first used by the French mathematician S.D. Poisson in 1837 to describe the probability of deaths in the Prussian army from the kick of a horse, as well as the number of suicides among women and children.
– These days it is successfully used in problems involving the number of arrivals/requests for a service per unit time at any service facility.
Assumptions
– It must be possible to divide the time interval being used into a large number of small subintervals such that the probability of an occurrence in each subinterval is very small.
– The probability of an occurrence in each of these subintervals must remain constant throughout the time period being considered.
– The probability of two or more occurrences in each subinterval must be small enough to be ignored.
– The occurrences in one time interval are independent of occurrences in any other time interval.
Poisson Distribution
The probability mass function for the Poisson distribution is given by
$$f(x) = \frac{\mu^x e^{-\mu}}{x!}$$
The Poisson distribution has mean $\mu$ and variance $\sigma^2 = \mu$.
It can be shown that the Poisson distribution can be obtained as a special case of the binomial distribution when $p \to 0$ and $n \to \infty$ (with $np = \mu$ held fixed).
Example:
It is given that on average 60 customers visit the bank between 10 am and 11 am daily, so the mean arrival rate is $\mu = 60/60 = 1$ customer per minute. Then we may be interested in knowing the probability of exactly 2, or of less than or equal to 2, customers visiting the bank in a given one-minute time interval.
$$P(2 \text{ arrivals}) = \frac{1^2 e^{-1}}{2!} = \frac{1}{2e}$$
$$P(\le 2 \text{ arrivals}) = \frac{1}{e} + \frac{1}{e} + \frac{1}{2e} = \frac{5}{2e}$$
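A sketch reproducing the bank example, with μ = 1 arrival per minute:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Poisson probability of exactly x occurrences with mean mu."""
    return mu**x * exp(-mu) / factorial(x)

mu = 60 / 60                                       # 1 customer per minute
print(poisson_pmf(2, mu))                          # 1/(2e)  ~= 0.184
print(sum(poisson_pmf(k, mu) for k in range(3)))   # 5/(2e)  ~= 0.920
```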
Gaussian or Normal Distribution
The normal distribution is the most widely known and used distribution in the field of statistics.
– Many natural phenomena can be approximated by the normal distribution.
Central Limit Theorem:
– The central limit theorem states that given a distribution with a mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/N$ as N, the sample size, increases.
Normal Density Function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
[Figure: bell-shaped normal density curve, peaking at $x = \mu$ with height $0.399/\sigma$, with ticks at $\mu \pm \sigma$ and $\mu \pm 2\sigma$]
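A quick simulation of the central limit theorem; the underlying Uniform(0, 1) population is an arbitrary choice, picked precisely because it is not normal:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50                           # sample size
mu, var = 0.5, 1 / 12            # mean and variance of Uniform(0, 1)

# 100,000 sample means, each computed from N uniform draws
means = rng.random((100_000, N)).mean(axis=1)

print(means.mean(), mu)          # ~0.5
print(means.var(),  var / N)     # ~1/600: variance shrinks as sigma^2 / N
```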
Normal Distribution
Multivariate Gaussian Density Function:
$$f(\mathbf{X}) = \frac{1}{(2\pi)^{n/2}\,|P|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{X}-\boldsymbol{\mu})^T P^{-1} (\mathbf{X}-\boldsymbol{\mu})}$$
What is the probability that
$$(\mathbf{X}-\boldsymbol{\mu})^T P^{-1} (\mathbf{X}-\boldsymbol{\mu}) \le R^2\,?$$
Let $\mathbf{Y} = A(\mathbf{X}-\boldsymbol{\mu})$ and $z_i = Y_i/\sigma_i$, where
$$A P A^T = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_n^2 \end{bmatrix}$$
so that the condition becomes $z_1^2 + z_2^2 + \cdots + z_n^2 \le R^2$, and
$$P\left(\sum_i z_i^2 \le R^2\right) = \int_V f(z)\,dV$$
Curse of Dimensionality:
n \ R    1       2       3
1        0.683   0.955   0.997
2        0.394   0.865   0.989
3        0.200   0.739   0.971
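Since the sum of squared standardized variables follows a chi-square distribution with n degrees of freedom, the table can be reproduced with SciPy (assuming scipy is available); small differences from the slide are rounding:

```python
from scipy.stats import chi2

# P(z_1^2 + ... + z_n^2 <= R^2) = chi-square CDF with n dof, evaluated at R^2
for n in (1, 2, 3):
    print(n, [round(chi2.cdf(R**2, n), 3) for R in (1, 2, 3)])
# 1 [0.683, 0.954, 0.997]
# 2 [0.393, 0.865, 0.989]
# 3 [0.199, 0.739, 0.971]
```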
Summary of Some Probability Mass/Density Functions
Binomial (discrete)
– Parameters: $0 \le p \le 1$ and $n = 0, 1, 2, \ldots$
– Characteristics: skewed unless p = 0.5
– Probability function: ${}^nC_x\, p^x q^{n-x}$
– Mean: $np$; Variance: $npq$
Hypergeometric (discrete)
– Parameters: $M = 0, \ldots, N$; $N = 0, 1, 2, \ldots$; $n = 0, \ldots, N$
– Characteristics: skewed
– Probability function: $\dfrac{{}^MC_x\,{}^{N-M}C_{n-x}}{{}^NC_n}$
– Mean: $\dfrac{nM}{N}$; Variance: $\dfrac{nM(N-M)(N-n)}{N^2(N-1)}$
Poisson (discrete)
– Parameters: $\mu > 0$
– Characteristics: skewed positively
– Probability function: $\dfrac{\mu^x e^{-\mu}}{x!}$
– Mean: $\mu$; Variance: $\mu$
Normal (continuous)
– Parameters: $-\infty < \mu < \infty$ and $\sigma \ge 0$
– Characteristics: symmetric about $\mu$
– Probability function: $\dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$
– Mean: $\mu$; Variance: $\sigma^2$
Standardized Normal (continuous)
– Characteristics: symmetric about zero
– Probability function: $\dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
– Mean: 0; Variance: 1
Exponential (continuous)
– Parameters: $\lambda > 0$
– Characteristics: skewed positively
– Probability function: $\lambda e^{-\lambda x}$
– Mean: $1/\lambda$; Variance: $1/\lambda^2$
A distribution is skewed if it has most of its values either to the right or to the left of its mean.
A measure of this variability in density is given by the third central moment of a distribution, called the “skewness”, defined as $E[(x-\mu)^3]$.