Review of Probability
Probability Review
CS479/679 Pattern Recognition
Dr. George Bebis
Why Bother About Probabilities?
• Accounting for uncertainty is a crucial component
in decision making (e.g., classification) because of
ambiguity in our measurements.
• Probability theory is the proper mechanism for
accounting for uncertainty.
• Take into account a priori knowledge, for example:
"If the fish was caught in the Atlantic ocean, then
it is more likely to be salmon than sea bass."
Definitions
• Random experiment
– An experiment whose result is not certain in advance
(e.g., throwing a die)
• Outcome
– The result of a random experiment
• Sample space
– The set of all possible outcomes (e.g., {1,2,3,4,5,6})
• Event
– A subset of the sample space (e.g., obtain an odd
number in the experiment of throwing a die = {1,3,5})
Intuitive Formulation of Probability
• Intuitively, the probability of an event α could be defined
as:
$P(\alpha) = \lim_{n \to \infty} \frac{N(\alpha)}{n}$
where N(α) is the number of times that event α happens
in n trials.
• Assumes that all outcomes are equally likely (Laplacian
definition).
Axioms of Probability
• A1: $0 \le P(A) \le 1$ for any event A
• A2: $P(S) = 1$, where S is the sample space
• A3: if A and B are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$
• A4: if $A_1, A_2, \ldots$ are mutually exclusive, then $P(A_1 \cup A_2 \cup \ldots) = \sum_i P(A_i)$
[Figure: Venn diagram over the sample space S]
Unconditional (Prior) Probability
• This is the probability of an event in the absence
of any evidence.
P(Cavity)=0.1
means that
"in the absence of any other information, there is a
10% chance that the patient has a cavity".
Conditional (Posterior) Probability
• This is the probability of an event given some
evidence.
P(Cavity/Toothache)=0.8 means that
"there is an 80% chance that the patient has a
cavity given that he has a toothache".
Conditional (Posterior) Probability
(cont’d)
• Conditional probabilities are defined as follows:
$P(A/B) = \frac{P(A,B)}{P(B)}, \qquad P(B/A) = \frac{P(A,B)}{P(A)}$
• The above definitions lead to the chain rule:
$P(A,B) = P(A/B)P(B) = P(B/A)P(A)$
Law of Total Probability
• If A1, A2, …, An is a partition of mutually exclusive events
and B is any event, then:
$P(B) = P(B,A_1) + P(B,A_2) + \cdots + P(B,A_n)$
$= P(B/A_1)P(A_1) + P(B/A_2)P(A_2) + \cdots + P(B/A_n)P(A_n) = \sum_{j=1}^{n} P(B/A_j)P(A_j)$
• Special case: the partition $\{A, \bar{A}\}$:
$P(B) = P(B,A) + P(B,\bar{A}) = P(B/A)P(A) + P(B/\bar{A})P(\bar{A})$
[Figure: sample space S partitioned into mutually exclusive events A1, A2, A3, A4]
Bayes’ Rule
• Using the definition of conditional probabilities leads
to the Bayes’ rule:
$P(A/B) = \frac{P(A,B)}{P(B)}, \qquad P(B/A) = \frac{P(A,B)}{P(A)}$
$P(A/B) = \frac{P(B/A)\,P(A)}{P(B)}$
where:
$P(B) = P(B,A) + P(B,\bar{A}) = P(B/A)P(A) + P(B/\bar{A})P(\bar{A})$
Example
• Consider the probability of Disease given Symptom:
$P(Disease/Symptom) = \frac{P(Symptom/Disease)\,P(Disease)}{P(Symptom)}$
where:
$P(Symptom) = P(Symptom/Disease)P(Disease) + P(Symptom/\overline{Disease})P(\overline{Disease})$
Example (cont’d)
• Meningitis causes a stiff neck 50% of the time.
• A patient comes in with a stiff neck – what is the probability
that he has meningitis?
$P(M/S) = \frac{P(S/M)\,P(M)}{P(S)}$
where P(S/M)=0.5
• Need to know the following:
– The prior probability of a patient having meningitis (P(M)=1/50,000)
– The prior probability of a patient having a stiff neck (P(S)=1/20)
$P(M/S) = \frac{0.5 \times (1/50{,}000)}{1/20} = 0.0002$
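As a quick sanity check, here is the slide's computation carried out in Python (a minimal sketch; the numbers are exactly those stated above):

```python
# Bayes' rule for the meningitis example: P(M/S) = P(S/M) P(M) / P(S)
p_s_given_m = 0.5        # P(S/M): meningitis causes a stiff neck 50% of the time
p_m = 1 / 50_000         # P(M): prior probability of meningitis
p_s = 1 / 20             # P(S): prior probability of a stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002
```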
General Form of Bayes’ Rule
• If A1, A2, …, An is a partition of mutually exclusive events
and B is any event, then the Bayes’ rule is given by:
$P(A_i/B) = \frac{P(B/A_i)\,P(A_i)}{P(B)}$
where $P(B) = \sum_{j=1}^{n} P(B/A_j)P(A_j)$
Independence
• Two events A and B are independent iff:
P(A,B)=P(A)P(B)
• Using independence, we can show that:
P(A/B)=P(A) and P(B/A)=P(B)
• A and B are conditionally independent given C iff:
P(A/B,C)=P(A/C)
e.g., P(WetGrass/Season,Rain)=P(WetGrass/Rain)
Random Variables
• In many experiments, it is easier to deal with a summary
variable than with the original probability structure.
Example
• In an opinion poll, we ask 50 people whether they agree or
disagree with a certain issue.
– Suppose we record a "1" for agree and "0" for disagree.
– The sample space for this experiment has $2^{50}$ elements.
• Suppose we are only interested in the number of people
who agree.
– Define the variable X = number of "1"s recorded out of 50.
– The new sample space has only 51 elements.
Random Variables (cont’d)
• A random variable (r.v.) is a function that assigns a value
to the outcome of a random experiment.
[Figure: a random variable X maps each outcome sj of the sample space to a value X(sj)]
Random Variables (cont’d)
• How do we compute probabilities using random variables?
– Suppose the sample space is S = {s1, s2, …, sn}
– Suppose the range of the random variable X is {x1, x2, …, xm}
– We observe X = xj iff the outcome of the random experiment
is an sj ∈ S such that X(sj) = xj:
$P(X = x_j) = \sum_{s \in S:\, X(s) = x_j} P(s)$
Example
• Consider the experiment of throwing a pair of dice:
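The outcome table itself is not reproduced in this transcript, but here is a minimal sketch of the recipe from the previous slide, assuming (as an illustrative choice) that the r.v. of interest is X = sum of the two faces of a pair of fair dice:

```python
from collections import Counter

# Each of the 36 equally likely outcomes (i, j) is mapped to X = i + j;
# P(X = x) sums the probabilities of the outcomes that map to x.
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
counts = Counter(i + j for i, j in outcomes)
pmf = {x: c / 36 for x, c in sorted(counts.items())}
print(pmf)   # e.g., P(X = 7) = 6/36
```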
Discrete/Continuous Random Variables
• A discrete r.v. can assume only discrete values.
• A continuous r.v. can assume a continuous range of values
(e.g., sensor readings).
• Probability mass/density function (pmf/pdf)
• Probability distribution function (PDF)
Probability mass function (pmf)
• The pmf of a discrete r.v. X assigns a probability to
each possible value x of X.
• Notation: given two r.v.'s, X and Y, their pmfs are denoted as
pX(x) and pY(y); for convenience, we will drop the subscripts and
denote them as p(x) and p(y). However, keep in mind that these
functions are different!
Probability density function (pdf)
• The pdf of a continuous r.v. X shows the
probability of being “close” to some number x.
• Notation: given two r.v.'s, X and Y, their pdfs are denoted as
pX(x) and pY(y); for convenience, we will drop the subscripts and
denote them as p(x) and p(y). However, keep in mind that these
functions are different!
Probability Distribution Function (PDF)
• Definition:
$F(x) = P(X \le x)$
• Some properties of PDF:
– (1) $0 \le F(x) \le 1$
– (2) F(x) is a non-decreasing function of x
• If X is discrete, its PDF can be computed as follows:
$F(x) = P(X \le x) = \sum_{k=0}^{x} P(X = k) = \sum_{k=0}^{x} p(k)$
(assuming X takes values 0, 1, 2, …)
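This relation is just a running sum; a minimal sketch with a made-up pmf on {0, 1, 2, 3, 4}:

```python
import numpy as np

pmf = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # hypothetical pmf, sums to 1
cdf = np.cumsum(pmf)                        # F(x) = sum of p(k) for k <= x
print(cdf)   # [0.1 0.3 0.7 0.9 1. ] -- non-decreasing, ends at 1
```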
Probability Distribution Function (PDF)
(cont’d)
[Figure: a discrete pmf and its corresponding staircase PDF]
Probability Distribution Function (PDF)
(cont’d)
• If X is continuous, its PDF can be computed as follows:
$F(x) = \int_{-\infty}^{x} p(t)\,dt \quad \text{for all } x$
• The pdf can be obtained from the PDF using:
$p(x) = \frac{dF}{dx}(x)$
Example
• Gaussian pdf and PDF
[Figure: the Gaussian pdf (bell curve) and its PDF (S-shaped curve)]
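A short sketch of evaluating these two functions numerically, using SciPy and a standard normal (the choice mu = 0, sigma = 1 is illustrative):

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-4, 4, 9)
print(norm.pdf(x, loc=0, scale=1))   # Gaussian pdf: bell-shaped curve
print(norm.cdf(x, loc=0, scale=1))   # Gaussian PDF (CDF): S-shaped curve
```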
Joint pmf (discrete case)
• For n random variables, the joint pmf assigns a
probability for each possible combination of
values:
p(x1,x2,…,xn)=P(X1=x1, X2=x2, …, Xn=xn)
$\sum_{x_1, x_2, \ldots, x_n} p(x_1, x_2, \ldots, x_n) = 1$
Notation: the joint pmf/pdf of the r.v.'s X1, X2, ..., Xn and Y1,
Y2, ..., Yn are denoted as pX1X2...Xn(x1,x2,...,xn) and
pY1Y2...Yn(y1,y2,...,yn); for convenience, we will drop the subscripts
and denote them p(x1,x2,...,xn) and p(y1,y2,...,yn); keep in mind,
however, that these are two different functions.
Joint pmf (discrete case) (cont’d)
• Specifying the joint pmf requires a large number of values
– $k^n$, assuming n random variables where each one can
assume one of k discrete values.
– Can be simplified if we assume independence or
conditional independence.
P(Cavity, Toothache) is a 2 x 2 matrix:

Joint Probability   Toothache   Not Toothache
Cavity              0.04        0.06
Not Cavity          0.01        0.89

Sum of probabilities = 1.0
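The table above can be manipulated directly as a small array; a minimal sketch (note that the last line recovers P(Cavity/Toothache) = 0.8 from the earlier slide):

```python
import numpy as np

# Rows: Cavity / Not Cavity; columns: Toothache / Not Toothache.
joint = np.array([[0.04, 0.06],
                  [0.01, 0.89]])
print(joint.sum())                       # 1.0 -- probabilities sum to one
print(joint.sum(axis=1))                 # marginals: P(Cavity) = 0.1, P(Not Cavity) = 0.9
print(joint[0, 0] / joint[:, 0].sum())   # P(Cavity/Toothache) = 0.04 / 0.05 = 0.8
```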
Joint pdf (continuous r.v.)
For n random variables, the joint pdf gives the probability of
being "close" to a given combination of values:
$p(x_1, x_2, \ldots, x_n) \ge 0$
$\int_{x_1} \cdots \int_{x_n} p(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n = 1$
Conditional pdf
• The conditional pdf is defined as follows:
$p(y/x) = \frac{p(x,y)}{p(x)}$
• Using the definition of conditional pdfs leads to
the chain rule:
$p(x,y) = p(y/x)\,p(x)$
• General case:
$p(x_1, x_2, \ldots, x_n) = p(x_1/x_2, \ldots, x_n)\,p(x_2/x_3, \ldots, x_n) \cdots p(x_{n-1}/x_n)\,p(x_n)$
Independence
• Knowledge about independence between r.v.'s is
very useful because it simplifies things greatly!
e.g., if X and Y are independent, then:
$p(x,y) = p(x)\,p(y)$
(a 2D function reduces to the product of two 1D functions)
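A minimal sketch of this simplification with two made-up 1D pmfs: the full 2D table is recovered from 1D information alone.

```python
import numpy as np

p_x = np.array([0.2, 0.8])         # hypothetical pmf of X
p_y = np.array([0.5, 0.3, 0.2])    # hypothetical pmf of Y
joint = np.outer(p_x, p_y)         # p(x, y) = p(x) p(y) under independence
print(np.allclose(joint.sum(axis=1), p_x),
      np.allclose(joint.sum(axis=0), p_y))   # True True -- marginals recovered
```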
Law of Total Probability
• The law of total probability:
$p(y) = \sum_{x} p(y/x)\,p(x)$
Marginalization
• Given the joint pmf/pdf, we can compute the pmf/pdf of
any subset of variables using marginalization:
– Examples:
$p(x) = \int p(x,y)\,dy$
$p(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) = \int p(x_1, x_2, \ldots, x_n)\,dx_i$
$p(x_1, x_2) = \int_{x_3} \cdots \int_{x_n} p(x_1, x_2, \ldots, x_n)\,dx_3 \cdots dx_n$
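In the discrete case, integration becomes summation over the unwanted axes; a minimal sketch with a hypothetical three-variable joint pmf:

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 3, 4))    # hypothetical joint pmf over (x1, x2, x3)
joint /= joint.sum()             # normalize so the table sums to 1

p_x1_x2 = joint.sum(axis=2)      # p(x1, x2) = sum over x3 of p(x1, x2, x3)
p_x1 = joint.sum(axis=(1, 2))    # marginal of x1 alone
print(p_x1_x2.shape, p_x1.sum()) # (2, 3) 1.0
```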
Normal (Gaussian) Distribution
1-dimensional:
$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$
d-dimensional:
$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$
where μ is the d x 1 mean vector and Σ is the d x d symmetric covariance matrix.
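A short sketch of evaluating the d-dimensional density numerically (d = 2 here; the mean vector and covariance matrix are illustrative assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])            # d x 1 mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])       # d x d symmetric covariance matrix
print(multivariate_normal.pdf([0.5, 0.5], mean=mu, cov=Sigma))
```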
Normal (Gaussian) Distribution (cont’d)
• Parameters and shape of Gaussian distribution
– Number of parameters is $d + \frac{d(d+1)}{2}$
(d for μ and d(d+1)/2 for the symmetric Σ)
– Location determined by μ; shape determined by Σ
Normal (Gaussian) Distribution (cont’d)
• Mahalanobis distance:
$r^2 = (\mathbf{x}-\boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})$
• If the variables are independent, the multivariate
normal distribution becomes:
$p(\mathbf{x}) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right]$
(Σ is diagonal)
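A minimal sketch of the Mahalanobis distance with NumPy (mu, Sigma, and x are illustrative values):

```python
import numpy as np

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.5, -0.5])

d = x - mu
r2 = d @ np.linalg.solve(Sigma, d)   # (x - mu)^t Sigma^{-1} (x - mu), without forming Sigma^{-1}
print(np.sqrt(r2))                   # the Mahalanobis distance r
```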
Expected Value
• The expected value of a discrete r.v. X is defined as:
$E[X] = \sum_{x} x\,p(x)$
Expected Value (cont’d)
• The sample mean and expected value are related
by:
$E[X] \approx \frac{1}{n} \sum_{k=1}^{n} x_k$ (for a large number of samples n)
• The expected value for a continuous r.v. is given by:
$E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$
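A quick illustration of the first relation, assuming a fair die so that E[X] = (1+2+...+6)/6 = 3.5:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.integers(1, 7, size=100_000)   # 100,000 fair-die throws
print(samples.mean())                        # sample mean, close to 3.5
```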
Variance and Standard Deviation
• The variance of a r.v. X is defined as:
$Var(X) = \sigma^2 = E[(X-\mu)^2] = E[X^2] - \mu^2$
• The standard deviation is $\sigma = \sqrt{Var(X)}$.
Covariance
• The covariance of two r.v.'s X and Y is defined as:
$Cov(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - \mu_X \mu_Y$
Covariance Matrix – 2 variables
$\Sigma = \begin{bmatrix} Var(X) & Cov(X,Y) \\ Cov(Y,X) & Var(Y) \end{bmatrix}$
and Cov(X,Y) = Cov(Y,X)
(i.e., symmetric matrix)
Cov(X,Y) can be computed using:
$Cov(X,Y) = E[XY] - \mu_X \mu_Y$
Covariance Matrix – n variables
$\Sigma_{ij} = Cov(X_i, X_j) = E[(X_i-\mu_i)(X_j-\mu_j)]$, so $\Sigma_{ij} = \Sigma_{ji}$
(i.e., symmetric matrix)
Uncorrelated r.v.’s
• If the r.v.'s are uncorrelated, i.e., Cov(Xi, Xj) = 0 for i ≠ j,
then Σ is a diagonal matrix D.
Properties of covariance matrix
(Note: we will review these concepts later in case you do not remember them)
• Σ is symmetric and positive semi-definite: its eigenvalues are real and
non-negative, and its eigenvectors can be chosen orthonormal.
Covariance Matrix Decomposition
$\Sigma = \Phi \Lambda \Phi^T$
where the columns of Φ are the eigenvectors of Σ, Λ is the diagonal
matrix of its eigenvalues, and $\Phi^{-1} = \Phi^T$ (Φ is orthogonal).
Linear Transformations
• Under a linear transformation $\mathbf{y} = A^t \mathbf{x}$, the mean and
covariance transform as:
$\boldsymbol{\mu}_y = A^t \boldsymbol{\mu}, \qquad \Sigma_y = A^t \Sigma A$
Linear Transformations (cont’d)
• Whitening transform: $A_w = \Phi \Lambda^{-1/2}$
• After applying $\mathbf{y} = A_w^t \mathbf{x}$, the covariance becomes the identity:
$\Sigma_y = A_w^t \Sigma A_w = \Lambda^{-1/2} \Phi^t \Sigma \Phi \Lambda^{-1/2} = I$
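A minimal sketch of the whitening transform on synthetic data (the covariance Sigma below is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mean=[0, 0], cov=Sigma, size=10_000).T   # 2 x N data

eigvals, Phi = np.linalg.eigh(np.cov(X))
A_w = Phi @ np.diag(eigvals ** -0.5)   # whitening transform A_w = Phi Lambda^{-1/2}
Y = A_w.T @ X                          # y = A_w^t x
print(np.cov(Y).round(2))              # close to the 2 x 2 identity matrix
```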