Transcript X - MMLab
Module #16:
Probability Theory
Rosen 5th ed., ch. 5
Let’s move on to probability, ch. 5.
Terminology
• A (stochastic) experiment is a procedure that
yields one of a given set of possible outcomes
• The sample space S of the experiment is the
set of possible outcomes.
• An event is a subset of the sample space.
• A random variable is a function that assigns a
real value to each outcome of an experiment.
Normally, a probability is related to an experiment or a trial.
Take flipping a coin, for example: what are the possible outcomes?
Heads or tails (the front or back side) of the coin will face upwards.
After a sufficient number of tosses, we can “statistically” conclude
that the probability of heads is 0.5.
In rolling a die, there are 6 outcomes. Suppose we want to calculate the
prob. of the event that the die shows an odd number. What is that probability?
Probability: Laplacian Definition
• First, assume that all outcomes in the sample
space are equally likely
– This term still needs to be defined.
• Then, the probability of event E in sample
space S is given by
Pr[E] = |E|/|S|.
Even though there are many definitions of probability, I would like to
use the one from Laplace.
The expression “equally likely” may be a little vague
from the perspective of pure mathematics,
but from an engineering viewpoint, I think it is acceptable.
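As a quick illustration of the Laplacian definition (and of the odd-number question above), Pr[E] = |E|/|S| can be computed by simply enumerating outcomes. A minimal sketch, assuming a fair six-sided die:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

# Event: the roll is an odd number.
E = {s for s in S if s % 2 == 1}

# Laplacian definition: Pr[E] = |E| / |S| (all outcomes equally likely).
pr_E = Fraction(len(E), len(S))
print(pr_E)  # 1/2
```

Exact rational arithmetic (`Fraction`) avoids floating-point noise in these small counting problems.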
Probability of Complementary
Events
• Let E be an event in a sample space S.
• Then, Ē = S − E represents the complementary
event.
• Pr[Ē] = 1 − Pr[E]
• Pr[S] = 1
Probability of Unions of
Events
• Let E1, E2 ⊆ S
• Then:
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2]
– By the inclusion-exclusion principle.
Mutually Exclusive Events
• Two events E1, E2 are called mutually
exclusive if they are disjoint: E1 ∩ E2 = ∅
• Note that two mutually exclusive events
cannot both occur in the same instance
of a given experiment.
• For mutually exclusive events,
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2].
Exhaustive Sets of Events
• A set E = {E1, E2, …} of events in the sample
space S is exhaustive if ∪i Ei = S.
• An exhaustive set of events that are all
mutually exclusive with each other has the
property that Σi Pr[Ei] = 1.
Independent Events
• Two events E,F are independent if
Pr[EF] = Pr[E]·Pr[F].
• Relates to product rule for number of
ways of doing two independent tasks
• Example: Flip a coin, and roll a die.
Pr[quarter is heads ∧ die is 1] =
Pr[quarter is heads] × Pr[die is 1]
Now the question is: how can we figure out whether two events
are independent or not?
Conditional Probability
• Let E,F be events such that Pr[F]>0.
• Then, the conditional probability of E given F,
written Pr[E|F], is defined as Pr[EF]/Pr[F].
• This is the probability that E would turn out to
be true, given just the information that F is
true.
• If E and F are independent, Pr[E|F] = Pr[E].
Here is the most important part of probability: the conditional probability.
Using conditional probability, we can figure out whether there is a correlation
or dependency between two events.
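The definition Pr[E|F] = Pr[E ∩ F]/Pr[F] can also be evaluated by enumeration. A minimal sketch, assuming a fair die; the events E and F are chosen arbitrarily for illustration:

```python
from fractions import Fraction

# One roll of a fair die; every outcome has probability 1/6.
S = {1, 2, 3, 4, 5, 6}
E = {s for s in S if s % 2 == 0}   # event: roll is even
F = {s for s in S if s > 3}        # event: roll is greater than 3

# Pr[E|F] = Pr[E ∩ F] / Pr[F]; with equally likely outcomes
# this reduces to |E ∩ F| / |F|.
pr_E_given_F = Fraction(len(E & F), len(F))
print(pr_E_given_F)  # 2/3
```

Since Pr[E] = 1/2 ≠ 2/3, this also shows E and F are not independent.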
Bayes’ Theorem
• Allows one to compute the probability that a
hypothesis H is correct, given data D:
Pr[H | D] = Pr[D | H] · Pr[H] / Pr[D]

Pr[Hi | D] = Pr[D | Hi] · Pr[Hi] / Σj Pr[D | Hj] · Pr[Hj]

(the set of Hj is exhaustive)
Bayes’ theorem: example
• Suppose 1% of the population has AIDS
• Prob. that a positive result is right: 95%
• Prob. that a negative result is right: 90%
• What is the probability that someone who has
a positive result is actually an AIDS patient?
• H: event that a person has AIDS
• D: event of a positive result
• P[D] = P[D|H]P[H] + P[D|H̄]P[H̄]
= 0.95·0.01 + 0.1·0.99 = 0.1085
• P[H|D] = 0.95·0.01/0.1085 ≈ 0.0876
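The slide's arithmetic can be reproduced directly; the variable names below are illustrative:

```python
# Numbers from the example above.
p_H = 0.01              # prior: person has the disease
p_D_given_H = 0.95      # positive result is right given disease
p_D_given_notH = 0.10   # false positive rate (1 - 0.90)

# Total probability: P[D] = P[D|H]P[H] + P[D|~H]P[~H]
p_D = p_D_given_H * p_H + p_D_given_notH * (1 - p_H)

# Bayes' theorem: P[H|D] = P[D|H]P[H] / P[D]
p_H_given_D = p_D_given_H * p_H / p_D
print(round(p_D, 4), round(p_H_given_D, 4))  # 0.1085 0.0876
```

Despite the 95% test accuracy, a positive result implies the disease with probability under 9%, because the disease is rare.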
Expectation Values
• For a random variable X(s) having a
numeric domain, its expectation value
or expected value or weighted average
value or arithmetic mean value E[X] is
defined as
E[X] = Σ_(s ∈ S) p(s) X(s)
Linearity of Expectation
• Let X1, X2 be any two random variables
derived from the same sample space.
Then:
• E[X1+X2] = E[X1] + E[X2]
• E[aX1 + b] = aE[X1] + b
Variance
• The variance Var[X] = σ2(X) of a random
variable X is the expected value of the square
of the difference between the value of X and
its expectation value E[X]:
Var[X] :≡ Σ_(s ∈ S) (X(s) − E[X])² p(s)
• The standard deviation or root-mean-square
(RMS) difference of X, σ(X) :≡ (Var[X])^(1/2).
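Both definitions can be evaluated with exact rational arithmetic. A minimal sketch, using a fair die as the random variable (an assumption for illustration, not from the slide):

```python
from fractions import Fraction

# Fair die: p(s) = 1/6 for each outcome, with X(s) = s.
dist = {s: Fraction(1, 6) for s in range(1, 7)}

# E[X] = sum over s of p(s) * X(s)
E_X = sum(p * x for x, p in dist.items())

# Var[X] = sum over s of (X(s) - E[X])^2 * p(s)
Var_X = sum(p * (x - E_X) ** 2 for x, p in dist.items())

print(E_X, Var_X)  # 7/2 35/12
```

So a die roll has mean 3.5 and standard deviation √(35/12) ≈ 1.71.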
Visualizing
Sample Space
• 1. Listing
– S = {Head, Tail}
• 2. Venn Diagram
• 3. Contingency Table
• 4. Decision Tree Diagram
Venn Diagram
Experiment: Toss 2 Coins. Note Faces.
[Venn diagram: the four outcomes HH, HT, TH, TT shown as points inside the sample space S; the outcomes containing a tail are circled as an event]
S = {HH, HT, TH, TT} is the sample space; any subset of it (such as the circled outcomes) is an event.
Contingency Table
Experiment: Toss 2 Coins. Note Faces.
            2nd Coin
1st Coin    Head      Tail      Total
Head        HH        HT        HH, HT
Tail        TH        TT        TH, TT
Total       HH, TH    HT, TT    S

S = {HH, HT, TH, TT} is the sample space; each cell is an outcome.
A simple event such as “Head on 1st coin” is the row {HH, HT}.
Event Probability Using
Contingency Table
Event       B1            B2            Total
A1          P(A1 ∩ B1)    P(A1 ∩ B2)    P(A1)
A2          P(A2 ∩ B1)    P(A2 ∩ B2)    P(A2)
Total       P(B1)         P(B2)         1

The cell entries P(Ai ∩ Bj) are joint probabilities; the
row and column totals P(Ai) and P(Bj) are marginal
(simple) probabilities.
Marginal probability
• Let S be partitioned into m × n disjoint
sets formed from events Ei and Fj, where the
general subset is denoted Ei ∩ Fj. Then the
marginal probability of Ei is
Pr[Ei] = Σ_(j = 1..n) Pr[Ei ∩ Fj]
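Summing a row of joint probabilities gives the marginal, exactly as in the contingency table above; the joint values below are made-up illustrative numbers:

```python
# Joint probabilities P(Ai ∩ Bj), laid out as in the contingency table.
# (Numbers are illustrative, not from the slides.)
joint = {
    ("A1", "B1"): 0.10, ("A1", "B2"): 0.30,
    ("A2", "B1"): 0.25, ("A2", "B2"): 0.35,
}

# Marginal probability of A1: sum the joint probabilities over all Bj.
p_A1 = sum(p for (a, b), p in joint.items() if a == "A1")
print(round(p_A1, 2))  # 0.4
```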
Tree Diagram
Experiment: Toss 2 Coins. Note Faces.
[Tree diagram: first branch H/T for the 1st coin, second branch H/T for the 2nd coin; the four leaves are the outcomes HH, HT, TH, TT]
S = {HH, HT, TH, TT} is the sample space.
Discrete Random Variable
– Possible values (outcomes) are discrete
• E.g., natural numbers (0, 1, 2, 3, etc.)
– Obtained by Counting
– Usually Finite Number of Values
• But could be infinite (must be “countable”)
Discrete
Probability Distribution
1.List of All possible [x, p(x)] pairs
– x = Value of Random Variable (Outcome)
– p(x) = Probability Associated with Value
2.Mutually Exclusive (No Overlap)
3.Collectively Exhaustive (Nothing Left Out)
4. 0 ≤ p(x) ≤ 1
5. Σ p(x) = 1
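Rules 4 and 5 are easy to check mechanically. A minimal sketch, using an illustrative distribution:

```python
# A discrete distribution as (x, p(x)) pairs (illustrative values).
dist = {0: 0.25, 1: 0.50, 2: 0.25}

# Rule 4: every probability lies in [0, 1].
assert all(0 <= p <= 1 for p in dist.values())

# Rule 5: the probabilities sum to 1.
assert abs(sum(dist.values()) - 1.0) < 1e-9
print("valid distribution")
```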
Visualizing Discrete Probability
Distributions
Listing
• { (0, .25), (1, .50), (2, .25) }

Table
# Tails x   Count f(x)   p(x)
0           1            .25
1           2            .50
2           1            .25

Graph
[Bar chart of p(x) against x = 0, 1, 2]

Equation
p(x) = n!/(x!(n − x)!) · p^x (1 − p)^(n − x)
Cumulative Distribution Function (CDF)
F_X(a) = Pr[X ≤ a] = Σ_(x ≤ a) p(x)
Binomial Distribution
1. Sequence of n Identical Trials
2. Each Trial Has 2 Outcomes
– ‘Success’ (desired/specified outcome) or ‘Failure’
3. Constant Trial Probability
4. Trials Are Independent
5. # of successes in n trials is a binomial
random variable
Binomial Probability
Distribution Function
p(x) = C(n, x) p^x q^(n − x) = n!/(x!(n − x)!) · p^x (1 − p)^(n − x)
p(x) = Probability of x ‘Successes’
n = Sample Size
p = Probability of ‘Success’
x = Number of ‘Successes’ in Sample
(x = 0, 1, 2, ..., n)
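The distribution function can be written down directly from the formula; `binom_pmf` below is a hypothetical helper name:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """p(x) = C(n, x) * p**x * (1-p)**(n-x): probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Two fair-coin tosses: the number of tails is binomial with n = 2, p = 0.5,
# matching the (0, .25), (1, .50), (2, .25) distribution shown earlier.
print([binom_pmf(x, 2, 0.5) for x in range(3)])  # [0.25, 0.5, 0.25]
```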
Binomial Distribution
Characteristics
Mean
E(X) = np

Standard Deviation
σ = √(np(1 − p))

[Bar charts of P(X) for X = 0, …, 5: n = 5, p = 0.1 (mass concentrated near 0) and n = 5, p = 0.5 (symmetric about 2.5)]
Useful Observation 1
• For any X and Y
E[X + Y] = Σx Σy (x + y) p(x ∧ y)
= Σx Σy x p(x ∧ y) + Σx Σy y p(x ∧ y)
= Σx x Σy p(x ∧ y) + Σy y Σx p(x ∧ y)
= Σx x p(x) + Σy y p(y)
= E[X] + E[Y]
One Binary Outcome
• Random variable X, one binary outcome
• Code success as 1, failure as 0
• P(success) = p, P(failure) = (1 − p) = q
• E(X) = p·1 + q·0 = p
• Var(X) = E[(X − E(X))²] = E[X²] − (E[X])²
= (p·1² + q·0²) − p² = p − p² = p(1 − p)
Mean of a Binomial
• Independent, identically distributed
• X1, …, Xn; E(Xi) = p; Binomial X = Σ Xi
E[X] = E[Σ Xi] = Σ E[Xi]   (by Useful Observation 1)
= n E[X1] = np
Useful Observation 2
• For independent X and Y
E[XY] = Σx Σy xy p(x ∧ y)
= Σx Σy xy p(x) p(y)
= (Σx x p(x)) (Σy y p(y))
= E[X] E[Y]
Useful Observation 3
• For independent X and Y
Var[X + Y] = E[(X + Y)²] − (E[X + Y])²
= E[X² + 2XY + Y²] − (E[X] + E[Y])²
= E[X²] + 2E[XY] + E[Y²] − (E[X])² − 2E[X]E[Y] − (E[Y])²
= E[X²] − (E[X])² + E[Y²] − (E[Y])²
(2E[XY] and 2E[X]E[Y] cancelled by Obs. 2)
= Var[X] + Var[Y]
Variance of Binomial
• Independent, identically distributed
• X1, …, Xn; E(Xi) = p; Binomial X = Σ Xi
Var[X] = Var[Σ Xi] = Σ Var[Xi]   (by Useful Observation 3)
= n Var[X1] = np(1 − p)
Useful Observation 4
• For any X
Var[kX] = E[(kX)²] − (E[kX])²
= k² E[X²] − (k E[X])²
= k² (E[X²] − (E[X])²)
= k² Var[X]
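The binomial mean/variance results and Observation 4 can be sanity-checked numerically; n = 5, p = 0.3 and k = 3 below are arbitrary illustrative choices:

```python
from math import comb

# Exact binomial pmf for n = 5, p = 0.3 (illustrative parameters).
n, p = 5, 0.3
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

mean = sum(x * q for x, q in pmf.items())
var = sum((x - mean) ** 2 * q for x, q in pmf.items())

# Mean np and variance np(1 - p), as derived above.
assert abs(mean - n * p) < 1e-12
assert abs(var - n * p * (1 - p)) < 1e-12

# Useful Observation 4: Var[kX] = k^2 Var[X].
k = 3
var_kX = sum((k * x - k * mean) ** 2 * q for x, q in pmf.items())
assert abs(var_kX - k**2 * var) < 1e-12
print(mean, var)
```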
Continuous random variable
Continuous Prob. Density Function
1. Mathematical Formula
2. Shows All Values, x, and
Frequencies, f(x)
– f(x) Is Not Probability
3. Properties
∫ f(x) dx = 1 over all x   (area under curve)
f(x) ≥ 0, a ≤ x ≤ b
[Curve of f(x) against x, labeled (Value, Frequency)]
Continuous Random Variable
Probability
P(c ≤ x ≤ d) = ∫_c^d f(x) dx

Probability Is Area
Under Curve!
[Shaded area under f(x) between c and d]
Uniform Distribution
1. Equally Likely Outcomes
2. Probability Density
f(x) = 1/(d − c), c ≤ x ≤ d
3. Mean & Standard Deviation
μ = (c + d)/2
σ = (d − c)/√12
[Plot: constant density of height 1/(d − c) between c and d; Mean = Median]
Uniform Distribution Example
• You’re the production manager of a soft-drink
bottling company. You believe that when a
machine is set to dispense 12 oz., it really
dispenses between 11.5 and 12.5 oz. inclusive.
• Suppose the amount dispensed has a uniform
distribution.
• What is the probability that less than 11.8 oz.
is dispensed?
Uniform Distribution Solution
f(x) = 1/(d − c) = 1/(12.5 − 11.5) = 1.0

P(11.5 ≤ x ≤ 11.8) = (Base)(Height)
= (11.8 − 11.5)(1.0) = 0.30

[Plot: f(x) = 1.0 on [11.5, 12.5], shaded from 11.5 to 11.8]
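The same answer falls out of a small uniform-CDF helper; `uniform_cdf` is a hypothetical name:

```python
def uniform_cdf(x: float, c: float, d: float) -> float:
    """P(X <= x) for X uniform on [c, d]."""
    if x <= c:
        return 0.0
    if x >= d:
        return 1.0
    return (x - c) / (d - c)

# Bottling example: dispensed amount uniform on [11.5, 12.5] oz.
p = uniform_cdf(11.8, 11.5, 12.5)
print(round(p, 2))  # 0.3
```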
Normal Distribution
1. Describes Many Random Processes
or Continuous Phenomena
2. Can Be Used to Approximate Discrete
Probability Distributions
– Example: Binomial
3. Basis for Classical Statistical Inference
4. A.k.a. Gaussian distribution
Normal Distribution
1. ‘Bell-Shaped’ &
Symmetrical
2. Mean, Median, Mode
Are Equal
3. Random Variable Has
Infinite Range
* light-tailed distribution
[Symmetric bell curve of f(X) against X]
Mean: average value
Median: middle value
Mode: most frequent value
Probability
Density Function
f(x) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²)

f(x) = frequency (density) of random variable x
σ = population standard deviation
π = 3.14159; e = 2.71828
x = value of random variable (−∞ < x < ∞)
μ = population mean

[Curve of f(x) against x]
Effect of Varying Parameters
(μ & σ)
[Three curves A, B, C of f(X), illustrating different combinations of μ and σ]
Normal Distribution
Probability
Probability is
area under
curve!

P(c ≤ x ≤ d) = ∫_c^d f(x) dx

[Shaded area under f(x) between c and d]
Infinite Number
of Tables
Normal distributions differ by
mean & standard deviation.
Each distribution would
require its own table.
That’s an infinite number!
Standardize the
Normal Distribution
Z = (X − μ)/σ

[Normal distribution of X (mean μ, s.d. σ) transforms to the standardized normal distribution of Z (μ = 0, σ = 1)]

One table!
Intuitions on Standardizing
• Subtracting μ from each value X just
moves the curve around, so values are
centered on 0 instead of on μ
• Once the curve is centered, dividing
each value by σ > 1 moves all values
toward 0, compressing the curve
Standardizing Example
Z = (X − μ)/σ = (6.2 − 5)/10 = .12

[Normal distribution (μ = 5, σ = 10): X = 6.2 maps to the standardized normal distribution (μ = 0, σ = 1) at Z = .12]
Obtaining the Probability
Standardized Normal
Probability Table (Portion)

Z      .00     .01     .02
0.0    .0000   .0040   .0080
0.1    .0398   .0438   .0478
0.2    .0793   .0832   .0871
0.3    .1179   .1217   .1255

[Standardized normal curve (μ = 0, σ = 1); the shaded area
from 0 to .12 is the probability .0478 (shaded area exaggerated)]
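Instead of a printed table, the standard normal CDF can be evaluated with the error function from Python's math module; this sketch reproduces the table lookup above:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(0 <= Z <= 0.12): the shaded area looked up in the table.
p = phi(0.12) - phi(0.0)
print(round(p, 4))  # 0.0478
```

This replaces the "infinite number of tables" with one function of Z.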