Transcript X - MMLab

Module #16:
Probability Theory
Rosen 5th ed., ch. 5
Let’s move on to probability, ch. 5.
Terminology
• A (stochastic) experiment is a procedure that
yields one of a given set of possible outcomes
• The sample space S of the experiment is the
set of possible outcomes.
• An event is a subset of the sample space.
• A random variable is a function that assigns a
real value to each outcome of an experiment
Normally, a probability is related to an experiment or a trial.
Let’s take flipping a coin, for example: what are the possible outcomes?
Heads or tails (the front or back side of the coin) will be shown facing upwards.
After a sufficient number of tosses, we can “statistically” conclude
that the probability of heads is 0.5.
In rolling a die, there are 6 outcomes. Suppose we want to calculate the
probability of the event that the die shows an odd number. What is that probability?
Probability: Laplacian Definition
• First, assume that all outcomes in the sample
space are equally likely
– This term still needs to be defined.
• Then, the probability of event E in sample
space S is given by
Pr[E] = |E|/|S|.
Even though there are many definitions of probability, I would like to
use the one from Laplace.
The expression “equally likely” may be a little vague
from the perspective of pure mathematics,
but from an engineering viewpoint, I think that is OK.
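To make the definition concrete, here is a minimal Python sketch (not from the slides) that applies Pr[E] = |E|/|S| to the odd-die-roll question above; exact fractions avoid floating-point noise:

```python
from fractions import Fraction

def laplace_probability(event, sample_space):
    """Laplace: Pr[E] = |E| / |S|, assuming all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}             # sample space of one die roll
E = {s for s in S if s % 2 == 1}   # event: the roll is odd

print(laplace_probability(E, S))   # 1/2
```

So the probability of rolling an odd number is 3/6 = 1/2.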
Probability of Complementary
Events
• Let E be an event in a sample space S.
• Then, Ē represents the complementary
event.
• Pr[Ē] = 1 − Pr[E]
• Pr[S] = 1
Probability of Unions of
Events
• Let E1, E2 ⊆ S
• Then:
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2]
– By the inclusion-exclusion principle.
Mutually Exclusive Events
• Two events E1, E2 are called mutually
exclusive if they are disjoint: E1 ∩ E2 = ∅
• Note that two mutually exclusive events
cannot both occur in the same instance
of a given experiment.
• For mutually exclusive events,
Pr[E1 ∪ E2] = Pr[E1] + Pr[E2].
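As a quick sanity check, this Python sketch verifies inclusion-exclusion and its mutually exclusive special case; the die events E1, E2, O1, O2 are made-up examples, not from the slides:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}             # one die roll

def pr(event):
    return Fraction(len(event), len(S))

E1 = {s for s in S if s % 2 == 0}  # even roll: {2, 4, 6}
E2 = {s for s in S if s > 3}       # roll above 3: {4, 5, 6}

# Inclusion-exclusion: Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2]
assert pr(E1 | E2) == pr(E1) + pr(E2) - pr(E1 & E2)

# For disjoint (mutually exclusive) events the correction term vanishes
O1, O2 = {1}, {6}
assert O1 & O2 == set()
assert pr(O1 | O2) == pr(O1) + pr(O2)
print("inclusion-exclusion checks pass")
```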
Exhaustive Sets of Events
• A set E = {E1, E2, …} of events in the sample
space S is exhaustive if
∪i Ei = S
• An exhaustive set of events that are all
mutually exclusive with each other has the
property that
Σi Pr[Ei] = 1
Independent Events
• Two events E,F are independent if
Pr[EF] = Pr[E]·Pr[F].
• Relates to product rule for number of
ways of doing two independent tasks
• Example: Flip a coin, and roll a die.
Pr[coin is heads ∩ die is 1] =
Pr[coin is heads] × Pr[die is 1]
Now the question is: how can we figure out whether two events
are independent or not?
Conditional Probability
• Let E,F be events such that Pr[F]>0.
• Then, the conditional probability of E given F,
written Pr[E|F], is defined as Pr[E ∩ F]/Pr[F].
• This is the probability that E would turn out to
be true, given just the information that F is
true.
• If E and F are independent, Pr[E|F] = Pr[E].
Here is the most important part of probability: conditional probability.
Using conditional probability, we can figure out whether there is a correlation
or dependency between two events.
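One practical answer, sketched in Python under the Laplacian assumption of equally likely outcomes: compute Pr[E|F] from counts and compare it with Pr[E]. The coin-and-die events below are the ones from the independence slide:

```python
from fractions import Fraction
from itertools import product

# Sample space: (coin face, die value); all 12 outcomes equally likely
S = set(product(['H', 'T'], range(1, 7)))

def pr(event):
    return Fraction(len(event), len(S))

E = {s for s in S if s[0] == 'H'}   # coin is heads
F = {s for s in S if s[1] == 1}     # die shows 1

# Conditional probability: Pr[E|F] = Pr[E ∩ F] / Pr[F]
cond = pr(E & F) / pr(F)

# E and F are independent: Pr[E|F] = Pr[E] and Pr[E ∩ F] = Pr[E]·Pr[F]
assert cond == pr(E)                # both are 1/2
assert pr(E & F) == pr(E) * pr(F)   # both are 1/12
print("E and F are independent")
```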
Bayes’ Theorem
• Allows one to compute the probability that a
hypothesis H is correct, given data D:
Pr[H|D] = Pr[D|H] · Pr[H] / Pr[D]

More generally, for a set of hypotheses {Hj} that is exhaustive (and mutually exclusive):

Pr[Hi|D] = Pr[D|Hi] · Pr[Hi] / Σj Pr[D|Hj] · Pr[Hj]
Bayes’ theorem: example
• Suppose 1% of the population has AIDS.
• Prob. that a positive result is right: 95%
• Prob. that a negative result is right: 90%
• What is the probability that someone who has
a positive result is actually an AIDS patient?
• H: event that a person has AIDS
• D: event of a positive result
• P[D] = P[D|H]P[H] + P[D|H̄]P[H̄]
= 0.95·0.01 + 0.1·0.99 = 0.1085
• P[H|D] = 0.95·0.01/0.1085 ≈ 0.0876
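The same computation as a short Python sketch (the variable names are mine, not from the slides):

```python
# Numbers from the AIDS-test example above
p_h = 0.01                   # Pr[H]: person has AIDS
p_d_given_h = 0.95           # Pr[D|H]: positive given AIDS
p_d_given_not_h = 1 - 0.90   # Pr[D|H̄]: false-positive rate

# Total probability: Pr[D] = Pr[D|H]Pr[H] + Pr[D|H̄]Pr[H̄]
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# Bayes' theorem: Pr[H|D] = Pr[D|H]Pr[H] / Pr[D]
p_h_given_d = p_d_given_h * p_h / p_d

print(f"Pr[D]   = {p_d:.4f}")          # 0.1085
print(f"Pr[H|D] = {p_h_given_d:.4f}")  # 0.0876
```

Note how the low base rate Pr[H] = 0.01 drags the posterior down to about 8.8%, even though the test is right 95% of the time.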
Expectation Values
• For a random variable X(s) having a
numeric domain, its expectation value (also
called its expected value, weighted average
value, or arithmetic mean value) E[X] is
defined as
E[X] = Σs∈S p(s) · X(s)
Linearity of Expectation
• Let X1, X2 be any two random variables
derived from the same sample space.
Then:
• E[X1+X2] = E[X1] + E[X2]
• E[aX1 + b] = aE[X1] + b
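A minimal Python sketch of both definitions, using a fair die as the sample space (my example, not the slides'):

```python
from fractions import Fraction

# E[X] = Σ p(s)·X(s); here X is the face value of one fair die
S = range(1, 7)
p = Fraction(1, 6)                 # equally likely outcomes

E_X = sum(p * s for s in S)
print(E_X)                         # 7/2

# Linearity: E[aX + b] = a·E[X] + b, e.g. with a = 2, b = 3
a, b = 2, 3
E_aXb = sum(p * (a * s + b) for s in S)
assert E_aXb == a * E_X + b        # both are 10
```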
Variance
• The variance Var[X] = σ2(X) of a random
variable X is the expected value of the square
of the difference between the value of X and
its expectation value E[X]:
Var[X] :≡ Σs∈S (X(s) − E[X])² · p(s)
• The standard deviation or root-mean-square
(RMS) difference of X is σ(X) :≡ (Var[X])^(1/2).
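Continuing the fair-die example in Python; the final assertion checks the equivalent shortcut Var[X] = E[X²] − E[X]², which later slides also use:

```python
from fractions import Fraction

# Var[X] = Σ (X(s) − E[X])²·p(s) for one fair die
S = range(1, 7)
p = Fraction(1, 6)

E_X = sum(p * s for s in S)                  # 7/2
var = sum(p * (s - E_X) ** 2 for s in S)
print(var)                                   # 35/12

# Equivalent shortcut: Var[X] = E[X²] − E[X]²
assert var == sum(p * s * s for s in S) - E_X ** 2
```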
Visualizing
Sample Space
• 1. Listing
– S = {Head, Tail}
• 2. Venn Diagram
• 3. Contingency Table
• 4. Decision Tree Diagram
Venn Diagram
Experiment: Toss 2 coins. Note faces.
S = {HH, HT, TH, TT} is the sample space.
[Figure: Venn diagram showing the four outcomes HH, HT, TH, TT as points inside S, with the outcomes containing a tail circled as an event.]
Contingency Table
Experiment: Toss 2 coins. Note faces.

1st Coin \ 2nd Coin | Head   | Tail   | Total
Head                | HH     | HT     | HH, HT
Tail                | TH     | TT     | TH, TT
Total               | HH, TH | HT, TT | S

S = {HH, HT, TH, TT} is the sample space; each cell is an outcome, and each row or column total is a simple event (e.g., head on the 1st coin = {HH, HT}).
Event Probability Using Contingency Table

Event | B1         | B2         | Total
A1    | P(A1 ∩ B1) | P(A1 ∩ B2) | P(A1)
A2    | P(A2 ∩ B1) | P(A2 ∩ B2) | P(A2)
Total | P(B1)      | P(B2)      | 1

The cell entries P(Ai ∩ Bj) are joint probabilities; the row and column totals P(Ai) and P(Bj) are marginal (simple) probabilities.
Marginal probability
• Let S be partitioned into m × n disjoint
sets Ei ∩ Fj. Then the marginal
probability of Ei is
Pr[Ei] = Σj=1..n Pr[Ei ∩ Fj]
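A small Python sketch of the row-sum computation; the joint probabilities are illustrative numbers, not from the slides:

```python
from fractions import Fraction

# Joint probabilities Pr[Ei ∩ Fj] as a table (rows: Ei, columns: Fj)
joint = [
    [Fraction(1, 8), Fraction(1, 4)],   # E1 ∩ F1, E1 ∩ F2
    [Fraction(3, 8), Fraction(1, 4)],   # E2 ∩ F1, E2 ∩ F2
]

# Marginal probability: Pr[Ei] = Σj Pr[Ei ∩ Fj]  (sum across row i)
marginal_E = [sum(row) for row in joint]
print(marginal_E)                       # [3/8, 5/8]

# The disjoint, exhaustive cells must sum to 1
assert sum(marginal_E) == 1
```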
Tree Diagram
Experiment: Toss 2 coins. Note faces.
[Figure: decision tree; the first branch is H or T for the 1st coin, the second branch is H or T for the 2nd coin, giving the four leaf outcomes HH, HT, TH, TT.]
S = {HH, HT, TH, TT} is the sample space.
Discrete Random Variable
– Possible values (outcomes) are discrete
• E.g., natural numbers (0, 1, 2, 3, etc.)
– Obtained by Counting
– Usually Finite Number of Values
• But could be infinite (must be “countable”)
Discrete
Probability Distribution
1. List of All Possible [x, p(x)] Pairs
– x = Value of Random Variable (Outcome)
– p(x) = Probability Associated with Value
2. Mutually Exclusive (No Overlap)
3. Collectively Exhaustive (Nothing Left Out)
4. 0 ≤ p(x) ≤ 1
5. Σ p(x) = 1
Visualizing Discrete Probability Distributions
• Listing: { (0, .25), (1, .50), (2, .25) }
• Table:

# Tails x | Count f(x) | p(x)
0         | 1          | .25
1         | 2          | .50
2         | 1          | .25

• Graph: [bar chart of p(x) vs. x for x = 0, 1, 2, with bars of height .25, .50, .25]
• Equation:
p(x) = n! / (x!(n − x)!) · p^x (1 − p)^(n − x)
Cumulative Distribution Function (CDF)
Fx a   Pr  X  a    p x 
xa
Binomial Distribution
1. Sequence of n Identical Trials
2. Each Trial Has 2 Outcomes:
‘Success’ (Desired/Specified Outcome) or ‘Failure’
3. Constant Trial Probability
4. Trials Are Independent
5. # of successes in n trials is a binomial
random variable
Binomial Probability
Distribution Function
 n  x n x
n!
x
n x
p( x)    p q 
p (1  p)
x!(n  x)!
 x
p(x) = Probability of x ‘Successes’
n = Sample Size
p = Probability of ‘Success’
x = Number of ‘Successes’ in Sample
(x = 0, 1, 2, ..., n)
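The formula translates directly into Python with math.comb (a sketch, with my variable names):

```python
from math import comb

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) · p^x · (1 − p)^(n − x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Number of tails in n = 2 fair coin tosses: matches the table above
for x in range(3):
    print(x, binomial_pmf(x, 2, 0.5))   # 0.25, 0.5, 0.25
```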
Binomial Distribution Characteristics
• Mean: μ = E(X) = np
• Standard Deviation: σ = √(np(1 − p))
[Figure: bar charts of P(X) vs. X for n = 5, p = 0.1 and for n = 5, p = 0.5.]
Useful Observation 1
• For any X and Y:
E[X + Y] = Σx Σy (x + y) p(x ∧ y)
= Σx Σy x p(x ∧ y) + Σx Σy y p(x ∧ y)
= Σx x Σy p(x ∧ y) + Σy y Σx p(x ∧ y)
= Σx x p(x) + Σy y p(y)
= E[X] + E[Y]
One Binary Outcome
• Random variable X, one binary outcome
• Code success as 1, failure as 0
• P(success) = p, P(failure) = (1 − p) = q
• E(X) = p

Var[X] = E[(X − E[X])²]
= E[X²] − E[X]²
= (p · 1² + q · 0²) − p²
= p − p² = p(1 − p)
Mean of a Binomial
• Independent, identically distributed
• X1, …, Xn; E(Xi) = p; binomial X = Σ Xi
E[Σ Xi] = Σ E[Xi]   (by Useful Observation 1)
= n E[X1]
= np
Useful Observation 2
• For independent X and Y:
E[XY] = Σx Σy xy p(x ∧ y)
= Σx Σy xy p(x) p(y)
= Σx x p(x) Σy y p(y)
= E[X] E[Y]
Useful Observation 3
• For independent X and Y:
Var[X + Y] = E[(X + Y)²] − (E[X + Y])²
= E[X² + Y² + 2XY] − (E[X] + E[Y])²
= E[X²] + E[Y²] + 2E[XY] − E[X]² − E[Y]² − 2E[X]E[Y]
= E[X²] − E[X]² + E[Y²] − E[Y]²   (2E[XY] and 2E[X]E[Y] cancel, by Obs. 2)
= Var[X] + Var[Y]
Variance of Binomial
• Independent, identically distributed
• X1, …, Xn; E(Xi) = p; binomial X = Σ Xi
Var[Σ Xi] = Σ Var[Xi]   (by Useful Observation 3)
= n Var[X1]
= np(1 − p)
Useful Observation 4
• For any X:
Var[kX] = E[(kX)²] − (E[kX])²
= k²E[X²] − (kE[X])²
= k²E[X²] − k²E[X]²
= k² Var[X]
Continuous Random Variable
Continuous Prob. Density Function
1. Mathematical Formula
2. Shows All Values x and Frequencies f(x)
– f(x) Is Not Probability
3. Properties:
∫ f(x) dx = 1 over all x (area under curve)
f(x) ≥ 0 for a ≤ x ≤ b
[Figure: density curve f(x) vs. value x.]
Continuous Random Variable Probability
P(c ≤ x ≤ d) = ∫_c^d f(x) dx
Probability is area under the curve!
[Figure: density curve with the area between c and d shaded.]
Uniform Distribution
1. Equally Likely Outcomes
2. Probability Density: f(x) = 1/(d − c) for c ≤ x ≤ d
3. Mean & Standard Deviation: μ = (c + d)/2, σ = (d − c)/√12
[Figure: flat density of height 1/(d − c) on [c, d]; the mean and median coincide at the midpoint.]
Uniform Distribution Example
• You’re the production manager of a soft-drink
bottling company. You believe that when a
machine is set to dispense 12 oz., it really
dispenses 11.5 to 12.5 oz., inclusive.
• Suppose the amount dispensed has a uniform
distribution.
• What is the probability that less than 11.8 oz.
is dispensed?
Uniform Distribution Solution
f(x) = 1/(d − c) = 1/(12.5 − 11.5) = 1.0
P(11.5 ≤ x ≤ 11.8) = (Base)(Height)
= (11.8 − 11.5)(1.0) = 0.30
[Figure: uniform density of height 1.0 on [11.5, 12.5], with the area from 11.5 to 11.8 shaded.]
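The base-times-height computation, as a small hedged Python helper (my function name, not a library call; it clamps the query interval to [c, d]):

```python
def uniform_prob(lo, hi, c, d):
    """P(lo ≤ x ≤ hi) for a uniform density f(x) = 1/(d − c) on [c, d]."""
    lo, hi = max(lo, c), min(hi, d)          # clamp to the support
    return max(0.0, (hi - lo) / (d - c))     # base × height

# Bottling example: dispensed amount uniform on [11.5, 12.5] oz.
print(uniform_prob(11.5, 11.8, c=11.5, d=12.5))   # ≈ 0.30
```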
Normal Distribution
1. Describes Many Random Processes
or Continuous Phenomena
2. Can Be Used to Approximate Discrete
Probability Distributions
– Example: Binomial
3. Basis for Classical Statistical Inference
4. A.k.a. Gaussian distribution
Normal Distribution
1. ‘Bell-Shaped’ & Symmetrical
2. Mean (평균), Median (중간값), and Mode (최빈값) Are Equal
3. Random Variable Has Infinite Range
* light-tailed distribution
[Figure: bell-shaped curve f(X) vs. X.]
Probability Density Function
f(x) = (1 / (σ√(2π))) · e^(−(1/2)((x − μ)/σ)²)

f(x) = Frequency of Random Variable x
σ = Population Standard Deviation
π = 3.14159; e = 2.71828
x = Value of Random Variable (−∞ < x < ∞)
μ = Population Mean
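The density translates directly into Python (a sketch; normal_pdf is my name, not a library function):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """f(x) = 1/(σ√(2π)) · exp(−½((x − μ)/σ)²)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Peak of the standard normal curve (μ = 0, σ = 1): 1/√(2π) ≈ 0.3989
print(normal_pdf(0, mu=0, sigma=1))
```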
Effect of Varying Parameters (μ & σ)
[Figure: three normal curves A, B, C with different means and standard deviations.]
Normal Distribution Probability
P(c ≤ x ≤ d) = ∫_c^d f(x) dx
Probability is area under the curve!
[Figure: normal curve with the area between c and d shaded.]
Infinite Number of Tables
Normal distributions differ by mean & standard deviation.
Each distribution would require its own table; that’s an infinite number!
[Figure: several normal curves with different μ and σ.]
Standardize the Normal Distribution
Z = (X − μ)/σ
Standardizing maps any normal distribution (mean μ, standard deviation σ) onto the standardized normal distribution (μ = 0, σ = 1). One table!
[Figure: a normal distribution with mean μ and std. dev. σ transformed into the standardized normal distribution with μ = 0, σ = 1.]
Intuitions on Standardizing
• Subtracting μ from each value of X just
moves the curve around, so values are
centered on 0 instead of on μ
• Once the curve is centered, dividing
each value by σ > 1 moves all values
toward 0, compressing the curve
Standardizing Example
Z = (X − μ)/σ = (6.2 − 5)/10 = .12
[Figure: normal distribution with μ = 5, σ = 10 and X = 6.2, mapped to the standardized normal distribution with μ = 0, σ = 1 and Z = .12.]
Obtaining the Probability
Standardized Normal Probability Table (Portion):

Z    .00    .01    .02
0.0  .0000  .0040  .0080
0.1  .0398  .0438  .0478
0.2  .0793  .0832  .0871
0.3  .1179  .1217  .1255

For Z = .12, the table gives P(0 ≤ Z ≤ .12) = .0478.
[Figure: standardized normal curve (μ = 0, σ = 1) with the area from 0 to .12 shaded; shaded area exaggerated.]
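The table lookup can be reproduced in Python with the standard library’s error function; prob_0_to_z is my helper name, and the erf identity for the normal CDF is standard:

```python
from math import erf, sqrt

def standardize(x, mu, sigma):
    """Z = (X − μ)/σ."""
    return (x - mu) / sigma

def prob_0_to_z(z):
    """P(0 ≤ Z ≤ z) for the standardized normal.

    Since P(Z ≤ z) = ½(1 + erf(z/√2)), we get P(0 ≤ Z ≤ z) = ½·erf(z/√2).
    """
    return 0.5 * erf(z / sqrt(2))

z = standardize(6.2, mu=5, sigma=10)
print(round(z, 2))                 # 0.12
print(round(prob_0_to_z(z), 4))    # 0.0478, matching the table entry
```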