Transcript Document
Basic Probability - Part 1
Kwang In Kim
A. I. Lab
Terms
The set of all possible subsets of set S is called
power set S and denoted as 2S
A subset of S containing single element AS is
called singleton
(singleton {A} and element A are distinct!)
When S consists of infinitely many element, size
of S can be classified into
Countable, if correspondence can be made for each
element of S with distinct natural number
Uncountable, if not countable
Contents
Acting under uncertainty
Basic Idea of probability
Tools for Probability
Probability axioms
σ-algebra
Basic notation
probability distribution, random variables
Acting under Uncertainty
Causes Consequences
Logic does not fit everywhere
Laziness: There may be too many cases and
their causal relations
Ignorance: It may be practically impossible to
obtain all information necessary for
conclusion
Acting under Uncertainty (Cont’d)
Examples
Dental Diagnosis
If symptom(toothache) then disease(cavity)
If symptom(toothache) then disease(cavity) or
disease(abscess) or …
No wrong; But it will be difficult to consider all
Coin tossing experiment:
If I toss a coin, will head appear?
Wrong; Many other reasons exist
Too many factors to consider: coin weight, temperature,
tossing direction, speed, etc.
It is sometimes required to do reasoning even
under uncertainty (with incomplete knowledge)
Probability
Probability provides a way of summarizing
the uncertainty (or randomness)
represents degree of our belief / knowledge
provides tool for manipulating this
Example
If symptom(toothache) then disease(cavity)
with 80% chance (or 0.8 probability)
“Head appeared with 60/100 relative
frequency for this coin”
Axioms of Probability
As did many great scientific discoveries,
probability theory did not first started in
axiomatic forms
It has discovered however, that some desirable
properties cannot be satisfied simultaneously
Probability axioms are results of moderating
between desiderata and provide background on
which facilities for reasoning and analysis
developed
Actual reasoning with probability (assigning,
estimating, manipulating probabilities) will be treated
in detail in last of this course
Coin Tossing Example
Suppose you toss a coin and are interested in whether it shows
head (H) or tail (T)
Because of randomness, you do not know which outcome will
come in advance
However, you know the set of all possible outcomes is {H,T}
Suppose your belief indicates H and T are equally probable. Then,
in probability language we say the probability that
neither H nor T appears is 0
H appears is ½
T appears is ½
either H or T appears is 1
Note: the above are all possible subsets of {H,T}:
{H}
{T}
{H,T}
Coin Tossing Example (Cont’d)
Suppose you toss two coins and observe their faces
The set of all possible outcomes is
{HH, HT, TH, TT}
Assuming H/T are equally probable for both coins, the probability of
No H/T is 0
each of HH, HT, TH, TT is ¼
only one H is ½
at least one H is ¾
anyone of above is 1
Again, the above are possible subsets of {HH, HT, TH, TT}
{HH},{HT},{TH},{TT}
{TH, HT}
{TH, HT, HH}
{HH, HT, TH, TT}
each of which is subset of power set 2{HH, HT, TH, TT}:
{,
{HH},{HT},{TH},{TT},
{HH, HT}, {HH, TH}, {HH, TT}, {HT, TH}, {HT,TT},{TH,TT},
{HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT},
{HH, HT, TH, TT}}
Basic Ingredients
Suppose you are performing an experiment
Sample space S
The set of all possible outcomes of an experiment
E.g., {HH, HT, TH, TT} in two-coin tossing experiments
Event
Specification of state of world
or question that can be answered “yes” or “no”
E.g.,
Event (A) is subset of sample space (S):
AS
or element of power set
A2S
Probability (measure) is a function defined on events
“TH observed?”
“At least one H observed?”
Real Interval Example
Suppose you have real interval [0,1] and choose at random
(uniformly) a number within this interval
Then
Intuitively, the probability that the chosen number is
length of [0,1] is 1
lengths of intervals [0,0],[0.7,0.7], and [0.95,0.95] are all 0
lengths of intervals [0.2,0.3] and [0.4,0.6] are 0.1 and 0.2,
respectively
lengths of intervals [0.2,0.5] and [0.4,0.6] are 0.3 and 0.2,
respectively
within [0,1] is 1
outside [0,1] is 0
within [0.7,0.7] is 0
within [0.5, 0.7] is 0.2
outside [0.5, 0.7] is 0.8
in the union of [0.2,0.3] and [0.4,0.6] is 0.3
in the intersection of [0.2,0.5] and [0.4,0.6] is 0.1
Length of subinterval used as probability measure
Tentative Axioms of Probability
Given sample space S it might be desirable if we
can define function P (called probability) such
that
P()=0
0<P(A)<1 for all A2S
P(S)=1
P(i(Ai))=i(P(Ai)) for all disjoint events
(AiAj= if i≠j and Ai 2S for all different i): additivity
P(AC)=1-P(A)
P(AB)=P(A)+P(B)-P(AB)
Anomaly in Uncountable Set
Previous plan is OK when S is finite or countably
infinite (discrete)
For uncountably infinite S additivity makes
nonsense
Union of infinite events with probability 0 can make
event of probability one; This makes analysis tough
E.g., for [0,1] as S
Each singletons is of probability 0
Uncountable union of singletons constitute S which is of
probability 1
Tentative Axioms of Probability, Modified
Given a sample space S we define a
function P such that
0<P(A)<1 for all A2S
P(S)=1
P(i(Ai))=i(P(Ai)) for all countable disjoint
collection {Ai}: countable additivity
P(AC)=1-P(A)
P(AB)=P(A)+P(B)-P(AB)
Would this then, be enough?
Anomaly in Uncountable Set (Cont’d)
Unfortunately, it has shown that no useful probability
measure P can be defined for 2[0,1] (and other many
interesting spaces)
This is intrinsic property of sample space [0,1] (and other
many interesting spaces!)
We restrict our attention on an interesting collection
subsets of S
should be rich enough to include interesting events
should not be too rich so that probability can be defined
It might be desirable if satisfies
if A then AC
if countable collection {Ai} then i(Ai)
S
if countable collection {Ai} then i(Ai)
Such a collection is called a σ-algebra
Examples of σ-Algebra
For countably large sample space S
2S is default σ-algebra
For R (uncountable set)
{, R}: trivial to include any interesting events
2R : too rich to define any proper P
Borel σ-algebra
σ-algebra generated* by all open intervals ((a,b)R)
includes virtually all interesting events in R
(can be extended to RN )
default σ-algebra in R, RN, etc.
* Smallest σ-algebra including all open intervals
Axioms of Probability
Probability is function P defined on σalgebra on sample space S such that
0<P(A)<1 for all A
P(S)=1
P(i(Ai))=i(P(Ai)) for all countable events
(AiAj= if i≠j and Ai for all i)
Following properties follow immediately
P()=0
P(Ac)=1-P(A)
P(AB)=P(A)+P(B)-P(AB)
How do we describe probability?:
Probability Distribution Function
For finite or countable S, prescribe P({x}) for all xS (table)
p(x)=P({x}) is called the probability mass function
For S=R?
For any AR, define P(A)=length(A)=xAdx. Is this a
probability?
Let f(x) be any nonnegative function defined on R such that
xRf(x)dx=1. Define P(A)= xAf(x)dx. Is this a probability?
Such f is called the probability density function (pdf)
Example: f(x)=exp(x2)/(2)½ the Gaussian or normal density
Similar results hold for RN
Probability mass functions and pdfs are called probability
distribution functions
Random Variables
Suppose a probability P is defined on S
Sometimes we are interested in defining a function X from S
to another space N and doing probabilistic reasoning in N
In this case, probability on N can be induced from that of S
E.g., when you toss three coins you might be interested in
only the number of heads (rather than exact outcome)
S: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
N: {0,1,2,3}
X:S N summarizes events in S
Probability of X=2 can be induced from P:
P(X-1(2))=P ({HHT,HTH,THH})=3/8
Such function X is called a random variable
Random Variables (Cont’d)
Suppose a probability P is defined on S
For a function X:SN to be a random variable it should
induce a probability G on N based on P:
For each event BN, X-1(B)S and X-1(BC)S
For events B,C N, X-1(BC)S, etc.
Inverse image of σ-algebra on N should be included in the
σ-algebra on S
Probability distribution functions can be induced in N
In practice, virtually any function you can think of satisfies
this condition
Often, output space N is
R: X is called a random variable
RN: X is called a random vector
R: X is called a random process
References
Part of this slide comes from Dr. Jahwan Kim’s
Lecture slide “Probability”
Introductory textbook
Textbooks for measure theoretic probability
S. Ross, A first course in probability, 6th ed.
P. Billingsley, Probability and Measure, 3rd ed.
D. Pollard, User’s Guide to Measure Theoretic
Probability
Reference book
E. Jaynes, Probability Theory: The Logic of Science