Transcript Document

Basic Probability - Part 1
Kwang In Kim
A. I. Lab
Terms



The set of all possible subsets of a set S is called the power set of S and is denoted 2^S (see the enumeration sketch below)
A subset of S containing a single element A ∈ S is called a singleton
(the singleton {A} and the element A are distinct!)
When S consists of infinitely many elements, the size of S can be classified as

Countable, if each element of S can be put in correspondence with a distinct natural number
Uncountable, if not countable
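
A small enumeration sketch (added here, not part of the original slide): for a finite S, the power set 2^S can be listed directly; power_set below is a hypothetical helper.

```python
# Enumerate the power set 2^S of a small finite set S.
from itertools import combinations

def power_set(s):
    """Return all subsets of s (the power set 2^S) as frozensets."""
    elems = list(s)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

S = {"H", "T"}
for subset in power_set(S):
    print(set(subset) if subset else "{}")
# Prints the 2^|S| = 4 subsets of S (element order may vary)
```
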
Contents



Acting under uncertainty
Basic idea of probability
Tools for probability

Probability axioms
σ-algebra
Basic notation: probability distribution, random variables
Acting under Uncertainty


Causes → Consequences
Logic does not fit everywhere

Laziness: there may be too many cases and causal relations to enumerate them all
Ignorance: it may be practically impossible to obtain all the information necessary for a conclusion
Acting under Uncertainty (Cont’d)

Examples

Dental Diagnosis

If symptom(toothache) then disease(cavity)

Wrong; many other causes of a toothache exist

If symptom(toothache) then disease(cavity) or disease(abscess) or …

Not wrong; but it would be difficult to enumerate all the possibilities

Coin tossing experiment: if I toss a coin, will a head appear?

Too many factors to consider: coin weight, temperature, tossing direction, speed, etc.

It is sometimes necessary to reason even under uncertainty (i.e., with incomplete knowledge)
Probability

Probability provides a way of summarizing uncertainty (or randomness)

It represents the degree of our belief / knowledge
It provides tools for manipulating this belief

Example

If symptom(toothache) then disease(cavity) with 80% chance (or 0.8 probability)
“Heads appeared with relative frequency 60/100 for this coin” (simulated below)
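
A minimal simulation sketch (an addition, not from the slides): relative frequency can be estimated by repeated tossing; the 0.6 bias below is an assumed value chosen only to mirror the 60/100 figure above.

```python
# Estimate the relative frequency of heads by simulating repeated tosses.
import random

random.seed(0)
n_tosses = 100
heads = sum(random.random() < 0.6 for _ in range(n_tosses))  # assumed 0.6-biased coin
print(f"Relative frequency of H: {heads}/{n_tosses} = {heads / n_tosses:.2f}")
```
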
Axioms of Probability



As with many great scientific discoveries, probability theory did not start out in axiomatic form
It was discovered, however, that some desirable properties cannot all be satisfied simultaneously
The probability axioms are the result of a compromise among these desiderata; they provide the foundation on which tools for reasoning and analysis have been developed

Actual reasoning with probability (assigning, estimating, and manipulating probabilities) will be treated in detail later in this course
Coin Tossing Example




Suppose you toss a coin and are interested in whether it shows head (H) or tail (T)
Because of randomness, you do not know in advance which outcome will occur
However, you know the set of all possible outcomes is {H,T}
Suppose your belief indicates H and T are equally probable. Then, in probability language, we say the probability that

neither H nor T appears is 0
H appears is ½
T appears is ½
either H or T appears is 1

Note: the above correspond to all possible subsets of {H,T}:

∅
{H}
{T}
{H,T}
Coin Tossing Example (Cont’d)



Suppose you toss two coins and observe their faces
The set of all possible outcomes is {HH, HT, TH, TT}
Assuming H and T are equally probable for both coins, the probability of

no outcome at all is 0
each of HH, HT, TH, TT is ¼
exactly one H is ½
at least one H is ¾
any one of the above is 1

Again, the above correspond to subsets of {HH, HT, TH, TT}:

∅
{HH},{HT},{TH},{TT}
{TH, HT}
{TH, HT, HH}
{HH, HT, TH, TT}

each of which is an element of the power set 2^{HH, HT, TH, TT}:
{∅,
{HH},{HT},{TH},{TT},
{HH, HT}, {HH, TH}, {HH, TT}, {HT, TH}, {HT, TT}, {TH, TT},
{HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT},
{HH, HT, TH, TT}}
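
A minimal counting sketch (an addition, not from the slides): the event probabilities above follow from counting outcomes in the uniform two-coin sample space.

```python
# Compute event probabilities on the two-coin sample space by counting outcomes.
from itertools import product

sample_space = [a + b for a, b in product("HT", repeat=2)]  # ['HH', 'HT', 'TH', 'TT']

def prob(event):
    """Probability of an event (a subset of the sample space) under the uniform measure."""
    return len(event) / len(sample_space)

exactly_one_h = {s for s in sample_space if s.count("H") == 1}
at_least_one_h = {s for s in sample_space if "H" in s}

print(prob(set()))              # 0.0  (the empty event)
print(prob({"HH"}))             # 0.25
print(prob(exactly_one_h))      # 0.5
print(prob(at_least_one_h))     # 0.75
print(prob(set(sample_space)))  # 1.0
```
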
Basic Ingredients



Suppose you are performing an experiment
Sample space S

The set of all possible outcomes of the experiment
E.g., {HH, HT, TH, TT} in the two-coin tossing experiment

Event

A specification of the state of the world, or a question that can be answered “yes” or “no”
E.g., “TH observed?”, “At least one H observed?”
An event A is a subset of the sample space S: A ⊆ S,
or equivalently an element of the power set: A ∈ 2^S

Probability (a measure) is a function defined on events
Real Interval Example


Suppose you have the real interval [0,1] and choose at random (uniformly) a number within this interval
Then

the length of [0,1] is 1
the lengths of the intervals [0,0], [0.7,0.7], and [0.95,0.95] are all 0
the lengths of the intervals [0.2,0.3] and [0.4,0.6] are 0.1 and 0.2, respectively
the lengths of the intervals [0.2,0.5] and [0.4,0.6] are 0.3 and 0.2, respectively

Intuitively, the probability that the chosen number is

within [0,1] is 1
outside [0,1] is 0
within [0.7,0.7] is 0
within [0.5,0.7] is 0.2
outside [0.5,0.7] is 0.8
in the union of [0.2,0.3] and [0.4,0.6] is 0.3
in the intersection of [0.2,0.5] and [0.4,0.6] is 0.1

The length of a subinterval is used as the probability measure (see the sketch below)
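
A minimal sketch (an addition, not from the slides): with length as the measure, the probabilities above reduce to summing lengths of disjoint subintervals.

```python
# Use total interval length as the probability measure for a uniform draw from [0, 1].
def length(intervals):
    """Total length of a list of disjoint intervals given as (a, b) pairs."""
    return sum(b - a for a, b in intervals)

print(length([(0.5, 0.7)]))              # ≈ 0.2: P(within [0.5, 0.7])
print(1 - length([(0.5, 0.7)]))          # ≈ 0.8: P(outside [0.5, 0.7])
print(length([(0.2, 0.3), (0.4, 0.6)]))  # ≈ 0.3: P(union of [0.2,0.3] and [0.4,0.6])
print(length([(0.4, 0.5)]))              # ≈ 0.1: P([0.2,0.5] ∩ [0.4,0.6] = [0.4,0.5])
```
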
Tentative Axioms of Probability

Given a sample space S, it might be desirable if we could define a function P (called a probability) such that

P(∅) = 0
0 ≤ P(A) ≤ 1 for all A ∈ 2^S
P(S) = 1
P(∪i Ai) = Σi P(Ai) for all disjoint events
(Ai ∩ Aj = ∅ if i ≠ j and Ai ∈ 2^S for all i): additivity
P(A^c) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Anomaly in Uncountable Set


The previous plan is fine when S is finite or countably infinite (discrete)
For uncountably infinite S, unrestricted additivity no longer makes sense

A union of infinitely many events of probability 0 can form an event of probability 1; this makes analysis difficult
E.g., for S = [0,1]

each singleton has probability 0
the uncountable union of all singletons constitutes S, which has probability 1
Tentative Axioms of Probability, Modified

Given a sample space S, we define a function P such that

0 ≤ P(A) ≤ 1 for all A ∈ 2^S
P(S) = 1
P(∪i Ai) = Σi P(Ai) for every countable disjoint collection {Ai}: countable additivity
P(A^c) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Would this, then, be enough?
Anomaly in Uncountable Set (Cont’d)




Unfortunately, it has been shown that no useful probability measure P can be defined on all of 2^[0,1] (and on many other interesting spaces)

This is an intrinsic property of the sample space [0,1] (and of many other interesting spaces!)

We restrict our attention to an interesting collection ℱ of subsets of S

ℱ should be rich enough to include the interesting events
ℱ should not be too rich, so that a probability can still be defined

It might be desirable if ℱ satisfies

if A ∈ ℱ then A^c ∈ ℱ
if {Ai} is a countable collection in ℱ then ∪i Ai ∈ ℱ
S ∈ ℱ
if {Ai} is a countable collection in ℱ then ∩i Ai ∈ ℱ

Such a collection is called a σ-algebra (a closure check for a finite S is sketched below)
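
A minimal sketch (an addition, not from the slides): for a finite S, countable unions reduce to finite ones, so the closure conditions can be checked by brute force; is_sigma_algebra below is a hypothetical helper.

```python
# Check whether a collection F of subsets of a finite sample space S is a sigma-algebra.
def is_sigma_algebra(S, F):
    S, F = frozenset(S), {frozenset(A) for A in F}
    if S not in F:
        return False
    for A in F:
        if S - A not in F:          # closed under complement
            return False
        for B in F:
            if A | B not in F:      # closed under (finite) union
                return False
    return True

S = {"H", "T"}
print(is_sigma_algebra(S, [set(), {"H"}, {"T"}, {"H", "T"}]))  # True (the power set 2^S)
print(is_sigma_algebra(S, [set(), {"H"}, {"H", "T"}]))         # False ({T} is missing)
```
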
Examples of σ-Algebra

For a countable sample space S

2^S is the default σ-algebra

For R (an uncountable set)

{∅, R}: too trivial to include any interesting events
2^R: too rich to define any proper P
Borel σ-algebra

the σ-algebra generated* by all open intervals (a,b) ⊆ R
includes virtually all interesting events in R
(can be extended to R^N)
the default σ-algebra on R, R^N, etc.

* The smallest σ-algebra including all open intervals
Axioms of Probability

A probability is a function P defined on a σ-algebra ℱ on a sample space S such that

0 ≤ P(A) ≤ 1 for all A ∈ ℱ
P(S) = 1
P(∪i Ai) = Σi P(Ai) for every countable collection of disjoint events
(Ai ∩ Aj = ∅ if i ≠ j and Ai ∈ ℱ for all i)

The following properties follow immediately (derivations sketched below):

P(∅) = 0
P(A^c) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
How do we describe probability?:
Probability Distribution Function



For finite or countable S, prescribe P({x}) for all x ∈ S (a table)

p(x) = P({x}) is called the probability mass function

For S = R?

For any A ⊆ R, define P(A) = length(A) = ∫_A dx. Is this a probability?
Let f(x) be any nonnegative function defined on R such that ∫_R f(x) dx = 1. Define P(A) = ∫_A f(x) dx. Is this a probability?
Such an f is called the probability density function (pdf)
Example: f(x) = exp(−x²/2)/(2π)^½, the Gaussian or normal density (checked numerically below)
Similar results hold for R^N

Probability mass functions and pdfs are called probability distribution functions
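
A minimal numerical sketch (an addition, not from the slides): the Gaussian density integrates to 1 over R, and P(A) = ∫_A f(x) dx for an interval A; the midpoint-rule integrator below is a rough approximation.

```python
# Treat the standard Gaussian density as a pdf and check it numerically.
import math

def gaussian_pdf(x):
    """Standard normal density f(x) = exp(-x^2 / 2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(gaussian_pdf, -10, 10))  # ≈ 1.0: total probability
print(integrate(gaussian_pdf, -1, 1))    # ≈ 0.683: P(the drawn number lies in [-1, 1])
```
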
Random Variables





Suppose a probability P is defined on S
Sometimes we are interested in defining a function X from S to another space N and doing probabilistic reasoning in N
In this case, a probability on N can be induced from that on S
E.g., when you toss three coins, you might be interested only in the number of heads (rather than the exact outcome)

S: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
N: {0, 1, 2, 3}
X: S → N summarizes events in S
The probability of X = 2 can be induced from P:
P(X⁻¹(2)) = P({HHT, HTH, THH}) = 3/8

Such a function X is called a random variable (see the sketch below)
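
A minimal sketch (an addition, not from the slides): the induced distribution of X = number of heads can be computed by pushing the uniform probability on S through X.

```python
# Induce the distribution of X = "number of heads" from the uniform probability on S.
from fractions import Fraction
from itertools import product

sample_space = ["".join(t) for t in product("HT", repeat=3)]  # 8 equally likely outcomes

def X(outcome):
    """The random variable: number of heads in the outcome."""
    return outcome.count("H")

induced = {}
for outcome in sample_space:
    k = X(outcome)
    induced[k] = induced.get(k, 0) + Fraction(1, len(sample_space))

for k in sorted(induced):
    print(f"P(X = {k}) = {induced[k]}")
# P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8
```
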
Random Variables (Cont’d)





Suppose a probability P is defined on S
For a function X: S → N to be a random variable, it should induce a probability G on N based on P:

For each event B in N, X⁻¹(B) and X⁻¹(B^c) should be events in S
For events B, C in N, X⁻¹(B ∪ C) should be an event in S, etc.
That is, the inverse image of the σ-algebra on N should be included in the σ-algebra on S

Probability distribution functions can then be induced on N
In practice, virtually any function you can think of satisfies this condition
Often, the output space N is

R: X is called a random variable
R^N: X is called a random vector
R^∞: X is called a random process
References


Part of these slides comes from Dr. Jahwan Kim’s lecture slides “Probability”

Introductory textbook

S. Ross, A First Course in Probability, 6th ed.

Textbooks for measure-theoretic probability

P. Billingsley, Probability and Measure, 3rd ed.
D. Pollard, A User’s Guide to Measure Theoretic Probability

Reference book

E. Jaynes, Probability Theory: The Logic of Science