Transcript: Lecture 3

Probability theory
Petter Mostad
2005.09.15
Sample space
• The set of possible outcomes you consider for the problem at hand
• You subdivide into different outcomes only as far as is relevant for your problem
• The sample space is the start of a simplified model of reality
• Events are subsets of the sample space
Set theory
• The complement of a subset A is denoted Ā
• The intersection of A and B is denoted A ∩ B
• The union of A and B is denoted A ∪ B
• A and B are called mutually exclusive or disjoint if A ∩ B = ∅, where ∅ is the empty set
• If A1, A2, ..., An are subsets of a sample space S, then they are called collectively exhaustive if A1 ∪ A2 ∪ ... ∪ An = S
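These operations map directly onto Python's built-in set type; a minimal sketch with made-up example sets (not from the lecture):

S = {1, 2, 3, 4, 5, 6}            # sample space
A = {1, 2, 3}                     # an event
B = {4, 5}                        # another event
print(S - A)                      # complement of A within S: {4, 5, 6}
print(A & B)                      # A ∩ B: set() (empty, so A and B are disjoint)
print(A | B)                      # A ∪ B: {1, 2, 3, 4, 5}
print(A | B | {6} == S)           # True: A, B, {6} are collectively exhaustive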
Venn diagrams
[Venn diagrams illustrating the complement Ā, the intersection A ∩ B, the union A ∪ B, and a combination of three sets A, B, C]
Computations with sets
Examples of rules you can prove:
• (A ∪ B) ∪ C = A ∪ (B ∪ C)
• (A ∩ B) ∩ C = A ∩ (B ∩ C)
• A = (A ∩ B) ∪ (A ∩ B̄)
• If A1, A2, A3 are mutually exclusive and collectively exhaustive, then B = (B ∩ A1) ∪ (B ∩ A2) ∪ (B ∩ A3)
Events as subsets
• The union of events corresponds to "or", the intersection to "and", and the complement to "not". Examples:
– A: The patient is given drug X
– B: The patient dies
– A ∩ B: The patient is given drug X and dies
– A ∪ B: The patient is given drug X or s/he dies, or both
– B̄: The patient does not die
– Ā ∩ B: The patient is not given drug X, and dies
Definitions of probability
• If A is a set of outcomes of an experiment, and if, when repeating this experiment many times, the frequency of A approaches a limit p, then the probability of A is said to be p.
Or:
• The probability of A is your "belief" that A is true or will happen: it is an attribute of your knowledge about A. Over time, events given probability p should happen with frequency p.
Properties of probabilities
• When probabilities are assigned to all outcomes in a sample space, such that
– all probabilities are positive or zero, and
– the probabilities add up to one,
then we say we have a probability model.
• The probability of an event A is the sum of the probabilities of the outcomes in A, and is written P(A).
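For instance, a fair die gives a small probability model; the code below is an illustration with a made-up event, not part of the lecture:

from fractions import Fraction

model = {outcome: Fraction(1, 6) for outcome in range(1, 7)}   # fair die
print(sum(model.values()) == 1)     # True: probabilities add up to one
A = {2, 4, 6}                       # the event "even number"
print(sum(model[o] for o in A))     # P(A) = 1/2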
Computations with probabilities
Some consequences of the set-theory results:
• P(Ā) = 1 − P(A)
• When A and B are mutually exclusive, P(A ∪ B) = P(A) + P(B)
• In general, P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
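Continuing the die model sketched above, the general rule can be checked numerically on made-up events:

from fractions import Fraction

model = {o: Fraction(1, 6) for o in range(1, 7)}
P = lambda event: sum(model[o] for o in event)
A, B = {1, 2, 3}, {3, 4}
print(P(A | B) == P(A) + P(B) - P(A & B))   # True: the general addition rule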
Conditional probability
• If some information limits the sample space
to a subset A, the relative probabilities for
outcomes in A are the same, but they are
scaled up so that they sum to 1.
• We write P(B|A) for the probability of event
B given event A.
• In symbols: P(B | A) = P(A ∩ B) / P(A)
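In the die model from before, conditioning just renormalizes the probabilities; a minimal sketch:

from fractions import Fraction

model = {o: Fraction(1, 6) for o in range(1, 7)}
P = lambda event: sum(model[o] for o in event)
A = {4, 5, 6}                 # "at least four"
B = {2, 4, 6}                 # "even"
print(P(A & B) / P(A))        # P(B|A) = 2/3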
The law of total probability
• As A ∩ B and A ∩ B̄ are disjoint, we get P(A) = P(A ∩ B) + P(A ∩ B̄)
• Together with the definition of conditional probability, this gives the law of total probability:
P(A) = P(A | B) P(B) + P(A | B̄) P(B̄)
Statistical independence
• If P(B|A)=P(B), we say that B is statistically
independent of A.
• We can easily see that this happens if and only if P(A ∩ B) = P(A) P(B)
• Thus B is statistically independent of A if
and only if A is statistically independent of
B.
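As a concrete (made-up) example in the die model, "even" and "at most four" happen to be independent:

from fractions import Fraction

model = {o: Fraction(1, 6) for o in range(1, 7)}
P = lambda event: sum(model[o] for o in event)
A = {2, 4, 6}                    # "even", P(A) = 1/2
B = {1, 2, 3, 4}                 # "at most four", P(B) = 2/3
print(P(A & B) == P(A) * P(B))   # True: A and B are independent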
Bayes' theorem
• Bayes' theorem says that:
P(B | A) = P(A | B) P(B) / P(A)
• This can be deduced simply from the definition of conditional probability:
P(B | A) P(A) = P(A ∩ B) = P(A | B) P(B)
• Together with the law of total probability:
P(B | A) = P(A | B) P(B) / (P(A | B) P(B) + P(A | B̄) P(B̄))
Example
• A disease X has a prevalence of 1%. A test
for X exists, and
– If you are ill, the test is positive in 90% of cases
– If you are not ill, the test is positive in 10% of
cases.
• You have a positive test: What is the
probability that you have X?
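Applying Bayes' theorem, with the law of total probability in the denominator, gives the answer; a worked sketch:

p_X = 0.01                      # prevalence: P(X)
p_pos_given_X = 0.90            # P(positive | ill)
p_pos_given_not_X = 0.10        # P(positive | not ill)
p_pos = p_pos_given_X * p_X + p_pos_given_not_X * (1 - p_X)   # total probability
print(round(p_pos_given_X * p_X / p_pos, 3))                  # 0.083

So despite the positive test, the probability of having X is only about 8%, because the disease is rare.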
Joint and marginal probabilities
Assume A1, A2, ..., An are mutually exclusive and collectively exhaustive. Assume the same for B1, B2, ..., Bm. Then:
• P(Ai ∩ Bj) are called joint probabilities
• P(Ai) or P(Bj) are called marginal probabilities
• If every Ai is statistically independent of every Bj, then the two subdivisions are called independent attributes
Odds
• The odds for an event are its probability divided by the probability of its complement.
• What is the odds of A if P(A) = 0.8?
• What can you say about the probability of A
if its odds is larger than 1?
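A small sketch answering both questions:

def odds(p):
    return p / (1 - p)          # odds = P(A) / P(complement of A)

print(odds(0.8))                # 4.0: the odds of A are 4 (often written 4:1)
print(odds(0.6) > 1)            # True: odds exceed 1 exactly when P(A) > 0.5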
Overinvolvement ratios
• If you want to see how A depends differently on B or C, you can compute the overinvolvement ratio: P(A | B) / P(A | C)
• Example: If the probability to get lung
cancer is 0.5% for smokers and 0.1% for
non-smokers, what is the overinvolvement
ratio?
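Plugging in the numbers from the example:

p_given_smoker = 0.005          # P(lung cancer | smoker)
p_given_nonsmoker = 0.001       # P(lung cancer | non-smoker)
print(round(p_given_smoker / p_given_nonsmoker, 3))   # 5.0: the ratio is 5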
Random variables
• A random variable is a probability model
where each outcome is a number.
• For discrete random variables, it is
meaningful to talk about the probability of
each specific number.
• For continuous random variables, we only
talk about the probability for intervals.
PDF and CDF
• For discrete random variables, the probability density function (PDF) simply gives the probability of each outcome.
• The cumulative distribution function (CDF) at a value x is the cumulative sum of the PDF for values up to and including x.
• Example: A die throw has outcomes 1,2,3,4,5,6.
What is the CDF at 4?
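A minimal sketch answering the die question:

from fractions import Fraction

pdf = {o: Fraction(1, 6) for o in range(1, 7)}   # fair die
cdf = lambda x: sum(p for outcome, p in pdf.items() if outcome <= x)
print(cdf(4))   # 2/3: the probability of throwing 4 or less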
Expected value
• The expected value of a discrete random variable
is the weighted average of its possible outcomes,
with the probabilities as weights.
• For a random variable X with outcomes x1, x2, ..., xn with probabilities P(x1), P(x2), ..., P(xn), the expected value E(X) is
E(X) = P(x1) x1 + P(x2) x2 + ... + P(xn) xn
• Example: What is the expected value when
throwing a die?
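Answering the die question with a direct computation:

from fractions import Fraction

pdf = {o: Fraction(1, 6) for o in range(1, 7)}   # fair die
print(sum(p * x for x, p in pdf.items()))        # E(X) = 7/2 = 3.5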
Properties of the expected value
• We can construct a new random variable
Y=aX+b from a random variable X and
numbers a and b. (When X has outcome x,
Y has outcome ax+b, and the probabilities
are the same).
• We can then see that E(Y) = aE(X)+b
• We can also construct, for example, the random variable X·X = X²
Variance and standard deviation
• The variance of a stochastic variable X is
σX² = Var(X) = E((X − μX)²)
where μX = E(X) is the expected value.
• The standard deviation is the square root of the variance.
• We can show that Var(aX + b) = a² Var(X)
• We can also show that Var(X) = E(X²) − μX²
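Both claims can be verified numerically on the fair-die model used earlier; a sketch with arbitrary constants a and b:

from fractions import Fraction

pdf = {o: Fraction(1, 6) for o in range(1, 7)}
E = lambda f: sum(p * f(x) for x, p in pdf.items())
mu = E(lambda x: x)                           # 7/2
var = E(lambda x: (x - mu) ** 2)              # 35/12
print(var == E(lambda x: x ** 2) - mu ** 2)   # True: Var(X) = E(X^2) - mu^2
a, b = 3, 5                                   # arbitrary constants
print(E(lambda x: (a * x + b - (a * mu + b)) ** 2) == a ** 2 * var)   # True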
Example: Bernoulli random variable
• A Bernoulli random variable X takes the
value 1 with probability p and the value 0
with probability 1-p.
• E(X) = p
• Var(X) = p(1-p)
• What is the variance for a single die throw?
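A direct check of both formulas; the die variance asked for above is 35/12 ≈ 2.92, as computed in the previous sketch:

from fractions import Fraction

p = Fraction(1, 3)              # an arbitrary example value for p
pdf = {1: p, 0: 1 - p}          # Bernoulli random variable
E_X = sum(prob * x for x, prob in pdf.items())
var = sum(prob * (x - E_X) ** 2 for x, prob in pdf.items())
print(E_X == p)                 # True: E(X) = p
print(var == p * (1 - p))       # True: Var(X) = p(1 - p)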
Some combinatorics
• How many ways can you make ordered
selections of s objects from n objects?
Answer: n·(n−1)·(n−2)·…·(n−s+1)
• How many ways can you order n objects?
Answer: n·(n−1)·…·2·1 = n! ("n factorial")
• How many ways can you make unordered selections of s objects from n objects?
Answer: (n·(n−1)·…·(n−s+1)) / s! = n! / (s!(n−s)!), the binomial coefficient "n choose s"
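These three counts correspond directly to functions in Python's standard library (math.perm and math.comb require Python 3.8 or later); example values n = 6, s = 3:

import math

n, s = 6, 3
print(math.perm(n, s))      # 120: ordered selections of s from n
print(math.factorial(n))    # 720: orderings of n objects
print(math.comb(n, s))      # 20: unordered selections of s from n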