A SHORT INTRODUCTION TO
PROBABILITY
Because of the stochastic nature of genetics
and evolution, we have to rely on the theory
of probability.
Terminology
The possible outcomes of a stochastic process
are called events. (A deterministic process
has only one possible outcome.)
A stochastic process may have a finite or an
infinite number of outcomes.
The probability of a particular event is the
fraction of outcomes in which the event
occurs. The probability of event A is denoted
by P(A).
Terminology
Probability values are between 0 (the event
never occurs) and 1 (the event always
occurs).
Events may or may not be mutually
exclusive.
Events that are not mutually exclusive can occur
together. Two such events are called independent if
the occurrence of one does not change the
probability of the other.
The birth of a son and the birth of a daughter are
mutually exclusive events.
The birth of a daughter and the birth of a carrier
of the sickle-cell anemia allele are not mutually
exclusive (they are independent events).
Terminology
The sum of probabilities of all mutually
exclusive events in a process is 1. For
example, if there are n possible mutually
exclusive outcomes, then
$$\sum_{i=1}^{n} P(i) = 1$$
Simple probabilities
If A and B are mutually exclusive events,
then the probability that either A or B occurs
is the probability of their union
$$P(A \cup B) = P(A) + P(B)$$
Example: The probability of a hat being red is ¼, the probability of
the hat being green is ¼, and the probability of the hat being black is
½. Then, the probability of a hat being red OR black is ¾.
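The hat example can be checked with exact fractions. This is only a minimal sketch: the dictionary `p_hat` and the helper `p_either` are illustrative names, not part of the original slides.

```python
from fractions import Fraction

# Hat-colour probabilities taken from the example above.
p_hat = {
    "red": Fraction(1, 4),
    "green": Fraction(1, 4),
    "black": Fraction(1, 2),
}

# The probabilities of all mutually exclusive outcomes must sum to 1.
assert sum(p_hat.values()) == 1

def p_either(probs, *events):
    """Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B)."""
    return sum(probs[e] for e in events)

print(p_either(p_hat, "red", "black"))  # 3/4
```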
Simple probabilities
If A and B are independent events, then the
probability that both A and B occur is the
intersection
$$P(A \cap B) = P(A)\,P(B)$$
Simple probabilities
Example: The probability that a US president was bearded is
~14%, and the probability that a US president died in office is
~19%. If the two events are independent, the probability that a
president both had a beard and died in office is
0.14 × 0.19 ≈ 3%, so about 1.3 presidents are expected to
fulfill the two conditions. In reality, 2 bearded presidents
died in office. (A close enough result.)
Presidents who died in office: Harrison, Taylor, Lincoln*, Garfield*, McKinley*, Harding, Roosevelt, Kennedy* (*assassinated)
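The arithmetic of this example can be sketched as follows. The total of 43 presidents is an assumption, not a figure given in the slides; it is chosen because eight deaths in office out of 43 is roughly the quoted 19%. The helper name `p_both` is likewise illustrative.

```python
def p_both(p_a: float, p_b: float) -> float:
    """Multiplication rule for independent events: P(A and B) = P(A) * P(B)."""
    return p_a * p_b

p_bearded = 0.14         # from the example above
p_died_in_office = 0.19  # from the example above
n_presidents = 43        # ASSUMPTION: head count is not stated in the slides

p_joint = p_both(p_bearded, p_died_in_office)

# The slides round the joint probability to 3% before scaling it up.
expected_rounded = round(p_joint, 2) * n_presidents
# Without the intermediate rounding the expectation is somewhat lower.
expected_unrounded = p_joint * n_presidents

print(f"P(bearded and died in office) = {p_joint:.4f}")
print(f"expected count, joint rounded to 3% = {expected_rounded:.2f}")
print(f"expected count, unrounded = {expected_unrounded:.2f}")
```

Under this assumed head count, the rounded 3% gives about 1.3 presidents, while the unrounded product gives a slightly smaller expectation; either way the observed count of 2 is of the same order.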
Conditional probabilities
What is the probability that event A occurs,
given that event B did occur? The conditional
probability of A given B is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Example: The probability that a US president dies in office, given that
he is bearded, is 0.03/0.14 ≈ 21%. Thus, out of 6 bearded presidents,
about 21% (or 1.3) are expected to die in office. In reality, 2 died.
(Again, a close enough result.)
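A minimal sketch of the same calculation, dividing the joint probability by the probability of the conditioning event (being bearded). The function name `conditional` is illustrative only.

```python
def conditional(p_a_and_b: float, p_b: float) -> float:
    """P(A | B) = P(A and B) / P(B); undefined when P(B) is zero."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# Values from the example: P(bearded and died) ~ 0.03, P(bearded) ~ 0.14.
p_died_given_bearded = conditional(0.03, 0.14)

# Expected number of deaths in office among the 6 bearded presidents.
expected_deaths = 6 * p_died_given_bearded

print(f"P(died | bearded) = {p_died_given_bearded:.3f}")
print(f"expected deaths among 6 bearded presidents = {expected_deaths:.2f}")
```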
Permutations
The number of possible permutations is the
number of different orders in which particular
events can occur. The number of possible
permutations is
$$N_p = \frac{n!}{(n - r)!}$$
where r is the number of events in the series, n is the
number of possible events, and n! denotes the factorial of n,
i.e., the product of all the positive integers from 1 to n.
Permutations
In how many ways can 8 CDs be
arranged on a shelf?
$$N_p = \frac{n!}{(n - r)!}$$
With n = 8 and r = 8 (and since 0! = 1),
$$N_p = \frac{8!}{(8 - 8)!} = \frac{8!}{1} = 40{,}320$$
Permutations
In how many ways can 4 CDs (out of a
collection of 8 CDs) be arranged on a
shelf?
$$N_p = \frac{n!}{(n - r)!}$$
With n = 8 and r = 4,
$$N_p = \frac{8!}{(8 - 4)!} = \frac{8!}{4!} = 1{,}680$$
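Both CD examples can be reproduced with a few lines of Python. This is a sketch under the definitions above; `n_permutations` is an illustrative name, and the brute-force count with `itertools.permutations` is included only as a sanity check.

```python
import math
from itertools import permutations

def n_permutations(n: int, r: int) -> int:
    """Number of ordered arrangements of r items chosen from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(n_permutations(8, 8))  # all 8 CDs on the shelf
print(n_permutations(8, 4))  # 4 CDs out of 8 on the shelf

# Sanity check: enumerate every ordered selection of 4 CDs out of 8.
brute_force = sum(1 for _ in permutations(range(8), 4))
assert brute_force == n_permutations(8, 4)
```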
Combinations
When the order in which the events occurred
is of no interest, we are dealing with
combinations. The number of possible
combinations is
$$N_c = \binom{n}{r} = \frac{n!}{r!\,(n - r)!}$$
where r is the number of events in the series, n is the
number of possible events, and n! denotes the factorial of n,
i.e., the product of all the positive integers from 1 to n.
Combinations
How many groups of 4 CDs are there in a
collection of 8 CDs?
$$N_c = \binom{n}{r} = \frac{n!}{r!\,(n - r)!}$$
With n = 8 and r = 4,
$$N_c = \binom{8}{4} = \frac{8!}{4!\,(8 - 4)!} = \frac{8!}{4!\,4!} = 70$$
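The combination count is the permutation count divided by the r! orderings of each group, which a short sketch can confirm. The function names here are illustrative, not from the slides.

```python
import math
from itertools import combinations

def n_combinations(n: int, r: int) -> int:
    """Number of unordered groups of r items chosen from n: n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

def n_permutations(n: int, r: int) -> int:
    """Number of ordered arrangements of r items chosen from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

groups = n_combinations(8, 4)
print(groups)

# Each group of 4 CDs can be ordered in 4! ways, so N_c = N_p / r!.
assert groups == n_permutations(8, 4) // math.factorial(4)

# Sanity check by enumerating every unordered group of 4 CDs out of 8.
assert groups == sum(1 for _ in combinations(range(8), 4))
```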
Probability Distribution
The probability distribution refers
to the frequency with which each
possible outcome occurs. There are
numerous types of probability
distributions.
The uniform distribution
A variable is said to be uniformly distributed if the
probabilities of all possible outcomes are equal to one
another. Thus, the probability P(i), where i is one of n
possible outcomes, is
$$P(i) = \frac{1}{n}$$
The binomial distribution
A process that has only two possible outcomes is called a
binomial process. In statistics, the two outcomes are
frequently denoted as success and failure. The
probabilities of a success or a failure are denoted by p and
q, respectively. Note that p + q = 1. The binomial
distribution gives the probability of exactly k successes in
n trials
$$P(k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
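A minimal sketch of the binomial formula in Python. The values n = 10 and p = 0.5 are illustrative choices, not taken from the slides, and `binomial_pmf` is a hypothetical helper name; `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # illustrative values only

# Probability of every possible number of successes, k = 0 .. n.
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(k = 5) = {probs[5]:.4f}")
print(f"sum over all k = {sum(probs):.4f}")
```

Because the values k = 0, 1, ..., n are mutually exclusive and cover every outcome, the probabilities sum to 1.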
The binomial distribution
The mean and variance of a binomially distributed variable
are given by
$$\mu = np$$
$$V = npq$$
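These two results can be checked numerically by computing the mean and variance directly from the binomial probabilities. The sketch below assumes illustrative values n = 10 and p = 0.3, which do not come from the slides; `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative values only
q = 1 - p

# Mean and variance computed from the definition, summing over k = 0 .. n.
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(f"mean from distribution = {mean:.4f}, np = {n * p:.4f}")
print(f"variance from distribution = {variance:.4f}, npq = {n * p * q:.4f}")
```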
The Poisson distribution
Poisson d’April
Siméon Denis Poisson (1781-1840)
The Poisson distribution
When the probability of “success” is very small, e.g., the
probability of a mutation, then $p^k$ and $(1 - p)^{n - k}$ become
too small to calculate exactly by the binomial distribution.
In such cases, the Poisson distribution becomes useful.
Let λ be the expected number of successes in a process
consisting of n trials, i.e., λ = np. The probability of
observing k successes is
$$P(k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
The mean and variance of a Poisson distributed
variable are given by μ = λ and V = λ, respectively.
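To see how closely the Poisson formula tracks the binomial one when successes are rare, the two can be compared side by side. This is a sketch only: n = 1000 and p = 0.001 are illustrative values, not from the slides, and `lam` stands for the expected number of successes np defined above. `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of k successes when lam successes are expected."""
    return lam**k * math.exp(-lam) / math.factorial(k)

n, p = 1000, 0.001  # illustrative: many trials, rare success
lam = n * p         # expected number of successes

for k in range(5):
    print(f"k={k}: binomial={binomial_pmf(k, n, p):.5f}  poisson={poisson_pmf(k, lam):.5f}")

# Largest disagreement between the two formulas over k = 0 .. 5.
max_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(6))
print(f"largest difference: {max_diff:.6f}")
```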
Normal Distribution
Gamma Distribution