CS5263 Bioinformatics
Lecture 9: Motif finding
Biological &
Statistical background
Roadmap
• Review of last lecture
• Intro to probability and statistics
• Intro to motif finding problems
– Biological background
Multiple Sequence Alignment
Scoring functions
• Ideally:
– Maximizes probability that sequences evolved from common ancestor
[Figure: tree with an unknown ancestor "?" and leaves x, y, z, w, v]
• In practice:
– Sum of Pairs
x: AC-GCGG-C
y: AC-GC-GAG
z: GCCGC-GAG
– Induced pairwise alignments:
x: ACGCGG-C    x: AC-GCGG-C    y: AC-GCGAG
y: ACGC-GAC    z: GCCGC-GAG    z: GCCGCGAG
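A minimal sketch of sum-of-pairs scoring, applied to the three-row alignment of x, y, z above. The scoring values (match +1, mismatch -1, gap -1, gap against gap 0) and the function names are assumptions for illustration; the slides do not specify a scoring scheme.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score one aligned pair of symbols; two gaps facing each other score 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs(rows):
    """Sum pair_score over every column and every pair of rows."""
    assert len({len(r) for r in rows}) == 1, "rows must have equal length"
    return sum(
        pair_score(a, b)
        for column in zip(*rows)
        for a, b in combinations(column, 2)
    )

alignment = ["AC-GCGG-C",   # x
             "AC-GC-GAG",   # y
             "GCCGC-GAG"]   # z
print(sum_of_pairs(alignment))  # 5 under the assumed scoring
```

Because gap-against-gap columns score 0, this equals the sum of the scores of the three induced pairwise alignments.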
Algorithms
• MDP
• Progressive alignment
• Iterative refinement
• Restricted DP
MDP
• Similar to pair-wise alignment
– O(2^N L^N) running time
– O(L^N) memory
F(i,j,k) = max {
    F(i-1,j-1,k-1) + S(xi, xj, xk),
    F(i-1,j-1,k  ) + S(xi, xj, -),
    F(i-1,j  ,k-1) + S(xi, -, xk),
    F(i  ,j-1,k-1) + S(-, xj, xk),
    F(i-1,j  ,k  ) + S(xi, -, -),
    F(i  ,j-1,k  ) + S(-, xj, -),
    F(i  ,j  ,k-1) + S(-, -, xk)
}
[Figure: cube of DP cells; (i,j,k) is computed from its seven neighbors (i-1,j-1,k-1), (i-1,j-1,k), (i-1,j,k-1), (i,j-1,k-1), (i-1,j,k), (i,j-1,k), (i,j,k-1)]
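The recurrence for three sequences can be sketched as below. This is an illustration only: it assumes S is a sum-of-pairs column score with match +1, mismatch -1, gap -1, and it names the three sequences x, y, z (the slide writes xi, xj, xk). It returns the optimal score without a traceback.

```python
from itertools import product

def sp_column(a, b, c, match=1, mismatch=-1, gap=-1):
    """Assumed S(a, b, c): sum-of-pairs score of one alignment column."""
    def pair(p, q):
        if p == "-" and q == "-":
            return 0
        if p == "-" or q == "-":
            return gap
        return match if p == q else mismatch
    return pair(a, b) + pair(a, c) + pair(b, c)

def align3_score(x, y, z):
    """Best score of a global alignment of x, y, z using the 7-way recurrence."""
    n, m, l = len(x), len(y), len(z)
    neg_inf = float("-inf")
    F = [[[neg_inf] * (l + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    F[0][0][0] = 0
    # Seven moves: each sequence either consumes a letter (1) or takes a gap (0).
    moves = [d for d in product((0, 1), repeat=3) if any(d)]
    for i, j, k in product(range(n + 1), range(m + 1), range(l + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue  # predecessor cell lies outside the cube
            a = x[i - 1] if di else "-"
            b = y[j - 1] if dj else "-"
            c = z[k - 1] if dk else "-"
            F[i][j][k] = max(F[i][j][k], F[pi][pj][pk] + sp_column(a, b, c))
    return F[n][m][l]

print(align3_score("ACG", "ACG", "ACG"))  # 9: three columns of three matches
```

The table has (n+1)(m+1)(l+1) cells and each cell looks at 7 = 2^3 - 1 predecessors, which is the N = 3 case of the running time and memory bounds above.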
Progressive alignment
• Most popular multiple alignment algorithm
– CLUSTALW
• Main idea:
– Construct a guide tree based on pair-wise
alignment scores
– Align the most similar sequences first
– Progressively add other sequences
• Pros: fast (O(N L^2))
• Cons: initial bad alignment is frozen
Iterative Refinement
• Basic idea:
– Do progressive alignment first
– Iteratively:
• Remove a sequence, and realign it back while
keeping the rest fixed
• A note on its convergence guarantee
– Every time we realign a sequence, the alignment score improves or stays the same
– The score is bounded above, so the algorithm must converge to either a global or a local maximum
Restricted MDP
• Similar to bounded DP in pair-wise alignment
1. Construct progressive multiple alignment m
2. Run MDP, restricted to radius R from m
[Figure: band of radius R around m in the x, y, z alignment cube]
Running Time: O(2^N R^(N-1) L)
Today
• Probability and statistics
• Biology background for motif finding
Probability Basics
• Definition (informal)
– Probabilities are numbers assigned to events
that indicate “how likely” it is that the event
will occur when a random experiment is
performed
– A probability law for a random experiment is
a rule that assigns probabilities to the events
in the experiment
– The sample space S of a random experiment
is the set of all possible outcomes
Example
0 ≤ P(Ai) ≤ 1
P(S) = 1
Random variable
• A random variable is a function from the sample space to the space of possible values of the variable
– When we toss a coin several times, the number of times that we see heads is a random variable
– Can be discrete or continuous
• Discrete: the resulting number after rolling a die
• Continuous: the weight of an individual
Cumulative distribution function
(cdf)
• The cumulative distribution function FX(x)
of a random variable X is defined as the
probability of the event {X≤x}
FX(x) = P(X ≤ x) for −∞ < x < +∞
Probability density function (pdf)
• The probability density function of a continuous random variable X, if it exists, is defined as the derivative of FX(x):
fX(x) = dFX(x) / dx
• For discrete random variables, the equivalent to the pdf is the probability mass function (pmf):
fX(x) = P(X = x)
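As a small illustration of pmf and cdf for a discrete random variable, here is a sketch for a fair six-sided die. The fair die and the names `pmf` and `cdf` are assumptions chosen for the example.

```python
from fractions import Fraction

# pmf of a fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """F_X(x) = P(X <= x), obtained by summing the pmf up to x."""
    return sum(p for face, p in pmf.items() if face <= x)

print(cdf(3))  # 1/2
print(cdf(6))  # 1
```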
Probability density function vs
probability
• What is the probability of somebody weighing 200 lb?
• The figure shows about 0.62 at 200 lb, but that is a density, not a probability
– What is the probability of 200.00001 lb?
• The right question would be:
– What's the probability of somebody weighing 199-201 lb?
• The probability mass function gives a true probability
– For a fair die, the chance to get any face is 1/6
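The difference between a density and a probability can be sketched with Python's standard library. The normal shape and its parameters (mean 180 lb, standard deviation 30 lb) are hypothetical and are not the distribution in the slide's figure.

```python
from statistics import NormalDist

# Hypothetical weight distribution in lb; mu and sigma are assumptions.
weight = NormalDist(mu=180, sigma=30)

def interval_prob(dist, lo, hi):
    """P(lo <= X <= hi), computed as a difference of cdf values."""
    return dist.cdf(hi) - dist.cdf(lo)

print(interval_prob(weight, 199, 201))  # probability of weighing 199-201 lb
print(interval_prob(weight, 200, 200))  # a single point has probability 0.0
print(weight.pdf(200))                  # density at 200 lb, not a probability
```

For a narrow interval the probability is roughly the density times the interval width, which is why the density alone cannot be read as a probability.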
Some common distributions
• Discrete:
– Binomial
– Multinomial
– Geometric
– Hypergeometric
– Poisson
• Continuous:
– Normal (Gaussian)
– Uniform
– Extreme value distribution (EVD)
– Gamma
– Beta
– …
Probabilistic Calculus
• If A, B are mutually exclusive:
– P(A U B) = P(A) + P(B)
• Thus: P(not(A)) = P(Ac) = 1 – P(A)
[Figure: Venn diagram of two non-overlapping events A and B]
Probabilistic Calculus
• P(A U B) = P(A) + P(B) – P(A ∩ B)
Conditional probability
• The joint probability of two events A and B, P(A ∩ B), or simply P(A, B), is the probability that events A and B both occur.
• The conditional probability P(A | B) is the probability that A occurs given that B occurred:
P(A | B) = P(A ∩ B) / P(B)
Example
• Roll a die
– If I tell you the number is less than 4
– What is the probability of an even number?
• P(d = even | d < 4) = P(d = even ∩ d < 4) / P(d < 4)
= P(d = 2) / P(d = 1, 2, or 3) = (1/6) / (3/6) = 1/3
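The same calculation can be checked by enumerating outcomes. This is a minimal sketch assuming a fair die with equally likely faces; `prob` and `cond_prob` are illustrative helper names.

```python
from fractions import Fraction

def prob(event, outcomes):
    """P(event) when all outcomes are equally likely."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b, outcomes):
    """P(a | b) = P(a and b) / P(b)."""
    return prob(lambda o: a(o) and b(o), outcomes) / prob(b, outcomes)

die = range(1, 7)
is_even = lambda d: d % 2 == 0
less_than_4 = lambda d: d < 4
print(cond_prob(is_even, less_than_4, die))  # 1/3
```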
Independence
• P(A | B) = P(A ∩ B) / P(B)
=> P(A ∩ B) = P(B) * P(A | B)
• A, B are independent iff
– P(A ∩ B) = P(A) * P(B)
– That is, P(A) = P(A | B)
• Also implies that P(B) = P(B | A)
– P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)
Examples
• Are the events d = even and d < 4 independent?
– P(d = even and d < 4) = 1/6
– P(d = even) = 1/2
– P(d < 4) = 1/2
– 1/2 * 1/2 = 1/4 > 1/6, so they are not independent
• If your die actually has 8 faces, will the events d = even and d < 5 be independent?
• Are the events "even in first roll" and "even in second roll" independent?
• For a playing card, are the suit and rank independent?
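These questions can be explored by checking P(A ∩ B) = P(A) * P(B) directly. The sketch below assumes equally likely outcomes (fair dice, a standard 52-card deck); the helper names are illustrative, and the card check tests one particular rank against one particular suit.

```python
from fractions import Fraction

def prob(event, outcomes):
    """P(event) when all outcomes are equally likely."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(a, b, outcomes):
    """True iff P(a and b) == P(a) * P(b)."""
    joint = prob(lambda o: a(o) and b(o), outcomes)
    return joint == prob(a, outcomes) * prob(b, outcomes)

even = lambda d: d % 2 == 0
six_sided = range(1, 7)
eight_sided = range(1, 9)
deck = [(rank, suit) for rank in range(13) for suit in range(4)]

print(independent(even, lambda d: d < 4, six_sided))    # False
print(independent(even, lambda d: d < 5, eight_sided))  # True
print(independent(lambda c: c[0] == 0, lambda c: c[1] == 0, deck))  # True
```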
Theorem of total probability
• Let B1, B2, …, BN be mutually exclusive events whose union equals
the sample space S. We refer to these sets as a partition of S.
• An event A can be represented as:
A = A ∩ S = (A ∩ B1) ∪ (A ∩ B2) ∪ … ∪ (A ∩ BN)
• Since B1, B2, …, BN are mutually exclusive, then
P(A) = P(A∩B1) + P(A∩B2) + … + P(A∩BN)
• And therefore
P(A) = P(A|B1)*P(B1) + P(A|B2)*P(B2) + … + P(A|BN)*P(BN)
= Σi P(A | Bi) * P(Bi)
Example
• Roll a loaded die: 50% of the time it shows 6, and 10% of the time each of 1 to 5
• What's the probability of getting an even number?
Prob(even)
= Prob(even | d < 6) * Prob(d<6)
+ Prob(even | d=6) * Prob(d=6)
= 2/5 * 0.5 + 1 * 0.5
= 0.7
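A short sketch that reproduces this result from the partition {d < 6, d = 6} and compares it with a direct sum over the even faces. The helper names are illustrative.

```python
from fractions import Fraction

# Loaded die from the example: six has probability 1/2, faces 1-5 have 1/10 each.
pmf = {face: Fraction(1, 10) for face in range(1, 6)}
pmf[6] = Fraction(1, 2)

def prob(event):
    """P(event) under the loaded-die pmf."""
    return sum(p for face, p in pmf.items() if event(face))

def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda f: event(f) and given(f)) / prob(given)

even = lambda f: f % 2 == 0
partition = [lambda f: f < 6, lambda f: f == 6]

# Theorem of total probability: sum over the partition of P(even | B) * P(B).
total = sum(cond(even, b) * prob(b) for b in partition)
print(total, prob(even))  # 7/10 7/10
```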
Another example
• We have a box of dice: 99% of them are fair, with probability 1/6 for each face; 1% are loaded so that six comes up 50% of the time. We pick a die at random and roll it; what's the probability we'll get a six?
• P(six) = P(six | fair) * P(fair) + P(six | loaded) * P(loaded)
– 1/6 * 0.99 + 0.5 * 0.01 = 0.17 > 1/6
Bayes theorem
• P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)
=> P(B | A) = P(A | B) * P(B) / P(A)
– P(B | A): posterior probability of B
– P(A | B): likelihood
– P(B): prior of B
– P(A): normalizing constant
This is known as Bayes Theorem or Bayes Rule, and is one of the most useful relations in probability and statistics
Bayes Theorem is the fundamental relation in Statistical Pattern Recognition
Bayes theorem (cont’d)
• Given B1, B2, …, BN, a partition of the sample
space S. Suppose that event A occurs; what is
the probability of event Bj?
• P(Bj | A) = P(A | Bj) * P(Bj) / P(A)
= P(A | Bj) * P(Bj) / (Σk P(A | Bk) * P(Bk))
Bj: different models
Given the observation A, should you choose the model that maximizes P(Bj | A) or P(A | Bj)? It depends on how much you know about Bj!
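A minimal sketch of Bayes theorem over a partition, using the earlier box of dice as the models (B1 = fair, B2 = loaded) and A = "rolled a six". The function name `posterior` is illustrative.

```python
def posterior(priors, likelihoods):
    """Return P(Bj | A) for each j, given P(Bj) and P(A | Bj)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(A), by the theorem of total probability
    return [j / evidence for j in joint]

priors = [0.99, 0.01]       # P(fair), P(loaded)
likelihoods = [1 / 6, 0.5]  # P(six | fair), P(six | loaded)
post = posterior(priors, likelihoods)

best_by_likelihood = max(range(2), key=lambda j: likelihoods[j])
best_by_posterior = max(range(2), key=lambda j: post[j])
print(post)                                   # roughly [0.97, 0.03]
print(best_by_likelihood, best_by_posterior)  # 1 0
```

After a single six, the likelihood alone favors the loaded die (index 1), while the posterior still favors the fair die (index 0) because the prior for a loaded die is small.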
Example
• Prosecutor’s fallacy
– Some crime happened
– The criminal did not leave any evidence, except some hair
– The police got the criminal's DNA from the hair
• Some expert matched the DNA with that of a suspect
– Expert said that both the false-positive and false-negative rates are 10^-6
• Can this be used as evidence of guilt against the suspect?
Prosecutor’s fallacy
• Prob (match | innocent) = 10^-6
• Prob (no match | guilty) = 10^-6
• Prob (match | guilty) = 1 - 10^-6 ~ 1
• Prob (no match | innocent) = 1 - 10^-6 ~ 1
• Prob (guilty | match) = ?
Prosecutor’s fallacy
P (g | m) = P (m | g) * P(g) / P (m)
~ P(g) / P(m)
• P(g): the probability for someone to be
guilty with no other evidence
• P(m): the probability for a DNA match
• How to get these two numbers?
– We don't really care about P(m)
– We want to compare two models:
• P(g | m) and P(i | m)
Prosecutor’s fallacy
• P(i | m) = P(m | i) * P(i) / P(m)
= 10^-6 * P(i) / P(m)
• Therefore
P(i | m) / P(g | m) = 10^-6 * P(i) / P(g)
• P(i) + P(g) = 1
• It is clear, therefore, that whether we can conclude the
suspect is guilty depends on the prior probability P(i)
• How do you get P(i)?
Prosecutor’s fallacy
• How do you get P(i)?
• It depends on what other information you have on the suspect
• Say the suspect has no other connection with the crime, and the overall crime rate is 10^-7
• That's a reasonable prior for P(g)
• P(g) = 10^-7, P(i) ~ 1
• P(i | m) / P(g | m) = 10^-6 * P(i) / P(g) = 10^-6 / 10^-7 = 10
• P(observation | model1) / P(observation | model2): likelihood-ratio test
• LR test
• Often take logarithm: log (P(m | i) / P(m | g))
• Log likelihood ratio (score)
• Or log odds ratio (score)
• Bayesian model selection:
log (P(model1 | observation) / P(model2 | observation))
= LLR + log P(model1) - log P(model2)
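The Bayesian model selection formula can be sketched with the numbers from this example, taking model1 = innocent and model2 = guilty. Base-10 logarithms are an assumption made so the result reads directly as a power of ten; the function name is illustrative.

```python
import math

def log_posterior_odds(lik1, lik2, prior1, prior2):
    """log10 of P(model1 | obs) / P(model2 | obs) = LLR + log prior1 - log prior2."""
    llr = math.log10(lik1 / lik2)  # log likelihood ratio
    return llr + math.log10(prior1) - math.log10(prior2)

# P(m | i) = 1e-6, P(m | g) = 1 - 1e-6, P(i) = 1 - 1e-7, P(g) = 1e-7
score = log_posterior_odds(1e-6, 1 - 1e-6, 1 - 1e-7, 1e-7)
print(round(score, 3))        # 1.0, i.e. posterior odds of about 10 to 1
print(round(10 ** score, 1))  # 10.0
```

The LLR alone is about -6 (strongly favoring guilt), and the prior term of about +7 reverses the decision.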
Prosecutor’s fallacy
• P(i | m) / P(g | m) = 10^-6 / 10^-7 = 10
• Therefore, we would say the suspect is more likely to be innocent than guilty, given only the DNA samples
• We can also explicitly calculate P(i | m):
P(m) = P(m|i)*P(i) + P(m|g)*P(g)
= 10^-6 * 1 + 1 * 10^-7
= 1.1 x 10^-6
P(i | m) = P(m | i) * P(i) / P(m) = 1 / 1.1 = 0.91
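The explicit calculation can be sketched as a small function of the prior P(g), without the "~ 1" approximations. The function and parameter names are illustrative; the rates are the ones quoted by the expert.

```python
def p_innocent_given_match(p_guilty, false_pos=1e-6, false_neg=1e-6):
    """P(i | m) via Bayes theorem with P(m) from the theorem of total probability."""
    p_innocent = 1 - p_guilty
    p_match = false_pos * p_innocent + (1 - false_neg) * p_guilty
    return false_pos * p_innocent / p_match

print(round(p_innocent_given_match(1e-7), 2))  # 0.91
```

Calling the function with a larger hypothetical prior, for example `p_innocent_given_match(0.5)`, gives a value near 10^-6, which shows how strongly the conclusion depends on P(g).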
Prosecutor’s fallacy
• If you have other evidence, P(g) could be much larger than the average crime rate
• In that case, the DNA test may give you higher confidence
• How to decide the prior?
– Subjective?
– Important?
– There have historically been debates about Bayesian statistics
– Some strongly support it, some are strongly against it
– Growing interest in many fields
• However, there is no question about conditional probability
• If all priors are equally likely, decisions based on Bayesian inference and the likelihood test are equivalent
• We use whichever is appropriate
Another example
• A test for a rare disease claims that it will report a positive result for 99.5% of people with the disease, and a negative result for 99.9% of those without.
• The disease is present in the population at
1 in 100,000
• What is P(disease | positive test)?
• What is P(disease | negative test)?
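One way to work these two questions is sketched below. It assumes the 99.9% figure is the rate of negative results among people without the disease, so the false-positive rate is 0.1%; that reading is an interpretation of the slide, and the function name is illustrative.

```python
def posterior_disease(prior, p_pos_given_disease, p_pos_given_healthy, positive=True):
    """P(disease | test result) by Bayes theorem."""
    if positive:
        like_d, like_h = p_pos_given_disease, p_pos_given_healthy
    else:
        like_d, like_h = 1 - p_pos_given_disease, 1 - p_pos_given_healthy
    numerator = like_d * prior
    return numerator / (numerator + like_h * (1 - prior))

prior = 1 / 100_000  # disease prevalence
p_pos = posterior_disease(prior, 0.995, 0.001, positive=True)
p_neg = posterior_disease(prior, 0.995, 0.001, positive=False)
print(p_pos)  # just under 1%
print(p_neg)  # about 5e-8
```

Under these assumptions a positive test still leaves the disease unlikely, because the prior of 1 in 100,000 outweighs the test's accuracy.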
Yet another example
• We've talked about the box of dice
• 99% fair, 1% loaded (50% at six)
• We said if we randomly pick a die and roll, we have a 17% chance to get a six
• If we get 3 sixes in a row, what's the chance that the die is loaded?
• How about 5 sixes in a row?
• P(loaded | 3 sixes in a row) = P(3 sixes in a row | loaded) * P(loaded) / P(3 sixes in a row) = 0.5^3 * 0.01 / (0.5^3 * 0.01 + (1/6)^3 * 0.99) = 0.21
• P(loaded | 5 sixes in a row) = P(5 sixes in a row | loaded) * P(loaded) / P(5 sixes in a row) = 0.5^5 * 0.01 / (0.5^5 * 0.01 + (1/6)^5 * 0.99) = 0.71
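The two posteriors above follow one pattern, sketched here for any number n of consecutive sixes. The function and parameter names are illustrative; rolls are assumed independent given the die.

```python
def p_loaded_given_sixes(n, p_loaded=0.01, six_loaded=0.5, six_fair=1 / 6):
    """P(loaded | n sixes in a row) by Bayes theorem."""
    joint_loaded = six_loaded ** n * p_loaded
    joint_fair = six_fair ** n * (1 - p_loaded)
    return joint_loaded / (joint_loaded + joint_fair)

for n in (3, 5):
    print(n, round(p_loaded_given_sixes(n), 2))
# 3 0.21
# 5 0.71
```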
Relation to multiple testing problem
• When searching a DNA sequence against a database,
you get a high score, with a significant p-value
• P(unrelated | high score) / P(related | high score) =
[P(high score | unrelated) * P(unrelated)] / [P(high score | related) * P(related)]
– P(high score | unrelated) / P(high score | related): likelihood ratio
• P(high score | unrelated) is much smaller than P(high
score | related)
• But your database is huge, and most sequences should
be unrelated, so P(unrelated) is much larger than
P(related)
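A sketch of how the prior can outweigh a small likelihood ratio. All numbers below are hypothetical and chosen only for illustration: a database with 1 related sequence among 1,000,000, a 10^-5 chance of a high score for an unrelated sequence, and a 0.9 chance for a related one.

```python
def posterior_odds_unrelated(p_high_unrelated, p_high_related, n_unrelated, n_related):
    """P(unrelated | high score) / P(related | high score)."""
    likelihood_ratio = p_high_unrelated / p_high_related
    prior_ratio = n_unrelated / n_related  # P(unrelated) / P(related)
    return likelihood_ratio * prior_ratio

# Hypothetical database and score probabilities (assumptions, not from the slides).
odds = posterior_odds_unrelated(1e-5, 0.9, 999_999, 1)
print(round(odds, 1))  # 11.1: the hit is still more likely unrelated than related
```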
Question
• We’ve seen that given a sequence of
observations, and two models, we can test
which model is more likely to generate the data
– Is the die loaded or fair?
– Either likelihood test or Bayes inference
• Given a set of observations, and a model, can
you estimate the parameters?
– Given the results of rolling a die, how to infer the
probability of each face?
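One common answer to the parameter estimation question, shown here only as a sketch and not necessarily the approach the lecture goes on to develop, is to use the relative frequency of each face, which is the maximum-likelihood estimate when rolls are assumed independent. The rolls below are made up.

```python
from collections import Counter

def estimate_face_probs(rolls, faces=range(1, 7)):
    """Relative-frequency estimate of P(face) from observed rolls."""
    counts = Counter(rolls)
    n = len(rolls)
    return {face: counts[face] / n for face in faces}

rolls = [6, 3, 6, 1, 6, 2, 6, 5, 4, 6]  # made-up observations
estimates = estimate_face_probs(rolls)
print(estimates[6])  # 0.5
```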
Question
• You are told that there are two dice: one is loaded, with a 50% chance of six, and one is fair.
• You are given a series of numbers resulting from rolling the two dice
• Assume die switching is rare
• Can you tell which number is generated by which die?
Question
• You are told that there are two dice: one is loaded, one is fair. But you don't know how it is loaded
• You are given a series of numbers resulting from rolling the two dice
• Assume die switching is rare
• Can you tell how the die is loaded and which number is generated by which die?