Transcript Document
COMP 538
Introduction to Bayesian Networks
Lecture 16: Wrap-Up
Phylogeny / Slide 2
Nevin L. Zhang, HKUST
Recap
Latent class models
Clustering
Clustering criterion: conditional independence
Drawback: Assumption too strong
Hierarchical latent class (HLC) models
Identifiability issues: regularity, equivalence
Hill climbing algorithm
Phylogeny / Slide 3
Today
Phylogenetic (evolution) trees
Closely related to HLC models
An example of viewing existing models in the framework of BN
– Another example: HMM
Interesting because
– Ease understanding
– Techniques in one field applied to another
Structural EM for phylogenetic trees
Dynamic BNs for speech understanding
– Development of general-purpose algorithms
Bayesian networks for classification
Hand waving only
Phylogeny / Slide 4
Phylogenetic Tree Outline
Introduction to phylogenetic trees
Probabilistic models of evolution
Tree reconstruction
Phylogeny / Slide 5
Phylogenetic Trees
Assumption
All organisms on Earth have a common ancestor
– This implies that any set of species is related.
Phylogeny
The evolutionary relationship among a set of species.
Phylogenetic tree
Usually, the relationship can be represented by a tree, which is
called a phylogenetic (evolution) tree
– This is not always true.
Phylogeny / Slide 6
Phylogenetic Trees
Phylogenetic trees
[Figure: phylogenetic tree with a vertical time axis; current-day species at the bottom: giant panda, lesser panda, moose, goshawk, duck, vulture, alligator]
Phylogeny / Slide 7
Phylogenetic Trees
Taxa (sequences) identify species
Edge lengths represent evolution time
Assumption: bifurcating tree topology
[Figure: bifurcating evolution tree over the sequences AAGACTT, AAGGCCT, AGCACTT, AAGGCAT, AGGGCAT, AGCACAA, TAGACTT, TAGCCCA, AGCGCTT, with time running downward]
Phylogeny / Slide 8
Probabilistic Models of Evolution
Characterize relationship between taxa using substitution probability:
– P(x | y, t): probability that ancestral sequence y evolves into sequence x
along an edge of length t
[Figure: example tree with root X7; internal nodes X5 and X6 reached by edges of lengths t5 and t6; leaves S1, S2 under X5 (edge lengths t1, t2) and S3, S4 under X6 (edge lengths t3, t4)]
– Model parameters: P(X7), P(X5|X7, t5), P(X6|X7, t6), P(S1|X5, t1), P(S2|X5, t2), ….
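Written out for this example, the tree defines a Bayesian network whose joint distribution factorizes as
P(X7, X5, X6, S1, S2, S3, S4) = P(X7) P(X5|X7, t5) P(X6|X7, t6) P(S1|X5, t1) P(S2|X5, t2) P(S3|X6, t3) P(S4|X6, t4).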
Phylogeny / Slide 9
Probabilistic Models of Evolution
What should P(x|y, t) be?
Two assumptions of commonly used models
There are only substitutions, no insertions/deletions (aligned)
– One-to-one correspondence between sites in different sequences
Each site evolves independently and identically
P(x | y, t) = ∏_{i=1}^{m} P(x(i) | y(i), t)
where m is the sequence length
[Figure: the aligned example sequences (AAGGCCT, AAGACTT, AGCACTT, AAGGCAT, AGGGCAT, TAGACTT, TAGCCCA, AGCACAA, AGCGCTT), illustrating the one-to-one correspondence between sites]
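A minimal sketch of this per-site factorization in Python; site_prob here is a hypothetical stand-in for a single-character substitution model (for instance the Jukes-Cantor model on the next slide):

def sequence_prob(x, y, t, site_prob):
    """P(x | y, t) for aligned sequences x and y of the same length m."""
    assert len(x) == len(y), "sequences must be aligned (no insertions/deletions)"
    p = 1.0
    for x_i, y_i in zip(x, y):         # each site evolves independently...
        p *= site_prob(x_i, y_i, t)    # ...and identically (same model, same t)
    return p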
Phylogeny / Slide 10
Probabilistic Models of Evolution
What should P(x(i)|y(i), t) be?
Jukes-Cantor (Character Evolution) Model [1969]
– Rate of substitution a (Constant or parameter?)
Substitution probabilities P(x(i) | y(i), t):

       A    C    G    T
  A    rt   st   st   st
  C    st   rt   st   st
  G    st   st   rt   st
  T    st   st   st   rt

rt = 1/4 (1 + 3 e^{-4at})
st = 1/4 (1 - e^{-4at})
Limit values when t = 0 or t = infinity?
Multiplicativity (lack of memory)
P(c | a, t1 + t2) = Σ_b P(b | a, t1) P(c | b, t2)
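A small sketch of the Jukes-Cantor site model, with numerical checks of the limit values and of multiplicativity; the substitution rate a = 1.0 is an arbitrary illustrative value:

import math

BASES = "ACGT"

def jc_site_prob(x, y, t, a=1.0):
    """Jukes-Cantor P(x | y, t): rt on the diagonal, st off the diagonal."""
    rt = 0.25 * (1 + 3 * math.exp(-4 * a * t))
    st = 0.25 * (1 - math.exp(-4 * a * t))
    return rt if x == y else st

# Limit values: at t = 0 the site is unchanged (rt = 1, st = 0);
# as t -> infinity every base becomes equally likely (probability 1/4).
assert abs(jc_site_prob("A", "A", 0.0) - 1.0) < 1e-12
assert abs(jc_site_prob("A", "C", 100.0) - 0.25) < 1e-6

# Multiplicativity (lack of memory):
# P(c | a, t1 + t2) = sum over b of P(b | a, t1) * P(c | b, t2)
t1, t2 = 0.3, 0.7
direct = jc_site_prob("G", "A", t1 + t2)
chained = sum(jc_site_prob(b, "A", t1) * jc_site_prob("G", b, t2) for b in BASES)
assert abs(direct - chained) < 1e-12

Plugging jc_site_prob into the sequence_prob sketch from Slide 9 gives the full P(x | y, t) for a pair of aligned sequences.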
Phylogeny / Slide 11
Tree Reconstruction
Given: a collection of current-day taxa, e.g.,
AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT
Find: tree
Tree topology: T
Edge lengths: t
Maximum likelihood
Find the tree that maximizes
P(data | tree)
[Figure: a candidate tree with the five taxa at its leaves]
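A brute-force sketch of P(data | tree) for one site of the Slide 8 example tree, assuming the Jukes-Cantor model and a uniform root distribution; the edge lengths and observed bases below are made-up illustrative values:

import math
from itertools import product

BASES = "ACGT"

def jc(x, y, t, a=1.0):
    """Jukes-Cantor single-site substitution probability P(x | y, t)."""
    e = math.exp(-4 * a * t)
    return 0.25 * (1 + 3 * e) if x == y else 0.25 * (1 - e)

def site_likelihood(leaves, t):
    """P(one site | tree) for the Slide 8 example tree:
    root X7 with children X5 (edge t5) and X6 (edge t6);
    leaves S1 (t1), S2 (t2) under X5 and S3 (t3), S4 (t4) under X6.
    The hidden ancestral variables X7, X5, X6 are summed out."""
    s1, s2, s3, s4 = leaves
    total = 0.0
    for x7, x5, x6 in product(BASES, repeat=3):
        total += (0.25                                    # uniform prior P(X7)
                  * jc(x5, x7, t["t5"]) * jc(x6, x7, t["t6"])
                  * jc(s1, x5, t["t1"]) * jc(s2, x5, t["t2"])
                  * jc(s3, x6, t["t3"]) * jc(s4, x6, t["t4"]))
    return total

# Illustrative edge lengths and one observed site (A, A, G, G) at the leaves.
t = {"t1": 0.1, "t2": 0.2, "t3": 0.15, "t4": 0.3, "t5": 0.25, "t6": 0.4}
print(site_likelihood(("A", "A", "G", "G"), t))

The log-likelihood of the whole data set is the sum of log site likelihoods over all sites; maximum likelihood reconstruction searches over topologies T and edge lengths t to maximize it. Efficient implementations replace the brute-force sum over ancestral states with the sum-product (pruning) algorithm.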
Phylogeny / Slide 12
Tree Reconstruction
When restricted to one particular site, a phylogenetic tree is an HLC
model where
The structure is a binary tree and variables share the same state space.
The conditional probabilities are from the character evolution model,
parameterized by edge lengths instead of the usual parameterization.
The model is the same for different sites
[Figure: the example evolution tree with ancestral and current-day sequences at its nodes, as on Slide 7]
Phylogeny / Slide 13
Tree Reconstruction
Current-day Taxa: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA,
AGCGCTT
Samples for the HLC model: one sample per site; the samples are i.i.d.
(see the sketch below)
1st site: (A, T, T, A, A)
2nd site: (G, A, A, G, G)
3rd site: (G, G, G, C, C)
…
[Figure: the example evolution tree with the current-day taxa at its leaves]
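A minimal sketch of extracting these per-site samples from the aligned taxa (plain Python; variable names are illustrative):

# Turn aligned current-day taxa into one HLC-model sample per site.
taxa = ["AGGGCAT", "TAGCCCA", "TAGACTT", "AGCACAA", "AGCGCTT"]

# zip(*taxa) reads the alignment column by column: each column is one site,
# i.e. one i.i.d. sample of the observed (leaf) variables.
samples = [tuple(site) for site in zip(*taxa)]

print(samples[0])  # ('A', 'T', 'T', 'A', 'A')  -- 1st site
print(samples[1])  # ('G', 'A', 'A', 'G', 'G')  -- 2nd site
print(samples[2])  # ('G', 'G', 'G', 'C', 'C')  -- 3rd site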
Phylogeny / Slide 14
Tree Reconstruction
Finding ML phylogenetic tree == Finding ML HLC model
Model space:
Model structures: binary tree where all variables share the same
state space, which is known.
Parameterization: one parameter for each edge. (In general, P(x|y)
has (|x|-1)·|y| free parameters.)
Phylogeny / Slide 15
Bayesian Networks for Classification
The problem:
Given data:
Find mapping
– (A1, A2, …, An) ↦ C
Possible solutions
ANN
Decision tree (Quinlan)
…
A1   A2   …   An   C
0    1    1   0    T
1    0    1   1    F
..   ..   ..  ..   ..
Phylogeny / Slide 16
Bayesian Networks for Classification
Naïve Bayes model
From data, learn
– P(C), P(Ai|C)
Classification
– arg max_c P(C=c|A1=a1, …, An=an)
Very good in practice
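A minimal sketch of learning and applying a Naive Bayes classifier by maximum-likelihood counting (no smoothing); the data format and names are illustrative:

from collections import Counter, defaultdict

# Illustrative training data: rows of attribute values plus a class label.
data = [((0, 1, 1, 0), "T"),
        ((1, 0, 1, 1), "F"),
        ((0, 1, 0, 0), "T"),
        ((1, 1, 1, 0), "F")]

# Learn P(C) and P(Ai | C) by counting.
class_counts = Counter(c for _, c in data)
attr_counts = defaultdict(Counter)          # (i, c) -> counts of Ai values
for attrs, c in data:
    for i, a in enumerate(attrs):
        attr_counts[(i, c)][a] += 1

def classify(attrs):
    """arg max_c P(C=c) * prod_i P(Ai=ai | C=c)."""
    def score(c):
        p = class_counts[c] / len(data)
        for i, a in enumerate(attrs):
            p *= attr_counts[(i, c)][a] / class_counts[c]
        return p
    return max(class_counts, key=score)

print(classify((0, 1, 1, 0)))  # -> "T"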
Phylogeny / Slide 17
Bayesian Networks for Classification
Drawback of NB:
Attributes are assumed mutually independent given the class variable
Often violated, leading to double counting.
Fixes:
General BN classifiers
Tree augmented Naïve Bayes (TAN) models
Hierarchical NB
…
Phylogeny / Slide 18
Bayesian Networks for Classification
General BN classifier
Treat class variable just as another variable
Learn a BN.
Classify the next instance based on values of variables in the
Markov blanket of the class variable.
Pretty bad because it does not utilize all available information
Phylogeny / Slide 19
Bayesian Networks for Classification
TAN model
Friedman, N., Geiger, D., and Goldszmidt, M. (1997). Bayesian
network classifiers. Machine Learning, 29:131-163.
Capture dependence among attributes using a tree structure.
During learning,
– First learn a tree among attributes: use the Chow-Liu algorithm (sketched below)
– Add class variable and estimate parameters
Classification
– arg max_c P(C=c|A1=a1, …, An=an)
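A rough sketch of the Chow-Liu step used by TAN: estimate pairwise mutual information between attributes from data, then keep a maximum-weight spanning tree. Plain MI is used here for brevity (TAN proper weights edges by class-conditional MI), and the data is illustrative:

import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) between two attribute columns."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def chow_liu_tree(columns):
    """Maximum-weight spanning tree over attributes, weighted by MI (Kruskal)."""
    edges = sorted(((mutual_information(columns[i], columns[j]), i, j)
                    for i, j in combinations(range(len(columns)), 2)),
                   reverse=True)
    parent = list(range(len(columns)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    tree = []
    for weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j, weight))
    return tree

# Illustrative data: one column of observed values per attribute A1, A2, A3.
columns = [[0, 1, 1, 0, 1], [0, 1, 1, 0, 0], [1, 0, 1, 1, 0]]
print(chow_liu_tree(columns))  # list of (i, j, MI) edges of the attribute tree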
Phylogeny / Slide 20
Bayesian Networks for Classification
Hierarchical Naïve Bayes models
N. L. Zhang, T. D. Nielsen, and F. V. Jensen (2002). Latent variable
discovery in classification models. Artificial Intelligence in Medicine, to
appear.
Capture dependence among attributes using latent variables
Detect interesting latent structures besides classification
Currently, slow