Slides - TAMU Computer Science Faculty Pages

Download Report

Transcript Slides - TAMU Computer Science Faculty Pages

Learning Multiple Evolutionary Pathways
from Cross-sectional Data
Niko Beerenwinkel, Jorg Rahnenfuhrer, Martin Daumer,
Daniel Hoffmann,Rolf Kaiser, Joachim Selbig,Thomas Lengauer, RECOMB’04
Ning Wei
Texas A&M University
Content
•
•
•
•
•
Introduction
Definition
Algorithm
Discussion
Homework
Introduction
• A directed weighted trees, which generates
the probability distribution on the set of all
patterns of genetic events, to identify directed
dependencies between mutational events in
the evolutionary processes.
• Vertices represent the mutational events.
• Edge weights represent conditional
probabilities between events.
• EM (Expectation--Maximization)-like Learning
Algorithm to generate the mutagenetic trees.
Definition
• Different events l {1, …, l}, initially as “null events”.
• A pattern xi of events: xi = (xi1, . . . , xil)
• A set of observed patterns is represented by the matrix:
• A mutagenetic tree
consists of vertices V,
edges E with at most one entering edge, null events or
no entering edge vertex r, and a map p(e) =Pr(j2|j1) >[0,1] (the conditional probability of event j2 given j1
occurred, if p(e)=0, delete e from E).
Definition (cont’d)
• Here is the example of
the mutagenetic tree.
Definition (cont’d)
• Since the noisy real world data, they have
likelihood zero under estimated mutagenetic
tree and do not capture all of the pathways.
• K-mutagenetic trees mixture model:
• Given
and is a random
variable in [0,1], the model:
• The likelihood of a pattern x
where
and S is the set
of all vertices reachable from r in the sub tree
(V, E’)
Definition (cont’d)
• Example for calculating the
likelihood between two events
70R and 219Q:
• L(x|T)=0.46*0.43*(1-0.46)*(10.65)=0.037
Definition (cont’d)
• Under the mixture model M, the tree would be
a star with all patterns of events positive
likelihood.
• Assume that in addition to different pathways
of accumulation of events, there is a certain
probability  of any event occurring
spontaneously independent of all other events
referred as the noise component of the model.
• The algorithm assigns the  to T1, and T1 is a
star tree with p(e)=  for all eE1.
EM-like Learning Algorithm
• To construct k-mutagenetic mixture model
trees that maximize the log-likelihood of the
data from the observed patterns X.
EM-like Learning Algorithm (cont’d)
• Initialize the parameters, set the
responsibilities for estimating the joint
probabilities:
EM-like Learning Algorithm (cont’d)
• Update the model parameters:
EM-like Learning Algorithm (cont’d)
• Compute the weight w from the previous pk
and find out the maximum weight tree Tk:
• Pr(j) the marginal probability of event j; Pr(j1,
j2) is the joint probability of events j1 and j2
estimated by the previous steps.
EM-like Learning Algorithm (cont’d)
• Update the responsibilities:
EM-like Learning Algorithm (cont’d)
• Iterate the above steps until the log-likelihood
function doe not increase any more.
• According to the distribution of log-likelihood
and the number of trees K, pick K=3.
EM-like Learning Algorithm (cont’d)
• 3-Mutagenetics trees mixture model:
Discussion
• The mixture model maps 87% of the observed patterns
into their identified mutagenetic trees.
• Study the stability of the mutagenetic trees by bootstrap.
• Applications are the resistance pathways by drugs,
designing the treatment protocols, combination therapy
pathways and so on.
• EM learning algorithm could be extended to a modelbased clustering algorithm.
• The mixture model may be adequate to identify a few
parallel pathways, while directed acyclic graph (DAG)
would be expected to be more appropriate for highly
connected network of events aided by the ML (maximum
likelihood) estimation of the model parameters.
Homework
• Modify the K-mutagenetic tress mixture model
learning algorithm to obtain a clustering
algorithm. Explain the reasons and the
predefined conditions for applying your
clustering algorithm. (Hint: section 6.2 of the
paper.)
Thanks
• Any question, please email me at
[email protected]