Combining Inductive Logic Programming, Active
Download
Report
Transcript Combining Inductive Logic Programming, Active
Combining Inductive Logic Programming,
Active Learning and Robotics to Discover
the Function of Genes
by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell,
P. Reiser and R.D. King
Presenter: Mark H. Rich
2/7/2003
University of Wisconsin - Madison
CS 838 Learning and Modeling Biological Networks
Discovering Gene Function
Yeast (S. cerevisiae) has 6,000 proteinencoding genes
Only 60% can be assigned function with
confidence
The cell is a bio-chemical machine
Logic can help us discover these
metabolic functions and networks
Robot Scientist
Background
Knowledge
Learning
Engine
Analysis
New
Knowledge
Experiment
Selection
Results
ASE-Progol
Outline
Introduction
Abduction and Active Learning
Functional Genomics
Metabolism in Logic
Experiments
Results
Logic in AI
Deduction
Given facts with sound and complete proof
theory, show that other facts can be proven
Induction
Given positive and negative examples of
facts and background knowledge, find
hypothesis that explains difference
between positives and negatives
Abduction and TCIE
Given a theory and partial facts,
discover what facts are missing to form
one consistent hypothesis
Lateral Thinking Puzzles
Presented with a confusing situation
There is an Oracle that knows what
happened
You can only ask yes or no questions
The Mysterious Package
One day a man received a parcel in the
post. Carefully packed was a human
arm. He examined it, repacked it and
then sent it on to another man. The
second man also carefully examined the
arm before taking it to the woods and
burying it. Why did they do this?
The Mysterious Package
Was the arm cut off intentionally?
Is the arm’s person still alive?
Is he a doctor?
Did the three men know each other?
Are the other men also missing an arm?
Were they ever stuck on a desert island with
no food, make a pact to each cut off an arm
to eat and survive, but were rescued before
the doctor could cut off his own arm, and the
doctor later fulfilled his commitment? YES!
Lateral Thinking Lessons
Certain questions are valuable and lead
to large leaps of information . . .
How do we form hypotheses?
How can we pick good questions?
probability that question leads to consistent
hypotheses
cost of asking question
We want to find quickest cheapest path
to consistent hypotheses
Hypothesis Generation
Use contra-positives for inverse
entailment
Background Knowledge
hasbeak(X) :- bird(X).
bird(X) :- vulture(X).
Example
hasbeak(tweety).
Hypotheses
bird(tweety).
bird(X).
vulture(tweety).
vulture(X).
Trial Selection Theory
e1 e2 e3 e4
One possible trial path
e1
H1 0
1
1
1
H2 1
1
0
1
e2
t
H3 1
0
1
1
f
t
H2
f
H3
H1
Hypothesis Probability
Each trial partitions H into {H[t],H[t’]}
Assuming optimal encoding scheme…
Prior probability of each hypothesis
2Compression( hi |E )
p(hi | E)
Compression( h|E )
2
hH
Compression is rounded f measure
E
f E
(l1 l2 n)
p
Experiment Cost
Ct is the cost of a trial t
p(t)
p(h)
h H t
Ct p(t)(meant' (T t )Ct' )J H t
EC(H,T) min t T
(1
p(t))(mean
C
)J
t'
(T
t
)
t'
H t
J H p(h)log 2 ( p(h))
hH
Functional Genomics
Want to learn gene-enzyme mapping
Genes encode for
Enzymes that catalyze reactions between
Metabolites to eventually create
Amino Acid Products
Perform auxotrophic growth
experiments to determine phenotype
Functional Genomics: Simple
gene1
X
A
gene2
Y
B
gene3
Z
C
Trp
A, B and C are Enzymes
X is ubiquitous metabolite, Y and Z optional
If we knock out gene2, we need to add nutrient Z to
produce Trp
want to learn codes(gene2, B, [Y], [Z]) but only ask:
pheno_effect(gene2,[Y]) is false
pheno_effect(gene2,[Z]) is true
pheno_effect(gene2,[Y,Z]) is true
Aromatic amino acid pathway
aromatic amino acids
enzymes
metabolites
Metabolism in Logic
Hypotheses:
codes(‘YDR254W’, ‘4.2.1.11’, [‘C00631’],[‘C00074’]).
codes(‘YDR254W’, ‘5.3.1.24’, [‘C04302’],[‘C01302’]).
etc ...
Background Knowledge:
enzyme(‘4.2.1.11’,[‘C00631’],[‘C00074’]).
enzyme(‘5.3.1.24’,[‘C04302’],[‘C01302’]).
etc ...
generated_by_other_pathways([‘C00002’, ‘C00005’,
‘C00006’, ... , ‘C03356’]).
ends([‘C00078’, ‘C00079’, ‘C00082’]).
Metabolism in Logic
What the Oracle answers:
phenotypic_effect(ORF, Growth_medium):generated_by_other_pathways(Ubiquitous_metabolites),
union(Ubiquitous_metabolites, Growth_medium, Starts),
connected(Starts, Wild_products),
ends(Ends),
subset(Wild_products, Ends),
enz(Enzyme, Reactants, Products),
encodes(ORF, Enzyme, Reactants, Products),
connected_without_this_step(Starts, Mutant_products,
Enzyme, Reactants, Products),
not(subset(Mutant_products, Ends)).
Experiments
Learn function of 17 genes by removing ORF
Growth Media
13 optional nutrients, at most 3 at a time
378 possible experiments for each ORF
Cost of Optional Nutrients
Determined from www.sigmaaldrich.com catalog
Strategies for Comparison
Random
Naïve Cheapest
ASE-Progol
Experiments
Remove all codes(…) facts
Loop
Generate random sample of trials
Generate hypotheses using Theory
Completion by Inverse Entailment
Find minimum EC(H,T) trial and perform
Add results to known examples
until hypotheses consistent with trials
Results:
Cost
Results: Time
Conclusions and Future Work
ASE-Progol finds hypotheses inexpensively
and quickly
5 of 17 genes had only negative examples…
why? Look into inhibitors and nonmonotonic
logics.
Limited answers to yes/no. Probabilities?
Can this be applied to gene regulatory
networks, using microarray technology?
What other networks have similar
frameworks?