Monica Sleumer Presentation

Functional Genomic Hypothesis
Generation and Experimentation
by a Robot Scientist
King et al., Nature 2004, 427:247-252
Presented by Monica C. Sleumer
February 5, 2004
Scientific Discovery
“Branch of AI devoted to developing algorithms for
acquiring scientific knowledge”
Current applications:
– Analysis of mass-spec data
– Discovering structure-activity relationships for compounds
– Making semantic connections in published literature
– Predicting mechanisms for chemical reactions
– Revising taxonomies to accommodate new data
Connect to laboratory instrumentation
Accomplishment
Automated entire scientific process
Robotic system that uses AI to “carry out
cycles of scientific experimentation”:
– Originates hypotheses
– Designs experiments
– Performs the experiments
– Interprets the results
Application: Functional genomics
Function unknown for 30% of yeast genes
Complete laboratory automation possible
Goal: connect genes to their function
Using:
– Logical model of aromatic amino acid
synthesis pathway
– 8 deletion mutants
– 9 metabolites
– Auxotrophic growth experiments
Aromatic Amino Acid Pathway
Classical vs Robot Science
Classical method:
– Scientific expertise and imagination used to
form hypotheses
– Consequences of hypotheses tested by
experiment
Robot Scientist:
– Hypotheses formed by abduction
– Tested by deduction
Deduction and Abduction
Deduction
– Rule: P → Q, Fact: ~Q, Infer: ~P
– E.g. If a cell grows on minimal medium, then it can synthesise tryptophan
– Fact: Cell cannot synthesise tryptophan
– ∴ Cell cannot grow on minimal medium
Abduction
– Rule: P → Q, Fact: ~P, Hypothesize: ~Q
– E.g. If a cell grows on minimal medium, then it can synthesise tryptophan
– Fact: Cell cannot grow on minimal medium
– ∴ Cell cannot synthesise tryptophan (a hypothesis, not a certainty)
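The contrast between the two inference rules above can be sketched in a few lines of Python. This is only an illustration of the slide's example: the `Rule` type and the `deduce` and `abduce` helpers are hypothetical names, not part of the Robot Scientist software. Deduction yields a conclusion that must hold, while abduction yields a candidate explanation that still has to be tested by experiment.

```python
from typing import NamedTuple, Optional, Set


class Rule(NamedTuple):
    """An implication P -> Q (hypothetical helper type for illustration)."""
    antecedent: str  # P
    consequent: str  # Q


def deduce(rule: Rule, false_facts: Set[str]) -> Optional[str]:
    """Deduction as on the slide: from P -> Q and ~Q, infer ~P.

    Returns the statement shown to be false, or None if the rule does not apply.
    """
    if rule.consequent in false_facts:
        return rule.antecedent
    return None


def abduce(rule: Rule, false_facts: Set[str]) -> Optional[str]:
    """Abduction as on the slide: from P -> Q and ~P, hypothesize ~Q.

    The result is only a plausible explanation and must be tested.
    """
    if rule.antecedent in false_facts:
        return rule.consequent
    return None


rule = Rule("grows on minimal medium", "can synthesise tryptophan")

certain = deduce(rule, {"can synthesise tryptophan"})
hypothesis = abduce(rule, {"grows on minimal medium"})

print(f"Deduced (certain): NOT '{certain}'")
print(f"Abduced (hypothesis to test): NOT '{hypothesis}'")
```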
Implementation
Software:
– Background knowledge
– Logical inference engine
– Hypothesis generation code
– Experiment selection code
– LIMS code
Hardware:
– Liquid-handling robot
– Plate reader
– CPU to do the scientific reasoning
No human intellectual input into:
– Experimental design
– Data interpretation
Robot Scientist
Logical Process
Prolog used to model data
Metabolic pathway represented as a
directed graph
Deduction: a knockout mutant will grow
if and only if a path can be found from the given
metabolites to the 3 required aromatic amino acids.
Abduction: if a knockout mutant doesn’t
grow using the given metabolites:
hypothesize which enzyme is missing
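The graph-based reasoning above can be sketched as follows. The original system used Prolog; this Python version is a minimal stand-in, and the pathway, metabolite names (S, A, B, T1, T2), and enzyme names (E1 to E4) are invented for illustration and are not the real aromatic amino acid pathway. Deduction is a reachability check from the supplied metabolites to the required end products; abduction lists the enzymes whose absence would explain the observed growth results.

```python
from collections import deque

# Hypothetical toy pathway: (substrate, product, enzyme) edges of a directed graph.
REACTIONS = [
    ("S", "A", "E1"),
    ("A", "B", "E2"),
    ("B", "T1", "E3"),
    ("B", "T2", "E4"),
]
TARGETS = {"T1", "T2"}  # end products the cell needs in order to grow


def reachable(supplied, missing_enzyme=None):
    """Breadth-first search over reactions, skipping the knocked-out enzyme."""
    seen = set(supplied)
    queue = deque(supplied)
    while queue:
        metabolite = queue.popleft()
        for substrate, product, enzyme in REACTIONS:
            if substrate == metabolite and enzyme != missing_enzyme and product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


def predicts_growth(supplied, missing_enzyme=None):
    """Deduction: growth iff every target is reachable from the supplied metabolites."""
    return TARGETS <= reachable(supplied, missing_enzyme)


def abduce_missing_enzymes(observations):
    """Abduction: enzymes whose loss is consistent with all (supplied, grew) observations."""
    enzymes = sorted({enzyme for _, _, enzyme in REACTIONS})
    return [
        enzyme
        for enzyme in enzymes
        if all(predicts_growth(supplied, enzyme) == grew for supplied, grew in observations)
    ]


# A mutant that fails on S alone but grows when B is added: E1 or E2 could be missing.
first_round = [({"S"}, False), ({"S", "B"}, True)]
print(abduce_missing_enzymes(first_round))

# One further experiment (adding A) separates the two remaining hypotheses.
second_round = first_round + [({"S", "A"}, False)]
print(abduce_missing_enzymes(second_round))
```

In this toy case each additional growth experiment narrows the set of surviving hypotheses, which is why the choice of the next experiment matters for cost.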
Machine Learning
Improves performance based on prior
experience
Each hypothesis has
– Cost of testing
– Probability of being correct
Goals
– Find out which gene goes with which enzyme
– Use the fewest possible resources
Experiment Choosing
3 ways:
– Intelligent: “ASE”
– Cheapest Experiment: Naïve
– Random Experiment
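The three selection strategies can be compared with a small sketch. The hypotheses, priors, experiments, and costs below are invented, and `choose_informed` is a simplified stand-in that picks the experiment with the lowest cost per unit of expected hypothesis probability eliminated; it is not the ASE formula from the paper, only an illustration of weighing cost against the probability of each hypothesis.

```python
import random

# Hypothetical prior probability that each hypothesis is correct.
PRIORS = {"H1": 0.5, "H2": 0.3, "H3": 0.2}

# Hypothetical experiments: a cost and the growth outcome each hypothesis predicts.
EXPERIMENTS = {
    "cheap": {"cost": 1.0, "predicts": {"H1": True, "H2": True, "H3": True}},
    "medium": {"cost": 2.0, "predicts": {"H1": True, "H2": False, "H3": False}},
    "costly": {"cost": 5.0, "predicts": {"H1": True, "H2": True, "H3": False}},
}


def expected_eliminated(experiment):
    """Expected prior mass ruled out by one run of the experiment."""
    p_grow = sum(PRIORS[h] for h, grows in experiment["predicts"].items() if grows)
    p_no_grow = sum(PRIORS[h] for h, grows in experiment["predicts"].items() if not grows)
    # Whichever outcome occurs, hypotheses predicting the other outcome are eliminated.
    return 2 * p_grow * p_no_grow


def choose_naive():
    """Cheapest experiment, regardless of how informative it is."""
    return min(EXPERIMENTS, key=lambda name: EXPERIMENTS[name]["cost"])


def choose_random(rng):
    """Uniformly random experiment."""
    return rng.choice(sorted(EXPERIMENTS))


def choose_informed():
    """Simplified stand-in for an intelligent chooser (assumption, not the paper's ASE)."""
    informative = {
        name: exp["cost"] / expected_eliminated(exp)
        for name, exp in EXPERIMENTS.items()
        if expected_eliminated(exp) > 0
    }
    return min(informative, key=informative.get)


print("Naive:", choose_naive())
print("Random:", choose_random(random.Random(0)))
print("Informed:", choose_informed())
```

In this invented example the cheapest experiment cannot distinguish any hypotheses, so the naive strategy spends resources without learning anything, while the informed choice pays more for an experiment that splits the hypotheses.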
Performance:
– Accuracy: # of correct predictions made
– Cost and number of experiments required
Both real experiments and simulations
Comparison to human
Accuracy of the Experiment Choosers
[Figure: accuracy of the ASE, Naive, and Random experiment choosers]
Results of Computer Simulations
[Figure: simulation results for the Naive, ASE, and Random choosers, with and without noise]
Conclusions
Scientific process can be automated
Experiment selection strategies have significant
impact on cost
In terms of cost, ASE outperforms
– Naïve by 3-fold
– Random by 100-fold
Performance is competitive with human
Cost-effectiveness of science can be improved
Future Work
Extend system to uncover function of other
metabolic genes
Would need to:
– Extend model to entire biochemical pathway
in KEGG
– Become more robust in terms of possible
errors in KEGG
– Include prediction of previously unknown
enzymes
Criticisms
Downplays how little of the pathway
was actually tested
Not clear how deletion mutants were
chosen
No example of experiment cycle
Too large a jump from theory to results
Results graphs too crowded
Discussion Questions
Would computer-generated experiments and
results be accepted?
How much would we have to understand about
a computer-generated discovery process?
Compare this system to the currently common
method of:
– Large-scale generation of data
– Extraction of knowledge by data-mining systems
What other aspects of genome analysis could
scientific discovery be applied to?