AI and Bioinformatics - Dave Reed's Home Page

AI and Bioinformatics
From Database Mining to the Robot Scientist
History of Bioinformatics




The definition of bioinformatics is debated.
In 1973, Herbert Boyer and Stanley Cohen invented DNA cloning.
By 1977, a method for sequencing DNA had been developed.
In 1981, the Smith-Waterman algorithm for sequence alignment was published.

By 1981, 579 human genes had been mapped.
In 1985, the FASTP algorithm was published.
In 1988, the Human Genome Organisation (HUGO) was founded.



Bioinformatics was fuelled by the need to create huge databases.
AI and heuristic methods can provide key solutions to the new challenges posed by the progressive transformation of biology into a data-intensive science.
 Data mining
In 1990, the BLAST program was implemented.
 BLAST: Basic Local Alignment Search Tool
 A program for searching biosequence databases



Scientists use scripting languages such as Perl and Python.
By 1991, a total of 1,879 human genes had been mapped.
In 1996, Genethon published the final version of the Human Genetic Map. This concluded the first phase of the Human Genome Project.
Year   Organism                   Genome size (Mbp, millions of base pairs)
1995   Haemophilus influenzae     1.8
1996   Baker's yeast              12.1
1997   E. coli                    4.7
2000   Pseudomonas aeruginosa     6.3
2000   A. thaliana                100
2000   D. melanogaster            180
2001   Human genome               3,000
2002   House mouse                2,500
Bioinformatics Today

There are several important problems where AI approaches are particularly promising:
 Prediction of protein structure
 Semiautomatic drug design
 Knowledge acquisition from genetic data
Functional Genomics and the Robot Scientist




Robot scientist developed by University of Wales researchers
Designed for the study of functional genomics
Tested on yeast metabolic pathways
Utilizes logical and associationist knowledge representation schemes
Ross D. King, et al., Nature, January 2004
[Figure: The Robot Scientist. Source: BBC News]
[Figure: Yeast metabolic pathways]
[Figure: Hypothesis generation and experimentation loop. Ross D. King, et al., Nature, January 2004]
Integration of Artificial Intelligence



Utilizes a Prolog database to store background biological information
Prolog can inspect biological information, infer knowledge, and make predictions
The optimal hypothesis is determined using machine learning, which weighs probabilities against the associated cost
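
The idea of weighing probabilities against cost can be illustrated with a small Python sketch. It is not the method from King et al.; it assumes a simple heuristic (expected information gain per unit cost) as a stand-in. The gene names, experiment names, prior probabilities, and costs are all made up for illustration.

```python
import math

# Hypothetical hypotheses: which gene encodes the enzyme for one pathway step.
# The prior probabilities are invented for this example.
priors = {"geneA": 0.5, "geneB": 0.3, "geneC": 0.2}

# Hypothetical experiments. Each has a cost and, for every hypothesis, the
# outcome that hypothesis predicts (True = growth, False = no growth).
experiments = {
    "knockout_A": {"cost": 1.0,
                   "predicts": {"geneA": False, "geneB": True, "geneC": True}},
    "knockout_B": {"cost": 1.0,
                   "predicts": {"geneA": True, "geneB": False, "geneC": True}},
    "expensive_assay": {"cost": 5.0,
                        "predicts": {"geneA": False, "geneB": False, "geneC": True}},
}


def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_entropy_after(priors, predicts):
    """Expected entropy over hypotheses once the experiment's outcome is known."""
    total = 0.0
    for outcome in (True, False):
        # Keep only the hypotheses that predicted this outcome.
        consistent = {h: p for h, p in priors.items() if predicts[h] == outcome}
        p_outcome = sum(consistent.values())
        if p_outcome == 0:
            continue
        posterior = {h: p / p_outcome for h, p in consistent.items()}
        total += p_outcome * entropy(posterior)
    return total


def best_experiment(priors, experiments):
    """Pick the experiment with the highest expected information gain per cost."""
    h0 = entropy(priors)

    def gain_per_cost(name):
        exp = experiments[name]
        gain = h0 - expected_entropy_after(priors, exp["predicts"])
        return gain / exp["cost"]

    return max(experiments, key=gain_per_cost)


print(best_experiment(priors, experiments))  # knockout_A
```

With these made-up numbers, the cheap knockout that splits the hypotheses most evenly wins, while the costly assay is penalized even though it also narrows the field.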
Experimental Results


Performance similar to humans
Performance significantly better than “naïve” or “random” selection of experiments
For 70% classification accuracy:
 A hundredth of the cost of random selection
 A third of the cost of naïve selection
Ross D. King, et al., Nature, January 2004
Major Challenges and Research Issues


Requires individuals with knowledge of both disciplines
Requires collaboration of individuals from diverse disciplines




Data generation in biology and bioinformatics is outpacing methods of data analysis
Data interpretation and hypothesis generation require intelligence
AI offers established methods for knowledge representation and “intelligent” data interpretation
The use of AI in bioinformatics is predicted to increase
References and Additional Resources
Ross D. King, Kenneth E. Whelan, Ffion M. Jones, Philip G. K. Reiser, Christopher H. Bryant, Stephen H. Muggleton, Douglas B. Kell & Stephen G. Oliver. Functional Genomic Hypothesis Generation and Experimentation by a Robot Scientist. Nature 427, 247-252 (15 January 2004).
A Short History of Bioinformatics - http://www.netsci.org/Science/Bioinform/feature06.html
History of Bioinformatics - http://www.geocities.com/bioinformaticsweb/his.html
National Center for Biotechnology Information - http://www.ncbi.nih.gov
PubMed - http://www.pubmed.gov