UTACCEL 2010

Adventures in Biotechnology
Graham Cromar
Bioinformatics
Bioinformatics is about integrating biological themes with the help of computer tools and biological databases, and gaining new knowledge from that integration.
Sanger sequencing
Automated Sequencing
In the past, the separation of the DNA strands by electrophoresis was a time-consuming process. Today, fluorescent labels and advances in gel electrophoresis have made DNA sequencing fast and accurate. The process is also almost fully automated, including the readout of the final sequence.
Parallelizing Sequencing
Introduction 1.0
GenBank doubles every 14 months (from the National Center for Biotechnology Information)
This doubling time is shorter than that of Moore's law (computer power doubling every 20 months!)
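To make the comparison concrete, the short sketch below converts a doubling time into a fold increase over a fixed period. The 14-month and 20-month figures are the ones quoted on the slide; the 10-year horizon and the function name `fold_growth` are assumptions chosen only for illustration.

```python
def fold_growth(doubling_months: float, years: float) -> float:
    """Fold increase after `years` for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)


# Doubling times as quoted on the slide; the 10-year horizon is an arbitrary choice.
genbank = fold_growth(14, 10)
computing = fold_growth(20, 10)

print(f"GenBank after 10 years: about {genbank:.0f}-fold")
print(f"Computing power after 10 years: about {computing:.0f}-fold")
print(f"Data outgrows computing by a factor of about {genbank / computing:.1f}")
```

Under these assumptions the sequence data grows several times faster than the computing power available to analyse it, which is why storage and search methods matter so much.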
Genomes

Year   Genome                          Number of base pairs
___________________________________________________________
1971   First published DNA sequence               12
1977   PhiX174                                 5,375
1982   Lambda                                 48,502
1992   Yeast Chromosome III                  316,613
1995   Haemophilus influenzae              1,830,138
1996   Saccharomyces                      12,068,000
1998   C. elegans                         97,000,000
2000   D. melanogaster                   120,000,000
2001   H. sapiens (draft)              2,600,000,000
2003   H. sapiens                      2,850,000,000
Complexity does not always correlate with genome size. The largest genome known to date belongs to an amoeba!
The next step is to locate all of the genes and regulatory regions, describe their functions, and identify how they differ between groups (e.g. “disease” vs. “healthy”). Bioinformatics plays a critical role here.
Storage, search, retrieval and visualization are key.
Bioinformatics will help with...
Structure-Function Relationships

Can we predict the function of protein molecules from their sequence?
sequence > structure > function

Prediction of some simple 3-D structures (α-helix, β-sheet, membrane-spanning regions, etc.)
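One classic way to flag possible membrane-spanning regions directly from sequence is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale; the 19-residue window and the 1.6 cutoff are conventional choices for that scale, while the function names and the toy sequence are hypothetical and chosen only for illustration. It is a rough screen, not a full structure predictor.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of `window` residues along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_membrane_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, mean in enumerate(profile) if mean > threshold]


# Hypothetical toy protein: a hydrophobic stretch flanked by polar residues.
toy = "MKTDERSQNG" + "LLIVAVLLGIVAFLLVIAL" + "KRDEQNSTGE"
print(candidate_membrane_windows(toy))
```

Windows that pass the cutoff cluster around the hydrophobic stretch, which is the kind of signal used to propose a membrane-spanning helix for follow-up.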
BLAST Result
BLAST: Basic Local Alignment Search Tool
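The "local alignment" in the BLAST name means finding the best-matching stretch shared by two sequences rather than aligning them end to end. BLAST itself is a fast heuristic; the sketch below instead shows the exact dynamic-programming method (Smith-Waterman) that such heuristics approximate. The scoring values (match +2, mismatch -1, gap -2) and the example sequences are assumptions chosen only for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (Smith-Waterman)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below 0: a poor region simply restarts the alignment.
            H[i][j] = max(0, diagonal, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


# Hypothetical sequences sharing the stretch "ACGT" in different surroundings.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Because the score resets to zero, the shared stretch is found no matter what flanks it, which is what makes the local approach suitable for searching a short query against long database sequences.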
Microarray analysis:
The Transcriptional Program in the Response of Human Fibroblasts to Serum
Vishwanath R. Iyer, Michael B. Eisen, Douglas T. Ross, Greg Schuler, Troy Moore, Jeffrey C. F. Lee, Jeffrey M. Trent, Louis M. Staudt, James Hudson Jr., Mark S. Boguski, Deval Lashkari, Dari Shalon, David Botstein, Patrick O. Brown
Science, Jan 1 1999: 83-87
Figure 1
Figure 4
PubMed Text Neighboring
Example neighbors: "Genetic Analysis of Cancer in Families" and "The Genetic Predisposition to Cancer"
• Common terms could indicate similar subject matter
• Statistical method
• Weights based on term frequencies within document and within the database as a whole
• Some terms are better than others
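The weighting idea in the bullets above can be sketched with a generic TF-IDF scheme: a term counts for more when it is frequent within a document and rare across the database. This is an illustrative stand-in, not PubMed's actual formula. The first two titles come from the slide; the third title and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each term by its frequency in the document and its rarity in the collection."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


titles = [
    "genetic analysis of cancer in families",
    "the genetic predisposition to cancer",
    "automated sequencing of dna by gel electrophoresis",  # hypothetical third title
]
vectors = tfidf_vectors(titles)
print(cosine(vectors[0], vectors[1]))  # the two cancer genetics titles
print(cosine(vectors[0], vectors[2]))  # share only the word "of"
```

The two cancer titles score higher against each other than either does against the sequencing title, because "genetic" and "cancer" are shared while a filler word such as "of" contributes little. That is the sense in which some terms are better than others.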
There are over 1 million papers published in the life sciences each year!
Top 10 Future Challenges for Bioinformatics
• Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome
• Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue
• Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli
• Determining effective protein:DNA, protein:RNA and protein:protein recognition codes
• Accurate ab initio protein structure prediction
• Rational design of small molecule inhibitors of proteins
• Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve
• Mechanistic understanding of speciation: molecular details of how speciation occurs
• Continued development of effective gene ontologies: systematic ways to describe the functions of any gene or protein
• Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education
Tutorial