Bioinformatics
A Biologist’s perspective
Rob Rutherford
1. The Biologist’s perspective
2. A survey of tools
3. Training students for the future
If the biota, in the course of eons, has built
something …..who but a fool would discard
seemingly useless parts? To keep every cog
and wheel is the first precaution of intelligent
tinkering.
-Aldo Leopold (1887 - 1948)
Figure 1.18 Careful observation and measurement provide the raw data for science
Productive Tinkerers
PubMed had 400,000 new research
articles entered in 2002.
NCBI-NLM, 2003
NIH-NLM 2003
(Cockerill 2003)
“If your experiment needs statistics, you
ought to have done a better experiment.”
-Rutherford
(the other one)
“To consult the statistician after an experiment is finished is …
to ask him to conduct a post mortem examination.
He can perhaps say what the experiment died of.”
RA Fisher 1956, University of Adelaide Archives
2 Wings of Bioinformatics
Housekeeping Bioinformatics
Representation, storage, and distribution of data
Analytical Bioinformatics
New tools for the discovery of knowledge in data
Part 2
A Survey of
Problems/Opportunities
“The Central Dogma”
DNA
Information Warehouse
(4 nucleotide letters: A, T, G, C)
RNA
Temporary copy of a gene
Protein
Working Cellular Machine
(20 amino acid letters)
RNA polymerase PDB
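The DNA to RNA to protein flow above can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code and a made-up 12-base coding-strand sequence; the names `transcribe` and `translate` are hypothetical helpers, not part of any tool mentioned in the talk.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA copy of a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base gene fragment, invented for this example.
dna = "atggcctggtaa"
mrna = transcribe(dna)
print(mrna)             # temporary RNA copy of the gene
print(translate(mrna))  # protein written in the 20-letter amino acid alphabet
```

The table is built from a 64-character string rather than typed out codon by codon, which keeps the sketch short and makes the 4-letter to 20-letter mapping explicit.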
A Survey of Problems
Finding Genes and Understanding Genes
Protein Structure and Function
Gene Expression
Networks
Other areas
Finding and Understanding Genes
Estimated Human Gene Number: ~59,538
Receptors-GPCR (767)
Receptors-NHR (56)
Integrins (33)
Ion Channels (313)
Kinases (713)
Phosphatases (274)
Phosphodiesterases (58)
Neurotransmitter transporters (34)
P450s (59)
Proteases (527)
Secreted (3621)
Other (53076)
Rutherford
                   10        20        30        40
           ....*....|....*....|....*....|....*....|
consen   1 SPKNTPVVLIPKKGPGKYRPISlvDYKILNKATKKrFSpp 40
1MML    83 SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH-- 117
1HNI_B  54 NPYNTPVFAIKKKDSTKWRKLV--DFRELNKRTQD-FWev 90
1MU2_B  49 NPYNTPTFAIKKKDKNKWRMLI--DFRELNKVTQD-FTei 85
1D1U_A  69 SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH-- 103

                   50        60        70        80
           ....*....|....*....|....*....|....*....|
consen  41 qPGFRPGRSLLNKLKGS-KWFLKLDLKKAFDSIPHDPLLR 79
1MML   118 -PTVPNPYNLLSGLPPShQWYTVLDLKDAFFCLRLHPTSQ 156
1HNI_B  91 qLGIPHPAGL-----KKKKSVTVLDVGDAYFSVPLDEDFR 125
1MU2_B  86 qLGIPHPAGL--AKK---RRITVLDVGDAYFSIPLHEDFR 120
1D1U_A 104 -PTVPNPYNLLSGLPPShQWYTVLDLKDAFFCLRLHPTSQ 142
Cn3D HIV
Finding Conserved Regions/Domains
HIV protein
Comparing your sequence versus models derived from
curated known protein families
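To make the idea of a conserved region concrete, here is a minimal Python sketch that scans the first 40 columns of the four structure-derived rows in the alignment shown earlier (1MML, 1HNI_B, 1MU2_B, 1D1U_A) and reports the columns where every row carries the same residue. It is a simple identity count, not the curated family-model comparison described above, and the function name `conserved_columns` is a hypothetical helper.

```python
# First alignment block from the slide (40 columns; '-' is a gap,
# lowercase letters are residues the viewer shows as unaligned).
ALIGNMENT = {
    "1MML":   "SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH--",
    "1HNI_B": "NPYNTPVFAIKKKDSTKWRKLV--DFRELNKRTQD-FWev",
    "1MU2_B": "NPYNTPTFAIKKKDKNKWRMLI--DFRELNKVTQD-FTei",
    "1D1U_A": "SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH--",
}


def conserved_columns(alignment):
    """Return (column, residue) pairs where all rows share one non-gap residue.

    Columns are numbered from 1, matching the ruler above the alignment.
    """
    rows = [seq.upper() for seq in alignment.values()]
    hits = []
    for column, residues in enumerate(zip(*rows), start=1):
        first = residues[0]
        if first != "-" and all(r == first for r in residues):
            hits.append((column, first))
    return hits


for column, residue in conserved_columns(ALIGNMENT):
    print(f"column {column:2d}: {residue}")
```

Strict identity is the crudest possible definition of conservation; family models also reward chemically similar substitutions, which this sketch ignores.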
Phylogenetics and Evolution
Thanks to Porterfield
Protein Structure
Imaging Experimental X-ray diffraction data
Predicting structure in silico from sequence
Experimental structures in the Protein Data Bank
Structure is Function
HIV reverse transcriptase
DNA (human genome)
RNA (HIV virus)
Protein
Goodsell, PDB
Figure 17.0 Ribosome
Structural Predictions just from raw
protein sequence?
Figure 17.0 Ribosome
1 ggcacgaggc acggctgtgc aggcacgcat gcaggccagc ….
1 atctgcacgt ggttatgctg ccggagtttg ggccgccact….
An example:
CASP
Critical Assessment of Techniques for Protein Structure Prediction
A community-wide experiment, held every two years, that tests protein structure
prediction from primary sequence
Gene Expression
Sequencing RNA (ESTs)
Sequencing bits of ESTs (SAGE)
Automation of In situ
DNA microarray technology
MicroArray
One spot for each gene
Microarray Expression Analysis
Reference Mixture
Specific Organ
Experimental Conditions: SigE, SigH, IdeR, Low O2, H2O2, SDS, Diamide, Iron, Dormancy
Genes: Gene turned on / Gene turned off
NrpR, NO
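One common way to turn a two-sample microarray spot into a "gene turned on" or "gene turned off" call is the log2 ratio of the experimental signal to the reference signal. The Python sketch below assumes that approach; the spot intensities, gene names, and the two-fold threshold are all invented for illustration and do not come from the experiment shown.

```python
import math


def log2_ratio(sample: float, reference: float) -> float:
    """Log2 of the sample signal over the reference signal for one spot."""
    return math.log2(sample / reference)


def call_gene(sample: float, reference: float, threshold: float = 1.0) -> str:
    """Classify a spot as 'on', 'off', or 'unchanged'.

    threshold=1.0 means a two-fold change; this cutoff is an assumption.
    """
    ratio = log2_ratio(sample, reference)
    if ratio >= threshold:
        return "on"
    if ratio <= -threshold:
        return "off"
    return "unchanged"


# Hypothetical spot intensities: (experimental sample, reference mixture).
spots = {
    "geneA": (4000.0, 1000.0),
    "geneB": (250.0, 1000.0),
    "geneC": (1100.0, 1000.0),
}

for gene, (sample, reference) in spots.items():
    print(gene, round(log2_ratio(sample, reference), 2), call_gene(sample, reference))
```

Working in log space makes a four-fold increase and a four-fold decrease symmetric (+2 and -2), which is why expression maps are usually colored by log ratio rather than raw intensity.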
Figure 1.3 Some properties of life
Figure 1.23x1 Biotechnology laboratory
Metabolic Pathway Map
Building Transcriptional Network Map
Networks
Biochemical Pathways
Signaling Networks
Transcriptional Networks
Computational Neuroscience
Microarrays uncover networks of interactions…
Scientific American 2001
Other Opportunities
Organismal Physiology
Populations
Communities
Ecosystems
Same issues in “Macro” Biology
Long history of mathematical modeling
Huge datasets from
• GPS/GIS
• Remote sensing
If the biota, in the course of eons, has built
something …..who but a fool would discard
seemingly useless parts? To keep every cog
and wheel is the first precaution of intelligent
tinkering.
-Aldo Leopold (1887 - 1948)
Where is all this leading to?
Part 3
How do we prepare our students for
this future?
Dr. Peter Munson
Head of the Mathematical and
Statistical Computing Laboratory
Division of Computational Biosciences
National Institutes of Health
Ole’ pre 1976
The Tool Builders
• Excellent mathematical skills
(algorithms, linear algebra, data structures)
• Comfort in a Linux/Unix environment and knowledge of Perl and C/C++
• A deep background in 2+ advanced areas of biology, with chemistry prerequisites
• Graduate training
The systems biologist.
Biologist who is an intelligent and skeptical
consumer of large data sets
• Probability and Statistics
• SQL and database basics
• Equilibrium and rates of change (Calculus)
• Exposure to system-level data
And who knows how and when to collaborate(!)
end