Transcript CMBI
The CMBI: Bioinformatics
Content
Bioinformatics
Bioinformatics@CMBI
Bioinformatics tools & databases
Celia van Gelder
CMBI
UMC Radboud
February 2010
[email protected]
What is bioinformatics?
• Bioinformatics is the use of computers in solving information problems in the life sciences
• You are "doing bioinformatics" when you use computers to store, retrieve, analyze or predict the sequence, function and/or structure of biomolecules.
Bioinformatics
2/38
©CMBI 2010
Why do we need Bioinformatics?
Flood of biological data:
– DNA sequences (genomes)
– protein sequences and structures
– gene expression profiles (transcriptomics)
– cellular protein profiles (proteomics)
– cellular metabolite profiles (metabolomics)
We want to:
– collect and store the data
– integrate, analyze, compare and mine the data
– predict genes, protein function and protein structure
– predict physiology (models, mechanisms, pathways)
– understand how a whole cell works
Human genome, great expectations
Data ≠ Knowledge, insight !!!
A large fraction of the human genes has an unknown function
(Science, 2001)
What is protein function?
Genomic context
Homology
How can we predict function of proteins?
The importance of sequence similarity and sequence alignment
Similar sequences have:
– A similar evolutionary origin
– A similar function
– A similar 3D structure
“new, unknown protein”
-> Compare with database of proteins (BLAST)
-> “similar sequence with known function, e.g. protein kinase”
-> Extrapolate the function
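The idea of extrapolating function from a similar sequence can be sketched in a few lines of Python. This is not BLAST: it uses the standard library's difflib as a crude stand-in for a real alignment score, and the database entries, names and threshold are made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical mini-database: made-up sequences with made-up annotations.
DATABASE = {
    "protA": ("MKKLVAGGTSPLHEDR", "protein kinase"),
    "protB": ("GSSWWPQQNNCCYYFF", "transporter"),
}


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def predict_function(query, database, threshold=0.7):
    """Return (annotation of the best hit, score), or (None, score) if no hit is similar enough."""
    best_name, best_score = None, 0.0
    for name, (seq, _annotation) in database.items():
        score = similarity(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None, best_score
    return database[best_name][1], best_score


if __name__ == "__main__":
    # A "new, unknown protein" that differs from protA in one residue.
    print(predict_function("MKKLVAGGTSPLHEDK", DATABASE))
```

The threshold matters: below some level of similarity the extrapolation is no longer justified, so the sketch returns no prediction rather than the annotation of a poor hit.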
CMBI - Centre for Molecular and Biomolecular Informatics
• Dutch national centre for computational molecular sciences research
• Research groups
– Comparative Genomics (Huynen)
– Bacterial Genomics (Siezen)
– Computational Drug Design (De Vlieg)
– Bioinformatics of Macromolecular Structures (Vriend)
• Training & Education
– MSc, PhD and PostDoc programmes
– International workshops
– Hotel Bioinformatica
– High school courses
• Computational facilities, databases, and software packages via (inter-)national service platforms (NBIC, EBI, etc)
• NBIC: National BioInformatics Centre.
Computational Drug Discovery (CDD) Group
• Head: Prof. Jacob de Vlieg
• Key goal: Develop molecular modeling and computer-based simulation techniques for structure-based drug design, translational medicine and protein family based approaches to design and identify drug-like compounds
• Key Research Fields
– Structural bioinformatics for drug design
– Bioinformatics for genomics (microarray analysis, text mining, etc)
– Translational medicine informatics
CDD: Bridging academic research and applied genomics
– Academic Research: new scientific approaches, training & education
– Applications: exciting real life problems, ‘wet’ validation
Examples of CDD Projects
• Exploiting Structural Genomics Information To Incorporate Protein Flexibility In Drug Design
• Protein knowledge building through comparative genomics and data integration
• In silico studies on p63 as a new drug-target protein
International Computational Drug Discovery Course
• Course covers the entire research pipeline from genomics and proteomics in target discovery to Structure Based Drug Design and QSAR in drug optimization.
• Lectures and practicals
• 2 week course
• 21 June – 2 July 2010
• www.cmbi.ru.nl/ICDD2010
Bacterial Genomics Group
• Head: Prof Roland Siezen
• Research interest: Biological questions in the interest of Dutch Food Industry
• How can we improve:
– fermentation
– safety
– health
• Micro-organisms studied: Gram-positive food bacteria:
– lactic acid bacteria (Lactococcus, Lactobacillus)
– spoilage bacteria (Listeria, Clostridium, Bacillus cereus)
[Images: Lactococcus, Listeria]
Bacterial Genomics: from sequence to predicted function
Key research fields:
– Genome sequencing and interpretation
– Network reconstruction and analysis
– Systems biology, dynamic modelling
Raw sequence data:
2 to 5 million nucleotides
A virtual cell: overview of predicted pathways
AAACACTTAGACAATCAATATAAAGATGAA
GTGAACGCTCTTAAAGAGAAGTTGGAAAAC
TTGCAGGAACAAATCAAAGATCAAAAAAGG
ATAGAAGAACAAGAAAAACCACAAACACTT
AGACAATCAATATAAAGATGAAGTGAACGC
TCTTAAAGAGAAGTTGGAAAACTTGCAGGA
ACAAATCAAAGATCAAAAAAGGATAGAAGA
ACAAGAAAAACCACAAACACTTAGACAATC
AATATAAAGATGAAGTGAACGCTCTTAAAG
AGAAGTTGGAAAACTTGCAGGAACAAATCA
AAGATCAAAAAAGGATAGAAGAACAAGAAA
AACCACAAACACTTAGACAATCAATATAAA
GATGAAGTGAACGCTCTTAAAGAGAAGTTG
GAAAACTTGCAGGAACAAATCAAAGATCAA
AAAAGGATAGAAGAACAAGAAAAACCACAA
ACACTTAGACAATCAATATAAAGATGAAGT
GAACGCTCTTAAAGAGAAGTTGGAAAACTT
GCAGGAA
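A first step from raw sequence towards predicted function is finding candidate genes. The sketch below is a deliberately naive open reading frame (ORF) scan: it only looks at the forward strand, only accepts ATG as start codon, and the function name and `min_codons` parameter are illustrative choices, not part of any CMBI pipeline. Real gene finders are considerably more sophisticated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for forward-strand ORFs in all three frames.

    An ORF here runs from an ATG up to and including the first in-frame stop
    codon; min_codons counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF
    return orfs


if __name__ == "__main__":
    # Tiny made-up sequence: ATG AAA TAG sits in the third reading frame.
    print(find_orfs("CCATGAAATAGGG"))
```

On a genome of 2 to 5 million nucleotides a scan like this yields many spurious hits, which is why length thresholds and comparison with known proteins are needed before a function is assigned.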
Bacterial Genomics: Example
Differential NF-κB pathways induction by Lactobacillus plantarum in the duodenum of healthy humans correlating with immune tolerance
Peter van Baarlen et al., PNAS, Feb 3, 2009
Comparative Genomics Group
• Head: Prof. Martijn Huynen
• Research Focus:
– How do the proteins encoded in genomes interact with each other to produce cells and phenotypes?
– To predict functional interactions between proteins, such as those in metabolic pathways, signalling pathways or protein complexes
A genome is more than the sum of its genes -> Use “genomic context” for function prediction
Types of genomic context:
– Gene fusion/fission
– Chromosomal location
– Gene order/neighbourhood
– Co-evolution
– Co-expression
Turning data into knowledge
Research topics:
• Develop computational genomics techniques that exploit the information in sequenced genomes and functional genomics data
• Make testable predictions about pathways and the functions of proteins therein.
• Evolution of the eukaryotic cell & the origin and evolution of organelles like the mitochondria and the peroxisomes
Education:
• Comparative Genomics Course, 3 EC, 8-23 April 2010
website: http://www.cmbi.ru.nl/huynen/
Comparative genomics
Prediction of protein function, pathways
Frataxin Example
• Frataxin is a well-known disease gene (Friedreich's ataxia) whose function has remained elusive despite more than six years of intensive experimental research.
• Using computational genomics we have shown that frataxin has co-evolved with hscA and hscB and is likely involved in iron-sulfur cluster assembly in conjunction with the co-chaperone HscB/JAC1.
[Figure panels: Prediction, Confirmation]
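Co-evolution of the kind described for frataxin is often detected by comparing in which genomes genes are present or absent. The sketch below illustrates that idea under stated assumptions: the gene names and presence/absence profiles are invented, and the simple matching fraction is only one of several possible scores, not necessarily the one used in the frataxin study.

```python
def profile_similarity(p, q):
    """Fraction of genomes in which two genes are both present or both absent."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a == b for a, b in zip(p, q)) / len(p)


# Hypothetical presence (1) / absence (0) of each gene in six genomes.
PROFILES = {
    "geneX": [1, 1, 0, 1, 0, 1],
    "geneY": [1, 1, 0, 1, 0, 1],
    "geneZ": [0, 1, 1, 0, 1, 1],
}


def rank_partners(target, profiles):
    """Rank all other genes by how closely their profile follows the target's."""
    scores = [
        (name, profile_similarity(profiles[target], profile))
        for name, profile in profiles.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(rank_partners("geneX", PROFILES))
```

Genes that are gained and lost together across many genomes are candidates for a shared pathway or complex, which gives a testable prediction for the lab.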
Bioinformatics of macromolecular structures
• Head: Prof. Gert Vriend
• Research Focus: Understanding proteins (and their environment)
• Proteins are the core of life, they do all the work, and they give you feelings, contact with the outside world, etc.
• Proteins, therefore, are the most important molecules on earth.
• We want to understand life; why are we what we are, why do we do what we do, how come you can think what you think?
Bioinformatics of macromolecular structures
Research topics Vriend group
•Homology modeling technology and applications
•Application of bioinformatics in medical research (Hanka Venselaar)
•Structure validation and structure determination improvement
•Molecular class specific information systems (e.g. GPCRDB & NucleaRDB)
•Data mining
•WHAT IF molecular modelling and visualization software
Education:
•Introduction in bioinformatics (3 EC (MLW) or 6 EC (biology), 2nd year BSc)
•Structure, Function and Bioinformatics (3rd year BSc)
•Bioinformatics of Protein Structures (MSc)
Homology Modeling
LRTOMT protein:
Hearing loss
MGTPWRKRKGIAGPGLPDLSCALVLQPRAQVGTMSPAI
ALAFLPLVVTLLVRYRHYFRLLVRTVLLRSLRDCLSGLRI
EERAFSYVLTHALPGDPGHILTTLDHWSSRCEYLSHMG
PVKGQILMRLVEEKAPACVLELGTYCGYSTLLIARALPP
GGRLLTVERDPRTAAVAEKLIRLAGFDEHMVELIVGSSE
DVIPCLRTQYQLSRADLVLLAHRPRCYLRDLQLLEAHAL
LPAGATVLADHVLFPGAPRFLQYAKSCGRYRCRLHHTG
LPDFPAIKDGIAQLTYAGPG
Homology modeling: Prediction of 3D structure based upon a highly similar structure
Unknown structure
Homology Modeling
Prediction of 3D structure based upon a highly similar structure
Unknown structure:          NSDSECPLSHDG
                            ||   || | ||
Known structure (sequence): NSYPGCPSSYDG
1. Alignment of model and template
2. Copy backbone and conserved residues from the known structure
3. Add sidechains, Molecular Dynamics simulation on model
4. Model!
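The alignment on this slide can be checked with a small script. It takes the two aligned fragments from the slide, marks identical positions, and reports the percentage identity; the function names are illustrative, and the fragments are assumed to be aligned without gaps, as shown.

```python
def match_line(a, b):
    """Mark identical aligned positions with '|' and differences with a space."""
    return "".join("|" if x == y else " " for x, y in zip(a, b))


def percent_identity(a, b):
    """Percentage of aligned positions with identical residues (no gaps assumed)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def conserved_positions(a, b):
    """1-based positions whose residue could be copied from the template."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x == y]


if __name__ == "__main__":
    model = "NSDSECPLSHDG"     # sequence with unknown structure
    template = "NSYPGCPSSYDG"  # sequence of the known structure
    print(model)
    print(match_line(model, template))
    print(template)
    print(round(percent_identity(model, template), 1), "% identity")
```

The conserved positions are the residues that step 2 copies along with the backbone; the remaining sidechains have to be built in step 3.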
Homology Modeling
LRTOMT (hearing loss): from the sequence shown above to a 3D model: Structure!
Homology Modeling
Mutation: Arginine 81 -> Glutamic acid
Mutation: Glutamic acid 110 -> Lysine
The salt bridge between Arginine and Glutamic acid is lost in both cases
Homology Modeling
Mutation: Tryptophan 105 -> Arginine
Hydrophobic contacts from the Tryptophan are lost
Introduction of a hydrophilic and charged residue
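The three mutations can be written in the usual one-letter notation (R81E, E110K, W105R) and applied to the LRTOMT sequence shown earlier. The sketch below uses the first 116 residues from the slide and checks that the wild-type residue really is at the stated position; the helper name `apply_mutation` is an illustrative choice.

```python
# First 116 residues of the LRTOMT sequence as shown on the slide.
LRTOMT_N_TERM = (
    "MGTPWRKRKGIAGPGLPDLSCALVLQPRAQVGTMSPAI"
    "ALAFLPLVVTLLVRYRHYFRLLVRTVLLRSLRDCLSGLRI"
    "EERAFSYVLTHALPGDPGHILTTLDHWSSRCEYLSHMG"
)


def apply_mutation(seq, mutation):
    """Apply a point mutation such as 'R81E' (wild type, 1-based position, new residue)."""
    wild, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[pos - 1] != wild:
        raise ValueError(
            f"expected {wild} at position {pos}, found {seq[pos - 1]}"
        )
    return seq[:pos - 1] + new + seq[pos:]


if __name__ == "__main__":
    for mutation in ("R81E", "E110K", "W105R"):
        mutant = apply_mutation(LRTOMT_N_TERM, mutation)
        pos = int(mutation[1:-1])
        # Show the residue and its direct neighbours before and after.
        print(mutation, LRTOMT_N_TERM[pos - 2:pos + 1], "->", mutant[pos - 2:pos + 1])
```

Checking the wild-type residue first is a cheap guard against numbering mistakes, for example when a sequence and a structure file use different residue numbering.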
Homology Modeling
The three mutated residues are all important for the correct positioning of Tyrosine 111
Tyrosine 111 is important for substrate binding
Ahmed et al., Mutations of LRTOMT, a fusion gene with alternative reading frames, cause nonsyndromic deafness in humans. Nat Genet. 2008 Nov;40(11):1335-40.
Interested? Contact Hanka Venselaar ([email protected])
Project HOPE: Have yOur Protein Explained
Hotel Bioinformatica
Hotel functions
• Temporary housing, teaching and supervision of experimentalists for data analysis at the CMBI
• Centralization of UMC-wide bioinformaticians
• Shared (weekly) seminars of CMBI with ‘inhouse bioinformaticians’
• Collaboration/advice in acquiring grants with a Bioinformatics aspect
Interested? Contact Martijn Huynen ([email protected])
Bioinformatics data types
mRNA expression profiles
MS data
Large amount of data
Growing very very fast
Heterogeneous data types
Biological Databases
• Information is the core of bioinformatics
• Literally thousands of databases exist that are relevant for biology, medicine, and/or chemistry
Content                          Database
protein sequences                SwissProt, UniProt, trEMBL
nucleotide sequences             EMBL, GenBank, DDBJ
structures (protein, DNA, RNA)   Protein Data Bank (PDB)
Genomes                          Ensembl, UCSC
Mutations                        OMIM
Patterns, Motifs                 PROSITE
Protein Domains                  InterPro, SMART
Pathways                         KEGG
Important records in SwissProt/UniProt (1)
Important records in SwissProt/UniProt (2)
Cross references: direct hyperlinks to:
• EMBL
• PDB
• OMIM
• InterPro
• etc.
Features:
• post-translational modifications
• signal peptides
• binding sites
• enzyme active sites
• domains
• disulfide bridges
• etc.
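In the UniProt/Swiss-Prot text format each line starts with a two-letter line code: DR for cross references and FT for features, among others. The sketch below groups the lines of one entry by that code. The entry itself is made up and heavily abridged (identifiers such as X00000 and 1ABC are placeholders), and the exact column layout of real entries has changed over the years, so treat this as an illustration rather than a full parser.

```python
from collections import defaultdict

# Made-up, abridged entry in the style of the UniProt/Swiss-Prot flat-file format.
ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;         120 AA.
AC   X00000;
DR   EMBL; AB000000; -; mRNA.
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-120.
DR   InterPro; IPR000000; Example_dom.
FT   SIGNAL          1..18
FT   DISULFID        24..110
//
"""


def group_by_line_code(entry):
    """Collect the data part of every line under its two-letter line code."""
    records = defaultdict(list)
    for line in entry.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        records[line[:2]].append(line[5:].strip())
    return records


def cross_references(entry):
    """Map each cross-referenced database to the identifiers found on DR lines."""
    refs = defaultdict(list)
    for dr in group_by_line_code(entry)["DR"]:
        fields = [field.strip() for field in dr.rstrip(".").split(";")]
        refs[fields[0]].append(fields[1])
    return dict(refs)


def feature_keys(entry):
    """Return the feature type named at the start of each FT line."""
    return [ft.split()[0] for ft in group_by_line_code(entry)["FT"]]


if __name__ == "__main__":
    print(cross_references(ENTRY))
    print(feature_keys(ENTRY))
```

For real work an established parser is the safer choice; the point here is only that cross references and features are plainly labelled records that are easy to find in an entry.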
Protein Databank & Structure Visualization
• PDB structures have a unique identifier, the PDB code: 4 characters (often 1 digit & 3 letters, e.g. 1CRN).
• Download PDB structures and give them the correct file extension: 1CRN.pdb
• Structures from the PDB can be visualized directly with:
1. Yasara (www.yasara.org) (developed at CMBI)
2. SwissPDBViewer (http://spdbv.vital-it.ch/)
3. Protein Explorer (http://www.umass.edu/microbio/rasmol/)
4. Cn3D (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml)
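A small check on the PDB code and file name can save time before opening a structure in a viewer. The sketch below assumes the classic four-character code with a leading digit followed by three letters or digits; the function names are illustrative.

```python
import re

# Assumption: a classic PDB code is one digit followed by three letters or digits.
PDB_CODE = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_pdb_code(code):
    """True if the string looks like a four-character PDB code."""
    return PDB_CODE.fullmatch(code) is not None


def pdb_filename(code):
    """Build the local file name for a structure, e.g. '1CRN.pdb'."""
    if not is_pdb_code(code):
        raise ValueError(f"not a valid PDB code: {code!r}")
    return code.upper() + ".pdb"


if __name__ == "__main__":
    print(pdb_filename("1crn"))
```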
OMIM Database
(O)MIM - Online Mendelian Inheritance in Man
• a large, searchable, current database of human genes, genetic traits, and hereditary disorders
• contains information on all known mendelian disorders and over 12,000 genes
• focuses on the relationship between phenotype and genotype
http://www.ncbi.nlm.nih.gov/omim/
Browsing genomes
NCBI
UCSC: http://genome.ucsc.edu/
Only eukaryotic genomes
Ensembl: http://www.ensembl.org/
Genome browsers can be used to examine
• Genomic sequence conservation
• Duplications and deletions of pieces of chromosome (Copy Number Variations, CNVs)
• Single Nucleotide Polymorphisms (SNPs)
• Alternative splicing
• And much more...
Sequence Retrieval with MRS (1)
Google = the best generic search and retrieval system
MRS = Maarten’s Retrieval System (http://mrs.cmbi.ru.nl)
MRS is the Google of the biological database world
Search engine (like Google)
Input/Query = word(s)
Output = entry/entries from database
Searching is very intuitive:
– Select database(s) of choice
– Formulate your query
– Hit “Search”
– The result is a “query set” or “hitlist”
– Analyze the results
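The search steps above follow a general pattern: select databases, enter words, get back a hitlist of entries. The toy below mimics that pattern on two in-memory "databases". It says nothing about how MRS itself is implemented or queried; the databases, entry IDs and texts are all made up.

```python
# Hypothetical mini-databases; entry IDs and texts are made up for illustration.
DATABASES = {
    "proteins": {
        "P1": "lysozyme C precursor human",
        "P2": "protein kinase mouse",
    },
    "diseases": {
        "D1": "familial visceral amyloidosis lysozyme mutation",
    },
}


def search(databases, selected, query):
    """Return (database, entry ID) for entries containing every query word."""
    words = query.lower().split()
    hits = []
    for db in selected:
        for entry_id, text in databases[db].items():
            tokens = set(text.lower().split())
            if all(word in tokens for word in words):
                hits.append((db, entry_id))
    return hits


if __name__ == "__main__":
    # One word gives a broad hitlist; adding a second word narrows it.
    print(search(DATABASES, ["proteins", "diseases"], "lysozyme"))
    print(search(DATABASES, ["proteins", "diseases"], "lysozyme human"))
```

The two queries show why it pays to think about the query first: each extra word shrinks the hitlist, and a word that is absent from the entry text returns nothing at all.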
Sequence Retrieval with MRS (2)
Select database
Formulate query.
But think about your query first!!
MRS hitlist
BLAST and CLUSTAL with MRS
Blast: brings you to the MRS page from which you can do Blast searches.
Blast results: brings you to the page where MRS stores your Blast results of the current session.
Clustal: brings you to the MRS page from which you can do Clustal sequence alignments.
Your Exercise Today
FAMILIAL VISCERAL AMYLOIDOSIS
You will study Lysozyme:
•Protein
•Gene
•Mutations causing familial visceral amyloidosis
•3D structure
HAVE FUN!!
The Practical
You can find the practical at http://swift.cmbi.ru.nl/teach/lyso/
Work with MRS
Work with Yasara
Read the text carefully
User login = c(your pc number) e.g. c07
User password = t0psp0rt (with zeros)
The program Yasara is on your desktop