Picornainformatics

Picormatics
Today’s goal:
Give you an overview of some recent developments in bioinformatics technology that can be applied to picornaviruses.
Where this was possible in less than a day's work, I have applied these techniques, as examples, to 'my' virus: R14.
This seminar is available (without ©) from:
http://swift.cmbi.ru.nl/gv/seminars/
Some notes up front
Your community is not very WWW-oriented.
I conclude this from the low number of cross-links, the high number of dead links and incomplete sites, and the lack of update dates, contact addresses, references, etc.
Your community is not very bioinformatics-oriented either.
For example, www.iah.bbsrc.ac.uk holds a beautifully complete list of VP1 sequences, but only one by one....
Your 'simple' bioinformatics options
Many protein structures and many protein sequences are available.
Together, they allow for very precise structure-based sequence alignments, which in turn make novel sequence analysis techniques possible:
1) Correlated mutation analysis
2) Sequence variability analysis
Structure-based alignment
This is the top-left corner of an alignment of ~1000 sequences of ~300 residues each.
Correlated mutations
APGADSFGDFHKM
ALGADSFRDFRRL
ARGLDPFGMNHSI
AGGLDPFRMNRRV
Gray is conserved
Black is variable
Red/green are correlated mutations
Correlated mutations guarantee a function.
Function is determined by the position in the structure, not by the residue type.
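To make the idea concrete, here is a toy sketch in Python (the seminar itself shows no code) that flags pairs of alignment positions whose residues co-vary perfectly in the four-sequence example above. It is a deliberate simplification and an assumption on my part: real correlated mutation analysis on ~1000 sequences needs a statistical score that tolerates noise, and this is not the software used for the slides. All names in it are hypothetical.

```python
from itertools import combinations

# Toy alignment copied from the slide above (4 sequences x 13 positions).
ALIGNMENT = [
    "APGADSFGDFHKM",
    "ALGADSFRDFRRL",
    "ARGLDPFGMNHSI",
    "AGGLDPFRMNRRV",
]

def columns(alignment):
    """Transpose the alignment into one string per position."""
    return ["".join(column) for column in zip(*alignment)]

def one_to_one(col_a, col_b):
    """True if the residues of two columns map one-to-one over all sequences."""
    forward, backward = {}, {}
    for a, b in zip(col_a, col_b):
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True

def correlated_pairs(alignment):
    """Return 1-based position pairs that co-vary perfectly (hypothetical helper)."""
    cols = columns(alignment)
    n_seq = len(alignment)
    # Skip conserved columns (one residue type) and columns where every sequence
    # differs, because those trivially 'correlate' with any other such column.
    informative = [i for i, c in enumerate(cols) if 1 < len(set(c)) < n_seq]
    return [(i + 1, j + 1) for i, j in combinations(informative, 2)
            if one_to_one(cols[i], cols[j])]

print(correlated_pairs(ALIGNMENT))
# [(4, 6), (4, 9), (4, 10), (6, 9), (6, 10), (8, 11), (9, 10)]
```

The result falls into two groups, positions 4, 6, 9, 10 and positions 8, 11, which matches the two colours (red/green) used for correlated mutations on the slide.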
Correlated mutations
A pilot study indicates that this also works for VP1, VP2, and VP3.
Correlated mutations and drug design
Correlations between residues and ligands.
Automatic structure comparison
Rhino   9
Polio   2
FMDV    1
Mengo   1
Automatic structure comparison
R14 drug placed in R16
Automatic structure comparison
Agonist / antagonist
Example from a nuclear hormone receptor drug design study
Back to sequences
First rule of sequence analysis:
If a residue is conserved, it is important.
Sequence analysis (continued)
Second rule of sequence analysis:
If a residue is very conserved, it is very important.
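But what about the variable residues?
$$E_i = -\sum_{j=1}^{20} p_{ij} \ln p_{ij}$$
where $p_{ij}$ is the fraction of sequences that have residue type $j$ at alignment position $i$.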
Sequence variability at a position is the number of residue types that are present at that position in more than 0.5% of all sequences.
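Both per-position measures can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions: a column is a string holding one residue per sequence, and gap characters, if present, are simply counted as another symbol, which the original analysis may well handle differently. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(column):
    """Shannon entropy E_i = -sum_j p_ij ln p_ij of one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())

def variability(column, threshold=0.005):
    """Number of residue types seen in more than `threshold` (0.5%) of the sequences."""
    n = len(column)
    return sum(1 for c in Counter(column).values() if c / n > threshold)

# A fully conserved column has entropy 0 and variability 1.
print(entropy("AAAA"), variability("AAAA"))
# A rare residue (4 out of 1000 = 0.4%) does not count towards variability.
print(variability("A" * 996 + "L" * 4))
```

The 0.5% cut-off is what keeps single sequencing errors or rare variants in a ~1000-sequence alignment from inflating the variability, whereas entropy still registers them, slightly.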
Entropy - Variability
Entropy = information
Variability = chaos
Box 11: main function
Box 12: first shell around the main function
Box 22: core residues (signal transduction)
Box 23: modulator
Box 33: mainly surface
Mutation information
Most information about mutations is carefully hidden in the
literature.
Automatic extraction of this information is no longer science fiction.
More than 90% of the 2226 mutations used for the previous few slides were extracted automatically from the literature.
We extracted 160 more mutations 'by hand'.
Problems are mainly related to protein/gene nomenclature,
residue numbering, and unclear description of the effects.
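To show what "automatic extraction" means at its simplest, here is a hedged Python sketch that pulls point-mutation mentions such as `K45E` or `Ala120Gly` out of free text with regular expressions. It is not the software used for the seminar, and the example sentence is made up. It also shows why nomenclature and numbering are the main problem: anything not written in one of these two patterns is missed, and some unrelated tokens can match by accident.

```python
import re

# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {"Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
                "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
                "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
                "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V"}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Patterns like 'K45E' and 'Ala120Gly' (wild type, residue number, mutant).
ONE_LETTER = re.compile(rf"\b([{ONE}])(\d+)([{ONE}])\b")
THREE_LETTER = re.compile(rf"\b({THREE})(\d+)({THREE})\b")

def extract_mutations(text):
    """Return (wild type, position, mutant) tuples in order of appearance."""
    hits = []
    for m in ONE_LETTER.finditer(text):
        hits.append((m.start(), m.group(1), int(m.group(2)), m.group(3)))
    for m in THREE_LETTER.finditer(text):
        hits.append((m.start(), THREE_TO_ONE[m.group(1)], int(m.group(2)),
                     THREE_TO_ONE[m.group(3)]))
    return [hit[1:] for hit in sorted(hits)]

print(extract_mutations("The mutants K45E and Ala120Gly lost activity."))
# [('K', 45, 'E'), ('A', 120, 'G')]
```

Mapping the extracted residue numbers onto the right protein and numbering scheme (for example, which capsid protein a number refers to) is the harder step and is not attempted here.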
[Figure: Mutation data. Bar charts (y axis in %) per Entropy-Variability box (Box 11, 12, 22, 23, 33), with panels for Transcription, Diseases, Coregulator, Dimerization, No effect, Ligand binding, and No mutations.]
Picorna mutation information
A PubMed search gives:
picornavirus mutation   1176    (2)
rhinovirus mutation      101   (62)
poliovirus mutation      600  (144)
mengovirus mutation       30   (29)
About 1 in 5 abstracts (in a small, manually checked subset) contained identifiable mutation information, but unfortunately often in a nomenclature that 'our' software does not understand yet.
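Counts like those above can be repeated with NCBI's E-utilities ESearch service. The sketch below is my own illustration, not how the original numbers were obtained, and today's counts will differ from those on the slide. It builds the query URL and reads the total hit count from the XML reply; only `pubmed_count` needs network access.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term):
    """Build an NCBI E-utilities ESearch URL for a PubMed query."""
    return ESEARCH + "?" + urllib.parse.urlencode({"db": "pubmed", "term": term})

def hit_count(esearch_xml):
    """Read the total hit count (top-level <Count>) from an ESearch XML reply."""
    return int(ET.fromstring(esearch_xml).findtext("Count"))

def pubmed_count(term):
    """Query PubMed live; requires network access."""
    with urllib.request.urlopen(esearch_url(term)) as reply:
        return hit_count(reply.read())

print(esearch_url("rhinovirus mutation"))
```

Only the top-level `<Count>` element is read; the reply also contains per-term counts deeper in the XML, which `findtext("Count")` on the root ignores.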
Now something totally different
Motion is the main ingredient of protein function, even if that function is as 'dumb' as being a container for the RNA.
For example, all early rhinovirus-directed drugs were aimed at reducing the mobility of its VP1…
Protein dynamics calculation
The simulation of protein motion is normally called
molecular dynamics, or MD.
MD is widely regarded as a very difficult technique that requires the help of an army of mathematicians.
That is no longer true.
Dynamite (based on Bert de Groot's CONCOORD
software) predicts protein motions via the WWW.
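For readers who have never seen MD up close, its core is just numerical integration of Newton's equations of motion. The Python sketch below is a toy, under loudly stated assumptions: one particle on a harmonic spring standing in for a bond, integrated with the velocity Verlet scheme. As far as I know, CONCOORD (and thus Dynamite) takes a different, distance-constraint-based route rather than integrating equations of motion, so this is not what Dynamite does.

```python
def harmonic_force(x, k=1.0):
    """Force of a harmonic spring, a toy stand-in for a chemical bond."""
    return -k * x

def velocity_verlet(x, v, force, mass=1.0, dt=0.01, steps=1000):
    """Integrate Newton's equations for one particle with velocity Verlet."""
    a = force(x) / mass
    trajectory = [x]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity with averaged acceleration
        a = a_new
        trajectory.append(x)
    return x, v, trajectory

x, v, trajectory = velocity_verlet(1.0, 0.0, harmonic_force)
# Total energy should stay close to its starting value of 0.5.
print(0.5 * v * v + 0.5 * x * x)
```

A real protein simulation does the same thing for thousands of atoms with a full force field and femtosecond time steps, which is where the computational cost, not the mathematics, becomes the hard part.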
A short break for a word from our sponsors
Laerte Oliveira
Wilma Kuipers, Weesp
Bob Bywater, Copenhagen
Nora vd Wenden, The Hague
Mike Singer, New Haven
Ad IJzerman, Leiden
Margot Beukers, Leiden
Fabien Campagne, New York
Øyvind Edvardsen, Tromsø
Simon Folkertsma, Frisia
Henk-Jan Joosten, Wageningen
Joost van Durma, Brussels
David Lutje Hulsik, Utrecht
Tim Hulsen, Goffert
Manu Bettler, Lyon
Adje
Florence Horn
Margot
Our industrial
sponsor:
David
Elmar
Tim
Fabien
Manu
Simon Folkertsma
Krieger