SDP search method
SDPpred:
a method for identification of amino acid residues
that determine differences in functional specificity
of homologous proteins
and application thereof to the MIP family
of membrane transporters
Olga V. Kalinina
Pavel S. Novichkov
Andrey A. Mironov
Mikhail S. Gelfand
Aleksandra B. Rakhmaninova
Large families of proteins:
generally similar biochemical function
but many different specificities…
Example: ~800 transcription factors of the LacI family.
Average sequence identity 30%.
Bind different effectors and operators.
Some effectors:
• lactose (LacI)
• D-fructose-6-phosphate (FruR)
• guanine, hypoxanthine (PurR)
• cytidine, adenosine (CytR)
• trehalose-6-phosphate (TreR)
• D-gluconate (GntR)
• D-galactose (GalR)
• D-ribose (RbsR)
• maltose (MalR)
• raffinose (RafR)
…….
Х??
Q9KDW9      ----------MSPFLGEVIGTMILIILGGGVVAGVVLKGTK
Q8Y6Z1      ----MIDTSLATQFLGEVIGTAILIILGAGVVAGVSLKRSK
Q97JG6      ----------MTIFFAELVGTLLLILLGDGVVANVVLKNSK
GLPF_ECOLI  MSQT---STLKGQCIAEFLGTGLLIFFGVGCVA--ALKVAG
Q8ZJK5      MSQTA-SSTLKGQCIAEFLGTGLLIFFGAGCVA--ALKLAG
GLPF_HAEIN  MDKS-----LKANCIGEFLGTALLIFFGVGCVA--ALKVAG
GLPF_PSEAE  MTTAAPTPSLFGQCLAEFLGTALLIFFGTGCVA--ALKVAG
AQPZ_BRUME  ---------MLNKLSAEFFGTFWLVFGGCGSAILAA--AFP
Q92NM3      ---------MFRKLSVEFLGTFWLVLGGCGSAVLAA--AFP
Q8UJW4      ---------MGRKLLAEFFGTFWLVFGGCGSAVFAA--AFP
AQPZ_ECOLI  ---------MFRKLAAECFGTFWLVFGGCGSAVLAA--GFP
Description of specificity groups:
Group A: No. 1-10, 13…
Group B: No. 12, 14-16…
Group C: No. 17-45…
…
SDPpred
Positions that account for specificity
Testing on families that include proteins with resolved 3D structure
Assignment of specificity to new proteins
Experiment
?
What are SDPs?
(SDP = Specificity Determining Position)
• Specificity group = group of proteins that have the same specificity (experimental data, genome analysis, etc.)
• SDP = alignment position that is conserved within specificity groups but differs between them
SDP is not equivalent to a functionally important position!
Algorithm
• Mutual information I_p reflects the extent to which alignment position p tends to be an SDP:

$$I_p = \sum_{i=1}^{N} \sum_{\alpha=1}^{20} f_p(\alpha, i) \, \log \frac{f_p(\alpha, i)}{f_p(\alpha) \, f(i)}$$

N - number of groups,
f(i) - fraction of proteins in group i,
f_p(\alpha, i) - ratio of occurrences of amino acid \alpha in group i in position p to the length of the whole alignment column,
f_p(\alpha) - frequency of amino acid \alpha in the whole alignment column in position p.
• Statistical significance of I_p: the Z-score relative to the expected mutual information I_p^{exp} of an alignment column:

$$Z_p = \frac{I_p - \langle I_p^{\mathrm{exp}} \rangle}{\sigma(I_p^{\mathrm{exp}})}$$

(Mirny & Gelfand, 2002, J Mol Biol, 321(1))
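The mutual information and its Z-score can be illustrated with a short Python sketch. It computes I_p for one alignment column from the definitions on this slide. The slide does not say how the expected mutual information is obtained, so the sketch assumes it is estimated by randomly shuffling the column against fixed group labels; the function names, the number of shuffles and the seed are illustrative choices, not part of SDPpred.

```python
import math
import random
from collections import Counter


def mutual_information(column, groups):
    """I_p for one alignment column.

    column: residues, one per protein; groups: parallel list of group labels.
    """
    n = len(column)
    joint = Counter(zip(column, groups))   # counts of (amino acid, group) pairs
    aa_counts = Counter(column)            # counts of each amino acid in the column
    group_counts = Counter(groups)         # group sizes
    total = 0.0
    for (aa, grp), count in joint.items():
        f_joint = count / n                # f_p(alpha, i)
        f_aa = aa_counts[aa] / n           # f_p(alpha)
        f_grp = group_counts[grp] / n      # f(i)
        total += f_joint * math.log(f_joint / (f_aa * f_grp))
    return total


def z_score(column, groups, n_shuffles=1000, seed=0):
    """Z-score of I_p against shuffled columns (shuffling is an assumption)."""
    rng = random.Random(seed)
    observed = mutual_information(column, groups)
    shuffled = list(column)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        samples.append(mutual_information(shuffled, groups))
    mean = sum(samples) / len(samples)
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    # A fully conserved column has no variance under shuffling.
    return (observed - mean) / sd if sd > 0 else 0.0


if __name__ == "__main__":
    groups = ["A"] * 4 + ["B"] * 4
    sdp_like = list("LLLLVVVV")    # conserved within groups, differs between them
    conserved = list("GGGGGGGG")   # conserved everywhere: not an SDP
    print(mutual_information(sdp_like, groups), z_score(sdp_like, groups))
    print(mutual_information(conserved, groups), z_score(conserved, groups))
```

The fully conserved column gets zero mutual information, which matches the remark that an SDP is not the same thing as a functionally important position.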
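• Smoothed amino acid frequencies: a leucine is more a methionine than a valine, and any arginine has a dash of lysine…

$$f(\alpha, i) = \frac{n(\alpha, i)}{n(i)}$$

$$\tilde{f}(\alpha, i) = \frac{n(\alpha, i) + \dfrac{\lambda}{\sqrt{n(i)}} \sum_{\beta=1}^{20} n(\beta, i) \, m(\beta \to \alpha)}{n(i) + \lambda \sqrt{n(i)}}$$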
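A minimal sketch of the smoothing step follows. It assumes the smoothed frequency has the pseudocount form (n(α,i) + (λ/√n(i)) Σ_β n(β,i) m(β→α)) / (n(i) + λ√n(i)), where n(α,i) is the count of amino acid α in group i at the position, n(i) is the group size, m(β→α) is a substitution probability and λ is a smoothing parameter. The three-letter alphabet and the matrix values below are made up for illustration; a real run would use a 20x20 matrix derived from a substitution model.

```python
import math


def smoothed_frequencies(counts, subst, lam=1.0):
    """Smoothed amino acid frequencies for one group at one position.

    counts: {amino acid: count in the group}, i.e. n(alpha, i)
    subst:  {beta: {alpha: m(beta -> alpha)}}, each row summing to 1
    lam:    smoothing parameter (hypothetical default)
    """
    n_i = sum(counts.values())
    if n_i == 0:
        raise ValueError("group has no residues at this position")
    weight = lam / math.sqrt(n_i)
    denom = n_i + lam * math.sqrt(n_i)
    result = {}
    for alpha in subst:
        # Pseudocounts: every observed beta lends some weight to alpha.
        pseudo = sum(counts.get(beta, 0) * subst[beta][alpha] for beta in subst)
        result[alpha] = (counts.get(alpha, 0) + weight * pseudo) / denom
    return result


# Toy substitution probabilities (illustrative values only).
TOY_SUBST = {
    "L": {"L": 0.80, "M": 0.15, "V": 0.05},
    "M": {"L": 0.20, "M": 0.70, "V": 0.10},
    "V": {"L": 0.10, "M": 0.10, "V": 0.80},
}

if __name__ == "__main__":
    # A group of four proteins that all carry leucine at this position.
    print(smoothed_frequencies({"L": 4}, TOY_SUBST))
```

With this toy matrix, a column of pure leucine gives methionine a larger smoothed frequency than valine, and because each matrix row sums to 1 the smoothed frequencies still sum to 1.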
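• Are 5 SDPs with Z-score > 10.5 better than 10 SDPs with Z-score > 9.0?
Bernoulli estimator for selection of the proper number of SDPs, with the Z-scores sorted in descending order, Z_1 ≥ Z_2 ≥ … ≥ Z_k:

$$k^* = \arg\min_k P(\text{there are at least } k \text{ observed Z-scores } Z \ge Z_k) = \arg\min_k \left( 1 - \sum_{i=n-k+1}^{n} C_n^i \, q^i \, p^{n-i} \right)$$

$$p = P(Z \ge Z_k) = \frac{1}{\sqrt{2\pi}} \int_{Z_k}^{\infty} \exp\left(-\frac{Z^2}{2}\right) dZ, \qquad q = 1 - p$$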
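The cutoff selection can be sketched in Python as follows. The sketch assumes that n is the total number of alignment positions, that Z-scores are compared against a standard normal distribution, and that the probability of at least k exceedances is a binomial tail. It sums that tail directly instead of computing one minus the complementary sum, which is algebraically the same quantity but avoids cancellation when the probabilities are tiny. Function names are illustrative.

```python
import math


def normal_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def prob_at_least_k(n, k, p):
    """Probability that at least k of n independent trials succeed."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(k, n + 1))


def select_sdp_count(z_scores):
    """Return (k*, probability) minimising the chance of seeing k such Z-scores."""
    ordered = sorted(z_scores, reverse=True)   # Z_1 >= Z_2 >= ... >= Z_n
    n = len(ordered)
    best_k, best_prob = 0, float("inf")
    for k in range(1, n + 1):
        p = normal_tail(ordered[k - 1])        # p = P(Z >= Z_k)
        prob = prob_at_least_k(n, k, p)
        if prob < best_prob:
            best_k, best_prob = k, prob
    return best_k, best_prob


if __name__ == "__main__":
    # Three clearly outstanding positions among ten (made-up Z-scores).
    z = [12.0, 11.5, 11.0, 0.5, 0.3, 0.2, 0.1, -0.1, -0.4, -1.0]
    k_star, prob = select_sdp_count(z)
    print(k_star, prob)
```

The positions with the k* highest Z-scores are then reported as SDPs, so the number of predictions is set by the data and not by a fixed Z-score threshold.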
• Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci 13(2): 443-56.
• Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucl Acids Res 32(Web Server issue): W424-8.
• http://math.belozersky.msu.ru/~psn/
Web interface
Input: multiple alignment of proteins divided into specificity groups
=== AQP ===
%sp|Q9L772|AQPZ_BRUME
-------------------------------------mlnklsaeffgtfwlvfggcgsa
ilaa--afp-------elgigflgvalafgltvltmayavggisg--ghfnpavslgltv
iiilgsts------------------------------slap-----------------qlwlfwvaplvgavigaiiwkgllgrd-------------------------------------
%sp|P48838|AQPZ_ECOLI
-------------------------------------mfrklaaecfgtfwlvfggcgsa
vlaa--gfp-------elgigfagvalafgltvltmafavghisg--ghfnpavtiglwa
lvihgatd------------------------------kfap-----------------qlwffwvvpivggiiggliyrtllekrd------------------------------------
%tr|Q92ZW9
-------------------------------------mfkklcaeflgtcwlvlggcgsa
vlas--afp-------qvgigllgvsfafgltvltmaytvggisg--ghfnpavslglav
iiilgsth------------------------------rrvp-----------------qlwlfwiaplfgaaiagivwksvgeefrpvd---------------------------------
=== GLP ===
%sp|P11244|GLPF_ECOLI
----------------------------msqt---stlkgqciaeflgtglliffgvgcv
aalkvag---------a-sfgqweisviwglgvamaiyltagvsg--ahlnpavtialwl
glilaltd------------------------------dgn--------------g-vpr
-flvplfgpivgaivgafayrkligrhlpcdicvveek--etttpseqkasl------------
%sp|P44826|GLPF_HAEIN
----------------------------mdks-----lkancigeflgtalliffgvgcv
…
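For readers who want to prepare or check such a file programmatically, here is a small Python parser sketch. The format rules are inferred only from the example on this slide: a group starts with a `=== NAME ===` line, a sequence starts with a `%` line carrying its name, and all following lines up to the next header are pieces of the aligned sequence. The actual SDPpred server may accept or require more than this, and `parse_sdppred_input` is a hypothetical helper name.

```python
def parse_sdppred_input(text):
    """Parse grouped alignment text into {group: {sequence name: aligned sequence}}."""
    groups = {}
    current_group = None
    current_name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("===") and line.endswith("==="):
            # Group header, e.g. "=== AQP ==="
            current_group = line.strip("= ")
            groups.setdefault(current_group, {})
            current_name = None
        elif line.startswith("%"):
            # Sequence header, e.g. "%sp|P48838|AQPZ_ECOLI"
            if current_group is None:
                raise ValueError("sequence header before any group header")
            current_name = line[1:]
            groups[current_group][current_name] = ""
        else:
            # Continuation of the current aligned sequence
            if current_name is None:
                raise ValueError("sequence data before any sequence header")
            groups[current_group][current_name] += line
    return groups


def check_equal_lengths(groups):
    """True if every aligned sequence has the same length."""
    lengths = {len(seq) for seqs in groups.values() for seq in seqs.values()}
    return len(lengths) <= 1


if __name__ == "__main__":
    sample = "=== AQP ===\n%a1\n--ml\nnk\n%a2\n--mf\nrk\n=== GLP ===\n%g1\nmsqt\nst\n"
    parsed = parse_sdppred_input(sample)
    print(parsed, check_equal_lengths(parsed))
```

Checking that all sequences have equal length is a cheap way to catch a header that has been fused onto the end of a sequence line.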
Web interface
Output
Alignment of the family with the SDPs highlighted (Alignment view)
Detailed description of each SDP (List of SDPs)
Plot of probabilities, used by the Bernoulli estimator to set the cutoff (Probability plot view)
Examples: the LacI family of bacterial transcription factors
• Training set: 459 sequences, average length: 338 amino acids, 85 specificity groups
– 44 SDPs
10 residues contact NPF (analog of the effector)
7 residues in the effector contact zone (5 Å < dmin < 10 Å)
6 residues make up intersubunit contacts
5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
7 residues contact the operator sequence
6 residues in the operator contact zone (5 Å < dmin < 10 Å)
LacI from E. coli
Examples: bacterial membrane channels of the MIP family
• Training set: 17 sequences, average length 280 amino acids, 2 specificity groups: aquaporins and glyceroaquaporins
– 21 SDPs
8 residues contact glycerol (substrate) (dmin < 5 Å)
8 residues oriented to the channel
5 residues make up contacts with other subunits
GlpF from E. coli
Why does the prediction make sense?
LacI from E. coli
• Total 348 amino acids
• 44 SDPs
Contacting residues (distance to the DNA, effector, or the other subunit < 5 Å)
Contact zone (may be functional)
Non-contacting residues (distance to the DNA, effector, or the other subunit > 10 Å)
Why does the prediction make sense?
GlpF from E. coli
• Total 281 amino acids
• 21 SDPs
Contacting residues (distance to the substrate, or another subunit < 5 Å)
Contact zone (may be functional)
Non-contacting residues (distance to the substrate, or another subunit > 10 Å)
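The three residue classes used on these slides can be expressed as a short Python sketch. It assumes that dmin is the smallest distance between any atom of the residue and any atom of the partner (DNA, effector or substrate, or another subunit), and that a distance of exactly 5 Å or 10 Å, which the slides leave undefined, falls into the contact zone. Coordinates are plain (x, y, z) tuples in Å; reading them from a structure file is outside the scope of the sketch.

```python
import math


def min_distance(residue_atoms, partner_atoms):
    """Smallest atom-to-atom distance between a residue and its partner."""
    return min(math.dist(a, b) for a in residue_atoms for b in partner_atoms)


def classify_residue(dmin):
    """Assign a residue to one of the three classes by its minimum distance."""
    if dmin < 5.0:
        return "contacting"
    if dmin <= 10.0:          # boundary handling is an assumption
        return "contact zone"
    return "non-contacting"


if __name__ == "__main__":
    # Made-up coordinates for a two-atom residue and a two-atom ligand.
    residue = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    ligand = [(4.5, 0.0, 0.0), (9.0, 0.0, 0.0)]
    d = min_distance(residue, ligand)
    print(d, classify_residue(d))
```

Counting how many predicted SDPs fall into each class, compared with all residues of the protein, is the comparison summarized on the two slides above.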
GlpF from E. coli, a membrane channel from the MIP family:
SDPs either interact with the substrate or are located on the outer surface of the monomer
Structure of the GlpF monomer
Predicted SDPs
Glycerol
SDPs located on the outer surface of the GlpF monomer form subunit contacts
20Leu, 24Ile, 108Tyr of one subunit, 193Ser from another subunit
Glu43 from all four subunits
SDPs located on the outer surface of the GlpF monomer (continued)

Subunit I         Subunit II / Subunit IV
Residue  Atom     Residue  Atom     Distance (Å)
Glu43    OE1      Ser38    O        4.8
Glu43    OE2      Glu43    OE2      4.1
Glu43    CG       Trp42    CD1      3.7
Glu43    OE2      Glu43    OE2      4.1

Subunit I         Subunit II
Residue  Atom     Residue  Atom     Distance (Å)
Leu20    CD2      Ile158   CD1      4.3
Leu20    CD1      Leu162   CD2      4.5
Phe24    CZ       Ile158   CG2      3.9
Phe24    CZ       Leu186   CD1      3.9
Phe24    CE2      Val189   CG2      3.8
Phe24    CE2      Ile190   CG1      3.7
Phe24    CA       Ser193   CB       3.9
Phe24    O        Ser193   OG       4.2
Phe24    O        Ser193   CB       3.3
Gly27    O        Ser193   O        3.2
Cys28    CA       Ser193   CA       3.8
Tyr108   OH       Ser193   O        2.6
Tyr108   CE1      Met194   CE       3.7
Tyr108   CE1      Leu197   CD1      3.9
SDPs located on the outer surface of the GlpF monomer (continued)
Structure of contacts in the type A cluster
Structure of contacts in the type B cluster
Conclusions I. SDPpred: the SDP prediction method
• A method for identification of amino acid residues that account for differences in protein functional specificity
– Does not rely on the protein 3D structure
– Automatically determines the number of significant positions
– Considers substitutions according to the chemical properties of substituted amino acids
• Results agree with available structural and experimental data
• Applicable to any protein family in a standard way
Conclusions II. SDPs for GlpF from E. coli
• In protein families whose members function as oligomers, predicted SDPs are often localized on the contact surface between subunits
• 5 “surface” SDPs in GlpF: 20Leu, 24Ile, 43Glu, 108Tyr, 193Ser. All of them participate in forming the quaternary structure
• Evolutionary pressure on amino acids that establish intersubunit contacts correlates with evolutionary pressure on amino acids that account for the correct recognition of the substrate
• These residues form compact spatial clusters: “structural clasps” for recognition of proper subunits
Olga V. Kalinina
Pavel S. Novichkov
Andrey A. Mironov
Mikhail S. Gelfand
Aleksandra B. Rakhmaninova
– Department of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia
– Institute for Information Transmission Problems RAS, Moscow, Russia
– State Scientific Center GosNIIGenetika, Moscow, Russia
• Acknowledgements
– Leonid A. Mirny
– Olga Laikova
– Vsevolod Makeev
– Roman Sutormin
– Shamil Sunyaev
– Aleksey Finkelstein