Alignment & Secondary Structure
You have learned about:
Data & databases
Tools
Amino Acids
Protein Structure
Today we will discuss: Aligning sequences
After this:
You know how to perform alignments
You are ready to apply this knowledge in your bioinformatics
research project!
©CMBI 2009
Why align sequences?
The problem:
There are lots of sequences with unknown structure and/or function
There are a few sequences with known structure and/or function
Alignment can help:
• If one of them has known structure/function, then alignment gives us
insight into structural and/or functional aspects of the aligned
sequence(s)
• Transfer of information!
©CMBI 2011
Sequence Alignment (1)
Classically:
A sequence alignment is a representation of a whole series of
evolutionary events, which left traces in the sequences.
And:
The purpose of a sequence alignment is to line up all residues in the
sequences that were derived from the same residue position in the
ancestral gene or protein.
But… we want a sequence alignment to show us which residues are
located at equivalent positions in their respective structures, because
then we can transfer information.
And that is not always the same as the classical alignment approach.
Sequence Alignment (2)
[Figure: sequences A and B, shown twice]
gap = insertion or deletion (indel)
Structural alignment
To carry over information, we need a structural alignment.
The implicit meaning of placing amino acid residues below each
other in the same column of a protein (multiple) sequence
alignment is that they are at the equivalent position in the 3D
structures of the corresponding proteins!!
Examples
1) the 3 active site residues H, D, S, of the serine protease we saw
earlier
2) Cysteine bridges (disulfide bridges):
STCTKGALKLPVCRK
TSCTEG--RLPGCKR
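A quick check of the cysteine example, as a minimal Python sketch. It lists the alignment columns in which both sequences have a cysteine. The function name `shared_columns` is only illustrative and not part of the course material.

```python
def shared_columns(seq_a, seq_b, residue="C"):
    """Return the 1-based alignment columns where both sequences have `residue`."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a == residue and b == residue]

top = "STCTKGALKLPVCRK"
bottom = "TSCTEG--RLPGCKR"
cys_columns = shared_columns(top, bottom)
print(cys_columns)  # [3, 13]
```

With the two-residue gap placed as shown, both cysteines of one sequence sit in the same columns as the cysteines of the other.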
Transfer of information
Such information can be:
Phosphorylation sites
Glycosylation sites
Stabilizing mutations
Membrane anchors
Ion binding sites
Ligand binding residues
Cellular localization
This is typically what one finds in the feature (FT) records of Swiss-Prot!
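Transferring a site means mapping a residue number from the annotated sequence to the other sequence through the alignment. The sketch below shows one way to do that, using the cysteine-bridge alignment from the Examples slide. Treating the first sequence as the annotated one is an assumption made for illustration, and `map_position` is a hypothetical helper.

```python
def map_position(aligned_src, aligned_dst, src_pos):
    """Map a 1-based residue number in the source sequence to the 1-based
    residue number in the destination sequence.
    Returns None if the source residue is aligned to a gap."""
    src_count = dst_count = 0
    for s, d in zip(aligned_src, aligned_dst):
        if s != "-":
            src_count += 1
        if d != "-":
            dst_count += 1
        if s != "-" and src_count == src_pos:
            return dst_count if d != "-" else None
    raise ValueError("position lies outside the source sequence")

known = "STCTKGALKLPVCRK"    # assumed to carry the annotation
unknown = "TSCTEG--RLPGCKR"
print(map_position(known, unknown, 13))  # 11
print(map_position(known, unknown, 7))   # None
```

Residue 13 (a cysteine) maps to residue 11 of the second sequence because of the two-residue gap. Residue 7 has no counterpart, so nothing can be transferred for it.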
Significance of alignment
One can only transfer information between homologs. A percentage
identity that is ‘high enough’ guarantees homology, but homologs
can also have a lower identity.
The ‘threshold curve’ for transferring structural information from one
known protein structure to another protein sequence:
If the sequences are more than 80 aa long, then more than 25% sequence
identity is enough to reliably transfer structural information.
Structure is much more conserved than sequence!
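The rule of thumb can be written down as a small check. This is a sketch under stated assumptions: identity is counted over columns without gaps (other definitions of percentage identity exist), and "sequence length" is taken to mean the number of aligned residue pairs. Both function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over the columns without a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def can_transfer_structure(aligned_a, aligned_b,
                           min_length=80, min_identity=25.0):
    """Rule of thumb: more than 80 aligned residues and more than 25% identity."""
    aligned_length = sum(a != "-" and b != "-"
                         for a, b in zip(aligned_a, aligned_b))
    identity = percent_identity(aligned_a, aligned_b)
    return aligned_length > min_length and identity > min_identity

a = "STCTKGALKLPVCRK"
b = "TSCTEG--RLPGCKR"
print(round(percent_identity(a, b), 1))  # 46.2
print(can_transfer_structure(a, b))      # False
```

The short example alignment has a high identity, but with only 13 aligned residue pairs it falls outside the range where the 25% rule applies.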
Significance of alignment (2)
Aligning sequences by hand
Examples: which is the better alignment (left or right)?
1)
CPISRTWASIFRCW      CPISRTWASIFRCW
CPISRT---LFRCW      CPISRTL---FRCW
2)
CPISRTRASEFRCW      CPISRTRASEFRCW
CPISRTK---FRCW      CPISRT---KFRCW
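One crude way to compare the candidate alignments is to reward residue pairs that are identical or that share a chemical property. The sketch below scores the candidates in the order they are listed. The property groups and the scores (2, 1, 0) are assumptions chosen for illustration, not a published substitution matrix.

```python
# Hypothetical, deliberately coarse property groups (an assumption).
GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("STNQ")]

def similarity_score(aligned_a, aligned_b):
    """Score 2 per identical pair, 1 per pair from the same group, else 0."""
    score = 0
    for a, b in zip(aligned_a, aligned_b):
        if "-" in (a, b):
            continue  # gaps neither add nor subtract in this sketch
        if a == b:
            score += 2
        elif any(a in g and b in g for g in GROUPS):
            score += 1
    return score

example_1 = [("CPISRTWASIFRCW", "CPISRT---LFRCW"),
             ("CPISRTWASIFRCW", "CPISRTL---FRCW")]
example_2 = [("CPISRTRASEFRCW", "CPISRTK---FRCW"),
             ("CPISRTRASEFRCW", "CPISRT---KFRCW")]

scores_1 = [similarity_score(a, b) for a, b in example_1]
scores_2 = [similarity_score(a, b) for a, b in example_2]
print(scores_1, scores_2)  # [21, 20] [21, 20]
```

In both examples the first candidate scores higher: it pairs L with I (both aliphatic) and K with R (both positively charged), whereas the second pairs L with W and K with E.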
Aligning sequences by hand (2)
The alignment procedure depends on the information available:
1) In most cases you will start with an alignment program (e.g. CLUSTAL);
2) Use your knowledge of the amino acids to improve the alignment, for
instance by correcting the position of gaps;
3) Explicitly use the secondary structure preferences of the amino acids,
especially for N-termini of helices and for beta-turns;
4) Use 3D information if one or more of the structures in the alignment are
known.
Helix
Positional preferences in helices (1)
Position     -4   -3   -2   -1    1    2    3    4    5   total
State         -    -    -    -    H    H    H    H    H
ASP          98  110  121  260   98  197  167   49   86    1186

Position 1 is the first position in the helix
Dataset of good helices from PDB files
Count all Asp residues in & before helices
Identify preferential positions for Asp residues
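The counting step could look like the sketch below. It assumes each entry is a pair of equal-length strings: the sequence and a per-residue secondary structure string in which `H` marks helix and `-` marks everything else. The two entries are made-up toy data, not real PDB helices, and the function names are hypothetical.

```python
from collections import Counter

POSITIONS = [-4, -3, -2, -1, 1, 2, 3, 4, 5]

def helix_starts(ss):
    """Indices where a helix begins: an 'H' not preceded by an 'H'."""
    return [i for i, s in enumerate(ss)
            if s == "H" and (i == 0 or ss[i - 1] != "H")]

def count_positions(entries, residue):
    """Count `residue` at positions -4..-1 (before) and 1..5 (inside) helices."""
    counts = Counter()
    for seq, ss in entries:
        for start in helix_starts(ss):
            for pos in POSITIONS:
                # Position 1 is the first helical residue; there is no position 0.
                idx = start + pos if pos < 0 else start + pos - 1
                if not 0 <= idx < len(seq):
                    continue
                in_helix = ss[idx] == "H"
                if seq[idx] == residue and in_helix == (pos > 0):
                    counts[pos] += 1
    return counts

# Toy data (invented for illustration).
entries = [("GSPDELLKKA", "----HHHHHH"),
           ("AADPEELAKR", "---HHHHHHH")]
asp_counts = count_positions(entries, "D")
print(dict(asp_counts))  # {-1: 2}
```

Run over a large set of well-determined helices, the same loop would yield a row of counts like the Asp row shown on the slide.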
Positional preferences in helices (2)
Fill this table for all 20 amino acids
Use this information when aligning helices that have a low
percentage of sequence identity

Position     -4   -3   -2   -1    1    2    3    4    5   total
State         -    -    -    -    H    H    H    H    H
ALA         143  148   99   58  189  205  187  241  268    1538
CYS          24   31   29   22   14   17   18   33   17     205
ASP          98  110  121  260   98  197  167   49   86    1186
GLU          91  100   71   71  152  287  269   70  147    1258
TRP          29   25   29   14   30   26   28   30   29     240
TYR          66   65   75   33   58   44   56   72   48     517
(…)

Position 1 is the first position in the helix
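To turn the counts into preferences, one simple option is to flag the positions where an amino acid occurs clearly more often than its own average over the nine positions. The sketch below uses the six rows from the slide. The threshold of 1.3 times the row average is an arbitrary assumption, and comparing against the row average ignores how often each amino acid occurs overall.

```python
POSITIONS = [-4, -3, -2, -1, 1, 2, 3, 4, 5]

# Counts per position for the six amino acids shown on the slide.
COUNTS = {
    "ALA": [143, 148, 99, 58, 189, 205, 187, 241, 268],
    "CYS": [24, 31, 29, 22, 14, 17, 18, 33, 17],
    "ASP": [98, 110, 121, 260, 98, 197, 167, 49, 86],
    "GLU": [91, 100, 71, 71, 152, 287, 269, 70, 147],
    "TRP": [29, 25, 29, 14, 30, 26, 28, 30, 29],
    "TYR": [66, 65, 75, 33, 58, 44, 56, 72, 48],
}

def preferred_positions(counts, threshold=1.3):
    """Positions whose count exceeds `threshold` times the row average."""
    average = sum(counts) / len(counts)
    return [p for p, c in zip(POSITIONS, counts) if c > threshold * average]

preferences = {aa: preferred_positions(row) for aa, row in COUNTS.items()}
for aa, positions in preferences.items():
    print(aa, positions)
```

With this threshold Asp stands out at positions -1 and 2 and Glu at positions 2 and 3, while Trp shows no clear preference.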
Aligning 2 sequences when sequence identity is low
Helix 1:
S G V S P D Q L A A L K L I L E L A L K
Helix 2:
G T S L E T A L L M Q I A Q K L I A G
Protein threading
The word threading implies that one drags the sequence
(ACDEFG...) step by step through each location on the template
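The "dragging" idea can be sketched as sliding one sequence along the other without gaps and scoring every placement. Real threading scores a sequence against a 3D template. Counting identical residues, as done here, is only a stand-in chosen to keep the example short, and `thread` is a hypothetical helper. Helix 2 is slid along Helix 1 from the previous slide.

```python
def thread(query, template):
    """Slide `query` along `template` without gaps.
    Returns (offset, identities) for every placement where the query
    fits entirely inside the template."""
    results = []
    for offset in range(len(template) - len(query) + 1):
        window = template[offset:offset + len(query)]
        identities = sum(q == t for q, t in zip(query, window))
        results.append((offset, identities))
    return results

helix_1 = "SGVSPDQLAALKLILELALK"
helix_2 = "GTSLETALLMQIAQKLIAG"

placements = thread(helix_2, helix_1)
best_offset, best_score = max(placements, key=lambda r: r[1])
print(placements)   # [(0, 2), (1, 3)]
print(best_offset)  # 1
```

With so few identities the difference between placements is small, which is why extra information such as positional preferences is needed to choose between them.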
Aligning 2 helices when sequence identity is low
Helix 1: S G V S P D Q L A A L K L I L E L A L K
Helix 2: G T S L E T A L L M Q I A Q K L I A G
[Slide annotation: helix positions (-4 to 5) listed under each residue]
Final alignment:
S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G
Use of 3D structure info (1)
[Figure: structures 1 and 2]
If you know that in structure 1 the Ala is pointing outside and the Ser is
pointing inside:
Where does the Arg in structure 2 go?
(and what will CLUSTAL choose?)
Use of 3D structure info (2)
      1   2   3   4   5   6   7   8   9   10  11
A     ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1    VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
B2    VAL CYS ARG --- --- --- THR PRO GLU ALA ILE
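The two candidate alignments of B against A (B1 and B2) can be compared column by column. The sketch below converts the three-letter codes to one-letter codes, builds one string per alignment column (A, B1, B2 from top to bottom), and lists the columns where B1 and B2 disagree. The variable names are illustrative.

```python
THREE_TO_ONE = {"ILE": "I", "CYS": "C", "ARG": "R", "LEU": "L", "PRO": "P",
                "GLY": "G", "SER": "S", "ALA": "A", "GLU": "E", "VAL": "V",
                "THR": "T", "---": "-"}

rows = {
    "A":  "ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL",
    "B1": "VAL CYS ARG THR PRO --- --- --- GLU ALA ILE",
    "B2": "VAL CYS ARG --- --- --- THR PRO GLU ALA ILE",
}

# One-letter version of each row, e.g. A -> "ICRLPGSAEAV".
one_letter = {name: "".join(THREE_TO_ONE[res] for res in row.split())
              for name, row in rows.items()}

# One string per alignment column, read top to bottom (A, B1, B2).
columns = ["".join(chars) for chars in zip(*one_letter.values())]

# 1-based columns where the two candidate placements of B differ.
differing = [i + 1 for i, (b1, b2)
             in enumerate(zip(one_letter["B1"], one_letter["B2"]))
             if b1 != b2]

print(columns)    # ['IVV', 'CCC', 'RRR', 'LT-', 'PP-', 'G--', 'S-T', 'A-P', 'EEE', 'AAA', 'VII']
print(differing)  # [4, 5, 7, 8]
```

The candidates differ only in where THR PRO is placed relative to the gap. That placement is what the 3D structure information has to decide.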
An even more real example
      1   2   3   4   5   6   7   8   9   10  11
A     ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1    VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
B2    VAL CYS ARG --- --- --- THR PRO GLU ALA ILE
[Figure labels, one per alignment column (A, B1, B2): IVV CCC RRR LT- PP- G-- S-T A-P EEE AAA VII]
What you have learned today
• A good sequence alignment is necessary to carry over information
between proteins.
• Putting amino acids below each other in a sequence alignment
implies that you predict that they are at equivalent positions in the
structures.
• Alignments can be optimized by using:
   • secondary structure preferences (especially for helix positioning and
   prediction of beta-turns)
   • 3D structure info
• If the aligned sequences are more than 80 aa long, then more than 25%
sequence identity is enough to reliably transfer structural information.
Alignment videos
Swift.cmbi.ru.nl/teach/B1M
=> Seminars
=> Link to Aligning video page
You are ready to…
• Apply these lessons to the practical exercises
• Perform your own bioinformatics research project!
Take home lessons:
Please remember to always use all structural information available to
you to optimize a sequence alignment. This can be real 3D data, but
can also be “just” your own knowledge about the properties and
preferences of the amino acids.
Sequences don’t exist, but structures do.