miesaytdsw qfeksyvtdy


Alignment
Most alignment programs create an alignment
that represents what happened during evolution
at the DNA level.
To carry over information from a well-studied
sequence to a newly determined one, we need an
alignment that represents the protein structures
as they are today.
©CMBI 2001
The amino acids
Most information that enters the alignment
procedure comes from the physico-chemical
properties of the amino acids. Example: which
is the better alignment (left or right)?
Left:
CPISRTWASIFRCW
CPISRT---LFRCW
Right:
CPISRTWASIFRCW
CPISRTL---FRCW
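One way to make the left/right choice concrete is to score both alignments with a toy scheme built on physico-chemical classes. This is only a sketch: the residue classes, the scores (2 for identity, 1 for same class, 0 otherwise) and the gap penalty of -1 are assumptions made for illustration, not values from the slides or from a real substitution matrix.

```python
# Toy physico-chemical classes (an assumption made for this sketch).
CLASSES = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def score_alignment(top, bottom, gap=-1):
    """Sum column scores: 2 = identical, 1 = same class, 0 = different."""
    assert len(top) == len(bottom), "aligned rows must have equal length"
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap          # every gap position costs the same here
        elif a == b:
            total += 2
        elif CLASS_OF[a] == CLASS_OF[b]:
            total += 1
    return total


left = score_alignment("CPISRTWASIFRCW", "CPISRT---LFRCW")
right = score_alignment("CPISRTWASIFRCW", "CPISRTL---FRCW")
print("left:", left, "right:", right)
```

Under these assumed scores the left alignment comes out higher, because it pairs I with L (both aliphatic) where the right one pairs W with L.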
A difficult alignment problem
AYAYAYAYSY
LGLPLPLPLP
A difficult alignment problem solved
AYAYAYAYSY
AGAPAPAPSP
LGLPLPLPLP
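Counting identical positions shows why the middle sequence solves the problem. The helper name `identities` is made up for this sketch, and it assumes the three sequences are compared position by position without gaps, as they are printed above.

```python
def identities(a, b):
    """Number of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


first, bridge, last = "AYAYAYAYSY", "AGAPAPAPSP", "LGLPLPLPLP"

print("first vs last:  ", identities(first, last))
print("first vs bridge:", identities(first, bridge))
print("bridge vs last: ", identities(bridge, last))
```

The outer two sequences share no residue at any position, while the intermediate sequence shares half of its positions with each of them, so it can anchor both.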
Alignment order
MIESAYTDSW
QFEKSYVTDY
-MIESAYTDSW
QFEKSYVTDY-
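The two placements above differ only by a shift of one position. A small sketch that counts identical pairs at a given shift makes the comparison explicit; the function name and the convention that shift k places a[i] against b[i + k] are assumptions of this example.

```python
def identities_at_shift(a, b, k):
    """Count identical pairs when a[i] is placed against b[i + k]."""
    return sum(
        1
        for i, x in enumerate(a)
        if 0 <= i + k < len(b) and x == b[i + k]
    )


a, b = "MIESAYTDSW", "QFEKSYVTDY"
for k in (0, 1):
    # k = 0 is the ungapped placement, k = 1 the one with end gaps
    print("shift", k, "identities", identities_at_shift(a, b, k))
```

Both placements yield only a handful of identities, which is why two sequences alone do not settle the question convincingly.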
Alignment order
MIESAYTDSW
QFEKSYVTDY
QWERTYASNF
-MIESAYTDSW
QFEKSYVTDY-
QWERTYASNF-
Alignment order
Conclusion:
First align the sequences that
look most like each other.
That way you ‘build up information’
while making the alignments that
are most likely to be correct.
Alignment order
In order to know which
sequences look most like each
other, you need to do all
pairwise alignments first.
This is what CLUSTAL does.
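The "all pairwise alignments first" step can be sketched with a plain global alignment score. This is not CLUSTAL's actual scoring: the match, mismatch and gap values below are assumptions, and the function names are invented for the example.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * gap]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def pairwise_scores(seqs):
    """Score every unordered pair of sequences once."""
    return {
        (i, j): nw_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }


seqs = ["MIESAYTDSW", "QFEKSYVTDY", "QWERTYASNF"]
scores = pairwise_scores(seqs)
closest = max(scores, key=scores.get)  # pair to align first
print(scores, "-> align first:", closest)
```

For n sequences this needs n(n-1)/2 pairwise alignments, which is the price paid for knowing which sequences to align first.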
Step 1
     A   B   C   D   E
A    0   6   9  11   9
B    6   0   7   9   7
C    9   7   0   8   6
D   11   9   8   0   4
E    9   7   6   4   0
[Tree figure: leaves D, E]
Step 2
     A   B   C  DE
A    0   6   9  10
B    6   0   7   8
C    9   7   0   7
DE  10   8   7   0
[Tree figure: leaves D, E, A, B]
Step 3
    AB   C  DE
AB   0   8   9
C    8   0   7
DE   9   7   0
[Tree figure: leaves C, D, E, A, B]
Step 4
      AB  CDE
AB     0  8.5
CDE  8.5    0
[Tree figure: leaves C, D, E, A, B]
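Steps 1 to 4 can be replayed in a few lines. The sketch below starts from the Step 1 distances, repeatedly joins the closest pair, and sets the distance from the new cluster to every other cluster to the plain average of the two merged rows. That averaging rule is inferred from the numbers on the slides (it is what yields 8.5 in Step 4; averaging weighted by cluster size would give a slightly different value there). The function name and the way cluster names are concatenated are choices made for this example.

```python
from itertools import combinations


def cluster(names, matrix):
    """Join the closest pair until one cluster is left; return the merges."""
    d = {
        frozenset((a, b)): matrix[i][j]
        for (i, a), (j, b) in combinations(enumerate(names), 2)
    }
    clusters = list(names)
    merges = []
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merges.append((x, y, d[frozenset((x, y))]))
        new = x + y  # e.g. "D" + "E" -> "DE"
        rest = [c for c in clusters if c not in (x, y)]
        for c in rest:
            # plain average of the two merged rows
            d[frozenset((new, c))] = (d[frozenset((x, c))] + d[frozenset((y, c))]) / 2
        clusters = rest + [new]
    return merges


names = ["A", "B", "C", "D", "E"]
matrix = [
    [0, 6, 9, 11, 9],
    [6, 0, 7, 9, 7],
    [9, 7, 0, 8, 6],
    [11, 9, 8, 0, 4],
    [9, 7, 6, 4, 0],
]
for x, y, height in cluster(names, matrix):
    print(f"join {x} + {y} at distance {height}")
```

The merge order (D with E, then A with B, then C with DE, then AB with CDE) is also the order in which the sequences would be aligned.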
Other algorithms
Multi-sequence alignment can
also be done with an iterative
‘profile’ alignment.
A) Make an alignment of a few well-aligned
sequences and turn it into a profile
B) Align all sequences using
this profile
1. What is a profile?
Normally, we use a PAM-like
matrix to determine the score
for each possible match in an
alignment.
This assumes that every match
I <-> E scores the same, wherever
it occurs. But it doesn’t.
2. What is a profile?
QWERTYIPASEF
QWEKSFIPGSEY
NWERTMVPVSEM
QFEKTYLPSSEY
NFIKTLMPATEF
QYIRSLIPAGEM
NYIQSLIPSTEL
QFIRSLFPSSEI
  1   2   3    (markers 1, 2 and 3 sit under columns 3, 7 and 11)
At 1, E and I are
both OK.
At 2, I is OK,
but E surely not.
At 3, E is OK,
but I surely not.
3. What is a profile?
The knowledge about which
residue types are good for a
certain position can be
expressed in a profile.
A profile holds for each
position 20 scores for the 20
residue types, and sometimes
also two values for gap open
and gap elongation.
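A profile of this kind can be sketched from the eight-sequence alignment shown earlier. The scoring below (log-odds against a uniform background, with a pseudocount of 1) is an assumption chosen for simplicity, not the scheme of any particular program, and the sketch leaves out the two gap values and assumes the alignment has no gaps.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of 20 scores per alignment column."""
    n = len(alignment)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / (n + pseudocount * len(AMINO_ACIDS))
            # log-odds against a uniform background of 1/20
            scores[aa] = math.log2(freq * len(AMINO_ACIDS))
        profile.append(scores)
    return profile


alignment = [
    "QWERTYIPASEF", "QWEKSFIPGSEY", "NWERTMVPVSEM", "QFEKTYLPSSEY",
    "NFIKTLMPATEF", "QYIRSLIPAGEM", "NYIQSLIPSTEL", "QFIRSLFPSSEI",
]
profile = build_profile(alignment)
for pos in (3, 7, 11):  # the three marked columns
    col = profile[pos - 1]
    print(pos, "E:", round(col["E"], 2), "I:", round(col["I"], 2))
```

With these assumptions E and I both score positive in column 3, only I does in column 7, and only E does in column 11, which matches the three marked positions.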
Back to other algorithms
Multi-sequence alignment can
also be done with an iterative
‘profile’ alignment.
A) Make an alignment of a few well-aligned
sequences and turn it into a profile
B) Align all sequences using
this profile
Conserved, variable, or in-between
QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW
Gray = conserved (here columns 1-5, 9 and 12)
Black = variable (here columns 13 and 14)
Green = correlated mutations (here columns 6-8 and 10-11)
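The three categories can be read off the columns mechanically. In this sketch a column is called conserved when it holds one residue type, variable when every sequence differs, and otherwise it is grouped with the columns that split the sequences in the same way. These definitions and the function names are assumptions made for the example; real analyses use softer, statistical criteria.

```python
from collections import defaultdict


def column_pattern(column):
    """Encode how a column splits the sequences, e.g. YYNN -> (0, 0, 1, 1)."""
    seen = {}
    return tuple(seen.setdefault(res, len(seen)) for res in column)


def classify_columns(alignment):
    conserved, variable = [], []
    correlated = defaultdict(list)  # split pattern -> 1-based column numbers
    for pos, column in enumerate(zip(*alignment), start=1):
        kinds = len(set(column))
        if kinds == 1:
            conserved.append(pos)
        elif kinds == len(column):
            variable.append(pos)
        else:
            correlated[column_pattern(column)].append(pos)
    return conserved, variable, dict(correlated)


alignment = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]
conserved, variable, correlated = classify_columns(alignment)
print("conserved: ", conserved)
print("variable:  ", variable)
print("correlated:", correlated)
```

A pattern that is shared by a single column only would not count as correlated with anything; in this alignment each pattern is shared by at least two columns.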
Correlated mutations determine the tree shape
1 AGASDFDFGHKM
2 AGASDFDFRRRL
3 AGLPDFMNGHSI
4 AGLPDFMNRRRV
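A quick way to see how the correlated columns shape the tree is to count the differing positions for every pair of sequences. The sketch assumes the labels 1 to 4 refer to the four sequences in the order shown, and it uses a plain mismatch count as the distance, which is an illustrative choice.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


seqs = {
    1: "AGASDFDFGHKM",
    2: "AGASDFDFRRRL",
    3: "AGLPDFMNGHSI",
    4: "AGLPDFMNRRRV",
}
dist = {(i, j): hamming(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}
for pair, d in sorted(dist.items(), key=lambda item: item[1]):
    print(pair, d)
```

Sequences 1 and 2 come out closest to each other, as do 3 and 4, because columns 3, 4, 7 and 8 change together and separate the two pairs.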
Correlation = Information
1, 2 and 5 bind calcium; 3 and 4 don’t.
Which residues bind calcium?
  123456789012345
1 ASDFNTDEKLRTTYI
2 ASDFSTDEKLKTTYI
3 LSFFTTDTKLATIYI
4 LSHFLTDLKLATIYI
5 ASDFTTDEKLALTYI