Chapter 5 – Scoring matrices

• Used for comparing (aligning) sequences
• Gives a score for each pair of amino acids
• Should reflect
– The degree of "biological relatedness"
– The "probability" that two amino acids occurring in different sequences have a common ancestor
• Should be symmetric
• Substitution matrices
– The probability that an amino acid a is changed to amino acid b (in a certain evolutionary time)
– Are generally not symmetric
Scoring matrices
• Identity matrix (scoring 0/1)
• Use of the distances in the genetic code
• Use of amino acid similarities based on physico-chemical properties
• Scoring matrices based on experimental data (PAM, BLOSUM)
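As a small illustration of the simplest option in the list, the sketch below builds an identity (0/1) scoring matrix and uses it to score two already aligned, ungapped segments. The names `identity_matrix` and `score_ungapped` are hypothetical helpers chosen for this example, not part of any library.

```python
# Sketch: identity (0/1) scoring matrix over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def identity_matrix(alphabet=AMINO_ACIDS):
    """Return a dict-of-dicts table: 1 for identical residues, 0 otherwise."""
    return {a: {b: int(a == b) for b in alphabet} for a in alphabet}

def score_ungapped(seq1, seq2, matrix):
    """Sum the pair scores of two equal-length, already aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(matrix[a][b] for a, b in zip(seq1, seq2))

IDENTITY = identity_matrix()
```

With this table, `score_ungapped("KIFIMK", "KIFKTK", IDENTITY)` simply counts the identical positions. The table is symmetric by construction, as a scoring matrix should be.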
DAYHOFF's PAM MATRICES
• Based on experimental data
• t – evolutionary time interval
• Sequences from 34 superfamilies were used
1. Divide the sequences into groups (71) of homologous sequences, and make a multiple alignment for each of them
2. Construct evolutionary trees for each group, and estimate the mutations that have occurred
3. Define an evolutionary model to explain the evolution
4. Construct substitution matrices: for each amino acid pair (a,b), an estimate of the probability that an amino acid a has mutated to an amino acid b in time interval t
5. Construct scoring matrices from the substitution matrices
Note that a and b are variables that mean any amino acid.
Example multiple alignment and evolutionary tree
Example
The model of the evolution
• The probability of a mutation in a position is independent of
– Position and neighbouring residues
– Previous mutations in the position
• A biological (evolutionary) clock is assumed (meaning a constant rate of mutations)
• This means that evolutionary time can be measured in number of mutations (here substitutions)
• The measure is PAM (Point Accepted Mutations)
• 1 PAM is one accepted mutation per 100 residues
Substitution matrix $M^1$
Calculate $M^z$ by matrix multiplication, shown for z=2
• z=2 means two mutations per 100 residues
• A residue a can be changed to residue b after 2 PAM for the following reasons:
1. a is mutated to b in the first PAM and is unchanged in the next, with probability $M_{ab}M_{bb}$
2. a is unchanged in the first PAM and changed to b in the next, with probability $M_{aa}M_{ab}$
3. a is mutated to an amino acid x in the first PAM, and then to b in the next, with probability $M_{ax}M_{xb}$, x being any amino acid other than a and b
These three cases are mutually exclusive, hence
$$M^2_{ab} = M_{ab}M_{bb} + M_{aa}M_{ab} + \sum_{x \notin \{a,b\}} M_{ax}M_{xb} = \sum_{x \in \mathcal{M}} M_{ax}M_{xb}$$
where $\mathcal{M}$ denotes the set of all amino acids.
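The two-step argument can be checked numerically. The sketch below uses a made-up 3-letter alphabet with invented probabilities (not Dayhoff's data), with row a and column b holding the probability that a changes to b. It compares the sum over the three cases with the corresponding entry of the matrix product. `mat_mul`, `mat_pow` and `two_step_by_cases` are hypothetical helper names.

```python
# Toy 1-PAM-style matrix over a 3-letter alphabet; the numbers are invented
# for illustration. Entry M1[a][b] is the probability that a changes to b.
M1 = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.01, 0.03, 0.96],
]

def mat_mul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][x] * B[x][j] for x in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, z):
    """M^z by repeated multiplication, for an integer z >= 1."""
    result = M
    for _ in range(z - 1):
        result = mat_mul(result, M)
    return result

def two_step_by_cases(M, a, b):
    """Probability of a -> b after two steps, summed over the three cases (a != b)."""
    n = len(M)
    via_other = sum(M[a][x] * M[x][b] for x in range(n) if x not in (a, b))
    return M[a][b] * M[b][b] + M[a][a] * M[a][b] + via_other

M2 = mat_pow(M1, 2)
```

Here `M2[0][1]` and `two_step_by_cases(M1, 0, 1)` agree, and each row of `M2` still sums to 1. Calling `mat_pow(M1, 250)` gives the analogue of $M^{250}$ for the toy matrix.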
$M^{250}$
PAM-250 scoring matrix
BLOSUM (Henikoff & Henikoff)
• Make multiple alignments and discover blocks not containing gaps
(used over 2,000 blocks)
...KIFIMK.......GDEVK...
...NLFKTR.......GDSKK...
...KIFKTK.......GDPKA...
...KLFESR.......GDAER...
...KIFKGR.......GDAAK...
• For each column in each block they counted the number of occurrences of each pair of amino acids (210 different pairs, 20*21/2)
• A block of length w from an alignment of n sequences has w·n(n-1)/2 occurrences of amino acid pairs
– Let $h_{ab}$ be the number of occurrences of the pair (a,b) in all blocks ($h_{ab} = h_{ba}$)
– T is the total number of pairs
– $f_{ab} = h_{ab}/T$
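The counting step can be sketched as follows, using the first block from the example alignment (5 segments of length 6). `count_pairs` is a hypothetical helper. Pairs are stored unordered, as sorted tuples, so that (a,b) and (b,a) share one counter.

```python
from collections import Counter
from itertools import combinations

def count_pairs(block):
    """Count unordered amino acid pairs in every column of an ungapped block."""
    h = Counter()
    for column in zip(*block):                   # one column at a time
        for a, b in combinations(column, 2):     # all n(n-1)/2 pairs in the column
            h[tuple(sorted((a, b)))] += 1        # (a,b) and (b,a) are the same pair
    return h

block = ["KIFIMK", "NLFKTR", "KIFKTK", "KLFESR", "KIFKGR"]
h = count_pairs(block)
T = sum(h.values())                              # total number of pairs
f = {pair: count / T for pair, count in h.items()}
```

For this block w = 6 and n = 5, so T should equal 6·5·4/2 = 60. The fully conserved F column alone contributes 10 occurrences of the pair (F,F).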
Constructing log-odds matrix
• The observed frequencies $f_{ab}$ must be compared with the expected probability that the observed pairs occur just by chance: $e_{ab}$
• Then $R_{ab} = \log(f_{ab}/e_{ab})$
• Procedure for finding $e_{ab}$
– Assume that the observed frequencies are equal to the frequencies in the actual population of sequences we want to align
– The total number of residues in the data is 2T
– Amino acid a occurs $2h_{aa} + \sum_{x \neq a} h_{ax}$ times ($h_{ax} = h_{xa}$)
– a occurs with frequency
$$p_a = \frac{2h_{aa} + \sum_{x \neq a} h_{ax}}{2T} = f_{aa} + \sum_{x \neq a} \frac{f_{ax}}{2}$$
– Assume now that the pairs are separated, and new pairs are drawn with these probabilities
– $e_{aa} = p_a p_a$
– $e_{ab} = p_a p_b + p_b p_a = 2 p_a p_b$ for $a \neq b$
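A minimal sketch of this procedure, applied to a tiny invented set of pair counts over a two-letter alphabet. The base-2 logarithm is an assumption made for the example, since the slide only writes "log". `log_odds` is a hypothetical helper, and it only produces scores for pairs that were actually observed.

```python
import math
from collections import defaultdict

def log_odds(h):
    """h maps unordered pairs (a, b), stored with a <= b, to observed counts.

    Returns (p, e, R): residue frequencies, expected pair probabilities,
    and log-odds scores for the observed pairs.
    """
    T = sum(h.values())
    occurrences = defaultdict(int)
    for (a, b), count in h.items():
        occurrences[a] += count      # for a == b these two lines add 2*h_aa
        occurrences[b] += count
    p = {a: n / (2 * T) for a, n in occurrences.items()}
    e, R = {}, {}
    for (a, b), count in h.items():
        e[(a, b)] = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        R[(a, b)] = math.log2((count / T) / e[(a, b)])  # base 2 is an assumption
    return p, e, R

# Invented counts: 6 (A,A) pairs, 2 (A,S) pairs, 2 (S,S) pairs.
h = {("A", "A"): 6, ("A", "S"): 2, ("S", "S"): 2}
p, e, R = log_odds(h)
```

In this toy data A occurs 14 times among 2T = 20 residues, so p_A = 0.7 and p_S = 0.3. The mixed pair (A,S) is seen less often than expected by chance and gets a negative score, while the identical pairs get positive scores.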
Constructing BLOSUM-x
• Use the blocks
• Combine (cluster) segments inside a block that are at least x% identical
• Use the same procedure as for BLOSUM-1, using the revised blocks
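One possible way to group the segments of a block is sketched below. It assumes single-linkage grouping: two segments end up in the same cluster if they are linked by a chain of pairs that are each at least x% identical. `percent_identity` and `cluster_segments` are hypothetical helper names, and how the resulting clusters are weighted in the pair counts is not shown here.

```python
def percent_identity(s1, s2):
    """Percentage of positions at which two equal-length segments agree."""
    matches = sum(a == b for a, b in zip(s1, s2))
    return 100.0 * matches / len(s1)

def cluster_segments(block, x):
    """Group segments by single linkage at an identity threshold of x percent."""
    parent = list(range(len(block)))          # union-find forest over segment indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving
            i = parent[i]
        return i

    for i in range(len(block)):
        for j in range(i + 1, len(block)):
            if percent_identity(block[i], block[j]) >= x:
                parent[find(i)] = find(j)     # merge the two clusters

    clusters = {}
    for i, segment in enumerate(block):
        clusters.setdefault(find(i), []).append(segment)
    return list(clusters.values())

block = ["KIFIMK", "NLFKTR", "KIFKTK", "KLFESR", "KIFKGR"]
clusters_60 = cluster_segments(block, 60)
```

For this block and x = 60, the segments KIFIMK, KIFKTK and KIFKGR fall into one cluster and the other two stay on their own. Raising x splits clusters apart and lowering it merges them.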