
Construction of Substitution matrices
• BLOSUM
• BLOCKS SUBSTITUTION MATRIX
• PAM
• POINT ACCEPTED MUTATIONS
Substitution matrices
• A substitution matrix contains values proportional
to the probability that amino acid A mutates into
amino acid B, for all pairs of amino acids, over a
period of evolution
• Substitution matrices are constructed from a
large and diverse sample of sequence alignments
How to construct substitution matrices
• Multiple alignment of well-studied gene
sequences from different species
• Use orthologs: functionally similar
• Observed substitutions tend to preserve function
• Minimal gaps
How to construct substitution matrices?
• Tabulate substitutions
• A to A: 9867 times
• A to R: 2 times
• A to N: 9 times
• etc.
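The tabulation step can be sketched in a few lines. This is a minimal illustration, assuming a gap character of "-" and counting unordered residue pairs in each alignment column (the direction of a change cannot be read off an alignment alone); the function name and the toy alignment are hypothetical and not taken from the slides.

```python
from collections import Counter
from itertools import combinations


def tabulate_substitutions(alignment):
    """Count aligned residue pairs column by column, skipping gaps.

    `alignment` is a list of equal-length strings (one row per sequence).
    Pairs are stored unordered, so ("A", "R") also covers R aligned with A.
    """
    counts = Counter()
    length = len(alignment[0])
    for col in range(length):
        column = [seq[col] for seq in alignment]
        for a, b in combinations(column, 2):
            if a == "-" or b == "-":
                continue  # gaps carry no substitution information
            counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical toy alignment of three sequences.
toy_alignment = [
    "ARNA",
    "ARNA",
    "AKNR",
]
counts = tabulate_substitutions(toy_alignment)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

A real matrix would be built from many alignments, with counts such as the "A to A" and "A to R" totals above accumulated over all of them.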
Construction of Substitution matrices
• BLOSUM
How to construct substitution matrices?
Substitution matrix score =
log (observed mutation rate in alignment / expected random mutation rate)
How do we find the random mutation rate?
The random mutation rate
• compute the overall occurrence of an amino acid in a
protein database
http://www.ebi.ac.uk/swissprot/sptr_stats/index.html
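As a rough sketch of this step, amino acid frequencies can be counted over a set of sequences and then combined into the frequency with which two residues would be paired by chance. The product-of-frequencies convention used here is an assumption (it is the one commonly used for BLOSUM-style matrices), and the three-sequence "database" and function names are hypothetical.

```python
from collections import Counter


def background_frequencies(sequences):
    """Return the fraction of all residues accounted for by each amino acid."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def expected_pair_frequency(freqs, a, b):
    """Chance frequency of the unordered pair (a, b), assuming independence."""
    if a == b:
        return freqs[a] * freqs[b]
    # a-b and b-a are the same unordered pair, hence the factor of 2
    return 2 * freqs[a] * freqs[b]


# Hypothetical stand-in for a protein database.
toy_database = ["ARNDA", "AAWRK", "NDRRA"]
freqs = background_frequencies(toy_database)
print(freqs["A"])
print(expected_pair_frequency(freqs, "W", "R"))
```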
The random mutation rate
Example:
The expected random mutation rate is 1 in 10,000 and
the observed mutation rate of W to R is 1 in 10.
Score = log10 (0.1/0.0001) = log10 (1000) = +3
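The arithmetic of this example can be checked with a short script. It is only a sketch: the function name is hypothetical, and base 10 is assumed because that is the base that reproduces the +3 in the example.

```python
import math


def log_odds_score(observed, expected, base=10):
    """Log of the ratio between observed and expected (chance) rates."""
    return math.log(observed / expected, base)


# W to R: observed 1 in 10, expected by chance 1 in 10,000.
score = log_odds_score(0.1, 0.0001)
print(round(score, 6))
```

A positive score means the substitution is seen more often than chance predicts, a negative score means it is seen less often, and zero means it occurs at the chance rate. Published matrices often use a different base and scaling (BLOSUM62, for instance, is usually given in half-bit units, that is 2 x log2, rounded to integers), so only the sign and relative size of scores carry over between conventions.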
Calculating BLOSUM62 scores
PAM matrices
• Point Accepted Mutations
[1 PAM = 1 accepted point mutation per 100 amino acids]
• Do not take into account different
evolutionary rates between conserved and nonconserved regions
• PAM1 corresponds to an average change of 1% of amino acid positions
• PAM 250: ??
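One way to approach the PAM 250 question: in Dayhoff's construction, higher-numbered PAM matrices are extrapolated from PAM1 by multiplying the PAM1 mutation probability matrix by itself, so PAM250 is PAM1 raised to the 250th power. The sketch below illustrates this with a made-up 3-letter alphabet; the matrix values are hypothetical and chosen only so that each row sums to 1 with a 1% chance of change.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


# Hypothetical "PAM1" for a 3-letter alphabet: 99% of residues unchanged.
toy_pam1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.005, 0.005, 0.990],
]
toy_pam250 = mat_pow(toy_pam1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

Note that 250 PAMs does not mean 250% of positions differ: repeated and reverse changes at the same position mean that a substantial fraction of residues is still expected to match.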
Why use substitution matrices?
• Database searches
Database searching
• Query Sequence; Database sequences
Database searching: Filtering
• Dynamic programming (DP) is computationally
expensive
• Apply DP only to sequence pairs that are likely to be
similar
• Find short words shared by the query and a database sequence
• DNA: 7-28 bases (BLAST?)
• Protein: 3 amino acids (BLAST?)
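The filtering idea can be sketched as follows: index the words of the query, scan each database sequence for the same words, and pass only sequences with at least one shared word on to dynamic programming. This is a simplification that uses exact word matches only; the function names and toy sequences are hypothetical.

```python
from collections import defaultdict


def index_words(sequence, word_size):
    """Map each word of length `word_size` to its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        index[sequence[i:i + word_size]].append(i)
    return index


def find_seed_hits(query, subject, word_size=3):
    """Return (query_pos, subject_pos, word) for every shared word."""
    query_index = index_words(query, word_size)
    hits = []
    for j in range(len(subject) - word_size + 1):
        word = subject[j:j + word_size]
        for i in query_index.get(word, []):
            hits.append((i, j, word))
    return hits


def filter_database(query, database, word_size=3):
    """Keep only the names of sequences sharing at least one word with the query."""
    return [
        name
        for name, seq in database.items()
        if find_seed_hits(query, seq, word_size)
    ]


# Hypothetical query and database.
toy_query = "MKVLAAGIW"
toy_database = {
    "seq1": "TTKVLAPPW",
    "seq2": "GGGGGGGGG",
    "seq3": "AAGIWMKVL",
}
candidates = filter_database(toy_query, toy_database)
print(candidates)
```

Only the sequences in `candidates` would then be aligned with the more expensive dynamic programming step.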
BLAST
• Basic Local Alignment Search Tool
• Heuristic method?
BLAST output parameter
E value
• The number of alignments with the same or greater
score that one can expect to see by chance.
• Depends on the size of the database and the length of the
query sequence.
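The dependence on database size and query length can be made concrete with the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m is the query length, n the database length and S the raw alignment score. The sketch below assumes this relation; the default values of `lam` and `k` are illustrative placeholders, not values from the slides, since the real ones depend on the scoring matrix and gap penalties and are reported by BLAST for each search.

```python
import math


def expect_value(raw_score, query_length, database_length, lam=0.318, k=0.134):
    """Expected number of chance alignments scoring at least `raw_score`.

    `lam` and `k` are placeholder statistical parameters (assumptions).
    """
    search_space = query_length * database_length
    return k * search_space * math.exp(-lam * raw_score)


# Same score and query, database doubled in size: E doubles.
e_small = expect_value(50, 100, 1_000_000)
e_large = expect_value(50, 100, 2_000_000)
print(e_small, e_large)

# Higher score, same search space: E drops sharply.
print(expect_value(60, 100, 1_000_000))
```

A smaller E value means the alignment is less likely to be a chance match, and the same score becomes less significant as the database grows.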