Data, data everywhere…


Construction of Substitution Matrices
• BLOSUM: BLOcks SUbstitution Matrix
• PAM: Point Accepted Mutations
Substitution Matrices
• Contain values related to the probability that amino acid A mutates into amino acid B, for every pair of amino acids, over a period of evolution (stored as log-odds scores)
• Are constructed from a large and diverse sample of sequence alignments
• Multiple alignments of well-studied gene sequences from different species
• Use orthologs (genes in different species that descend from a common ancestral gene and are usually functionally similar)
• Observed substitutions tend to preserve function
• Alignments with minimal gaps
How to Construct Substitution Matrices
Tabulate substitutions
• A to A: 9867 times
• A to R: 2 times
• A to N: 9 times
• etc.
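The tabulation step can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is a list of pairwise-aligned sequences of equal length with "-" marking gaps; gapped columns are skipped and each aligned column counts as one "a to b" event. The toy alignment is hypothetical and does not reproduce the counts listed above.

```python
from collections import Counter


def tabulate_substitutions(aligned_pairs):
    """Count how often residue a (top sequence) is aligned with residue b.

    aligned_pairs: list of (seq1, seq2) tuples of equal-length aligned
    sequences; '-' marks a gap, and columns containing a gap are skipped.
    """
    counts = Counter()
    for s1, s2 in aligned_pairs:
        if len(s1) != len(s2):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(s1, s2):
            if a == "-" or b == "-":
                continue  # gaps are not substitutions
            counts[(a, b)] += 1
    return counts


# Hypothetical toy alignments (not the data behind the slide's counts)
pairs = [("ARNDA", "ARNEA"), ("AR-DA", "ANKDA")]
counts = tabulate_substitutions(pairs)
print(counts[("A", "A")], counts[("R", "N")], counts[("D", "E")])
```

These counts are directional ("A to R" and "R to A" are kept separate); a symmetric matrix would add the two together.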
How to Construct Substitution Matrices (BLOSUM)
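The BLOSUM construction can be sketched as follows: count every pair of residues that share a column in an ungapped block, turn the counts into observed pair frequencies q, derive each residue's background frequency p from those pairs, and score each pair as 2 × log2(observed / expected), i.e. in half-bit units. This is a simplified sketch: it uses a single block and skips the clustering step that real BLOSUM matrices apply (for BLOSUM62, sequences at 62% identity or more are grouped). The function name and toy block are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_scores(block):
    """BLOSUM-style log-odds scores from one ungapped block of aligned rows.

    Simplification: no clustering/weighting of closely related sequences.
    Pairs never observed in the block get no score (log of 0 is undefined).
    """
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1  # unordered residue pair
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed frequency

    # Background frequency p_i = q_ii + sum over j != i of q_ij / 2
    p = Counter()
    for (a, b), f in q.items():
        p[a] += f / 2
        p[b] += f / 2

    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]  # random pairing
        scores[(a, b)] = round(2 * math.log2(f / expected))  # half-bit units
    return scores


# Hypothetical 3-sequence block with two columns
scores = blosum_scores(["AS", "AS", "AT"])
print(scores)
```

With such a tiny block the scores are only illustrative; real matrices are built from thousands of blocks.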
Finding the Random Mutation Rate
• Compute the overall frequency of each amino acid in a protein database
http://www.ebi.ac.uk/swissprot/sptr_stats/index.html
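Computing the background frequencies amounts to counting residues over the whole database and dividing by the total. A minimal sketch, assuming the database is available as a list of protein sequence strings; the two sequences below are a hypothetical stand-in for a real database such as Swiss-Prot.

```python
from collections import Counter


def background_frequencies(sequences):
    """Return the fraction of all residues in `sequences` that are each amino acid."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())  # count every residue character
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


# Hypothetical mini "database"; real statistics would cover all entries
freqs = background_frequencies(["MKWVTF", "MALW"])
print(freqs["M"], freqs["W"])  # 0.2 0.2
```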
Example:
If the mutation rate expected by chance is 1 in 10,000 and the observed rate of W-to-R mutation is 1 in 10:
Score = log10(0.1 / 0.0001) = log10(1000) = +3
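The score in the example is a base-10 log-odds ratio: positive when a substitution is observed more often than chance predicts, negative when less often, and zero when the two rates agree. The sketch below reproduces the W-to-R arithmetic; the function name is hypothetical, and published matrices typically scale and round these values (for example, half-bit units for BLOSUM).

```python
import math


def log_odds_score(observed_rate, expected_rate):
    """Base-10 log of observed rate / rate expected by chance."""
    return math.log10(observed_rate / expected_rate)


# W -> R example: observed 1 in 10, expected by chance 1 in 10,000
score = log_odds_score(observed_rate=1 / 10, expected_rate=1 / 10000)
print(round(score, 6))  # 3.0
```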
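PAM Matrices
[1 accepted point mutation per 100 amino acids]
• Do not take into account different evolutionary rates between conserved and non-conserved regions
• PAM1 corresponds to an average change of 1% of amino acids
• PAM250: PAM1 extrapolated to 250 accepted point mutations per 100 amino acids (PAM1 multiplied by itself 250 times)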
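Higher PAM matrices are extrapolated from PAM1 by matrix multiplication: PAMn is PAM1 raised to the n-th power. The sketch below shows that step on a hypothetical two-residue "PAM1" matrix, assuming a row-stochastic convention (each row gives the probabilities that one residue becomes each residue after one PAM unit, so rows sum to 1). The real matrix is 20 × 20 and its values come from observed substitutions.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, k):
    """Return m raised to the k-th power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    while k > 0:
        if k & 1:
            result = mat_mult(result, m)
        m = mat_mult(m, m)
        k >>= 1
    return result


# Hypothetical 2-residue "PAM1": 1% of residues change per PAM unit
pam1 = [[0.99, 0.01],
        [0.01, 0.99]]
pam250 = mat_power(pam1, 250)
print(pam250)
```

With only two states, the extrapolated matrix drifts toward 50/50 (no memory of the starting residue); the same loss of signal at long distances is why high-numbered PAM matrices are used for distantly related sequences.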
PAM vs. BLOSUM
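Basic Local Alignment Search Tool (BLAST)
• Heuristic method: much faster than an exhaustive dynamic-programming search, but not guaranteed to find the optimal alignment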
BLAST Algorithm
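The heuristic can be illustrated with BLAST's seed-and-extend idea: index short words of the query, find word hits in a database sequence, then extend each hit without gaps until the score drops too far below the best seen so far. This is a simplified sketch, not NCBI's implementation: it seeds only on exact word matches (real BLAST also uses "neighborhood" words that score above a threshold with a substitution matrix) and scores with a hypothetical +1/-1 match/mismatch scheme instead of a matrix.

```python
def find_word_hits(query, subject, w=3):
    """Seed step: (i, j) positions where query and subject share a length-w word."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def extend_hit(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Ungapped extension of a seed in both directions (X-drop stopping rule).

    Returns (best_score, query_start, query_end, diagonal), where
    subject position = query position + diagonal.
    """
    diag = j - i
    best = score = w * match  # the exact seed word itself
    start, end = i, i + w
    # Extend to the right of the seed
    q = i + w
    while q < len(query) and q + diag < len(subject):
        score += match if query[q] == subject[q + diag] else mismatch
        q += 1
        if score > best:
            best, end = score, q
        elif best - score > x_drop:
            break
    # Extend to the left, starting from the best right-hand score
    score = best
    q = i - 1
    while q >= 0 and q + diag >= 0:
        score += match if query[q] == subject[q + diag] else mismatch
        if score > best:
            best, start = score, q
        elif best - score > x_drop:
            break
        q -= 1
    return best, start, end, diag


query, subject = "MKTAYIAK", "GGMKTAYLAK"  # hypothetical sequences
for i, j in find_word_hits(query, subject):
    print((i, j), extend_hit(query, subject, i, j))
```

In this toy case all three seeds lie on the same diagonal and extend to the same segment, which is one reason practical implementations avoid re-extending hits on a diagonal that has already been covered.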
What can we search and compare?
DNA vs DNA
Protein vs Protein
DNA vs Protein
Protein vs DNA
Reading Frames
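A DNA sequence can be read as protein in six frames: three offsets on the given strand and three on its reverse complement. The sketch below translates all six using the standard genetic code, encoded compactly as a 64-letter string in TCAG order ("*" = stop). Unknown codons become "X", and the "+1/-1" labels are an assumed naming convention; tools differ in how they number the reverse frames.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG x TCAG x TCAG order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * x + 4 * y + z]
    for x, a in enumerate(BASES)
    for y, b in enumerate(BASES)
    for z, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from position 0; unknown codons -> 'X'."""
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X")
                   for p in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translations of the three forward and three reverse-complement frames."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


print(six_frames("ATGGCCTAA"))  # hypothetical sequence
```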
The best BLAST program
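The four comparison types listed earlier correspond to the standard BLAST programs: blastn (DNA vs DNA), blastp (protein vs protein), blastx (DNA query, translated in six frames, vs protein database) and tblastn (protein query vs DNA database translated in six frames). A minimal lookup sketch; the function and its argument values are hypothetical and not part of any BLAST interface.

```python
# (query type, database type) -> BLAST program
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database translated in six frames
}


def choose_blast_program(query_type, db_type):
    """Pick a BLAST program for the given query and database sequence types."""
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query_type} vs {db_type}")


print(choose_blast_program("nucleotide", "protein"))  # blastx
```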