BMMB597E Protein Evolution

BMMB597E
Protein Evolution
Protein classification
1
Protein families
• The first protein structures determined by X-ray
crystallography, myoglobin and haemoglobin,
were solved (in 1959–60) before the amino acid
sequences were determined
• It came as a surprise that the structures were
quite similar
• Soon it became clear, on the basis of both
sequences and structures, that there were
families of proteins
2
myoglobin
haemoglobin
3
50 years earlier, there were some hints …
• E.T. Reichert & A.P. Brown. The differentiation
and specificity of corresponding proteins and
other vital substances in relation to biological
classification and organic evolution: the
crystallography of hemoglobins. (Carnegie
Institution of Washington, 1909)
• Crystallography 3 years before the discovery of X-ray diffraction?
4
Reichert and Brown studied interfacial
angles in haemoglobin crystals
• Steno’s law (1669): different crystals of the same
substance may have different sizes and shapes,
but the angles between faces are constant for
each substance
• They found that the angles differed from species
to species
• Similarities in values of interfacial angles were
consistent with the classical taxonomic tree
• They even found differences between oxy- and
deoxyhaemoglobin
5
Most premature scientific result ever?
• These results implied:
– That proteins adopted (or at least could adopt)
unique structures, to form a crystal
– That protein structures varied between species
– That this variation was parallel with the evolution
of the species
– That proteins could change structure as a result of
changes in state of ligation
• In 1909!
6
M.O. Dayhoff
• Pioneer of bioinformatics
• Collected protein sequences
• First curated ‘database’
• Recognized that proteins form families, on the
basis of amino acid sequences
• Computational sequence alignments
• First evolutionary tree
• First amino-acid substitution matrix (PAM; later
largely replaced by BLOSUM)
7
Can relationships among proteins be
extended beyond families?
• Families = sets of proteins with such obvious
similarities that we assume that they are
related
• One question: how much similarity do we
need to believe in a relationship?
• How far can evolution go?
• Convergent evolution?
• Cautionary tale: chymotrypsin / subtilisin
8
Chymotrypsin-subtilisin
• Both proteolytic enzymes
– Chymotrypsin: mammalian
– Subtilisin: from B. subtilis
• Both have catalytic triads
• Same function – same mechanism
• Sequences 12% similar (near noise level)
• However, structures show them to be unrelated
9
Chymotrypsin / Subtilisin
10
Catalytic triad in serine proteinases
11
Chymotrypsin and subtilisin have
similar catalytic triads
12
How can we classify proteins that
belong to families?
• Align sequences
• Calculate a phylogenetic tree (there are various ways to
do this; all depend on the sequence alignment)
• Usually, the phylogenetic tree of homologous
proteins from different species follows the
phylogenetic tree based on classical taxonomy
• That is reassuring
• But what happens as divergence proceeds?
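The align-then-build-a-tree steps above can be sketched roughly as follows. This is a minimal illustration only: the four aligned sequences are invented, the distance is the simple fraction of differing gap-free columns, and UPGMA is used as just one of the various possible tree-building methods. The function names `p_distance` and `upgma` are hypothetical helpers, not part of any library.

```python
from itertools import combinations

# Toy alignment, invented for illustration ("-" marks a gap).
alignment = {
    "human": "MKVLAAGIVG",
    "mouse": "MKVLAAGLVG",
    "chick": "MKILSAGLVG",
    "fish":  "MRILSTG-VA",
}

def p_distance(a, b):
    """Fraction of gap-free aligned columns at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(names, dist):
    """Repeatedly merge the two closest clusters (UPGMA).

    dist maps frozenset({a, b}) -> distance; the result is a nested tuple.
    """
    clusters = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        for c in clusters:
            # Distance to the new cluster is the size-weighted average.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))

names = list(alignment)
dist = {
    frozenset((a, b)): p_distance(alignment[a], alignment[b])
    for a, b in combinations(names, 2)
}
tree = upgma(names, dist)
print(tree)  # ('fish', ('chick', ('human', 'mouse')))
```

In this toy case the tree follows the expected taxonomy. With real, highly diverged sequences the distances approach the noise level and the tree becomes unreliable.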
13
How can we classify proteins that do
not obviously belong to families?
• Base this on structure rather than sequence
• Structural similarities are maintained as
divergence proceeds, better than sequence
similarities
• For closely related proteins, expect no
difference between sequence-based and
structure-based classification
• How far can classification be extended?
14
SCOP
Structural Classification of Proteins
• Idea of A.G. Murzin, based on old work by C.
Chothia and M. Levitt
• Even if two proteins are not obviously
homologous, they may share structural
features, to a greater or lesser degree.
• For instance, the secondary structures of
some proteins are only α-helices
• Others have β-sheets but no α-helices
15
SCOP
• SCOP is a database that gives a hierarchical
classification of all protein domains
• Recall that a domain is a compact subunit of a
protein structure that ‘looks as if’ it would
have independent stability
Fragment of fibronectin
16
Dissection of structure into domains
• It is not always quite so obvious how to divide
a protein into domains
• There is some (not a lot of) room for argument
• Note that sometimes the chain passes back
and forth between domains
• In these cases one or both domains do not
consist entirely of a consecutive set of
residues
17
lactoferrin
18
SCOP, CATH and the DALI Database classify
protein structures
• SCOP (Structural Classification of Proteins)
• CATH (Class, Architecture, Topology, Homologous
superfamily)
• DALI Database
• These web sites have many useful features:
– information-retrieval engines, including
search by keyword or sequence
– presentation of structure pictures
– links to other related sites including bibliographical
databases.
19
SCOP
http://www.scop.mrc-lmb.cam.ac.uk
• SCOP organizes protein structures in a
hierarchy according to evolutionary origin and
structural similarity.
• Domains are extracted from Protein Data
Bank entries.
• Sets of domains are grouped into families: sets of
domains for which similarities in structure,
function and sequence imply a common
evolutionary origin.
20
The SCOP hierarchy
• Families that share a common structure, or even a
common structure and a common function, but lack
adequate sequence similarity – so that the evidence
for evolutionary relationship is suggestive but not
compelling – are grouped into superfamilies
• Superfamilies that share a common folding topology,
for at least a large central portion of the structure,
are grouped as folds.
• Finally, each fold group falls into one of the general
classes.
21
Major classes in SCOP
• α – secondary structure all helical
• β – secondary structure all sheet
• α+β – helices and sheets, but in different parts of
structure
• α/β – contain β-α-β supersecondary structure
• ‘small proteins’ – which often have little
secondary structure and are held together by
disulphide bridges or ligands (for instance, wheat-germ agglutinin)
22
Summary of SCOP hierarchy
• Class
• Fold
• Superfamily
• Family
• Domain
23
SCOP classification of flavodoxin
Protein: Flavodoxin from Clostridium beijerinckii [TaxId: 1520]
Lineage:
Root: scop
Class: Alpha and beta proteins (a/b) [51349]
Mainly parallel beta sheets (beta-alpha-beta units)
Fold: Flavodoxin-like [52171]
3 layers, a/b/a; parallel beta-sheet of 5 strand, order 21345
Superfamily: Flavoproteins [52218]
Family: Flavodoxin-related [52219] binds FMN
Protein: Flavodoxin [52220]
Species: Clostridium beijerinckii [TaxId: 1520] [52226]
PDB Entry Domains:
5nul complexed with fmn; mutant chain a [31191]
2fax complexed with fmn; mutant chain a [31194]
… many others
24
Clostridium beijerinckii Flavodoxin
(stereo pair)
25
Flavodoxin
NADPH-cytochrome
P450 reductase
same superfamily, different family
26
Flavodoxin
CHEY
same fold, different superfamily
27
Flavodoxin
Spinach ferredoxin
reductase
same class, different folds
28
Flavodoxin in the SCOP hierarchy
• To give some idea of the nature of the similarities
expressed by the different levels of the hierarchy:
• Flavodoxin from Clostridium beijerinckii and NADPH-cytochrome
P450 reductase are in the same superfamily, but different families.
• Flavodoxin and the signal transduction protein CheY are in
the same fold category, but different superfamilies.
• Flavodoxin and spinach ferredoxin reductase are in the
same class (α/β) but have different folds.
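One way to picture these comparisons is as a walk down two lineages until they diverge. The sketch below is illustrative only. The flavodoxin lineage uses the names from the SCOP listing shown earlier. The entries written as "(other ...)" for the three comparison proteins are placeholders, because their actual fold, superfamily and family names are not given here. The function name `deepest_shared_level` is hypothetical.

```python
LEVELS = ("class", "fold", "superfamily", "family")

def deepest_shared_level(a, b):
    """Return the lowest hierarchy level at which two lineages still agree."""
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared

# Lineage taken from the flavodoxin listing above.
flavodoxin = {
    "class": "a/b",
    "fold": "Flavodoxin-like",
    "superfamily": "Flavoproteins",
    "family": "Flavodoxin-related",
}

# Placeholder lineages: only the shared levels are taken from the slides.
p450_reductase = dict(flavodoxin, family="(other family)")
chey = dict(flavodoxin, superfamily="(other superfamily)",
            family="(other family)")
ferredoxin_reductase = dict(flavodoxin, fold="(other fold)",
                            superfamily="(other superfamily)",
                            family="(other family)")

for name, lineage in [("NADPH-cytochrome P450 reductase", p450_reductase),
                      ("CheY", chey),
                      ("Spinach ferredoxin reductase", ferredoxin_reductase)]:
    print(name, "->", deepest_shared_level(flavodoxin, lineage))
```

The loop stops at the first level where the two lineages differ, so a match at a lower level is only counted if every level above it also matches.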
29
CATH presents a classification scheme
similar to that of SCOP
• CATH = Class, Architecture, Topology, Homologous superfamily, the levels of
its hierarchy.
• In CATH, proteins with very similar structures, sequences and functions are
grouped into sequence families.
• A homologous superfamily contains proteins for which similarity of sequence
and structure gives evidence of common ancestry
• A topology or fold family comprises sets of homologous superfamilies that
share the spatial arrangement and connectivity of helices and strands
• Architectures are groups of proteins with similar arrangements of helices and
sheets, but with different connectivity. For instance, different four-α-helix
bundles with different connectivities would share the same architecture but
not the same topology in CATH
• General classes of architectures in CATH are: α, β, α-β (subsuming the α/β
and α+β classes of SCOP), and domains of low secondary structure content.
30
Do different classification schemes agree?
• To classify protein structures (or any other set of objects) you
need to be able to measure the similarities among them.
• The measure of similarity induces a tree-like representation of
the relationships.
• CATH, SCOP, DALI and the others agree, for the most part, on
what is similar, and the tree structures of their classifications
are therefore also similar.
• However, even an objective measure of similarity does not
specify how to define the different levels of the hierarchy.
• These are interpretative decisions, and any apparent
differences in the names and distinctions between the levels
disguise the underlying general agreement about what is
similar and what is different.
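The point that a similarity measure fixes the tree but not the levels can be illustrated with a small sketch. It assumes five hypothetical structures A to E with invented pairwise distances, and groups them by single linkage at several cutoffs. None of this is the actual procedure used by SCOP, CATH or DALI; `group_at_cutoff` is a hypothetical helper.

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]

# Invented structural distances: 0.8 by default, smaller for a few pairs.
distance = {frozenset(p): 0.8 for p in combinations(items, 2)}
distance[frozenset(("A", "B"))] = 0.1
distance[frozenset(("D", "E"))] = 0.2
distance[frozenset(("A", "C"))] = 0.4
distance[frozenset(("B", "C"))] = 0.4

def group_at_cutoff(items, distance, cutoff):
    """Single-linkage grouping: join any two items no further apart than cutoff."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if distance[frozenset((a, b))] <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())

for cutoff in (0.15, 0.3, 0.5, 0.9):
    print(cutoff, group_at_cutoff(items, distance, cutoff))
```

The same distances give nested groupings at every cutoff. Which cutoffs are called "family", "superfamily" or "fold" is an interpretative choice, which is why schemes can agree on the similarities yet name their levels differently.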
31