mcsis - What If

What can (many) sequences tell us?
Vriend’s first rule of sequence analysis
If it is conserved,
it is important.
Regulation is most important, and thus most conserved. Second
most conserved is the location of function. Third is function,
fourth is structure, and sequence is least conserved in evolution.
However, sequence conservation is the easiest to determine, so that is
what people do research on...
Vriend’s second rule of sequence analysis
If it is very conserved,
it is very important
Consequences:
If something is conserved in each sub-family,
it is involved in a sub-family specific function.
What is CMA?
Function is never just one residue
QWERTYASDFGRGH
QSLMTYLNDFHRPM
QAGTTNMKDTRRKC
QPRSTNRGDTRRVW
Red = conserved
Green = variable
Blue = correlated
Part of the big alignment
We see correlations between columns and between ‘things’.
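The colour legend above can be turned into a small check on the four example sequences. The sketch below is only an illustration, not the CMA method itself: it assumes "conserved" means one residue type in a column, and "correlated" means two partly variable columns that split the sequences into the same groups. The function names are hypothetical.

```python
from itertools import combinations

# The four example sequences from the slide.
alignment = [
    "QWERTYASDFGRGH",
    "QSLMTYLNDFHRPM",
    "QAGTTNMKDTRRKC",
    "QPRSTNRGDTRRVW",
]

def columns(seqs):
    """Return the alignment columns as strings, one per position."""
    return ["".join(col) for col in zip(*seqs)]

def pattern(col):
    """Relabel residues by order of first appearance, e.g. 'YYNN' -> (0, 0, 1, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(res, len(first_seen)) for res in col)

def classify(seqs):
    """Return (conserved positions, correlated position pairs), 1-based."""
    cols = columns(seqs)
    conserved = [i + 1 for i, c in enumerate(cols) if len(set(c)) == 1]
    # Columns where every sequence differs carry no grouping, so they are
    # skipped here (an assumption of this sketch).
    informative = [i for i, c in enumerate(cols) if 1 < len(set(c)) < len(c)]
    correlated = [
        (i + 1, j + 1)
        for i, j in combinations(informative, 2)
        if pattern(cols[i]) == pattern(cols[j])
    ]
    return conserved, correlated

conserved, correlated = classify(alignment)
print("conserved positions:", conserved)
print("correlated pairs:", correlated)
```

With only four sequences any such pattern could arise by chance; in a real family the same test would be run on a much larger alignment.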
Correlations
Residues can correlate with residues, and when
that happens we have found a function, no matter the
conservation or variability.
Residues that have a function correlate with that
function.
Example correlation: Which cysteines form a
pair in this protein family? Shown are aligned
peptides from five different bacteria.
ASDFGCHIKLMCNPQRSCTVW
YSDYGCNIKLFCQPQRSCT-
ATDYPVQIKLMCNPQKSCSMW
YTDFGCHVKLLVQPNRSVTVW
-TDFGVHVKLMCNPQKSCSFW
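One way to answer the cysteine question is to look for columns where cysteine appears and disappears together. The sketch below rests on two loudly stated assumptions: the long line in the transcript is taken to be two peptides run together (split after 20 characters), and the short peptide is padded with a gap so that all rows have equal length. The function name is hypothetical.

```python
from itertools import combinations

# Five aligned peptides. ASSUMPTION: the fused transcript line is split into
# "YSDYGCNIKLFCQPQRSCT-" and "ATDYPVQIKLMCNPQKSCSMW".
peptides = [
    "ASDFGCHIKLMCNPQRSCTVW",
    "YSDYGCNIKLFCQPQRSCT-",
    "ATDYPVQIKLMCNPQKSCSMW",
    "YTDFGCHVKLLVQPNRSVTVW",
    "-TDFGVHVKLMCNPQKSCSFW",
]

# ASSUMPTION: pad short rows with gaps so every row has the same width.
width = max(len(p) for p in peptides)
padded = [p.ljust(width, "-") for p in peptides]

def paired_cysteine_columns(seqs):
    """Return 1-based column pairs whose cysteines are present or absent together."""
    n_cols = len(seqs[0])
    # For each column: in which sequences is the residue a cysteine?
    profiles = {c: tuple(s[c] == "C" for s in seqs) for c in range(n_cols)}
    # Keep columns that have cysteine in some, but not all, sequences.
    candidates = [c for c, p in profiles.items() if any(p) and not all(p)]
    return [
        (a + 1, b + 1)
        for a, b in combinations(candidates, 2)
        if profiles[a] == profiles[b]
    ]

pairs = paired_cysteine_columns(padded)
print(pairs)
```

Under these assumptions the cysteines at positions 12 and 18 are lost together in the fourth peptide, while the cysteine at position 6 follows a different pattern, which suggests that 12 and 18 form the pair.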
Wilma Kuipers Thesis
Summary correlation
If it's conserved, it's important; if it's important, it remains conserved.
If residue positions show correlation with ‘something’ it is
involved in that ‘something’.
‘Something’ can be any of a very large number of functions
(optimal wavelength of an opsin; cellular localisation; binding an
ion; binding over an interface; involved in the same internal
motion; collaborating to bind the substrate; etcetera).
Conserved or very conserved? Recalcitrant.
VT1V1TVC11TRC1RT1C?VV
ASDFGCHIKLMCNPQRSCTVW
YSDYGCNIKLFCNPQRSCT-
ATDLPVQIKLMANPQKSCSVW
LSDFGCHIKLMCNPQRSCTVW
YTDFGCHVKLLVQPNRSVAFW
-SDAGVHVKLMVQPNKSVSF
YTDFGCHVKLLVQPNRSVVFW
-TDSGVHVKLMIQPDKSVSFW
V = Variable / not important
T = Conserved type
1 = Conserved
? = No idea
R = Recalcitrant
The left R is certainly recalcitrant. The other one is, or is not.
What is the concept?
Entropy and variability
So far we saw that conservation and correlation
can help us find functionally important residues.
Can variability patterns also tell us something?
Entropy
Sequence entropy Ei at position i is calculated
from the frequencies of the twenty amino acid
types at position i, where p_{i,a} is the frequency of type a:
$$E_i = -\sum_{a=1}^{20} p_{i,a} \ln(p_{i,a})$$
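As a minimal sketch, the entropy of one alignment column can be computed as below. It uses the standard Shannon form with a leading minus sign and the natural logarithm, so a fully conserved column scores 0 and a column with all twenty types at equal frequency scores ln(20). Ignoring gap characters is an assumption of this sketch, and the function name is hypothetical.

```python
import math
from collections import Counter

def sequence_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column.

    ASSUMPTION: gap characters ('-') are ignored when computing frequencies.
    """
    residues = [r for r in column if r != "-"]
    total = len(residues)
    counts = Counter(residues)
    # Residue types that do not occur contribute nothing to the sum.
    return -sum((n / total) * math.log(n / total) for n in counts.values())

print(sequence_entropy("QQQQ"))   # fully conserved column
print(sequence_entropy("YYNN"))   # two types at equal frequency
```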
Variability
Sequence variability Vi is the number of
amino acid types observed at position i in
more than 0.5% of all sequences.
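The variability count can be sketched in the same style. The 0.5% threshold comes from the definition above; taking "all sequences" to mean every row in the column (gaps included) and never counting the gap itself as a type are assumptions of this sketch. The function name and the `min_fraction` parameter are hypothetical.

```python
from collections import Counter

def sequence_variability(column, min_fraction=0.005):
    """Number of amino acid types seen in more than min_fraction of all sequences.

    ASSUMPTIONS: the denominator is the total number of sequences (rows),
    and the gap character '-' is not counted as an amino acid type.
    """
    total = len(column)
    if total == 0:
        return 0
    counts = Counter(r for r in column if r != "-")
    return sum(1 for n in counts.values() if n / total > min_fraction)

# A rare type is counted only once it passes the 0.5% threshold.
print(sequence_variability("A" * 994 + "C" * 6))
print(sequence_variability("A" * 996 + "C" * 4))
```

The threshold keeps a single odd sequence, or a sequencing error, from inflating the count in a very large alignment.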
Summary variability analysis
Variability patterns hold information.
Entropy and Variability are two (of the) ways to measure
variability patterns.
Entropy and Variability patterns can say something
about the type of function, and thus add detail to
correlation studies.
Conclusions:
Data is difficult, but we need it (sic); life would be so
nice if we could do without it. PDB files are the worst.
Nomenclature is not homogeneous. Ontologies….
Much data has been carefully hidden in the literature,
where it can only be retrieved with great difficulty.
Residue numbering is difficult but very necessary.
Variability-entropy analysis is powerful, but requires
very 'good' alignments.