
The FREAKS of PROTEIN SEQUENCE
Session 3.1: Repeats
Session 3.2: Biased regions
Miguel Andrade
Johannes-Gutenberg University of Mainz
Definition
14% of proteins contain repeats (Marcotte et al., 1999)
1: Single amino acid repeats.
2: Longer imperfect tandem repeats, which assemble into structures.
Definition of CBRs
Perfect repeat: QQQQQQQQQQQ
Imperfect: QQQQPQQQQQQ
Amino acid type (acidic D/E): DDDDDEEEDEDEED
Compositionally biased regions (CBRs)
High frequency of one or two amino acids in a region.
A particular case of low-complexity regions.
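The "one or two amino acids" criterion can be made concrete with a toy score: the fraction of a region covered by its two most frequent residues. This is an illustrative measure chosen here, not a published CBR definition, and the function name top_two_fraction is hypothetical.

```python
from collections import Counter

def top_two_fraction(region: str) -> float:
    """Fraction of residues in `region` that belong to its two most frequent amino acids.

    Illustrative score only: values near 1.0 suggest a compositionally biased region.
    """
    if not region:
        raise ValueError("region must be non-empty")
    counts = Counter(region)
    return sum(n for _, n in counts.most_common(2)) / len(region)

# The three definition examples plus an unbiased word-like sequence
for region in ["QQQQQQQQQQQ", "QQQQPQQQQQQ", "DDDDDEEEDEDEED", "IDENTITIES"]:
    print(region, round(top_two_fraction(region), 2))
```

The three slide examples all score 1.0, while IDENTITIES scores 0.5. Because the score counts individual residues, a type-level bias spread over more than two residue types (for example a mixed acidic stretch) would need residues grouped into classes first.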
Function of CBRs
Conservation => Function
Length, amino acid type not necessarily conserved
Frequency: 1 in 3 proteins contains a compositionally biased region (Wootton, 1994); ~11% are conserved (Sim and Creamer, 2004)
Functions:
Passive: linkers
Active: binding, mediating protein interactions, structural integrity
(Sim and Creamer, 2004)
Structure of CBRs
Often variable or flexible: they do not crystallize easily
1CJF: profilin bound to polyP
2IF8: Inositol Phosphate Multikinase Ipk2
RVSETTTSGSL
2CX5: mitochondrial cytochrome c
B subunit N-terminal
FFFFIFVFNF
Types of CBRs
More than 6 aa in length: 1.4% of all, 87% of them in eukaryotes (Faux et al., 2005)
Distribution is not random:
Eukaryota:
Most common: poly-Q, poly-N, poly-A, poly-S, poly-G
Prokaryota:
Most common: poly-S, poly-G, poly-A, poly-P
Relatively rare: poly-Q, poly-N
Very rare or absent in both eukaryota and prokaryota:
poly-I, poly-M, poly-W, poly-C, poly-Y
Proposed reason: toxicity of long stretches of hydrophobic residues.
(Faux et al., 2005)
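Runs like those surveyed by Faux et al. (2005) can be located with a regular expression and tallied by residue type. A minimal sketch, assuming "more than 6 aa" means a perfect single-residue run of at least 7 residues; the function names are hypothetical, and imperfect runs such as QQQQPQQQQQQ are deliberately not detected.

```python
import re
from collections import Counter

def homopolymer_runs(seq: str, min_length: int = 7) -> list[tuple[int, int, str]]:
    """Return (start, end, residue) for perfect single-residue runs of at least min_length.

    `end` is exclusive, so seq[start:end] is the run itself.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq)]

def run_type_counts(seqs: list[str], min_length: int = 7) -> Counter:
    """Tally poly-X run types (Q, N, A, S, G, ...) across a set of sequences."""
    return Counter(res for s in seqs for _, _, res in homopolymer_runs(s, min_length))

# The 8-residue poly-Q is reported; the 6-residue poly-S is too short.
print(homopolymer_runs("MAQQQQQQQQPSSSSSSK"))
```

Running run_type_counts over a whole eukaryotic versus prokaryotic proteome is the kind of comparison behind the distribution above; with real data one would also normalize by proteome size.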
Filtering out CBRs
Normally filtered out as low-complexity regions because they give spurious BLAST hits
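QQQQQQQQQQ
||||||||||
QQQQQQQQQQ   Shuffle: 10/10 id
IDENTITIES
   | |
SIINDIETTE   Shuffle: 2/10 id
A shuffled poly-Q still matches 10/10, whereas a shuffled normal sequence (10/10 against itself) drops to 2/10.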
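The shuffle test can be reproduced in a few lines: count ungapped, position-by-position identities before and after shuffling one copy. A minimal sketch; the helper names are hypothetical, and real BLAST scoring uses substitution matrices and gapped alignments rather than raw identity counts.

```python
import random

def identities(a: str, b: str) -> int:
    """Count positions where two equal-length, ungapped sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b))

def shuffled(seq: str, rng: random.Random) -> str:
    """Return a random permutation of seq: same composition, new order."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)

rng = random.Random(0)  # fixed seed so the demo is repeatable
for seq in ["QQQQQQQQQQ", "IDENTITIES"]:
    print(seq, identities(seq, seq), identities(seq, shuffled(seq, rng)))
```

Whatever the seed, the poly-Q line always reports 10 identities after shuffling, which is exactly why composition alone can produce "significant" hits.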
An option for pre-BLAST treatment: the SEG algorithm
1) Identify sequence regions with low information content within a sliding sequence window
2) Merge neighbouring regions
Eliminates hits against common acidic-, basic- or proline-rich regions
(Wootton and Federhen, 1993)
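The two steps above can be approximated with a sliding-window entropy mask. This is a simplified stand-in, not SEG itself: SEG uses its own complexity measures and refines its trigger windows with a second cutoff, whereas this sketch uses Shannon entropy and merges flagged windows simply by overlap. The window of 12 and cutoff of 2.2 bits are assumptions chosen to echo commonly cited SEG defaults; the function names are hypothetical.

```python
import math
from collections import Counter

def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())

def mask_low_complexity(seq: str, window: int = 12, cutoff: float = 2.2) -> str:
    """Replace residues lying in any low-entropy window with 'X'.

    Step 1: flag every window whose entropy is below `cutoff`.
    Step 2: merge neighbouring flagged windows by OR-ing per-residue flags.
    """
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < cutoff:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("X" if m else aa for aa, m in zip(seq, masked))

print(mask_low_complexity("MKTAYIAKQR" + "Q" * 20 + "LSPEDKWCFGHN"))
```

The whole poly-Q stretch is masked, and the mask spills a few residues into the flanks because windows that are mostly Q also fall below the cutoff; SEG's refinement step exists partly to trim such edges.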
Exercise 1. Filtering CBRs for BLAST using SEG
•Obtain this protein sequence from NCBI. It is a hypothetical protein from Nematocida sp., a microsporidian (spore-forming fungus) that infects the worm Caenorhabditis elegans.
•Can you see funny things in this sequence?
•Go to NCBI's BLAST web page and choose the "protein blast" option
•Search for homologs of the protein
•Keep the output
•Run the same search in another NCBI BLAST window, this time selecting the option to filter low-complexity regions (SEG)
•Compare the outputs: Can you identify different hits? Do matches to the same sequence show relevant differences in E-value? Comment on the relevance of the differences.
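For the comparison step, one option is to download both results as a tabular hit table and compare them programmatically. A minimal sketch, assuming a tab-separated 12-column layout (query, subject, % identity, length, mismatches, gap opens, q. start, q. end, s. start, s. end, E-value, bit score) with '#' comment lines, as written by command-line BLAST+ with -outfmt 6 or 7; the function names and the subject IDs in the demo are placeholders, not real hits.

```python
import math

def best_evalues(hit_table: str) -> dict[str, float]:
    """Map each subject ID to its lowest E-value in a 12-column tabular BLAST report."""
    best: dict[str, float] = {}
    for line in hit_table.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])  # E-value is the 11th column
        best[subject] = min(evalue, best.get(subject, math.inf))
    return best

def compare_runs(unfiltered: str, filtered: str):
    """Return hits unique to each run and (unfiltered, filtered) E-values for shared hits."""
    a, b = best_evalues(unfiltered), best_evalues(filtered)
    only_unfiltered = sorted(a.keys() - b.keys())
    only_filtered = sorted(b.keys() - a.keys())
    shared = {s: (a[s], b[s]) for s in sorted(a.keys() & b.keys())}
    return only_unfiltered, only_filtered, shared

# Placeholder tables, not real search results
unfiltered = ("q1\tsubjA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t150\n"
              "q1\tsubjB\t40.0\t30\t18\t0\t5\t34\t7\t36\t0.002\t35\n")
filtered = "# BLASTP\nq1\tsubjA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-38\t145\n"
print(compare_runs(unfiltered, filtered))
```

Hits that appear only in the unfiltered run, or whose E-value worsens sharply after filtering, are the candidates for matches driven by the biased region rather than by genuine homology.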
A particular analysis…
[Figure: AIR9 (1708 aa) domain architecture: Ser-rich + basic region, LRR, A9 repeats, conserved region; deletion constructs Δ1, Δ2, Δ3, Δ6, Δ9, Δ10, Δ11, Δ12, Δ14, Δ15, Δ16 tested for microtubule localization of Δx-GFP]
Buschmann et al. (2006). Current Biology.
Buschmann et al. (2007). Plant Signaling & Behavior.
…triggers BiasViz
http://sourceforge.net/projects/biasviz/
Huska et al. (2007). Bioinformatics.
ADAM15
Binds SH3 of endophilin I and SH3PX1 (PMID:10531379)
Binds SH3 of Fish (PMID:12615925)
Binds SH3 of Grb2 (PMID:11127814)
Binds SH3 of ArgBP1/ABI2 (PMID:12463424)
[Figure panels a, b, c: profile plots (y-axis 0.0 to 0.4) labelled ADAM19, ADAM9, ADAM11, ADAM20]
Exercise 2. Viewing CBRs in an alignment with BiasViz2
•Go to the BiasViz2 web page
•Launch BiasViz2
•Load this alignment in the step 1 section
•Hit the "Go to graphical view" button
•Try to find combinations of parameters that reveal CBRs
•Try hydrophobic residues and window size 10. If I tell you this is a
transmembrane protein, what is this result telling you?
•Can you see other biased regions?
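The hydrophobic-window view in this exercise can be approximated for a single sequence: slide a window of 10 residues and record the fraction belonging to a chosen residue class. A minimal sketch; the hydrophobic set AVILMFWC, the 0.8 threshold and the function names are assumptions for illustration, and BiasViz2's own residue classes and scoring may differ.

```python
# Assumed hydrophobic class for illustration; BiasViz2 may define it differently.
HYDROPHOBIC = frozenset("AVILMFWC")

def class_fraction_profile(aligned_seq: str, residues=HYDROPHOBIC,
                           window: int = 10) -> list[float]:
    """Fraction of `residues` in each window of `window` residues (gap symbols removed first)."""
    seq = aligned_seq.replace("-", "").replace(".", "")
    return [sum(aa in residues for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

def biased_segments(profile: list[float], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Group consecutive windows scoring >= threshold into (first, last + 1) window-start ranges."""
    segments, start = [], None
    for i, frac in enumerate(profile):
        if frac >= threshold and start is None:
            start = i
        elif frac < threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments

# Toy sequence: charged flanks around a 20-residue hydrophobic stretch
toy = "DEKRDEKRDE" + "LLIVFALLIVAVLLFIAVLL" + "DEKRDEKR"
print(biased_segments(class_fraction_profile(toy)))
```

A block of highly hydrophobic windows covering roughly 20 residues is the kind of signal expected from a transmembrane helix; scanning each sequence of the alignment this way lets you check whether the signal is shared.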
Function of polyQ
Martin Schaefer
polyQ in Huntingtin
Human
Dog
Mouse
Opossum
Chicken
Frog
Zebrafish
Trout
Fugu
Stickleback
Lancelet
Capitella
Limpet
Nematostella
Trichoplax
Ciona intestinalis
Ciona savignyi
D. melanogaster
D. mojavensis
D. sechellia
D. erecta
D. yakuba
D. grimshawi
D. pseudoobscura
D. persimilis
D. ananassae
D. willistoni
D. virilis
Schaefer et al (2012) Nucleic Acids Res.
[Figure: number of interaction partners (log axis, 1 to 1000) in human and yeast; groups labelled polyQ vs. non-polyQ, "TFs long", and polyQ >14 / polyQ 4-14 / no polyQ]
[Schematic: polyQ protein (N-terminal to C-terminal: coiled coil, polyQ, disordered, polyP), shown unbound and bound to a protein X, both drawn with coiled coils]
ATXN1Q82NT is toxic
ATXN1Q82NT aggregates
Spyros Petrakis
Petrakis et al. (2012) PLoS Genetics
interactors that change ATXN1Q82NT toxicity
[Schematic model comparing a normal and a toxic polyQ protein; labels: CC (coiled coil), polyQ disordered, CC partner, polyQ alpha-helix, non-CC partner, polyQ beta-aggregates, polyQ increased beta-aggregates]
BiasViz2
Exercise 3. All together! View repeats, CBRs, and secondary structure in the N-terminal region of huntingtin with BiasViz2
•Go to the BiasViz2 web page
•Load this alignment of N-terminal huntingtins on the step 1 section
•Load this file with secondary structure predicted for the human fragment in
the step 2 section
•Load this file with ARD2 predictions for all sequences of the alignment in the step 2 section "raw values for each amino acid"
•Hit the "Go to graphical view" button
•Find the CBRs we have discussed for huntingtin
•Compare the relative positions of the predicted repeats and the predicted secondary structure
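Once per-residue predictions are loaded, the last comparison can also be quantified: what fraction of residues predicted as repeats fall in predicted helices? A minimal sketch, assuming a secondary-structure string with one H/E/C letter per residue and one numeric repeat score per residue; the 0.5 cutoff and the function name are hypothetical, and the actual ARD2 and secondary-structure file formats may differ.

```python
def helix_overlap(ss: str, repeat_scores: list[float], cutoff: float = 0.5) -> float:
    """Fraction of residues with repeat score >= cutoff whose predicted structure is helix ('H')."""
    if len(ss) != len(repeat_scores):
        raise ValueError("need one secondary-structure letter per repeat score")
    flagged = [i for i, score in enumerate(repeat_scores) if score >= cutoff]
    if not flagged:
        return 0.0  # no residue passes the repeat cutoff
    return sum(ss[i] == "H" for i in flagged) / len(flagged)

# Toy input, not real huntingtin data: four of the six flagged residues are helical
print(helix_overlap("CCHHHHHHCC", [0, 0, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.8, 0.8]))
```

A high overlap would be consistent with helical repeats; the BiasViz2 view remains the place to see where along the sequence agreements and disagreements occur.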