Recognition of Protein Features
Limsoon Wong
Institute for Infocomm Research
BI6103 guest lecture on ?? March 2004
Copyright 2003 limsoon wong
Lecture Plan
• Membrane proteins
• Subcellular localization
Recognition of
Transmembrane Helices
Eukaryotic Cells
• Eukaryotic cells have membrane-bound
compartments with specialized functions
Lipids & Membrane
• Membrane is a double layer of lipids and associated proteins
which define subcellular compartments or enclose the cell
• Lipids consist of a “polar head group” and long-chain fatty acids
• This dual nature promotes formation of lipid bilayers
• “Hydrophobic tails” are shielded from aqueous environment
• Water-soluble (i.e., charged or polar) molecules cannot pass through
this impermeable barrier
• Permeability across the bilayer is regulated by membrane proteins
that span the bilayer and function like channels or pores
Membrane Proteins
• Two types of
membrane proteins:
Integral vs peripheral
• Two types of integral
membrane proteins:
all-α vs β-barrel
all-α
β-barrel
Topography & Topology
• topography: predict
location of
transmembrane
segment
• topology: predict
location of N- and C-termini wrt lipid
bilayer
• We focus on topography
prediction for all-α
membrane proteins
[Figure label: Lipid molecules]
Datasets
• Jayasinghe et al. Protein Sci, 10:455-458, 2001
– 59 high resolution membrane proteins
– www.biocomp.unibo.it/gigi/ENSEMBLE
• Moller et al. Bioinformatics, 16:1159--1160, 2000
– 151 low resolution membrane proteins
• Jones et al., Biochem., 33(10):3038--3049, 1994
– 38 multi-spanning and 45 single-spanning membrane proteins
– topologies experimentally determined
• Sonnhammer et al., ISMB, 6:175-182, 1998
– 108 multi-spanning and 52 single-spanning membrane proteins
– mostly experimentally determined topologies, but less reliably
determined than those of Jones et al.
Monne et al., JMB, 288:141--145, 1999:
Turn Propensity Scale for TM Helices
ER
• E. coli Lep protein contains
two TM domains (H1, H2)
and C-terminal domain P2
• Translocation of P2 to
lumenal side is easy to test
by glycosylation
• Replace H2 by 40 residue
poly-L segment
LIK_4 L_21 X L_7 V L_10 Q_3 P
• The poly-L segment can
form either one long TM or
2 closely-spaced TM
helices, depending on what
is substituted for X
Monne et al., JMB, 288:141--145, 1999:
Turn Propensity Scale for TM Helices
glycosylated
non-glycosylated
• Using the poly-L segment,
measure “turn” propensity of
the 20 amino acids by
substituting them for the X in
the poly-L segment
• Hydrophobic residues (I, V, L, F,
C, M, A) do not induce turn
• Charged and polar residues
(except S & T) induce turn
• Exercise:
– What are the charged/polar
residues?
– What could be the reason S & T
do not induce a turn?
Monne et al., JMB, 288:141--145, 1999:
Turn Propensity Scale for TM Helices
• In all-α membrane
proteins,
– hydrophobic residues
prefer membrane env and
have low turn propensity
– charged & polar residues
induce turn formation to
avoid membrane interior
prediction of TM helix
distinction of 1 long TM
helix vs 2 closely spaced
TM helices
Monne et al., JMB, 288:141--145, 1999
Weiss et al., ISMB, 1:420--421, 1993
Hydrophobicity Approach
• Inside of cellular
membrane is
hydrophobic
• Segment of protein that
spans membrane is
expected to contain
many hydrophobic amino
acids
Locate segments that
have high average
“hydrophobicity” score
Monne et al., JMB, 288:141--145, 1999
Weiss et al., ISMB, 1:420--421, 1993
Hydrophobicity Approach
• Caveats:
– may be unable to
distinguish hydrophobic
core of nonmembrane
proteins vs.
transmembrane regions
– what are the right
thresholds?
• find a segment of 10 to 70aa with hp > 0.71
• expand to longer segment with hp > 0.35
• mark this segment as TM
• repeat above starting from position after previous segment
Both thresholds are adjustable
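The seed-and-expand procedure above can be sketched in Python as follows. This is only an illustrative sketch, not the method of the cited paper: it assumes the Kyte-Doolittle hydropathy scale, a 7-residue smoothing window, and thresholds (2.0 for seeding, 1.0 for expansion) chosen for that scale rather than the 0.71 and 0.35 on the slide, whose scale is not stated here. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not taken from the slides).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def smoothed_hydropathy(seq, window=7):
    """Average hydropathy in a window centred on each residue."""
    half = window // 2
    hp = [KD.get(a, 0.0) for a in seq.upper()]  # unknown residues score 0
    scores = []
    for i in range(len(hp)):
        chunk = hp[max(0, i - half): i + half + 1]
        scores.append(sum(chunk) / len(chunk))
    return scores


def find_tm_segments(seq, seed_thr=2.0, expand_thr=1.0, min_len=10, window=7):
    """Return half-open (start, end) index pairs of putative TM segments."""
    score = smoothed_hydropathy(seq, window)
    segments = []
    i, n = 0, len(score)
    while i < n:
        if score[i] > seed_thr:
            # Expand left, but never into the previous segment.
            prev_end = segments[-1][1] if segments else 0
            start = i
            while start > prev_end and score[start - 1] > expand_thr:
                start -= 1
            # Expand right while the looser threshold still holds.
            end = i + 1
            while end < n and score[end] > expand_thr:
                end += 1
            if end - start >= min_len:
                segments.append((start, end))
            i = end  # resume scanning after this segment
        else:
            i += 1
    return segments


# A leucine stretch flanked by charged residues is reported as one segment.
print(find_tm_segments("D" * 10 + "L" * 20 + "K" * 10))
```

Lowering seed_thr finds more segments at the cost of more false positives, which is the threshold caveat raised above.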
An Example: Bacteriorhodopsin
1 gigtllmlig tfyfiargwg vtdkkareyy aitilvpgia saaylsmffg iglttvevag
61 maepleiyya ryadwlfttp lllldlalla nadrttigtl igvdalmivt gligalshtp
121 larytwwlfs tiaflfvlyy lltvlrsaaa elsedvqttf ntltalvavl wtaypilwii
181 gtegagvvgl gvetlafmvl dvta
7 transmembrane helices
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=protein&list_uids=461610&dopt=GenPept&term=bacteriorhodopsin&qty=1
An Example: Bacteriorhodopsin
• After applying
hydrophobicity
scale...
1 gigtllmlig tfyfiargwg vtdkkareyy aitilvpgia saaylsmffg iglttvevag
61 maepleiyya ryadwlfttp lllldlalla nadrttigtl igvdalmivt gligalshtp
121 larytwwlfs tiaflfvlyy lltvlrsaaa elsedvqttf ntltalvavl wtaypilwii
181 gtegagvvgl gvetlafmvl dvta
An Example: Bacteriorhodopsin
• Compute
hydrophobicity
score, hp > 7
1 gigtllmlig tfyfiargwg vtdkkareyy aitilvpgia saaylsmffg iglttvevag
61 maepleiyya ryadwlfttp lllldlalla nadrttigtl igvdalmivt gligalshtp
121 larytwwlfs tiaflfvlyy lltvlrsaaa elsedvqttf ntltalvavl wtaypilwii
181 gtegagvvgl gvetlafmvl dvta
TM identified: 6/7, TM FP: 0
TM residue identified: 62/117, TM residue FP: 4
An Example: Bacteriorhodopsin
• Expand segment,
maintain hp > 5,
avoid low
hydrophobicity
1 gigtllmlig tfyfiargwg vtdkkareyy aitilvpgia saaylsmffg iglttvevag
61 maepleiyya ryadwlfttp lllldlalla nadrttigtl igvdalmivt gligalshtp
121 larytwwlfs tiaflfvlyy lltvlrsaaa elsedvqttf ntltalvavl wtaypilwii
181 gtegagvvgl gvetlafmvl dvta
TM identified: 6/7, TM FP: 0
TM residue identified: 100/117, TM residue FP:15
Sonnhammer et al., ISMB, 6:175-182, 1998:
TMHMM, A HMM Approach
• There are 3 main locations of a residue:
– TM helix core (viz., in hydrophobic tail of membrane)
– TM helix cap (viz., in head of membrane)
• cytoplasmic vs
• non-cytoplasmic side of the helix core
– loops
• cytoplasmic vs
• non-cytoplasmic (short) vs
• non-cytoplasmic (long)
[Figure labels: cyto; non-cyto]
So an HMM with 7 states is needed
• Exercise: What is the 7th state for?
Sonnhammer et al., ISMB, 6:175-182, 1998:
TMHMM, Architecture
cyto
non-cyto
Each state has an associated probability
distribution over the 20 amino acids
characterizing the variability of amino
acids in the region it models
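To illustrate how per-state emission distributions drive the labeling of a sequence, here is a minimal Viterbi decoder for a toy 2-state HMM (M = membrane, L = loop). It is not the TMHMM architecture: the two states, the hydrophobic/polar alphabet reduction, and all probabilities are assumptions made up for the example.

```python
import math

# Assumed reduction of the 20 amino acids to two emission symbols.
HYDROPHOBIC = set("AILMFVWC")

STATES = ("M", "L")
START_P = {"M": 0.5, "L": 0.5}
TRANS_P = {"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}}
# Membrane state mostly emits hydrophobic (h), loop state mostly polar (p).
EMIT_P = {"M": {"h": 0.9, "p": 0.1}, "L": {"h": 0.3, "p": 0.7}}


def viterbi(seq):
    """Return the most probable state path for seq as a string of M/L."""
    obs = ["h" if a in HYDROPHOBIC else "p" for a in seq.upper()]
    # Log-probabilities of the best path ending in each state.
    best = {s: math.log(START_P[s]) + math.log(EMIT_P[s][obs[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: best[r] + math.log(TRANS_P[r][s]))
            row[s] = best[prev] + math.log(TRANS_P[prev][s]) + math.log(EMIT_P[s][o])
            ptr[s] = prev
        best = row
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: best[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


print(viterbi("DDDDD" + "L" * 10 + "KKKKK"))
```

In a model of the TMHMM kind the same decoding idea applies, only with more states and a 20-letter emission distribution per state.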
Sonnhammer et al., ISMB, 6:175-182, 1998:
TMHMM, Architecture
• The first 3 and last 2 core states have to be traversed.
But all other core states can be bypassed.
• This models core regions of 5--25 residues
Sonnhammer et al., ISMB, 6:175-182, 1998:
TMHMM, Architecture
To model neutral amino
acid distribution
To model bias in amino
acid usage near cap
• The states of globular, loop, & cap regions.
• The caps are 5 residues each. Since core is 5--25
residues, this allows for helices 15--35 residues long
Sonnhammer et al., ISMB, 6:175-182, 1998:
TMHMM, Training the HMM
• Stage 1: Baum-Welch is used for maximum likelihood estimation from
“diluted” labeled training data. As precise end of TM is only
approximately known, we “dilute” by unlabeling 3 residues on each
side of a helix boundary to accommodate this
• Stage 2: Baum-Welch is used for maximum likelihood estimation from
“relabeled” training data. The original training data are diluted by
unlabeling 5 residues on each side of a helix boundary. Model from
Stage 1 is used to produce “relabeled training data” by relabeling this
part under constraints of remaining labels
• Stage 3: Model from Stage 2 is further tuned by a method for
“discriminative” training, to maximize probability of correct prediction
(Krogh, ISMB, 5:179--186, 1997)
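The "dilution" step of Stages 1 and 2 can be pictured with a small sketch. It assumes a per-residue label string in which M marks helix residues and i/o mark the two loop sides, and it writes "." for an unlabeled residue; this encoding and the function name are assumptions for illustration only.

```python
def dilute_labels(labels, margin=3, unknown="."):
    """Unlabel `margin` residues on each side of every helix boundary."""
    out = list(labels)
    for i in range(1, len(labels)):
        left, right = labels[i - 1], labels[i]
        # A helix boundary lies between positions i-1 and i.
        if left != right and "M" in (left, right):
            for j in range(max(0, i - margin), min(len(labels), i + margin)):
                out[j] = unknown
    return "".join(out)


labels = "i" * 6 + "M" * 10 + "o" * 6
print(dilute_labels(labels, margin=3))  # Stage 1 style dilution
print(dilute_labels(labels, margin=5))  # Stage 2 style dilution
```

Training then treats the "." positions as unconstrained, so the model is free to place the exact helix end anywhere inside the diluted region.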
Krogh, ISMB, 5:179--186, 1997:
Discriminative HMM Training
Sonnhammer et al., ISMB, 6:175-182, 1998:
TMHMM, Example
Non-cytoplasmic
TM segment
Cytoplasmic
Datasets
• Jones et al., Biochem., 33(10):3038--3049, 1994
• Sonnhammer et al., ISMB, 6:175-182, 1998
Sonnhammer et al., ISMB, 6:175-182, 1998:
TMHMM, Accuracy (10-CV)
All TM segments
& their orientation
correctly predicted
All TM segments
correctly predicted,
ignoring orientation
precision
Martelli et al. Bioinformatics, 19:i205--i211, 2003
ENSEMBLE
NN
HMM1
HMM2
ENSEMBLE
ENSEMBLE:
The Neural Network Part
[Figure: cascade of feed-forward back-propagation neural networks. Recoverable labels: window positions 1 to 17; 17 * 20 input units; 15 hidden units; 17 * 2 inputs; LOOP]
• The NN part is a cascade shown above, a la Rost
et al., Protein Science, 1995
ENSEMBLE:
The HMM1 Part
• HMM1 models the hydrophobic nature of most
TM helices, a la Krogh et al. JMB 2001 & Sonnhammer et al.,
ISMB 1998
ENSEMBLE:
The HMM2 Part
• HMM2 models TM helices that are a mix of
hydrophobic and hydrophilic residues, a la Martelli
et al., Bioinformatics 2002.
ENSEMBLE:
Predicting if a residue is in TM helix
[Figure: outputs of NN, HMM1, and HMM2 are combined into ENSEMBLE; states are helix (H) and loop (L; inner I, outer O); i is the residue position]
• NN(p,i) = NN(H,p,i) - NN(L,p,i)
• HMM1(p,i) = AP1(H,p,i) - AP1(I,p,i) - AP1(O,p,i)
• HMM2(p,i) = AP2(H,p,i) - AP2(I,p,i) - AP2(O,p,i)
• E(p,i) = (NN(p,i) + HMM1(p,i) + HMM2(p,i)) / 3
E(p,i) > 0 means residue i
of protein p is in TM helix
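A worked example of the averaging rule may help. The sketch below assumes that each predictor's signal is the helix output minus the loop outputs (the reading under which "E(p,i) > 0" is a sensible cutoff); the function names, argument layout, and numbers are hypothetical.

```python
def ensemble_signal(nn_helix, nn_loop, ap1, ap2):
    """Average the three per-residue signals for one position.

    nn_helix, nn_loop: NN outputs for helix and loop.
    ap1, ap2: dicts with keys 'H', 'I', 'O' holding the a posteriori
    probabilities from HMM1 and HMM2 (assumed input format).
    """
    nn = nn_helix - nn_loop
    hmm1 = ap1["H"] - ap1["I"] - ap1["O"]
    hmm2 = ap2["H"] - ap2["I"] - ap2["O"]
    return (nn + hmm1 + hmm2) / 3


def in_tm_helix(nn_helix, nn_loop, ap1, ap2):
    """A residue is called TM helix when the averaged signal is positive."""
    return ensemble_signal(nn_helix, nn_loop, ap1, ap2) > 0


# Two predictors favour helix, one mildly disagrees: the average is positive.
e = ensemble_signal(0.8, 0.2,
                    {"H": 0.7, "I": 0.2, "O": 0.1},
                    {"H": 0.4, "I": 0.3, "O": 0.3})
print(e, e > 0)
```

Because the three signals are averaged, a single dissenting predictor is outvoted unless its signal is large in magnitude.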
Ensemble: Topography Prediction
Fariselli et al., Bioinformatics, 2003
TM helix found by
MaxSubSeq but
would be missed
w/o it
NN
HMM1
ENSEMBLE
HMM2
MaxSubSeq
This path is
taken means
positions
m to j form
a helix
Ensemble:
Topography Prediction Results
[Bar chart: topography prediction accuracy (y-axis 60% to 90%) on the Jayasinghe (CV) and Moller datasets for NN, HMM1, HMM2, ENSEMBLE, TMHMM2.0, MEMSAT, PHD, and HMMTOP]
A prediction is considered correct if
(a) the number of TM segments is correct and
(b) the overlap between a predicted and a real TM segment > 8aa
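The correctness criterion can be written down directly. This sketch assumes segments are half-open (start, end) index pairs and that predicted and real segments are paired in sequence order; both are assumptions, and the function names are hypothetical.

```python
def overlap(a, b):
    """Number of residues shared by two half-open (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def topography_correct(predicted, real, min_overlap=8):
    """True if (a) the segment counts match and (b) each pair overlaps > 8 aa."""
    if len(predicted) != len(real):
        return False
    pairs = zip(sorted(predicted), sorted(real))
    return all(overlap(p, r) > min_overlap for p, r in pairs)


real = [(12, 32), (48, 69)]
print(topography_correct([(10, 30), (50, 70)], real))  # both helices found
print(topography_correct([(10, 30)], real))            # one helix missed
```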
Topology Prediction: Positive-Inside Rule
Gavel et al., FEBS, 282:41--46, 1991
• Positively charged residues (Lys and Arg) are
enriched more than 2 fold in
stromal vs luminal loops
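In its simplest form the positive-inside rule can be turned into an orientation guess once the TM segments are known. The sketch below is an assumption-laden simplification: it counts Lys and Arg in the loops on either side of the membrane, treats all loops equally regardless of length, and breaks ties in favour of "N-in". Names are hypothetical.

```python
def positive_inside_orientation(seq, tm_segments):
    """Guess whether the N-terminus is inside, from K/R counts in loops.

    tm_segments: half-open (start, end) index pairs of TM helices.
    Loops alternate sides of the membrane, starting with the N-terminal loop.
    """
    bounds = [0] + [x for seg in sorted(tm_segments) for x in seg] + [len(seq)]
    loops = [seq[bounds[k]:bounds[k + 1]] for k in range(0, len(bounds), 2)]
    counts = [0, 0]  # counts[0]: side of the N-terminal loop, counts[1]: other side
    for k, loop in enumerate(loops):
        counts[k % 2] += sum(loop.upper().count(c) for c in "KR")
    # The side richer in positive charges is taken as "inside".
    return "N-in" if counts[0] >= counts[1] else "N-out"


seq = "KRK" + "L" * 20 + "DDE" + "L" * 20 + "RKK"
print(positive_inside_orientation(seq, [(3, 23), (26, 46)]))
```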
Topology Prediction:
Ensemble
“positive-inside” rule
Ensemble:
Topology Prediction Results
[Bar chart: topology prediction accuracy (y-axis 40% to 80%) on the Jayasinghe (CV) and Moller datasets for ENSEMBLE (rule 1), ENSEMBLE (rule 4), TMHMM2.0, MEMSAT, PHD, and HMMTOP]
Short Break
Subcellular Localization
Compartments and Sorting
• Eukaryotic cells require
proteins to be targeted to their
subcellular destinations
• Protein sorting is
determined by specific
amino acid sequences,
or “signals”, within the
protein
• Secretory pathway
targets proteins to
plasma membrane, to
some membrane-bound organelles such
as lysosomes, or for
export from
the cell
Secretory Pathway
• The secretory pathway consists
of the endoplasmic reticulum
(ER), Golgi apparatus and
transport vesicles
• The transport vesicles carry
proteins from one compartment
to the other
• Exocytosis is mediated by fusion
of secretory vesicles with the
plasma membrane.
• Endocytosis is the opposite of
exocytosis and involves the
uptake of extracellular material
by pinching off vesicles from the
plasma membrane
• The contents of the endocytic
vesicles are delivered to the
lysosomes by membrane fusion
• Lysosomes contain hydrolytic
enzymes that break down
macromolecules into the smaller
subunits which can be utilized by
the cell for its own biosynthesis
Datasets
• Reinhardt & Hubbard, NAR, 26:2230--2236, 1998
– 2427 eukaryotic proteins for 4 locations (cytoplasmic, extracellular, nuclear, &
mitochondrial)
– 997 prokaryotic proteins for 3 locations (cytoplasmic, extracellular, &
periplasmic)
• Park & Kanehisa, Bioinformatics, 19:1656--1663, 2003
– 7589 eukaryotic proteins from 709 organisms for 12 locations
(chloroplast, cytoplasmic, cytoskeleton, ER, extracellular, golgi, lysosomal,
mitochondrial, nuclear, peroxisomal, plasma membrane, vacuolar)
• Chou & Cai, JBC., 277:45765--45769, 2002
– 2191 proteins for 12 locations
• Emanuelsson et al., JMB, 300:1005--1016, 2000
• Gardy et al., NAR, 31:3613--3617, 2003
Common Eukaryotic Protein
Sorting Signals
For a comprehensive list of cellular localization sites, see
http://mendel.imp.univie.ac.at/CELL_LOC/index.html
Schematic View of Sorting Signals
[Figure labels: ~25aa; cleavage site]
Sequence Logos of SP, mTP, & cTP
[Figure panels: SP (signal peptide); mTP (mitochondrial targeting peptide); cTP (chloroplast transit peptide)]
Neural Network Approach: TargetP
Emanuelsson et al., JMB, 300:1005--1016, 2000
• cTP, mTP, SP
– 4 hidden units
– feedforward NNs
– input windows:
• 55aa (cTP), 35aa
(mTP), 27aa (SP)
• sparsely encoded
• Integrating Network
– 0 hidden units
– feedforward NN
– input is taken from the
outputs of cTP, mTP, SP
networks over 100aa at
N-terminal
cTP: chloroplast transit peptide,
mTP: mitochondrial targeting peptide, SP: signal peptide
TargetP:
Performance
Dataset: Emanuelsson et al., JMB, 2000
Expert System Approach: PSORT
Horton & Nakai, ISMB, 1997
A simplified
version of the
decision tree that
PSORT uses to
check and reason
over various
sorting signals
A Refinement: PSORT-B
Gardy et al., NAR, 31:3613--3617, 2003
• Sites considered
– cytoplasm
– inner membrane
– periplasm
– outer membrane
– extracellular space
[Figure: six modules (SCL-BLAST, Motifs, HMMTOP, Outer Membrane Protein, SubLocC, Signal Peptides) feed a Bayesian Network, which outputs localization sites or “unknown”]
PSORT-B:
SCL-BLAST
• Homology to a protein of
known localization is a good
indicator of a protein’s
actual localization site
BLAST target protein
against a database of
proteins whose localization
sites are known
Return localization sites of
hits at E-value of 10e-10
over 80% of length
PSORT-B:
Motifs
• Some motifs in PROSITE
may be able to identify
subcellular localization with
100% precision
Scan target protein against
a database of such motifs
(28 such 100%-precision
motifs are known)
Return localization sites
corresponding to the motif
hits
PSORT-B:
HMMTOP
• α-helical transmembrane
region is a reliable indicator
of localization to inner
membrane
Scan target protein for
transmembrane helices
using HMMTOP
Return localization site as
“inner membrane” if >2
helices found
PSORT-B:
Outer Membrane Proteins
• Outer-membrane proteins
have a characteristic β-barrel structure
Identify frequent subsequences occurring
only in β-barrel proteins
(279 such frequent subsequences known)
Scan target protein for
these frequent subsequences
Return localization site as
“outer membrane” if >2
such frequent subsequences found
PSORT-B:
SubLocC
• Overall amino acid
composition is useful for
recognizing cytoplasmic
proteins
Trained SVM on overall
amino acid composition to
predict cytoplasmic vs non-cytoplasmic, as in SubLoc
Analyze target protein’s
amino acid composition
using this SVM
PSORT-B:
Signal Peptides
• Presence of signal peptide at N-terminal means protein is not
cytoplasmic
Train HMM and SVM to recognize
signal peptides and their cleavage
sites
If high-confidence cleavage site
found by HMM in first 70aa of target
protein, then “non-cytoplasmic”
If low-confidence cleavage site found,
pass candidate signal peptide to SVM
to confirm
If confirmed, then “non-cytoplasmic”
Otherwise, “unknown”
PSORT-B:
Bayesian Network
• Bayesian Network integrates
results from the 6 modules
• Produces a score for each of
the 5 possible localization
sites
• If a site scores >7.5, then
predicts as a localization site
of the target protein
• If no site scores >7.5, then
makes no prediction
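The final decision rule is a simple threshold on the per-site scores. The sketch below covers only that last step; the scores themselves would come from the Bayesian network, and the example values, dictionary layout, and function name are made up for illustration.

```python
def predict_sites(scores, threshold=7.5):
    """Return sites scoring above the threshold, best first; [] if none."""
    hits = [site for site, value in scores.items() if value > threshold]
    return sorted(hits, key=lambda site: scores[site], reverse=True)


scores = {
    "cytoplasm": 0.5,
    "inner membrane": 8.9,
    "periplasm": 0.3,
    "outer membrane": 0.2,
    "extracellular space": 0.1,
}
print(predict_sites(scores))  # a confident call
print(predict_sites({"cytoplasm": 4.0, "periplasm": 6.0}))  # no prediction
```

Returning nothing rather than the top-scoring site trades coverage for precision.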
PSORT-B:
Performance of Individual Modules
Dataset: Gardy et al., NAR, 2003
PSORT-B:
Performance wrt Localization Sites
PSORT-B is a considerable improvement over original PSORT
Dataset: Gardy et al., NAR, 2003
PSORT vs PSORT-B:
Some Remarks
• PSORT considers various signal/features
in a top-down way driven by its reasoning
tree
• PSORT-B generates all signal/features in
a bottom-up way, then integrates them for
decision making using Bayesian Network
• Machine learning “beats” human expert?
Probably the features/rules needed
are too numerous and complicated
Amino acid
compositions
of proteins
residing in
different
sites are
different
Amino Acid Composition Differences
• each cellular location has own characteristic
physico-chemical environment
• proteins in each location have adapted
thru evolution to that environment
• thus reflected in the protein structure and
amino acid composition
• If the above is true, the amino acid composition
differences wrt cellular location sites should be more
pronounced on protein surfaces than protein interior
• Exercise: Why?
Adaptation of Protein Surfaces
Andrade et al., JMB, 1998
• To test the theory
of adaptation of
protein surfaces
to subcellular
localization, we
do a plot of 3
types of
composition
vectors along
their first two
principal
components
Proportion of
jth amino acid
type in ith protein
Adaptation of Protein Surfaces
Andrade et al., JMB, 1998
Total amino acid
composition vector
Surface amino acid
composition vector
• Clearly total & surface
composition vectors show
better separation than interior
composition vectors
Interior amino acid
composition vector
Amino Acid Composition
• This means can use amino acid
composition vectors, especially those
from protein surfaces, to predict
subcellular localization!
• Let’s see how this turns out…
Neural Networks: NNPSL
Reinhardt & Hubbard, NAR, 26:2230--2236, 1998
Input1
fraction of
each amino
acid in
the input
protein
cytoplasmic
extracellular
mitochondrial
nuclear
Input20
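The 20 inputs are simply the fraction of each amino acid in the protein. A minimal sketch of that feature vector follows; the ordering of the amino acids and the handling of non-standard letters (ignored here) are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence."""
    seq = [a for a in seq.upper() if a in AMINO_ACIDS]  # drop non-standard letters
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


vec = aa_composition("AAC")
print(vec[:3])  # fractions of A, C, D
```

The vector discards residue order entirely, which is why it is unaffected by where in the sequence a sorting signal sits.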
NNPSL:
Performance
• Outputs of NNPSL
have values 0 to 1.
The difference (Δ)
between the
highest and the
next highest nodes
can be used as a
reliability index
[Table rows by reliability bin: 0 < Δ < 0.2; 0.2 < Δ < 0.4; 0.4 < Δ < 0.6; 0.6 < Δ < 0.8; 0.8 < Δ < 1]
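The reliability index is the gap between the two largest network outputs. A small sketch, with made-up output values and a hypothetical function name, computes the gap and the 0.2-wide bin it falls into:

```python
def reliability(outputs, bin_width=0.2, n_bins=5):
    """Return (delta, bin index) for a dict of output-node values in [0, 1]."""
    ranked = sorted(outputs.values(), reverse=True)
    delta = ranked[0] - ranked[1]          # highest minus next highest
    bin_index = min(int(delta / bin_width), n_bins - 1)
    return delta, bin_index


outputs = {"cytoplasmic": 0.7, "extracellular": 0.4,
           "mitochondrial": 0.1, "nuclear": 0.05}
delta, bin_index = reliability(outputs)
print(round(delta, 2), bin_index)  # falls in the second bin (0.2 to 0.4)
```

A small gap means two locations are nearly tied, so the prediction deserves less trust.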
Dataset:
Reinhardt & Hubbard,
NAR, 1998
Performance
Emanuelsson, BIB, 3:361--376, 2002
(940 proteins)
(2738 proteins)
Dataset: Emanuelsson et al., JMB, 2000
Markov Chain
Yuan, FEBS Letters, 451:23--26, 1999
Why?
Markov Chain:
Performance
(Eukaryotic)
NNPSL
4th Order Markov
Dataset:
Reinhardt & Hubbard,
NAR, 1998
Support Vector Machines: SubLoc
Hua & Sun, Bioinformatics, 17:721--728, 2001
SVM
nuclear
vs rest
20-dimensional
vector giving amino
acid composition
of the input protein
SVM
mitochondrial
vs rest
SVM
extracellular
vs rest
SVM
cytoplasmic
vs rest
argmax_X X-vs-rest
The SVMs use
• polynomial kernel with d = 9 (prokaryotic),
K(Xi, Xj) = (Xi · Xj + 1)^d
• RBF kernel with γ = 16 (eukaryotic),
K(Xi, Xj) = exp(-γ |Xi - Xj|^2)
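The two kernels and the one-vs-rest decision are easy to state in code. The sketch below uses plain Python lists for the composition vectors; the decision values passed to the argmax are made-up numbers standing in for the outputs of trained SVMs, and the function names are hypothetical.

```python
import math


def poly_kernel(x, y, d=9):
    """Polynomial kernel (x . y + 1)^d."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** d


def rbf_kernel(x, y, gamma=16.0):
    """RBF kernel exp(-gamma * |x - y|^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def one_vs_rest_predict(decision_values):
    """Pick the class whose X-vs-rest SVM gives the largest decision value."""
    return max(decision_values, key=decision_values.get)


print(poly_kernel([1, 0], [1, 0], d=2))
print(rbf_kernel([0.5, 0.5], [0.5, 0.5]))
print(one_vs_rest_predict({"nuclear": 0.3, "mitochondrial": -0.8,
                           "extracellular": -1.1, "cytoplasmic": 1.2}))
```

Since composition vectors sum to 1, distances between them are small, which is one reason a fairly large γ is workable here.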
SubLoc:
Performance
NNPSL
SubLoc
(Eukaryotic)
Dataset: Reinhardt & Hubbard, NAR, 1998
SubLoc: Robustness
of
Amino Acid Composition Approach
• Amazingly, accuracy of SubLoc is virtually unaffected
when the first 10, 20, 30, & 40 amino acids in a protein
are deleted
• Amino acid composition is a robust indicator of
subcellular localization, and is insensitive to errors in
N-terminal sequences
Amino Acid Composition:
Taking it Further
• How about pairs of consecutive amino
acids? (a.k.a. 2-grams) How about 3-grams, …, k-grams?
• How about pseudo amino acid
composition?
• How about presence of entire functional
domains? (I.e. think of the presence/absence of a
functional domain as a summary of amino acid
sequence info...)
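Extending composition from single residues to k-grams is a small change. This sketch counts overlapping k-grams and normalises by their total; it returns a sparse dictionary rather than the full 20^k-dimensional vector, which is a design choice made here for brevity. The function name is hypothetical.

```python
from collections import Counter


def kgram_composition(seq, k=2):
    """Fraction of each overlapping k-gram in seq (sparse dict form)."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {gram: n / total for gram, n in counts.items()}


print(kgram_composition("AAB"))        # 2-grams
print(kgram_composition("AABAA", 3))   # 3-grams
```

The number of possible k-grams grows as 20^k (400 for pairs, 8000 for triples), so larger k quickly needs far more training data.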
Functional Domain Composition
Chou & Cai, JBC, 277:45765--45769, 2002
Training seqs of
various localization
sites
Train SVM
using these vectors
xi = 1 means ith
domain is present
BLAST against
db of known
functional domains
(SBASE-A)
+
amino
acid
composition
Functional Domain Composition:
Performance
Dataset: Reinhardt & Hubbard, NAR, 1998
• Not so good
• Why?
Number of known domains in SBASE-A too small
Need to handle situation where a protein has no
hit in known domains
Functional Domain Composition
Cai & Chou, BBRC, 305:407--411, 2003
If a protein gets a hit in InterPro,
use NN-5875D; else use NN-40D
Training seqs of
various localization
sites
BLAST against
db of known
functional domains
(Interpro)
NN-5875D:
NN-40D:
Train k-NN (k=1)
using these vectors
Train k-NN (k=1)
using these vectors
or, if no
hit found
Amino
acid
composition
Pseudo amino
acid composition
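The dispatch between the two nearest-neighbour predictors can be sketched as below. Everything concrete here is an assumption for illustration: the squared Euclidean distance (the paper may use a different measure), the tiny vectors standing in for the 5875-dimensional domain vectors and 40-dimensional composition vectors, and the function names.

```python
def nearest_label(query, training):
    """1-NN: return the label of the training vector closest to query.

    training: list of (vector, label) pairs.
    """
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(query, item[0]))
    return min(training, key=sq_dist)[1]


def predict_location(domain_vec, comp_vec, domain_training, comp_training):
    """Use the domain-based 1-NN if any domain hit exists, else fall back."""
    if any(domain_vec):                       # at least one domain present
        return nearest_label(domain_vec, domain_training)
    return nearest_label(comp_vec, comp_training)


domain_training = [([1, 0, 0], "nuclear"), ([0, 1, 1], "extracellular")]
comp_training = [([0.6, 0.4], "cytoplasmic"), ([0.1, 0.9], "mitochondrial")]

print(predict_location([0, 1, 0], [0.5, 0.5], domain_training, comp_training))
print(predict_location([0, 0, 0], [0.2, 0.8], domain_training, comp_training))
```

The fallback addresses the weakness noted earlier for the pure domain-composition approach: a protein with no domain hit still gets a prediction.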
Functional Domain Composition:
Performance
Dataset: Reinhardt & Hubbard, NAR, 1998
Notes
References (Transmembrane)
• Weiss et al. “Transmembrane segment prediction from
protein sequence data”, ISMB, 420--421, 1993
• Gavel et al. “The positive-inside rule applies to thylakoid
membrane proteins”, FEBS 282:41--46, 1991
• Monne et al. “A turn propensity scale for transmembrane
helices”, JMB, 288:141--145, 1999
• Sonnhammer et al. “A hidden Markov model for predicting
transmembrane helices in protein sequences”, ISMB,
6:175--182, 1998
• Martelli et al. “An ENSEMBLE machine learning approach
for the prediction of all-alpha membrane proteins”,
Bioinformatics, 19(suppl):i205--i211, 2003
References (Transmembrane)
• Von Heijne. “Membrane protein structure prediction”, JMB,
225: 487--494, 1992
• Jacoboni et al. “Prediction of the transmembrane regions of
beta-barrel membrane proteins with a neural network-based predictor”, Protein Sci., 10:779--787, 2001
• Martelli et al. “A sequence-profile-based HMM for predicting
and discriminating beta barrel membrane proteins”,
Bioinformatics, 18:S46--S53, 2002
• Moller et al. “Evaluation of methods for the prediction of
membrane spanning regions”, Bioinformatics, 17:646--653,
2001
• Fariselli et al. “MaxSubSeq: an algorithm for segment-length optimization. The case study of the transmembrane
spanning segments”, Bioinformatics, 19:500--505, 2003
References (Transmembrane)
• Rost et al. “Transmembrane helices predicted at 95%
accuracy”, Protein Sci., 4:521--533, 1995
• Krogh et al. “Predicting transmembrane protein topology
with a hidden Markov model: Application to complete
genomes”, JMB, 305:567--580, 2001
• Andersson et al. “Different positively charged amino acids
have similar effects on the topology of a polytopic
transmembrane protein in E. coli”, JBC, 267:1491--1495,
1992
References (Subcellular Localization)
• Horton & Nakai, “Better prediction of protein cellular
localization sites with the k-nearest neighbours
classifier”, ISMB, 5:147--152, 1997
• Gardy et al., “PSORT-B: Improving protein subcellular
localization for Gram-negative bacteria”, NAR,
31:3613--3617, 2003
• Emanuelsson, “Predicting protein subcellular
localization from amino acid sequence information”,
BIB, 3:361--376, 2002
• Andrade et al., “Adaptation of protein surfaces to
subcellular location”, JMB, 276:517--525, 1998
• Yuan, “Prediction of protein subcellular locations using
Markov chain models”, FEBS Letters, 451:23--26, 1999
References (Subcellular Localization)
• Emanuelsson et al., “ChloroP, a neural network-based
method for predicting chloroplast transit peptides and
their cleavage sites”, Protein Sci., 8:978--984, 1999
• Emanuelsson et al., "Predicting subcellular localization
of proteins based on their N-terminal amino acid
sequence", JMB, 300:1005-1016, 2000
• Hua & Sun, “Support vector machine approach for
protein subcellular localization prediction”,
Bioinformatics, 17:721--728, 2001
• Reinhardt & Hubbard, “Using neural networks for
prediction of the subcellular location of proteins”, NAR,
26:2230--2236, 1998
References (Subcellular Localization)
• Cai & Chou, “Nearest neighbour algorithm for
predicting protein subcellular location by combining
functional domain composition and pseudo-amino acid
composition”, BBRC, 305:407--411, 2003
• Chou & Cai, “Using functional domain composition and
support vector machines for prediction of protein
subcellular location”, JBC, 277:45765--45769, 2002
• Park & Kanehisa, “Prediction of protein subcellular
locations by support vector machines using
compositions of amino acids and amino acid pairs”,
Bioinformatics, 19:1656--1663, 2003