Artificial Neural Networks for Secondary Structure Prediction

Artificial Neural Networks for Secondary Structure Prediction
CSC391/691 Bioinformatics
Spring 2004
Fetrow/Burg/Miller
(slides by J. Burg)
Artificial Neural Networks

- A problem-solving paradigm modeled after the physiological functioning of the human brain.
- Neurons in the brain are modeled by computational nodes.
- The firing of a neuron is modeled by input, output, and threshold functions.
- The network "learns" from problems to which the answers are known (in supervised learning).
- The network can then produce answers to entirely new problems of the same type.

Applications of Artificial Neural Networks

- speech recognition
- medical diagnosis
- image compression
- financial prediction

Existing Neural Network Systems for Secondary Structure Prediction

- First systems were about 62% accurate.
- Newer ones are about 70% accurate when they take advantage of information from multiple sequence alignments.
- PHD
- NNPREDICT
  (web links given in your book)

Applications in Bioinformatics

- Translational initiation sites and promoter sites in E. coli
- Splice junctions
- Specific structural features in proteins, such as α-helical transmembrane domains

Neural Networks Applied to Secondary Structure Prediction

- Create a neural network (a computer program).
- "Train" it using proteins with known secondary structure.
- Then give it new proteins of unknown structure and predict their structure with the neural network.
- Check whether the prediction for a series of residues makes sense from a biological point of view; e.g., you need at least 4 amino acids in a row for an α-helix.

Example Neural Network

[Figure: an example network shown with a training pattern, presented as one of n inputs, each with 21 bits. From Bioinformatics by David W. Mount, p. 453.]

Inputs to the Network

- Both the residues and the target classes are encoded in unary format, for example:
  Alanine:  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Cysteine: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Helix:    1 0 0
- Each pattern presented to the network requires n 21-bit inputs for a window of size n. (The extra bit per residue indicates when the window overlaps the end of the chain.)
- The advantage of this sparse encoding scheme is that it imposes no artificial ordering on the amino acids.
- The main disadvantage is that it requires a lot of input nodes. (A sketch of the encoding follows.)

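A minimal Python sketch of this unary encoding, assuming the 20 residues are ordered alphabetically by one-letter code (so Alanine takes the first bit and Cysteine the second) and that the 21st bit flags positions past the end of the chain; the names are illustrative, not from the slides:

```python
# Illustrative unary (one-hot) encoding of residues and target classes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 residues; bit 21 = past chain end

def encode_residue(res):
    """Return the 21-bit unary encoding of one residue, or of an empty
    position where the window hangs off the end of the chain."""
    bits = [0] * 21
    bits[20 if res is None else AMINO_ACIDS.index(res)] = 1
    return bits

TARGETS = {"helix": [1, 0, 0], "strand": [0, 1, 0], "coil": [0, 0, 1]}

print(encode_residue("A"))   # Alanine: 1 followed by twenty 0s
print(encode_residue("C"))   # Cysteine: 0 1 followed by nineteen 0s
print(TARGETS["helix"])      # Helix: 1 0 0
```
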
Weights

- Input values at each layer are multiplied by weights.
- Weights are initially random.
- Weights are adjusted after the output is computed, based on how close the output is to the "right" answer.
- When the full training session is completed, the weights have settled on certain values.
- These weights are then used to compute outputs for new problems that weren't part of the training set. (A sketch of the weighted-sum step follows.)

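A tiny sketch of the first two bullets, with made-up values: each node multiplies its inputs by weights that start out random:

```python
import random

# Illustrative weighted sum: input values multiplied by (initially random)
# weights. The result is what a node's trigger function acts on.
inputs  = [1, 0, 0, 1]
weights = [random.uniform(-1.0, 1.0) for _ in inputs]   # random start

weighted_sum = sum(x * w for x, w in zip(inputs, weights))
print(weighted_sum)   # passed to the trigger function (described below)
```
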
Neural Network Training Set

- A typical training set contains over 100 non-homologous protein chains, comprising more than 15,000 training patterns.
- The number of training patterns is equal to the total number of residues in the proteins.
- For example, if there are 100 proteins and 150 residues per protein, there would be 15,000 training patterns.

Neural Network Architecture

- A typical architecture has a window size of 17 and 5 hidden-layer nodes.*
- A fully-connected version would be a 17(21)-5-3 network, i.e., a net with an input window of 17 residues, five hidden nodes in a single hidden layer, and three outputs.
- Such a network has 357 input nodes and 1,808 weights (counting one bias per hidden and output node):
  (17 * 21) * 5 + (5 * 3) + 5 + 3 = 1,808
  (A sketch verifying this count follows.)

*This information is adapted from "Protein Secondary Structure Prediction with Neural Networks: A Tutorial" by Adrian Shepherd (UCL), http://www.biochem.ucl.ac.uk/~shepherd/sspred_tutorial/ss-index.html.

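The quoted count can be checked in a few lines; reading the arithmetic above as one bias per hidden and output node makes it come out to exactly 1,808:

```python
# Verify the weight count for the 17(21)-5-3 architecture.
window, bits_per_residue, hidden, outputs = 17, 21, 5, 3

input_nodes = window * bits_per_residue     # 357 input nodes
weight_count = (input_nodes * hidden        # input -> hidden connections
                + hidden * outputs          # hidden -> output connections
                + hidden + outputs)         # one bias per hidden/output node
print(input_nodes, weight_count)            # 357 1808
```
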
Window

- The n-residue window is moved across the protein, one residue at a time.
- Each time the window is moved, the center residue becomes the focus.
- The neural network "learns" what secondary structure that residue is part of. It keeps adjusting weights until it gets the right answer within a certain tolerance. Then the window is moved to the right. (A sketch of the sliding window follows.)

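A minimal sketch of the sliding window, under the same illustrative encoding assumptions as the earlier snippet (alphabetical residue order, 21st bit for off-the-end positions); the toy sequence is made up:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_residue(res):
    """21-bit unary encoding; bit 21 marks a position past the chain end."""
    bits = [0] * 21
    bits[20 if res is None else AMINO_ACIDS.index(res)] = 1
    return bits

def window_patterns(sequence, n=17):
    """Yield one flattened n*21-bit input pattern per residue, with the
    window centered on that residue."""
    half = n // 2
    for center in range(len(sequence)):
        pattern = []
        for pos in range(center - half, center + half + 1):
            res = sequence[pos] if 0 <= pos < len(sequence) else None
            pattern.extend(encode_residue(res))
        yield pattern

seq = "MKTAYIAKQR"                    # toy sequence
patterns = list(window_patterns(seq))
print(len(patterns))                  # 10: one pattern per residue
print(len(patterns[0]))               # 357 = 17 * 21 input bits
```
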
Artificial Neuron (aka "node")

[Diagram: input links carry activations a_j, each weighted by W_i,j, into the input function, which forms the weighted sum in_i; the trigger function g then produces the node's output activation a_i = g(in_i).]

Trigger Function

- Each hidden-layer node sums its weighted inputs and "fires" an output accordingly.
- A simple trigger function (called a threshold function): send 1 to the output if the inputs sum to a positive number; otherwise, send 0.
- The sigmoid function is used more often:
  g(s_j) = 1 / (1 + e^(-k * s_j))
- s_j is the sum of the weighted inputs.
- As k increases, discrimination between weak and strong inputs increases. (A sketch of both trigger functions follows.)

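A short sketch of both trigger functions; the k values below are chosen only to show the sharpening effect:

```python
import math

def threshold(s):
    """Simple threshold trigger: fire 1 if the weighted-input sum is positive."""
    return 1 if s > 0 else 0

def sigmoid(s, k=1.0):
    """Sigmoid trigger: g(s_j) = 1 / (1 + e^(-k * s_j))."""
    return 1.0 / (1.0 + math.exp(-k * s))

# As k grows, a weak input (0.5) and a strong input (2.0) are pushed apart.
for k in (0.5, 1.0, 5.0):
    print(k, round(sigmoid(0.5, k), 3), round(sigmoid(2.0, k), 3))
```
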
Adjusting Weights With Back Propagation

- The inputs are propagated through the system as described above.
- The outputs are examined and compared to the right answer.
- Each weight is adjusted according to its contribution to the error.
- See page 455 of Bioinformatics by Mount. (A sketch of one update step follows.)

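A minimal numpy sketch of one back-propagation update for a 357-5-3 network like the one above, assuming a squared-error loss and sigmoid activations; the learning rate, initialization scale, and random training pattern are illustrative choices, not details from Mount:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 357, 5, 3

# Weights start random (small values); one bias per hidden/output node.
W1 = rng.normal(0.0, 0.01, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0.0, 0.01, (n_hid, n_out)); b2 = np.zeros(n_out)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

x = rng.integers(0, 2, n_in).astype(float)   # one 357-bit training pattern
t = np.array([1.0, 0.0, 0.0])                # "right answer": helix
lr = 0.1                                     # learning rate (illustrative)

# Forward pass: propagate the inputs through the system.
h = sigmoid(x @ W1 + b1)
y = sigmoid(h @ W2 + b2)

# Backward pass: adjust each weight according to its contribution
# to the squared error between output y and target t.
delta_out = (y - t) * y * (1.0 - y)            # output-layer error terms
delta_hid = (delta_out @ W2.T) * h * (1.0 - h) # hidden-layer error terms

W2 -= lr * np.outer(h, delta_out); b2 -= lr * delta_out
W1 -= lr * np.outer(x, delta_hid); b1 -= lr * delta_hid
```
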
Refinements or Variations of the Method

- Use more biological information.
- See http://www.biochem.ucl.ac.uk/~shepherd/sspred_tutorial/ss-pred-new.html#beyond_bioinf

Predictions Based on Output

- Predictions are made on a winner-takes-all basis.
- That is, the prediction is determined by the strongest of the three outputs. For example, the output (0.3, 0.1, 0.1) is interpreted as a helix prediction. (A sketch follows.)

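A one-function sketch of the winner-takes-all rule; the output ordering (helix, strand, coil) is an assumption for illustration:

```python
CLASSES = ("helix", "strand", "coil")   # assumed output order

def predict(outputs):
    """Return the class whose output unit fired most strongly."""
    strongest = max(range(len(outputs)), key=lambda i: outputs[i])
    return CLASSES[strongest]

print(predict((0.3, 0.1, 0.1)))   # "helix", as in the example above
```
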
Performance Measurements

- How do you know if your neural network performs well?
  - Test it on proteins that are not included in the training set but whose structure is known.
  - Determine how often it gets the right answer. (A sketch of this measure follows below.)
- What differentiates one neural network from another?
  - Its architecture: whether or not it has hidden layers, and how many nodes are used.
  - Its mathematical functions: the trigger function, the back-propagation algorithm.

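A minimal sketch of the accuracy check on test proteins; the H/E/C strings below are invented for illustration:

```python
def accuracy(predicted, true):
    """Fraction of residues whose predicted state (H/E/C) matches the
    known secondary structure."""
    matches = sum(p == t for p, t in zip(predicted, true))
    return matches / len(true)

predicted = "HHHHCCCEEECC"   # network output for a test protein (made up)
true      = "HHHHHCCEEECC"   # known structure (made up)
print(accuracy(predicted, true))   # 11/12, about 0.917
```
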
Balancing Act in Neural Network Training

- The network should NOT just memorize the training set.
- The network should be able to generalize from the training set so that it can solve similar but not identical problems.
- It's a matter of balancing the number of training patterns vs. the number of network weights vs. the number of hidden nodes vs. the number of training iterations.

Disadvantages of Neural Networks

- They are black boxes. They cannot explain why a given pattern has been classified as x rather than y. Unless we combine them with other methods, they don't tell us anything about underlying principles.

Summary

- Perceptrons (single-layer neural networks) can be used to find protein secondary structure, but more often feed-forward multi-layer networks are used.
- Two frequently used web sites for neural-network-based secondary structure prediction are PHD (http://www.embl-heidelberg.de/predictprotein/predictprotein.html) and NNPREDICT (http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html).