Discovery of RNA Structural Elements Using Evolutionary
Download
Report
Transcript Discovery of RNA Structural Elements Using Evolutionary
Discovery of RNA Structural Elements
Using Evolutionary Computation
Authors: G. Fogel, V. Porto, D. Weekes, D.
Fogel, R. Griffey, J. McNeil, E. Lesnik, D.
Ecker, R. Sampath,
Natural Selection Inc. and Ibis Therapeutics
Presenter: Elena Zheleva
April 2, 2004
Introduction
Problem Statement
Background
Evolutionary Computation
Population initialization
Variation
Fitness
Selection
Results
Conclusion
Problem Statement
Computational Biology problem: given a
RNA secondary structure description, search
for similar secondary structures
Currently, exhaustive search techniques are
used to narrow down search space
Authors focus on presentation and set of
operators to search via evolution
Outline
Problem Statement
Background
Evolutionary Computation
Population initialization
Variation
Fitness
Selection
Results
Conclusion
Background
RNA (ribonucleic acid)
directs middle steps of
protein production
single-stranded, certain
parts are folded
RNA Secondary
Structure - accounts
for diverse functional
activities
Background
RNA Secondary Structure:
Recurs in multiple genes within a single
organism
Recurs in across the same gene in several
organism
Why a computational tool for RNA
secondary structure search?
Discover new structures
Improve understanding of functional and
regulatory relationships amongst related RNAs
Background – RNAMotif
RNAMotif: mines nucleotide sequence databases
for repeating structure motifs
RNAMotif Input: descriptor contains details about
pairing information, length, sequence
Background - RNAMotif
RNAMotif Output: list of real structures
RNAMotif may return a very high number of
motifs when descriptor is more flexible
Input to the EA: RNAMotif Output
Outline
Problem Statement
Background
Evolutionary Computation
Population initialization
Variation
Fitness
Selection
Results
Conclusion
Evolutionary Computation
Population Initialization
P parent bins
B – bin size
Bin = a contending solution
Each bin contains structures
from different organisms
Structures chosen at random
from RNAMotif Output file
Figure 1
Outline
Problem Statement
Background
Evolutionary Computation
Population initialization
Variation
Fitness
Selection
Results
Conclusion
Evolutionary Computation
Variation
P parent bins are copied to O offspring bins
Variables: operator, number of times to apply it
Variation Operator 1: structure replacement
within a specified organism
Replacement taken from RNAMotif Output File
Local – neighboring replacement structure
Global – random replacement structure
Example: P organisms = {H. Sapiens, S. Scrofa, E. Coli, G. Gallus}
Evolutionary Computation
Variation
Variation Operator 2:
Structure replacement from different organisms
Variable: # of structures to be replaced
Example: # = 2
P organisms = {H. Sapiens, S. Scrofa, E. Coli, G. Gallus}
O organisms = {H. Sapiens, C. Griseus, E. Coli, S. Scrofa}
Evolutionary Computation
Variation
Variation Operator 3: random single-point bin
recombination
Generates a second parent from RNAMotif output and
applies single-point bin recombination
Chooses randomly one of the two offsprings
Example: P = {H, S, E, G} P = {D, E, O, B}
O = {H, S, E, B} O = {D, E, O, G}
Variation Operator 4: random multi-point bin
recombination
Outline
Problem Statement
Background
Evolutionary Computation
Population initialization
Variation
Fitness
Selection
Results
Conclusion
Evolutionary Computation
Fitness
Fitness Function Scoring Components:
Structure nucleotide sequence similarity
Structure length similarity
Structure thermodynamic stability similarity
These measures are applied pairwise by each
structural component and summed into a
final bin score
Outline
Problem Statement
Background
Evolutionary Computation
Population initialization
Variation
Fitness
Selection
Results
Conclusion
Evolutionary Computation
Selection
Selection: For every bin in population,
A set of R rival bins is randomly selected
Calculate score = # rivals with lower fitness
Lower bins are removed
Iterations continue until number of
generations (G) or CPU time is satisfied, or
until expected change of fitness/gen 0
Outline
Problem Statement
Background
Evolutionary Computation
Population initialization
Variation
Fitness
Selection
Results
Conclusion
Results
Experiment 1:
8
7.6x10 possible
bins
Exhaustive search:
125 days
EA examined
4
~10 bins before
converging
< 3 minutes
Results
Results
To test the utility of this method
Run on newly discovered genomes (S. Pyogenes)
Compare to database which has an alignment for
this RNA secondary structure for previously
discovered genomes (S. Mutans)
Found similar sequence and structure to close
organisms
Outline
Problem Statement
Background
Evolutionary Computation
Population initialization
Variation
Fitness
Selection
Results
Conclusion
Conclusion
Evolutionary Algorithm can be applied to
find RNA secondary structures over a wide
range of organisms
Converges quickly and reliably
Algorithm comes up with a solution which
contains information about structural
elements for different organisms/genomes