Discovery of RNA Structural Elements Using Evolutionary

Download Report

Transcript Discovery of RNA Structural Elements Using Evolutionary

Discovery of RNA Structural Elements
Using Evolutionary Computation
Authors: G. Fogel, V. Porto, D. Weekes, D.
Fogel, R. Griffey, J. McNeil, E. Lesnik, D.
Ecker, R. Sampath,
Natural Selection Inc. and Ibis Therapeutics
Presenter: Elena Zheleva
April 2, 2004
Introduction
 Problem Statement
 Background
 Evolutionary Computation




Population initialization
Variation
Fitness
Selection
 Results
 Conclusion
Problem Statement
 Computational Biology problem: given a
RNA secondary structure description, search
for similar secondary structures
 Currently, exhaustive search techniques are
used to narrow down search space
 Authors focus on presentation and set of
operators to search via evolution
Outline
 Problem Statement
 Background
 Evolutionary Computation




Population initialization
Variation
Fitness
Selection
 Results
 Conclusion
Background
 RNA (ribonucleic acid)


directs middle steps of
protein production
single-stranded, certain
parts are folded
 RNA Secondary
Structure - accounts
for diverse functional
activities
Background
 RNA Secondary Structure:


Recurs in multiple genes within a single
organism
Recurs in across the same gene in several
organism
 Why a computational tool for RNA
secondary structure search?


Discover new structures
Improve understanding of functional and
regulatory relationships amongst related RNAs
Background – RNAMotif
 RNAMotif: mines nucleotide sequence databases
for repeating structure motifs
 RNAMotif Input: descriptor contains details about
pairing information, length, sequence
Background - RNAMotif
 RNAMotif Output: list of real structures

 RNAMotif may return a very high number of
motifs when descriptor is more flexible
 Input to the EA: RNAMotif Output
Outline
 Problem Statement
 Background
 Evolutionary Computation




Population initialization
Variation
Fitness
Selection
 Results
 Conclusion
Evolutionary Computation
Population Initialization




P parent bins
B – bin size
Bin = a contending solution
Each bin contains structures
from different organisms
 Structures chosen at random
from RNAMotif Output file
Figure 1
Outline
 Problem Statement
 Background
 Evolutionary Computation




Population initialization
Variation
Fitness
Selection
 Results
 Conclusion
Evolutionary Computation
Variation
 P parent bins are copied to O offspring bins
 Variables: operator, number of times to apply it
 Variation Operator 1: structure replacement
within a specified organism

Replacement taken from RNAMotif Output File
Local – neighboring replacement structure
Global – random replacement structure

Example: P organisms = {H. Sapiens, S. Scrofa, E. Coli, G. Gallus}


Evolutionary Computation
Variation
 Variation Operator 2:



Structure replacement from different organisms
Variable: # of structures to be replaced
Example: # = 2
P organisms = {H. Sapiens, S. Scrofa, E. Coli, G. Gallus}
O organisms = {H. Sapiens, C. Griseus, E. Coli, S. Scrofa}
Evolutionary Computation
Variation
 Variation Operator 3: random single-point bin
recombination



Generates a second parent from RNAMotif output and
applies single-point bin recombination
Chooses randomly one of the two offsprings
Example: P = {H, S, E, G} P = {D, E, O, B}
O = {H, S, E, B} O = {D, E, O, G}
 Variation Operator 4: random multi-point bin
recombination
Outline
 Problem Statement
 Background
 Evolutionary Computation




Population initialization
Variation
Fitness
Selection
 Results
 Conclusion
Evolutionary Computation
Fitness
 Fitness Function Scoring Components:



Structure nucleotide sequence similarity
Structure length similarity
Structure thermodynamic stability similarity
 These measures are applied pairwise by each
structural component and summed into a
final bin score
Outline
 Problem Statement
 Background
 Evolutionary Computation




Population initialization
Variation
Fitness
Selection
 Results
 Conclusion
Evolutionary Computation
Selection
 Selection: For every bin in population,


A set of R rival bins is randomly selected
Calculate score = # rivals with lower fitness
 Lower bins are removed
 Iterations continue until number of
generations (G) or CPU time is satisfied, or
until expected change of fitness/gen  0
Outline
 Problem Statement
 Background
 Evolutionary Computation




Population initialization
Variation
Fitness
Selection
 Results
 Conclusion
Results
 Experiment 1:
8


7.6x10 possible
bins
Exhaustive search:
125 days
 EA examined


4
~10 bins before
converging
< 3 minutes
Results
Results
 To test the utility of this method



Run on newly discovered genomes (S. Pyogenes)
Compare to database which has an alignment for
this RNA secondary structure for previously
discovered genomes (S. Mutans)
Found similar sequence and structure to close
organisms
Outline
 Problem Statement
 Background
 Evolutionary Computation




Population initialization
Variation
Fitness
Selection
 Results
 Conclusion
Conclusion
 Evolutionary Algorithm can be applied to
find RNA secondary structures over a wide
range of organisms
 Converges quickly and reliably
 Algorithm comes up with a solution which
contains information about structural
elements for different organisms/genomes