Protein Structure Prediction With Evolutionary Algorithms

Download Report

Transcript Protein Structure Prediction With Evolutionary Algorithms

Protein Structure Prediction
With Evolutionary Algorithms
Natalio Krasnogor,
William Hart,
Jim Smith,
David Pelta,
U of the West of England
Sandia National Laboratories
U of the West of England
Universidad de Granada
Presenter: Elena Zheleva
Introduction


Problem Description
Biology Background
–
–

Genetic Algorithm (GA) Design Factors
–
–
–


Protein Folding
HP Protein Folding Model
Encodings for Internal Coordinates
Potential Energy Formulation
Constraint Management
Methods and Results
Conclusion
Problem Description




Computational Biology open problem: protein
structure prediction
Genetic algorithms have been used in the
research literature
Authors analyze 3 algorithm parameters that
impact performance and behavior of GAs
Goal: make suggestions for future algorithm
design
Outline


Problem Description
Biology Background
–
–

GA Design Factors
–
–
–


Protein Folding
HP Protein Folding Model
Encodings for Internal Coordinates
Potential Energy Formulation
Constraint Management
Methods and Results
Conclusion
Protein Folding




Proteins: driving force behind all of the
biochemical reactions which make biology
work
Protein is an amino acid chain!
Amino acid chain -> Structure of a protein
Structure of a protein -> Function of a protein
Protein Folding



Protein Folding: connection
between the genome
(sequence) and what the
proteins actually do (their
function).
Currently, no reliable
computational solution for
protein folding (3D structure)
problem.
Chemistry, Physics, Biology, CS
Outline


Problem Description
Biology Background
–
–

GA Design Factors
–
–
–


Protein Folding
HP Protein Folding Model
Encodings for Internal Coordinates
Potential Energy Formulation
Constraint Management
Methods and Results
Conclusion
HP Protein Folding Model



Amino acid chains
(proteins) are represented
as connected beads on a
2D or 3D lattice
HP: hydrophobic –
hydrophilic property
Hydrophobic amino acids
can form a hydrophobic
core w/ energy potential
HP Protein Folding Model

Model adds energy value e to each pair of
hydrophobics that are adjacent on lattice AND
not consecutive in the sequence

Goal of GA: find low energy configurations!
Outline


Problem Description
Biology Background
–
–

GA Design Factors
–
–
–


Protein Folding
HP Protein Folding Model
Encodings for Internal Coordinates
Potential Energy Formulation
Constraint Management
Methods and Results
Conclusion
Encodings for Internal Coordinates
Proteins are represented using internal
coordinates (vs. Cartesian)
 Absolute vs. Relative encoding
 Absolute Encoding: specifies an absolute
direction
n-1
cubic lattice: {U,D,L,R,F,B}
 Relative Encoding: specifies direction relative
to the previous amino acid
cubic lattice: {U,D,L,R,F} n-1

Encodings for Internal Coordinates




Encoding impacts global search behavior of GA
Example: One-point Mutations
Relative Encoding:
FLLFRRLRLLR->
FLLFRFLRLLR
Absolute Encoding:
RULLURURULU->
RULLUULULDL
Outline


Problem Description
Biology Background
–
–

GA Design Factors
–
–
–


Protein Folding
HP Protein Folding Model
Encodings for Internal Coordinates
Potential Energy Formulation
Constraint Management
Methods and Results
Conclusion
Potential Energy Formulation

Problem: same energy but different potential
(Picture )

Augment energy function to allow a distancedependent hydrophobic-hydrophobic potential
(Formula)
Outline


Problem Description
Biology Background
–
–

GA Design Factors
–
–
–


Protein Folding
HP Protein Folding Model
Encodings for Internal Coordinates
Potential Energy Formulation
Constraint Management
Methods and Results
Conclusion
Constraint Management


Methods for penalizing infeasible conformations
Method 1: Consider only feasible conformations
–

Weakness: shortest path from one feasible
conformation to another may be very long
Method 2: Fixed Penalty Approach
–
Violations:


–
2 amino acids lying on the same lattice point
Lattice point at which there are 2 or more amino acids
Penalty per violation = 2*number of hydrophobics + 2
(any infeasible conformation has positive energy)
Outline


Problem Description
Biology Background
–
–

GA Design Factors
–
–
–


Protein Folding
HP Protein Folding Model
Encodings for Internal Coordinates
Potential Energy Formulation
Constraint Management
Methods and Results
Conclusion
Methods and Results




1-point and 2-point Mutation operators
1-point, 2-point and Uniform Crossover
operators
5 polymer sequences (< 50 amino acids)
Each run of GA: 200 generations
Methods and Results

Relative vs. Absolute Encoding
(Diagram )
Distribution of relative ranks on the 3 lattices
Methods and Results




Standard vs. Distant Energy
Does the modified energy potential improve the
search capabilities of the GA?
No significant difference on test sequences
A guess: there might be on longer sequences
Conclusion




GAs applied to Protein Structure Prediction
problem have 3 important factors to consider
Relative encoding is at least as good as
absolute encoding, in some cases much better
Modified energy potential does not improve
search capabilities of GA
The proposed constraint/penalty method
ensures feasibility of the optimal solution
PE (Post Exhibitum)
PE
PE
PE
PE