- Cal State LA - Instructional Web Server

“Genetic algorithm-based optimization of hydrophobicity tables” by Moti Zviling, Hadas Leonov and Isaiah T. Arkin
presented by: Nam Tonthat
Background: Membrane Proteins
● constitute 20 to 35% of a genome
● part of the signal transduction pathway
● major target for pharmaceutical agents
● hard to crystallize due to their hydrophobic segments
● the importance of membrane proteins motivates efforts to predict their presence from sequence information
Background: Hydropathy Analysis
● Kyte & Doolittle used hydropathy analysis to predict the trans-membrane regions of bacteriorhodopsin (1982)
● developed the hydrophobicity scale
● scale was improved upon by using
– the water-vapor free energy transfer
– interior-exterior distribution of amino acids
– free energy of amino acids when transferred from water to oil
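As a rough illustration of hydropathy analysis, the sketch below averages Kyte-Doolittle values over a sliding window and flags a sequence when any window is strongly hydrophobic. The function names are hypothetical, and the window of 19 residues and threshold of 1.6 are assumed illustrative settings, not values taken from these slides.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, table=KYTE_DOOLITTLE, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    Raises KeyError for residues that are missing from the table.
    """
    values = [table[residue] for residue in sequence.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


def predicts_membrane_segment(sequence, table=KYTE_DOOLITTLE,
                              window=19, threshold=1.6):
    """Flag a sequence if any window average exceeds the threshold (assumed cutoff)."""
    return any(score > threshold
               for score in hydropathy_profile(sequence, table, window))
```

Because the table is passed in as a parameter, a different hydrophobicity table can be swapped in without changing the analysis itself.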
Background: Predicting with HMM & NN
● Hidden Markov models and neural networks are popular because of their probabilistic nature
● generally have a higher level of accuracy in comparison to hydropathy analysis
● HMM, NN, and GA are old concepts that have recently moved from theory into practical use in biological research
Background: Genetic Algorithm (GA)
● Origin: John Holland and his colleagues at the University of Michigan
● a search technique used to find approximate solutions
● uses techniques inspired by evolutionary biology
– inheritance, mutation, natural selection, and recombination
● General algorithm:
– Choose initial population
– Evaluate the fitness of the population
– Select the “best” individuals to reproduce
– Apply crossover and mutation operators
– stop if algorithm converges, else repeat
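The general algorithm above can be sketched on a toy problem (maximizing the number of 1s in a bit string). All names here are hypothetical, and several choices are assumptions made only for illustration: keeping the two parents in the next generation, and treating "no improvement for `patience` generations" as convergence.

```python
import random


def run_ga(fitness, new_individual, crossover, mutate,
           population_size=20, max_generations=100, patience=10, seed=0):
    rng = random.Random(seed)
    # Choose the initial population.
    population = [new_individual(rng) for _ in range(population_size)]
    history = []   # best fitness seen in each generation
    stale = 0      # generations without improvement
    for _ in range(max_generations):
        # Evaluate fitness and select the two best individuals.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:2]
        best_score = fitness(parents[0])
        if history and best_score <= history[-1]:
            stale += 1
        else:
            stale = 0
        history.append(best_score)
        # Assumed convergence test: stop after `patience` stale generations.
        if stale >= patience:
            break
        # Apply crossover and mutation; the parents survive unchanged.
        children = [mutate(crossover(parents[0], parents[1], rng), rng)
                    for _ in range(population_size - 2)]
        population = parents + children
    return parents[0], history


def new_bits(rng, length=16):
    return [rng.randint(0, 1) for _ in range(length)]


def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip_one_bit(bits, rng):
    child = list(bits)
    position = rng.randrange(len(child))
    child[position] = 1 - child[position]
    return child


best, history = run_ga(sum, new_bits, one_point_crossover, flip_one_bit)
```

Since the parents are carried over, the best score recorded in `history` never decreases from one generation to the next.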
Goal
● To show that by applying a genetic algorithm to the existing hydrophobicity tables, the authors can improve the success rate of hydropathy analysis in predicting alpha helical membrane proteins.
Methods: Constructing the Datasets
● consisted of alpha helical membrane and water soluble proteins
● selected proteins with unambiguous topology assignment, so that the training set will not be biased by an abundance of a certain topology
● ratio of 1:3
● Training Set => 90%
– learning set: 90%
– validation set: 10%
● Testing Set => 10%
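The nested 90/10 split can be sketched as follows, using the set names from this slide. The function names, the random shuffling, and the fixed seed are assumptions for illustration; the slides do not say how proteins were assigned to each set.

```python
import random


def split(items, first_fraction, rng):
    """Shuffle a copy of the items and cut it into two parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_datasets(proteins, seed=0):
    rng = random.Random(seed)
    # 90% training set, 10% testing set.
    training, testing = split(proteins, 0.9, rng)
    # The training set is split again: 90% learning, 10% validation.
    learning, validation = split(training, 0.9, rng)
    return learning, validation, testing


learning, validation, testing = build_datasets(range(1000))
```

With 1000 placeholder proteins this gives 810 for learning, 90 for validation, and 100 held back for testing.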
Methods: Matthews Correlation Coefficient
● used as a measure of predictive power
● ranges from -1 to 1 (-1 ≤ C ≤ 1)
● worst => -1, best => 1, random => 0
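For reference, the standard definition of the coefficient can be written as a short function over the counts of true/false positives and negatives. Returning 0 when the denominator is zero is a common convention assumed here; how the paper counts each category is not shown on the slides.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0 (no correlation).
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

A perfect predictor (no false positives or negatives) scores 1, a fully inverted one scores -1, and a predictor whose hits and misses balance out scores 0.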
Methods: Genetic Algorithm Scheme
● input: 2 hydrophobicity tables
– Kyte-Doolittle scale (Kyte and Doolittle, 1982)
– Goldman-Engelman-Steitz scale (Engelman et al., 1986)
● The 2 tables are then bred to create 20 random tables.
● Read in the dataset and create a Final Testing Set (10%) and a Learning Set (90%)
● Learning set is partitioned into a Training Set (90%) and a Validation Set (10%)
Methods: Genetic Algorithm Scheme
● Each table is then evaluated against the Training Set
● Best 2 tables are chosen
● Cross validation with the Validation Set
● Success: If the calculated C value is greater than the C value of the previous round
– current 2 tables are used
● Failure:
– previous 2 tables will be chosen
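The select-then-validate step might look like the sketch below. The function and parameter names are hypothetical, and taking the pair's C value as the better of its two validation scores is an assumption; the slides do not say how the pair is scored.

```python
def select_and_validate(tables, score_on_training, score_on_validation,
                        previous_pair, previous_c):
    """Pick the two best tables, then keep them only if validation improves."""
    # Rank every table by its score on the Training Set.
    ranked = sorted(tables, key=score_on_training, reverse=True)
    candidate_pair = ranked[:2]
    # Assumption: the pair's C value is the better of its two validation scores.
    candidate_c = max(score_on_validation(table) for table in candidate_pair)
    if candidate_c > previous_c:
        return candidate_pair, candidate_c   # success: use the current pair
    return previous_pair, previous_c         # failure: fall back to the previous pair
```

The two scoring callables would typically compute the correlation coefficient C of a table's predictions on the respective set.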
Methods: Genetic Algorithm Scheme
● test for convergence:
– no: the process will be repeated for one more generation
– yes: the algorithm will stop
● select the best 200 tables
● evaluate against the Final Test Set
Methods: Population Generation Process
● crossing over event:
– the number of crossing over events and their positions are picked randomly
● mutation event:
– the number of mutation events is picked randomly
– ±0.05
● who to replace:
– C < 0.5
● rate of replacement:
– 20%-80%, replace with randomized tables
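The crossing over and mutation events can be sketched on tables stored as dictionaries mapping amino acids to values. The names are hypothetical, and the details are assumptions: cut points fall between residues in a fixed alphabetical order, and each mutation shifts one value by +0.05 or -0.05. The replacement step is left out because the slides do not specify it closely enough.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cross_over(table_a, table_b, rng):
    """Build a child by switching parents at a random number of random cut points."""
    n_cuts = rng.randint(1, len(AMINO_ACIDS) - 1)
    cuts = set(rng.sample(range(1, len(AMINO_ACIDS)), n_cuts))
    child, source, other = {}, table_a, table_b
    for index, residue in enumerate(AMINO_ACIDS):
        if index in cuts:
            source, other = other, source  # switch the donating parent
        child[residue] = source[residue]
    return child


def mutate(table, rng, step=0.05):
    """Shift a random number of entries by +step or -step; the input is not modified."""
    n_mutations = rng.randint(1, len(AMINO_ACIDS))
    mutated = dict(table)
    for residue in rng.sample(list(AMINO_ACIDS), n_mutations):
        mutated[residue] += rng.choice((-step, step))
    return mutated
```

Both functions take a `random.Random` instance so that a run can be reproduced from a fixed seed.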
Results
● are statistically based methods better?
– depends on the person testing
– depends on the training and datasets
– HMM, NN, and GA are only as good as the person who wrote them
Sources
● “Genetic Algorithm”. Wikipedia, the free encyclopedia. July 1, 2005. <http://en.wikipedia.org/wiki/Genetic_algorithm>
● “Introduction to Genetic Algorithms”, Matthew Wall. July 1, 2005. <http://lancet.mit.edu/~mbwall/presentations/IntroToGAs/>
● Moti Zviling, Hadas Leonov and Isaiah T. Arkin. “Genetic algorithm-based optimization of hydrophobicity tables.” Bioinformatics Vol 21 no. 11 (2005): 2651-2656.