Transcript Document
Protein Folding
Atlas F. Cook IV & Karen Tran
1
Overview
What is Protein Folding?
Motivation
Experimental Difficulties
Simulation Models:
Configuration Spaces
Triangular Lattice models
Probabilistic Roadmaps
Pull Moves
Map-Based Master Equation (MME)
Map-Based Monte Carlo (MMC)
Conclusion
2
Motivation
What is protein folding?
Folding/Morphing process
1D Amino Acid Chain 3D Folded protein
3
Motivation
Why study protein folding?
Proteins regulate almost all cellular functions
1D chain dictates 3D shape (NP-Hard)
3D Shape determines protein’s function
1D amino acid chain
3D folded protein
4
Motivation
Holy grail of Protein Folding
Build amino acid chain that:
folds into a desired shape
and has a nice function (e.g., kill cancer cells)
How would we do this?
Kill
Cancer
Cells
Required Amino Acid Chain
Required Shape
Desired Function
5
Motivation
Another reason to study protein folding:
Unfolded protein = vulnerable protein
6
Motivation
Misfolded proteins cause diseases:
Alzheimer’s
Mad Cow
Parkinson’s
Understand protein folding cure diseases!
7
Terminology
Primary Structure
Secondary Structure
1D Amino Acid Chain (string)
MGDVEKGKKIFIMKCSQCH
Local patterns in a global folding
Helices and Strands
Tertiary Structure
Global 3D folded shape
8
Experimental Difficulties
Levinthal Paradox
Why is folding so fast?
Unfolded protein = vulnerable protein
Experimental observation
Exponentially many ways to fold, yet
folding occurs rapidly (milliseconds to seconds)
Too slow to capture all significant motions
Our Goal:
Simulate protein folding on computer!
9
Simulation Models
HP Lattice Model: [Böckenhauer08]
HP = Hydrophobic-Polar
Models forces between Hydrophobic amino acids
10
Simulation Models
HP Lattice Model: [Böckenhauer08]
Amino acid vertex in a grid
Protein self-avoiding chain in a grid
Amino Acid
Chain
11
Simulation Models
HP Lattice Model: [Böckenhauer08]
Spring-like forces are modeled between
neighboring amino acids.
Sum of forces for a state Energy.
+3
+2
1
+2
1
+1
+10
2
4
+0 3
+2
+1
5
2
+2
5
Energy = 16
4
+1
Energy = 8
3
12
Simulation Models
HP Lattice Model: [Böckenhauer08]
Global min energy
“native state” = final folded state
Native state is stable.
Global minimum is MUCH smaller than local minima.
+2
1
+1
5
2
+2
+2
Global min Energy = 8
4
+1
3
13
Simulation Models
HP Lattice Model: [Böckenhauer08]
A state is defined by the position of every amino
acid in the chain
A State
Another State
14
Simulation Models
HP Lattice Model: [Böckenhauer08]
Configuration space = set of all possible states
Exponential to protein length
Protein folding simulation:
“Move” from start state goal state.
15
Simulation Models
HP Lattice Model: [Böckenhauer08]
Move Properties:
Complete – moves can reach all feasible states
Reversible – every move has an inverse
16
Simulation Models
HP Lattice Model: [Böckenhauer08]
Forward Pull Move
Pull vertex 5 to a new position
Before move
After move
17
Simulation Models
HP Lattice Model: [Böckenhauer08]
Tabu Search
Greedy, heuristic search
Simulates protein folding
Pull moves transform start state local minimum
Records recent moves in a Tabu list
Fast backtracking to different paths
Summary of HP Lattice Model:
Input: Amino acid sequence
Output: Heuristically folded protein
18
Probabilistic
Roadmap Model
19
Simulation Models
Probabilistic Roadmap [Song04]
2D Graph (Configuration space):
Each point represents an entire state (all amino acids).
Obstacles are infeasible states
20
Simulation Models
Probabilistic Roadmap [Song04]
Goal:
Given start & goal states
Find “best path” from start goal
21
Simulation Models
Probabilistic Roadmap [Song04]
3 Steps:
1. Node generation:
Generate points randomly (dense near the goal state)
2. Roadmap Construction
Connect nearest neighbors graph
3. Query roadmap
Dijkstra’s algorithm shortest path
Shortest path = set of states
Describes the dynamic folding process
22
Simulation Models
Probabilistic Roadmap [Song04]
1. Node generation:
Generate random points
“Obstacles” are infeasible (self-overlapping) states
23
Simulation Models
Probabilistic Roadmap [Song04]
2. Roadmap Construction
Connect nearest neighbors graph
24
Simulation Models
Probabilistic Roadmap [Song04]
3. Query roadmap
Dijkstra’s algorithm shortest path
Path = set of states that describes the folding process
25
Molecular
Dynamics Model
26
Simulation Models
Molecular Dynamics Models [Tapia07]
Model forces based on Newton’s laws of motion
Very accurate
Very slow!
Simulating one microsecond of folding for a 36 residue
protein = Months of supercomputer time!
Cannot handle full length proteins
27
Simulation Models
Map-based Master Equation (MME) [Tapia07]
Fast enough to study full length proteins
More accurate than simplistic lattice models
MME is an extension of a Probabilistic Roadmap
Probabilistic roadmap ≈ Viterbi algorithm
returns one optimal path
MME ≈ Baum-Welch algorithm
Maintains transition probabilities for every state
Learning is executed until probabilities stabilize.
Can return the probability of any state at time t.
28
Simulation Models
Map-based Monte-Carlo (MMC) [Tapia07]
MMC = Probabilistic Roadmap + Monte-Carlo
Monte-Carlo [Wiki08_MC]
random sampling + algorithms = result
Example: Battleship
Make random shots
Apply prior knowledge
Battleship = 4 vertical/horizontal dots
Apply algorithms to quickly sink the ship
29
Simulation Models
Map-based Monte-Carlo (MMC) [Tapia07]
Fast & reasonably accurate
Models the protein as an articulated figure
Each joint = set of angles
Movement-based (kinetic) statistics
Results suggest that:
Local helix structures form first
Folding occurs around hydrophobic core
30
Conclusion
Protein Folding:
1D Amino acid chain folds into 3D structure
Misfolding Alzheimer’s, Parkinson’s, Mad Cow diseases
Folding is too fast to observe experimentally
Four Simulation Models:
1.
Triangular Lattice model (2D Graph)
Vertex = one amino acid
“Moves” transition between states
31
Conclusion
Four Simulation Models (cont.)
2.
Probabilistic Roadmaps
Vertex represents state of entire protein
Random sampling + Dijkstra’s alg Best folding route
≈ Viterbi (returns one path)
3.
Map-Based Master Equation (MME)
4.
Learn probabilities
≈ Baum-Welch (confidence level for each state)
Map-Based Monte Carlo (MMC)
Articulated figures with joints model proteins
32
References:
[Böckenhauer08]
Hans-Joachim Böckenhauer, Abu Zafer M. Dayem
Ullah, Leonidas Kapsokalivas, and Kathleen
Steinhöfel. A local move set for protein folding in
triangular lattice models. In Keith A. Crandall and
Jens Lagergren, editors, WABI, volume 5251 of
Lecture Notes in Computer Science, pages 369–381.
Springer, 2008.
[Dobson99]
C. Dobson and M. Karplus. The fundamentals of
protein folding: bringing together theory and
experiment. Current Opinion in Structural Biology,
9:928–101, 1999.
33
References:
[Song04]
G. Song and N. M. Amato. A motion planning
approach to folding: From paper craft to protein
folding. Proc. IEEE Transactions on Robotics and
Automatics, 20:60–71, 2004.
[Tapia07]
Lydia Tapia, Xinyu Tang, Shawna Thomas, and
Nancy M. Amato. Kinetics analysis methods for
approximate folding landscapes. Bioinformatics,
23(13):i539–i548, 2007.
34
References:
[˘Sali94]
[Wiki08]
A., E. Shakhnovich, and M. Karplus. How does a
protein fold? Nature, 369:248–251, 1994.
Wikipedia. Protein folding — Wikipedia, the free
encyclopedia, 2008.
http://en.wikipedia.org/wiki/Protein_folding.
[Wiki08_MC]
Wikipedia. Monte-Carlo method — Wikipedia, the
free encyclopedia, 2008.
http://en.wikipedia.org/wiki/Monte_Carlo_method
35
Thank you for
your attention.
Questions
36
Extra Slides
37
Simulation Models
Map-based Master Equation (MME) [Tapia07]
MME = Probabilistic roadmap + Master Equation
Master Equation – set of equations defining the
probability of a system to be in a discrete set of
states at a given time.
38