Transcript Document

Protein Folding
Atlas F. Cook IV & Karen Tran
1
Overview
What is Protein Folding?
Motivation
Experimental Difficulties
Simulation Models:




Configuration Spaces
Triangular Lattice models



Probabilistic Roadmaps




Pull Moves
Map-Based Master Equation (MME)
Map-Based Monte Carlo (MMC)
Conclusion
2
Motivation

What is protein folding?


Folding/Morphing process
1D Amino Acid Chain  3D Folded protein
3
Motivation

Why study protein folding?



Proteins regulate almost all cellular functions
1D chain dictates 3D shape (NP-Hard)
3D Shape determines protein’s function
1D amino acid chain
3D folded protein
4
Motivation
Holy grail of Protein Folding

Build amino acid chain that:




folds into a desired shape
and has a nice function (e.g., kill cancer cells)
How would we do this?
Kill
Cancer
Cells
Required Amino Acid Chain
Required Shape
Desired Function
5
Motivation

Another reason to study protein folding:

Unfolded protein = vulnerable protein
6
Motivation

Misfolded proteins cause diseases:




Alzheimer’s
Mad Cow
Parkinson’s
Understand protein folding  cure diseases!
7
Terminology

Primary Structure



Secondary Structure



1D Amino Acid Chain (string)
MGDVEKGKKIFIMKCSQCH
Local patterns in a global folding
Helices and Strands
Tertiary Structure

Global 3D folded shape
8
Experimental Difficulties

Levinthal Paradox



Why is folding so fast?


Unfolded protein = vulnerable protein
Experimental observation


Exponentially many ways to fold, yet
folding occurs rapidly (milliseconds to seconds)
Too slow to capture all significant motions
Our Goal:

Simulate protein folding on computer! 
9
Simulation Models

HP Lattice Model: [Böckenhauer08]


HP = Hydrophobic-Polar
Models forces between Hydrophobic amino acids
10
Simulation Models

HP Lattice Model: [Böckenhauer08]


Amino acid  vertex in a grid
Protein  self-avoiding chain in a grid
Amino Acid
Chain
11
Simulation Models

HP Lattice Model: [Böckenhauer08]


Spring-like forces are modeled between
neighboring amino acids.
Sum of forces for a state  Energy.
+3
+2
1
+2
1
+1
+10
2
4
+0 3
+2
+1
5
2
+2
5
Energy = 16
4
+1
Energy = 8
3
12
Simulation Models

HP Lattice Model: [Böckenhauer08]

Global min energy
“native state” = final folded state


Native state is stable.
Global minimum is MUCH smaller than local minima.

+2
1
+1
5
2
+2
+2
Global min Energy = 8
4
+1
3
13
Simulation Models

HP Lattice Model: [Böckenhauer08]

A state is defined by the position of every amino
acid in the chain
A State
Another State
14
Simulation Models

HP Lattice Model: [Böckenhauer08]

Configuration space = set of all possible states


Exponential to protein length
Protein folding simulation:

“Move” from start state  goal state.
15
Simulation Models

HP Lattice Model: [Böckenhauer08]

Move Properties:


Complete – moves can reach all feasible states
Reversible – every move has an inverse
16
Simulation Models

HP Lattice Model: [Böckenhauer08]

Forward Pull Move

Pull vertex 5 to a new position
Before move
After move
17
Simulation Models

HP Lattice Model: [Böckenhauer08]

Tabu Search




Greedy, heuristic search
Simulates protein folding
Pull moves transform start state  local minimum
Records recent moves in a Tabu list


Fast backtracking to different paths
Summary of HP Lattice Model:


Input: Amino acid sequence
Output: Heuristically folded protein
18
Probabilistic
Roadmap Model
19
Simulation Models

Probabilistic Roadmap [Song04]

2D Graph (Configuration space):


Each point represents an entire state (all amino acids).
Obstacles are infeasible states
20
Simulation Models

Probabilistic Roadmap [Song04]

Goal:


Given start & goal states
Find “best path” from start  goal
21
Simulation Models

Probabilistic Roadmap [Song04]

3 Steps:
1. Node generation:

Generate points randomly (dense near the goal state)
2. Roadmap Construction

Connect nearest neighbors  graph
3. Query roadmap



Dijkstra’s algorithm  shortest path
Shortest path = set of states
Describes the dynamic folding process
22
Simulation Models

Probabilistic Roadmap [Song04]
1. Node generation:


Generate random points
“Obstacles” are infeasible (self-overlapping) states
23
Simulation Models

Probabilistic Roadmap [Song04]
2. Roadmap Construction

Connect nearest neighbors  graph
24
Simulation Models

Probabilistic Roadmap [Song04]
3. Query roadmap


Dijkstra’s algorithm  shortest path
Path = set of states that describes the folding process
25
Molecular
Dynamics Model
26
Simulation Models

Molecular Dynamics Models [Tapia07]



Model forces based on Newton’s laws of motion
Very accurate
Very slow!


Simulating one microsecond of folding for a 36 residue
protein = Months of supercomputer time!
Cannot handle full length proteins
27
Simulation Models

Map-based Master Equation (MME) [Tapia07]

Fast enough to study full length proteins
More accurate than simplistic lattice models
MME is an extension of a Probabilistic Roadmap

Probabilistic roadmap ≈ Viterbi algorithm




returns one optimal path
MME ≈ Baum-Welch algorithm



Maintains transition probabilities for every state
Learning is executed until probabilities stabilize.
Can return the probability of any state at time t.
28
Simulation Models

Map-based Monte-Carlo (MMC) [Tapia07]

MMC = Probabilistic Roadmap + Monte-Carlo

Monte-Carlo [Wiki08_MC]


random sampling + algorithms = result
Example: Battleship



Make random shots
Apply prior knowledge
 Battleship = 4 vertical/horizontal dots
Apply algorithms to quickly sink the ship
29
Simulation Models

Map-based Monte-Carlo (MMC) [Tapia07]

Fast & reasonably accurate

Models the protein as an articulated figure

Each joint = set of angles
Movement-based (kinetic) statistics

Results suggest that:



Local helix structures form first
Folding occurs around hydrophobic core
30
Conclusion
Protein Folding:


1D Amino acid chain folds into 3D structure

Misfolding  Alzheimer’s, Parkinson’s, Mad Cow diseases

Folding is too fast to observe experimentally
Four Simulation Models:

1.


Triangular Lattice model (2D Graph)
Vertex = one amino acid
“Moves” transition between states
31
Conclusion
Four Simulation Models (cont.)

2.
Probabilistic Roadmaps

Vertex represents state of entire protein
Random sampling + Dijkstra’s alg  Best folding route

≈ Viterbi (returns one path)

3.
Map-Based Master Equation (MME)


4.
Learn probabilities
≈ Baum-Welch (confidence level for each state)
Map-Based Monte Carlo (MMC)

Articulated figures with joints model proteins
32
References:

[Böckenhauer08]


Hans-Joachim Böckenhauer, Abu Zafer M. Dayem
Ullah, Leonidas Kapsokalivas, and Kathleen
Steinhöfel. A local move set for protein folding in
triangular lattice models. In Keith A. Crandall and
Jens Lagergren, editors, WABI, volume 5251 of
Lecture Notes in Computer Science, pages 369–381.
Springer, 2008.
[Dobson99]

C. Dobson and M. Karplus. The fundamentals of
protein folding: bringing together theory and
experiment. Current Opinion in Structural Biology,
9:928–101, 1999.
33
References:

[Song04]


G. Song and N. M. Amato. A motion planning
approach to folding: From paper craft to protein
folding. Proc. IEEE Transactions on Robotics and
Automatics, 20:60–71, 2004.
[Tapia07]

Lydia Tapia, Xinyu Tang, Shawna Thomas, and
Nancy M. Amato. Kinetics analysis methods for
approximate folding landscapes. Bioinformatics,
23(13):i539–i548, 2007.
34
References:

[˘Sali94]


[Wiki08]


A., E. Shakhnovich, and M. Karplus. How does a
protein fold? Nature, 369:248–251, 1994.
Wikipedia. Protein folding — Wikipedia, the free
encyclopedia, 2008.
http://en.wikipedia.org/wiki/Protein_folding.
[Wiki08_MC]

Wikipedia. Monte-Carlo method — Wikipedia, the
free encyclopedia, 2008.
http://en.wikipedia.org/wiki/Monte_Carlo_method
35
Thank you for
your attention.
Questions
36
Extra Slides
37
Simulation Models

Map-based Master Equation (MME) [Tapia07]


MME = Probabilistic roadmap + Master Equation
Master Equation – set of equations defining the
probability of a system to be in a discrete set of
states at a given time.
38