Cluster Based
Protein Folding
Douglas Fuller and Brandon
McKethan
Overview: What is
Folding@Home?
Screensaver-based distributed
computing program run by Stanford
Utilizes unused processing power to fold
proteins through a finite number of
frames
The same work units are run on
numerous computers to confirm accuracy
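A minimal sketch of how duplicated work units could be cross-checked, assuming each machine reports one comparable value per work unit. The function name `confirm_result` and the `quorum` parameter are hypothetical, for illustration only; the actual Folding@Home verification scheme is not described here.

```python
from collections import Counter

def confirm_result(results, quorum=2):
    """Return the most common result if at least `quorum` machines agree, else None.

    `results` holds the values reported by different machines for the
    same work unit (an assumed, simplified representation).
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None
```

With three reports such as `[-12.5, -12.5, -11.9]`, the two agreeing machines outvote the outlier; with no agreement at all the work unit would be left unconfirmed.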
What Does F@H
Accomplish?
Helps show how a linear amino acid
chain folds into a 3D protein structure
Is used on proteins involved with many
diseases in order to elucidate how
misfolding occurs
Will eventually lead to mutation-to-phenotype simulations
Computational Aspects
Monte Carlo simulation using lowest
energy state calculations
Non-parallel, unimolecular program
Heuristic approaches
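A Monte Carlo search for a low-energy state can be sketched roughly as below. This uses the standard Metropolis acceptance rule on a toy model; the `energy` function, the angle representation, and all parameter values are assumptions made for illustration and are not the Folding@Home energy model.

```python
import math
import random

def energy(angles):
    # Hypothetical toy energy: every angle "prefers" to sit at 0 radians.
    return sum(1.0 - math.cos(a) for a in angles)

def metropolis(n_angles=10, steps=5000, temperature=0.5, seed=0):
    """Return (final_energy, lowest_energy_seen) for one Monte Carlo run."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    current = energy(angles)
    best = current
    for _ in range(steps):
        i = rng.randrange(n_angles)
        old = angles[i]
        angles[i] = old + rng.gauss(0.0, 0.3)  # propose a small random change
        trial = energy(angles)
        delta = trial - current
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
            best = min(best, current)
        else:
            angles[i] = old  # reject: restore the previous state
    return current, best
```

Each step depends on the state left by the previous one, which is one reason a single run of this kind does not parallelize easily.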
Implications of
Heuristics/Unimolecular
The environment of the cell and
molecular interactions
Solvents and extramolecular interactions
cannot be ignored in the process of
folding
Many diseases arise from misfolding that
is not influenced by the internal energy
state
Diseases of Protein
Misfolding May Require
Multimolecular Interactions
Cancers – B-RAF, Hsp-90 and 17-AAG
Prions – Infectious Proteins
BSE (Mad Cow Disease)
Kuru
Sheep Scrapie
Computational Weight of
Multimolecular Interactions
The number of energy states and
inter/intra-molecular interactions is much
higher than in the unimolecular case
Pushes the computational return time
beyond acceptable limits for the F@H
project
Desktop computing and the Lowest
Common Denominator
Cluster Based Folding and
Future Aspects of Folding
Use cluster computing as the testing
ground for truly parallel simulations
Individual proteins are discrete units
Allows the program to be refined while
highly parallel desktop computing comes
to fruition
5-10 year timeframe
Post-Multithreading
Possibilities
Next truly discrete unit is the atom itself
Atom-per-processor modeling vs. Monte
Carlo
Requires an incredibly high number of
processors – hundreds of thousands
Once again clusters provide testing
ground
Parallelizing a
computation
Considered “re-bugging” your code
Distribute work to multiple processors
Requires communication to deal with
dependencies
Requires computation to distribute work
and recombine results
Now what?
Domain Decomposition
Decide how to divide
work
Spatially
Temporally
Other?
Introduces overhead
Can pessimize
instead of optimize
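As a rough illustration of spatial decomposition and the overhead it introduces, the sketch below splits a 1D array into contiguous blocks and applies a three-point smoothing step to each block. The stencil and the helper names (`smooth`, `decompose`, `smooth_decomposed`) are assumptions chosen for the example. Each block needs one extra "halo" cell per interior boundary, which stands in for data that would have to be communicated between nodes.

```python
def smooth(values):
    # Three-point average; the two endpoints are left unchanged.
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return out

def decompose(n, parts):
    # Contiguous [start, stop) ranges covering 0..n as evenly as possible.
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def smooth_decomposed(values, parts):
    """Smooth block by block; return (result, number_of_halo_cells_copied)."""
    n = len(values)
    result, halo_cells = [], 0
    for start, stop in decompose(n, parts):
        # Extend the block by one neighbor cell on each interior side.
        lo, hi = max(start - 1, 0), min(stop + 1, n)
        halo_cells += (start - lo) + (hi - stop)
        local = smooth(values[lo:hi])
        # Keep only the cells this block owns, discarding the halo.
        result.extend(local[start - lo : start - lo + (stop - start)])
    return result, halo_cells
```

The decomposed result matches the serial one, but the halo count grows with the number of blocks. With many small blocks the copied cells can outweigh the useful work, which is the "pessimize" case.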
Cheat!
“Embarrassingly parallel” code
Splits naturally into small pieces
Small pieces can ignore each other
Small pieces can be computed by a single
node
Folding@Home
Problem: fold all proteins they care about
Decomposition: individual proteins
Dependencies: none!
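The per-protein decomposition can be sketched as a plain task pool, since no task needs anything from another. The `fold` function below is a hypothetical stand-in that returns a made-up score, not a real folding computation. Threads are used only to keep the sketch portable; CPU-bound work would more likely use `ProcessPoolExecutor` or, as in Folding@Home, separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

def fold(sequence):
    # Hypothetical stand-in for a folding run: returns a made-up score.
    return sum(ord(residue) for residue in sequence) % 97

def fold_all(sequences, workers=4):
    # Each sequence is an independent task, so workers never communicate.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(fold, sequences))
```

Because there are no dependencies, the only parallel overhead is handing out sequences and collecting results.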
Domain Decomposition:
Challenges
Analyze dependencies
Communication patterns
Communication volume
Data distribution
Overlap computation/communication
Consider system characteristics
Communication latency/bandwidth
Computational efficiency
Computation/communication ratio
Do this all ahead of time?
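One way to reason about these trade-offs ahead of time is a back-of-the-envelope cost model. The sketch below assumes perfectly divisible work and a single message per processor per run, costed as latency plus size over bandwidth. This is a deliberate simplification; the function and parameter names are hypothetical.

```python
def parallel_time(work, procs, latency, bandwidth, bytes_per_proc):
    """Estimated run time: evenly divided compute plus one message per processor."""
    compute = work / procs
    # A single processor has nobody to talk to, so no communication cost.
    communicate = 0.0 if procs == 1 else latency + bytes_per_proc / bandwidth
    return compute + communicate

def speedup(work, procs, latency, bandwidth, bytes_per_proc):
    """Serial time divided by estimated parallel time."""
    return work / parallel_time(work, procs, latency, bandwidth, bytes_per_proc)
```

Under this model, `speedup(100.0, 4, 0.0, 1.0, 0.0)` is the ideal 4.0, while a small job with expensive messages, such as `speedup(1.0, 4, 2.0, 1.0, 0.0)`, drops below 1.0: the parallel version is slower than the serial one.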
Domain Decomposition:
Pitfalls
Parallel overhead
Computation waiting on communication
Feed-forward dependencies
Dynamic decomposition schemes
Pick two: performance, portability,
scalability