Transcript bioposter

Cluster Based
Protein Folding
Douglas Fuller and Brandon
McKethan
Overview: What is
Folding@Home?
 Screensaver-based distributed
computing program run by Stanford
 Utilizes unused processing power to fold
proteins through a finite number of
frames
 The same work units are run on
numerous computers to confirm accuracy
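The redundancy idea above can be sketched as a simple agreement check. The slides do not say how Folding@Home actually compares returned work units, so the function name `confirm_result`, the `min_agreement` threshold, and the string "fingerprints" below are all hypothetical.

```python
from collections import Counter

def confirm_result(replica_results, min_agreement=2):
    """Return the result reported by the most replicas, or None if too few agree.

    replica_results: hashable summaries (e.g. checksums) returned by different
    machines for the same work unit. Purely illustrative.
    """
    if not replica_results:
        return None
    value, count = Counter(replica_results).most_common(1)[0]
    return value if count >= min_agreement else None

# Three machines ran the same work unit; two of them agree.
reports = ["a3f1", "a3f1", "09bc"]
confirmed = confirm_result(reports)
```

A work unit whose replicas disagree (no value reaches `min_agreement`) would simply be handed out again.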
What Does F@H
Accomplish?
 Helps show how a linear amino acid
chain folds into a 3D protein structure
 Is used on proteins involved with many
diseases in order to elucidate how
misfolding occurs
 Will eventually lead to mutation-to-phenotype simulations
Computational Aspects
 Monte Carlo simulation using lowest
energy state calculations
 Non-parallel, unimolecular program
 Heuristic approaches
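As a rough illustration of a Monte Carlo search for a low-energy state, here is a minimal Metropolis-style sketch. It is not the Folding@Home code: the one-dimensional `toy_energy` function, the step size, and the temperature are assumptions chosen only to keep the example small.

```python
import math
import random

def toy_energy(x):
    # Hypothetical 1-D energy landscape with two wells; it stands in for a
    # real molecular energy function, which the slides do not specify.
    return (x * x - 1.0) ** 2 + 0.3 * x

def metropolis(energy, x0, steps=20000, step_size=0.5, temperature=0.2, seed=0):
    """Random-walk search that tracks the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = metropolis(toy_energy, x0=2.0)
```

Occasionally accepting uphill moves is what lets the walk leave a shallow well instead of getting stuck in the first minimum it finds.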
Implications of
Heuristics/Unimolecular
 The environment of the cell and
molecular interactions
 Solvents and extramolecular interactions
cannot be ignored in the process of
folding
 Many diseases arise from misfolding that
is not influenced by the internal energy
state
Diseases of Protein
Misfolding May Require
Multimolecular Interactions
 Cancers – B-RAF, Hsp-90 and 17-AAG
 Prions – Infectious Proteins
 BSE (Mad Cow Disease)
 Kuru
 Sheep Scrapie
Computational Weight of
Multimolecular Interactions
 The number of energy states and
inter/intra-molecular interactions is much
higher than in the unimolecular case
 Pushes the computational return time
beyond acceptable limits for the F@H
project
 Desktop computing and the Lowest
Common Denominator
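To get a feel for why multimolecular systems weigh so much more, one can count atom pairs. This is a naive all-pairs count under assumed sizes (1000 atoms per molecule, 3 molecules); the slides give no figures, and real simulation codes usually reduce the count with distance cutoffs.

```python
def pair_count(n_atoms):
    """Number of distinct atom pairs among n_atoms atoms: n(n-1)/2."""
    return n_atoms * (n_atoms - 1) // 2

def interaction_pairs(atoms_per_molecule, n_molecules):
    """Split all atom pairs into intramolecular and intermolecular pairs."""
    total = pair_count(atoms_per_molecule * n_molecules)
    intra = n_molecules * pair_count(atoms_per_molecule)
    return intra, total - intra

# Hypothetical sizes, for illustration only.
intra, inter = interaction_pairs(atoms_per_molecule=1000, n_molecules=3)
```

With these assumed sizes the intermolecular pairs (3,000,000) already outnumber the intramolecular ones (1,498,500), and the gap widens as more molecules are added.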
Cluster Based Folding and
Future Aspects of Folding
 Use cluster computing as the testing
ground for truly parallel simulations
 Individual proteins are discrete units
 Allows the program to be refined while
highly parallel desktop computing comes
to fruition
 5-10 year timeframe
Post-Multithreading
Possibilities
 Next truly discrete unit is the atom itself
 Atom-per-processor modeling vs. Monte
Carlo
 Requires an extremely high number of
processors: hundreds of thousands
 Once again clusters provide testing
ground
Parallelizing a
computation
 Considered “re-bugging” your code
 Distribute work to multiple processors
 Requires communication to deal with
dependencies
 Requires computation to distribute work
and recombine results
 Now what?
Domain Decomposition
 Decide how to divide
work
 Spatially
 Temporally
 Other?
 Introduces overhead
 Can pessimize
instead of optimize
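A spatial decomposition can be as simple as cutting an index range into contiguous blocks, one per worker. The sketch below assumes a one-dimensional domain and equal cost per item; `split_range` is a hypothetical helper, not part of any particular library.

```python
def split_range(n_items, n_workers):
    """Split indices 0..n_items-1 into n_workers contiguous, near-equal blocks."""
    base, extra = divmod(n_items, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

blocks = split_range(10, 3)
# [list(b) for b in blocks] -> [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Items at the edge of one block typically depend on items at the edge of the next, which is where the communication overhead comes from.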
Cheat!
 “Embarrassingly parallel” code
 Splits naturally into small pieces
 Small pieces can ignore each other
 Small pieces can be computed by a single
node
 Folding@Home
 Problem: fold all proteins they care about
 Decomposition: individual proteins
 Dependencies: none!
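An embarrassingly parallel job with no dependencies maps directly onto a worker pool. In this sketch `fold_one` is a placeholder that returns a dummy score rather than doing any real folding, and the sequences are made up; only the structure (one independent task per protein, no communication between tasks) reflects the slide.

```python
from concurrent.futures import ThreadPoolExecutor

def fold_one(sequence):
    # Placeholder for an expensive, independent simulation of one protein.
    # The "score" is a dummy value computed from the characters.
    return sequence, sum(ord(c) for c in sequence) % 97

def fold_all(sequences, max_workers=4):
    """Run fold_one on every sequence; tasks never talk to each other."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fold_one, sequences))

# Hypothetical input sequences.
results = fold_all(["MKTAYIAK", "GSHMLEDP", "AAAA"])
```

Threads keep the sketch short; for CPU-bound work in Python, a process pool or separate machines would be the more realistic choice.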
Domain Decomposition:
Challenges
 Analyze dependencies
 Communication patterns
 Communication volume
 Data distribution
 Overlap computation/communication
 Consider system characteristics
 Communication latency/bandwidth
 Computational efficiency
 Computation/communication ratio
 Do this all ahead of time?
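The interplay of latency, bandwidth, and the computation/communication ratio can be estimated with a back-of-the-envelope model. This is a simplified sketch that assumes perfectly divisible computation and no overlap between computation and communication; all parameter values are invented for illustration.

```python
def parallel_time(t_serial, n_procs, n_messages, latency, volume, bandwidth):
    """Estimated run time: evenly divided compute plus communication cost."""
    compute = t_serial / n_procs
    communicate = n_messages * latency + volume / bandwidth
    return compute + communicate

def speedup(t_serial, n_procs, **comm):
    return t_serial / parallel_time(t_serial, n_procs, **comm)

# Assumed link: 1 ms latency, 1e8 bytes/s, 10 messages, 1e6 bytes in total.
link = dict(n_messages=10, latency=0.001, volume=1e6, bandwidth=1e8)

coarse = speedup(100.0, 8, **link)  # 100 s of work: communication is negligible
fine = speedup(0.01, 8, **link)     # 10 ms of work: communication dominates
```

Under these assumptions the coarse job gets close to the ideal 8x, while the fine-grained one ends up slower than the serial run, which is the "pessimize instead of optimize" case.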
Domain Decomposition:
Pitfalls
 Parallel overhead
 Computation waiting on communication
 Feed-forward dependencies
 Dynamic decomposition schemes
 Pick two: performance, portability,
scalability