Petascale Supercomputing in Structural Biology

High-Throughput Virtual Molecular Docking:
Hadoop Implementation of AutoDock4 on a Private Cloud
The Second International Emerging Computational Methods for the Life Sciences Workshop
ACM International Symposium on High Performance Distributed Computing
June 8, 2011, San Jose, CA
Sally R. Ellingson
Graduate Research Assistant
Center for Molecular Biophysics, UT/ORNL
Department of Genome Science and Technology, UT
Scalable Computing and Leading Edge Innovative Technologies (IGERT)
Dr. Jerome Baudry
PhD Advisor
Center for Molecular Biophysics, UT/ORNL
Department of BCMB, UT
Ultimate Goal:
Reduce the time and cost of
discovering novel drugs
1. Virtual Molecular Docking
a) Novel Drug Discovery
b) Virtual high-throughput screening (VHTS)
2. Cloud Computing
a) Advantages for VHTS
b) Kandinsky
c) Hadoop (MapReduce)
3. AutoDockCloud
a) Current Implementation
b) Future Implementations
Virtual Molecular Docking
Given a receptor (protein) and ligand (small molecule), predict
1. Bound conformations
• Search algorithm to explore conformational space
2. Binding affinity
• Force field to evaluate energetics
Virtual Docking Engine
http://autodock.scripps.edu/wiki/AutoDock4
Novel Drug Discovery
Human HDAC4
HA3 crystal structure
ZINC03962325
Virtual High-Throughput Screening (VHTS)
VHTS with AutoDock4
Potential advantages of Cloud
Computing for VHTS
• Affordable access to compute resources
(especially for small labs and classrooms).
• Easy-to-use web interface for non-computer
experts, with software maintained by experts.
• Scalable resources for size of screening.
Kandinsky
Private Cloud Platform at ORNL
Kandinsky, the Systems Biology Knowledgebase
Computer, sponsored by the Office of
Biological and Environmental Research in the
DOE Office of Science
68 nodes × 16 cores/node = 1,088 cores
20 Gbps InfiniBand interconnect
Designed to support Hadoop applications and
gain an understanding of the MapReduce
paradigm.
• 57 nodes for MapReduce tasks
• 1 TaskTracker per node
• 10 map and 6 reduce tasks per node (16 tasks per node)
• 570 map tasks and 342 reduce tasks can run simultaneously on Kandinsky
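The slot counts follow directly from the node configuration. A quick check of the arithmetic, using only the figures quoted on the slide (the variable names are illustrative):

```python
# Figures taken from the Kandinsky slide; names are illustrative only.
TOTAL_NODES = 68
CORES_PER_NODE = 16
MAPREDUCE_NODES = 57
MAP_SLOTS_PER_NODE = 10
REDUCE_SLOTS_PER_NODE = 6

total_cores = TOTAL_NODES * CORES_PER_NODE
map_slots = MAPREDUCE_NODES * MAP_SLOTS_PER_NODE
reduce_slots = MAPREDUCE_NODES * REDUCE_SLOTS_PER_NODE

# One task slot per core: 10 map + 6 reduce = 16 cores per node.
assert MAP_SLOTS_PER_NODE + REDUCE_SLOTS_PER_NODE == CORES_PER_NODE

print(total_cores, map_slots, reduce_slots)  # 1088 570 342
```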
Hadoop
• Scalable
• Economical
• Efficient
• Reliable
http://hadoop.apache.org/common/docs/current/api/overview-summary.html
MapReduce
programming paradigm used by Hadoop
people.apache.org
Current AutoDockCloud
Implementation
input=file names needed for each docking
map(input)
{
copy input to local working directory;
run AutoDock4 locally;
copy result file to HDFS;
}
*pre-docking set-up and post-docking analysis are currently
done manually
*no reduce function is currently used
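The map-only task above can be sketched in Python. This is a minimal illustration, not the code used in AutoDockCloud: the slides do not show how files are staged or how AutoDock4 is invoked, so the function names, the argument layout, and the choice of `hadoop fs -get` / `hadoop fs -put` for staging are assumptions. The `autodock4 -p <dpf> -l <dlg>` form is the standard AutoDock4 command line.

```python
import subprocess
from pathlib import Path


def docking_commands(hdfs_inputs, dpf_name, workdir, hdfs_out_dir):
    """Return the commands for one docking, each as a list of arguments.

    All names and paths here are hypothetical; only the three-step
    structure (copy in, dock, copy out) comes from the slide.
    """
    workdir = Path(workdir)
    dlg_path = workdir / (Path(dpf_name).stem + ".dlg")
    commands = []
    # Step 1: copy each input file from HDFS to the local working directory.
    for hdfs_path in hdfs_inputs:
        commands.append(["hadoop", "fs", "-get", hdfs_path, str(workdir)])
    # Step 2: run AutoDock4 locally on the docking parameter file.
    commands.append(["autodock4", "-p", dpf_name, "-l", dlg_path.name])
    # Step 3: copy the docking log (the result file) back to HDFS.
    commands.append(["hadoop", "fs", "-put", str(dlg_path), hdfs_out_dir])
    return commands


def run_docking(hdfs_inputs, dpf_name, workdir, hdfs_out_dir,
                runner=subprocess.run):
    """Map-only task: execute the commands in order inside workdir."""
    for cmd in docking_commands(hdfs_inputs, dpf_name, workdir, hdfs_out_dir):
        runner(cmd, cwd=str(workdir), check=True)
```

Passing `runner` as a parameter lets the command sequence be inspected without a Hadoop installation; on a cluster the default `subprocess.run` would execute each step.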
Current AutoDockCloud
Implementation
ER agonist screening from DUD (Directory of Useful Decoys) as benchmark
450-fold speed-up with 570 available map slots on
Kandinsky, private cloud at ORNL
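To put the speed-up in context, it can be compared with the number of map slots. The definition of parallel efficiency used below (speed-up divided by available slots) is an assumption for illustration; the slide reports only the two input figures.

```python
speedup = 450     # reported on the slide
map_slots = 570   # available map slots on Kandinsky

# Assumed definition: efficiency = speed-up / number of parallel slots.
efficiency = speedup / map_slots
print(f"{efficiency:.1%}")  # 78.9%
```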
Current AutoDockCloud
Implementation
[Figure: docking enrichment plot for ER agonist using AutoDockCloud and DUD. x-axis: percent of ranked database; y-axis: percent of known ligands found.]
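An enrichment plot of this kind can be computed from a ranked screening result in which each compound is flagged as a known ligand or a decoy. The sketch below shows one common way to build the curve; the function name and the toy ranking are invented for illustration and are not data from the talk.

```python
def enrichment_curve(ranked_is_active):
    """Return (percent of ranked database, percent of known ligands found).

    ranked_is_active: booleans ordered from best docking score to worst,
    True where the compound is a known ligand.
    """
    total = len(ranked_is_active)
    actives = sum(ranked_is_active)
    if total == 0 or actives == 0:
        raise ValueError("need a non-empty ranking with at least one known ligand")
    points = [(0.0, 0.0)]
    found = 0
    for rank, is_active in enumerate(ranked_is_active, start=1):
        found += bool(is_active)
        points.append((100.0 * rank / total, 100.0 * found / actives))
    return points


# Toy ranking of 10 compounds with 3 known ligands (invented values).
toy = [True, False, True, False, False, False, False, True, False, False]
curve = enrichment_curve(toy)
```

A curve that rises well above the diagonal means known ligands are concentrated near the top of the ranked database.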
Future AutoDockCloud
Implementation
input=ligand file from chemical compound database
map(input)
{
create pdbqt (AutoDock input file) from input;
run AutoDock4 locally;
find best scoring ligand structure;
save structure to HDFS;
return <score, ligand>;
}
reduce(<score, ligand>)
{
sort;
return ranked_database;
}
*pre-docking and post-docking will be automated and distributed
*lower total I/O requirements
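The planned map and reduce steps can be sketched in plain Python, without Hadoop. This is an assumption-laden illustration: `dock` is a hypothetical callable standing in for pdbqt preparation plus an AutoDock4 run, the toy scores are invented, and saving the best structure to HDFS is omitted.

```python
def map_ligand(ligand_name, dock):
    """Map step: dock one ligand and emit (best score, ligand)."""
    # `dock` is hypothetical; it returns the scores of all generated poses.
    pose_scores = dock(ligand_name)
    # AutoDock4 reports binding free energies in kcal/mol; lower is better.
    return (min(pose_scores), ligand_name)


def reduce_ranked(pairs):
    """Reduce step: sort (score, ligand) pairs with the best score first."""
    return sorted(pairs)


# Invented scores for three hypothetical ligands.
toy_scores = {"lig_a": [-6.1, -7.4], "lig_b": [-9.2, -8.0], "lig_c": [-5.5]}
pairs = [map_ligand(name, toy_scores.__getitem__) for name in toy_scores]
ranked_database = reduce_ranked(pairs)
print(ranked_database)
```

Because each map call returns only a score and a ligand name, the reduce step handles small records rather than full docking logs, which is consistent with the lower I/O noted above.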
Future Plans
• Incorporate additional docking engines
  – AutoDock Vina
    • Less I/O
    • More efficient and accurate algorithm
    • No charge information needed
• Deploy on commercial cloud (EC2)
• Develop web interface
1. Virtual Molecular Docking
a) Novel Drug Discovery
b) Virtual high-throughput screening (VHTS)
2. Cloud Computing
a) Advantages for VHTS
b) Kandinsky
c) Hadoop (MapReduce)
3. AutoDockCloud
a) Current Implementation
b) Future Implementations
Acknowledgements
• Dr. Jerome Baudry (advisor)
• Center for Molecular Biophysics, UT/ORNL
• Genome Science and Technology, UT
• Scalable Computing and Leading Edge
Innovative Technologies (IGERT)
• Avinash Kewalramani, ORNL
• ECMLS and HPDC organizers and participants
Questions/Comments