Purdue RP updates


Purdue RP Highlights
TeraGrid Round Table
May 20, 2010
Preston Smith
Manager - HPC Grid Systems
Rosen Center for Advanced Computing
Purdue University
Updates on Purdue Condor Pool
• Purdue Condor resource now in excess of 30,000 cores
• Recent active users:
– Fixation Tendencies of the H3N2 Influenza Virus
– N-body simulations: Planets to Cosmology
– De Novo RNA Structures with Experimental Validation
– Planet-planet scattering in planetesimal disks
– Robetta Gateway
New Developments in Condor Pool
• Virtual Machine “Universe”
– Running on student Windows labs today, with VMware
– Integrating now: KVM and libvirt on cluster (Steele) nodes (see the submit sketch below)
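As a rough illustration of what a VM-universe submission looks like, the Python sketch below writes a minimal submit description for a KVM guest and hands it to condor_submit. The image path, memory size, and file names are hypothetical, and the exact vm_* keyword details vary with Condor version and vm_type; treat this as a sketch, not Purdue's actual tooling.

    #!/usr/bin/env python
    # Illustrative sketch only: build a Condor VM-universe submit description
    # for a KVM guest and hand it to the condor_submit command-line tool.
    # Image path, memory size, and file names are hypothetical.
    import subprocess
    import textwrap

    submit_description = textwrap.dedent("""\
        universe      = vm
        vm_type       = kvm
        vm_memory     = 1024
        vm_networking = true
        # Guest disk image to boot, as file:device:permission (hypothetical path)
        vm_disk       = /images/steele-worker.qcow2:vda:w
        # In the VM universe the executable name is only a label for the job
        executable    = steele_vm_job
        log           = steele_vm_job.log
        queue
    """)

    with open("steele_vm_job.sub", "w") as f:
        f.write(submit_description)

    # Submit with the standard Condor CLI (assumes condor_submit is on PATH).
    subprocess.check_call(["condor_submit", "steele_vm_job.sub"])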
Condor VM Use Cases
• “VMGlide”
– Using Condor to submit, transfer, and boot Linux VMs as cluster nodes on Windows systems
• More usable for the end user!
– Tested and demonstrated at a scale of roughly 800 VMs over a weekend (a scaled-up submit sketch follows below)
• All running real user jobs inside the VM container
– Working with the Condor team to minimize the impact of transferring hundreds of VM images over the network
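To give a flavor of how hundreds of glide-in VMs might be queued at once, here is a hedged sketch: a VMware-type VM-universe description queued 800 times and constrained to Windows machines. The image directory, requirements expression, and count are illustrative; the vmware_* keywords follow the Condor VM-universe documentation, and the real VMGlide machinery is not shown in these slides.

    #!/usr/bin/env python
    # Illustrative sketch only: queue many copies of a VMware-type VM-universe
    # job, the way a VMGlide-style run might push Linux worker VMs onto
    # Windows lab machines.  Paths, the requirements expression, and the
    # count are hypothetical.
    import subprocess
    import textwrap

    NUM_VMS = 800  # roughly the scale demonstrated over a weekend

    submit_description = textwrap.dedent("""\
        universe                     = vm
        vm_type                      = vmware
        vm_memory                    = 512
        vm_networking                = true
        # Directory holding the .vmx/.vmdk files for the Linux worker image
        vmware_dir                   = /images/vmglide-worker
        vmware_should_transfer_files = true
        # Only match Windows lab machines (illustrative constraint)
        requirements                 = (OpSys == "WINDOWS")
        executable                   = vmglide_worker
        log                          = vmglide_$(Cluster).$(Process).log
        queue {count}
    """).format(count=NUM_VMS)

    with open("vmglide.sub", "w") as f:
        f.write(submit_description)

    subprocess.check_call(["condor_submit", "vmglide.sub"])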
Condor VM Use Cases (continued)
• User-submitted virtual machines
– For example: a user has code written in Visual Basic that runs for weeks at a time on his PC
• Submitting to Windows Condor is an option, but the long runtime, coupled with an inability to checkpoint, limits its utility
• Solution:
– Submit the entire Windows PC as a VM universe job, which can be suspended, checkpointed, and moved to new machines until execution completes (see the sketch below)
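A minimal sketch of that idea, assuming the user's Windows environment has been captured as a VMware image: the job goes to the VM universe with checkpointing enabled so Condor can suspend it and resume it elsewhere. The image directory and sizes are hypothetical; vm_checkpoint and the vmware_* keywords come from the Condor VM-universe documentation.

    #!/usr/bin/env python
    # Illustrative sketch only: run a user's whole Windows environment
    # (captured as a VMware image) as a VM-universe job with checkpointing
    # enabled, so the long-running Visual Basic code inside can be suspended
    # and resumed on another machine.
    import subprocess
    import textwrap

    submit_description = textwrap.dedent("""\
        universe                     = vm
        vm_type                      = vmware
        vm_memory                    = 1024
        # Allow Condor to checkpoint/suspend the whole guest on eviction
        vm_checkpoint                = true
        vmware_dir                   = /images/users-windows-pc
        vmware_should_transfer_files = true
        vmware_snapshot_disk         = true
        executable                   = vb_long_run
        log                          = vb_long_run.log
        queue
    """)

    with open("vb_long_run.sub", "w") as f:
        f.write(submit_description)

    subprocess.check_call(["condor_submit", "vb_long_run.sub"])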
Condor and Power
• In the economic climate of 2010, Purdue, like many institutions, is looking to reduce power costs
• The campus Condor grid will help!
– By installing Condor on machines around campus, we will
• Get useful computation out of the machines that are powered on
• And if there’s no work to be done?
– Condor can hibernate the machines and wake them when there is work waiting (a configuration sketch follows below)
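As a hedged illustration of the hibernate side, the sketch below appends a power-management snippet to a lab machine's local Condor configuration. HIBERNATE and HIBERNATE_CHECK_INTERVAL are documented Condor knobs; the idle policy and the config-file path are hypothetical, and waking sleeping machines is handled separately (e.g., by the condor_rooster daemon on a central node).

    #!/usr/bin/env python
    # Illustrative sketch only: append a power-management snippet to a
    # machine's local Condor configuration.  The idle policy and path are
    # hypothetical site choices.
    import textwrap

    LOCAL_CONFIG = "/etc/condor/config.d/50-power.conf"  # hypothetical path

    snippet = textwrap.dedent("""\
        # Suspend to RAM once the slot is unclaimed and the keyboard has been
        # idle for four hours (14400 seconds); otherwise stay awake ("NONE").
        ShouldHibernate = ( (KeyboardIdle > 14400) && (State == "Unclaimed") )
        HIBERNATE = ifThenElse( $(ShouldHibernate), "RAM", "NONE" )
        # How often, in seconds, the startd re-evaluates the HIBERNATE expression
        HIBERNATE_CHECK_INTERVAL = 300
    """)

    with open(LOCAL_CONFIG, "a") as f:
        f.write(snippet)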
Cloud Computing: Wispy
• Purdue staff are operating an experimental cloud resource
– Built with Nimbus from the University of Chicago (a cloud-client sketch follows the specs below)
– Current Specs
• 32 nodes (128 cores):
– 16 GB RAM
– 4 cores per node
– Public IP space for VM guests
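For flavor, here is a hedged sketch of how a user might start a guest VM on a Nimbus cloud such as Wispy with the Nimbus cloud-client. The client path, image name, and lease length are made up; --run, --name, --hours, and --status follow the Nimbus cloud-client documentation, and credential/cloud.properties setup is assumed to be done already.

    #!/usr/bin/env python
    # Illustrative sketch only: start and check a guest VM on a Nimbus cloud
    # via the cloud-client.  Path, image name, and lease length are hypothetical.
    import subprocess

    CLOUD_CLIENT = "/opt/nimbus-cloud-client/bin/cloud-client.sh"  # hypothetical

    # Ask for one instance of a (hypothetical) Debian worker image for 6 hours.
    subprocess.check_call([CLOUD_CLIENT, "--run",
                           "--name", "debian-worker.gz",
                           "--hours", "6"])

    # List the workspaces we currently have running on the cloud.
    subprocess.check_call([CLOUD_CLIENT, "--status"])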
The Workspace Service
Slide borrowed from Kate Keahey:
http://www.cs.wisc.edu/condor/CondorWeek2010/condor-presentations/keahey-nimbus.pdf
Wispy – Use Cases
• Used in virtual clusters
– Publications using Purdue’s Wispy are cited below
• NEES project exploring the use of Wispy to provision on-demand clusters for quick turnaround of wide parallel jobs
• Working with faculty at Marquette University to use Wispy in a Fall 2010 course to teach cloud computing concepts
• With the OSG team, using Wispy (and Steele) to run VMs for the STAR project
• A. Matsunaga, M. Tsugawa, and J. Fortes, “CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications,” eScience 2008.
• K. Keahey, A. Matsunaga, M. Tsugawa, and J. Fortes, “Sky Computing,” to appear in IEEE Internet Computing, September 2009.
HTPC
• High-Throughput Parallel Computing
– With OSG and Wisconsin, using Steele to submit ensembles of single-node parallel jobs
• Package jobs with a parallel library (MPI, OpenMP, etc.)
• Submit to many OSG sites as well as TeraGrid (a submit sketch follows below)
– Who’s using it?
• Chemistry: over 300,000 hours used in January
– HTPC enabled 9 papers to be written in 10 months!
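A hedged sketch of an HTPC-style submission: one vanilla-universe job that claims a whole 8-core node and runs a parallel code across its cores. The wrapper script and input names are hypothetical, and the whole-machine ClassAd recipe (+RequiresWholeMachine / CAN_RUN_WHOLE_MACHINE) is just one common site configuration, not a universal convention; check the target site's HTPC documentation.

    #!/usr/bin/env python
    # Illustrative sketch only: submit one whole-node (HTPC) job.  The
    # executable, inputs, and whole-machine attributes are site-dependent
    # examples, not a universal recipe.
    import subprocess
    import textwrap

    submit_description = textwrap.dedent("""\
        universe                = vanilla
        # Hypothetical wrapper that launches the parallel code on all 8 cores
        # of the node it lands on (e.g. mpiexec -np 8 or OMP_NUM_THREADS=8)
        executable              = run_parallel_node.sh
        arguments               = input.conf
        should_transfer_files   = YES
        when_to_transfer_output = ON_EXIT
        transfer_input_files    = input.conf
        +RequiresWholeMachine   = True
        requirements            = CAN_RUN_WHOLE_MACHINE
        log                     = htpc_$(Cluster).log
        queue
    """)

    with open("htpc_job.sub", "w") as f:
        f.write(submit_description)

    subprocess.check_call(["condor_submit", "htpc_job.sub"])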
Storage
• DC-WAN (Indiana University’s Data Capacitor WAN Lustre filesystem) mounted and used at Purdue
– Working on Lustre LNET routers to reach compute nodes
• Distributed Replication Service (DRS)
– Sharing spinning disk with DRS today
– Investigating integration with the Hadoop Distributed File System (HDFS)
• NEES project investigating the use of DRS to archive data