Improving the Research Bootstrap of Condor High
Throughput Computing for Non-Cluster Experts Based
on Knoppix Instant Computing Technology
RIKEN Genomic Science Center
Fumikazu KONISHI
Background
• Biologists need a high-performance computing system for their research process. However, they do not know how to build a cluster system by themselves.
Condor Week 2006
Meet Chie-san.
She is a
biologist with
a big problem.
I borrowed slides from Condor.
Chie-san’s Application …
Run a sequence sweep of InterProScan over mouse cDNAs, 103,000 clones in total.
– InterProScan takes 1 minute on average per sequence on a “typical” workstation (total = 103,000 × 1 min = 103,000 minutes ≈ 1,717 hours).
– InterProScan requires a 6 GB public database set for each.
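The run-time estimate above can be checked with a short sketch. It also shows how the wall-clock time would shrink if the sweep were split evenly across several workers. The even split and the pool sizes are assumptions for illustration, not figures from the talk.

```python
def sweep_hours(num_sequences: int, minutes_per_sequence: float,
                num_workers: int = 1) -> float:
    """Estimated wall-clock hours, assuming the work divides evenly
    across workers and ignoring scheduling and I/O overhead."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    total_minutes = num_sequences * minutes_per_sequence
    return total_minutes / 60 / num_workers


if __name__ == "__main__":
    single = sweep_hours(103_000, 1.0)
    print(f"1 workstation: {single:.0f} hours (about {single / 24:.0f} days)")
    # Hypothetical pool sizes, chosen only to illustrate the scaling.
    for workers in (6, 50):
        print(f"{workers} workers: {sweep_hours(103_000, 1.0, workers):.0f} hours")
```

Even a handful of borrowed lab machines brings the sweep from months down to days, which is the motivation for the instant cluster described in the following slides.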
http://www.ebi.ac.uk/interpro/README1.html
I have 103,000 sequences to search for gene functional domains, and I am not a cluster expert.
Policy Barrier
Technical Skill Barrier
Who will help me?
Getting Knoppix for InterProScan
High Throughput Computing Edition
• Available as a free download: Google search “fumikazu”.
• Download the image file.
• The image includes:
– InterProScan 4.1
– Condor 6.6.10
– PVFS2 1.2
– Ganglia 3.0.1
Chie-san can boot the lab’s machines from an Instant High Throughput Computing image that already contains her application.
She can borrow the lab’s computers over the weekend without installing any software.
Goal
• The goal of this research is to provide an instant high-performance bioinformatics research workbench for all biology researchers, and to allow easy setup in collaborative projects without side effects on the local system.
Instant Setup Technologies
• Install-Based Deploy System
– RPM-based automatic configuration technology (Red Hat)
– NPACI Rocks toolkits (UCSD)
• Image-Based Deploy System
– Live-CD technology (Knoppix)
Key Solutions
• Knoppix
– A GNU/Linux distribution that boots a machine without hard disk installation.
• Parallel File System
– PVFS is intended as a high-performance parallel file system for cluster computing. It provides high-bandwidth access and a large storage volume.
Parallel File System on RAM Disk
Knoppix for InterProScan4.1 High
Throughput Computing Edition
[Figure: cluster layout. Labels: Worker Node (six blocks of 500 MByte), Intra Network, PXE Boot, Head Node, Inter Network, Database download server, Private Network, Service Sharing]
Step 1: Boot the image
Boot the head node. The IP address leased by the DHCP server is displayed after the boot sequence.
Step 2: After a successful boot, two setup options, EASY and ADVANCED, are displayed on the screen.
Step 3: Boot worker nodes
All nodes must support PXE boot. The system must automatically assess whether sufficient resources are available to hold the InterProScan 4.1 database.
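The slides do not say how the resource check is done. As a minimal sketch, assume each worker contributes a fixed amount of RAM disk to the parallel file system (the layout diagram labels blocks of 500 MByte) and the database needs about 6 GB in total. The function names, the 1 GB = 1024 MB convention, and the decision to ignore file system overhead are assumptions made for this illustration.

```python
import math


def nodes_needed(database_mb: int, ramdisk_mb_per_node: int) -> int:
    """Smallest number of worker nodes whose pooled RAM disk holds the database."""
    if ramdisk_mb_per_node <= 0:
        raise ValueError("ramdisk_mb_per_node must be positive")
    return math.ceil(database_mb / ramdisk_mb_per_node)


def has_capacity(database_mb: int, ramdisk_mb_per_node: int, num_nodes: int) -> bool:
    """True if the pooled RAM disk of num_nodes workers can hold the database."""
    return num_nodes * ramdisk_mb_per_node >= database_mb


if __name__ == "__main__":
    database_mb = 6 * 1024   # assumed: 6 GB database set, 1 GB = 1024 MB
    per_node_mb = 500        # assumed: RAM disk share per worker node
    print("nodes needed:", nodes_needed(database_mb, per_node_mb))
    print("6 nodes enough?", has_capacity(database_mb, per_node_mb, 6))
```

Under these assumptions a six-node pool would not hold the full database set, so a real check would also have to account for how the database is arranged across nodes.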
Step 4: building cluster system
Download InterProScan database set
Testing
The system submits a single test job, which completes in a few minutes. The Condor job status is displayed in the browser, and Ganglia provides a large amount of information on all nodes. All configurations can be tested in this phase.
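The talk does not show the test job itself. As a rough sketch of what a single-job Condor submission could look like, the helper below writes a vanilla-universe submit description. The helper name, the file tag, and the use of /bin/hostname as a stand-in executable are all hypothetical; the actual test job shipped on the image may differ.

```python
def make_submit_description(executable: str, arguments: str = "",
                            tag: str = "testjob") -> str:
    """Build the text of a Condor submit description that queues one job."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        f"output = {tag}.out",   # stdout of the job
        f"error = {tag}.err",    # stderr of the job
        f"log = {tag}.log",      # Condor's event log for the job
        "queue 1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # /bin/hostname is a placeholder; it only shows which node ran the job.
    print(make_submit_description("/bin/hostname"))
```

Saved to a file, a description like this would be handed to `condor_submit`, and `condor_q` would show the job while it is queued or running.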
Results
Web site
http://big.gsc.riken.jp/index_html/Members/fumikazu/htc
Questions