OSCAR - Computer Science
Open Source Cluster Application Resources
Overview
What is O.S.C.A.R.?
History
Installation
Operation
Spin-offs
Conclusions
History
CCDK (Community Cluster Development Kit)
OCG (Open Cluster Group)
OSCAR (Open Source Cluster Application Resources)
IBM, Dell, SGI and Intel working closely together
ORNL – Oak Ridge National Laboratory
First Meeting
Tim Mattson and Stephen Scott
The group agreed on the following:
That the adoption of clusters for mainstream, high-performance computing is
inhibited by a lack of well-accepted software stacks that are robust and easy to
use by the general user.
That the group embraces the open-source model of software distribution.
Anything contributed to the group must be freely distributable, preferably as
source code under the Berkeley open-source license.
That the group can accomplish its goals by propagating best-known practices
built up through many years of hard work by cluster computing pioneers.
Initial Thoughts
Differing architectures (small, medium, large)
Two paths of progress, R&D and ease of use
Primarily for non-computer-savvy users.
Scientists
Academics
Homogeneous system
Timeline
Initial meeting in 2000
Beta development started the same year
First distribution, OSCAR 1.0, in 2001 at the LinuxWorld Expo in New York City
Today up to OSCAR 5.1
Heterogeneous system
Far more robust
More user friendly
Supported Distributions – 5.0
Distribution and Release     Architecture  Status
Red Hat Enterprise Linux 4   x86           Fully supported
Red Hat Enterprise Linux 4   x86_64        Fully supported
Red Hat Enterprise Linux 4   ia64          Fully supported
Fedora Core 4                x86           Fully supported
Fedora Core 4                x86_64        Fully supported
Fedora Core 5                x86           Fully supported
Fedora Core 5                x86_64        Fully supported
Mandriva Linux 2006          x86           Fully supported
SUSE Linux 10.0              x86           Fully supported
Installation
Detailed Installation notes
Detailed User guide
Basic idea:
Configure head node (server)
Configure image for client nodes
Configure network
Distribute node images
Manage your own cluster!!
Head Node
Install by running the ./install_cluster eth1 script (sketch below)
The GUI will launch automatically
Choose the desired step in the GUI; make sure each step is complete before proceeding to the next one
All configuration can be done from this system from now on
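A minimal sketch of that bootstrap, assuming an OSCAR 5.x tarball and eth1 as the cluster-facing interface (the version number and interface name are examples):

    # Unpack the OSCAR distribution on the head node (version is an example)
    tar xzf oscar-5.0.tar.gz
    cd oscar-5.0
    # Launch the installer; the argument is the NIC facing the cluster network
    ./install_cluster eth1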
Download
Subversion is used
Default is the OSCAR SVN repository
A custom SVN repository can be set up
Allows for up-to-date installations
Allows for controlled rollouts of multiple clusters
OPD (the OSCAR Package Downloader) also has powerful command-line functionality (LWP for proxy servers); see the sketch below
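Because OPD fetches packages through Perl's LWP, it honors the standard proxy environment variables; a sketch, with the proxy host as a placeholder:

    # Point LWP at the site proxy before running the OSCAR Package Downloader
    export http_proxy=http://proxy.example.com:3128   # placeholder proxy
    export ftp_proxy=$http_proxy
    ./opd    # interactive package selection from the configured repositories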
Select & Configure OSCAR packages
Customize the server to your liking/needs
Some packages can be customized
This step is crucial: the choice of packages can affect performance as well as compatibility
Installation of Server Node
Simply installs the packages which were selected
Automatically configures the server node
The head (server) node is now ready to manage, administer, and schedule jobs for its client nodes
Build Client Image
Choose a name
Specify packages within the package file
Specify the distribution
Be wary of the automatic reboot if network boot is manually selected as the default
Building the Client Image …
Define Clients
This step creates the network structure of the nodes
It is advisable to assign IPs based on physical links
The GUI has shortcomings regarding multiple IP ranges
An incorrect setup can lead to an error during node installation
Setup Networking
SIS – System Installation Suite
SystemImager
The network is scanned for MAC addresses
Each MAC address must be linked to a node
A network boot method must be selected (rsync, multicast, or BitTorrent)
Make sure clients support PXE boot, or create boot CDs (see the sketch below)
Your own kernel can be used if the one supplied with SIS does not work
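For clients without PXE support, SystemImager can build a bootable autoinstall CD; a sketch assuming SystemImager 3.x's si_mkautoinstallcd (older releases name it mkautoinstallcd; verify the flag against your version):

    # Build an ISO that boots the client straight into the imaging process
    # (--out-file flag per SystemImager 3.x docs; verify with your release)
    si_mkautoinstallcd --out-file /tmp/oscar-autoinstall.iso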
Client Installation and Test
After the network is properly configured, installation can begin
All nodes are installed and rebooted
Once the system imaging is complete, a test can be run to ensure the cluster is working properly
At this point, the cluster is ready to begin parallel job scheduling
Operation
Admin packages are:
Torque Resource Manager
Maui Scheduler
C3
pfilter
SystemImager Suite
Switcher Environment Manager
OPIUM
Ganglia
Operation
Library packages:
LAM/MPI
OpenMPI
MPICH
PVM
Torque Resource Manager
Server runs on the head node
“mom” daemon on the clients
Handles job submission and execution
Keeps track of cluster resources
Has its own scheduler but uses Maui by default
Commands are not intuitive; the documentation must be read (a short session sketch follows)
Derived from OpenPBS
http://svn.oscar.openclustergroup.org/wiki/oscar:5.1:administration_guide:ch4.1.1_torque_overview
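A minimal Torque session as a sketch; the script name, resource requests, and job id are illustrative:

    # myjob.sh - a trivial batch script (contents illustrative)
    #PBS -N hello
    #PBS -l nodes=2:ppn=2,walltime=00:10:00
    cd $PBS_O_WORKDIR
    echo "running on:"; cat $PBS_NODEFILE

    qsub myjob.sh    # submit the job to the Torque server
    qstat -a         # show the queue as Torque sees it
    pbsnodes -a      # list node states across the cluster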
Maui Scheduler
Handles job scheduling
Sophisticated algorithms
Customizable
Much literature on its algorithms
Has a commercial descendant, Moab
Accepted as the unofficial HPC standard for scheduling
http://www.clusterresources.com/pages/resources/documentation.php
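A few standard Maui client commands for day-to-day inspection (the job id is an example):

    showq         # running, idle, and blocked jobs as Maui sees them
    checkjob 42   # detailed scheduler state for one job (id is an example)
    showbf        # resources currently available for backfill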
C3 - Cluster Command Control
Developed by ORNL
Collection of tools for cluster administration
Commands (usage sketch below):
cget, cpush, crm, cpushimage
cexec, cexecs, ckill, cshutdown
cnum, cname, clist
Cluster Configuration Files
http://svn.oscar.openclustergroup.org/wiki/oscar:5.1:administration_guide:ch4.3.1_c3_overview
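A short C3 sketch, assuming the default cluster definition in /etc/c3.conf (the node range and process name are examples):

    cexec uptime        # run a command on every node in the cluster
    cexec :0-3 uptime   # restrict to nodes 0-3 of the default cluster
    cpush /etc/hosts    # push a file from the head node to all nodes
    ckill rogue_proc    # kill the named process on all nodes (name is an example)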
pfilter
Cluster traffic filter
By default, client nodes can only send outgoing communications outside the scope of the cluster
If it is desirable to open up the client nodes, the pfilter config file must be modified (an illustrative fragment follows)
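To illustrate the idea only, a hypothetical fragment of the pfilter configuration; the directive syntax here is an unverified assumption, so consult the pfilter documentation shipped with OSCAR for the real format:

    # pfilter configuration fragment (syntax is an unverified assumption)
    # intent: allow inbound ssh to the client nodes
    open tcp ssh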
SystemImager Suite
Tool for network-based Linux installations
Image-based; you can even chroot into an image (see the sketch below)
Also has a database which contains cluster configuration information
Tied in with C3
Can handle multiple images per cluster
Completely automated once the image is created
http://wiki.systemimager.org/index.php/Main_Page
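Because images are stored as full directory trees on the head node, they can be inspected or patched in place; a sketch assuming the default image path and an image named oscarimage:

    # Enter the client image as if it were a running system
    chroot /var/lib/systemimager/images/oscarimage /bin/bash
    # edit or install files inside, then exit; clients receive the
    # changes on their next image sync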
Switcher Environment Manager
Handles “dot” files
Does not limit advanced users
Designed to help non-savvy users
Has guards in place that prevent system destruction
Selects which MPI to use, on a per-user basis (see the sketch below)
Operates on two levels: user and system
The Modules package is included for advanced users (and is used by switcher)
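Switching a user's default MPI is a one-liner; the lam-7.1.2 tag below is an example, so list the available tags first:

    switcher mpi --list        # show the MPI implementations switcher knows about
    switcher mpi = lam-7.1.2   # set this user's default MPI (tag is an example)
    switcher mpi --show        # confirm the current setting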
OPIUM
Login is handled by the head node
Once a connection is established, client nodes do not require authentication
Synchronization is run by root, at intervals
It stores hash values of the password in the .ssh folder, along with a “salt”
Password changes must be made at the head node, as all changes propagate from there
Ganglia
Distributed Monitoring System
Low overhead per node
XML for data representation
Robust
Used in most cluster and grid solutions
http://ganglia.info/papers/science.pdf
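Each gmond daemon publishes cluster state as XML on TCP port 8649 by default, so it can be inspected with plain netcat (the head-node hostname is a placeholder):

    # Dump Ganglia's cluster-state XML straight from the monitoring daemon
    nc headnode 8649 | head -n 40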
LAM/MPI
LAM - Local Area Multicomputer
LAM initializes the runtime environment on a selected number of nodes
Implements MPI-1 and some of MPI-2
MPICH2 can be used if installed
A two-tiered debugging system exists: snapshots and communication logs
Daemon-based (see the sketch below)
http://www.lam-mpi.org/
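The canonical LAM/MPI round trip, assuming a hostfile listing the cluster nodes and an already compiled MPI binary (file and program names are placeholders):

    lamboot hostfile        # start the LAM daemons on the listed nodes
    mpirun -np 4 ./my_app   # launch 4 processes across the booted nodes
    lamhalt                 # shut the LAM runtime down again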
Open MPI
Replacement for LAM/MPI
Same team working on it
LAM/MPI is relegated to upkeep only; all new development goes into Open MPI
Much more robust (operating systems, schedulers)
Full MPI-2 compliance
Much higher performance
http://www.open-mpi.org/
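Open MPI needs no separate boot step; standalone it takes a hostfile, and under Torque it discovers the job's nodes automatically (file and program names are placeholders):

    mpirun --hostfile hosts -np 8 ./my_app   # launch 8 ranks across the listed hosts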
PVM – Parallel Virtual Machine
Similar in operation to LAM/MPI (daemon-based)
Can be run outside the scope of Torque and Maui
Supports Windows nodes as well
Much better portability
Not as robust or powerful as Open MPI
http://www.csm.ornl.gov/pvm/
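PVM is driven from its own console; a minimal session sketch, with hostnames and the task binary as placeholders:

    pvm                      # start the console (boots the local PVM daemon)
    pvm> add node1 node2     # enroll two more hosts in the virtual machine
    pvm> conf                # show the current virtual machine configuration
    pvm> spawn -4 my_task    # run 4 copies of my_task across the VM
    pvm> halt                # shut the whole virtual machine down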
Spin-offs
HA-OSCAR - http://xcr.cenit.latech.edu/ha-oscar/
VMware with OSCAR - http://www.vmware.com/vmtn/appliances/directory/341
SSI-OSCAR - http://ssi-oscar.gforge.inria.fr/
SSS-OSCAR - http://www.csm.ornl.gov/oscar/sss/
Conclusions
Future Direction
Open MPI
Windows, Mac OS?