Transcript calu99

Building Beowulfs for High Performance Computing
Duncan Grove
Department of Computer Science
University of Adelaide
http://dhpc.adelaide.edu.au/projects/beowulf
Anatomy of a “Beowulf”
• “Cluster” of networked PCs
  – Intel PentiumII or Compaq Alpha
  – Switched 100Mbit/s Ethernet or Myrinet
  – Linux
  – Parallel and batch software support
[Diagram: the outside world connects to a front-end node, which connects through the switching infrastructure to compute nodes n1, n2, ..., nN]
Why build Beowulfs?
• Science/$
• Some problems take lots of processing
• Many supercomputers are used as batch processing engines
– Traditional supercomputers are wasteful for high throughput computing
• Beowulfs:
– “ [useful] computational cycles at the lowest possible price.”
– Suited to high throughput computing
– Effective at an increasingly large set of parallel problems
Three Computational Paradigms
• Data Parallel
– Regular grid based problems
• Parallelising compilers, eg HPF
• Eg physicists running lattice gauge calculations
• Message Passing
– Unstructured parallel problems
• MPI, PVM (see the sketch after this list)
• Eg chemists running molecular dynamics simulations
• Task Farming
– “High throughput computing” - batch jobs
• Queuing systems
• Eg chemists running Gaussian.
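The message-passing paradigm is the one that maps most directly onto code. As a rough sketch only (not taken from the talk), this is the kind of minimal MPI program in C that MPICH or LAM/MPI on such a cluster would build with mpicc and launch with mpirun; the hello message and the token passed around a ring are invented for illustration:

    /* Minimal illustration of the message-passing paradigm: every process
     * says hello, then a token is passed once around the ring of processes.
     * Build with mpicc, run with e.g. "mpirun -np 8 ./ring".               */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, token;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?  */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many in total?   */

        printf("Hello from process %d of %d\n", rank, size);

        if (size > 1) {
            if (rank == 0) {
                token = 42;                     /* arbitrary payload     */
                MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                         &status);
                printf("Token made it around the ring\n");
            } else {
                MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                         &status);
                MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0,
                         MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }

The same source runs unchanged whether the nodes talk over switched Ethernet or Myrinet; only the MPI library underneath changes.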
A Brief Cluster History
• Caltech Prehistory
• Berkeley NOW
• NASA Beowulf
• Stone SouperComputer
• USQ Topcat
• UIUC NT Supercluster
• LANL Avalon
• SNL Cplant
• AU Perseus?
Beowulf Wishlist
• Single System Image (SSI)
– Unified process space
– Distributed shared memory
– Distributed file system
• Performance easily extensible
– Just “add more bits”
• Fault tolerant
• “Simple” to administer and use
Current Sophistication?
• Shrinkwrapped “solutions” or do-it-yourself
– Not much more than a nicely installed network of PCs
– A few kernel hacks to improve performance
– No magical software for making the cluster transparent to the user
– Queuing software and parallel programming software can create the appearance of a more unified machine
Stone SouperComputer
Iofor
• Learning platform
• Program development
• Simple benchmarking
• Simple performance evaluation of real applications
• Teaching machine
• Money lever
iMacwulf
• Student lab by day, Beowulf by night?
– MacOS with Appleseed
– LinuxPPC 4.0, soon LinuxPPC 5.0
– MacOS/X
“Gigaflop harlotry”
Machine                     Cost               # Processors   ~ Peak Speed
Cray T3E                    10s million        1084           1300 Gflop/s
SGI Origin 2000             10s million        128            128 Gflop/s
IBM SP2                     10s million        512            400 Gflop/s
Sun HPC                     1s million         64             50 Gflop/s
TMC CM5                     5 million (1992)   128            20 Gflop/s
SGI PowerChallenge          1 million (1995)   20             20 Gflop/s
Beowulf cluster + Myrinet   1 million          256            120 Gflop/s
Beowulf cluster             300K               256            120 Gflop/s
The obvious, but important
• In the past:
  – Commodity processors way behind supercomputer processors
  – Commodity networks way, way, way behind supercomputer networks
• In the now:
  – Commodity processors only just behind supercomputer processors
  – Commodity networks still way, way behind supercomputer networks
  – More exotic networks still way behind supercomputer networks
• In the future:
  – Commodity processors will be supercomputer processors
  – Will the commodity networks catch up?
Hardware possibilities
Processor   Advantages                  Disadvantages
x86, K7     Mass market commodity
            Good floating point
            (Some) SMP capable
PowerPC     Very good integer           More expensive than x86
                                        Poor floating point
Alpha       Very good floating point    Expensive
                                        Limited vendors
OS possibilities
OS            Advantages                           Disadvantages
Linux         Large user community                 Open source
              Widely available                     Good compiler x86 only
              Open source
              Many platforms
NT            Good compilers                       Poor user model
                                                   Poor stability
                                                   Poor remote access
                                                   Poor networking
Digital Unix  Very good compilers                  Runs on expensive hardware
Solaris       Robust                               Not open source
              Good quality software
              Runs on x86
MacOSX        Attractive as multipurpose cluster   Not out yet!
Darwin        Open Source                          Small user community
              May be ported to x86?
Open Source
• The good...
– Lots of users, active development
– Easy access to make your own tweaks
– Aspects of Linux are still immature, but recently
• SGI has released XFS as open source
• Sun has released its HPC software as open source
• And the bad...
– There’s a lot of bad code out there!
Network technologies
• So many choices!
– Interfaces, cables, switches, hubs; ATM, Ethernet, Fast Ethernet, gigabit Ethernet, firewire, HiPPI, serial HiPPI, Myrinet, SCI…
• The important issues
– latency
– bandwidth (see the ping-pong sketch after this list)
– availability
– price
– price/performance
– application type!
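Latency and bandwidth are the two numbers people actually measure when comparing these networks. As a rough sketch, not from the talk, here is the classic ping-pong test in C over MPI; the 1MB message size and the repeat count are arbitrary choices, and small messages would be used instead to expose latency:

    /* Classic ping-pong: ranks 0 and 1 bounce a buffer back and forth and
     * time the round trips.  Small buffers expose latency, large buffers
     * approach the usable bandwidth of the link.                          */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define REPS  100
    #define BYTES (1024 * 1024)          /* 1MB payload; arbitrary choice */

    int main(int argc, char **argv)
    {
        int rank, size, i;
        char *buf;
        double t0, t1, rtt;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        buf = malloc(BYTES);
        if (buf == NULL || size < 2) {
            if (rank == 0)
                fprintf(stderr, "need a buffer and at least 2 processes\n");
            free(buf);
            MPI_Finalize();
            return 1;
        }

        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0) {
            rtt = (t1 - t0) / REPS;      /* average round-trip time       */
            printf("round trip %g s, ~%g MB/s one-way\n",
                   rtt, 2.0 * BYTES / rtt / 1e6);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }

Running the same test over 100Mbit/s Ethernet and over Myrinet is the quickest way to see where the price/performance trade-off sits for a given application type.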
Disk subsystems
• I/O a problem in parallel systems
– Data not local on compute nodes is a performance hit
– Distributed file systems
• CacheFS
• CODA
– Parallel file systems
• PVFS
• On-line bulk data is interesting in itself
– Beowulf Bulk Data Server
• cf. slow, expensive tape silos...
Perseus
• Machine for chemistry simulations
– Mainly high throughput computing
– RIEF grant in excess of $300K
– 128 nodes, for < $2K per node
• Dual processor PII450
• At least 256MB RAM
– Some nodes up to 1GB
• 6GB local disk each
– 5x24 (+2x4) port Intel 100Mbit/s switches
Perseus: Phase 1
• Prototype
• 16 dual processor PII
• 100Mbit/s switched Ethernet
Perseus: installing a node
[Diagram: the front-end node (user node, administration, compilers, queues, nfs, dns, NIS, /etc/*, bootp/dhcp, kickstart, ...) connects through the switching infrastructure to compute nodes n1, n2, ..., nN; each compute node is installed from a floppy disk or bootrom]
Software on perseus
• Software to support the three computational paradigms
– Data Parallel
• Portland Group HPF
– Message Passing
• MPICH, LAM/MPI, PVM
– High throughput computing
• Condor, GNU Queue
• Gaussian94, Gaussian98
Expected parallel performance
• Loki, 1996
  – 16 Pentium Pro processors, 10Mbit/s Ethernet
  – 3.2 Gflop/s peak, achieved 1.2 real Gflop/s on Linpack benchmark
• Perseus, 1999
  – 256 PentiumII processors, 100Mbit/s Ethernet
  – 115 Gflop/s peak (see the arithmetic below)
    • ~40 Gflop/s on Linpack benchmark?
• Compare with top 500!
  – Would get us to about 200 currently
  – Other Australian machines?
    • NEC SX/4 @ BOM at #102
    • Sun HPC at #181, #182, #255
    • Fujitsu VPP @ ANU at #400
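A rough sanity check on these figures, assuming each 450MHz PentiumII retires at most one floating-point result per clock cycle: 256 processors × 450MHz × 1 flop/cycle = 115.2 Gflop/s, which is the 115 Gflop/s peak quoted above. Loki sustained about a third of its peak on Linpack (1.2 of 3.2 Gflop/s); applying the same ratio to Perseus gives roughly the ~40 Gflop/s estimate.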
Reliability in a large system
• Build it right!
• Is the operating system and software running ok?
• Is heat dissipation going to be a problem?
  – Monitoring daemon (a minimal sketch follows below)
    • Normal features
      – CPU, network, memory, disk
    • More exotic features
      – Power supply and CPU fan speeds
      – Motherboard and CPU temperatures
• Do we have any heisen-cabling?
  – Racks and lots of cable ties!
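The monitoring daemon itself is not shown in the talk. As a minimal sketch of the kind of per-node check such a daemon might run on Linux, assuming the standard /proc/loadavg interface: the 4.0 load threshold is an arbitrary example, and the more exotic fan-speed and temperature readings would come from motherboard sensor interfaces rather than this file.

    /* Toy version of one check a node-monitoring daemon might run on Linux:
     * read the 1-minute load average from /proc/loadavg and warn when it
     * looks too high.  The 4.0 threshold is an arbitrary example.          */
    #include <stdio.h>

    int main(void)
    {
        double load1;
        FILE *f = fopen("/proc/loadavg", "r");

        if (f == NULL) {
            perror("/proc/loadavg");
            return 1;
        }
        if (fscanf(f, "%lf", &load1) != 1) {   /* first field: 1-min load */
            fprintf(stderr, "unexpected /proc/loadavg format\n");
            fclose(f);
            return 1;
        }
        fclose(f);

        if (load1 > 4.0)
            printf("WARNING: load average %.2f looks high\n", load1);
        else
            printf("load average %.2f ok\n", load1);

        return 0;
    }

A real daemon would run checks like this periodically on every node and report the results back to the front-end node.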
The limitations...
• Scalability
• Load balancing
  – Effects of machines' capabilities
  – Desktop machines vs. dedicated machines
  – Resource allocation
  – Task migration
• Distributed I/O
• System monitoring and control tools
• Maintenance requirements
  – Installation, upgrading, versioning
    • Complicated scripts
    • Parallel interactive shell?
… and the opportunities
• A large proportion of the current limitations compared with traditional HPC solutions are merely systems integration problems
• Some contributions to be made in
– HOWTOs
– Monitoring and maintenance
– Performance modelling and real benchmarking