The Australian Virtual
Observatory
Clusters and Grids
David Barnes
Astrophysics Group
Overview
• What is a Virtual Observatory?
• Scientific motivation
• International scene
• Australian scene
• DataGrids for VOs
• ComputeGrids for VOs
• Sketch of AVO DataGrid and ComputeGrid
• Clustering experience at Swinburne
What is a Virtual Observatory?
• A Virtual Observatory (VO) is a distributed, uniform
interface to the data archives of the world’s major
astronomical observatories.
• A VO is explored with advanced data mining and
visualisation tools which exploit the unified interface
to enable cross-correlation and combined
processing of distributed and diverse datasets.
• VOs will rely on, and provide motivation for, the
development of national and international
computational and data grids.
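A minimal sketch of what the uniform interface might look like from the client side: the same cone-search call issued against several archives, with the results merged. The archive names, endpoint URLs, CSV response format and the cone_search/query_all helpers are illustrative assumptions only, not part of any VO specification.

# Sketch: one "cone search" call shape reused across several archives.
# Endpoint URLs and the CSV response format are placeholders.
import csv
import io
import urllib.parse
import urllib.request

ARCHIVES = {
    "HIPASS": "http://example.org/hipass/cone",  # placeholder endpoint
    "SUMSS":  "http://example.org/sumss/cone",   # placeholder endpoint
}

def cone_search(base_url, ra_deg, dec_deg, radius_deg):
    """Ask one archive for sources within radius_deg of (ra_deg, dec_deg)."""
    query = urllib.parse.urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    with urllib.request.urlopen(f"{base_url}?{query}") as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

def query_all(ra_deg, dec_deg, radius_deg):
    """The uniform interface: the identical call against every archive."""
    return {name: cone_search(url, ra_deg, dec_deg, radius_deg)
            for name, url in ARCHIVES.items()}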
Scientific motivation
• Understanding of astrophysical processes depends
on multi-wavelength observations and input from
theoretical models.
• As telescopes and instruments grow in complexity,
surveys generate massive databases which require
increasing expertise to comprehend.
• Theoretical modelling codes are growing in
sophistication to consume available compute time.
• Major advances in astrophysics will be enabled by
transparently cross-matching, cross-correlating and
inter-processing otherwise disparate data.
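To make cross-matching concrete, here is a hedged sketch of a naive positional match between two catalogues. The tuple-based catalogue layout, the 1-arcsecond default tolerance and the brute-force double loop are assumptions for illustration; real matchers use spatial indexing (k-d trees, HTM, HEALPix) to scale.

# Sketch: naive positional cross-match between two catalogues, each a
# list of (ra_deg, dec_deg) tuples.  O(N*M) - fine only for small lists.
from math import acos, cos, degrees, radians, sin

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, in degrees."""
    r1, d1, r2, d2 = map(radians, (ra1, dec1, ra2, dec2))
    cos_sep = sin(d1) * sin(d2) + cos(d1) * cos(d2) * cos(r1 - r2)
    return degrees(acos(min(1.0, max(-1.0, cos_sep))))

def cross_match(cat_a, cat_b, tolerance_deg=1.0 / 3600.0):
    """Pair each cat_a source with its nearest cat_b source within tolerance."""
    matches = []
    for i, (ra_a, dec_a) in enumerate(cat_a):
        best = None
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= tolerance_deg and (best is None or sep < best[1]):
                best = (j, sep)
        if best is not None:
            matches.append((i, best[0], best[1]))  # (index_a, index_b, separation)
    return matches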
[Figure: Sample multi-wavelength data for the galaxy IC5332 (Ryan-Weber) – panels show blue and infrared images, the H-alpha spectral line, HI spectral line column density, velocity dispersion and velocity field, and the HI profile from the public release.]
International scene
• AstroGrid (www.uk-vo.org) – phase A (1yr R&D) complete; phase B (3yr implementation) funded £3.7M.
• Astrophysical Virtual Observatory (www.euro-vo.org) – phase A (3yr R&D) funded €4.0M.
• National Virtual Observatory (www.usvo.org) – (5yr framework development) funded USD 10M.
Australian scene
• Australian Virtual Observatory (www.aus-vo.org) –
phase A (1yr common-format archive
implementation) funded AUD 260K (2003 LIEF
grant [Melb, Syd, ATNF, AAO]).
• Data archives are:
– HIPASS: 1.4 GHz continuum and HI spectral line survey
– SUMSS: 843 MHz continuum survey
– S4: digital images of the southern sky in five optical filters
– ATCA archive: continuum and spectral line images of the southern sky
– 2dFGRS: optical spectra of >200K southern galaxies
– and more...
DataGrids for VOs
• The archives listed on the previous slide range from ~10 GB to ~10 TB in processed (reduced) size.
• Providing just the processed images and spectra on-line requires a distributed, high-bandwidth network of data servers – that is, a DataGrid.
• Users may want simple operations, such as smoothing or filtering, applied at the data server. This is a Virtual DataGrid.
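A minimal sketch of a "simple operation applied at the data server", assuming the server holds a 1-D spectrum as a plain list of channel values. A production Virtual DataGrid service would work on FITS data with numpy, but the idea is the same: ship the smoothed product, not the raw channels.

# Sketch: server-side boxcar smoothing of a 1-D spectrum.
def boxcar_smooth(spectrum, width=5):
    """Return the spectrum smoothed with a running mean of odd width."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    smoothed = []
    for i in range(len(spectrum)):
        window = spectrum[max(0, i - half):min(len(spectrum), i + half + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed

# The user requests the smoothed product; only this leaves the server.
raw = [0.0, 0.1, 3.2, 0.2, 0.1, 0.0, 2.9, 3.1, 0.2]
print(boxcar_smooth(raw, width=3))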
ComputeGrids for VOs
• More complex operations may be applied
requiring significant processing:
– source detection and parameterisation
– reprocessing of raw or intermediate data
products with new calibration algorithms
– combined processing of raw, intermediate or
"final product" data from different archives
• These operations require a distributed, high-bandwidth network of computational nodes – that is, a ComputeGrid.
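As an illustration of "source detection and parameterisation" run as a compute job, the sketch below finds runs of channels above a sigma threshold in a 1-D spectrum and reports crude parameters. It is deliberately simplistic (the noise estimate includes the sources themselves); real survey pipelines are far more sophisticated.

# Sketch: threshold-based source finding in a 1-D spectrum.
from statistics import mean, stdev

def detect_sources(spectrum, nsigma=5.0):
    """Return crude parameters for runs of channels above mean + nsigma*rms."""
    base = mean(spectrum)
    threshold = base + nsigma * stdev(spectrum)
    sources, start = [], None
    for i, value in enumerate(spectrum + [base]):   # sentinel closes the last run
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            run = spectrum[start:i]
            sources.append({
                "start_channel": start,
                "end_channel": i - 1,
                "peak": max(run),
                "integrated": sum(v - base for v in run),
            })
            start = None
    return sources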
Possible initial players in the Australian Virtual Observatory Data and Compute Grids…
[Diagram: candidate sites and roles – Parkes(?), ATNF/AAO (2dFGRS, RAVE), Canberra (ATCA, APAC), Adelaide (theory?), Melbourne (HIPASS, VPAC), Sydney (SUMSS, Gemini?, theory) and Swinburne (theory) – each tagged with some combination of Data and CPU, linked by GrangeNet.]
Clustering @ Swinburne
• 1998 – 2000: 40 Compaq Alpha workstations
• 2001: +16 Dell dual PIII rackmount servers
• 2002: +30 Dell dual P4 workstations
• mid 2002: +60 Dell dual P4 rackmount servers
• November 2002: placed 180th in the Top500 with 343 sustained Gflop/s (APAC 63rd with 825 Gflop/s)
• +30 Dell dual P4 rackmount servers installed mid
2002 at the Parkes telescope in NSW.
• pseudo-Grid with data pre-processed in real time at the telescope, shipped back in “slowtime”.
Swinburne activities
• N-body simulation codes:
– galaxy formation
– stellar disk astrophysics
– cosmology
• Pulsar searching and timing
– (1 GB/min data recording)
• Survey processing as a coarse-grained
problem
• Rendering of virtual reality content
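A sketch of survey processing as a coarse-grained problem, assuming mpi4py is available on the cluster: each MPI rank reduces a disjoint subset of survey cubes and the results are gathered on rank 0. The cube names, their count and process_cube() are placeholders.

# Sketch: embarrassingly parallel survey reduction with MPI.
# Run with e.g.:  mpirun -np 32 python reduce_survey.py
from mpi4py import MPI

def process_cube(name):
    """Placeholder for the real per-cube reduction (calibration, gridding, ...)."""
    return f"{name}: done"

def main():
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    cubes = [f"cube_{i:03d}.fits" for i in range(400)]  # illustrative cube count
    my_cubes = cubes[rank::size]                         # round-robin work split

    results = [process_cube(name) for name in my_cubes]
    gathered = comm.gather(results, root=0)
    if rank == 0:
        print(sum(len(part) for part in gathered), "cubes processed")

if __name__ == "__main__":
    main()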
Clustering costs
nodes                                              price/node   price/cpu
1 cpu, 256 MB std mem, 20 GB disk, ethernet        1.3K         1.3K
2 cpu, 1 GB fast mem, 20 GB disk, ethernet         4.4K         2.2K
2 cpu, 2 GB fast mem, 60 GB SCSI disk, ethernet    8.0K         4.0K
Giganet, Myrinet, ...                              1.5K         1.5K (1 cpu) / 0.8K (2 cpu)

(estimates incl. on-site warranty; 2nd-fastest cpu; excl. infrastructure)
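The price/cpu column is just price/node divided by the number of CPUs; the short sketch below redoes that arithmetic and shows how a Giganet/Myrinet-class interconnect at roughly 1.5K per node shifts the per-CPU cost. The figures are the slide's rough estimates (assumed here to be thousands of AUD; the slide does not state a currency).

# Sketch: per-CPU cost with and without an interconnect add-on.
nodes = [
    {"config": "1 cpu, 256 MB std mem",      "cpus": 1, "price_per_node": 1.3},
    {"config": "2 cpu, 1 GB fast mem",       "cpus": 2, "price_per_node": 4.4},
    {"config": "2 cpu, 2 GB fast mem, SCSI", "cpus": 2, "price_per_node": 8.0},
]
interconnect_per_node = 1.5  # Giganet/Myrinet-class, per node

for node in nodes:
    base = node["price_per_node"] / node["cpus"]
    with_net = (node["price_per_node"] + interconnect_per_node) / node["cpus"]
    print(f'{node["config"]}: {base:.2f}K/cpu, {with_net:.2f}K/cpu with interconnect')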
Some ideas...
• “desktop cluster” – the astro group has 6 dual-cpu workstations.
– Add MPI, PVM and Nimrod libraries and the Ganglia monitoring tool to get a 12-cpu loose cluster with 8 GB memory (an MPI launch across it is sketched below).
– Use MOSIX to provide transparent job migration, with workstations joining the cluster at night-time.
• “pre-purchase cluster” – the university buys ~500 desktops/yr – use them for ~6 months!
– Build up a cluster of desktops purchased ahead of demand, and replace them as they are deployed to desktops.
– Gain the compute power of new CPUs without any real effect on end-users.
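A sketch of the “desktop cluster” idea above: one MPI process per CPU across the six dual-cpu workstations. The hostnames, the Open MPI-style hostfile and the use of mpi4py are assumptions; the slide only names MPI, PVM, Nimrod, Ganglia and MOSIX.

# Sketch: MPI hello-world across a loose cluster of workstations.
#
# hostfile (one line per workstation, 2 slots = 2 CPUs; names hypothetical):
#   astro01 slots=2
#   astro02 slots=2
#   ...
#   astro06 slots=2
#
# launch:  mpirun --hostfile hostfile -np 12 python hello_cluster.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()} "
      f"on {MPI.Get_processor_name()}")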