DISCWorld – A Java-Based Metacomputing Environment

DISCWorld, Virtual Data Grids and
Grid Applications
Paul Coddington
Distributed & High Performance Computing Group
Department of Computer Science, University of Adelaide
Adelaide SA 5005, Australia
Email: [email protected]
December 2002
Background
• Distributed and High-Performance Computing Group
– Started 1996 at University of Adelaide, PDC joined Aug 1997
– Ken Hawick, Andrew Wendelborn, Francis Vaughan, Kevin Maciunas
– Originally part of Research Data Networks CRC
– Research into metacomputing middleware and on-line data archives
– DHPC Bangor started in 2000 by Ken Hawick
• Research areas:
– Metacomputing (grid computing)
– Java for High-Performance Computing
– Parallel computing and parallel algorithms
– Cluster computing
– Scientific applications of DHPC
DISCWorld Project
• Distributed Information Systems Control World (DISCWorld) is a
metacomputing middleware system being developed by DHPC.
• Mainly a vehicle for research into metacomputing systems, but
also developing software and applications.
• O-O, written in Java, provides access to well-known services.
• Still work in progress -- design work done, various modules in
different states of completion.
• DISCWorld - high-level, but ideas not fully implemented.
• Globus - very low-level, limited capabilities, but the de facto standard.
• Would like to use high-level DISCWorld ideas, but utilize grid
tools, protocols, etc being developed around Globus.
• Current work includes:
– chains or process networks of services for remote distributed processing (sketched below)
– transparent access to "active" distributed hierarchical file systems
– integration with Globus tools (using Java CoG)
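To make the chaining idea concrete, here is a minimal Java sketch of composing well-known services into a processing chain. The Service interface and Chain class are invented for illustration; they are not the actual DISCWorld API.

```java
import java.util.ArrayList;
import java.util.List;

// Invented sketch of chaining well-known services into a process network,
// in the spirit of DISCWorld. Not the actual DISCWorld API.
interface Service {
    Object process(Object input) throws Exception;
}

class Chain implements Service {
    private final List<Service> stages = new ArrayList<>();

    Chain then(Service next) {   // append a stage to the chain
        stages.add(next);
        return this;
    }

    // Feed the output of each stage into the next one.
    public Object process(Object input) throws Exception {
        Object result = input;
        for (Service stage : stages) {
            result = stage.process(result);
        }
        return result;
    }
}
// Usage: new Chain().then(decode).then(crop).then(render).process(rawData);
```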
Virtual Data Grids
• Data grids - where storage, searching and accessing large (e.g. Pbyte)
distributed data sets is at least as important as data processing.
– High-energy physics, astronomy, satellite data, biological data, …
• DHPC area of interest - On-Line Data Archives (OLDA) project.
• Distributed "active" data archives - or virtual data grids.
– Servers don't just provide "passive" static data from files.
– Can provide smart data pre-processing services (data reduction, conversion, etc).
– Server(s) generate data on-the-fly, or access cached copy.
– Specify data services or requirements (e.g. metadata), not filenames or URLs.
– Transparently access "best" copy from distributed replicas.
– Distributed Active Resource arChitecture (DARC)
– International Virtual Data Grid Laboratory (IVDGL) work on virtual data grids
• Example
– user specifies required satellite image using metadata (time, region, satellite)
– DARC node searches distributed archives, finds ``nearest’’ copy, requests data
– server does format conversion, georectification, cropping, ...
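A hedged sketch of what such a metadata-driven request might look like in Java; all class names and metadata values below are invented for illustration and are not the DARC API.

```java
import java.util.HashMap;
import java.util.Map;

// Invented sketch: a client asks a DARC node for a satellite image by
// metadata (time, region, satellite), never by filename or URL.
class SatelliteImageRequest {
    final Map<String, String> metadata = new HashMap<>();

    SatelliteImageRequest(String satellite, String region, String time) {
        metadata.put("satellite", satellite);
        metadata.put("region", region);
        metadata.put("time", time);
    }
}

class DarcClientSketch {
    public static void main(String[] args) {
        // All values are made up for illustration.
        SatelliteImageRequest req = new SatelliteImageRequest(
                "GMS-5", "135E,35S:140E,30S", "2002-12-01T03:00Z");
        // A DARC node would search the distributed archives for the
        // "nearest" replica, then have the holding server convert,
        // georectify and crop the image before returning it.
        System.out.println("Requesting image matching: " + req.metadata);
    }
}
```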
Active Data Using GASS
[Diagram: an Active GASS Client on Host A; an Active GASS Server with a host table on Host B; a Job Manager and Remote GASS Server on Host C. Data are requested through servlets using HTTPS, e.g.:]
https://host:port/filename?program=Truncation&offset=1&length=100
https://host:port/Truncation?filename=myfile&offset=&length=100
• Legacy applications can access data grid resources using filenames (URLs)
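As a rough illustration of the servlet approach, the sketch below implements a Truncation-style active data service that returns a byte range of a file, following the parameter names in the URLs above. The servlet itself is invented for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Invented Truncation-style servlet: serves a byte range of a file rather
// than the whole file, using the parameter names from the URLs above.
public class TruncationServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String filename = req.getParameter("filename");
        String off = req.getParameter("offset");
        // An empty offset (as in the second URL above) defaults to 0.
        long offset = (off == null || off.isEmpty()) ? 0 : Long.parseLong(off);
        int length = Integer.parseInt(req.getParameter("length"));

        resp.setContentType("application/octet-stream");
        try (InputStream in = new FileInputStream(filename);
             OutputStream out = resp.getOutputStream()) {
            in.skip(offset);                  // jump to the requested offset
            byte[] buf = new byte[length];
            int n = in.read(buf, 0, length);  // read at most 'length' bytes
            if (n > 0) {
                out.write(buf, 0, n);
            }
        }
    }
}
```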
DARC
[Diagram: DARC nodes on Hosts A, B and C connected peer-to-peer over TCP, each node holding one or more Data Resources (DR).]
• Distributed Active Resource Architecture (DARC)
– Allows building of distributed storage devices that support active data
– Peer-to-peer approach, each machine runs a DARC node
– User- or system-supplied Data Resources (DR), sketched below
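A minimal sketch, assuming an invented interface, of what a pluggable Data Resource might look like in Java:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

// Hypothetical sketch of a user-supplied Data Resource (DR) plugged into a
// DARC node: a resource that can compute or fetch data on demand rather
// than just serve a static file. The interface is invented for illustration.
interface DataResource {
    // Can this resource (or a replica it knows of) satisfy the request?
    boolean canSatisfy(Map<String, String> metadata);

    // Produce the data: read a cached copy or generate it on the fly.
    InputStream open(Map<String, String> metadata) throws IOException;
}
```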
Active Data Using DARC
[Diagram: an Active GASS Client on Host A accesses DARC nodes through a GASS Server and GASS Proxy on Host B; DARC nodes on Hosts B and C each hold a host table and a file system, with connections secured by GSI.]
• Integration of DARC with Globus tools
– Allows DARC to use GASS, GridFTP, Replica Catalog
– Allows Globus grid applications to access DARC data resources
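One way to picture the integration is an adapter that maps a GASS-style URL onto the hypothetical DataResource interface sketched earlier, so a Globus client that only understands URLs can still trigger active processing. This is an invented sketch, not the Java CoG Kit API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Invented adapter (not the Java CoG Kit API): maps a GASS-style URL onto
// the hypothetical DataResource interface sketched earlier.
class GassToDarcAdapter {
    private final DataResource resource;

    GassToDarcAdapter(DataResource resource) {
        this.resource = resource;
    }

    // Translate e.g. "https://host:port/myfile?program=Truncation&offset=1"
    // into the metadata map a DARC node expects. Assumes the URL has a query.
    InputStream open(String url) throws IOException {
        Map<String, String> metadata = new HashMap<>();
        String query = url.substring(url.indexOf('?') + 1);
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            metadata.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return resource.open(metadata);
    }
}
```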
Mobile Metacomputing Middleware
• m.Net 3G mobile network testbed in Adelaide city (North Terrace).
• Collaborative project to provide metacomputing middleware for
mobile devices (e.g. iPAQ, phones) - starting next year.
• DISCWorld metacomputing ideas fit well to a mobile environment:
– provide thin client with access to a set of well-known remote services
– provide resource broking in dynamic environment
– Java implementation
• Middleware handles low-level network details
– dynamic network environment - mobile user, dropouts, handovers
– 3G, regular mobile, 802.11 wireless, docking station
– quality of service (adding software layer to interface to IP stack)
• Allow users (or applications) to specify policies for services, tasks,
priorities, costs.
• Provide adaptation for dynamic network, user policies, QoS, cost.
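A small invented sketch of what a user-specified policy might look like in Java; the fields and names are assumptions for illustration only.

```java
// Invented sketch of a user policy for mobile services: acceptable bearers,
// a cost ceiling, and dropout behaviour. All names are assumptions.
class ServicePolicy {
    enum Network { THREE_G, GSM, WIFI_802_11, DOCKED }

    final String service;          // name of a well-known remote service
    final Network[] preferred;     // acceptable bearers, in preference order
    final int maxCostCents;        // cost ceiling for running the task
    final boolean deferOnDropout;  // queue work until connectivity returns?

    ServicePolicy(String service, Network[] preferred,
                  int maxCostCents, boolean deferOnDropout) {
        this.service = service;
        this.preferred = preferred;
        this.maxCostCents = maxCostCents;
        this.deferOnDropout = deferOnDropout;
    }
}
```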
Campus Computational Grid
• Many different compute resources available on campus
– Supercomputers (SGI box, PC cluster with Ethernet, Sun and PC clusters with Myrinet)
– Several small clusters (Sun Netra, Alpha, Linux PC, Javastation, …)
– Student labs (Windows PC, iMac with OS X)
– Desktop workstations
• Student labs are probably the largest computational resource!
• A mixture of non-interoperable cluster management systems, each
with pros and cons, significant effort to install and maintain
– Condor - desktop workstations and Windows PCs (but not good for parallel machines)
– Proprietary CMS (e.g. on Sun cluster)
– PBS - other parallel computers
– Only Sun Grid Engine currently ported to Mac OS X
• How to integrate this heterogeneous mix of computer resources and
management systems to make them easily and transparently accessible
by a variety of users?
Problems with Campus Grids
• Could integrate with Globus, but how to submit jobs?
• Not globusrun - too low level.
• Ideally users would submit jobs as they do now - with shell scripts,
PBS job scripts, Condor job submission files - and they run on any
resource (whether or not it runs PBS, Condor, etc).
• But currently requires something like
CMS script -> RSL/globusrun -> CMS/scheduler
• Globus (mostly) handles the second translation, but not the first (a sketch of that first step follows this list).
• Non-trivial - CMS combines job specification/execution and resource
request/brokering, but Globus separates the two.
• How to match jobs with appropriate resource (e.g. which jobs need
shared memory, Myrinet, Ethernet, no comms?)
• How to interface and cycle share with external grid resources?
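To illustrate the missing first translation, here is a hedged Java sketch that turns one simple PBS directive plus a command line into Globus RSL for globusrun. Real PBS scripts are far richer; everything here is illustrative.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Invented sketch of the first translation step: turn a simple PBS script
// into Globus RSL for globusrun. Handles only "#PBS -l nodes=N" and treats
// the last non-comment line as the job command; real scripts are far richer.
class PbsToRsl {
    public static void main(String[] args) throws IOException {
        int nodes = 1;
        String command = null;
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("#PBS -l nodes=")) {
                    nodes = Integer.parseInt(
                            line.substring("#PBS -l nodes=".length()).trim());
                } else if (!line.startsWith("#") && !line.trim().isEmpty()) {
                    command = line.trim();
                }
            }
        }
        // A real translator would split the executable from its arguments
        // and choose the jobType from the job's communication needs.
        System.out.println("&(executable=" + command + ")(count=" + nodes
                + ")(jobType=mpi)");
    }
}
```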
Grid Applications
• Lattice Gauge Theory
– Centre for the Subatomic Structure of Matter (CSSM)
– International Lattice Data Grid for sharing simulation data
• Bioinformatics
– National Centre for Plant Functional Genomics
– Molecular Biosciences department
– APGrid biogrid project
• High-energy physics
– Collaboration between CSSM and Jefferson Lab in US
– Data analysis and results
– Access Grid
• Computational chemistry
• CANGAROO gamma ray telescope
– Collaboration between Australia and Japan
– Link to national/international Virtual Observatory projects