Transcript ICTP_APAC

APAC National Grid:
technology responses to diverse requirements
Ian Atkinson
James Cook University (www.jcu.edu.au), and
Queensland Cyberinfrastructure Foundation
(www.qcif.edu.au)
Lindsay Hood
Australian Partnership for Advanced Computing
www.apac.edu.au
Australian Partnership for
Advanced Computing
“providing national advanced computing,
data management and
grid services for eResearch”
Partners:
• Australian Centre for Advanced Computing and
Communications (ac3) in NSW
• CSIRO
• iVEC, The Hub of Advanced Computing in Western Australia
• Queensland Cyber Infrastructure Foundation (QCIF)
• South Australian Partnership for Advanced Computing (SAPAC)
• The Australian National University (ANU)
• The University of Tasmania (TPAC)
• Victorian Partnership for Advanced Computing (VPAC)
4500 CPUs, 3PB storage
Recent Review
APAC in the future must be regarded not just as the National
Facility, but as the sum of its component parts comprising:
[…]
The National Grid
[…]
That the APAC National Grid must be the pre-eminent grid in
Australia and continue extending its coverage to include
capabilities wherever they exist or develop. It must also
nurture and support scientific research teams, NCRIS
infrastructure and international partnerships
Concept of the APAC National Grid
[Diagram: the APAC National Grid, a virtual system of computing, data storage and visualisation facilities, linking research teams, data centres, sensor networks, instruments, and other grids (institutional and international).]
NCRIS - National Collaborative
Research Infrastructure Strategy
• National plan to invest AU$500M in medium-scale
collaborative access research infrastructure across 5
years, 2007-2011
• 15 Investment areas of interest including:
– bioinformatics, biosecurity, geosciences, astronomy, marine and
terrestrial observation systems, structural characterization
• APAC will now be funded via NCRIS
– APAC and the National Grid must directly support the NCRIS
investment areas as a high priority
– NCRIS investments are expected to develop and execute plans to
ensure e-Research (cyberinfrastructure) tools and practices are
embedded into their working practices and data management
Data management is now a hot topic in Australia!
NCRIS Platforms for Collaboration
Vision
APAC National Grid
Core Services – built on Globus
• Portal tools: GridSphere (QCIF)
• Info services: MDS 2/4, MIP (JCU)
• Security: APAC CA (PKI), Grix, MyProxy, VOMRS
• Systems: gateways and partners' systems at the APAC National Facility, ANU, QCIF, iVEC, SAPAC, VPAC, ac3, CSIRO and TPAC
• Network: AARNet; a possible APAC private network (AARNet)
<15 Staff to deliver all services!
Some requirements
• Non-dedicated resources (at partner sites)
• Varied middleware requirements
(many domains to support)
• Complex virtual organisation structure
• Distributed data, workflows
• Simplified interface
This turns out to be hard!
Requirements Analysis
circa mid-2006
Gateways
• Rapid churn in the middleware
– You need lots of test machines
• Different communities want different
middleware
– GT2, GT4, Gridsphere, SRB …
• Minimises interaction with non-grid-dedicated production systems
• Virtualisation is well understood technology
• Xen has a nice price
The gateway concept
• Grid middleware evolving
• Security across firewalls & institutional policies are problems
• Using gateway virtual machines to isolate production
compute/storage elements from all this change (see the sketch after this list):
– ng1 - Globus2
– ng2 - Globus4
– ngdata - gridFTP, other data (SRB)
– ngportal - web application portals
– Others are easy to build and deploy
• But some parts of GT2 especially assume they are running on
the cluster management/head node ...
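As a rough illustration only (not APAC's actual deployment recipe), a gateway VM such as ng2 could be defined and booted with the Xen 3-era tools along these lines; the names, paths and sizes are invented for the sketch.

# Define a domU for the GT4 gateway and boot it from the dom0.
cat > /etc/xen/ng2.cfg <<'EOF'
name    = "ng2"                          # GT4 gateway VM
kernel  = "/boot/vmlinuz-2.6.18-xen"     # domU kernel shipped with the dom0
ramdisk = "/boot/initrd-2.6.18-xen.img"
memory  = 1024                           # MB
vcpus   = 1
vif     = [ 'bridge=xenbr0' ]            # bridged onto the gateway network
disk    = [ 'phy:/dev/vg0/ng2-root,xvda,w' ]
root    = "/dev/xvda ro"
EOF

xm create /etc/xen/ng2.cfg    # start the VM
xm list                       # check ng1, ng2, ngdata, ngportal are running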
Gateways on a National Backbone
[Diagram: gateway servers installed at each grid site, joined by a Layer 3 private network; each site's clusters, HPC systems and datastores sit behind its gateway.]
• Gateway servers at all grid sites, using VM technology to support multiple grid stacks
• Gateways supporting GT2, GT4, LCG, grid portals, and experimental grid stacks
• High-bandwidth, dedicated, secure private network between grid sites
• Open questions: consistency, ease of implementation, performance?
Virtual Organisations
• International use tends to be large VOs
• Australia demands small, dynamic VOs
• VOMS/VOMRS has problems
– Admin security model
– MyProxy interaction
– grid-mapfile – a user can be mapped into only one VO (illustrated below)
• Adopting PRIMA/GUMS
– Still complicated and not especially dynamic
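To illustrate the grid-mapfile limitation, a static mapping looks roughly like the lines below. The DNs and local accounts are invented; the point is that each certificate DN resolves to a single local account, so the same user cannot act through two VOs via the same mapping.

# Illustrative /etc/grid-security/grid-mapfile entries (DNs and accounts are made up).
# Each DN maps to exactly one local account, i.e. effectively one VO per user:
"/C=AU/O=APACGrid/OU=JCU/CN=Jane Researcher"   astro001
"/C=AU/O=APACGrid/OU=VPAC/CN=Joe Modeller"     geo042
# PRIMA/GUMS replaces this static file with a callout to a mapping service.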
“Workflows”
• Many existing HPC users have significant
shell scripts, and queue commands (PBS)
• WSGRAM, JSDL, BPEL may be human
readable, but not human writable!
• Abstraction of HPC systems is tough
– e.g. SGI's profile.pl doesn't handle cpusets correctly
• Working on a gsub that will take the majority
of batch scripts and run them on the grid (see the sketch after this list)
– User doesn't have to learn JSDL, WSGRAM ...
• Unicore-like client GUI would be neat
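A minimal sketch of the intent, assuming a hypothetical gsub front end (the command and any options are illustrative, since the tool was still being developed): the user keeps an ordinary PBS batch script and submits it unchanged, with the JSDL/WS-GRAM plumbing generated behind the scenes.

# myjob.pbs - an ordinary PBS batch script the user already has:
#PBS -N gnuplot-run
#PBS -l nodes=1:ppn=4
#PBS -l walltime=02:00:00
cd $PBS_O_WORKDIR
./run_model < input.dat > output.log

# Today, submitted to the local cluster:
qsub myjob.pbs

# The goal: the same script, unchanged, submitted to the grid
# (gsub is the hypothetical wrapper described above):
gsub myjob.pbs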
JSDL (1)
<?xml version="1.0" encoding="UTF-8"?>
<jsdl:JobDefinition xmlns="http://www.example.org/"
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <jsdl:JobDescription>
    <jsdl:JobIdentification>
      <jsdl:JobName>My Gnuplot invocation</jsdl:JobName>
      <jsdl:Description>Simple application invocation:
        User wants to run the application 'gnuplot' to
        produce a plotted graphical file based on some data
        shipped in from elsewhere (perhaps as part of a
        workflow). A front-end application will then build
        these into an animation of spinning data.
        Front-end application knows URL for data file which
        must be staged-in. Front-end application wants to
        stage in a control file that it specifies directly
        which directs gnuplot to produce the output files.
        In case of error, messages should be produced on
        stderr (also to be staged on completion) and no
        images are to be transferred.
      </jsdl:Description>
JSDL (2)
    </jsdl:JobIdentification>
    <jsdl:Application>
      <jsdl:ApplicationName>gnuplot</jsdl:ApplicationName>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/usr/local/bin/gnuplot</jsdl-posix:Executable>
        <jsdl-posix:Argument>control.txt</jsdl-posix:Argument>
        <jsdl-posix:Input>input.dat</jsdl-posix:Input>
        <jsdl-posix:Output>output1.png</jsdl-posix:Output>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
    <jsdl:Resources>
      <jsdl:IndividualPhysicalMemory>
        <jsdl:LowerBoundedRange>2097152.0</jsdl:LowerBoundedRange>
      </jsdl:IndividualPhysicalMemory>
      <jsdl:TotalCPUCount>
        <jsdl:Exact>1.0</jsdl:Exact>
      </jsdl:TotalCPUCount>
    </jsdl:Resources>
JSDL (3)
    <jsdl:DataStaging>
      <jsdl:FileName>control.txt</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
      <jsdl:Source>
        <jsdl:URI>http://foo.bar.com/~me/control.txt</jsdl:URI>
      </jsdl:Source>
    </jsdl:DataStaging>
    <jsdl:DataStaging>
      <jsdl:FileName>input.dat</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
      <jsdl:Source>
        <jsdl:URI>http://foo.bar.com/~me/input.dat</jsdl:URI>
      </jsdl:Source>
    </jsdl:DataStaging>
    <jsdl:DataStaging>
      <jsdl:FileName>output1.png</jsdl:FileName>
      <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
      <jsdl:DeleteOnTermination>true</jsdl:DeleteOnTermination>
      <jsdl:Target>
        <jsdl:URI>rsync://spoolmachine/userdir</jsdl:URI>
      </jsdl:Target>
    </jsdl:DataStaging>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
Portals
• Web browser is the interface to everything…
• Present simple interface to the underlying resource
• Good model for many users and applications
• Gridsphere adopted as the web grid standard
– GT4-based ngportal VM
• Java CoG kit for standalone “portal” apps
– Chemistry Java application with molecular editor being developed
– Desktop job submission tool from iVEC
http://www.grid.apac.edu.au/Services/ProductionPortals
Data services
• Data is hard
– Different communities have different needs
– Complex access controls
• We have GridFTP between sites (see the example below)
– Network consistency is interesting …
– Data staging has to …
• SRB today; iRODS later
• Dcache, SRM, Gfarm as communities require
• Credible use cases for a global file system?
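For illustration, inter-site transfers with GridFTP look roughly like this; the hostnames and paths are invented, and a valid proxy credential is assumed (see the AAA section later).

# Pull a dataset from a remote data gateway to local scratch:
globus-url-copy -vb \
    gsiftp://ngdata.hpc.jcu.edu.au/data/project42/input.dat \
    file:///scratch/project42/input.dat

# Third-party transfer directly between two sites, with parallel streams:
globus-url-copy -vb -p 4 \
    gsiftp://ngdata.hpc.jcu.edu.au/data/project42/run1.tar \
    gsiftp://ngdata.sapac.edu.au/store/project42/run1.tar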
Registry services
• MDS2, MDS4 running (query examples below)
• About to deploy Modular Information
Provider to present site and aggregated
information more easily
• Using GLUE schema, but it’s far from
satisfactory for describing real-world
production HPC resources
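Roughly, the two information services are queried as below; the hostnames are invented and only the standard client tools are shown.

# MDS2: LDAP query against a site GIIS on the usual port 2135
grid-info-search -x -h ng1.hpc.jcu.edu.au -p 2135 \
    -b "mds-vo-name=local, o=grid"

# MDS4: dump the resource properties aggregated by a GT4 Index Service
wsrf-query -a -z none \
    -s https://ng2.hpc.jcu.edu.au:8443/wsrf/services/DefaultIndexService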
Improved AAA Services
• NCRIS will require e-Research services for a much
wider community than traditional HPC
– PKI doesn't scale and is conceptually difficult for non-IT-focused
users
• Australian Access Federation funded (2007-)
• IAM Suite from MELCOE
– Shibboleth authentication plus appropriate attributes generates a
short-lived certificate (www.identiy ...); see the sketch below
– Tools for users to easily create shared workspaces and manage
attribute release
• Only a few people will need real certificates
• But probably a year away before being ready for
prime time
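For comparison, a sketch of the two credential paths (the server name and username are invented). The point is that a MyProxy login, eventually fed by Shibboleth, hides the long-lived certificate and key from the end user.

# Traditional PKI: the user keeps a long-lived certificate/key and
# makes a proxy from it before each grid session:
grid-proxy-init -valid 12:00

# With a credential service, the user just logs in and receives a
# short-lived proxy (12 hours here):
myproxy-logon -s myproxy.grid.apac.edu.au -l jresearcher -t 12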
IAM Suite
[Architecture diagram: IAM Suite components, including federation identity providers issuing assertions, an AFS adaptor and Federation SP, Fedora repositories (internal or external, e.g. an institutional repository), VO-WAYF and VO-IdP, GridSphere, GroupModule, ShARPE, AuthN IM, Autograph, FedoraWeb, MyProxy, Globus Toolkit (GTK) services on storage and cluster resources and specific tools, and VO-SPs fronting collaboration services such as a forum, wiki, calendar, presence, PeoplePicker, AuthZ manager, equipment booking and an LMS.]
www.federation.org.au
Macquarie University’s E-Learning Centre of Excellence (MELCOE)
Erik Vullings
APAC National Grid Status
• Essentially operational
– core services implemented
• APAC CA and myproxy, VOMRS, GT2, GT4, gridsphere, SRB
– some applications close to ‘production’ mode
– See http://goc.grid.apac.edu.au; http://www.grid.apac.edu.au
• Systems coverage
– users can access ALL systems at APAC partners
• via gateways
• access from the desktop is still needed
– about 4600 processors and hundreds of Tbytes of disk
– around 3 Pbytes of disk-cached HSM storage
Future Strategies
• Expand the user base
– NCRIS, Merit Allocation Scheme, Partners
– Open access to core grid services
• Expand the services
– Workflow engines and tools – Kepler, Taverna
– Data management: metadata support
• Expand the facilities
– Include major data centres
• data from instruments, government agencies
– Include institutional systems and repositories
• Resulting changes:
– Policies: acceptable service provision
– Organisation: coordinated user support
– Architecture: scaling gateways
– Technologies: attribute-based authorisation
Changing User Base
• National Collaborative Research Infrastructure
Strategy
– Ambitious plan to hand out $0.5B of federal money to fund
research infrastructure collaboratively
• Evolving Biomolecular Platforms and Informatics – $50.0M
• Integrated Biological Systems – $40.0M
• Characterisation – $47.7M
• Fabrication – $41.0M
• Biotechnology Products – $35.0M
• Networked Biosecurity Framework – $25.0M
• Optical and Radio Astronomy – $45.0M
• Integrated Marine Observing System – $55.2M
• Structure and Evolution of the Australian Continent – $42.8M
• Terrestrial Ecosystem Research Network – $20.0M
• Population Health and Clinical Data Linkage – $20.0M
• Platforms for Collaboration – $75.0M
Summer in Australia? New Names and Structures
• National Compute Infrastructure (NCI)
– APAC National Facility: >1600-CPU Altix
– Shoulder clusters
• Interoperation and Collaboration Services (ICS): e-Research services
– The old APAC Grid
• Australian National Data Service (ANDS)
– Federation of mass data stores
– Long-term archiving and curation
• National Coordination Council
• Australian Access Federation (AAF)
• AREN – network
Bringing it all together: real applications
Future Strategies
• Expand the user base
– NCRIS, Merit Allocation Scheme, Partners
– Open access to core grid services
• Expand the services
– Workflow engines and tools – Kepler, Taverna
– Data management: metadata support, collections registry
• Expand the facilities
– Include major data centres
• data from instruments, government agencies
– Include institutional systems and repositories
• Resulting changes:
– Policies: acceptable service provision
– Organisation: coordinated user support
– Architecture: scaling gateways
– Technologies: attribute-based authorisation
[Application example figures not reproduced. Source: Office of Integrative Activities, NSF.]
Three “views” of the Grid
“The grid lets me run lots of jobs all over the
place” – Nimrod, Gridbus
“The grid lets me build a workflow that uses
distributed resources” – Kepler, Taverna
“The grid lets me scale my workstation model
to a supercomputer seamlessly” – DEISA
So grid means different things to different
communities
We must deliver production quality services for all of them?
Grid Collaboration
• Beyond data and compute there is the
AccessGrid
• Australia has had a long-term commitment to
the AG
• Small highly dispersed population
– Queensland has the most distributed population in Aust.
• AG is still burdened by its “antique” media
tools, but the concept is essential
– Skype and IP video conferencing are insufficient
Access Grid - ATP, Sydney
1st in Australia… participating in SC-Global (Nov 2001)
Australia’s 2nd AG node - in Qld.
JCU, Townsville
April 2002
Minister Hon Paul Lucas
AccessGrid
• AccessGrid is now very widely available in
Australia
– Most Universities have several nodes
– Extensive use in teaching (AMSI)
• New tools being developed
– HD codecs (Chris Willing, UQ)
– SRB data grid integration with AccessGrid (Atkinson /
Willing)
– International Quality Assurance Program
– Better Multicast / Unicast integration
SRB Browser
Connecting the DataGrid to the AccessGrid
[Screenshot: AG VenueClient with vic video and rat audio, and the SRB Browser running as a shared application; files are moved to and from SRB via the AG data manager.]
• AG Shared Application
• New SRB Java/Python interface library written (now part of SRB)
• All AG clients can share data from the SRB data store (see the example after this list)
• Cross-platform
• Exposes SRB metadata
Nigel Bajema
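For illustration, the command-line equivalent of what the shared SRB Browser gives AG participants, using the standard SRB Scommands; the zone, collection and file names are invented.

# Authenticate to the SRB zone (reads ~/.srb/.MdasEnv), then move files:
Sinit
Sput results/spinning-data.mpg /APACZone/home/jresearcher.apac/ag-demo/
Sls  /APACZone/home/jresearcher.apac/ag-demo        # now visible to every AG client
Sget /APACZone/home/jresearcher.apac/ag-demo/spinning-data.mpg /tmp/
Sexit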
Acknowledgment
• Lindsay Hood, APAC Grid Manager
• Rhys Francis, Former APAC Grid Manager
• David Bannon, VPAC Gateway project
manager
• Rob Woodcock, CSIRO Minerals Exploration