Transcript: APAC Grid

Australian Partnership for Advanced Computing
"providing advanced computing and grid infrastructure for Australian eResearch"

Rhys Francis
Manager, APAC grid program

Partners:
• Australian Centre for Advanced Computing and Communications (ac3) in NSW
• The Australian National University (ANU)
• Commonwealth Scientific and Industrial Research Organisation (CSIRO)
• Interactive Virtual Environments Centre (iVEC) in WA
• Queensland Parallel Supercomputing Foundation (QPSF)
• South Australian Partnership for Advanced Computing (SAPAC)
• The University of Tasmania (TPAC)
• Victorian Partnership for Advanced Computing (VPAC)
APAC Programs
• National Facility Program
  – a world-class advanced computing service
  – currently 232 projects and 659 users (27 universities)
  – major upgrade in capability (1650-processor Altix 3700 system)
• APAC Grid Program
  – integrate the National Facility and Partner Facilities
  – allow users easier access to the facilities
  – provide an infrastructure for Australian eResearch
• Education, Outreach and Training Program
  – increase skills to use advanced computing and grid systems
  – courseware project
  – outreach activities
  – national and international activities
APAC Grid Program structure (diagram): $8M pa in people, plus compute/data resources; about 140 people, >50 full-time equivalents. The program is overseen by a Steering Committee with an Engineering Taskforce and an Implementation Taskforce. APAC Grid Development (Project Leaders, development activities and projects) runs alongside APAC Grid Operation (operational activities), with Research Leaders directing the research activities.
Grid Infrastructure
• Computing Infrastructure
  – Globus middleware
  – certificate authority
  – system monitoring and management (grid operation centre)
• Information Infrastructure
  – resource broker (SRB)
  – metadata management support (Intellectual Property control)
  – resource discovery (see the discovery sketch after this list)
• User Interfaces and Visualisation Infrastructure
  – portals to application software
  – workflow engines
  – visualisation tools
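As a concrete illustration of the resource-discovery piece above, the sketch below queries a Globus MDS2 index (GRIS/GIIS), which publishes resource information over LDAP on port 2135. This is a minimal sketch, not the APAC deployment's actual query: the gateway host name is a hypothetical placeholder and the ldap3 library is assumed to be installed.

    # Minimal sketch: query a Globus MDS2 (GRIS/GIIS) LDAP index for resource records.
    # Assumptions: ldap3 is installed; the gateway host name is hypothetical.
    from ldap3 import Server, Connection, ALL

    MDS_HOST = "gateway.example.edu.au"    # hypothetical gateway running MDS2
    MDS_PORT = 2135                        # default MDS2 port
    BASE_DN = "Mds-Vo-name=local, o=Grid"  # default GRIS base DN

    server = Server(MDS_HOST, port=MDS_PORT, get_info=ALL)
    conn = Connection(server, auto_bind=True)  # anonymous bind, as a GRIS typically allows

    # Pull every published object; a real client would filter on specific object classes.
    conn.search(BASE_DN, "(objectclass=*)", attributes=["*"])
    for entry in conn.entries:
        print(entry.entry_dn)

    conn.unbind()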
Grid Applications
• Astronomy
• High-Energy Physics
• Bioinformatics
• Computational Chemistry
• Geosciences
• Earth Systems Science
Organisation Chart
• APAC Executive Director: John O'Callaghan
• Program Manager: Rhys Francis
• Services Architect: Markus Buchhorn

Infrastructure Support (Middleware) – project leaders:
• Compute Infrastructure, Middleware Services Deployment, CA, VOMS/VOMRS: David Bannon
• Information Infrastructure: Ben Evans
• User Interfaces and Visualisation Infrastructure: Rajesh Chabbra
• Collaboration Services: Chris Willing

Infrastructure Support (Systems) – Gateway Servers Support Team, led by David Bannon:
• covers systems management, application support and the gateway services (Gateway VM, NG1, NG2, NGdata, NGportal, LCG VM)
• middleware deployed at the gateways: GRAM 2/4, SRB, GridFTP, MDS 2/4, GridSphere, MyProxy, AccessGrid (A/G)
• sites listed: ANU, ac3, CSIRO, iVEC, QPSF/UQ, QPSF/JCU, SAPAC, TPAC, VPAC, UoM; associated grid nodes: QPSF/Griffith, QPSF/QUT
• site contacts named: Youzhen Cheng, Bob Smart, Martin Nicholls, John Dalton, David Green, Ashley Wright, David Baldwyn, Darran Carey, Grant Ward, Chris Samuel, Ian Atkinson, Marco La Rosa
Research Activities – projects, research leaders and steering committee (S/C) chairs:
Project | Research Leader | S/C Chair
Astronomy Gravity Wave | Susan Scott | Rachael Webster
Astrophysics portal | Matthew Bailes | Rachael Webster
Australian Virtual Observatory | Katherine Manson | Rachael Webster
Genome annotation | Matthew Bellgard | Mark Ragan
Molecular docking | Rajkumar Buyya | Mark Ragan
Chemistry workflow | Andrey Bliznyuk | Brian Yates
Earth Systems Science workflow | Glenn Hyland | Andy Pitman
Geosciences workflow | Robert Woodcock | Scott McTaggart
EarthBytes | Dietmar Muller | Scott McTaggart
Experimental high energy physics | Glenn Moloney | Tony Williams
Theoretical high energy physics | Paul Coddington | Tony Williams
Remote instrument management | Chris Willing | Bernard Pailthorpe
Experimental High Energy Physics
• Belle Physics Collaboration
– KEK B-factory detector
• Tsukuba, Japan
– Matter/Anti-matter investigations
– 45 Institutions, 400 users worldwide
• 10 TB data currently
– Australian grid for KEK-B data
• testbed demonstrations
• data grid centred on APAC National Facility
• Atlas Experiment
– Large Hadron Collider (LHC) at CERN
• 3.5 PB data per year (now 15 PB pa)
• operational in 2007
– Installing LCG (GridPP), will follow EGEE
Belle Experiment
• Simulated collisions or events
– used to predict what we’ll see (features of data)
– essential to support design of systems
– essential for analysis
• 2 million lines of code
Belle simulations
• Computationally intensive
  – simulate beam particle collisions, interactions, decays
  – all components and materials: 10x10x20 m at 100 µm accuracy
  – tracking and energy deposition through all components
  – all electronics effects (signal shapes, thresholds, noise, cross-talk)
  – data acquisition system (DAQ)
• Need 3 times as many simulated events as real events to reduce statistical fluctuations
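To make the 3x rule of thumb concrete (a standard statistical argument, not stated on the slide): when a measurement compares N_data real events against N_MC simulated events, the two Poisson uncertainties add in quadrature,

    \sigma_{\mathrm{rel}} \;=\; \sqrt{\frac{1}{N_{\mathrm{data}}} + \frac{1}{N_{\mathrm{MC}}}}
                          \;=\; \sqrt{\frac{1}{N_{\mathrm{data}}}\left(1 + \frac{1}{k}\right)},
    \qquad N_{\mathrm{MC}} = k\,N_{\mathrm{data}}.

With k = 3 the simulation inflates the data-only statistical error by sqrt(1 + 1/3) ≈ 1.15, about 15%, whereas k = 1 would inflate it by about 41%. That is why a few times more simulated than real events is the usual target.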
Belle status
• Apparatus at KEK in Japan
• Simulation work done worldwide
• Shared using an SRB federation: KEK, ANU, VPAC, Korea, Taiwan, Krakow, Beijing… (led by Australia!) (see the sketch after this list)
• Previous research work used script-based workflow control; the project is currently evaluating LCG middleware for workflow management
• Testing in progress: LCG job management, APAC grid job execution (2 sites), APAC grid SRB data management (2 sites), with data flow using international SRB federations
• The limitation is international networking
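The SRB federation above is normally driven from the SRB Scommands client tools; the sketch below wraps two of them from Python to list and fetch a dataset. It assumes the Scommands are installed and ~/.srb/.MdasEnv and .MdasAuth are already configured for the local zone; the collection path and file name shown are hypothetical.

    # Minimal sketch: list and retrieve files from an SRB collection via the Scommands.
    # Assumes a valid ~/.srb/.MdasEnv; collection and file names are hypothetical.
    import subprocess

    COLLECTION = "/apac-zone/home/belle.apac/mc-prod"  # hypothetical SRB collection

    subprocess.run(["Sinit"], check=True)              # start an SRB session
    listing = subprocess.run(["Sls", COLLECTION],      # list the collection
                             capture_output=True, text=True, check=True)
    print(listing.stdout)

    # Fetch one (hypothetical) simulated-event file into the current directory.
    subprocess.run(["Sget", f"{COLLECTION}/evtgen-run0001.mdst", "."], check=True)
    subprocess.run(["Sexit"], check=True)              # close the session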
Earth Systems Science Workflow
• Access to Data Products
  – Inter-governmental Panel on Climate Change scenarios of future climate (3 TB)
  – Ocean colour products of the Australasian and Antarctic region (10 TB)
  – 1/8-degree ocean simulations (4 TB)
  – Weather research products (4 TB)
  – Earth systems simulations
  – Terrestrial land surface data
• Grid Services (see the access sketch after this list)
  – Globus-based version of OPeNDAP (UCAR/NCAR/URI)
  – server-side analysis tools for data sets: GrADS, NOMADS
  – client-side visualisation from on-line servers
  – THREDDS (catalogues of OPeNDAP repositories)
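As a minimal sketch of the client side of the OPeNDAP services listed above: the netCDF4-python library can open an OPeNDAP URL directly and subset a variable on the server. The URL and variable name below are hypothetical stand-ins, not actual APAC endpoints.

    # Minimal sketch: subset a remote dataset over OPeNDAP with netCDF4-python.
    # Requires a netCDF4 build with DAP support; URL and variable name are hypothetical.
    from netCDF4 import Dataset

    URL = "http://opendap.example.edu.au/thredds/dodsC/ocean/eighth_degree/temp.nc"

    ds = Dataset(URL)                      # opens the remote dataset lazily
    sst = ds.variables["temperature"]      # hypothetical variable name
    print(sst.dimensions, sst.shape)

    # Only the requested slice is transferred, e.g. one time step over a small
    # index window rather than the full multi-terabyte archive.
    subset = sst[0, 0:100, 0:100]
    print(subset.mean())

    ds.close()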
Workflow Vision
(Diagram: a discovery crawler, analysis toolkit, visualisation and job/data management sit over OPeNDAP servers and a digital library hosted across iVEC, SAPAC, ac3, VPAC and the APAC NF.)
Workflow Components
(Diagram, layered: a GridSphere portal hosts discovery, get-data, visualisation and analysis-toolkit portlets; these call web services – Web Map Service, Web Coverage Service and Web Processing Service; the application layer comprises a Live Access Server (LAS), an OPeNDAP server and processing applications with their configuration; the data layer holds the digital repository and metadata, reached via an OAI library API (Java) and a metadata crawler; the hardware layer provides the compute engine. A sample service request follows.)
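To illustrate the web-services layer above, the sketch below issues a standard WMS 1.1.1 GetMap request. The server URL and layer name are hypothetical placeholders rather than real APAC services; only standard WMS parameters are used.

    # Minimal sketch: request a map image from a Web Map Service (WMS 1.1.1).
    # The endpoint and layer are hypothetical.
    import requests

    WMS_URL = "http://portal.example.edu.au/wms"   # hypothetical WMS endpoint

    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": "sea_surface_temperature",       # hypothetical layer name
        "styles": "",
        "srs": "EPSG:4326",
        "bbox": "100,-50,180,0",                   # lon/lat box over Australasia
        "width": 800,
        "height": 500,
        "format": "image/png",
    }

    resp = requests.get(WMS_URL, params=params, timeout=60)
    resp.raise_for_status()
    with open("sst_map.png", "wb") as fh:
        fh.write(resp.content)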
OPeNDAP Services
• AC3 Facility (Sydney): land surface datasets
• APAC NF (Canberra): international IPCC model results (10-50 TB); TPAC 1/8-degree ocean simulations (7 TB)
• Met Bureau Research Centre (Melbourne): near real-time LAPS analyses products (<1 GB); sea- and sub-surface temperature products
• CSIRO HPSC (Melbourne): IPCC CSIRO Mk3 model results (6 TB)
• CSIRO Marine Research (Hobart): ocean colour products and climatologies (1 TB); satellite altimetry data (<1 GB); sea-surface temperature product
• TPAC and ACE CRC (Hobart): NCEP2 (150 GB); WOCE3 Global (90 GB); Antarctic AWS (150 GB); climate modelling (4 GB); sea-ice simulations, 1980-2000
Australian Virtual Observatory
(Diagram: a user issues a get( ) request against a registry and the AVD; Simple Spectral Access (SSA) and Simple Image Access (SIA) services front the data, which is held in SRB with metadata in the SRB MCAT. A sample SIA query follows.)
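For reference, an SIA (Simple Image Access) query is just an HTTP GET with positional parameters; the sketch below uses the standard SIA 1.0 parameters against a hypothetical service URL and only prints the start of the VOTable response.

    # Minimal sketch: query a virtual-observatory Simple Image Access (SIA 1.0) service.
    # The service URL is hypothetical; POS/SIZE/FORMAT are standard SIA 1.0 parameters.
    import requests

    SIA_URL = "http://avo.example.edu.au/sia"   # hypothetical SIA service endpoint

    params = {
        "POS": "187.25,2.05",      # RA,Dec in decimal degrees (J2000)
        "SIZE": "0.2",             # search box size in degrees
        "FORMAT": "image/fits",
    }

    resp = requests.get(SIA_URL, params=params, timeout=60)
    resp.raise_for_status()

    # The response is a VOTable (XML) listing matching images and their access URLs;
    # a real client would parse it (e.g. with astropy.io.votable) rather than print it.
    print(resp.text[:500])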
APAC Grid Geoscience
• Conceptual models
• Databases
• Modeling codes
• Mesh generators
• Visualization packages
• People
• High Performance Computers
• Mass Storage Facilities
(Diagram: the Earth system from Atmosphere, Biosphere and Oceans through weathering, Sediments, Upper and Lower Crust, Oceanic Crust, Subcontinental Lithosphere and Oceanic Lithosphere to the Upper Mantle, Deep Mantle and Core.)
Mantle Convection
• Observational Databases
  – access via SEE Grid Information Services standards
• EarthBytes 4D Data Portal
  – allows users to track observations through geological time and use them as model boundary conditions and/or to validate process simulations
• Mantle Convection
  – solved via Snark on HPC resources
• Modeling Archive
  – stores the problem descriptions so they can be mined and audited
Trial application provided by:
• D. Müller (Univ. of Sydney)
• L. Moresi (Monash Univ./MC2/VPAC)
Workflows and services
(Diagram: the user logs in, edits a problem description, runs a simulation, monitors the job and searches the results archive. These actions pass through AAA, a service registry, a resource registry, a data management service and a job management service, which in turn drive the EarthBytes service and the Snark service on HPC resources. Data sources include the Geology WA, Geology SA, Rock Properties WA and Rock Properties NSW databases, a local repository, a results archive and an HPC repository. A sample submission sketch follows.)
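The "run simulation" step above ultimately lands on an HPC system through the grid middleware. Since the APAC grid agreed on a GT2 base (see the key steps later in this transcript), a plausible minimal sketch is a pre-WS GRAM submission wrapped from Python; the gateway contact string, queue manager and Snark paths below are hypothetical, and a valid grid proxy is assumed.

    # Minimal sketch: submit a (hypothetical) Snark mantle-convection run via GT2 GRAM
    # and poll its status. Assumes the Globus 2 client tools and a valid grid proxy.
    import subprocess
    import time

    CONTACT = "ng1.example.edu.au/jobmanager-pbs"   # hypothetical gateway GRAM contact
    SNARK = "/apps/snark/bin/snark"                 # hypothetical install path
    INPUT = "/home/user/models/plate-run.xml"       # hypothetical problem description

    # globus-job-submit prints a job contact URL that identifies the submitted job.
    submit = subprocess.run(
        ["globus-job-submit", CONTACT, "-np", "8", SNARK, INPUT],
        capture_output=True, text=True, check=True)
    job_contact = submit.stdout.strip()
    print("submitted:", job_contact)

    # Poll until the job leaves the queue (PENDING/ACTIVE -> DONE or FAILED).
    while True:
        status = subprocess.run(["globus-job-status", job_contact],
                                capture_output=True, text=True, check=True)
        state = status.stdout.strip()
        print("state:", state)
        if state in ("DONE", "FAILED"):
            break
        time.sleep(60)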
Status update: APAC National Grid
Key steps
• Implementation of our own CA
• Adoption of VDT middleware packaging
• Agreement to a GT2 base for 2005, GT4 in 2006
• Agreement on portal implementation technology
• Adoption of federated SRB as base for shared data
• Development of gateways for site grid architecture
• Support for inclusion of 'associated' systems
• Implementation of VOMS/VOMRS (see the sign-on sketch after this list)
• Development of user and provider policies
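As an illustration of how the CA and VOMS/VOMRS pieces come together for a user, the sketch below obtains a VOMS-extended proxy from Python using the standard client commands. The VO name is a hypothetical placeholder, and the user's certificate is assumed to be in ~/.globus and signed by the APAC CA.

    # Minimal sketch: single sign-on to the grid with a VOMS-extended proxy.
    # Assumes the VOMS client tools are installed; the VO name is hypothetical.
    import subprocess

    # Create a 24-hour proxy certificate carrying the user's VO membership and roles.
    # (grid-proxy-init would create a plain proxy without VOMS attributes.)
    subprocess.run(["voms-proxy-init", "-voms", "apacgrid", "-valid", "24:00"], check=True)

    # Inspect what was issued: subject, VO attributes and remaining lifetime.
    subprocess.run(["voms-proxy-info", "-all"], check=True)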
VDT components
Two VDT releases were in use; where a component appeared in both, its versions are shown as earlier → later.
• Apache HTTPD v2.0.54; Apache Tomcat v4.1.31 and v5.0.28
• ClassAds v0.9.7
• Clarens v0.7.2; jClarens v0.6.0; jClarens Web Service Registry v0.6.0
• Condor/Condor-G v6.6.7 → v6.7.12, with the VDT Condor configuration script
• DOE and LCG CA Certificates v4 (includes LCG 0.25 CAs)
• DRM v1.2.9
• EDG CRL Update v1.2.5; EDG Make Gridmap v2.1.0
• Fault Tolerant Shell (ftsh) v2.0.5 → v2.0.12
• Generic Information Provider v1.2 (2004-05-18); GLUE Information Providers (CVS version 1.79, 4-April-2004)
• gLite CE Monitor v1.0.2
• GLUE Schema v1.1 (extended version 1) → v1.2 draft 7
• Globus Toolkit v2.4.3 + patches → pre-web-services v4.0.1 + patches and web-services v4.0.1, with the VDT Globus configuration script
• Grid User Management System (GUMS) v1.1.0
• GriPhyN Virtual Data System (containing Chimera and Pegasus) v1.2.14 → Virtual Data System v1.4.1
• GSI-Enabled OpenSSH v3.4 → v3.5
• Java SDK v1.4.2_06 → v1.4.2_08
• JobMon v0.2
• KX509 v20031111
• MonALISA v1.2.12 → v1.2.46
• MyProxy v1.11 → v2.2
• MySQL v4.0.25
• Nest v0.9.7-pre1
• Netlogger v2.2 → v3.2.4
• PPDG Cert Scripts v1.6
• PRIMA Authorization Module v0.3
• PyGlobus v1.0.6 → vgt4.0.1-1.13
• RLS v2.1.5 → v3.0.041021
• SRM Tester v1.0
• UberFTP v1.3 → v1.15
• VOMS v1.6.7; VOMS Admin v1.1.0-r0 (client 1.0.7, interface 1.0.2, server 1.1.2)
Our most important design decision
• Installing Gateway Servers at all grid sites, using VM technology to support multiple grid stacks
• High bandwidth, dedicated private networking (V-LAN) between grid sites
• Gateways will support GT2, GT4, LCG/EGEE, data grid (SRB etc.), production portals, development portals and experimental grid stacks
(Diagram: at each site the clusters and datastore connect through a local Gateway Server; the gateways are linked to each other over the V-LAN.)
Gateway Systems
• Support the basic operation of the APAC National Grid and translate grid protocols into site-specific actions
  – limit the number of systems that need grid components installed and managed
  – enhance security: many grid protocols and associated ports only need to be open between the gateways
  – in many cases only the local gateways need to interact with site systems
  – support roll-out and control of the production grid configuration
  – support production and development grids and local experimentation using virtual machine implementation
Grid pulse – every 30 minutes (status page: http://goc.vpac.org/)
• NG1 – Globus Toolkit 2 services: five gateway heartbeats reported (1 down, 4 up), covering ANU, iVEC, VPAC and other sites
• NG2 – Globus Toolkit 4 services: five gateway heartbeats (2 down, 3 up), covering iVEC, SAPAC (down), VPAC and other sites
• NGDATA – SRB and GridFTP: five gateway heartbeats (1 down, 4 up), covering ANU, iVEC, VPAC (down) and other sites
• NGLCG – special physics stack: two gateway heartbeats, both up, including VPAC
• NGPORTAL – Apache/Tomcat: two gateway heartbeats (1 down, 1 up), covering iVEC and VPAC
(A minimal poller sketch follows.)
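A gateway heartbeat of this kind can be approximated with a simple TCP reachability check against the standard service ports; the sketch below is a minimal stand-in for such a poller, not the GOC's actual implementation. Host names are hypothetical, and the ports are the common defaults (2119 for pre-WS GRAM, 2811 for GridFTP, 8443 for a WS container).

    # Minimal sketch: poll gateway service ports and report Up/Down, roughly mimicking
    # the half-hourly "grid pulse". Hosts are hypothetical; ports are common defaults.
    import socket

    GATEWAYS = {
        "ng1.example.edu.au": [2119],      # GT2 GRAM gatekeeper
        "ng2.example.edu.au": [8443],      # GT4 web-services container (https)
        "ngdata.example.edu.au": [2811],   # GridFTP
    }

    def port_open(host, port, timeout=5.0):
        """Return True if a TCP connection to host:port succeeds within timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    for host, ports in GATEWAYS.items():
        ok = all(port_open(host, p) for p in ports)
        print(f"{host}: {'Gateway Up' if ok else 'Gateway Down'}")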
A National Grid
+3500 processors, +3 PB near-line storage
(Map: Perth – iVEC, CSIRO; Adelaide – SAPAC; Melbourne – VPAC, CSIRO; Hobart – TPAC, CSIRO; Canberra – ANU; Sydney – ac3; Brisbane and Townsville – QPSF; sites linked by the GrangeNet backbone, the Centie/GrangeNet link and AARNet links.)
Significant Resource Base
Mass stores (15 TB cache, 200+ TB holdings, 3 PB capacity)
• ANU 5+1300 TB, CSIRO 5+1300 TB, plus several 70-100 TB stores
Compute systems (aggregate 3500+ processors), with memory as listed:
• Altix, 1,680 x 1.6 GHz Itanium-II – 3.6 TB memory
• NEC, 168 SX-6 vector cpus – 1.8 TB memory
• IBM, 160 Power 5 cpus – 432 GB memory
• 2 x Altix, 160 x 1.6 GHz Itanium-II – 160 GB memory
• 2 x Altix, 64 x 1.5 GHz Itanium-II – 120 GB memory
• Altix, 128 x 1.3 GHz Itanium-II – 180 GB memory
• 374 x 3.06 GHz Xeon – 374 GB memory
• 258 x 2.4 GHz Xeon – 258 GB memory
• 188 x 2.8 GHz Xeon – 160 GB memory
• 168 x 3.2 GHz Xeon – 224 GB memory
• 152 x 2.66 GHz P4 – 153 GB memory
Disk and interconnect notes as listed on the slide: 120 TB disk; 22 TB disk; NUMA; 5 TB disk, NUMA; Gigabit Ethernet; Myrinet; Myrinet; GigE, 28 with InfiniBand; 16 TB disk, GigE.
Functional decomposition
Columns of the decomposition: Resources, Data, Compute, Users, Monitoring, Constraints, Activities, Interfaces, Progress.
Capabilities placed in the matrix: monitoring; authorisation (policy and enforcement); global resource allocation and scheduling; command-line access to resources; resource discovery; VO management (rights, shares, delegations); workflow processing (job execution); portals and workflow; grid interfaces; resource availability; accounting; application development; portal for grid management (GOC); data movement; queues; resource registration; configuration management; data and metadata management (curation); AccessGrid interaction; files, DBs, streams; binaries, libraries, licenses; authentication (identity management); reporting, analysis and summarisation; 3rd-party GUIs for applications and activities; history and auditing; access services.
Underpinning layers: grid staging and execution; operating systems and hardware; firewalls, NATs and physical networks; security: agreements, obligations, standards, installation, configuration, verification.
APAC National Grid
One virtual system of computational facilities: iVEC, SAPAC, QPSF, ac3, ANU, CSIRO, TPAC, VPAC and the APAC National Facility.
• Basic Services
  – single 'sign-on' to the facilities
  – portals to the computing and data systems
  – access to software on the most appropriate system
  – resource discovery and monitoring