Interactive Data Analysis on the Grid

Interactive Data Analysis on the “Grid”
Tech-X/SLAC/PPDG:CS-11
Balamurali Ananthan
David Alexander
Tony Johnson
Victor Serbo
Presented at Computing in High Energy Physics
Interlaken, Switzerland, September 2004
Focus of our work
Interactive Data Analysis on the Grid


Very quick turnaround (<1 second to 100s of seconds)
Intermediate results presented in real time
Plots update as analysis proceeds
Output from analysis displayed immediately

High degree of interactivity
Change cuts/binning etc. and see immediate results
Goal: seamless interactive computing on the web.
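The interaction model on this slide, where plots update while the analysis is still running, can be illustrated with a histogram that hands snapshots to a listener as it fills. This is only a minimal sketch: `LiveHistogram`, its methods, and the fixed update interval are hypothetical names and choices made for illustration, not part of JAS or AIDA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: a 1D histogram that notifies a listener as it fills. */
public class LiveHistogram {
    private final double min, max;
    private final long[] bins;
    private final int updateEvery;           // publish a snapshot every N fills
    private final Consumer<long[]> listener; // e.g. a plot that redraws itself
    private long fills = 0;

    public LiveHistogram(int nBins, double min, double max,
                         int updateEvery, Consumer<long[]> listener) {
        this.bins = new long[nBins];
        this.min = min;
        this.max = max;
        this.updateEvery = updateEvery;
        this.listener = listener;
    }

    /** Adds one value; values outside [min, max) are not binned. */
    public void fill(double x) {
        if (x >= min && x < max) {
            int i = (int) ((x - min) / (max - min) * bins.length);
            bins[Math.min(bins.length - 1, i)]++; // guard against rounding up
        }
        fills++;
        if (fills % updateEvery == 0) {
            listener.accept(bins.clone()); // hand out a copy, not the live array
        }
    }

    /** Returns a copy of the current bin contents. */
    public long[] snapshot() {
        return bins.clone();
    }

    public static void main(String[] args) {
        List<long[]> updates = new ArrayList<>();
        LiveHistogram h = new LiveHistogram(4, 0.0, 4.0, 2, updates::add);
        double[] data = {0.5, 1.5, 1.7, 3.2, 9.0, 2.1};
        for (double x : data) {
            h.fill(x);
        }
        System.out.println("intermediate updates received: " + updates.size());
    }
}
```

In a real client the listener would trigger a repaint of the plot; here it only collects the snapshots so the behaviour can be inspected.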
Starting Point
JAS2 analysis client supports

Local Analysis
Data, analysis code and GUI client live on same machine

Client-Server Analysis
Data and analysis run on remote machine, GUI client runs on local machine (uses Java RMI as network protocol)
In 2002 we added Grid-based analysis
GUI client runs on local machine
Data and analysis run in parallel on a farm of remote machines
Initial implementation used Globus2 + Java RMI
In all three modes the goal is for the physicist to feel that he is interacting with his local machine


All three modes look almost identical to use
Try to hide as much of the Grid from the end-user as practical
JAS2 Grid Client
Current Project
Builds on Earlier Work
Grid Services based on OGSI/Globus 3
Switch to using WS-RF (Globus 4?) in future


Reuse existing Globus facilities where possible
Define new services if not already available
Design loosely-coupled services to encourage re-use
Separate interface from implementation


Interfaces: Collaborate with CS-11/PPDG/ARDA
Reference Implementation: JAS-DAGS (Dataset Analysis Grid Service)
Use JAS3 as reference analysis client
Currently in development

Plan for initial use for International Linear Collider Simulation Studies
Dataset Catalog Service
First component developed


Interface collaboratively designed as part of PPDG-CS11 project
Aims to separate interface from implementation
We have a reference implementation

Based on Java and simple “in-memory” XML database
Designed to make it easy to put same interface on top of other existing data catalog systems
Has also been deployed as a Clarens service
Dataset Catalog Service
Allows user to “browse” dataset hierarchy
Allows user to “search” using “meta-data” associated with each dataset
Output

Grid Service Handle (GSH) of the Dataset Locator
The Locator service that knows the actual location of the Dataset.

String ID of the Dataset
An opaque string interpreted only by the dataset locator
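The catalog's contract described above (browse or search, then receive a locator handle plus an opaque dataset ID) might look roughly like the following. This is a sketch under stated assumptions, not the CS-11 interface itself: `DatasetCatalog`, `DatasetReference`, `InMemoryCatalog`, the method names, and all paths, handles and metadata values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a dataset catalog; not the actual CS-11 interface. */
public class CatalogSketch {

    /** Result of a lookup: which locator to ask, and what to ask it for. */
    static final class DatasetReference {
        final String locatorHandle; // handle (GSH) of the Dataset Locator service
        final String datasetId;     // opaque string, interpreted only by the locator

        DatasetReference(String locatorHandle, String datasetId) {
            this.locatorHandle = locatorHandle;
            this.datasetId = datasetId;
        }
    }

    /** The interface, kept separate from any particular implementation. */
    interface DatasetCatalog {
        /** Lists the paths of all datasets below a folder. */
        List<String> browse(String folder);

        /** Finds datasets whose metadata has the given key/value pair. */
        List<DatasetReference> search(String key, String value);
    }

    /** Toy in-memory implementation standing in for a real catalog backend. */
    static final class InMemoryCatalog implements DatasetCatalog {
        private final Map<String, DatasetReference> refs = new LinkedHashMap<>();
        private final Map<String, Map<String, String>> metadata = new LinkedHashMap<>();

        void add(String path, DatasetReference ref, Map<String, String> meta) {
            refs.put(path, ref);
            metadata.put(path, meta);
        }

        @Override
        public List<String> browse(String folder) {
            String prefix = folder.endsWith("/") ? folder : folder + "/";
            List<String> paths = new ArrayList<>();
            for (String path : refs.keySet()) {
                if (path.startsWith(prefix)) paths.add(path);
            }
            return paths;
        }

        @Override
        public List<DatasetReference> search(String key, String value) {
            List<DatasetReference> hits = new ArrayList<>();
            for (Map.Entry<String, Map<String, String>> e : metadata.entrySet()) {
                if (value.equals(e.getValue().get(key))) hits.add(refs.get(e.getKey()));
            }
            return hits;
        }
    }

    /** Builds a catalog with made-up entries for demonstration. */
    static InMemoryCatalog demoCatalog() {
        InMemoryCatalog c = new InMemoryCatalog();
        c.add("/demo/sample-a", new DatasetReference("locator-1", "id-a"),
              Collections.singletonMap("energy", "500"));
        c.add("/demo/sample-b", new DatasetReference("locator-1", "id-b"),
              Collections.singletonMap("energy", "1000"));
        c.add("/other/sample-c", new DatasetReference("locator-2", "id-c"),
              Collections.singletonMap("energy", "1000"));
        return c;
    }

    public static void main(String[] args) {
        DatasetCatalog catalog = demoCatalog();
        System.out.println(catalog.browse("/demo"));
        for (DatasetReference r : catalog.search("energy", "1000")) {
            System.out.println(r.locatorHandle + " -> " + r.datasetId);
        }
    }
}
```

Because the client only ever sees the interface and the opaque reference, the same calls could in principle sit on top of a different catalog backend without changes on the client side.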
DAGS
Dataset Analysis Grid Service

Aim to produce complete interactive data analysis system
Loosely based on CS-11 APIs
Migrate from RMI to OGSA in stages to maintain working system at each stage
Key design goals

Only requires Globus (+JavaVM) on worker nodes
Everything else dynamically deployed



Specialized analysis services only need to be installed on specific gateway nodes.
Few services need to be visible outside firewall.
No Grid software on Client node (except Java COG)
DAGS Conceptual Diagram
[Figure: DAGS architecture. Labels recovered from the diagram:]
Client: JAS3 Client, Data Chooser Plugin, Proxy Login Plugin, DAGS client
Services: Dataset Catalog Service, Dataset Locator Service, Index Service, Dataset Analysis Manager Service, Caching Service, Data Splitter Service, Result Merging Service (AIDA based)
Worker Node 1 and Worker Node 2: each with Analysis Server, Reliable File Transfer Service, Managed Job Service
Messages: Dataset query, Dataset ID, Analysis Job Description, Analysis Task, Results
Firewall
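The data flow suggested by the diagram labels (Data Splitter Service, Analysis Servers on worker nodes, Result Merging Service) can be mimicked locally as split, analyze in parallel, merge. The sketch below is an illustration only: threads stand in for worker nodes, plain arrays stand in for AIDA histograms, and every class and method name is hypothetical rather than taken from DAGS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical local stand-in for the split / analyze / merge flow. */
public class SplitMergeSketch {

    /** Splitter stand-in: divides the events into contiguous chunks. */
    static List<double[]> split(double[] events, int parts) {
        List<double[]> chunks = new ArrayList<>();
        int n = events.length;
        for (int p = 0; p < parts; p++) {
            int from = (int) ((long) n * p / parts);
            int to = (int) ((long) n * (p + 1) / parts);
            chunks.add(Arrays.copyOfRange(events, from, to));
        }
        return chunks;
    }

    /** Analysis-server stand-in: histograms one chunk over [min, max). */
    static long[] analyze(double[] chunk, int nBins, double min, double max) {
        long[] bins = new long[nBins];
        for (double x : chunk) {
            if (x >= min && x < max) {
                int i = (int) ((x - min) / (max - min) * nBins);
                bins[Math.min(nBins - 1, i)]++;
            }
        }
        return bins;
    }

    /** Merger stand-in: adds the partial histograms bin by bin. */
    static long[] merge(List<long[]> partials, int nBins) {
        long[] total = new long[nBins];
        for (long[] partial : partials) {
            for (int i = 0; i < nBins; i++) total[i] += partial[i];
        }
        return total;
    }

    /** Runs the whole flow with one thread per simulated worker node. */
    static long[] run(double[] events, int workers,
                      final int nBins, final double min, final double max) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<long[]>> futures = new ArrayList<>();
            for (final double[] chunk : split(events, workers)) {
                Callable<long[]> task = () -> analyze(chunk, nBins, min, max);
                futures.add(pool.submit(task));
            }
            List<long[]> partials = new ArrayList<>();
            for (Future<long[]> f : futures) partials.add(f.get());
            return merge(partials, nBins);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("analysis task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] events = {0.5, 1.5, 1.7, 3.2, 2.1, 0.1};
        System.out.println(Arrays.toString(run(events, 2, 4, 0.0, 4.0)));
    }
}
```

Since histogram merging is a bin-wise sum, the merged result does not depend on how many chunks the data was split into, which is what makes this kind of analysis easy to parallelize.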
Performance
JAS2 system used Java Remote Method Invocation (RMI).
Current system still uses RMI in some areas, but intention is to migrate to OGSA
Performance is a real problem:

Trivial service invocation (AuctionService) over 10 Mbit LAN
All times are for 100 calls, excluding first call

RMI: 96 ms
Globus 3.2 (non-secure): 22 seconds
Globus 3.2 (secure): 112 seconds


Problems may be partly related to Globus implementation, but are clearly also partly fundamental problems with XML encoding/decoding and web-service protocol
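To put the quoted totals in per-call terms: 96 ms, 22 s and 112 s for 100 calls work out to roughly 1 ms, 220 ms and 1.1 s per call, so the Globus 3.2 invocations are on the order of 230 times (non-secure) and 1170 times (secure) slower than RMI for this trivial service. The short program below simply reproduces that arithmetic from the numbers on the slide; the class and method names are illustrative.

```java
/** Per-call cost derived from the totals quoted on the slide. */
public class CallOverhead {

    /** Average time per call in milliseconds. */
    static double perCallMillis(double totalMillis, int calls) {
        return totalMillis / calls;
    }

    public static void main(String[] args) {
        int calls = 100;
        double rmi = perCallMillis(96.0, calls);            // RMI: 96 ms total
        double gt3 = perCallMillis(22_000.0, calls);        // Globus 3.2, non-secure: 22 s
        double gt3Secure = perCallMillis(112_000.0, calls); // Globus 3.2, secure: 112 s

        System.out.println("RMI ms/call:                " + rmi);
        System.out.println("Globus non-secure ms/call:  " + gt3);
        System.out.println("Globus secure ms/call:      " + gt3Secure);
        System.out.println("non-secure vs RMI slowdown: " + Math.round(gt3 / rmi));
        System.out.println("secure vs RMI slowdown:     " + Math.round(gt3Secure / rmi));
    }
}
```

For an interactive client that expects sub-second turnaround, a per-call cost of hundreds of milliseconds leaves room for only a handful of service invocations per user action.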
Possible workarounds
“fast web services”
http://java.sun.com/developer/technicalArticles/WebServices/fastWS/
or “clarens + xml-rpc”
or …
Plans
Deploy Dataset Catalog Interface with some real data sources


International Linear Collider Simulation Data
Some interest in interface to POOL
Deploy full DAGS system and try with real users

First target will be linear collider simulation studies
Work on interoperability with other systems



Clarens/Rendezvous service
gLite?
One goal of switching to OGSI was to use interoperable modules
This requires development of “standard” interfaces which provide for
flexibility in the way in which they will be used
It is unclear that the HEP community has the motivation to do this
Conclusion
We are making progress on developing a Globus 3-based interactive data system

Aim to have usable system by end 2004
Globus/OGSI/WS-RF is certainly not the easiest way to implement interactive data analysis

Performance is a problem
Workarounds exist



not clear if/when this will be addressed by core Globus software
Looking at other technologies for better performance
Interoperability and Component Reuse
Some progress, but so far not as effective as was hoped
Links
DAGS


http://www.slac.stanford.edu/~banantha/dags
http://grid.txcorp.com/dags
CS11

http://www.ppdg.net/pa/ppdg-pa/idat/
JAS3

http://jas.freehep.org/jas3/
AIDA


http://aida.freehep.org/
http://java.freehep.org/JAIDA/
Clarens

http://clarens.sourceforge.net/
Screenshots
Some Screenshots
Starting Work Manager..
Starting Grid Service Manager..
Screenshots (cont…)
Starting MMJFS on the end nodes…
Starting JAS Client..
JAS Client..
Resulting Histogram…