Interactive Data Analysis on the Grid
with JAS and Globus
David Alexander, Brian Miller, & John Exby
Tech-X Corporation (www.techxhome.com)
Boulder, Colorado
Tony Johnson, Massimiliano Turri, & Booker Bense
Stanford Linear Accelerator Center
Menlo Park, California
Supported by U.S. Department of Energy
Small Business Innovation Research Grant DE-FG03-02ER83556
and Stanford Linear Accelerator Center
TechXHome.com
Project Overview
• Started with Java Analysis Studio (JAS)
– Has distributed analysis system based on RMI
• Set up test grids on Linux clusters
– Used Globus Toolkit 2.0
– Each node had GRAM & GridFTP servers and a Java Runtime Environment
• Wrote a JAS grid plug-in
– Used Java CoG Kit 0.9
• Demonstrated at SC2002
– Ran against both a remote and an on-site cluster
Java Analysis Studio (JAS)
jas.freehep.org
• Open source application
– Built for interactive data analysis, but flexible & modularized
• Publication quality plotting facilities
• User writes Java code to analyze data
Java Analysis Studio (JAS)
jas.freehep.org
• Abstracted data source interface
– Modules are written to work with a variety of file formats (PAW, HIPPO, AIDA, ROOT, ODBC, flat files, SIO, HEP)
• Distributed system available
• Versatile & well used in high-energy physics
– Pure Java (Portable, Web Start installation & upgrade)
– Flexible topology (stand-alone, client/server, cluster)
– Integration w/ BaBar, Geant4, Wired
Design Ideas & Added Features
•Goal: clustered deployment, launch, & federation
•Special JAS Job use
•Minimal prerequisites:
–Bare grid: Globus, Java, nothing else
–Heterogeneous cluster
–Off-grid (or not) client, data, codebase
–Clients don’t need to be superusers
•Optional background deployment
•Single sign on
About Resource Discovery
• Resource discovery
– Software needs location of data files
– Software needs location of Java-enabled hosts
– Pluggable LDIF source (MDS, URL of text file)
• Community Authorization Service
– Fine-grained access control
– Is itself a form of resource discovery
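The pluggable LDIF source above could be read along the following lines. This is only a minimal sketch: the class name `LdifHostSource` and the attribute names `hostName` and `javaHome` are assumptions made for illustration, not the attributes published by MDS or used by the JAS grid plug-in, and folded lines and base64 values are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LdifHostSource {

    /** Parses simple LDIF text into one attribute map per entry. */
    static List<Map<String, String>> parse(String ldif) {
        List<Map<String, String>> entries = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        // Accept both Unix and Windows line endings.
        for (String line : ldif.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {            // a blank line closes an entry
                if (!current.isEmpty()) {
                    entries.add(current);
                    current = new LinkedHashMap<>();
                }
                continue;
            }
            if (line.startsWith("#")) continue;     // LDIF comment
            if (line.startsWith(" ")) continue;     // folded line: skipped in this sketch
            int colon = line.indexOf(':');
            if (colon < 0) continue;                // not an "attribute: value" line
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            current.putIfAbsent(key, value);        // keep only the first value per attribute
        }
        if (!current.isEmpty()) entries.add(current);
        return entries;
    }

    /** Returns the host names of entries that advertise a Java runtime. */
    static List<String> javaHosts(String ldif) {
        List<String> hosts = new ArrayList<>();
        for (Map<String, String> entry : parse(ldif)) {
            // "hostName" and "javaHome" are hypothetical attribute names.
            if (entry.containsKey("hostName") && entry.containsKey("javaHome")) {
                hosts.add(entry.get("hostName"));
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        String ldif = String.join("\n",
            "dn: host=node1,o=grid",
            "hostName: node1.example.org",
            "javaHome: /usr/java",
            "",
            "dn: host=node2,o=grid",
            "hostName: node2.example.org");
        System.out.println(javaHosts(ldif));        // prints [node1.example.org]
    }
}
```

Because the parser only sees text, the same code would serve whether the LDIF came from a directory query or from a text file fetched by URL.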
Move code to data with GridFTP
•Location transparency
–User sees data sets
–Could also have user choice
•Automatic deployment of JAS
–Multi-threaded task set
–Verify code version; GridFTP codebase to node if new
–GridFTP/link data to user sandbox
–Deploy control and catalog servers only on cluster head node
–Worker nodes wait for catalog server to run
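The version check and multi-threaded task set described above might be sketched as follows. The `NodeTransport` interface and the in-memory implementation are hypothetical stand-ins; a real transport would presumably wrap GridFTP transfers, whose API is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DeploySketch {

    /** Hypothetical transport; a real one would wrap GridFTP calls. */
    interface NodeTransport {
        String remoteVersion(String host);          // null if nothing is installed
        void upload(String host, String version);   // copy the codebase to the host
    }

    /** In-memory stand-in used only to make the sketch runnable. */
    static class InMemoryTransport implements NodeTransport {
        final Map<String, String> installed = new ConcurrentHashMap<>();
        public String remoteVersion(String host) { return installed.get(host); }
        public void upload(String host, String version) { installed.put(host, version); }
    }

    /** Checks all hosts in parallel; returns the hosts that needed an upload. */
    static List<String> deploy(List<String> hosts, String version, NodeTransport transport) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String host : hosts) {
                results.add(pool.submit(() -> {
                    // Skip the transfer when the node already has this version.
                    if (version.equals(transport.remoteVersion(host))) return false;
                    transport.upload(host, version);
                    return true;
                }));
            }
            List<String> updated = new ArrayList<>();
            for (int i = 0; i < hosts.size(); i++) {
                if (results.get(i).get()) updated.add(hosts.get(i));
            }
            return updated;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("deployment failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        InMemoryTransport transport = new InMemoryTransport();
        transport.installed.put("node1", "1.0");    // already current
        transport.installed.put("node2", "0.9");    // stale
        List<String> hosts = List.of("node1", "node2", "node3");
        System.out.println(deploy(hosts, "1.0", transport)); // prints [node2, node3]
    }
}
```

Results are collected in host order, so the returned list is deterministic even though the checks run concurrently.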
Launch Application with GlobusRun
•Automatic launch of Java servers
–Java Data Servers are run on specified JRE-enabled nodes
•Special Grid Job is now started (exit the Wizard)
•Code loaded into client or written in editor
–Compiled
–Automatically distributed to Java Data Servers
–Results (stdout, stderr, & histograms) sent back
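A launch through GlobusRun is driven by a Globus RSL job description. The sketch below builds one for starting a Java server. It assumes the standard RSL attributes `executable`, `directory`, and `arguments`; the paths and jar name are placeholders, and the plug-in's actual job description is not given in these slides.

```java
class RslSketch {

    /** Quotes an RSL value; embedded double quotes are doubled. */
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Builds an RSL string that runs a Java executable with the given arguments. */
    static String javaServerRsl(String javaPath, String directory, String... args) {
        StringBuilder rsl = new StringBuilder("&");
        rsl.append("(executable=").append(quote(javaPath)).append(")");
        rsl.append("(directory=").append(quote(directory)).append(")");
        rsl.append("(arguments=");
        for (int i = 0; i < args.length; i++) {
            if (i > 0) rsl.append(' ');             // arguments are space separated
            rsl.append(quote(args[i]));
        }
        rsl.append(")");
        return rsl.toString();
    }

    public static void main(String[] args) {
        // Placeholder paths and jar name, for illustration only.
        System.out.println(javaServerRsl("/usr/bin/java", "/tmp/jas", "-jar", "server.jar"));
        // prints &(executable="/usr/bin/java")(directory="/tmp/jas")(arguments="-jar" "server.jar")
    }
}
```

A string like this could be submitted once per JRE-enabled node, with the host contact supplied separately to the job submission call.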
A Few More Impressive Features
•User can stop analysis, change code, & restart.
•Distributed debugging can catch individual node failures.
•Histogram re-bin slider is surprisingly responsive.
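For readers unfamiliar with re-binning, the operation behind such a slider can be illustrated by merging adjacent bins of an already filled histogram. This is a generic sketch with a hypothetical `rebin` helper, not the JAS implementation.

```java
import java.util.Arrays;

class RebinSketch {

    /** Merges every `factor` adjacent bins; a shorter final group is kept as its own bin. */
    static long[] rebin(long[] counts, int factor) {
        if (factor < 1) throw new IllegalArgumentException("factor must be >= 1");
        int newLength = (counts.length + factor - 1) / factor;  // round up
        long[] merged = new long[newLength];
        for (int i = 0; i < counts.length; i++) {
            merged[i / factor] += counts[i];
        }
        return merged;
    }

    public static void main(String[] args) {
        long[] fine = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(rebin(fine, 2)));    // prints [3, 7, 5]
    }
}
```

Merging existing bin counts touches only the histogram, not the underlying events, which is one plausible reason such an operation can feel fast.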
Headaches and Issues
•Versions of Globus vs. Java CoG Kit
•CoG properties configuration
•Client & server clocks disagree
•MS-Windows text line breaks
•Abandoned jobs
•Firewalls
Future Ideas
•Upgrade to Globus Toolkit 3
•Pre-install code on cluster head or portal machine and deploy from there
•Use more grid services (Condor, Replica)
•Implement interfaces or service descriptions from PPDG CS-11 group
Further Information on JAS
For the latest on JAS, see the 3 pm Category 9 paper:
JAS3 - A general purpose data analysis framework for HENP and beyond.
CONTACTS
David Alexander, [email protected]
Brian Miller, [email protected]
Tony Johnson, [email protected]
Massimiliano Turri, [email protected]
Java Analysis Studio, http://jas.freehep.org