JAS – Distributed Data Analysis
Grid Enabled Analysis Workshop
Caltech - June 23-25, 2003
Contents
JAS2
– History
– Client-server mode
– JAS2 and the Grid
JAS3
– What’s new
– JAS3 and AIDA
– Plans for Gridification
JAS – Distributed Data Analysis
June 2003
JAS History
First version of JAS2 released in 2000.
Incremental improvements released over time.
JAS2 History – Use Cases
With WIRED event display
Online Monitoring
JAS2 History – Use Cases
Custom Applications
Web Servlets
JAS Client-Server Mode
[Diagram. Labels: GUI; Java Compiler; Debugger; Experiment Extensions + (Event Display); Data Analysis Engine; User’s Java Code; Padded Cell; DATA]
Distributed Analysis System: Goals
Prototype for GRID enabled JAS analysis
Run analysis on a farm of machines
Use multiple CPUs in parallel for CPU-intensive analysis
Access multiple I/O channels for data-intensive analysis
Use standard JAS (Client) as if running a local job
Get interactive feedback
Create analysis modules (code)
Control job execution
View results (Plots/Histograms)
Access distributed datasets as if they were local datasets
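The farm goal above can be illustrated with a minimal sketch, assuming each worker fills a partial histogram over its own slice of the data and the client merges the partial results. `FarmSketch`, `Hist`, `analyze` and `runOnFarm` are hypothetical names invented for this illustration, and a local thread pool stands in for the farm of machines; this is not the JAS implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FarmSketch {
    static final int N_BINS = 10;
    static final double LO = 0.0, HI = 10.0;

    /** Minimal fixed-width 1D histogram (hypothetical; not a JAS or AIDA class). */
    static final class Hist {
        final long[] bins = new long[N_BINS];

        void fill(double x) {
            if (Double.isNaN(x) || x < LO || x >= HI) return; // out-of-range values are dropped here
            bins[(int) ((x - LO) / (HI - LO) * N_BINS)]++;
        }

        /** Bin counts are additive, so partial histograms can simply be summed. */
        void add(Hist other) {
            for (int i = 0; i < bins.length; i++) bins[i] += other.bins[i];
        }

        long entries() {
            long n = 0;
            for (long b : bins) n += b;
            return n;
        }
    }

    /** One "worker": analyzes its own slice of the data and returns a partial histogram. */
    static Hist analyze(double[] slice) {
        Hist h = new Hist();
        for (double x : slice) h.fill(x);
        return h;
    }

    /** Submits one task per slice to a thread pool (standing in for the farm), then merges. */
    static Hist runOnFarm(List<double[]> slices, int nWorkers) {
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);
        try {
            List<CompletableFuture<Hist>> partials = new ArrayList<>();
            for (double[] slice : slices) {
                partials.add(CompletableFuture.supplyAsync(() -> analyze(slice), pool));
            }
            Hist total = new Hist();
            for (CompletableFuture<Hist> p : partials) total.add(p.join());
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<double[]> slices = List.of(
                new double[] {0.5, 1.5, 1.7}, new double[] {1.2, 9.9}, new double[] {4.0});
        Hist h = runOnFarm(slices, 3);
        System.out.println("entries=" + h.entries() + " bin1=" + h.bins[1]);
    }
}
```

Because the merge is a plain sum, a client could in principle redraw the plot each time a partial result arrives, which is one way to get the interactive feedback listed in the goals.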
Distributed Analysis System: Architecture
[Architecture diagram, top to bottom: JAS Data Servers (several) | Network | Control Servers | Network | JAS Clients (several) | Users. A Catalog Server also appears in the diagram.]
JAS2 – GRID interface (Tech-X)
JAS3 Overview
A completely new version of JAS
Design based on an Application Shell into which many (optional) modules can be plugged
Highly customizable for different application domains
– HEP/Astrophysics/Other
– DST analysis/Online Monitoring/GRID analysis
– Experiment/User specific modules
Modules can be updated independently of shell
– Possible to release bug fixes fast
Includes support for programming in many languages
Scripting: Python, Pnuts, Dynamic Java, …
– Command prompt
Java (compiled)
Analysis (histograms, tuples, fitting) based on AIDA standard
Not technically backwards compatible with JAS2, but migration is straightforward.
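The shell-plus-modules design can be sketched as follows. This is only an illustration under assumptions: the `Plugin` interface, the `Shell` class and the `tuple-explorer` module are hypothetical names, and the real JAS3 plugin API is not shown in these slides. The point is that the shell depends only on a small interface, so a module can be replaced by a newer version without touching the shell.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShellSketch {
    /** Hypothetical plugin contract (not the actual JAS3 plugin API). */
    interface Plugin {
        String name();
        String version();
        void start(Shell shell);
    }

    /** The shell knows only the Plugin interface, so modules can be added or swapped freely. */
    static final class Shell {
        private final Map<String, Plugin> plugins = new LinkedHashMap<>();
        private final Map<String, Runnable> commands = new LinkedHashMap<>();

        /** Installs a plugin, replacing any earlier version registered under the same name. */
        void install(Plugin p) {
            plugins.put(p.name(), p);
            p.start(this);
        }

        /** Called by plugins to contribute a command (for example a menu item). */
        void addCommand(String label, Runnable action) {
            commands.put(label, action);
        }

        String versionOf(String pluginName) {
            Plugin p = plugins.get(pluginName);
            return p == null ? null : p.version();
        }

        boolean hasCommand(String label) {
            return commands.containsKey(label);
        }
    }

    /** Example module that contributes a single command to the shell. */
    static Plugin tuplePlugin(String version) {
        return new Plugin() {
            public String name() { return "tuple-explorer"; }
            public String version() { return version; }
            public void start(Shell shell) {
                shell.addCommand("Open Tuple Explorer",
                        () -> System.out.println("tuple explorer " + version));
            }
        };
    }

    public static void main(String[] args) {
        Shell shell = new Shell();
        shell.install(tuplePlugin("1.0"));
        shell.install(tuplePlugin("1.1")); // bug-fix release of one module; the shell is unchanged
        System.out.println(shell.versionOf("tuple-explorer"));
    }
}
```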
AIDA Overview
AIDA = Abstract Interfaces for Data Analysis
Covers key areas for data analysis
Histograms, Tuples, Fitting, Data Points, Plotting, Management
Developed collaboratively at a series of workshops by groups at CERN, LAL and SLAC.
Next workshop: June 30 to July 4 at CERN
Interfaces developed for C++ and Java (and maybe Python?)
Several implementations/tools available
Anaphe/Lizard/LCG PI – CERN
Open Scientist – LAL
JAIDA/JAS/AIDAJNI – SLAC
JAS3 and AIDA
JAS3 has adopted AIDA for analysis
AIDA allows us to leverage experience and skill of other developers
AIDA is functionally more complete than JAS2 analysis package
AIDA allows JAS to exchange data with other AIDA tools
AIDA provides bridge to C++ programs (e.g. Geant4)
AIDA encourages creativity and innovation
JAS3 HEP Analysis tools based on JAIDA
JAIDA = Java implementation of AIDA
JAIDA is part of FreeHEP library
Usable as standalone library for any Java Application
AIDAJNI = Interface between C++ and Java AIDA
Allows C++ programs to use JAIDA, JAS3
JAS3, AIDA and C++
[Diagram. Labels: C++ program; AIDA; C++ AIDA Implementation; AIDA-JNI; .aida file (XML); Java program; AIDA; JAIDA; JAS3]
JAS3 and AIDA
JAS3 supports all AIDA functionality, including
Histograms (includes arithmetic, projections, etc.)
Clouds (unbinned histograms, scatterplots)
Plotter
Tuples
Fitting – AIDA interfaces allow for multiple fitters
Uncmin: pure Java minimizer
Minuit: Fortran, called via the Java Native Interface (JNI)
IO
AIDA XML, PAW, Root
JAS3 supports user interaction with AIDA in three ways
Scripting (Pnuts, Python etc)
Compiled (Java) code
GUI – Plotting, Fitting, Cuts etc.
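The idea behind a cloud (an unbinned histogram) can be sketched in a few lines: the container keeps every filled value, so the binning can be chosen after the data has been seen. `CloudSketch` and `Cloud1D` below are hypothetical stand-ins written for this illustration; they are not the AIDA interfaces, and the choice of range (minimum to maximum of the data) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CloudSketch {
    /** Hypothetical unbinned container: it simply remembers every value filled. */
    static final class Cloud1D {
        private final List<Double> values = new ArrayList<>();

        void fill(double x) {
            values.add(x);
        }

        int entries() {
            return values.size();
        }

        /** Bins the stored values into nBins equal-width bins spanning [min, max]. */
        long[] toHistogram(int nBins) {
            long[] bins = new long[nBins];
            if (values.isEmpty()) return bins;
            double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
            for (double v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            for (double v : values) {
                // If all values are equal there is no range, so everything goes in bin 0.
                int i = hi > lo ? (int) ((v - lo) / (hi - lo) * nBins) : 0;
                bins[Math.min(i, nBins - 1)]++; // the maximum value lands in the last bin
            }
            return bins;
        }
    }

    public static void main(String[] args) {
        Cloud1D cloud = new Cloud1D();
        for (double v : new double[] {1.0, 2.0, 2.5, 4.0, 5.0}) cloud.fill(v);
        // The same data can be rebinned at will because nothing was lost at fill time.
        System.out.println(Arrays.toString(cloud.toHistogram(4)));
        System.out.println(Arrays.toString(cloud.toHistogram(2)));
    }
}
```

The trade-off is memory: a cloud grows with the number of entries, while a binned histogram stays at a fixed size.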
JAS3 Scripting
JAS3 has multi-language OO scripting support
Command line, Console, Editor
Major components (e.g. AIDA) have scripting interfaces
Currently have plugins to support
Pnuts – syntax almost identical to Java, fast, well documented and feature complete
Python (using Jython)
More scripting languages can be added
Not restricted to Java implementations (e.g. could use C-Python, JPE)
JAS3 Lightning Tour
Tour designed to give you an overview of the capabilities of JAS3; you can try them out for yourself this afternoon.
Welcome Page: gives initial info and links to example scripts and programs
Memory monitor
Opening Files
Use file menu
Drag from explorer
Graphical Interface to AIDA
Histograms, Clouds, Tuples all presented in AIDA tree
.aida files, .hbook files, .root files all presented as AIDA objects
Drag items onto page, or use (popup) menus
Printing
Can send individual plots or full page direct to printer
Or copy/paste into Word, PowerPoint etc.
Or save as PS, EPS, PDF, SWF, SVG, PNG, GIF…
Java Editor, Compiler and Loader
Tree shows loaded programs
Built-in Java compiler
Built-in editor for writing analysis code
Unlike JAS2, which only supported “event analyzers”, JAS3 allows any Java program to be loaded.
This example “main routine” is taken directly from the AIDA manual
Scripting
Can also write and run scripts
Console allows direct interaction with scripting language
Pnuts Language
Currently support Pnuts scripting language
Complete and well documented
http://javacenter.sun.co.jp/pnuts/doc/guide.html
Fast (although not as fast as compiled Java)
Syntax very similar to Java
Can easily call compiled Java classes from scripts – best of both worlds
Plan to support other languages in future
In particular Python
Record Sources
Opening record (or event) based files causes the run control toolbar to appear
Works similarly to JAS2 Job control, but now also supports random access and “tagged” data sets (mainly for event displays)
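The difference between a purely sequential record loop and one that also supports random access can be sketched as below. `RecordSourceSketch`, `ListRecordSource`, `jumpTo` and `go` are hypothetical names chosen for this illustration, not the JAS3 record-source API, and the sketch does not attempt to model “tagged” data sets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

public class RecordSourceSketch {
    /** Hypothetical record source backed by an in-memory list of records. */
    static final class ListRecordSource<T> {
        private final List<T> records;
        private int next = 0;

        ListRecordSource(List<T> records) {
            this.records = new ArrayList<>(records);
        }

        boolean hasNext() {
            return next < records.size();
        }

        /** Sequential access: returns the current record and advances. */
        T next() {
            if (!hasNext()) throw new NoSuchElementException("end of record source");
            return records.get(next++);
        }

        void rewind() {
            next = 0;
        }

        /** Random access: positions the source so the following next() returns record index. */
        void jumpTo(int index) {
            if (index < 0 || index >= records.size()) {
                throw new IndexOutOfBoundsException("no record " + index);
            }
            next = index;
        }
    }

    /** Run-control style loop: feeds up to maxRecords to the analyzer, returns the number processed. */
    static <T> int go(ListRecordSource<T> source, int maxRecords, Consumer<T> analyzer) {
        int n = 0;
        while (n < maxRecords && source.hasNext()) {
            analyzer.accept(source.next());
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ListRecordSource<String> source =
                new ListRecordSource<>(List.of("event0", "event1", "event2", "event3"));
        go(source, 2, e -> System.out.println("analyzed " + e)); // step through two records
        source.jumpTo(3);                                        // e.g. an event display picks one event
        System.out.println("displaying " + source.next());
    }
}
```

An event display is a natural user of `jumpTo`, since it typically needs one specific event rather than a pass over the whole file.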
Tuple Explorer - Plots
Works with any tuple, read from file or dynamically created
Histogram
ScatterPlot
XY Data (more appropriate for smaller data sets)
Profile
Tuple Explorer – Define Columns
Tuple Explorer - Cuts
Tuple Explorer - Tabulate
Tuple Explorer – Record Source
To be used with record loop
JAS3 Spreadsheet
Simple spreadsheet plugin for
Displaying results
Calculations
Simple Plots
Supports reading/writing
.csv files
Excel files
Cut/Paste with Excel etc
Coming Soon…
Scripting interface
GUI for building plots
User defined functions
– Java, scripting
Miscellaneous Features
Save/Restore configuration
User Preferences
Plugin Manager
Status
Current release: JAS3 version 0.7.1
AIDA functionality is quite solid
Compiler, Loader and Record Loop were all added quite recently, so there are certainly still some rough edges
Documentation limited but available
Built-in example scripts and programs
Tutorial on web
If you are used to JAS2 you will find some functionality not yet ported to JAS3
Remote (client/server) access to data.
3D Lego/Surface plots
JAS3 and the GRID
We plan to add client-server/distributed capabilities to JAS3 similar to those in JAS2
Will be based on (distributed) AIDA
Next AIDA workshop (at CERN next week) will discuss this
Want to use Grid standards where they exist
Work with others (PPDG-CS11,???) to define standards where they do not exist
Want to be compatible with C++ servers
Tech-X have submitted a phase II SBIR proposal; if it is approved, we will work closely together
JAS3 Links, More Info
JAS – Java Analysis Studio - http://jas.freehep.org
JAS3 – http://jas.freehep.org/jas3
JAIDA – http://java.freehep.org/jaida/
AIDA – http://aida.freehep.org
FreeHEP - http://www.freehep.org
FreeHEP Java Libraries - http://java.freehep.org
WIRED – http://wired.freehep.org