JAS – Distributed Data Analysis

Grid Enabled Analysis Workshop
Caltech - June 23-25, 2003
Contents
• JAS2
  – History
  – Client-server mode
  – JAS2 and the Grid
• JAS3
  – What’s new
  – JAS3 and AIDA
  – Plans for Gridification
JAS History
• First version of JAS2 released in 2000.
• Incremental improvements released over time.
JAS2 History – Use Cases
• With WIRED event display
• Online Monitoring
• Custom Applications
• Web Servlets
JAS Client-Server Mode
[Diagram labels: GUI; Java Compiler + Debugger; Experiment Extensions (Event Display); Data Analysis Engine; User’s Java Code (Padded Cell); DATA]
Distributed Analysis System: Goals
• Prototype for GRID enabled JAS analysis
  – Run analysis on a farm of machines
    – Use multiple CPUs in parallel for CPU-intensive analysis
    – Access multiple I/O channels for data-intensive analysis
  – Use standard JAS (Client) as if we are running a local job
  – Get interactive feedback
    – Create analysis modules (code)
    – Control job execution
    – View results (Plots/Histograms)
  – Access distributed datasets as if they were local datasets
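The farm idea above can be pictured with a small sketch: each worker fills its own partial histogram from its share of the data, and the partial results are summed before being shown to the user. This is only an illustration under stated assumptions. `SketchHistogram` and `DistributedHistogramSketch` are hypothetical names, threads in one JVM stand in for the data servers, and nothing here is the JAS or AIDA API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical fixed-bin 1D histogram; not the JAS or AIDA histogram class. */
final class SketchHistogram {
    final int bins;
    final double lo;
    final double hi;
    final long[] counts;

    SketchHistogram(int bins, double lo, double hi) {
        this.bins = bins;
        this.lo = lo;
        this.hi = hi;
        this.counts = new long[bins];
    }

    /** Counts x if it lies in [lo, hi); out-of-range and NaN values are ignored. */
    void fill(double x) {
        if (!(x >= lo && x < hi)) return;
        int i = (int) ((x - lo) / (hi - lo) * bins);
        counts[Math.min(i, bins - 1)]++; // guard against rounding at the upper edge
    }

    /** Adds another histogram with the same binning, bin by bin. */
    void add(SketchHistogram other) {
        for (int i = 0; i < bins; i++) counts[i] += other.counts[i];
    }

    long entries() {
        long sum = 0;
        for (long c : counts) sum += c;
        return sum;
    }
}

/** Hypothetical stand-in for a control server that farms out work and merges results. */
final class DistributedHistogramSketch {

    /** Fills one partial histogram per data partition in parallel, then merges them. */
    static SketchHistogram analyze(List<double[]> partitions, int bins, double lo, double hi) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<SketchHistogram>> partials = new ArrayList<>();
            for (double[] part : partitions) {
                Callable<SketchHistogram> job = () -> {
                    SketchHistogram h = new SketchHistogram(bins, lo, hi);
                    for (double x : part) h.fill(x);
                    return h;
                };
                partials.add(pool.submit(job));
            }
            SketchHistogram total = new SketchHistogram(bins, lo, hi);
            for (Future<SketchHistogram> f : partials) total.add(f.get());
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<double[]> partitions = Arrays.asList(
                new double[] {0.12, 0.18, 0.95},
                new double[] {0.55, 1.5});
        SketchHistogram merged = analyze(partitions, 10, 0.0, 1.0);
        System.out.println("entries in range: " + merged.entries());
    }
}
```

Because merging is a plain bin-by-bin sum, the client only needs the merged counts to draw a plot, whichever machine held the data.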
Distributed Analysis System: Architecture
[Diagram layers, in order: JAS Data Servers (multiple); Network; Control Servers; Network; JAS Clients (multiple); Users. A Catalog Server is also shown.]
JAS2 – GRID interface (Tech-X)
JAS3 Overview
• A completely new version of JAS
• Design based on an application shell, into which many (optional) modules can be plugged
  – Highly customizable for different application domains
    – HEP/Astrophysics/Other
    – DST analysis/Online Monitoring/GRID analysis
    – Experiment/User specific modules
  – Modules can be updated independently of the shell
    – Possible to release bug fixes fast
• Includes support for programming in many languages
  – Scripting: Python, Pnuts, Dynamic Java, …
    – Command prompt
  – Java (compiled)
• Analysis (histograms, tuples, fitting) based on the AIDA standard
• Not technically backwards compatible with JAS2
  – But migration is straightforward.
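The shell-and-modules design mentioned in this slide can be pictured with a minimal sketch. Everything in it (`SketchPlugin`, `ApplicationShellSketch`, the plugin names and version strings) is hypothetical and is not the JAS3 plugin API. It only illustrates how one module can be replaced by a newer release without touching the shell or the other modules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plugin contract; not the real JAS3 plugin interface. */
interface SketchPlugin {
    String name();
    String version();
}

/** Hypothetical shell that knows only the plugin contract, never a concrete module. */
final class ApplicationShellSketch {
    private final Map<String, SketchPlugin> plugins = new LinkedHashMap<>();

    /** Installs a plugin, replacing any earlier version registered under the same name. */
    void install(SketchPlugin plugin) {
        plugins.put(plugin.name(), plugin);
    }

    /** Returns "name version" for each installed plugin, in first-installation order. */
    List<String> describe() {
        List<String> out = new ArrayList<>();
        for (SketchPlugin p : plugins.values()) {
            out.add(p.name() + " " + p.version());
        }
        return out;
    }

    /** Small helper that builds a plugin from a name and a version string. */
    static SketchPlugin plugin(String name, String version) {
        return new SketchPlugin() {
            public String name() { return name; }
            public String version() { return version; }
        };
    }

    public static void main(String[] args) {
        ApplicationShellSketch shell = new ApplicationShellSketch();
        shell.install(plugin("aida", "1.0"));
        shell.install(plugin("pnuts", "1.0"));
        shell.install(plugin("aida", "1.1")); // bug-fix release of a single module
        System.out.println(shell.describe());
    }
}
```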
AIDA Overview
• AIDA = Abstract Interfaces for Data Analysis
• Covers key areas for data analysis
  – Histograms, Tuples, Fitting, Data Points, Plotting, Management
• Developed collaboratively at a series of workshops by groups at CERN, LAL, SLAC.
  – Next workshop: June 30-July 4 at CERN
• Interfaces developed for C++ and Java (and maybe Python?)
• Several implementations/tools available
  – Anaphe/Lizard/LCG PI – CERN
  – Open Scientist – LAL
  – JAIDA/JAS/AIDAJNI – SLAC
JAS3 and AIDA
• JAS3 has adopted AIDA for analysis
  – AIDA allows us to leverage the experience and skill of other developers
  – AIDA is functionally more complete than the JAS2 analysis package
  – AIDA allows JAS to exchange data with other AIDA tools
  – AIDA provides a bridge to C++ programs (e.g. Geant4)
  – AIDA encourages creativity and innovation
• JAS3 HEP Analysis tools based on JAIDA
  – JAIDA = Java implementation of AIDA
    – JAIDA is part of the FreeHEP library
    – Usable as a standalone library for any Java application
  – AIDAJNI = Interface between C++ and Java AIDA
    – Allows C++ programs to use JAIDA, JAS3
JAS3, AIDA and C++
[Diagram labels: C++ program; AIDA; C++ AIDA Implementation; AIDA-JNI; .aida file (XML); JAIDA; Java program; JAS3]
JAS3 and AIDA
• JAS3 supports all AIDA functionality, including
  – Histograms (includes arithmetic, projections, etc.)
  – Clouds (unbinned histograms, scatterplots)
  – Plotter
  – Tuples
  – Fitting: AIDA interfaces allow for multiple fitters
    – Uncmin: pure Java minimizer
    – Minuit: Fortran called via the Java Native Interface (JNI)
  – IO
    – AIDA XML, PAW, Root
• JAS3 supports user interaction with AIDA in three ways
  – Scripting (Pnuts, Python, etc.)
  – Compiled (Java) code
  – GUI: plotting, fitting, cuts, etc.
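To make the "clouds (unbinned histograms)" item above concrete: a cloud keeps the individual values rather than bin counts, so the binning can be chosen after the data has been seen. The sketch below is a plain-Java illustration only. `CloudSketch` and its methods are hypothetical names and are not the AIDA cloud interface, and the equal-width binning over the observed range is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical unbinned "cloud"; not the AIDA cloud interface. */
final class CloudSketch {
    private final List<Double> values = new ArrayList<>();

    /** Stores the raw value; no binning decision is made at fill time. */
    void fill(double x) {
        values.add(x);
    }

    int entries() {
        return values.size();
    }

    /** Bins the stored values into equal-width bins spanning the observed [min, max]. */
    long[] toHistogram(int bins) {
        long[] counts = new long[bins];
        if (values.isEmpty()) return counts;
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        double width = (hi - lo) / bins;
        for (double v : values) {
            int i = width > 0 ? (int) ((v - lo) / width) : 0;
            counts[Math.min(i, bins - 1)]++; // the maximum value goes in the last bin
        }
        return counts;
    }

    public static void main(String[] args) {
        CloudSketch cloud = new CloudSketch();
        for (double x : new double[] {0.0, 1.0, 2.0, 9.0, 10.0}) cloud.fill(x);
        System.out.println(Arrays.toString(cloud.toHistogram(5)));
    }
}
```

The trade-off is memory: a cloud grows with the number of entries, which is why binned histograms remain the usual choice for large data sets.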
JAS3 Scripting
• JAS3 has multi-language OO scripting support
  – Command line, Console, Editor
  – Major components (e.g. AIDA) have scripting interfaces
• Currently have plugins to support
  – Pnuts: syntax almost identical to Java, fast, well documented and feature complete
  – Python (using Jython)
• More scripting languages can be added
  – Not restricted to Java implementations (e.g. could use C-Python, JPE)
JAS3 Lightning Tour
• Tour designed to give you an overview of the capabilities of JAS3; you can try them out for yourself this afternoon.
Screenshot callouts: the Welcome Page gives initial info and links to example scripts and programs; memory monitor.
Opening Files
Screenshot callouts: use the File menu; drag from explorer.
Graphical Interface to AIDA
Screenshot callouts: Histograms, Clouds and Tuples are all presented in the AIDA tree; .aida files, .hbook files and .root files are all presented as AIDA objects.
Screenshot callout: drag items onto the page, or use (popup) menus.
Printing
Screenshot callouts: can send individual plots or the full page direct to the printer; or copy/paste into Word, PowerPoint, etc.; or save as PS, EPS, PDF, SWF, SVG, PNG, GIF…
Java Editor, Compiler and Loader
Screenshot callouts: tree shows loaded programs; built-in Java compiler; built-in editor for writing analysis code.
Unlike JAS2, which only supported “event analyzers”, JAS3 allows any Java program to be loaded. This example “main routine” is taken directly from the AIDA manual.
Scripting
Screenshot callouts: can also write and run scripts; the console allows direct interaction with the scripting language.
Pnuts Language
• Currently support Pnuts scripting language
  – Complete and well documented
    – http://javacenter.sun.co.jp/pnuts/doc/guide.html
  – Fast (although not as fast as compiled Java)
  – Syntax very similar to Java
  – Can easily call compiled Java classes from scripts: best of both worlds
• Plan to support other languages in future
  – In particular Python
Record Sources
Screenshot callouts: opening record (or event) based files causes the run control toolbar to appear. It works similarly to JAS2 job control, but now also supports random access and “tagged” data sets (mainly for event displays).
Tuple Explorer – Plots
Screenshot callouts: works with any tuple, read from file or dynamically created. Plot types shown: Histogram, ScatterPlot, Profile, XY Data (more appropriate for smaller data sets).
Tuple Explorer – Define Columns
Tuple Explorer – Cuts
Tuple Explorer – Tabulate
Tuple Explorer – Record Source
To be used with record loop
JAS3 Spreadsheet
• Simple spreadsheet plugin for
  – Displaying results
  – Calculations
  – Simple plots
• Supports reading/writing
  – .csv files
  – Excel files
  – Cut/paste with Excel, etc.
• Coming soon…
  – Scripting interface
  – GUI for building plots
  – User-defined functions
    – Java, scripting
Miscellaneous Features
Screenshot callouts: Save/Restore configuration; User Preferences; Plugin Manager.
Status
• Currently released JAS3 version 0.7.1
  – AIDA functionality is quite solid
  – Compiler, Loader and Record Loop were all added quite recently, so there are certainly still some rough edges
• Documentation limited but available
  – Built-in example scripts and programs
  – Tutorial on web
• If you are used to JAS2 you will find some functionality not yet ported to JAS3
  – Remote (client/server) access to data
  – 3D Lego/Surface plots
JAS3 and the GRID
• We plan to add client-server/distributed capabilities to JAS3 similar to those in JAS2
  – Will be based on (distributed) AIDA
    – Next AIDA workshop (at CERN next week) will discuss this
  – Want to use Grid standards where they exist
    – Work with others (PPDG-CS11, ???) to define standards where they do not exist
  – Want to be compatible with C++ servers
  – Tech-X have submitted a phase II SBIR and will work closely with us if it is approved
JAS3 Links, More Info
• JAS (Java Analysis Studio) – http://jas.freehep.org
• JAS3 – http://jas.freehep.org/jas3
• JAIDA – http://java.freehep.org/jaida/
• AIDA – http://aida.freehep.org
• FreeHEP – http://www.freehep.org
• FreeHEP Java Libraries – http://java.freehep.org
• WIRED – http://wired.freehep.org