Java Analysis Studio &
Object Oriented Data Analysis (in Java)
KEK
25th May 2000
Tony Johnson - SLAC
Contents
Overview of Java
Why Java for Data Analysis
Java Analysis Studio
Recently added features
Using Java for Reconstruction
Linear Collider Simulation Framework
Is Java fast enough for Data Analysis?
HEP-wide Java libraries
Conclusions
Demo
History of Java
1991 James Gosling at Sun creates Java language (née Oak)
Targeted at consumer electronics - cable set-top boxes, VCRs, TVs etc.
Goal was reliability not speed
1994 Hot Java Web browser written (in Java)
Supports Applets - Downloadable programs that run inside web browser
Java licensed by Netscape, Oracle, Microsoft and many others
• Huge hype surrounding “Web Programming language”
1997 Java 1.1 released with many standard libraries
Sun’s mantra becomes “Write Once Run Anywhere”
Enthusiastically supported by all major hardware and many software vendors
Microsoft begins to have second thoughts
1998 Java 2 released, even more standard libraries
Now truly general purpose language
Sun (and DOJ) sue Microsoft
Java Architecture
More than just a Web Tool
Java is a fully functional, platform independent, object-oriented language
Powerful set of machine independent libraries, including GUI library.
Totally Buzzword Compliant
Simple, Object Oriented, Distributed, Dynamic, Robust, Secure, Architecture Neutral, Portable, High Performance, Multithreaded.
Interpreted?
Compiled + Interpreted.
Dynamic Optimization may make Java faster than statically compiled languages (in principle).
[Diagram labels: Java Source code; Compiler; Java “Bytecodes”; Bytecode Interpreter; JIT Compiler; Machine Code; Mac, Unix, PC]
Java Features
Simple
But not trivial…you need to read a book
• Syntax very close to C++
✓ No backwards compatibility issues
✓ Some features of C++ which add undue complexity dropped.
✓ Good stepping stone to (or from) C++
• Clean and Efficient Object-Oriented Language
Language features guide programmer toward reliable programming habits
Robust
• Extensive Compile-Time checking of code
• Second level of run-time checking of code
• Memory management done by system, not by programmer
• No pointers to mess up (Java uses references rather than pointers)
The chance of a program running as designed, without the need for time-consuming debugging, is greatly increased.
Java Features (continued)
Highly Portable
Java works today on NT, Win95/98, Unix (including Linux), Mac, VMS
• Personal Java - Windows CE, Palm Pilot
Programs written in Java are very portable
• Move to another platform and it just works
✗ Care needed with AWT GUI components (obsolete) and web browsers
Lifetime of HEP experiments > OS lifetime.
• Lifetime of Java > Lifetime of HEP experiment??
Encourages true modularity
Build entire framework for HEP experiment in Java
Abstract away underlying systems (batch system, IO system etc.)
Java Features (continued)
Distributed
Built in support for Internet protocols, URLs, HTTP, Remote Method Invocation, CORBA, Database access etc.
Secure
Bytecode “verifier”, padded cell (c.f. Web Browser)
Multithreaded
Language has direct support for multithreading
Dynamic
Libraries can change without recompiling programs that use them
Can dynamically load and unload code during program execution
Can move objects across the network (agents), or store them in databases and retrieve them later.
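A minimal sketch of the dynamic loading mentioned above, written for a current JDK rather than the JDK 1.1/1.2 of the talk. The class and method names (DynamicLoadingSketch, instantiate) are hypothetical and are not taken from JAS; only Class.forName and the reflection calls are standard Java.

```java
// Hypothetical sketch (not JAS code): pick a class by name at run time.
class DynamicLoadingSketch {
    /** Loads the named class and builds an instance via its no-argument constructor. */
    static Object instantiate(String className) {
        try {
            // The name is only known at run time; nothing is linked at compile time.
            Class<?> cls = Class.forName(className);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing constructor, or a failing constructor.
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }
}
```

The class name could come from a configuration file, a GUI field or the network, which is the kind of mechanism a plugin system can be built on.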
Java Libraries and API’s
Standard Libraries and API’s
2D + 3D graphics + GUI (Swing) + Imaging + Printing
Database connectivity (JDBC) + ODMG
Collections, IO (Serialization), Data Compression
Networking, Sockets, SSL, Corba, RMI
Java Beans (components), Help
Multimedia, Sound, Speech
Security, Code Signing, Cryptography
Math, Arbitrary Precision Math
Shared Data (Collaborative Applications)
Huge “Community-Ware” software archive
IBM alone has hundreds of Java resources on its Alphaworks site
Java Tools
Popularity of Java = many tools
• And they are cheap (or even free)
Development Environments (IDE’s)
• Editor, Compiler, Debugger, WYSIWYG GUI designer, Source control
Automatic Documentation generators
Memory and CPU Optimizers
• Since debugging time is minimal you might actually have time to use them
Object Modelers
Java Limitations?
No operator overloading
Annoying for complex numbers, matrices, 3/4-vectors
Perhaps more often abused than sensibly used
Lightweight Objects (value semantics) may overcome this
Bugs sometimes slow to be fixed
Printing, Imaging bugs have existed for >1 year
Perhaps “Community Source License” will help
Little control over Memory Allocation
Integration with C++ could be better
Standardization lacking
Sun had promised to submit Java to ISO for standardization, but has so far failed to deliver
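To illustrate the operator-overloading limitation listed above: without overloading, vector arithmetic has to be spelled out as method calls. The sketch below uses a hypothetical FourVectorSketch class (not the JAS or hep.lcd API) and natural units with c = 1.

```java
// Hypothetical four-vector class for illustration; not the JAS or hep.lcd API.
final class FourVectorSketch {
    final double e, px, py, pz;

    FourVectorSketch(double e, double px, double py, double pz) {
        this.e = e;
        this.px = px;
        this.py = py;
        this.pz = pz;
    }

    /** Plays the role that operator+ would have in C++. */
    FourVectorSketch plus(FourVectorSketch o) {
        return new FourVectorSketch(e + o.e, px + o.px, py + o.py, pz + o.pz);
    }

    /** Invariant mass squared, E^2 - |p|^2, in units where c = 1. */
    double mass2() {
        return e * e - (px * px + py * py + pz * pz);
    }
}
```

A sum of three vectors then reads a.plus(b).plus(c) rather than a + b + c, which is the annoyance the slide refers to.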
Why Java for HEP Computing?
Previous generation of experiments used Fortran + Data Management System (Jazelle, Zebra, BOS)
Solves Three Problems
• Ability to Represent Complex Data Structures
• Persistence (i.e. read in and write out complex structures)
• Run time access to named data in structures (for analysis)
Now time has marched on and modern experiments use C++
✓ Represent Complex Data
✗ Persistence
✗ Run time access to data
Still need to build (or buy and deploy) data management system (e.g. Root, Objectivity)
Java
✓ Represent Complex Data
✓ Persistence (serialization)
✓ Run time access to data (reflection)
Support built in to the language
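The two built-in mechanisms named above, serialization for persistence and reflection for run-time access to named data, can be sketched as follows. TrackSketch and PersistenceSketch are hypothetical names invented for this example; only the java.io and java.lang.reflect calls are standard library, and the code targets a current JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

// Hypothetical event-data class; the name and fields are illustrative only.
class TrackSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    double momentum;
    int charge;

    TrackSketch(double momentum, int charge) {
        this.momentum = momentum;
        this.charge = charge;
    }
}

class PersistenceSketch {
    /** Persistence: write the object to bytes and read it back with standard serialization. */
    static TrackSketch roundTrip(TrackSketch in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream is =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TrackSketch) is.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Run-time access to named data: look a field up by its name with reflection. */
    static Object readField(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A byte array stands in for a file here; writing to a FileOutputStream instead would give on-disk persistence with the same calls.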
Where would HEP use Java?
✓ GUI systems
✓ online + control (not really any alternative)
✓ Event Display
✓ Reconstruction+Simulation packages?
✓ Data Analysis tasks
✓ Offline
✓ Online
☒ Event Generators
Java Analysis Studio
Experiment independent analysis tools for High Energy Physics data
Introduction to JAS
JAS starts from experience with SLD interactive data analysis
IDA (Toby Burnett) + SLD extensions
Integrates ideas from
• Reason, Hippodraw, LHC++, Histoscope, …
Exploit advantages of Java
• Cross platform, dynamic loading, GUI, many standard API’s – networking, HTML, etc.
Aim is to solve real life physicist problems
Want to get input from as many people as possible.
System is flexible enough to change.
JAS Overview
Modular Java Toolkit for Analysis of HEP data
Data Format Independent
Experiment Independent
Supports arbitrarily complex analysis modules written in Java
Rich Graphical User Interface (GUI) with:
• Data Explorer
• Flexible Histogram + Scatterplot display
• Histogram manipulation+fitting
• Built-in Editor/Compiler (for writing analysis modules)
• Extensible via plugins
User extensible via Object Orientated API's
Written entirely in Java so will run on any platform with a Java VM (JDK 1.1 or better)
• Support: Windows 95/98/NT/2000 + Linux + Solaris
• Works on: DEC + SGI + Mac
JAS Components
[Diagram labels: GUI Framework; Plugin; Histo/Plot Adapter; Network Adapter; JASHist (Plot Bean); Analysis Framework; 3-4 Vector Utilities; Histogram Accumulation; Fitting Framework; Functions; Fitters; Particle Properties; Jet Finder; Data Interface; PAW; SQL; stdHEP]
Data Access Classes
Analyze local or remote data
[Diagram labels: Desktop Client; Local Data; DIM; Network; Remote Data; Data Server; DIM]
User interface independent of Data Location
Does not assume fast network (works well at 28.8 kbps)
Analysis code moves (transparently) to data
Remote Data Analysis
[Diagram labels: GUI; Experiment Extensions (Event Display); Java Compiler + Debugger; TCP/IP Network; Padded Cell; Users Java Code; Data Analysis Engine; Experiment Interface; C++ Code; Data (Zebra, Jazelle, Paw, Root, Objectivity)]
Distributed Data Analysis
[Diagram labels: Desktop Client; Network; Data Server; Data Controller; Distributed Data; several Data Server + DIM pairs]
Plot Display Package
1-d/2-d Histogram/ScatterPlot Display
multiple axes, direct user interaction, overlays, fitting
Java Analysis Studio GUI
Example Analysis Code (Track Recon)
Demo
New Features
Modular Plot Component
Can be used in other applications
• GUI, servlets
Model-view-controller design
Supports many display styles: 1d, 2d, scatterplot, fitting, slices, user interaction
XML for data interchange with other apps.
jEdit Editor
Full featured program editor
• Syntax highlighting, indenting, bracket matching
Expect to be able to integrate advanced features
• Debugging, auto-completion
New Features – HTML support
New Features – WIRED Plugin
New Features – AIDA support
AIDA is attempt to standardize HEP histogram interface
Abstract interface
• C++ and Java supported
Multiple implementations
• JAS now supports AIDA interface
• Now possible to create JAS histograms from C++
[Diagram labels: C++ Program; AIDA (C++); JNI; AIDA (Java); JAS]
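The idea of one abstract histogram interface with multiple implementations can be sketched in Java as below. This is an illustration only: Histogram1DSketch and FixedBinHistogram are hypothetical names, and the method signatures are assumptions, not the actual AIDA interfaces.

```java
// Hypothetical abstract interface; the method names are illustrative and are
// not claimed to match the real AIDA signatures.
interface Histogram1DSketch {
    void fill(double x);       // accumulate one value
    int entries();             // number of in-range entries
    int binEntries(int bin);   // entries in a single bin
}

// One possible implementation: equal-width bins, out-of-range values ignored.
final class FixedBinHistogram implements Histogram1DSketch {
    private final int[] counts;
    private final double min;
    private final double max;
    private int entries;

    FixedBinHistogram(int bins, double min, double max) {
        if (bins <= 0 || !(max > min)) {
            throw new IllegalArgumentException("need bins > 0 and max > min");
        }
        this.counts = new int[bins];
        this.min = min;
        this.max = max;
    }

    @Override
    public void fill(double x) {
        // Written this way so that NaN is rejected along with under/overflow.
        if (!(x >= min && x < max)) {
            return;
        }
        int bin = (int) ((x - min) / (max - min) * counts.length);
        // Guard against floating-point rounding pushing the index to counts.length.
        counts[Math.min(bin, counts.length - 1)]++;
        entries++;
    }

    @Override
    public int entries() {
        return entries;
    }

    @Override
    public int binEntries(int bin) {
        return counts[bin];
    }
}
```

Analysis code written against the interface does not need to change if a different implementation (for example one backed by another histogram package) is swapped in.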
New Features – G4 interface
Future Features - 3D Support
Usage
BaBar using JAS for Online Monitoring
Using Online Monitoring API
HTML Pages with embedded plots
Custom Overlays
US Linear Collider Studies
Have an entire recon+analysis package written in Java
• Using JAS as analysis interface
• Making use of remote data access using repository at University of Pennsylvania
CLEO
Using plot bean for online displays
Other smaller scale users
All giving very valuable feedback
Helping to produce more reliable solution
OpenSource – Anyone can Contribute!
All source code now stored in CVS
Use any CVS client for anonymous (read-only) access
• We recommend jCVS (pure Java CVS client)
Source code all web browsable
• Implemented using jCVS servlet
Write access can be given to interested developers
Intend to put entire code under LGPL
Platform independent build system
Uses jmk - pure java make-like tool
• To build entire system on any platform with CVS and Java
cvs co jas
cd jas
java -jar jmk.jar
Documentation
LCD Tutorial exists
Nice step by step tutorial for beginners
Examples are all based on LCD but can be used by anyone
Starts from very beginning
Slowly adding information to Users Guide
Still nowhere near complete
How-To guides being created to cover specific topics
Servlets How To
HTML How To
XML How To
Online API How To
Working on Fitting How To
JavaDoc generated API documentation available
Documentation remains weak link
We are aware of this and are working on producing more documentation
Also need more design specs/internals documentation to make open source model more effective
Java for Reconstruction/Simulation
Dual Goals:
Contribute to Linear Collider Detector/Physics Studies
Experiment with using Java for full offline reconstruction and analysis package
LC Detector studies in US
Goals:
Detailed Study of physics processes in a variety of possible LC Detectors.
• Reference Small and Large detectors
Full simulation with GISMO
• Switch to Geant4, when ready
Analysis using
• Paw
• C++ & Root
• Java & JAS
Software Requirements
• Flexibly handle different detector geometries and technologies
• Rapid development of variety of reconstruction and analysis algorithms
Java package hep.lcd
Framework
• Driver framework: interactively control calling of processors, debugging/histogramming
• Parameter (Constant) access driven by detector geometry
• MC event input (StdHEP format)
• IO system based on Java IO random access files
• Can be run inside JAS or standalone
Reconstruction Processors
• Track finder+fitter written
• Interface to Fortran fitter in progress
• Several clustering algorithms
Parameterized MC Processors
• Can read generator input or Gismo output
• Track and Cluster smearing
Analysis Utilities
• Event Shape + Thrust utilities
• Jet finder [Jade, Durham]
• Histogramming
Event Displays
• Simple 2D Event display
• Full 3D WIRED event display
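A driver that controls the calling of processors, as listed under Framework above, might look roughly like the following. This is a sketch under stated assumptions: EventSketch, ProcessorSketch and DriverSketch are hypothetical names and do not reproduce the real hep.lcd classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event record: a bag of named quantities.
final class EventSketch {
    private final Map<String, Object> data = new HashMap<>();

    void put(String key, Object value) {
        data.put(key, value);
    }

    Object get(String key) {
        return data.get(key);
    }
}

// Hypothetical processor: one reconstruction or analysis step.
interface ProcessorSketch {
    void process(EventSketch event);
}

// Hypothetical driver: calls the registered processors in order for each event.
final class DriverSketch {
    private final List<ProcessorSketch> processors = new ArrayList<>();

    DriverSketch add(ProcessorSketch p) {
        processors.add(p);
        return this;
    }

    void processEvent(EventSketch event) {
        for (ProcessorSketch p : processors) {
            p.process(event);
        }
    }
}
```

Because later processors read what earlier ones stored in the event, a track finder, a cluster finder and an analysis step could be chained, reordered or swapped without changing the driver.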
Event Display
Java for Reconstruction/Simulation
Looks very promising
Have been able to develop framework very fast
People have no problem learning and using it
Performance looks good
Future
Java interface to Geant4?
Reconstruction Performance
[Bar chart: Cluster Finding, Seconds/Event (axis 0 to 1.2) by Virtual Machine: JDK1.1.8 -nojit, JDK1.1.8, MS 5.00.3177, IBM1.1.7, IBM1.1.8, JDK 1.2.1 Classic, JDK 1.2.1 HotSpot; bar values not recoverable]
[Bar chart: Track Finding + Fitting, Seconds/Event (axis 0 to 40) for the same virtual machines; bar values not recoverable]
Java Performance Summary
Is Java Fast Enough for Physics Analysis?
Yes
• Time gained in development well worth runtime overhead
• Good design has more effect on final speed than language
■ Many tools available to help optimize code
Java will continue to get faster
More information
• ACM 1999 Java Grande Conference
■ http://www.cs.ucsb.edu/conferences/java99/
• THE JAVA PERFORMANCE REPORT
■ http://www.javalobby.org/features/jpr/
HEP-wide Java libraries
FreeHEP Java library
Extract common code from JAS+WIRED
Add other utilities (not highly hep specific)
• Encapsulated Postscript generator
• JACO – Java to C++ interface
Encourage others to look at what is there
• We welcome contributions from others
HEP library – more physics specific
3 and 4 vectors, jet finders, MC generators
Histograming package (AIDA)
HEP-wide Java libraries
FreeHEP library already has useful stuff in it, HEP library just getting started
Both libraries in CVS
• Read access available to anyone
• Write access to qualified developers
Web Site
http://java.freehep.org
Contributions welcome
Conclusions
Java is a very useful language+environment that could be very beneficial to HEP in many areas.
Could Java be used for entire offline for major experiment?
Technically - Yes
Will Java Survive long enough?
• Need ISO standard
• Need to see how market forces play out.
Programming in Java is Fun!!
Spend time architecting an elegant solution to problem to be solved
• Not
✗ Reinventing the wheel
✗ Debugging someone else’s problem
✗ Porting to different platforms
More Information…
Java Analysis Studio
http://jas.freehep.org
FreeHEP library
http://java.freehep.org
US Linear Collider Reconstruction
http://www-sldnt.slac.stanford.edu/nld
WIRED
http://wired.cern.ch
AIDA
http://wwwinfo.cern.ch/asd/lhc++/AIDA/index.html