Title Page - Chep 2000 Home Page
NOVA
Networked Object-based EnVironment for Analysis
P. Nevski, A. Vaniachine, T. Wenaus
NOVA is a project to develop distributed object oriented physics analysis components,
adaptable to different experiments. A job configuration manager uses a scripting
interface to provide web-based editing, submission and cataloguing of analysis jobs,
both user-level and experiment-wide, centrally managed in a database. A client/server
system distributed over compute nodes provides job submission and monitoring across
facilities, which may span several sites. A file catalog records the production relationships of
data files generated by an experiment. NOVA provides database tools for geometry and
parameter object storage. A NOVA web-based browser navigates a relational database
storing hierarchically structured dataObjects. Clients may access database information
from the code or through a CORBA-specified interface. NOVA components have been
tested and deployed in the STAR and ATLAS environments.
February 7, 2000
Outline
• Goals
• Requirements
• Architecture
• Components
• Details
• Summary
CHEP in Padova
Motivations
• Unprecedented data volume and software complexity
in new large High Energy and Nuclear Physics
experiments at RHIC (BNL) and LHC (CERN)
• New approaches to analysis and data handling
software
• Distributed computing environment (DCE) is vital
and increasingly powerful
• Experience in developing DCE solutions for STAR
• Build on experience to develop DCE tools for use
in similarly challenging environments
Goals
• Develop software tools for
– coordination and control of widely distributed analysis
development and physics analysis activity
– distributed management and analysis of very large
datasets
– enhanced robustness, reusability and maintainability
of analysis software
• For application in many global computing environments
(ATLAS, STAR, …)
– generic tools not tied to specific implementation
choices
– selected, templatable implementations provided so
that NOVA components can be used in a baseline
framework
Requirements
• Support wide area data intensive analysis
• Define the middleware services required to permit
analysis applications to run effectively over wide area
networks
• Provide a rich set of features that applications can
select and use to obtain the level of service they need
to operate
• Define the features and the APIs necessary to allow
the application and middleware to communicate
• Integrate the middleware APIs with the applications
Design Approach
• Small, modular components; application-neutral
interfaces
– Can be used as a coherent framework or in
isolation to extend existing analysis systems
• Focused on support for C++ based analysis
– C++ is used by all RHIC, LHC and other large experiments
• Emphasis on user participation in iterative
development; real-world prototyping and testing
(STAR, ATLAS)
• Extensive use of existing tools and technologies
– Must be readily available, true or de facto
standards, well supported, widely used or showing
good growth
Component-based Architecture
NOVA Architecture
[Figure: NOVA component architecture, grouped into Remote Clients, Regional Center, Middleware Components and Data Management. Blocks shown: web browser, mobile analysis client, remote analysis, visualisation, nanoDST, GCA query, dynamically loaded apps, client and server data binder modules, analysis daemon, analysis server, state server, state DB, web server, database navigator, monitoring module, MySQL client, MySQL analysis catalogue, offline control framework, CVS code repository, bug system, HyperNews, Grand Challenge Architecture (GCA), parameters repository, data repository, catalog interface, MySQL data catalogue. Legend: NOVA component; third party tool customized for and integrated into NOVA; application specific, sample implementation provided; existing third party tool employed by NOVA. Status: implemented, prototyped, planned.]
Tools and Technologies
• Third party tools and technologies used in NOVA:
– MySQL: relational database for catalogs, state
information and simple objects (C-structs)
– Perl: Unix scripting and web development tool
– Apache: customizable (Perl & PHP) web server for
communication and monitoring
– CORBA: low-volume interprocess data exchange
– ROOT: visualization and analysis tools
Components
NOVA components fall into four domains
– Regional Center
• Central management and execution of analysis
– Remote Client
• Mobile Analysis
– Middleware Components
• Data exchange and navigation tools
• Client/Server object request brokerage
– Data Management
• Data repository, catalogue, and interface
• Data model for simple objects (C-structs)
Dynamic Binding
• Problem:
– A user has a new idea that was not foreseen at the
beginning and modifies the structure of one object in
the application. The application stores the new objects
in the database.
– Remote applications unaware of the new functionality
may request the objects in the old format.
• Solution:
– Application: provides metadata request (name, time,
selectors...) and the application dataObject dictionary
– Database server: provides dataObject and the dictionary
– Object Request Broker module: converts dataObject
according to the application dictionary
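The conversion step performed by the Object Request Broker can be illustrated with a minimal Python sketch. It assumes a hypothetical dictionary format (an ordered list of field name, type and default) and matches fields by name; the actual NOVA dictionary layout and conversion rules are not given on the slide.

```python
from typing import Any

# Hypothetical dictionary layout: an ordered list of (field name, type, default).
# The stored dictionary describes the object as it was written; the application
# dictionary describes the layout the requesting executable was built with.
Dictionary = list[tuple[str, type, Any]]


def convert(stored_values: list, stored_dict: Dictionary,
            app_dict: Dictionary) -> dict:
    """Convert a stored dataObject to the layout an application expects.

    Fields are matched by name: fields the application does not know are
    dropped, fields absent from the stored object take the application's
    default, and every value is coerced to the application's declared type.
    """
    stored = {name: value
              for (name, _type, _default), value in zip(stored_dict, stored_values)}
    return {name: ftype(stored[name]) if name in stored else default
            for name, ftype, default in app_dict}


# Version 2 of a hypothetical calibration object added a "temperature" field.
v2_dict = [("run", int, 0), ("gain", float, 1.0), ("temperature", float, 0.0)]
v1_dict = [("run", int, 0), ("gain", float, 1.0)]

# An old (v1) executable reads a new (v2) object: the extra field is dropped.
print(convert([1234, 0.97, 21.5], v2_dict, v1_dict))
# {'run': 1234, 'gain': 0.97}

# A new (v2) executable reads an old (v1) object: the missing field is defaulted.
print(convert([1234, 0.97], v1_dict, v2_dict))
# {'run': 1234, 'gain': 0.97, 'temperature': 0.0}
```

Matching by name rather than by position is what lets either side evolve independently; a real broker would also need rules for renamed fields and incompatible type changes, which this sketch does not attempt.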
Dynamic Object Broker
[Figure: Dynamic Object Broker. Remote Application Clients: application dataObject, application dictionary. Middleware Services: Object Request Broker. Central Database Server: Parameters Repository holding the database dataObject and database dictionary.]
Forward Compatibility
• Benefits:
– Separation of database and analysis applications
– Robust interface (via built-in type checking)
– Dictionary built from C-header files or IDL-files
– Database access is independent of application
code version: user can read new dataObjects
with an old executable
• Usage:
– Parameters data management (versioned
geometry and reconstruction constants support)
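One way a dictionary could be built from a C header is sketched below. This is a minimal illustration, not the NOVA tool: it assumes headers containing only plain `struct name { ... };` declarations with one scalar or fixed-size array member per line, and the output format and the `tpcGain` struct are hypothetical.

```python
import re

# Strip C block and line comments before parsing.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.S)
# Only the plain "struct name { ... };" form is handled in this sketch.
STRUCT_RE = re.compile(r"struct\s+(\w+)\s*\{(.*?)\}\s*;", re.S)
# One scalar or fixed-size array member per line, e.g. "float gain[24];".
FIELD_RE = re.compile(r"^\s*(?:unsigned\s+)?(\w+)\s+(\w+)\s*(?:\[(\d+)\])?\s*;", re.M)


def build_dictionary(header: str) -> dict[str, list[tuple[str, str, int]]]:
    """Return {struct name: [(member name, C type, element count), ...]}."""
    header = COMMENT_RE.sub("", header)
    dictionary = {}
    for struct_name, body in STRUCT_RE.findall(header):
        dictionary[struct_name] = [
            (name, ctype, int(count) if count else 1)
            for ctype, name, count in FIELD_RE.findall(body)
        ]
    return dictionary


HEADER = """
struct tpcGain {
    int    run;       /* run number */
    float  gain[24];  // one value per sector
    double pedestal;
};
"""

print(build_dictionary(HEADER))
# {'tpcGain': [('run', 'int', 1), ('gain', 'float', 24), ('pedestal', 'double', 1)]}
```

A production tool would more likely use a real C or IDL parser; the regular expressions here only show that the dictionary can be derived mechanically from the same header the analysis code compiles against.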
Static Binding
• Problem:
– A remote application (web browser) navigates the
current database hierarchy.
• Solution:
– The Object Request Broker at the Regional Center
serves dataObjects as dynamic HTML in a format
tailored to the application ID: Netscape or
MS Internet Explorer
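Tailoring by application ID could look like the following minimal sketch, which classifies the browser from the HTTP User-Agent header and fills one of two templates. The templates, function names and node layout are hypothetical; the only external facts relied on are that Internet Explorer advertises "MSIE" in its User-Agent string and that `<layer>` was a Netscape 4 specific tag.

```python
from html import escape

# Hypothetical per-browser templates; the actual NOVA markup is not given
# on the slide. <layer> was understood only by Netscape 4.
TEMPLATES = {
    "msie": '<div id="{id}"><b>{name}</b><ul>{items}</ul></div>',
    "netscape": '<layer id="{id}"><b>{name}</b><ul>{items}</ul></layer>',
}


def application_id(user_agent: str) -> str:
    """Classify the requesting browser from its User-Agent header."""
    # Internet Explorer reports "MSIE x.y" inside a Mozilla-compatible string;
    # anything else is treated as Netscape in this two-browser sketch.
    return "msie" if "MSIE" in user_agent else "netscape"


def render_node(node_id: str, name: str, children: list[str],
                user_agent: str) -> str:
    """Render one node of the database hierarchy for the requesting browser."""
    items = "".join(f"<li>{escape(child)}</li>" for child in children)
    template = TEMPLATES[application_id(user_agent)]
    return template.format(id=escape(node_id), name=escape(name), items=items)


IE5 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)"
NS47 = "Mozilla/4.7 [en] (X11; U; Linux 2.2.12 i686)"
print(render_node("geom", "Geometry", ["tpc", "svt"], NS47))
# <layer id="geom"><b>Geometry</b><ul><li>tpc</li><li>svt</li></ul></layer>
```

Keeping browser detection in one function means the database side never needs to know which client asked; only the rendering step varies.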
Static Object Broker
[Figure: Static Object Broker. Remote Application Client: NOVA Browser, application dataObject, application ID. Middleware Services: Apache Web Server, Database API Module. Regional Center Database Server: Parameters Repository holding the database dataObject, accessed through database API calls.]
Layered Interface
Data Model
[Figure: data model diagram; elements shown: parameter, array of parameters, structure, array of structures, relation.]
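The abstract describes a relational database storing hierarchically structured dataObjects. A minimal sketch of that idea follows, assuming a parent-link table for structures and a separate table for arrays of parameters. It uses Python's built-in sqlite3 as a stand-in for MySQL; the schema, the `Geometry/tpc/gain` hierarchy and the values are all hypothetical.

```python
import sqlite3

# sqlite3 stands in for MySQL; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),  -- NULL for a top-level node
    name      TEXT NOT NULL,
    kind      TEXT NOT NULL                 -- e.g. 'structure', 'array of parameters'
);
CREATE TABLE param (
    node_id INTEGER NOT NULL REFERENCES node(id),
    idx     INTEGER NOT NULL,               -- position within an array of parameters
    value   REAL NOT NULL
);
"""


def add_node(db, name, kind, parent_id=None):
    """Insert a node under parent_id and return its new id."""
    cur = db.execute("INSERT INTO node (parent_id, name, kind) VALUES (?, ?, ?)",
                     (parent_id, name, kind))
    return cur.lastrowid


def children(db, parent_id):
    """Names of the direct children of a node, as a browser would list them."""
    rows = db.execute("SELECT name FROM node WHERE parent_id = ? ORDER BY name",
                      (parent_id,))
    return [name for (name,) in rows]


def path_of(db, node_id):
    """Follow parent links to the top and return a '/'-separated path."""
    parts = []
    while node_id is not None:
        parent_id, name = db.execute(
            "SELECT parent_id, name FROM node WHERE id = ?", (node_id,)).fetchone()
        parts.append(name)
        node_id = parent_id
    return "/".join(reversed(parts))


def read_array(db, node_id):
    """Values of an array of parameters, in index order."""
    rows = db.execute("SELECT value FROM param WHERE node_id = ? ORDER BY idx",
                      (node_id,))
    return [value for (value,) in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
geometry = add_node(db, "Geometry", "structure")
tpc = add_node(db, "tpc", "structure", geometry)
gain = add_node(db, "gain", "array of parameters", tpc)
db.executemany("INSERT INTO param VALUES (?, ?, ?)",
               [(gain, i, v) for i, v in enumerate([0.97, 1.02, 0.99])])

print(children(db, geometry))  # ['tpc']
print(path_of(db, gain))       # Geometry/tpc/gain
print(read_array(db, gain))    # [0.97, 1.02, 0.99]
```

The parent-link layout keeps navigation (one query per level) simple, which suits a web browser that expands the hierarchy one node at a time.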
Cataloguing Analysis Workflow
[Figure: analysis workflow cataloguing; components shown: job configuration manager, job monitoring system, fileCatalog.]
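The fileCatalog records the production relationships of data files. The minimal in-memory sketch below assumes that each file simply lists the file IDs it was produced from, and shows how those relationships let one trace a file back to its inputs. The class, field and file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    file_id: int
    name: str
    step: str                                         # production step that wrote the file
    parents: list[int] = field(default_factory=list)  # file_ids it was produced from


class FileCatalog:
    """In-memory stand-in for a catalog of produced files and their inputs."""

    def __init__(self):
        self._files: dict[int, FileEntry] = {}

    def register(self, entry: FileEntry) -> None:
        # Inputs must already be catalogued, so every relationship is resolvable.
        missing = [p for p in entry.parents if p not in self._files]
        if missing:
            raise ValueError(f"unknown parent file_ids: {missing}")
        self._files[entry.file_id] = entry

    def ancestors(self, file_id: int) -> list[str]:
        """Names of every file this one was directly or indirectly produced from."""
        seen: set[int] = set()
        stack = list(self._files[file_id].parents)
        while stack:
            fid = stack.pop()
            if fid not in seen:
                seen.add(fid)
                stack.extend(self._files[fid].parents)
        return sorted(self._files[fid].name for fid in seen)


catalog = FileCatalog()
catalog.register(FileEntry(1, "run1.daq", "raw"))
catalog.register(FileEntry(2, "run1.dst.root", "reconstruction", [1]))
catalog.register(FileEntry(3, "run1.nanodst.root", "nanoDST", [2]))
print(catalog.ancestors(3))  # ['run1.daq', 'run1.dst.root']
```

Rejecting unknown parents at registration time keeps the relationship graph complete, so a provenance query never hits a dangling reference.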
Grand Challenge Interface
[Figure: Grand Challenge Interface. Sections: GC System, GCA Interface, STAR Components. Blocks shown: Query Estimator, Query Monitor, Cache Manager, Index Builder, gcaClient, FileCatalog, IndexFeeder, StIOMaker, fileCatalog database, tagDB.]
Limiting Dependencies
Experiment-specific & GCA-dependent
• IndexFeeder server
– IndexFeeder reads the “tag database” so that the GCA
“index builder” can create the index
• FileCatalog server
– FileCatalog queries the experiment's “file catalog”
database to translate a fileID to HPSS and disk paths
• gcaClient interface
– The experiment sends queries and gets back filenames
through gcaClient library calls
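The aim of this split is that GCA-facing code depends only on a small interface while the experiment supplies the implementation. A minimal sketch of the FileCatalog side follows; the class names, paths and the disk-before-HPSS preference are assumptions for illustration, not documented NOVA or GCA behaviour.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple


class FileLocation(NamedTuple):
    hpss_path: str  # location in the HPSS tape store
    disk_path: str  # disk-resident copy, or "" if none


class FileCatalogBackend(ABC):
    """Generic side: the only call the GCA-facing FileCatalog server needs."""

    @abstractmethod
    def locate(self, file_id: int) -> FileLocation:
        ...


class ExampleExperimentCatalog(FileCatalogBackend):
    """Experiment-specific side; a dict stands in for the experiment's database."""

    def __init__(self, rows: dict[int, tuple[str, str]]):
        self._rows = rows

    def locate(self, file_id: int) -> FileLocation:
        try:
            return FileLocation(*self._rows[file_id])
        except KeyError:
            raise KeyError(f"unknown fileID {file_id}") from None


def resolve(catalog: FileCatalogBackend, file_ids: list[int]) -> list[str]:
    """Translate fileIDs to paths, preferring disk copies over HPSS (assumed policy)."""
    return [loc.disk_path or loc.hpss_path for loc in map(catalog.locate, file_ids)]


catalog = ExampleExperimentCatalog({
    1: ("/hpss/example/run1.dst.root", "/data/example/run1.dst.root"),
    2: ("/hpss/example/run2.dst.root", ""),
})
print(resolve(catalog, [1, 2]))
# ['/data/example/run1.dst.root', '/hpss/example/run2.dst.root']
```

Porting to another experiment then means writing a new `FileCatalogBackend` subclass; `resolve` and anything built on it stay unchanged.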
Summary
What is NOVA?
• Framework components for distributed computing
What are NOVA components?
• Configuration manager for analysis jobs
• Distributed job submission and monitoring system
• Analysis workflow catalog
• Database for versioned dataObjects
• Brokered extraction of dataObjects
• Web-based database navigation tool