Update on CLAS12 software


CLAS12 Software
D.P. Weygand
Thomas Jefferson National Accelerator Facility
Page 1
Projects
ClaRA
Simulation
  GEMC
CCDB
Geometry Service
Event Display
Tracking
  SOT
  Gen III
Event Reconstruction
Post-Reconstruction Data Access
Data-Mining
Slow Controls
Documentation
  Doxygen
  Javadoc
Testing/Authentication
Detector Subsystems (Reconstruction and Calibration)
  EC
  PCAL
  FTOF
  CTOF
  LTCC
  HTCC
OnLine
Code Management
  SVN
  Bug Reporting
Support
  Visualization Services
  Support Packages, e.g. CLHEP, ROOT
Page 2
Service Oriented Architecture
Overview:
Services are unassociated loosely coupled units of functionality that have no
calls to each other embedded in them. Each service implements one action.
Rather than services embedding calls to each other in their source code, they
use defined protocols that describe how services pass and parse messages.
SOA aims to allow users to string together fairly large chunks of functionality to
form ad hoc applications that are built almost entirely from existing software
services. The larger the chunks, the fewer the interface points required to
implement any given set of functionality; however, very large chunks of
functionality may not prove sufficiently granular for easy reuse. Each interface
brings with it some amount of processing overhead, so there is a performance
consideration in choosing the granularity of services. The great promise of SOA
suggests that the marginal cost of creating the nth application is low, as all of the
software required already exists to satisfy the requirements of other applications.
Ideally, one requires only orchestration to produce a new application.
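As an illustration of that orchestration idea, here is a minimal sketch in plain Python (not ClaRA code; the service names and the event dictionary are hypothetical): two self-contained services that never call each other are strung together into an ad hoc application purely by an orchestrator.

# Illustrative sketch only: two independent "services" with a common
# process(event) interface, composed by an orchestrator.

def decode_hits(event):
    """Hypothetical service: turn raw (channel, value) pairs into hit objects."""
    event["hits"] = [{"channel": c, "value": v} for c, v in event["raw"]]
    return event

def build_clusters(event):
    """Hypothetical service: group hits pairwise into toy clusters."""
    hits = event["hits"]
    event["clusters"] = [hits[i:i + 2] for i in range(0, len(hits), 2)]
    return event

def orchestrate(event, services):
    """Orchestrator: the only place where the composition is defined."""
    for service in services:
        event = service(event)
    return event

if __name__ == "__main__":
    raw_event = {"raw": [(1, 120), (2, 98), (7, 300)]}
    print(orchestrate(raw_event, [decode_hits, build_clusters])["clusters"])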
Page 3
SOA/Complexity
SOA is principally based on object oriented design. Each service is built as a
discrete piece of code. This makes it possible to reuse the code in different ways
throughout the application by changing only the way an individual service
interoperates with other services that make up the application, versus making
code changes to the service itself. SOA design principles are used during
software development and integration.
Software complexity is a term that encompasses numerous properties of a piece
of software, all of which affect internal interactions. There is a distinction
between the terms complex and complicated. Complicated implies being difficult
to understand but with time and effort, ultimately knowable. Complex, on the
other hand, describes the interactions between a number of entities. As the
number of entities increases, the number of possible interactions between them
grows rapidly (already quadratically for pairwise interactions), to the point
where it becomes impossible to know and understand all of them. Similarly,
higher levels of complexity in software increase the risk of unintentionally
interfering with interactions, and so increase the chance of introducing defects
when making changes. In more extreme cases, complexity can make modifying the
software virtually impossible.
Page 4
ClaRa and Cloud Computing
• Address the major components of physics data processing as services.
• Services, and the information bound to those services, can be further
abstracted into process layers and composite applications for developing
various analysis solutions.
• Agility: the ability to change the physics data processing workflow on top
of existing services.
• Ability to monitor points of information and points of service, in real
time, to determine the well-being of the entire physics data processing
application.
SOA is therefore the architecture of choice for ClaRA: highly concurrent,
suited to cloud computing.
Page 5
ClaRA Stress Test
V. Gyurjyan & S. Mancilla
JLAB Scientific Computing Group
Page 6
ClaRA Components
[Diagram: ClaRA components. A Platform (cloud controller) coordinates DPEs, one per compute node; each DPE hosts containers ("C"), and each container hosts services ("S"); an Orchestrator drives the whole application.]
Page 7
Batch Deployment
Page 8
[Plot: event reconstruction rate (kHz) vs number of threads on a 16-core hyper-threaded node (no I/O), at 150 ms/event/thread.]
Page 9
Batch job submission
<Request>
<Project name="clas12" />
<Track name="reconstruction" />
<Name name="clara-test" />
<CPU core="16" />
<TimeLimit time="72" unit="hours" />
<Memory space="27" unit="GB" />
<OS name="centos62"/>
<Command><![CDATA[
setenv CLARA_SERVICES /group/clas12/ClaraServices;
$CLARA_SERVICES/bin/clara-dpe -host claradm-ib
]]></Command>
<Job></Job>
</Request>
Page 10
Single Data-stream Application
[Diagram: single data-stream application. The executive node runs the ClaRA master DPE with the administrative services, the application orchestrator (AO), a reader service (R) that streams events from persistent storage, and a writer service (W) that writes results back. Each farm node N runs replicated chains of reconstruction services S1, S2, ..., Sn.]
Page 11
Multiple Data-stream Application
[Diagram: multiple data-stream application. The executive node runs the ClaRA master DPE with the administrative services and the application orchestrator (AO); each farm node N has its own reader (R) and writer (W) services with local persistent storage, a data stream (DS), and replicated chains of reconstruction services S1, S2, ..., Sn.]
Page 12
Batch queue
Common queue
Exclusive queue: CentOS 6.2, 16-core nodes, 12 processing nodes
Page 13
Single Data-stream Application
Clas12 Reconstruction: JLAB batch farm
[Plot: data processing rate (kHz) vs number of processing nodes (32 logical cores per node: 16 cores with hyper-threading), single data stream, comparing Ethernet, Infiniband, and RAM disk I/O; linear fit y = 0.193x + 0.1141, R² = 0.9901.]
Page 14
Computing Capacity Growth
Today:
1K cores in the farm (3 racks, 4-16 cores per node, 2 GB/core)
9K LQCD cores (24 racks, 8-16 cores per node, 2-3 GB/core)
180 nodes w/ 720 GPU + Xeon Phi as LQCD compute
accelerators
2016:
20K cores in the Farm (10 racks, 16-64 cores per node, 2
GB/core)
Accelerated nodes for Partial Wave Analysis? Even 1st Pass?
Total footprint, power and cooling will grow only slightly.
Capacity for detector simulation will be deployed in 2014 and 2015, with
additional capacity for analysis in 2015 and 2016.
Today Experimental Physics has < 5% of the compute capacity of LQCD. In
2016 it will be closer to 50% in dollar terms and number of racks (still small in
terms of flops).
Page 15
Compute Paradigm Changes
Today, most codes and jobs are serial. Each job uses one core, and
we try to run enough jobs to keep all cores busy, without
overusing memory or I/O bandwidth.
Current weakness: if we have 16 cores per box, and run 24 jobs to
keep them all busy, that means that there are 24 input and 24
output file I/O streams running just for this one box! => lots of
“head thrashing” in the disk system.
Future: most data analysis will be event parallel (“trivially parallel”).
Each thread will process one event. Each box will run one job
(a DPE) processing 32-64 events in parallel, with 1 input and 1 output stream
=> much less head thrashing, higher I/O rates.
Possibility: the farm will include GPU or Xeon Phi accelerated nodes!
As software becomes ready, we will deploy it!
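A minimal sketch of that event-parallel pattern in plain Python (standard library only; this is not ClaRA code, and the event structure and worker function are made up): a single input stream feeds a pool of workers, one event per task, and a single output stream collects the results.

# Illustrative sketch of event-parallel processing: one input stream,
# one worker per event in flight, one output stream. Not ClaRA code.
from concurrent.futures import ThreadPoolExecutor

def reconstruct(event):
    """Hypothetical per-event reconstruction step."""
    return {"id": event["id"], "nhits": len(event["hits"])}

def read_events(n=8):
    """Stand-in for the single input stream."""
    for i in range(n):
        yield {"id": i, "hits": list(range(i % 4))}

if __name__ == "__main__":
    # One "job" per box, many events processed concurrently (4 workers here).
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(reconstruct, read_events()))  # single output stream
    print(results)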
Page 16
Tested on calibration and simulated data
V. Ziegler & M. Mestayer
Page 17
Jerry Gilfoyle & Alex Colvill
• Code to reconstruct the signals from the Forward Time-of-Flight system (FTOF).*
• Written as a software service so that it can be easily integrated into the ClaRA framework.
• The FTOF code converts the TDC and ADC signals into times and energies and corrects for effects like time walk.
• The position of the hit along the paddle is determined from the difference between the TDC signals; the time of the hit is reconstructed using the average TDC signal, corrected for the propagation time of light along the paddle (see the sketch below).
• The energy deposited is extracted from the ADC signal and corrected for light attenuation along the paddle.
• Modifications to this procedure are applied when one or more of the ADC or TDC signals are missing.
• The FTOF code is up and running and will be used in the upcoming ‘stress test’ of the full CLAS12 event reconstruction package.
* Modeled after the code used in the CLAS6 FTOF reconstruction and tested using Monte Carlo data from gemc, the CLAS12 physics-based simulation.
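As a rough illustration of the position and time reconstruction described above, here is a short sketch in plain Python; the effective light speed and paddle length are made-up placeholders, not CLAS12 calibration constants, and the real service also applies time-walk and attenuation corrections.

# Illustrative sketch of TOF hit reconstruction from a two-ended paddle readout.
# The constants are placeholders, not CLAS12 calibration values.
V_EFF = 16.0           # effective light speed in the scintillator (cm/ns), assumed
PADDLE_LENGTH = 200.0  # paddle length (cm), assumed

def ftof_hit(t_left_ns, t_right_ns):
    """Return (position along the paddle in cm, hit time in ns).

    Position comes from the difference of the two TDC times; the hit time
    comes from their average, corrected for light propagation along the paddle.
    """
    position = 0.5 * V_EFF * (t_left_ns - t_right_ns)
    hit_time = 0.5 * (t_left_ns + t_right_ns) - 0.5 * PADDLE_LENGTH / V_EFF
    return position, hit_time

if __name__ == "__main__":
    print(ftof_hit(12.3, 11.1))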
Fig. 1: histogram of the number Nadj
of adjacent paddles in a cluster
normalized to the total number of
events. Clusters are formed in a
single panel by grouping adjacent hits
together. The red, open circles are
Nadj for panel 1b. The black, filled
squares are for panel 1a which is
behind panel 1b relative to the target.
Most events consist of a single hit,
but there is a significant number that
have additional paddles in each
cluster.
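A minimal sketch of the adjacency clustering described in the caption (plain Python; the paddle numbers in the example are invented):

# Illustrative sketch: group hit paddle numbers into clusters of adjacent paddles.
def cluster_adjacent(paddles):
    clusters, current = [], []
    for p in sorted(paddles):
        if current and p <= current[-1] + 1:
            current.append(p)          # adjacent (or duplicate) paddle: same cluster
        else:
            if current:
                clusters.append(current)
            current = [p]              # start a new cluster
    if current:
        clusters.append(current)
    return clusters

print(cluster_adjacent([3, 4, 9, 15, 16, 17]))  # -> [[3, 4], [9], [15, 16, 17]]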
Page 18
Intel Xeon Phi MIC Processor
The Intel Xeon Phi KNC processor is essentially a 60-core SMP chip
where each core has a dedicated 512-bit wide SSE (Streaming SIMD
Extensions) vector unit. All the cores are connected via a 512-bit
bidirectional ring interconnect. Currently, the Phi
coprocessor is packaged as a separate PCIe device, external to the
host processor. Each Phi contains 8 GB of RAM that provides all the
memory and file-system storage that every user process, the Linux
operating system, and ancillary daemon processes will use. The Phi
can mount an external host file-system, which should be used for all
file-based activity to conserve device memory for user applications.
Page 19
Virtual Machine
Page 20
Virtual Machine
Page 21
CLAS12 Constants Database (CCDB)
Johann Goetz & Yelena Prok
Page 22
CLAS12 Constants Database (CCDB)
Page 23
Proposed Programming Standards
• Programming standards create a unified approach to collaboration among members.
• Standards and documentation increase the expected life of software by creating a unified design, aiding future maintenance.
• This is a proposed working standard and is open for suggestions and modification. It can be found on the Hall-B wiki.
Page 24
Profiling and Static Analysis
150 ms/event/thread
• Profiling is a system of dynamic program analysis.
  o Individual call stacks
  o Time analysis
  o Memory analysis
• Static analysis is a pre-run analysis that pinpoints areas of potential error.
  o Possible bugs
  o Dead code
  o Duplicate code
  o Suboptimal code
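For example, a minimal profiling run using Python's standard cProfile and pstats modules; the profiled function is a toy stand-in, not reconstruction code.

# Minimal dynamic-profiling example with the standard library.
import cProfile
import pstats

def toy_reconstruction(n_events=10000):
    """Toy stand-in for per-event work."""
    total = 0.0
    for _ in range(n_events):
        total += sum(j * j for j in range(100))
    return total

cProfile.run("toy_reconstruction()", "profile.out")  # collect call and timing data
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)  # top 5 entries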
Page 25
Testing
• Unit testing
  o extends the life of code
  o catches errors introduced by modifications
  o decreases the amount of debugging
  o decreases the amount of suboptimal code
• Testing is a required part of the proposed coding standards and decreases the amount of time spent working with incorrect code (a minimal example follows below).
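As an illustration, here is a minimal unit test written with Python's standard unittest module; the function under test is the hypothetical ftof_hit sketch from the FTOF slide, not actual CLAS12 code.

# Minimal unit-test sketch; ftof_hit and its constants are the illustrative
# placeholders from the FTOF slide, repeated here so the test is self-contained.
import unittest

V_EFF = 16.0
PADDLE_LENGTH = 200.0

def ftof_hit(t_left_ns, t_right_ns):
    position = 0.5 * V_EFF * (t_left_ns - t_right_ns)
    hit_time = 0.5 * (t_left_ns + t_right_ns) - 0.5 * PADDLE_LENGTH / V_EFF
    return position, hit_time

class FtofHitTest(unittest.TestCase):
    def test_equal_times_give_center_position(self):
        position, _ = ftof_hit(10.0, 10.0)
        self.assertAlmostEqual(position, 0.0)

    def test_time_is_average_minus_propagation(self):
        _, hit_time = ftof_hit(10.0, 10.0)
        self.assertAlmostEqual(hit_time, 10.0 - 0.5 * PADDLE_LENGTH / V_EFF)

if __name__ == "__main__":
    unittest.main()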
Page 26
PYTHON/ClaRa
Python has a simple syntax that allows for very quick prototyping of experimental analysis
services, along with a large number of incredibly useful built-in functions and data types.
There are also a huge number of open-source, highly optimized, and well-documented
computational analysis modules available to be imported into any analysis service. A
small handful of the supported areas are:
• Full statistical and function-optimization toolkits
• Eigenvalues and eigenvectors of large sparse matrices
• Integration functions, including integration of ordinary differential equations
• Fourier transforms
• Interpolation
• Signal processing
• Linear algebra
• Special functions
• Full statistics suite
• Highly efficient file I/O
• Spatial algorithms and data structures
• Clustering algorithms for theory, target detection, and other areas
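As a small taste of what these modules provide, here is a short example using NumPy and SciPy (both assumed to be installed) to fit a Gaussian peak to generated data; the data are invented, not CLAS12 output.

# Tiny NumPy/SciPy example: fit a Gaussian peak to generated (not real) data.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, mean, sigma):
    return amplitude * np.exp(-0.5 * ((x - mean) / sigma) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 200)
y = gaussian(x, 10.0, 0.5, 1.2) + rng.normal(0.0, 0.3, x.size)

params, covariance = curve_fit(gaussian, x, y, p0=[5.0, 0.0, 1.0])
print("amplitude, mean, sigma =", params)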
Page 27
PYTHON/ClaRa
The existing cMsg protocol will be wrapped in an importable Python module that
provides the methods needed to receive and send a cMsg container between the
Python service and ClaRA.
[Diagram: the existing cMsg library, written in C, is imported through a Python wrapper into the Python analysis service.]
If the received cMsg container contains EVIO data, a separate EVIO support
module can be imported to provide the functions needed to read data from, and
append data to, the EVIO event stored in the cMsg container.
[Diagram: an EVIO support module, written in Python, is imported by the analysis service.]
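The wrapper itself does not exist yet, so no cMsg API is shown here; the snippet below only demonstrates the general mechanism (the standard ctypes module, on a Unix-like system) by which a C library such as cMsg can be exposed to Python, using the C math library as a stand-in.

# Demonstration of calling a C library from Python with the standard ctypes
# module; the C math library stands in for cMsg, which is not shown here.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))  # load a C shared library
libm.cos.restype = ctypes.c_double                 # declare the C return type
libm.cos.argtypes = [ctypes.c_double]              # declare the C argument types

print(libm.cos(0.0))  # calls the C function directly from Python -> 1.0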
Page 28
Summary
• Scaled SOA implemented successfully via ClaRA
  Software infrastructure is being integrated into the JLab batch farm system
  More services need to be written
  More Orchestrators/Applications
• Progress on Major Systems
  CCDB
  Geometry Service
  Reconstruction
  Tracking
  TOF & EC
• Testing/Standards/QA
  Actively being developed
Page 29
Summary cont.
• User Interfaces/Ease of Use
• GUI to CCDB
• Data Handling
• Data Access/Mining
• EVIO data access via dictionary
• Virtual Box
• Programming Libraries: ROOT, ScaVis, SciPy/NumPy …
• Examples, examples, examples
Page 30