BSchmidt_KDataEURECA_software_workshopx

EDELWEISS data structure and analysis framework
Benjamin Schmidt, June 2014 at MPI München
Photo by Böhringer Friedrich
KIT – Universität des Landes Baden-Württemberg und
nationales Forschungszentrum in der Helmholtz-Gemeinschaft
www.kit.edu
Motivation to build a new data structure and analysis framework (Kdata)
Task:
Get the data
J. Cham
Era: ROOT based, but difficult access, no server with most recent code/data…
Saclay
Ana: Fortran, PAW and C, no PAW support, French comments in code/data…
We had: Edw-II data analysis dispersed between Ana and Era
2 experts (full time analysis)
Each with their own code
single(few local)-user / single-programmer
2010: A. Cox and I were struggling to find, access and analyze Edw2 data
Coincidence (Muon-Veto/Bolometer) study as diploma work
Benjamin Schmidt
June 2014, CRESST/EDELWEISS/EURECA software workshop
Motivation to build a new data structure and analysis framework (Kdata)
Short term: facilitate data access
Build flexible event based data structure
Single combined HLA-file: muon-veto and bolometer data
Make code and data easily available
Documentation
Long term: establish a common collaboration-wide analysis and data storage tool
Share tasks (calibration, template creation, …) / Remove barriers (documentation)
Allow for upgrade to 100's of detectors: develop automatic processing scheme
The general picture – The idea
All software modules
DAQ
KSamba
KDS data structure
Kamping
KPTA pulse trace analysis
A bit special: standalone code, extensive use of templates
Raw
Amp
ampToHLA
HLA
Analysis:
KDataPy
KQPA
Specific known - unknown requirements during Kdata development
Requirements Edw-3:
10 -> 40 detectors
Larger workload for debugging, calibration and analysis
New detector design
(channel number/specifics initially unknown)
New electronics (some specifics unknown)
First time-resolved ionization signals (trace length?, num traces?)
Change in analog amplifiers -> signal shape?, trace length?, sampling?
new efforts to optimize signal treatment needed
Integrate muon-veto in bolo DAQ
Event based data storage
Kdata - implementation
The idea:
Build a data storage and analysis framework
for event-based physics data
use ROOT
Fast I/O
Support for LHC lifetime
Data compression
Statistics tools
Well known
C++ class library for data encapsulation
Keep it modular
Keep it flexible and general
Try to keep it simple
Keep fully split tree (library independent)
Document it
Make it easily accessible
Repository: https://edwdev-ik.fzk.de/SVN_Repository_for_the_KIT_Dark_Matter_Group/KData.html
Kdata event structure in detail
Use ROOT types
No nested arrays
Kdata library not needed to read data
Longevity of data guaranteed
Kdata coded consistently with the ROOT and Taligent coding style:
Easier to read/collaborate/check code
For example:
classes defined in header .h; implemented in .cxx
variables start with small f (fChannelName; fAmp; fExtra; …)
functions start with capital letter GetChannelName(); GetTrace();…
Kds completely implemented with Get…() and Set…() methods
Tab completion (ipython, root session)
Kdata event structure in detail
ROOT TTree with single event branch
Event with flexible structure:
Variable sized TClonesArrays for Bolometer-, BoloPulse-, PulseAnalysis-, Samba- and MuonModule information
Allows changing the number of bolos or the number of channels per bolo in hardware without any code change in "kds" (data structure source code)!
Requires some effort to get to know, though
Kdata event structure
Logic Layout:
TTree
  KEvent
    KBolometerRecord
      KBoloPulseRecord = Channel
        KPulseAnalysisRecord
      KSambaRecord
    KMuonModuleRecords
Logic event structure via TRef and TRefArray
Very powerful: can be spread over files, …
A word of caution though:
Requires specific handling in event building: never forget to reset the referenced object count (TProcessID::SetObjectCount), otherwise the file size blows up
Probably most bugs and problems in kds were related to TRef issues
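The caution about resetting the referenced object count can be illustrated with a toy model. The sketch below is pure Python and does not call ROOT: `RefRegistry` and `build_events` are hypothetical names standing in for the process-wide bookkeeping behind TRef. In a real event builder the corresponding step would be to save the value of TProcessID::GetObjectCount() before building an event and to restore it with TProcessID::SetObjectCount() afterwards.

```python
class RefRegistry:
    """Hypothetical stand-in for the process-wide table behind TRef (not ROOT code)."""

    def __init__(self):
        self.count = 0      # last unique ID handed out
        self.table = []     # slot i holds the object with unique ID i + 1

    def assign_id(self, obj):
        """Give obj the next unique ID, growing the table only if needed."""
        self.count += 1
        if self.count > len(self.table):
            self.table.append(obj)
        else:
            self.table[self.count - 1] = obj   # reuse an existing slot
        return self.count


def build_events(registry, n_events, refs_per_event, reset_count):
    """Build n_events toy events; return the largest ID that was handed out."""
    largest_id = 0
    for _ in range(n_events):
        saved = registry.count                 # remember the count before this event
        for _ in range(refs_per_event):
            largest_id = max(largest_id, registry.assign_id(object()))
        if reset_count:
            registry.count = saved             # the step that must not be forgotten
    return largest_id


with_reset = RefRegistry()
without_reset = RefRegistry()
max_id_reset = build_events(with_reset, n_events=1000, refs_per_event=5, reset_count=True)
max_id_no_reset = build_events(without_reset, n_events=1000, refs_per_event=5, reset_count=False)
print(max_id_reset, len(with_reset.table))        # IDs stay within one event's range
print(max_id_no_reset, len(without_reset.table))  # IDs and table grow with every event
```

In this toy model the table stays at the size of one event when the count is reset, and grows with the number of events when it is not, which is the kind of growth the slide warns about.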
Kdata event structure
Logic Layout:
TTree
  KEvent
    KBolometerRecord
      KBoloPulseRecord = Channel
        KPulseAnalysisRecord (bandpass analysis)
        KPulseAnalysisRecord (optimal filter)
        KPulseAnalysisRecord (trapezoidal filter) …
      KSambaRecord
    KMuonModuleRecords
Looping in python:
for event in filereader:
    for bolo in event.boloRecords():
        for pulse in bolo.pulseRecords():
            for analysis in pulse.analysisRecords():
                ...
Looping C++ style in python:
for i in range(f.GetEntries()):
    f.GetEntry(i)
    event = f.GetEvent()
    for ii in range(event.GetNumBolos()):
        bolo = event.GetBolo(ii)
        samba = bolo.GetSambaRecord()
        print(samba.GetNtpDateSec())
        for iii in range(bolo.GetNumPulseRecords()):
            pulse = bolo.GetPulseRecord(iii)
            trace = pulse.GetTrace()
            ...
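To make the nested layout concrete without a KData installation, here is a minimal pure-Python mock of the event hierarchy. All classes are hypothetical stand-ins (they are not the KData classes); only the method names boloRecords(), pulseRecords() and analysisRecords() are taken from the loop on the slide, and the detector names and amplitudes are made-up toy values. The point is that each event carries variable-sized lists, so the same loop works for one bolometer or for many.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PulseAnalysisRecord:      # stand-in for KPulseAnalysisRecord
    name: str
    amp: float


@dataclass
class BoloPulseRecord:          # stand-in for KBoloPulseRecord (one channel)
    channel_name: str
    analysis: List[PulseAnalysisRecord] = field(default_factory=list)

    def analysisRecords(self):
        return iter(self.analysis)


@dataclass
class BolometerRecord:          # stand-in for KBolometerRecord
    detector_name: str
    pulses: List[BoloPulseRecord] = field(default_factory=list)

    def pulseRecords(self):
        return iter(self.pulses)


@dataclass
class Event:                    # stand-in for KEvent
    bolos: List[BolometerRecord] = field(default_factory=list)

    def boloRecords(self):
        return iter(self.bolos)


def make_event(n_bolos):
    """Build a toy event with n_bolos bolometers, two channels each."""
    event = Event()
    for b in range(n_bolos):
        bolo = BolometerRecord(detector_name="DET%d" % b)
        for channel in ("heat", "ion"):
            pulse = BoloPulseRecord(channel_name="%s DET%d" % (channel, b))
            pulse.analysis.append(PulseAnalysisRecord("bandpass", 1.0 + b))
            pulse.analysis.append(PulseAnalysisRecord("optimal filter", 1.1 + b))
            bolo.pulses.append(pulse)
        event.bolos.append(bolo)
    return event


# A plain list plays the role of the file reader in this sketch.
filereader = [make_event(1), make_event(3)]
rows = []
for event in filereader:
    for bolo in event.boloRecords():
        for pulse in bolo.pulseRecords():
            for analysis in pulse.analysisRecords():
                rows.append((bolo.detector_name, pulse.channel_name,
                             analysis.name, analysis.amp))
print(len(rows))
```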
Kdata event structure in detail
Structure subclassed in
Raw: KRawEvent, KRawBolometerRecord, …
Amp: KAmpEvent, KAmpBolometerRecord, ….
HLA: KHLAEvent, KHLABolometerRecord, …
~ 1/2 samba file size
< 1/10 raw file size
Amp and HLA – no pulse traces,
but KPulseAnalysisRecord
Raw – with pulse traces!
No KPulseAnalysisRecords
With a quick calculation
2.87 * 356/1850 * 2.35 ->
FWHM 1.04 keV
Ana 1.1 keV
Python and KDataPy
simpleEventViewer output:
Looping utilities: no need to write the looping/plotting
Use KDataPy.util with plotpulse(), looppulse(), loopbolo() and KDataPy.loop_amp with loopchannel(), plotchan_x(), plotchan_x_files(), plotchan_x_dir()
loop_amp to be completed with plotchannel_xy(), … and loop/plotbolo functions. Note that KDataPy.util loopbolo() also works for Amp and HLA data
Basic usage:
import ROOT
import KDataPy.util as ut
ut.plotpulse("/sps/edelweis/kdata/data/raw/nk23b002_000.root", "chalB FID823")
Documentation
Our data acquisition chains revisited
Our look up place
Modane:
  Samba Macs
  Muon Veto DAQ
  Radon
Lyon (automated):
  Bolo-Raw data
  proc0: copy to Lyon
  proc1: rootification
  proc2: raw->amp
  proc3: amp->hla
  proc4: merge/skim muon/hla bolo data
  spsToHpss: backup on tape drive
Karlsruhe:
  Kdata - ROOT on kalinka
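The automated steps proc0 to proc4 run in a fixed order, so a run that has completed some of them only needs the remaining ones. The following sketch shows that idea under stated assumptions: the stage names and descriptions are taken from the slide, while the per-run record (a dictionary with "done" and "log" entries) and the functions `next_stage` and `process` are hypothetical and do no real processing.

```python
# Stage names follow the slide; nothing is copied or converted in this sketch.
STAGES = ["proc0", "proc1", "proc2", "proc3", "proc4"]

DESCRIPTIONS = {
    "proc0": "copy to Lyon",
    "proc1": "rootification",
    "proc2": "raw->amp",
    "proc3": "amp->hla",
    "proc4": "merge/skim",
}


def next_stage(done):
    """Return the first stage not yet in the list of completed stages, or None."""
    for stage in STAGES:
        if stage not in done:
            return stage
    return None


def process(run_record):
    """Advance one hypothetical run record through all remaining stages, in order."""
    while True:
        stage = next_stage(run_record["done"])
        if stage is None:
            return run_record
        # A real job would do the work here; the sketch only records the step.
        run_record["done"].append(stage)
        run_record["log"].append("%s: %s" % (stage, DESCRIPTIONS[stage]))


# Toy record: a run that has already been copied and rootified.
record = process({"run": "nk23b002", "done": ["proc0", "proc1"], "log": []})
print(record["log"])
```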
Using the Kdata pulse processing library
Adam Cox, our benevolent dictator for life
The KPulseAnalysisChain
The kpta-chain is applied before
your analysis function
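The idea of applying a chain of pulse processors before the analysis function can be sketched in a few lines. This is not the KPTA API: `BaselineRemoval`, `PatternRemoval`, `AnalysisChain` and `peak_amplitude` are hypothetical names, the pattern is assumed to be periodic and is estimated from the pre-pulse samples only, and the trace is a made-up toy signal.

```python
class BaselineRemoval:
    """Subtract the mean of the first n_pre samples (hypothetical processor)."""

    def __init__(self, n_pre):
        self.n_pre = n_pre

    def __call__(self, trace):
        baseline = sum(trace[:self.n_pre]) / float(self.n_pre)
        return [x - baseline for x in trace]


class PatternRemoval:
    """Subtract a periodic pattern estimated from the first n_pre samples."""

    def __init__(self, period, n_pre):
        self.period = period
        self.n_pre = n_pre

    def __call__(self, trace):
        n_full = self.n_pre // self.period     # complete periods before the pulse
        pattern = [
            sum(trace[k * self.period + i] for k in range(n_full)) / float(n_full)
            for i in range(self.period)
        ]
        return [x - pattern[i % self.period] for i, x in enumerate(trace)]


class AnalysisChain:
    """Apply a list of processors to a trace, in order."""

    def __init__(self, processors):
        self.processors = list(processors)

    def run(self, trace):
        for processor in self.processors:
            trace = processor(trace)
        return trace


def peak_amplitude(trace):
    """Analysis function: the sample with the largest absolute value."""
    return max(trace, key=abs)


# Toy trace: offset 10, alternating +1/-1 pattern, pulse of height 5 at samples 8 and 9.
raw = [10 + (1 if i % 2 == 0 else -1) + (5 if i in (8, 9) else 0) for i in range(12)]
chain = AnalysisChain([BaselineRemoval(n_pre=4), PatternRemoval(period=2, n_pre=4)])
cleaned = chain.run(raw)               # the chain runs first ...
result = peak_amplitude(cleaned)       # ... then the analysis function sees the cleaned trace
print(result)
```

Keeping each processor as a small callable makes it easy to reorder or swap steps without touching the analysis function.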
Ionisation channel after pattern removal:
Advantages – Drawbacks (personal opinion)
Advantages:
Flexibility of data structure
Consistency of data structure (over time)
Same data structure for different detector systems -> great for coincidence studies
Same data structure for different processing/analyses (bandpass, optimal filter, …)
Decouple high level analyses from DAQ/processing changes
Independent kpta library: has been reused with (flat) data from EURECA test stand, very versatile
Drawbacks:
Flexibility of data structure comes with some complexity (heaviness), especially TTree::Draw() is more complex
Single raw data folder -> restricted use of ls
Writing kpta with templates is a bit more complex
Usage of python
90 % of the time python feels like the right solution
Shorter, more legible code
Vast set of external libraries
Extremely handy for scripting
Basic documentation in python always via '''docstrings'''
Main price: speed
But 50 x slower
PyPy JIT-compiled: 1.06 x slower
Circumvent by producing an additional set of data files skimmed by detector
Future use of pypy + ROOT6
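Skimming by detector means reading the full data set once and writing one smaller collection per detector, so that later python loops only touch the events they need. The sketch below shows the grouping step only, on in-memory toy data: the function `skim_by_detector` and the event layout (lists of detector name and amplitude pairs) are hypothetical, the values are made up, and no files are read or written.

```python
from collections import defaultdict


def skim_by_detector(events):
    """Group (detector, payload) records into one list per detector."""
    skims = defaultdict(list)
    for event in events:
        for detector, payload in event:
            skims[detector].append(payload)
    return dict(skims)


# Toy input: each event is a list of (detector name, amplitude) pairs.
events = [
    [("FID823", 1.2), ("FID824", 0.4)],
    [("FID823", 3.1)],
    [("FID824", 0.9), ("FID825", 2.2)],
]
skims = skim_by_detector(events)
print(sorted(skims))
```

An analysis of a single detector can then iterate over its own list instead of over every event.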