Validation related issues
Auger-Lecce, 10 November 2009
• BuildBot – Introduction
• BuildBot @ pbsfarm
• Site Wide Installation
• Issues related to Install/Config/Valid
• Updates on ValidationTests
• Conclusions and Outlook

BuildBot – Introduction
BuildBot is the system used in Auger to automate the compile/test cycle that
validates code changes.
By automatically rebuilding and testing the tree each time something has
changed, build problems are pinpointed quickly.
By running the builds on a variety of platforms, developers who do not have the
facilities to test their changes everywhere before check-in will at least know
shortly afterwards whether they have broken the build or not.
The overall goal is to reduce tree breakage and provide a platform for running
tests and code-quality checks.
The Validation environment uses BuildBot as its automated testing framework.
BuildBot works with a master/slave daemon scheme. The master receives
change notifications from the SVN server and tells the buildslaves to check out,
build and test the code. Multiple slaves can run on different platforms. The
slaves report their results to the master, which posts them in a waterfall display
and sends an email to the appropriate person(s) in case problems are found.
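As a rough illustration of this master/slave scheme (the host name, directories, slave name and password below are hypothetical, and a BuildBot 0.7-era command line is assumed), the daemons are created and started along these lines:

# Sketch only: not the actual commands used on pbsfarm.
buildbot create-master /opt/buildbot/master        # master reads master.cfg (builders, schedulers, status targets) from here
buildbot create-slave /opt/buildbot/slave-le64 buildmaster.le.infn.it:9989 le64 somepassword
buildbot start /opt/buildbot/master                # master waits for SVN change notifications
buildbot start /opt/buildbot/slave-le64            # slave connects to the master and waits for build requests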
BuildBot @ pbsfarm
Setting up BuildBot slaves on our nodes allows us to automatically test the
build/test process on our system platforms.
A system virtual machine provides a complete system platform that supports
the execution of a complete operating system (OS).
On pbsfarm, two system virtual machines have been set up:
• auger-le64.le.infn.it
  Operating System: Scientific Linux 4.7
  Architecture: 64 bit (x86-64)
• auger-le32.le.infn.it
  Operating System: Scientific Linux 4.7
  Architecture: 32 bit (i386)
They emulate the real pbsfarm nodes used for simulation/reconstruction.
The idea is to have BuildBot running on them, using a "site-wide" installation.
Site Wide Installation
The installation is done with APE from the virtual machines and is located under nexus06.
To use it, include in your .bashrc:
For the 64-bit architecture:
export PATH=/nfs/argo/nexus06/gabriella/AugerOffline64Last/ape-0.98/:${PATH}
export APERC=/nfs/argo/nexus06/gabriella/AugerOffline64Last/ape-0.98/ape.rc
For the 32-bit architecture:
export PATH=/nfs/argo/nexus06/gabriella/AugerOffline32Last/ape-0.98/:${PATH}
export APERC=/nfs/argo/nexus06/gabriella/AugerOffline32Last/ape-0.98/ape.rc
At login, to configure the environment you need to run:
eval `ape sh Externals` (to set only the Externals)
eval `ape sh Offline` (for the Offline settings)
NOTE: It also works for tcsh, using eval `ape csh Externals` and eval `ape csh Offline`,
after setting the equivalents of the exports in .tcshrc (setenv PATH ..., setenv APERC ...).
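A minimal .bashrc sketch tying the two cases together could look as follows; the uname-based architecture selection is an assumption added here for illustration, not part of the recipe above:

# Sketch only: pick the site-wide APE installation matching the architecture
# (the uname test is an assumption; the paths are the ones quoted above).
if [ "$(uname -m)" = "x86_64" ]; then
    APE_BASE=/nfs/argo/nexus06/gabriella/AugerOffline64Last/ape-0.98
else
    APE_BASE=/nfs/argo/nexus06/gabriella/AugerOffline32Last/ape-0.98
fi
export PATH=${APE_BASE}/:${PATH}
export APERC=${APE_BASE}/ape.rc
# at login:
eval `ape sh Externals`   # Externals only
eval `ape sh Offline`     # full Offline environment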
Issues @ Installation/Configuration
Problems during the Aires build/install (ape-0.98).
In ape-0.98/ape.rc:
...
[package Aires]
fc = g77
...
This should set g77 as the compiler in use, but it does not work: the compilation
stops because gfortran (the default compiler) is not found. I manually changed the
compiler setting directly in build/Aires/2-8-4a/config (setting FortCompile="g77")
and then ran build/Aires/2-8-4a/doinstall 0.
Apparently things were then OK, but the Aires build introduces into
auger-offline-config a set of libraries in the system area; these point at a Boost
installed in the system that conflicts with the Boost in the externals, causing a
crash at run time. Solved by editing auger-offline-config manually.
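For reference, the manual workaround described above amounts to something like the following; the sed expression is only an illustrative way of changing the setting (the actual edit was done by hand):

# Sketch of the manual Aires workaround (ape-0.98); FortCompile and doinstall
# are taken from the description above, the sed command is illustrative.
sed -i 's/^FortCompile=.*/FortCompile="g77"/' build/Aires/2-8-4a/config
build/Aires/2-8-4a/doinstall 0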
TRAC (#34): It is MANDATORY to have $APERC set.
Issues @ Validation
After a few rounds of validation on le-32 and le-64 (see the waterfall page at
http://129.10.132.228:8010/waterfall):
• In some cases the StandardApplications are very slow (particularly on le-32) and
the buildbot master kills the application, which would otherwise run forever. The
problem seems to have become worse over the last few days, apparently with no
related modifications. The StandardApplications run involves a full SD simulation,
starting from a CORSIKA air shower, and randomizes the core position on the array.
It can sometimes happen that a core lands very close to a tank. In such a case an
enormous number of particles is run through Geant (it is not worth simulating them
in such detail, since those stations are in any case "saturated"). Just an unlucky
sequence?! (Note that the SdSim events are never reproduced.)
• The example and StandardApplication runs show a difference between le-32 and
le-64. On le-64, several messages appear:
FDTriggerSimulatorOG:MakeMirrorEvent ... TAnalysedPixelData::Analyse() – found invalid
0x7f pattern!
Also seen by Mariangela; present on other 64-bit build machines as well (see example
in waterfall). Requests sent to Tom Paul, Ralf Ulrich, Michael Unger and Steffen
Mueller (FDEventLib responsibility), and HJ Mathes.
ValidationTests
Mods to the Module Sequences (the StandardApplications data
Reconstruction was used as reference). PLEASE CHECK!
FRec:
EventFileReaderOG
EventCheckerOG
FdCalibratorOG
FdPulseFinderOG
PixelSelectorOG
FdSDPFinderOG
FdAxisFinderOG
FdApertureLightOG
FdProfileReconstructorKG
RecDataLister
RecDataWriter
EventFileExporterOG
SValidStore

SRec:
EventFileReaderOG
EventCheckerOG
SdCalibratorOG
SdEventSelectorOG
SdPlaneFitOG
LDFFinderOG
SdEventPosteriorSelectorOG
SdRecPlotterOG
RecDataLister
RecDataWriter
EventFileExporterOG
SValidStore
With these module sequences the code is working.
To do: update the input event before committing.
ValidationTests
IO work – main idea:
check that new releases of Offline can read files produced with older
versions.
How to approach this:
• Trigger the BuildBot build on an EventIO change.
• As input: a list of reference events produced with different versions.
• A script running a read test (a sketch is given after the diagram below).
• A script running the hybrid simulation + reconstruction, writing the
event, and running the reco/sim test.
[Diagram: I/O files written by each tagged release (TAG 1, TAG 2, ..., TAG N-1) and by the current DEV version, each produced by the corresponding code (Code 1, Code 2, ..., Code N-1, Code DEV) with its Sim reference; the DEV code reads all of them and runs Sim + Rec.]
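A minimal sketch of the read-test script mentioned above, assuming a directory of reference events and a hypothetical reader application (the real Offline reader, its bootstrap and the file locations may differ):

#!/bin/sh
# Sketch only: loop over the reference events written by the old tags and try
# to read each one with the current (DEV) Offline build. EVENT_DIR and
# "readTest" are hypothetical placeholders.
EVENT_DIR=/path/to/reference/events
status=0
for evfile in ${EVENT_DIR}/*; do
    echo "Reading ${evfile} with the current Offline build"
    ./readTest "${evfile}" || { echo "FAILED: ${evfile}"; status=1; }
done
exit ${status}   # a non-zero exit makes the corresponding BuildBot step fail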
Conclusions and Outlook
• Two BuildBot slaves have been set up. They allow us to automatically test the
build/test process on the system platforms we use.
• Using a site-wide installation on node-emulating machines running BuildBot
maximizes our ability to pinpoint problems on our side. (The build is from the
trunk with fixed externals.) An Offline reference is available.
• Possible evolution of the virtualization: Worker Node on demand for GRID(?). A
possible conservative approach: check feasibility, then do it at CNAF; if OK,
propose it to the collaboration. What is the status of porting Offline to GRID?
• The first issues from installation, BuildBot setup and validation are under
study.
• For the old SRec/FRec validation tests: the ModuleSequence has been modified in
order to update it. The code is working; feedback is needed!