Migration of ATLAS PanDA to CERN
Graeme Stewart, Alexei Klimentov, Birger Koblitz, Massimo Lamanna, Tadashi Maeno, Pavel Nevski, Marcin Nowak, Pedro Salgado, Torre Wenaus, Mikhail Titov
Outline
PanDA Review
  PanDA History
  PanDA Architecture
First steps of Migration to CERN
  Infrastructure Setup
  PanDA Monitor
  Task Request Database
Second Phase Migration
  PanDA Server and Bamboo
  Database bombshells
  Migration, Tuning and Tweaks
Conclusions
PanDA Recent History
PanDA was developed by US ATLAS in 2005
Became the executor of all ATLAS production in EGEE
  35k simultaneous running jobs
  150k jobs per day finished during 2008
March 2009: executes production for ATLAS in NDGF as well, using the ARC Control Tower (aCT)
As PanDA had become central to ATLAS operations, it was decided in late 2008 to relocate it to CERN
PanDA Server Architecture
PanDA (Production and Distributed Analysis) is a pilot job system
Executes jobs from the ATLAS production system and from users
Brokers jobs to sites based on available compute resources and data
Pilots get jobs
  Can move and stage data if necessary
Triggers data movement back to Tier-1s for dataset aggregation
[Architecture diagram showing ATLAS ProdDB, Bamboo, Panda Server, Panda Databases, Panda Monitor, Panda Client, Pilot Factory, Pilots and the Computing Site]
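The pull model is the essential point of the architecture: pilots running at a site ask the server for work rather than having work pushed to them. The snippet below is a minimal, purely illustrative sketch of that interaction in Python; the server URL, the getJob endpoint and the reply format are assumptions for illustration, not the real PanDA server API.

import cgi
import urllib
import urllib2

# Hypothetical server URL and endpoint, for illustration only.
SERVER = "https://pandaserver.example.cern.ch:25443/server/panda"

def get_job(site_name):
    # A pilot pulls a job brokered to its site from the panda server.
    data = urllib.urlencode({'siteName': site_name})
    reply = urllib2.urlopen(SERVER + "/getJob", data).read()
    return cgi.parse_qs(reply)   # assume a urlencoded job description

job = get_job('CERN-PROD')
if job:
    # Stage input data if necessary, run the payload, then ship outputs
    # back and report the final job status to the server.
    pass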
PanDA Monitor
PanDA Monitor is the web interface to the panda system
Provides summaries of processing per cloud/site
Drill down to individual job logs
  And directly view logfiles
Task status
Also provides a web interface to request actions from the system
  Task requests
  Dataset Subscriptions
Task Request Database
[Diagram, before migration: AKTR (MySQL) and PandaDB (MySQL) alongside ProdDB (Oracle)]
Task request interface is hosted as part of the panda monitor
Allows physicists to define MC production tasks
Backend database exists separately from the rest of panda
Prime candidate for migration from MySQL at BNL to Oracle at CERN
[Diagram, after migration: AKTR (Oracle) alongside ProdDB (Oracle), with PandaDB still in MySQL]
Migration – Phase 1
Target was migration of the task request database and the panda monitor
First step was to prepare infrastructure for the services:
3 server-class machines to host panda monitors
Set up as much as possible as standard machines supported by CERN FIO
  Dual-CPU, quad-core Intel E5410 CPUs
  16GB RAM
  500GB HDD
  Quattor templates
  Lemon monitoring
  Alarms for host problems
Also migrated to the ATLAS standard python environment (Python 2.5, 64 bit)
Utilise CERN Arbitrating DNS to balance load across all machines
  Picks the 2 'best' machines of 3 with a configurable metric
Parallel Monitors
Panda was always architected to have multiple stateless monitors
Each monitor queries the backend database to retrieve user-requested information and display it
Thus setting up a parallel monitor infrastructure at CERN was relatively easy
Once external dependencies were sorted:
  ATLAS Distributed Data Management (DDM)
  Grid User Interface tools
This was deployed at the beginning of December 2008
Task Request Database
First real step was to migrate the TR DB between MySQL and Oracle
This is not quite as trivial as one first imagines
Each database supports some non-standard SQL features
  And these are not entirely compatible
Optimising databases is quite specific to the database engine
First attempts ran into trouble
  MySQL dump from BNL to CERN resulted in connections being dropped
  Had to dump data at BNL and scp to CERN
Schema required some cleaning up (a sketch of this kind of change follows below)
  Dropped unused tables
  Removing null constraints, CLOB->VARCHAR, resizing some text fields
However, after a couple of trial migrations we were confident that data could be migrated in just a couple of hours
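For illustration only, the Oracle-side schema clean-up described above might look like the statements below. The table and column names, the account and the field sizes are assumptions, not the actual AKTR schema.

import cx_Oracle

# Placeholder connection; the account and DSN are assumptions.
conn = cx_Oracle.connect('atlas_aktr/secret@atlr')
cur = conn.cursor()

# Relax a NOT NULL constraint that the migrated MySQL data does not satisfy.
cur.execute("ALTER TABLE task_requests MODIFY (reason NULL)")

# Replace a CLOB with a bounded VARCHAR2 via a temporary column, since Oracle
# cannot convert the type in place.
cur.execute("ALTER TABLE task_requests ADD (comment_tmp VARCHAR2(500))")
cur.execute("UPDATE task_requests "
            "SET comment_tmp = DBMS_LOB.SUBSTR(task_comment, 500, 1)")
cur.execute("ALTER TABLE task_requests DROP COLUMN task_comment")
cur.execute("ALTER TABLE task_requests RENAME COLUMN comment_tmp TO task_comment")

conn.close()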
Migration
Migration occurred on Monday December 8th
Database data was migrated in a couple of hours
Two days were then used to iron out any glitches
  In the Task Request interfaces
  In the scripts which manage the Task Request to ProdDB interface
Could this all have been prepared in advance?
In theory yes, but we are migrating a live system
  So there is only a limited amount of test data which can be inserted into the system
  Real tasks trigger real jobs
System was live again and accepting task requests on Wednesday
Latency of tasks in the production system is usually several days, even for short tasks
  Acceptable to the community
A Tale of Two Infrastructures
[Diagram: panda monitors talking to both the MySQL DB and the Oracle DB]
New panda monitor setup required DB plugins to talk to both MySQL and to Oracle
  The MySQLdb module is bog standard
  The cx_oracle module much less so
In addition, Python 2.4 was the supported infrastructure at BNL as opposed to Python 2.5 at CERN
This meant that after the TR migration the BNL monitors started to have more limited functionality
This had definitely not been in the plan!
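The practical difference between the two drivers shows up mainly in their DB-API paramstyles and setup. The snippet below is a hedged sketch of a thin plugin layer, not the actual monitor code; the table and column names are illustrative.

# Sketch of a DB plugin layer hiding the driver differences; table and
# column names here are illustrative, not the real panda schema.
try:
    import cx_Oracle as dbapi
    PARAMSTYLE = 'named'      # cx_Oracle expects :name bind variables
except ImportError:
    import MySQLdb as dbapi
    PARAMSTYLE = 'format'     # MySQLdb expects %s placeholders

def jobs_with_status(conn, status):
    cur = conn.cursor()
    if PARAMSTYLE == 'named':
        cur.execute("SELECT pandaid FROM jobsactive WHERE jobstatus = :st",
                    {'st': status})
    else:
        cur.execute("SELECT pandaid FROM jobsactive WHERE jobstatus = %s",
                    (status,))
    return [row[0] for row in cur.fetchall()]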
PanDA Servers
Some preliminary work on the panda server had already been done in 2008
However, much still required to be done to migrate the full suite of panda server databases:
  PandaDB – holds live job information and status ('fast buffer')
  LogDB – holds pilot logfile extracts
  MetaDB – holds panda scheduler information on sites and queues
  ArchiveDB – ultimate resting place of any panda job (big!)
For most databases the data volume was minimal and the main work was in the schema details
  Including the setup of Oracle triggers
For the infrastructure side we copied the BNL setup, with multiple panda servers running on the same machines as the monitors
  We knew the load was low and the machines were capable
We also required one server component, bamboo, which interfaces between the panda servers and ProdDB
  Same machine template worked fine
ArchiveDB
In MySQL, because of constraints on table performance vs. size, an explicit partitioning had been adopted
One ArchiveDB table for every two months of jobs
  Jan_Feb_2007
  Mar_Apr_2007
  …
  Jan_Feb_2009
In Oracle internal partitioning is supported:
  CREATE TABLE jobs_archived (<list of columns>)
  PARTITION BY RANGE (MODIFICATIONTIME) (
    PARTITION jobs_archived_jan_2006 VALUES LESS THAN (TO_DATE('01-FEB-2006','DD-MON-YYYY')),
    PARTITION jobs_archived_feb_2006 VALUES LESS THAN (TO_DATE('01-MAR-2006','DD-MON-YYYY')),
    PARTITION jobs_archived_mar_2006 VALUES LESS THAN (TO_DATE('01-APR-2006','DD-MON-YYYY')),
    …
  )
This allows for considerable simplification of the client code in the panda monitor
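As a rough illustration of that simplification (assuming hypothetical column names, and not the actual monitor code): a single date-bounded query against the partitioned table replaces the logic that had to pick the correct Jan_Feb_YYYY table, with Oracle pruning the untouched partitions itself.

import cx_Oracle

def archived_jobs(conn, t_start, t_end):
    # One range query on the partitioned table; Oracle only scans the
    # partitions covered by [t_start, t_end).  Column names are assumptions.
    cur = conn.cursor()
    cur.execute("SELECT pandaid, jobstatus FROM jobs_archived "
                "WHERE modificationtime >= :t1 AND modificationtime < :t2",
                {'t1': t_start, 't2': t_end})
    return cur.fetchall()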
Integrate, Integrate, …
By late February trial migrations of the databases had happened to integration databases hosted at CERN (the INTR database)
Trial jobs had been run through the panda server, proving basic functionality
A decision now had to be made on the final migration strategy
This could be 'big bang' (move the whole system at once) or 'inflation' (gradually migrate clouds one by one)
Big bang would be easier for, e.g., the panda monitor
  But would carry greater risks – suddenly loading the system with 35k running jobs was unwise
  If things went very wrong it might leave us with a big mess to recover from
External constraint was the start of the ATLAS cosmics reprocessing campaign, due to start 9th March
We decided to migrate piecemeal
Final Preparations
In fact PanDA did have two heads already
  IT and CERN clouds had been run from a parallel MySQL setup from early 2008
  This was an expensive infrastructure to maintain as it did not tap into CERN IT supported services
It was obvious that migrating these two clouds would be a natural first step
Plans were made to migrate to the ATLAS production database at CERN (aka ATLR)
Things seemed to be under control a few days before…
DBAs
The Friday before we were due to migrate, the CERN DBAs asked us not to do so
  They were worried that not enough testing of the Oracle setup in INTR had been done
This triggered a somewhat frantic weekend of work, resulting in several thousand jobs being run through the CERN and IT clouds using the INTR databases
From our side this testing looked to be successful
However, we reached a subsequent compromise:
  We would migrate the CERN and IT clouds to panda running against INTR
  They would start backups on the INTR database, giving us the confidence to run production for ATLAS through this setup
  Subsequent migration from INTR to ATLR could be achieved much more rapidly as the data was already in the correct Oracle formats
Tuning and Tweaking
Migration of PandaDB, LogDB, MetaDB was very quick
  There was one unexpected piece of client code which hung during the migration process (polling of CERN MySQL servers)
Migration and index building of ArchiveDB was far slower
  However, we disabled access to ArchiveDB and could bring the system up live within half a day
Since then a number of small improvements in the panda code have been made to help optimise use of Oracle:
  Connections are much more expensive in Oracle than in MySQL
  Restructured code to use a connection pool (see the sketch below)
  Created common reader and writer accounts for access to all database schemas from the one connection
  Migration away from triggers to .nextval() syntax
Despite fears, migration of the panda server to Oracle has been relatively painless and has been achieved without significant loss of capacity
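A minimal sketch of two of those changes, a shared connection pool and an explicit sequence .nextval() in place of an insert trigger, assuming cx_Oracle and hypothetical account, table and sequence names; it is illustrative, not the actual panda server code.

import cx_Oracle

# A shared pool amortises the comparatively high cost of opening Oracle
# connections.  User, password, DSN and pool sizes are placeholders.
pool = cx_Oracle.SessionPool('atlas_panda_writer', 'secret', 'atlr', 2, 10, 1)

def insert_job(job_name):
    conn = pool.acquire()
    try:
        cur = conn.cursor()
        # Take the primary key from a sequence explicitly instead of relying
        # on a BEFORE INSERT trigger to fill it in.
        cur.execute("SELECT jobs_id_seq.NEXTVAL FROM dual")
        panda_id = cur.fetchone()[0]
        cur.execute("INSERT INTO jobsactive (pandaid, jobname) "
                    "VALUES (:id, :name)", {'id': panda_id, 'name': job_name})
        conn.commit()
        return panda_id
    finally:
        pool.release(conn)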
Cloud Migration
Initial migration was for the CERN and IT clouds
We added NG, the new NorduGrid cloud, which was brought up from a standing start
We added DE after a major intervention in which the cloud was taken offline
Similarly, TW will come up in the CERN Oracle instance
UK was the interesting case where we migrated a cloud live:
  Switch the bamboo instance to send jobs to the CERN Oracle servers
  Current jobs are left being handled by the old bamboo and servers
  Start sending pilots to the UK asking for jobs from the CERN Oracle servers
  Force the failure of jobs not yet started in the old instance
    These return to ProdDB and are then picked up again by panda using the new bamboo
  Old running jobs are handled correctly by the 'old' system
    There will be a subsequent re-merge into the CERN ArchiveDB
Monitor Blues
A number of problems did arise in the new monitor setup required for the migrated clouds
Coincident with the migration there was a repository change from CVS to SVN
  However, the MySQL monitor was deployed from CVS and the Oracle monitor from SVN
  This led to a number of accidents and minor confusions which took a while to recover from
New security features caused some loss of functionality at times, as it was hard to check all the use cases
  And the repository problems compounded this
However, these are now mostly resolved issues and ultimately the system will in fact become simpler
Conclusions
Migration of the panda infrastructure from BNL to CERN has underlined how difficult the transition of a large-scale, live, distributed computing system is
A very pragmatic approach was adopted in order to get the migration done in a reasonable time
  Although it always takes longer than you think
  (This is true even when you try to factor in knowledge of the above)
Much has been achieved
  Monitor and task request database fully migrated
  CERN Panda server infrastructure moved to Oracle
  Now running 5 (6) of the 11 ATLAS clouds: CERN, DE, IT, NG, UK, (TW)
Remaining migration steps are now a matter of scaling and simplifying
We learned a lot
  Love your DBAs, of course
  If we have to do this again, now we know how
But there is still considerable work to do
  Mainly in improving service stability, monitoring and support procedures