
Migration of ATLAS PanDA to CERN

Graeme Stewart, Alexei Klimentov, Birger Koblitz, Massimo Lamanna, Tadashi Maeno, Pavel Nevski, Marcin Nowak, Pedro Salgado, Torre Wenaus, Mikhail Titov
Outline

- PanDA Review
  - PanDA History
  - PanDA Architecture
- First steps of Migration to CERN
  - Infrastructure Setup
  - PanDA Monitor
  - Task Request Database
- Second Phase Migration
  - PanDA Server and Bamboo
  - Database bombshells
  - Migration, Tuning and Tweaks
- Conclusions
PanDA Recent History

- PanDA was developed by US ATLAS in 2005.
- Became the executor of all ATLAS production in EGEE:
  - 35k simultaneous running jobs
  - 150k jobs per day finished during 2008
- March 2009: executes production for ATLAS in NDGF as well, using the ARC Control Tower (aCT).
- As PanDA had become central to ATLAS operations, it was decided in late 2008 to relocate it to CERN.
PanDA Server Architecture

- PanDA (Production and Distributed Analysis) is a pilot job system.
- Executes jobs from the ATLAS production system and from users.
- Brokers jobs to sites based on available compute resources and data.
- Can move and stage data if necessary.
- Triggers data movement back to Tier-1s for dataset aggregation.

[Architecture diagram: the Panda Client, Panda Monitor and Bamboo (connected to the ATLAS ProdDB) sit in front of the Panda Server and its databases; a Pilot Factory submits pilots to each Computing Site, and the pilots get their jobs from the Panda Server.]
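To make the pilot model above concrete, here is a minimal, hypothetical sketch of a pilot fetching work from a PanDA-like server over HTTP; the server URL, endpoint names, parameters and response fields are illustrative assumptions rather than the actual PanDA client/server protocol.

# Minimal sketch of the pilot pattern: a pilot running at a site asks the
# central server for a job matched to that site, runs it, then reports back.
# The URL, endpoints, parameters and response fields are illustrative
# assumptions, not the real PanDA protocol.
import json
import urllib.parse
import urllib.request

SERVER = "https://pandasrv.example.cern.ch/server/panda"  # hypothetical host

def get_job(site_name):
    """Ask the server for one job suitable for this site."""
    data = urllib.parse.urlencode({"siteName": site_name}).encode()
    with urllib.request.urlopen(SERVER + "/getJob", data=data) as resp:
        return json.loads(resp.read())

def update_job(job_id, state):
    """Report the job's state (running, finished, failed) back to the server."""
    data = urllib.parse.urlencode({"jobID": job_id, "state": state}).encode()
    urllib.request.urlopen(SERVER + "/updateJob", data=data).close()

if __name__ == "__main__":
    job = get_job("CERN-PROD")          # hypothetical site name
    if job:
        update_job(job["jobID"], "running")
        # ... stage input data, run the payload, stage out the results ...
        update_job(job["jobID"], "finished")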
PanDA Monitor

- PanDA Monitor is the web interface to the panda system.
- Provides summaries of processing per cloud/site.
- Drill down to individual job logs
  - And directly view logfiles
- Task status
- Also provides a web interface to request actions from the system:
  - Task requests
  - Dataset Subscriptions
Task Request Database

- The task request interface is hosted as part of the panda monitor.
  - Allows physicists to define MC production tasks.
- The backend database exists separately from the rest of panda.
  - A prime candidate for migration from MySQL at BNL to Oracle at CERN.

[Diagram: current setup with the AKTR task request database in MySQL, PandaDB in MySQL and ProdDB in Oracle; target setup with AKTR migrated to Oracle alongside ProdDB (Oracle), PandaDB remaining in MySQL.]
Migration – Phase 1

- Target was migration of the task request database and the panda monitor.
- First step was to prepare the infrastructure for the services:
  - 3 server-class machines to host the panda monitors
    - Dual-CPU, quad-core Intel E5410 CPUs
    - 16GB RAM
    - 500GB HDD
  - Set up as much as possible as standard machines supported by CERN FIO
    - Quattor templates
    - Lemon monitoring
    - Alarms for host problems
- Also migrated to the ATLAS standard python environment
  - Python 2.5, 64 bit
- Utilise CERN Arbitrating DNS to balance load across all machines
  - Picks the 2 'best' machines of the 3 with a configurable metric
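As a toy illustration of the arbitration idea (not CERN's actual Arbitrating DNS implementation; the host names and metric are invented), here is a sketch that ranks the three monitor hosts by a configurable load metric and keeps only the best two:

# Toy sketch: rank candidate hosts by a configurable metric and keep the two
# best. Host names and the metric are invented for illustration only.

def load_metric(host):
    """Hypothetical metric: lower is better (e.g. 1-minute load average)."""
    return host["loadavg"]

def pick_best(hosts, n=2, metric=load_metric):
    """Return the n best hosts according to the metric."""
    return sorted(hosts, key=metric)[:n]

hosts = [
    {"name": "pandamon01.example.cern.ch", "loadavg": 0.7},
    {"name": "pandamon02.example.cern.ch", "loadavg": 2.3},
    {"name": "pandamon03.example.cern.ch", "loadavg": 1.1},
]
print([h["name"] for h in pick_best(hosts)])  # -> the two least-loaded hosts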
Parallel Monitors

- Panda was always architected to have multiple stateless monitors.
  - Each monitor queries the backend database to retrieve user-requested information and display it.
- Thus setting up a parallel monitor infrastructure at CERN was relatively easy
  - Once the external dependencies were sorted:
    - ATLAS Distributed Data Management (DDM)
    - Grid User Interface tools
- This was deployed at the beginning of December 2008.
Task Request Database

- The first real step was to migrate the TR DB from MySQL to Oracle.
- This is not quite as trivial as one first imagines:
  - Each database supports some non-standard SQL features
    - And these are not entirely compatible
  - Optimising databases is quite specific to the database engine
- First attempts ran into trouble:
  - A MySQL dump streamed from BNL to CERN resulted in connections being dropped
    - Had to dump the data at BNL and scp it to CERN
- The schema required some cleaning up:
  - Dropped unused tables
  - Removed null constraints, converted CLOB -> VARCHAR, resized some text fields
- However, after a couple of trial migrations we were confident that the data could be migrated in just a couple of hours.
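The following sketch shows the flavour of that schema clean-up as it might be applied on the Oracle side via cx_Oracle; the table, column and account names are invented for illustration and are not the real task request schema:

# Illustrative only: the table, column and account names below are invented.
# This shows the kind of clean-up applied when moving a MySQL schema to Oracle.
import cx_Oracle

cleanup_ddl = [
    # Drop a table that was no longer used
    "DROP TABLE tr_old_requests",
    # Relax a NOT NULL constraint carried over from MySQL
    "ALTER TABLE tr_tasks MODIFY (comment_field NULL)",
    # Replace a CLOB with a plain VARCHAR2 (add, copy, drop, rename)
    "ALTER TABLE tr_tasks ADD (description_v VARCHAR2(4000))",
    "UPDATE tr_tasks SET description_v = dbms_lob.substr(description, 4000, 1)",
    "ALTER TABLE tr_tasks DROP COLUMN description",
    "ALTER TABLE tr_tasks RENAME COLUMN description_v TO description",
    # Resize a text field
    "ALTER TABLE tr_tasks MODIFY (taskname VARCHAR2(256))",
]

conn = cx_Oracle.connect("atlas_tr", "secret", "intr")  # hypothetical account/DSN
cur = conn.cursor()
for stmt in cleanup_ddl:
    cur.execute(stmt)
conn.commit()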
Migration

- The migration occurred on Monday December 8th.
  - Database data was migrated in a couple of hours.
  - Two days were then used to iron out glitches:
    - In the Task Request interfaces
    - In the scripts which manage the Task Request to ProdDB interface
- Could this all have been prepared in advance?
  - In theory yes, but we were migrating a live system
    - So there is only a limited amount of test data which can be inserted into the system
    - Real tasks trigger real jobs
- The system was live again and accepting task requests on Wednesday.
  - The latency of tasks in the production system is usually several days, even for short tasks
    - Acceptable to the community
A Tale of Two Infrastructures

- The new panda monitor setup required DB plugins to talk both to MySQL and to Oracle.
  - The MySQLdb module is bog standard.
  - The cx_Oracle module much less so.
- In addition, Python 2.4 was the supported infrastructure at BNL, as opposed to Python 2.5 at CERN.
- This meant that after the TR migration the BNL monitors started to have more limited functionality.
  - This had definitely not been in the plan!
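A minimal sketch of the dual-backend plugin idea, assuming DB-API access through MySQLdb and cx_Oracle; the table name, query and connection parameters are illustrative, not the monitor's real code:

# A sketch of the dual-backend plugin idea: one thin wrapper that hides
# whether the monitor is reading from MySQL or Oracle. The table name and
# connection parameters are invented for illustration.
def connect(backend, **kw):
    """Return a DB-API connection for the requested backend."""
    if backend == "mysql":
        import MySQLdb                       # the "bog standard" module
        return MySQLdb.connect(host=kw["host"], user=kw["user"],
                               passwd=kw["passwd"], db=kw["db"])
    if backend == "oracle":
        import cx_Oracle                     # less standard, different quirks
        return cx_Oracle.connect(kw["user"], kw["passwd"], kw["dsn"])
    raise ValueError("unknown backend: %s" % backend)

def job_summary(conn):
    """The same query text works on both backends as long as it sticks to
    portable SQL; bind-variable styles (%s vs :1) are where they diverge."""
    cur = conn.cursor()
    cur.execute("SELECT computingsite, COUNT(*) FROM jobs GROUP BY computingsite")
    rows = cur.fetchall()
    cur.close()
    return rows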
PanDA Servers

- Some preliminary work on the panda server had already been done in 2008.
- However, much still remained to be done to migrate the full suite of panda server databases:
  - PandaDB – holds live job information and status ('fast buffer')
  - LogDB – holds pilot logfile extracts
  - MetaDB – holds panda scheduler information on sites and queues
  - ArchiveDB – ultimate resting place of any panda job (big!)
- For most databases the data volume was minimal and the main work was in the schema details
  - Including the setup of Oracle triggers
- For the infrastructure side we copied the BNL setup, with multiple panda servers running on the same machines as the monitors
  - We knew the load was low and the machines were capable
- We also required one server component, bamboo, which interfaces between the panda servers and ProdDB
  - The same machine template worked fine
ArchiveDB

- In MySQL, because of constraints on table performance vs. size, an explicit partitioning had been adopted:
  - One ArchiveDB table for every two months of jobs
    - Jan_Feb_2007, Mar_Apr_2007, ..., Jan_Feb_2009
- In Oracle, internal partitioning is supported:

  CREATE TABLE jobs_archived (<list of columns>)
  PARTITION BY RANGE (MODIFICATIONTIME) (
    PARTITION jobs_archived_jan_2006 VALUES LESS THAN (TO_DATE('01-FEB-2006','DD-MON-YYYY')),
    PARTITION jobs_archived_feb_2006 VALUES LESS THAN (TO_DATE('01-MAR-2006','DD-MON-YYYY')),
    PARTITION jobs_archived_mar_2006 VALUES LESS THAN (TO_DATE('01-APR-2006','DD-MON-YYYY')),
    ... )

- This allows for considerable simplification of the client code in the panda monitor.
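As a toy illustration of that simplification (table and column names invented, not the real ArchiveDB schema): with the per-period MySQL tables the monitor first had to work out which bi-monthly table to query, whereas with the single partitioned Oracle table one query covers any date range and partition pruning keeps it fast.

# Toy illustration only; table and column names are invented for the example.

# MySQL-style layout: the client must map a date onto the right bi-monthly table.
def archive_table_for(date):
    """e.g. datetime(2007, 3, 14) -> 'jobs_archived_Mar_Apr_2007'."""
    pairs = ["Jan_Feb", "Mar_Apr", "May_Jun", "Jul_Aug", "Sep_Oct", "Nov_Dec"]
    return "jobs_archived_%s_%d" % (pairs[(date.month - 1) // 2], date.year)

def mysql_query(date):
    return "SELECT * FROM %s WHERE modificationtime >= '%s'" % (
        archive_table_for(date), date.date())

# Oracle-style layout: one partitioned table, the same query for any date.
def oracle_query(date):
    return ("SELECT * FROM jobs_archived "
            "WHERE modificationtime >= TO_DATE('%s','YYYY-MM-DD')" % date.date())

if __name__ == "__main__":
    from datetime import datetime
    d = datetime(2007, 3, 14)
    print(mysql_query(d))   # needs the right per-period table
    print(oracle_query(d))  # Oracle prunes to the relevant partitions itself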
Integrate, Integrate, …

- By late February, trial migrations of the databases had been made to integration databases hosted at CERN (the INTR database).
- Trial jobs had been run through the panda server, proving basic functionality.
- A decision now had to be made on the final migration strategy:
  - This could be 'big bang' (move the whole system at once) or 'inflation' (gradually migrate clouds one by one)
  - Big bang would be easier for, e.g., the panda monitor
  - But it would carry greater risks – suddenly loading the system with 35k running jobs was unwise
    - If things went very wrong it might leave us with a big mess to recover from
  - An external constraint was the start of the ATLAS cosmics reprocessing campaign, due to start 9th March
- We decided to migrate piecemeal.
Final Preparations

- In fact PanDA already had two heads:
  - The IT and CERN clouds had been run from a parallel MySQL setup since early 2008.
  - This was an expensive infrastructure to maintain, as it did not tap into CERN IT supported services.
- It was obvious that migrating these two clouds would be a natural first step.
- Plans were made to migrate to the ATLAS production database at CERN (aka ATLR).
- Things seemed to be under control a few days before…
DBAs

- On the Friday before we were due to migrate, the CERN DBAs asked us not to do so.
  - They were worried that not enough testing of the Oracle setup in INTR had been done.
- This triggered a somewhat frantic weekend of work, resulting in several thousand jobs being run through the CERN and IT clouds using the INTR databases.
  - From our side this testing looked to be successful.
- However, we reached a subsequent compromise:
  - We would migrate the CERN and IT clouds to panda running against INTR.
  - They would start backups on the INTR database, giving us the confidence to run production for ATLAS through this setup.
  - A subsequent migration from INTR to ATLR could be achieved much more rapidly, as the data would already be in the correct Oracle formats.
Tuning and Tweaking

- Migration of PandaDB, LogDB and MetaDB was very quick.
  - There was one unexpected piece of client code which hung during the migration process (polling of the CERN MySQL servers).
- Migration and index building of ArchiveDB was far slower.
  - However, we disabled access to ArchiveDB and could bring the system up live within half a day.
- Since then a number of small improvements have been made in the panda code to help optimise the use of Oracle:
  - Connections are much more expensive in Oracle than in MySQL
    - Restructure code to use a connection pool
    - Create common reader and writer accounts for access to all database schemas from the one connection
  - Migration away from triggers to .nextval() syntax
- Despite fears, the migration of the panda server to Oracle has been relatively painless and was achieved without significant loss of capacity.
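A minimal sketch of those two Oracle-side changes, using cx_Oracle session pooling and an explicit sequence in place of a trigger; the pool sizes, account names, DSN, table and sequence names are assumptions for illustration, not the actual panda configuration:

# Sketch only: pool sizes, account names, DSN, table and sequence are invented.
import cx_Oracle

# One shared pool of sessions, opened once at server start-up, instead of a
# new (expensive) Oracle connection per request. A common reader/writer
# account can reach all schemas through this single pool.
pool = cx_Oracle.SessionPool(user="panda_writer", password="secret",
                             dsn="intr", min=2, max=10, increment=1)

def insert_job(job_name):
    conn = pool.acquire()                 # cheap: borrows an existing session
    try:
        cur = conn.cursor()
        # Fetch the new primary key from a sequence explicitly, rather than
        # relying on a BEFORE INSERT trigger to fill it in.
        cur.execute("SELECT jobs_seq.NEXTVAL FROM dual")
        (job_id,) = cur.fetchone()
        cur.execute("INSERT INTO jobs (id, name) VALUES (:1, :2)",
                    (job_id, job_name))
        conn.commit()
        return job_id
    finally:
        pool.release(conn)                # return the session to the pool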
Cloud Migration

- The initial migration was for the CERN and IT clouds.
- We added NG, the new NorduGrid cloud, which came up from a standing start.
- We added DE after a major intervention in which the cloud was taken offline.
  - Similarly, TW will come up in the CERN Oracle instance.
- UK was the interesting case where we migrated a cloud live:
  - Switched the bamboo instance to send jobs to the CERN Oracle servers
    - Current jobs are left being handled by the old bamboo and servers
  - Start sending pilots to the UK asking for jobs from the CERN Oracle servers
  - Force the failure of jobs not yet started in the old instance
    - These return to ProdDB and are then picked up again by panda using the new bamboo
  - Old running jobs are handled correctly by the 'old' system
  - There will be a subsequent re-merge into the CERN ArchiveDB
Monitor Blues

- A number of problems did arise in the new monitor setup required for the migrated clouds.
- Coincident with the migration there was a repository change from CVS to SVN.
  - However, the MySQL monitor was deployed from CVS and the Oracle monitor from SVN.
  - This led to a number of accidents and minor confusions which took a while to recover from.
- New security features caused some loss of functionality at times, as it was hard to check all the use cases.
  - And the repository problems compounded this.
- However, these are now mostly resolved issues and ultimately the system will in fact become simpler.
Conclusions

- Migration of the panda infrastructure from BNL to CERN has underlined how difficult the transition of a large-scale, live, distributed computing system is.
- A very pragmatic approach was adopted in order to get the migration done in a reasonable time.
  - Although it always takes longer than you think
    - (This is true even when you try and factor in knowledge of the above)
- Much has been achieved:
  - Monitor and task request database fully migrated
  - CERN Panda server infrastructure moved to Oracle
  - Now running 5 (6) of the 11 ATLAS clouds: CERN, DE, IT, NG, UK, (TW)
  - Remaining migration steps are now a matter of scaling and simplifying
- We learned a lot:
  - Love your DBAs, of course
  - If we have to do this again, now we know how
- But there is still considerable work to do
  - Mainly in improving service stability, monitoring and support procedures