RLS Tier-1 Deployment


RLS Tier-1 Deployment
James Casey,
PPARC-LCG Fellow, CERN
[email protected]
10th GridPP Meeting, CERN, 3rd June 2004
Database and Application Services
Overview

RLS for LCG at CERN and the Tier-1s
Tools for deploying the RLS
Deployment Progress
Evolution of RLS Architecture
Summary
RLS at CERN

Initial design of the RLS system involved
 distributed Local Replica Catalogs (LRCs) and Replica Location Indices (RLIs)
RLIs never deployed in a production environment


EDG RLS chosen as the single Grid Catalog for LCG-1/LCG-2
 Since RLIs were not proven, a single RLS (LRC/RMC) was provided
at CERN for each LCG Virtual Organization
 single, separate Replica Metadata Catalog (RMC)

EDG supported RLS on both Oracle and MySQL/Tomcat
 Different VOs and different sites may need different levels
of Quality of Service

Oracle chosen for LCG deployment at CERN
 CERN IT-DB has extensive experience with Oracle
 Single catalog required high availability if LCG was to work
Catalog acts as the single point of access to grid files

RLS at Tier-1s (1/2)

New CERN Oracle contract in December 2002
 Based on “named users”
 Products can be used anywhere in the world by any of the
‘named users’
 Sites do not need to buy licenses for Oracle Servers running
LCG applications

Many Tier-1s already have Oracle experience
 RAL, NIKHEF, Taiwan, FZK, CNAF, …

Tier-1s run central services, and require the manageability
and availability that this implies
 Natural choice to deploy RLS components at the Tier-1s using Oracle
RLS at Tier-1s (2/2)

The following deployment plan was devised:
 Support only the platforms in use at CERN
RedHat Linux ES 2.1
 Use the same tools we use to deploy these products locally at
CERN


We provide the distribution toolkits for Oracle products
 Binary installation kits
 configuration scripts

Support is provided for
 the installation and deployment of Oracle
 the deployment of the “shrink-wrapped” application

Focus on a standard environment to make things easier
 Sites can do things differently if they have the knowledge,
but we don’t support it
RLS Deployment Details (1/3)

Operating System
 RedHat ES/AS 2.1
 Oracle-specific additions/fixes (binutils, fuser symlink)
 Standard disk layout assumed (based on Oracle's Optimal Flexible
Architecture); sketched below

Database
 Oracle 9i 9.2.0.4 (move to 9.2.0.5 soon)

Application Server
 Oracle 9iAS 9.0.3.1 + Local Fixes
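
To make the operating-system items concrete, here is a minimal sketch of the kind of preparation they refer to; package handling, paths and the symlink direction are assumptions, the real fixes are encoded in the CERN kits:

# binutils update needed by the Oracle 9i installer/relinking on RedHat 2.1
rpm -Fvh binutils-*.rpm
# Oracle's shell scripts look for fuser under /usr/sbin; on RedHat it lives in
# /sbin, hence the fuser symlink mentioned above (direction assumed here)
ln -s /sbin/fuser /usr/sbin/fuser
# OFA-style (Optimal Flexible Architecture) directory layout
mkdir -p /u01/app/oracle/product/9.2.0       # ORACLE_HOME
mkdir -p /u01/app/oracle/admin /u02/oradata  # admin and datafile areas
chown -R oracle:dba /u01 /u02
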
RLS Deployment Details (2/3)

Binary Install Kits
 Oracle 9i 9.2.0.4 single instance
 Oracle 9iAS 9.0.3.1 single instance

Environment configuration
 .bashrc/.cshrc/sysconfig configuration (sketched below)

init.d scripts
 For both Oracle 9i/9iAS
 Delivered as RPMs for RHES2.1
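
A minimal sketch of what the environment and init.d configuration amount to; the sysconfig file name, SID and service names are assumptions, the real ones ship in the RPMs:

# /etc/sysconfig/oracle (name assumed): environment shared by the oracle
# account's .bashrc/.cshrc and by the init.d scripts
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=$ORACLE_BASE/product/9.2.0
ORACLE_SID=rls01
PATH=$ORACLE_HOME/bin:$PATH
export ORACLE_BASE ORACLE_HOME ORACLE_SID PATH

# the init.d scripts then give the usual service handling, for example:
# chkconfig --add oracle-db && service oracle-db start   (service name assumed)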

Database Creation
 Generic tool to create a database instance, using per-application
configuration files
Oracle config files (pfile, tnsnames.ora, init.ora, orapw)
Instance creation SQL
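
A rough sketch of the database-creation step; the kit layout, SID and file names are placeholders standing in for the per-application configuration described above:

# run as the oracle user with the environment above sourced
cp rls/initrls01.ora $ORACLE_HOME/dbs/            # pfile tuned for the RLS
cp rls/tnsnames.ora  $ORACLE_HOME/network/admin/
orapwd file=$ORACLE_HOME/dbs/orapwrls01 password=change_me
# instance-creation SQL from the kit (STARTUP NOMOUNT, CREATE DATABASE, ...)
sqlplus "/ as sysdba" @rls/create-database.sql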


RLS Deployment Details (3/3)


RLS-specific configuration files for Database Creation
Database deployment scripts
 create-tablespaces, create-users, create-schemas (invocation order sketched below)

Application Server deployment scripts
 deploy-webapps, undeploy-webapps, alias-webapp

CERN-specific tools that others can decide to use
 DNS-alias-based application server fail-over
 Application-level monitoring
 Application-level statistics gathering and visualization
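
The slide names the deployment scripts; below is a sketch of the invocation order one might expect when setting up a catalog, where the catalog name, WAR files and DNS alias are purely illustrative:

# database side: run once per VO catalog, in this order
./create-tablespaces rls_cms
./create-users       rls_cms
./create-schemas     rls_cms

# application-server side: deploy the catalog web applications into 9iAS,
# then attach the DNS alias used for fail-over (names are examples)
./deploy-webapps edg-local-replica-catalog.war
./deploy-webapps edg-replica-metadata-catalog.war
./alias-webapp   rls-cms.cern.ch
# ./undeploy-webapps removes a web application when needed
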
Communication with Sites

The Savannah portal will act as the main point of contact
between the CERN IT-DB team and the Tier-1 administrators
http://savannah.cern.ch/projects/lcg-orat1/
 CVS Repository
http://savannah.cern.ch/cvs/?group=lcg-orat1
 File Download area
http://savannah.cern.ch/files/?group=lcg-orat1
 FAQ List
http://wwwdb.web.cern.ch/wwwdb/savannah-files/oralcgt1/docs/

Still being completed: much information is already there, but
some is missing
 Feedback welcome !!!
Deployment Progress

Summer/Autumn 2003
 Tier-1s invited to participate via the Grid Deployment Board
 Experiments informed of the plan via the LCG Applications Area
 Scripts used to install the production version of the catalogs for LCG-1/LCG-2
at CERN
 Academia Sinica (Taiwan) deploys RLS Catalogs

December 2003
 First full version of distribution kits released to Tier-1s
 CNAF deploys RLS
Used in Replication tests between CERN and CNAF in
conjunction with CMS


April 2004
 FZK deploys RLS

May 2004
 Meeting at CERN with RAL representatives
Evolution of RLS Architecture

Initial design of the RLS system does not scale well for common
use cases
 e.g. catalog extraction, bulk inserts, cross-catalog queries

Single LRC/RMC at CERN for the 2004 Data Challenges
showed problems in the deployed architecture
 “Bottleneck”: all jobs had to contact the catalog at CERN
 Single point of failure

Future architecture looks toward replicated LRC/RMCs at “core”
sites
 Targeted for the Data Challenges in 2005
 CERN Tier-0 and several Tier-1s
 Number of Tier-1s in the range of 4 to 6
Replication

Replicated multi-master databases will let the system grow
 Most commercial-strength databases already support
replication, so don’t try to reinvent it! (no need for RLIs)

Tests of Replication carried out by IT-DB, CNAF and CMS
during CMS DC04
 Used Oracle Advanced Replication

Oracle now recommends Oracle Streams as a better way of
doing replication
 Project started to test Oracle Streams-based replication (sketched below)
Use sample workloads provided by CMS, using input from
DC04


Needs to be integrated with the application for best results
 Conflict avoidance, not conflict resolution
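
For the Streams project above, a rough sketch of the kind of DBMS_STREAMS_ADM calls involved in replicating one catalog table from the source site toward a Tier-1; the credentials, schema, table, queue and database-link names are invented for illustration, and a full setup also needs an apply process and table instantiation at the destination:

# run from the shell as the Streams administrator (credentials are placeholders)
sqlplus strmadmin/change_me <<'EOSQL'
BEGIN
  -- capture DML on one example catalog table at the source database
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => 'rls_q_table', queue_name => 'rls_queue');
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name => 'lcg_rls.t_lfn', streams_type => 'capture',
    streams_name => 'rls_capture', queue_name => 'strmadmin.rls_queue',
    include_dml => TRUE, include_ddl => FALSE);
  -- propagate the captured changes to the queue at the Tier-1 database
  DBMS_STREAMS_ADM.ADD_TABLE_PROPAGATION_RULES(
    table_name => 'lcg_rls.t_lfn', streams_name => 'rls_propagation',
    source_queue_name => 'strmadmin.rls_queue',
    destination_queue_name => 'strmadmin.rls_queue@tier1_db',
    include_dml => TRUE, include_ddl => FALSE);
END;
/
EOSQL

Conflict avoidance then falls to the application: if each site only inserts and updates the entries it owns, the multi-master streams never generate conflicting changes that would need resolution.
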
Summary

RLS scalability requires a move from a single site (CERN) to a
distributed system
 Requires the involvement of Tier-1s

Scripts prepared to make deployment as easy as possible
for Tier-1 administrators
 Oracle installation
 Application installation and monitoring

Next steps involve testing the real performance and reliability
of the distributed setup
 This will be needed to support the 2005 Data Challenges
 The architecture will need to evolve as these new requirements
appear

Should build on the knowledge and expertise of Tier-1 sites
in running high-availability services under Oracle