Replication and QoS in the Management of Grid-Oriented Databases
Barbara Martelli
INFN - CNAF
Outline
LCG 3D (Distributed Deployment of
Databases) project status
Oracle High Availability/Replication features
MySQL High Availability/Replication features
Databases in the GRID
Oracle replication case study: LFC
MySQL replication case study: VOMS
LCG 3D Service Architecture
[Diagram: the LCG 3D service architecture, showing the database tiers and the replication technologies between them]
Online DB: autonomous, reliable service
T0 database backbone: all data replicated, reliable service
T1: autonomous, reliable service (Oracle)
T2: local db cache, subset of the data, only local service (MySQL/SQLite files)
T0-to-T1 replication via Oracle Streams: successfully implemented
T1-to-T2 distribution via cross DB copy, MySQL/SQLite files and an http cache (SQUID): not implemented
Is it possible/interesting to use Oracle Heterogeneous Connectivity for Tier-1 Oracle to Tier-2 replication?
R/O access at Tier 1/2 (at least initially)
Oracle Building Blocks
Each cloud has to guarantee high availability, scalability and fault tolerance.
At CNAF, high availability is achieved at different levels:
Storage H/W level: RAID, Storage Area Network
Storage logic level: logical volume manager, Automatic Storage Management (ASM)
Database level: Real Application Clusters (RAC). The database is shared among different servers. Load balancing, connection retries and failover are implemented in the Oracle drivers (quasi-transparent to applications); see the tnsnames.ora sketch below.
Disaster recovery: Recovery MANager (RMAN) backups
Retention policy on disk: 2 days
Retention policy on tape: 31 days
Availability rate: 98.7% in 2007
Availability (%) = Uptime / (Uptime + Target Downtime + Agent Downtime)
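To illustrate the driver-side load balancing and failover mentioned above, here is a minimal tnsnames.ora sketch; the hostnames, port and service name are hypothetical, not the actual CNAF configuration:

# Hypothetical TNS alias for a two-node RAC service
LFC_RAC =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac-node1.example.org)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = rac-node2.example.org)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = lfc_service)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 5))
    )
  )

With such an entry the Oracle client spreads new connections across the listed nodes and, on node failure, transparently re-routes connections (and in-flight SELECTs) to a surviving node.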
Oracle Streams Replication
[Diagram: Oracle Streams replication flow]
On the master DB, a capture process reads changes from the redo log and turns them into Logical Change Records (LCRs), which are staged in a queue; a propagation job moves them to a queue on the replica DB, where an apply process replays them on the replica's database objects.
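As a rough sketch of how the capture side of such a one-way Streams setup is configured (the schema, process, queue and database names are hypothetical; a complete setup also needs supplemental logging, a propagation, an apply process and an instantiation step):

BEGIN
  -- Capture DML and DDL changes for one schema into a staging queue
  DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
    schema_name     => 'LFC',                      -- hypothetical schema
    streams_type    => 'CAPTURE',
    streams_name    => 'LFC_CAPTURE',              -- hypothetical process name
    queue_name      => 'STRMADMIN.STREAMS_QUEUE',  -- hypothetical queue
    include_dml     => TRUE,
    include_ddl     => TRUE,
    source_database => 'MASTER.EXAMPLE.ORG');      -- hypothetical global name
END;
/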
MySQL High Availability
and replication features
Master-slave replication:
Referred to as asynchronous replication
Available since 3.23: a stable and reliable feature
Some examples of it in GRID production deployments (VOMS)
The original database is managed by the master.
The slave manages a copy of the original database.
The update queries (update, delete and insert in SQL jargon) must be executed only on the master host.
The SQL update statements are replicated, not the changed data (see the configuration sketch at the end of this slide)
Multimaster replication:
Available since 5.0: a new and not fully tested feature
Possible only under particular conditions which allow for simple conflict resolution policies
MySQL cluster:
Referred to as synchronous replication
It doesn't seem to be a stable feature, as you can read in the MySQL 5.1 manual: "This chapter represents a work in progress, and its contents are subject to revision as MySQL Cluster continues to evolve"
I know of no MySQL production systems currently deployed as a cluster
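A minimal master-slave configuration sketch, assuming hypothetical server ids, hostnames and credentials (in practice the slave is first initialized from a consistent dump of the master):

# my.cnf on the master: unique server id, binary logging enabled
[mysqld]
server-id = 1
log-bin   = mysql-bin

# my.cnf on the slave: just a distinct server id
[mysqld]
server-id = 2

-- On the slave, point replication at the master and start it:
CHANGE MASTER TO
  MASTER_HOST     = 'voms-master.example.org',  -- hypothetical host
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000001',         -- from SHOW MASTER STATUS
  MASTER_LOG_POS  = 98;
START SLAVE;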
Databases in GRID services
Databases are key components of various GRID services (list not exhaustive):
FTS
Database used for data persistency
MySQL and Oracle backends supported, but Oracle is recommended
High availability through clusters
https://twiki.cern.ch/twiki/bin/view/EGEE/FTS
LFC
MySQL and Oracle backends supported
Both MySQL and Oracle replication supported
https://twiki.cern.ch/twiki/bin/view/LCG/LfcAdminGuide
VOMS
MySQL and Oracle backends supported
Both MySQL and Oracle replication are supported
http://www.grid.auth.gr/guides/voms_replication/voms_replication.php
Oracle replication case study: LFC
LFC, the LCG File Catalog, is a high-performance file catalog which stores LFN - GUID - PFN mappings.
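For illustration, a single catalog entry maps a human-readable LFN to an immutable GUID and to the PFNs of its physical replicas; the values below are purely hypothetical:

LFN:  /grid/dteam/prod/run0042.data
GUID: guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
PFN:  srm://se01.example.org/pnfs/example.org/data/dteam/run0042.data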
Oracle one-way Streams replication is used in WLCG in order to balance the load of LFC read-only requests among the different catalogs residing in various Tier-1s
The LFC code has been slightly modified in order to prevent a user from accidentally writing into a read-only catalog. The only thing an administrator has to do is to set the variable
RUN_READONLY="yes"
in the /etc/sysconfig/lfcdaemon configuration file.
Database replication has to replicate all tables except CNS_USERINFO and CNS_GROUPINFO (a sketch of such an exclusion follows below)
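One way to express this exclusion with Oracle Streams is a negative rule on the capture process; a rough sketch, reusing the hypothetical process and queue names from the earlier Streams example (the schema name is also an assumption):

BEGIN
  -- Add CNS_USERINFO to the capture process' negative rule set,
  -- so its changes are never captured and hence never replicated
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name     => 'LFC.CNS_USERINFO',
    streams_type   => 'CAPTURE',
    streams_name   => 'LFC_CAPTURE',
    queue_name     => 'STRMADMIN.STREAMS_QUEUE',
    include_dml    => TRUE,
    include_ddl    => FALSE,
    inclusion_rule => FALSE);  -- FALSE = add to the negative (exclusion) rule set
END;
/
-- Repeat the same call for LFC.CNS_GROUPINFO.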
In case of write attempts on the read-only LFC, you would get an error:
$ lfc-mkdir /grid/dteam/hello
cannot create /grid/dteam/hello: Read-only file system
Replication speed requirements are not very strict:
Update frequency ~ 1 Hz
Replication latency < 10 min
LHCb LFC Replication deployment CERN-CNAF
[Diagram: LHCb LFC replication between CERN and CNAF over the WAN]
CERN: LFC R-W servers (plus a R-O server) serving r/w and read-only clients, backed by the master Oracle DB on a 6-node cluster
CNAF: LFC R-O servers serving read-only clients, backed by the replica Oracle DB on a 2-node cluster; the two databases are connected by Oracle Streams over the WAN
Stress test: insertions at 900 Hz for 24 hours
Max latency: 55 sec
Mean latency: 15 sec
Full consistency maintained
MySQL replication case study:
VOMS
The Virtual Organization Membership Service (VOMS) server manages authorization data
provides a database of users, groups, roles and capabilities that are grouped in Virtual Organizations (VOs)
users query the VOMS server in order to get their VO grid credentials (proxy)
read-only operations are originated by various commands such as voms-proxy-info; they could be balanced across read-only VOMS replicas
write operations are originated by the mk-gridmap and voms-proxy-init commands
The expected write rate on the VOMS server is:
1 Hz of voms-proxy-init
Peaks of 100 Hz of mk-gridmap (to be fixed)
A MySQL master-slave replication deployment can be useful for load balancing and failover in the case of read-only operations
VOMS supports MySQL one-way replication.
http://wiki.egee-see.org/index.php/SEE-GRID_VOMS_Failover
Some examples of VOMS on replicated MySQL:
LIP (Portugal)
Fermilab
CNAF – INFN Padova (CDF VOMS)
VOMS replicated deployment
The VOMS code has been adapted to MySQL replication: it provides a script which creates a MySQL slave replica, given a MySQL master and a consistent dump.
http://glite.cvs.cern.ch:8180/cgi-bin/glite.cgi/org.glite.security.voms/src/replica/voms_install_replica.in?revision=1.3.4.1&pathrev=glite-security-voms_branch_1_8_0
Concurrent writes
The VOMS server has a web component, running in a web container provided by Tomcat, which hosts the administration interface.
Problem: the administration interface running on a slave host will update the seqnumber and realtime tables of each VO database.
Solution: data from those tables must not be replicated to the slave hosts.
replicate-ignore-table=VOMS_seqnumber
replicate-ignore-table=VOMS_realtime
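Note that MySQL expects these options in db_name.tbl_name form; a sketch for a hypothetical VO database named voms_dteam:

# my.cnf fragment on each slave (the database name is hypothetical)
replicate-ignore-table = voms_dteam.seqnumber
replicate-ignore-table = voms_dteam.realtime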
Some stress tests performed by Fermilab:
http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=2571
VOMS MySQL successfully queried at 125 Hz (10.8M queries/day)
System load: 0.2, CPU: 10% (dual-core machine)
Simulated failure of one VOMS server:
Disabled network: new requests not routed to the failed server
Re-enabled network: server added back to the pool for scheduling
Open connections during the service failure are lost
The number of affected connections is very small (1-2)
Simulated failure of the MySQL server:
After re-enabling the server, the transaction logs are replayed automatically
VOMS on Oracle replication is under test and will be available soon
Conclusions
Different high availability/redundancy techniques have been tested in the WLCG environment and allow for a good availability of GRID database services
Both Oracle and MySQL replication solutions have been deployed in WLCG and offer different options in order to address different kinds of load
The LCG 3D project has developed a Tier-0 to Tier-1 replication but has left the Tier-1 to Tier-2 distribution issues to the sites. Do we need to address them?