Transcript: LTC-DB-status
European Organization for Nuclear Research
Status of the Accelerator Online Operational Databases
Ronny Billen, Chris Roderick
Accelerators and Beams Department
Controls Group
LTC – 7 March 2008
Outline
The Accelerator Online Operational Databases
Current Database Server Situation
Evolution of the Provided Services
Performance Hitting the Limits
2008: Planned Upgrade and Migration
Implications, Policy and Constraints for Applications
Logging Data: Expected vs Acceptable
The Future
Conclusions
The Accelerator Online Operational Databases
Data needed instantaneously to interact with the accelerator
Database is between the accelerator equipment and the client
(operator, equipment specialist, software developer)
LSA – Accelerator Settings database
MDB – Measurement database
LDB – Logging database
CCDB – Controls Configuration
E-Logbook – Electronic Logbooks
CESAR – SPS-EA Controls
LASER – Alarms database
TIM – Technical Infrastructure Monitoring database
3-tier deployment of services for resource optimization
Client ↔ Application Server ↔ Database Server
Many database services, with their APIs and applications
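A minimal Java sketch of the 3-tier idea above (the class, method, and device names are hypothetical, not the actual CERN APIs): the client talks only to a facade exposed by the application server, which is the sole owner of database connections.

public class ThreeTierSketch {
    // Facade exposed by the application server; the client never sees JDBC.
    interface SettingsService {
        double getSetting(String device, String property);
    }

    // Application-server side: would hold a pooled connection to the
    // database server and run the actual query there.
    static class SettingsServiceImpl implements SettingsService {
        public double getSetting(String device, String property) {
            return 0.0; // placeholder for a real database lookup
        }
    }

    public static void main(String[] args) {
        // In a real deployment this would be a remote proxy, not a local object.
        SettingsService service = new SettingsServiceImpl();
        System.out.println(service.getSetting("SOME.DEVICE", "SOME_PROPERTY"));
    }
}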
Current Database Server Situation
SUNLHCLOG
Often referred to as the "LHC Logging Database"
Technical
2-node cluster SUN Fire V240: 2 x {single-core 1 GHz CPU, 4 GB RAM, 2x 36 GB disks, 2 power supplies}
External storage: 9 TB RAID 1+0 / RAID 5, mirrored & striped (~60% usable)
Main accounts - data
Logging: LHC HWC, Injectors, Technical Services
Measurements: LHC HWC, Injectors
Settings: LSA for LHC, SPS, LEIR, PS, PSB, AD
Today's specifics
150 simultaneous user sessions
Oracle data-files: 4.7 TB
History
Purchased original setup: March 2004
Purchased extra disks: October 2006
Current Database Server Situation
SUNSLPS
Often referred to as the "Controls Configuration Database"
Technical
Server SUN E420R {450 MHz CPU, 4 GB RAM, 2x 36 GB disks}
External storage: 218 GB
Main accounts - data
AB-Controls, FESA, CMW, RBAC, OASIS
CESAR, PO-Controls, INTERLOCK
e-Logbooks, ABS-cache
Historical SPS and TZ data
LSA Test
Today's specifics
200-300 simultaneous user sessions
Oracle data-files: 32 GB
History
Installed in January 2001
Evolution of the Provided Services
LSA Settings: operationally used since 2006
Deployed on SUNLHCLOG to get best performance
Used for LEIR, SPS, SPS & LHC transfer lines, LHC HWC
Continuously evolving due to requirements from LHC and PS
Measurement Service: operationally used since mid-2005
Satisfying central short-term persistence for Java clients
Provides data filtering and transfer to the long-term logging service (see the filtering sketch after this list)
Generates accelerator statistics
Increasingly used for the complete accelerator complex
Logging Service: operationally used since mid-2003
Scope extended to all accelerators, technical data of experiments
Equipment expert data for LHC HWC: accounts for >90% volume
Largest consumer of database and application server resources
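The data filtering applied during the MDB → LDB transfer is not specified in detail on these slides; a common technique for this kind of data reduction is deadband filtering, sketched below in Java. This is a minimal illustration with hypothetical signal names and thresholds, not the actual transfer implementation.

import java.util.HashMap;
import java.util.Map;

public class DeadbandFilter {
    private final double deadband;                     // minimum change worth logging
    private final Map<String, Double> lastLogged = new HashMap<>();

    public DeadbandFilter(double deadband) { this.deadband = deadband; }

    // Returns true if the value should be propagated to the logging database.
    public boolean accept(String signal, double value) {
        Double previous = lastLogged.get(signal);
        if (previous == null || Math.abs(value - previous) > deadband) {
            lastLogged.put(signal, value);             // remember what was logged
            return true;
        }
        return false;                                  // suppress redundant sample
    }

    public static void main(String[] args) {
        DeadbandFilter filter = new DeadbandFilter(0.5);
        double[] samples = {1.0, 1.2, 1.9, 2.0, 3.1};
        for (double v : samples)
            System.out.println(v + " -> " + (filter.accept("CRYO.TEMP", v) ? "log" : "drop"));
    }
}

Only values that change by more than the deadband reach long-term storage, which reduces both disk I/O and the CPU cost of later extraction.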
Evolution of the Logging – Data Volume
Evolution of the Logging – Data Rates
[Chart: logging data rates per system, including CIET, QPS, and CRYO]
Performance Hitting the Limits
I/O Limits
The I/O subsystem is used for reading and writing data
Recent samples: 4 to 37 clients waiting for the I/O subsystem
[Chart: number of active sessions waiting for the I/O subsystem]
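Figures like the one above can be sampled from Oracle's v$session view, which since Oracle 10g exposes a wait_class column. A minimal JDBC sketch (the connection string and credentials are placeholders; SELECT privilege on v$session is required):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IoWaitSampler {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; requires the Oracle JDBC driver.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:SID", "monitor", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT COUNT(*) FROM v$session " +
                 "WHERE status = 'ACTIVE' AND wait_class = 'User I/O'")) {
            if (rs.next())
                System.out.println("Active sessions waiting on I/O: " + rs.getInt(1));
        }
    }
}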
Performance Hitting the Limits
CPU Limits
CPU is always needed to do anything:
Data writing and extraction
Data filtering (CPU intensive) and migration from MDB → LDB
Exporting archive log files to tape, incremental back-ups
Migrating historic data to dedicated read-only storage
Hitting the I/O limits burns CPU
[Chart: percentage of CPU used on I/O wait events]
Performance Hitting the Limits
Storage Limits
Pre-defined allocated data-files difficult to manage (due to size)
Monthly allocations necessary, yet always insufficient
Archive log file space insufficient when the backup service is down
[Chart: storage utilisation]
2008: Planned Upgrade and Migration
Separate into 3 high-availability database services
Deploy each service on a dedicated Oracle Real Application Cluster
1. Settings & Controls Configuration (including logbooks)
Highest availability, fast response
Low CPU usage, low disk I/O
~20 GB data
2. Measurement Service
Highest availability
CPU intensive (data filtering MDB → LDB), very high disk I/O
~100 GB (1 week latency) or much more for HWC / LHC operation
3. Logging Service
High availability
CPU intensive (data extraction), high disk I/O
~10 TB per year
2008: Planned Upgrade and Migration
[Diagram: the new platform – three Oracle Real Application Clusters (RAC 1, RAC 2, RAC 3), each with two controller nodes (2x quad-core 2.8 GHz CPU, 8 GB RAM); clustered NAS shelves with 14x 146 GB FC disks and 14x 300 GB SATA disks, 11.4 TB usable; hosted services: LSA Settings, Controls Configuration, E-Logbook, CESAR, Measurements, HWC Measurements, Logging; an additional server for DataGuard testing, as a standby database for LSA]
2008: Planned Upgrade and Migration
Dell PowerEdge 1950 server specifications:
2x Intel Xeon quad-core 2.33 GHz CPU
2x 4 MB L2 cache
8 GB RAM
2x power supplies, network cards (10 Gb Ethernet), 2x 72 GB system disks
NetApp Clustered NAS FAS3040 storage specifications:
2x disk controllers (support for 336 disks (24 shelves))
2x disk shelves (14x 146 GB Fibre Channel, 10,000 rpm)
8 GB RAM (cache)
RAID-DP
Redundant hot-swappable: controllers, cooling fans, power supplies, optics, and network cards
Certified >3,000 I/O operations per second
2008: Planned Upgrade and Migration
Purchase order for storage (2/11): launched Sep-2007
Purchase order for servers (7/122): launched Oct-2007
NetApp NAS storage shelves: arrived at CERN Nov-2007
Dell servers: arrived at CERN Jan-2008
Additional mounting rails for servers: ordered Jan-2008
Servers: stress-tested Jan-2008
Rack space: liberated Feb-2008
Server and storage: fully installed 7-Mar-2008
Oracle system software: installed, configured 14-Mar-2008
Database structures: deployed (AB/CO/DM)
Database services: ready for switch-over
Switch to services of new platform (1-day stop): 21-Mar-2008?
Migration of existing 5 TB logging data to new platform: (later)
Purchase additional logging storage for beyond 2008: (Sep-2008)
Implications, Policy and Constraints for Applications
Foreseen for all services, already implemented for a few:
Implications
All applications should be cluster-aware (see the sketches after this list)
Database load-balancing / fail-over (connection modifications)
Application fail-over (application modifications)
Policy
Use APIs for data transfer (no direct table access)
Follow naming conventions for data objects
Constraints
Enforce controlled data access
Register authorized applications (purpose, responsible)
Implement application instrumentation
Provide details of all database operations (who, what, where)
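Connection-level cluster-awareness on Oracle RAC is typically achieved with a connect descriptor that lists all cluster nodes and enables connect-time load balancing and failover. A Java/JDBC sketch of this technique (host names, service name, and credentials are placeholders, not the actual CERN configuration):

import java.sql.Connection;
import java.sql.DriverManager;

public class RacConnectionExample {
    private static final String URL =
        "jdbc:oracle:thin:@(DESCRIPTION=" +
        "(ADDRESS_LIST=(LOAD_BALANCE=yes)(FAILOVER=yes)" +
        "(ADDRESS=(PROTOCOL=TCP)(HOST=node1.example.cern.ch)(PORT=1521))" +
        "(ADDRESS=(PROTOCOL=TCP)(HOST=node2.example.cern.ch)(PORT=1521)))" +
        "(CONNECT_DATA=(SERVICE_NAME=lsa)))";          // placeholder service name

    public static void main(String[] args) throws Exception {
        // The driver picks one of the listed nodes; if it is unreachable,
        // the connection attempt fails over to the other node.
        try (Connection conn = DriverManager.getConnection(URL, "app_user", "secret")) {
            System.out.println("Connected via: " + conn.getMetaData().getURL());
        }
    }
}

For the instrumentation constraint (who, what, where), one standard mechanism is Oracle's DBMS_APPLICATION_INFO package, which lets every session tag itself with its application and current action; a sketch with hypothetical module names:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class InstrumentationExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:SID", "app_user", "secret")) { // placeholders
            // Tag the session so DBAs can see who is doing what, and where.
            try (CallableStatement cs = conn.prepareCall(
                    "{call DBMS_APPLICATION_INFO.SET_MODULE(?, ?)}")) {
                cs.setString(1, "LSA-GUI");        // hypothetical module name
                cs.setString(2, "load-settings");  // hypothetical action name
                cs.execute();
            }
            // ... application work; v$session now shows module and action
        }
    }
}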
Logging Data: Expected vs Acceptable
Beam-related equipment starting to produce data
BLM
6,400 monitors × 12 × 2 (losses & thresholds) + crate status = ~154,000 values per second (filtered by concentrator & MDB)
XPOC
More to come…
Limits
Maximum: 1 Hz data frequency in the Logging database (see the throttling sketch below)
Not a data dump
Consider final data usage before logging – only log what is needed
Logging noise will have a negative impact on data extraction performance and analysis
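One way for a client to honour the 1 Hz maximum is to throttle per signal before data enters the logging chain. A minimal Java sketch (signal names are illustrative; the slides state that actual filtering is done by the concentrator and MDB):

import java.util.HashMap;
import java.util.Map;

public class OneHertzThrottle {
    private final Map<String, Long> lastSent = new HashMap<>();

    // Returns true if this sample may be forwarded to the logging service.
    public boolean allow(String signal, long timestampMillis) {
        Long previous = lastSent.get(signal);
        if (previous == null || timestampMillis - previous >= 1000) {
            lastSent.put(signal, timestampMillis);
            return true;
        }
        return false;                                  // within the same second: drop
    }

    public static void main(String[] args) {
        OneHertzThrottle throttle = new OneHertzThrottle();
        long t0 = System.currentTimeMillis();
        long[] offsets = {0, 200, 999, 1000, 1500, 2100};
        for (long dt : offsets)
            System.out.println("t+" + dt + "ms -> " +
                (throttle.allow("BLM.LOSS", t0 + dt) ? "log" : "drop"));
    }
}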
The Future
Logging Data
Original idea: keep data available online indefinitely
Data rates estimated at ~10 TB/year
Closely monitor evolution of storage usage
Order new disks for 2009 data (in Sept 2008)
Migrate existing data (~4 TB) to new disks
Service Availability
New infrastructure has high redundancy for high availability
Scheduled interventions will still need to be planned
Use of a standby database will be investigated, with the objective of reaching 100% uptime for small databases
Conclusions
Databases play a vital role in the commissioning and operation of the accelerators
Database performance and availability have a direct impact on operations
Today, the main server SUNLHCLOG is heavily overloaded
Based on experience, and the evolution of the existing services, the new database infrastructure has been carefully planned to:
Provide maximum availability
Address performance issues
Provide independence between the key services
Scale in function of data volumes and future requirements
The new database infrastructure should be operational ahead of injector chain start-up and LHC parallel-sector HWC