
European Organization for Nuclear Research
Status of the Accelerator
Online Operational Databases
Ronny Billen, Chris Roderick
Accelerators and Beams Department
Controls Group
LTC – 7 March 2008
Outline
 The Accelerator Online Operational Databases
 Current Database Server Situation
 Evolution of the Provided Services
 Performance → Hitting The Limits
 2008: Planned Upgrade and Migration
 Implications, Policy and Constraints for Applications
 Logging Data: Expected vs Acceptable
 The Future
 Conclusions
The Accelerator Online Operational Databases
 Data needed instantaneously to interact with the accelerator
 Database is between the accelerator equipment and the client
(operator, equipment specialist, software developer)
 LSA – Accelerator Settings database
 MDB – Measurement database
 LDB – Logging database
 CCDB – Controls Configuration
 E-Logbook – Electronic Logbooks
 CESAR – SPS-EA Controls
 LASER – Alarms database
 TIM – Technical Infrastructure Monitoring database
 3-tier deployment of services for resource optimization (see the minimal client sketch below)
Client → Application Server → Database Server
 Many database services, including APIs and applications
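To make the 3-tier pattern concrete, here is a minimal, hypothetical Java sketch (class and method names are illustrative, not the actual CERN APIs): the client only talks to an application-server API and never opens its own database connection.

```java
// Hypothetical sketch of the 3-tier access pattern: clients call an
// application-server API; only the middle tier opens database connections.
interface MeasurementService {                  // remote API exposed by the application server
    double getLastValue(String variableName);   // latest value of a measured accelerator variable
}

public class OperatorClient {
    private final MeasurementService service;   // obtained from the middle tier (RMI, HTTP, ...)

    public OperatorClient(MeasurementService service) {
        this.service = service;
    }

    void showLastValue(String variable) {
        // No JDBC here: the database server is reached only through the application
        // server, which pools and shares connections between many clients.
        System.out.println(variable + " = " + service.getLastValue(variable));
    }
}
```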
Current Database Server Situation
SUNLHCLOG – often referred to as the “LHC Logging Database”
 Technical
 2-node cluster SUN Fire V240: 2 x {single core 1GHz CPU, 4GB RAM, 2 x 36GB disks, 2 PS}
 External Storage 9TB, RAID 1+0 / RAID 5, mirrored & striped (~60% usable)
 Main accounts - data
 Logging: LHC HWC, Injectors, Technical Services
 Measurements: LHC HWC, Injectors
 Settings: LSA for LHC, SPS, LEIR, PS, PSB, AD
 Today’s specifics
 150 simultaneous user sessions
 Oracle data-files 4.7 TB (see the quick capacity check below)
 History
 Purchased original setup: March 2004
 Purchased extra disks: October 2006
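Taking the quoted figures at face value, a quick back-of-the-envelope check shows how close the data-files already are to the usable limit:

```java
// Back-of-the-envelope check of SUNLHCLOG storage, using the figures quoted above.
public class StorageCheck {
    public static void main(String[] args) {
        double rawTB       = 9.0;            // external storage, raw capacity
        double usableTB    = rawTB * 0.60;   // "~60% usable" after mirroring & striping -> ~5.4 TB
        double dataFilesTB = 4.7;            // Oracle data-files today
        System.out.printf("usable: %.1f TB, filled: %.0f%%%n",
                          usableTB, 100 * dataFilesTB / usableTB);
        // prints roughly: usable: 5.4 TB, filled: 87%
    }
}
```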
Current Database Server Situation
SUNSLPS – often referred to as the “Controls Configuration Database”
 Technical
 Server SUN E420R {450MHz CPU, 4GB RAM, 2x36GB disks}
 External Storage 218GB
 Main accounts - data
 AB-Controls, FESA, CMW, RBAC, OASIS
 CESAR, PO-Controls, INTERLOCK
 e-Logbooks, ABS-cache
 Historical SPS and TZ data
 LSA Test
 Today’s specifics
 200-300 simultaneous user sessions
 Oracle data-files 32GB
 History
 Installed in January 2001
Evolution of the Provided Services
 LSA Settings: operationally used since 2006
 Deployed on SUNLHCLOG to get best performance
 Used for LEIR, SPS, SPS & LHC transfer lines, LHC HWC
 Continuously evolving due to requirements from LHC and PS
 Measurement Service: operationally used since mid-2005
 Satisfying central short-term persistence for Java clients
 Generates accelerator statistics
 Increasingly used for complete accelerator complex
 Provides data filtering and transfer to long-term logging service
 Logging Service: operationally used since mid-2003
 Scope extended to all accelerators, technical data of experiments
 Equipment expert data for LHC HWC: accounts for >90% volume
 Largest consumer of database and application server resources
Evolution of the Logging – Data Volume
[Chart: growth of the logged data volume over time]
Evolution of the Logging – Data Rates
[Chart: logging data rates over time, notably for CIET, QPS and CRYO]
Performance → Hitting The Limits
 I/O Limits
 I/O subsystem is used for reading and writing data
 Recent samples: 4 to 37 clients waiting for I/O subsystem
[Chart: number of active sessions waiting for the I/O subsystem]
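The curve above can be reproduced by sampling the database itself. A sketch, assuming an Oracle version where V$SESSION exposes WAIT_CLASS (10g and later) and an account privileged to read it; connection details are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Samples the number of active sessions currently waiting on the I/O subsystem.
// Connection details are placeholders; reading V$SESSION needs suitable privileges.
public class IoWaitSampler {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:oracle:thin:@dbhost:1521:orcl", "monitor", "secret");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT COUNT(*) FROM v$session "
                 + "WHERE status = 'ACTIVE' AND wait_class = 'User I/O'")) {
            rs.next();
            System.out.println("active sessions waiting for I/O: " + rs.getInt(1));
        }
    }
}
```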
Performance → Hitting The Limits
 CPU Limits
 CPU is always needed to do anything:
 Data writing and extraction
 Data filtering (CPU intensive) and migration from MDB → LDB
 Exporting archive log files to tape, Incremental back-ups
 Migrating historic data to dedicated read-only storage
 Hitting the I/O limits burns CPU
[Chart: percentage of CPU time spent on I/O wait events]
Performance → Hitting The Limits
 Storage Limits
 Pre-defined allocated data-files difficult to manage (due to size)
 Monthly allocations always insufficient (necessary)
 Archive log file size insufficient (when backup service down)
[Chart: storage utilisation over time]
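Storage utilisation per tablespace can be followed with a query along the following lines (a sketch assuming access to the DBA_* dictionary views; it can be executed with the same kind of JDBC code as the I/O sampler above):

```java
// Sketch: allocated vs. free space per tablespace (needs access to the DBA_* views).
public class StorageUsageQuery {
    static final String SQL =
        "SELECT df.tablespace_name, "
      + "       ROUND(df.bytes / 1024 / 1024 / 1024, 1) AS allocated_gb, "
      + "       ROUND(fs.bytes / 1024 / 1024 / 1024, 1) AS free_gb "
      + "FROM (SELECT tablespace_name, SUM(bytes) bytes "
      + "        FROM dba_data_files GROUP BY tablespace_name) df "
      + "JOIN (SELECT tablespace_name, SUM(bytes) bytes "
      + "        FROM dba_free_space GROUP BY tablespace_name) fs "
      + "  ON df.tablespace_name = fs.tablespace_name "
      + "ORDER BY allocated_gb DESC";
}
```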
2008: Planned Upgrade and Migration
Separate into 3 high-availability database services
 Deploy each service on a dedicated Oracle Real Application Cluster
1. Settings & Controls Configuration (including logbooks)
 Highest-availability, Fast response
 Low CPU usage, Low disk I/O
 ~20GB data
2. Measurement Service
 Highest-availability
 CPU intensive (data filtering MDB → LDB), Very high disk I/O
 ~100GB (1 week latency) or much more for HWC / LHC operation
3. Logging Service
 High-availability
 CPU intensive (data extraction), High disk I/O
 ~10TB per year
2008: Planned Upgrade and Migration
[Diagram: three dedicated Oracle RACs (RAC 1, RAC 2, RAC 3), each node with 2 x quad-core 2.8GHz CPU and 8GB RAM. One cluster serves LSA Settings, Controls Configuration, E-Logbook and CESAR from a clustered NAS shelf with 14x146GB FC disks; the others serve Measurements, HWC Measurements and Logging from clustered NAS shelves with 14x300GB SATA disks (11.4TB usable). An additional server is foreseen for DataGuard testing: a standby database for LSA.]
2008: Planned Upgrade and Migration
 Dell PowerEdge 1950 Server specifications:
 2x Intel Xeon quad-core 2.33 GHz CPU
 2x 4 MB L2 cache
 8GB RAM
 2x power supplies, Network cards (10Gb Ethernet), 2x 72GB system disks
 NetApp Clustered NAS FAS3040 Storage specifications:
 2x disk Controllers (support for 336 disks (24 shelves))
 2x disk shelves (14x 146GB Fibre Channel 10,000rpm)
 8GB RAM (cache)
 RAID-DP
 Redundant hot-swappable: controllers, cooling fans, power supplies, optics, and network cards
 Certified >3000 I/O per second
2008: Planned Upgrade and Migration
 Purchase order for storage (2/11): launched Sep-2007
 Purchase order for servers (7/122): launched Oct-2007
 NetApps NAS storage shelves: arrived at CERN Nov-2007
 Dell servers: arrived at CERN Jan-2008
 Additional mounting rails for servers: ordered Jan-2008
 Servers: stress-tested Jan-2008
 Rack space: liberated Feb-2008
 Server and storage: fully installed 7-Mar-2008
 Oracle system software: installed, configured 14-Mar-2008
 Database structures: deployed (AB/CO/DM)
 Database services: ready for switch-over
 Switch to services of new platform (1-day stop): 21-Mar-2008?
 Migration of existing 5TB logging data to new platform: (later)
 Purchase additional logging storage for beyond 2008: (Sep-2008)
Implications, Policy and Constraints for Applications
Foreseen for all services, already implemented for a few:
Implications
 All applications should be cluster-aware
 Database load-balancing / fail-over (connection modifications; see the sketch below)
 Application fail-over (application modifications)
Policy
 Use APIs for data transfer (no direct table access)
 Follow naming conventions for data objects
 Enforce controlled data access
Constraints
 Register authorized applications (purpose, responsible)
 Implement application instrumentation
 Provide details of all database operations (who, what, where)
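For the connection modifications mentioned under Implications, a hedged sketch of what a cluster-aware JDBC URL could look like (host names, port and service name are placeholders): both RAC nodes are listed and Oracle Net performs connect-time load balancing and failover.

```java
// Sketch of a cluster-aware connection string for one of the new RAC services.
// Host names, port and service name are placeholders, not the real CERN ones.
public class ClusterAwareConnection {
    static final String URL =
        "jdbc:oracle:thin:@(DESCRIPTION="
      + "(ADDRESS_LIST=(LOAD_BALANCE=on)(FAILOVER=on)"
      + "(ADDRESS=(PROTOCOL=TCP)(HOST=rac-node1)(PORT=1521))"
      + "(ADDRESS=(PROTOCOL=TCP)(HOST=rac-node2)(PORT=1521)))"
      + "(CONNECT_DATA=(SERVICE_NAME=logging_svc)))";
}
```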
Logging Data: Expected vs Acceptable
 Beam related equipment starting to produce data
 BLM
 6,400 monitors * 12 * 2 (losses & thresholds) + crate status = ~154,000 values per second (filtered by concentrator & MDB; worked out below)
 XPOC
 More to come…
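The quoted BLM figure works out as follows (crate status values not counted):

```java
// Quick check of the quoted BLM data rate (crate status values not counted).
public class BlmRateCheck {
    public static void main(String[] args) {
        int monitors         = 6_400;
        int valuesPerMonitor = 12;   // values per monitor, as quoted on the slide
        int series           = 2;    // losses and thresholds
        System.out.println(monitors * valuesPerMonitor * series + " values/s");
        // prints 153600, i.e. the quoted "~154,000 values per second" before filtering
    }
}
```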
 Limits
 Maximum: 1 Hz data frequency in Logging database
 Not a data dump
 Consider final data usage before logging – only log what is needed (a client-side filtering sketch follows below)
 Logging noise will have a negative impact on data extraction
performance and analysis
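As announced above, a hedged sketch of the kind of client-side filtering this policy implies (class and variable names are illustrative): at most one value per second per variable, and values that have not changed beyond a deadband are dropped.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative client-side filter: at most one value per second per variable,
// and values that have not changed beyond a deadband are not logged at all.
public class LoggingFilter {
    private final Map<String, Long>   lastTimestamp = new HashMap<>();
    private final Map<String, Double> lastValue     = new HashMap<>();
    private final double deadband;

    public LoggingFilter(double deadband) {
        this.deadband = deadband;
    }

    /** Returns true only if the value is worth sending to the logging service. */
    public boolean accept(String variable, long timestampMillis, double value) {
        Long   prevTs  = lastTimestamp.get(variable);
        Double prevVal = lastValue.get(variable);
        boolean tooFrequent = prevTs  != null && timestampMillis - prevTs < 1000;      // > 1 Hz
        boolean unchanged   = prevVal != null && Math.abs(value - prevVal) < deadband; // noise
        if (tooFrequent || unchanged) {
            return false;   // logging noise: drop it before it reaches the database
        }
        lastTimestamp.put(variable, timestampMillis);
        lastValue.put(variable, value);
        return true;
    }
}
```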
The Future
Logging Data
 Original idea → keep data available online indefinitely
 Data rates estimated ~10TB/year
 Closely monitor evolution of storage usage
 Order new disks for 2009 data (in Sept 2008)
 Migrate existing data (~4TB) to new disks
Service Availability
 New infrastructure has high-redundancy for high-availability
 Scheduled interventions will still need to be planned
 Use of a standby database will be investigated, with the objective of reaching 100% uptime for small databases
Conclusions
 Databases play a vital role in the commissioning and operation of the Accelerators
 Database performance and availability have a direct impact on operations
 Today, the main server SUNLHCLOG is heavily overloaded
 Based on experience, and the evolution of existing services, the new database infrastructure has been carefully planned to:
 Provide maximum availability
 Address performance issues
 Provide independence between the key services
 Scale as a function of data volumes and future requirements
 The new database infrastructure should be operational ahead of injector chain start-up and LHC parallel sector HWC