LCG CONDITIONS DATABASE
COOL - PERFORMANCE TESTS
A. Valassi, L. Canali (CERN IT-PSS, Geneva, Switzerland), M. Clemencic (CERN PH-LBC, Geneva, Switzerland / LHCb),
D. Front (Weizmann Institute, Israel, and CERN IT-PSS, Geneva, Switzerland),
U. Moosbrugger, S. A. Schmidt (University of Mainz, Germany / Atlas), S. Stonjek (University of Oxford, UK / Atlas)
COOL Performance in Atlas Prompt Reconstruction (Simplified Scenario)
Atlas requirements (1 – processing)
• One reconstruction job every 5s
  – To process all events previously taken during a 5s interval
  – Conditions for all 5s are fetched together when the job starts
• Events processed in the order they were taken
  – Hence IOVs are read sequentially from the COOL database
  – To be confirmed: maybe ‘almost’ in the order they were taken?
• Reconstruction farm of 100 nodes
  – 10 processes per node (1k simultaneous processes in total)
  – Hence time available for one reconstruction job is 5000s (see the sketch after this list)
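A minimal Python sketch (added for clarity, not part of the original poster; variable names are illustrative) of how the 5000s job budget follows from the figures above:

```python
# Illustrative arithmetic only - every input is a figure listed above.
nodes = 100                    # reconstruction farm of 100 nodes
processes_per_node = 10        # 10 processes per node
job_interval_s = 5             # one new reconstruction job starts every 5s

total_processes = nodes * processes_per_node          # 1k simultaneous processes
job_time_budget_s = total_processes * job_interval_s  # time available per job

print(f"{total_processes} simultaneous processes, "
      f"{job_time_budget_s}s available per reconstruction job")
```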
Atlas requirements (2 – conditions)
• 100 MB of conditions to process one event
  – Atlas ‘snapshot’ description at any given validity time
• 100k channels with uncorrelated validities
  – 1000 bytes per channel (typically, numbers and strings)
• Typical IOV duration is 5 minutes
  – Hence one condition changes every 3ms (5min/100k, see the sketch after this list)
• 100 different payload schemas
  – 100 COOL folders with 1k channels in each folder
  – NB: actual requirement was 1k folders with 100 channels
  – 100 relational tables to be separately queried
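A short Python sketch (not from the poster; names are mine) tying the snapshot size and the change rate to the figures above:

```python
# Illustrative arithmetic only - every input is a figure listed above.
n_channels = 100_000          # 100k channels with uncorrelated validities
bytes_per_channel = 1000      # ~1000 bytes per channel (numbers and strings)
iov_duration_s = 5 * 60       # typical IOV duration: 5 minutes

snapshot_mb = n_channels * bytes_per_channel / 1e6      # conditions per event
change_interval_ms = iov_duration_s / n_channels * 1e3  # 5min / 100k

print(f"snapshot per event: ~{snapshot_mb:.0f} MB, "
      f"one condition change every ~{change_interval_ms:.0f} ms")
```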
Hence: COOL performance requirements
• Sustained throughput: 20 MB/s and 20k rows/s
  – Total throughput from the database server to all clients
• Sustained I/O: 300 kB/s and 300 rows/s
  – Only 1/60 of data in [5s,10s] are different from those in [0s,5s] (derivation sketched after this list)
  – May exploit server data cache if conditions are fetched in order
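A hedged Python sketch (added here, not on the poster) of how the sustained rates follow from the requirements above; the variable names are illustrative:

```python
# Illustrative derivation only - figures taken from the requirements above.
snapshot_bytes = 100_000 * 1000   # 100k channels x ~1 kB = 100 MB per snapshot
fetch_interval_s = 5              # one job (one conditions fetch) every 5s
iov_duration_s = 5 * 60           # each condition stays valid for ~5 minutes

throughput_mb_s = snapshot_bytes / fetch_interval_s / 1e6   # server -> all clients
rows_per_s = 100_000 / fetch_interval_s

# Consecutive 5s chunks share 59/60 of their rows (5s out of a 5min IOV),
# so only ~1/60 of the throughput must come from disk if the cache is warm.
new_fraction = fetch_interval_s / iov_duration_s
io_kb_s = throughput_mb_s * 1000 * new_fraction
io_rows_s = rows_per_s * new_fraction

print(f"throughput: {throughput_mb_s:.0f} MB/s, {rows_per_s/1e3:.0f}k rows/s; "
      f"disk I/O: ~{io_kb_s:.0f} kB/s, ~{io_rows_s:.0f} rows/s")
```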
Test setup
• Simulated conditions for the test
  – Retrieve 5s chunks of conditions with 5min IOV duration (see the toy model below)
  – Test database covers 5 hours (61 IOVs)
Result: SUCCESS!
• Sustained 12 + 9 kRows/s
• Client side response: sustained data influx of 2 MB/s
• 70% CPU on each of 10 clients (10 processes per client)
• … at least for this (how realistic?) tested scenario…
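A toy Python model of the tested access pattern (my sketch, not the real COOL API), showing why consecutive 5s chunks overlap so heavily:

```python
# Toy model of the tested access pattern (not the real COOL API). One job every
# 5s fetches the conditions valid during its 5s interval; each channel's IOVs
# last 5 minutes with staggered (uncorrelated) boundaries, so consecutive
# fetches overlap almost completely and only ~1/60 of the rows are new.
IOV_S = 300          # IOV duration: 5 minutes
CHUNK_S = 5          # each reconstruction job covers 5s of data taking
N_CHANNELS = 100_000

def iovs_for_chunk(t0):
    """(channel, since) pairs of all IOVs overlapping [t0, t0 + CHUNK_S)."""
    iovs = set()
    for ch in range(N_CHANNELS):
        since = t0 - (t0 + ch) % IOV_S   # IOV valid at the start of the chunk
        while since < t0 + CHUNK_S:      # plus any IOV starting inside the chunk
            iovs.add((ch, since))
            since += IOV_S
    return iovs

prev, curr = iovs_for_chunk(0), iovs_for_chunk(CHUNK_S)
new_rows = len(curr - prev)
print(f"rows per 5s chunk: {len(curr)}, new rows in the next chunk: {new_rows} "
      f"(~1/{round(N_CHANNELS / new_rows)} of the channels)")
```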
Work in progress
• COOL team and DBAs: effect of data cache (rough size estimate sketched after this list)
  – Data cache size required to hold all data larger than expected
  – Larger I/O rates than expected (not all in cache?)
  – Is it safe to assume that the cache can be used?
  – Effect of data block size on cache and I/O?
• COOL team and DBAs: effect of table number
  – Original 1k folder requirement more difficult to handle
  – Shared pool latches observed under certain conditions?
• COOL team and DBAs: network data rates
  – SQL*Net data compression for identical values in different rows?
• COOL team: client-side C++ overhead
  – Less than 10s (out of 70s) are spent on the server CPU
• Atlas: confirm/modify detailed requirements
  – 100k channels (one condition change every 3ms?)
  – 100 vs. 1k folders (1k different schemas/tables?)
  – Events processed in the order they were taken?
  – Can reconstruction jobs process more than 5s event chunks?
  – Higher values for some parameters may lead to scalability problems (e.g. 1k vs 100 folders)
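As a rough, hedged illustration of the first open point (the estimate is mine, built only from figures quoted elsewhere on the poster):

```python
# Rough lower bound (not from the poster) on the data volume behind the data
# cache question: ~1 kB per row, 100k channels, one IOV per channel every
# 5 minutes, 5-hour test database = 61 IOVs per channel.
n_rows = 100_000 * 61
payload_gb = n_rows * 1000 / 1e9
print(f"{n_rows/1e6:.1f}M rows, ~{payload_gb:.1f} GB of payload "
      "before indices and per-row overhead")
```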
Distributed data access tests
• Access from COOL client in Oxford
  – To Oracle server in Oxford
  – To Oracle server at RAL (near Oxford)
  – To Oracle server at CERN
• Main components of client real time:
  – Client user time (COOL C++ data manipulation)
  – Server time (e.g. Oracle server CPU and I/O)
  – Client-server network round-trips
• Remote access time is dominated by the network round-trip component
  – Roughly proportional to ping latency
  – Use bulk retrieval to minimise #round trips (see the model sketched below)
[Plot: real time / user time vs. ping latency [ms] for Oxford-Oxford, Oxford-RAL and Oxford-CERN access]
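A toy Python decomposition of the client real time described above (my sketch, not from the poster); the ping latencies, rows-per-round-trip values and fixed 10s components are invented placeholders, chosen only to be of the same order as the figures quoted elsewhere:

```python
# Toy decomposition of client real time, following the bullets above:
#   real time ~= client user time + server time + (#round trips) x ping latency.
# All numeric inputs below are placeholders; only the structure matters.

def real_time_s(n_rows, ping_ms, rows_per_roundtrip, user_s=10.0, server_s=10.0):
    """Rough wall-clock estimate for one conditions fetch of n_rows rows."""
    roundtrips = n_rows / rows_per_roundtrip
    return user_s + server_s + roundtrips * ping_ms / 1000.0

for ping_ms in (0.1, 2.0, 16.0):   # e.g. Oxford-Oxford, Oxford-RAL, Oxford-CERN
    row_by_row = real_time_s(100_000, ping_ms, rows_per_roundtrip=1)
    bulk = real_time_s(100_000, ping_ms, rows_per_roundtrip=1000)
    print(f"ping {ping_ms:5.1f} ms: row-by-row ~{row_by_row:7.1f}s, bulk ~{bulk:5.1f}s")
```

The remote real time grows linearly with the ping latency when rows are fetched one round trip at a time, which is why bulk retrieval is the main lever.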
Relational query improvements
Examples of past improvements
• Multi-channel bulk retrieval
  – Single query on each IOV table (with optional channel selection)
• Force SQL hard parse when gathering statistics
  – Wrong (old) execution plan observed otherwise
Work in progress
• Increasing time for IOV retrieval
  – Identified during Atlas prompt reconstruction tests
  – Querying on both ‘since’ and ‘until’ is not optimal
  – New ‘max(since)’ query strategy will be implemented (see the sketch below)
• Missing multi-dimensional indices
• Multi-channel bulk insertion
  – Extra ‘channel’ table must be added to the schema
  – Need bulk update/delete functionalities from CORAL
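The ‘max(since)’ strategy mentioned above can be sketched as follows; this is not the actual COOL SQL, and the ‘iovs’ table and its columns are invented for illustration only:

```python
# Illustrative only: 'iovs' and its columns are NOT the real COOL schema; the
# point is the contrast between the two query shapes discussed above.

# Querying on both bounds: the optimizer has to consider every IOV with
# since <= :t, a set that keeps growing as data accumulate in the folder.
FIND_IOV_BOTH_BOUNDS = """
SELECT channel_id, since, until, payload
  FROM iovs
 WHERE channel_id = :ch AND since <= :t AND until > :t
"""

# 'max(since)' strategy: first locate the latest IOV start at or before :t,
# then fetch exactly that row, so the work no longer grows with folder size.
FIND_IOV_MAX_SINCE = """
SELECT channel_id, since, until, payload
  FROM iovs
 WHERE channel_id = :ch
   AND since = (SELECT MAX(since) FROM iovs
                 WHERE channel_id = :ch AND since <= :t)
"""
```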
Andrea Valassi (CERN IT-PSS)
CHEP 2006, Mumbai (13 - 17 February 2006)