Gancho_ADC_weekly_24thJan2012


ADC weekly, 24th January 2012
Status of the ATLR and ADCR after the database upgrade
and hardware migration on 17th Jan 2012
Gancho Dimitrov
24-Jan-2012
Gancho Dimitrov
1
ATLR and ADCR
• The interventions on the ATLR and ADCR databases were successful and
completed within the foreseen downtime (17th January 2012, 10:00-14:00).
Thanks to everyone involved, and in particular to the PhyDB DBAs.
• The ADCR database cluster now has more nodes: 4, up from 3 before.
• No issues with connectivity. All applications were able to connect
to the new database without problems.
• Visible decrease in the load on the database machines, in particular on
the ADCR1 (DQ2) and ADCR2 (PanDA) instances (not much user
workload on the ATLR yet).
The new hardware, with a 3 times larger data pool cache and storage with
SSD cache, brought down the I/O requests and thus the CPU usage as well.
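The link between a larger cache and fewer I/O requests can be illustrated with a toy model. The sketch below replays a synthetic, skewed block-access pattern through a simple LRU cache at two sizes and counts the misses, each of which stands in for a disk read. The workload, the cache sizes and the function name count_disk_reads are assumptions made for illustration only; this is not a model of how the Oracle cache actually behaves.

```python
from collections import OrderedDict
import random


def count_disk_reads(accesses, cache_blocks):
    """Replay block accesses through an LRU cache and return the miss count."""
    cache = OrderedDict()
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1                    # miss: stands in for a disk read
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


rng = random.Random(2012)
# Hypothetical skewed workload: 80% of accesses hit 200 "hot" blocks,
# the rest are spread over 5000 blocks.
accesses = [
    rng.randrange(200) if rng.random() < 0.8 else rng.randrange(5000)
    for _ in range(50_000)
]

small = count_disk_reads(accesses, cache_blocks=300)
large = count_disk_reads(accesses, cache_blocks=900)  # 3 times larger cache
print(f"misses with 300-block cache: {small}")
print(f"misses with 900-block cache: {large}")
```

With LRU replacement, a larger cache never produces more misses than a smaller one on the same access sequence, so the second count cannot exceed the first. How large the gain is depends entirely on the workload.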
ADCR1 database instance (DQ2)
ADCR2 database instance (PanDA)
ADCR3 database instance (LFC)
Issues found
1) Row lock contention from update activity on the
PANDAMETA.CACHE table.
=> Contention on the LOB (Large Object) segments of the table was
initially suspected. However, the encoding implementation on the
client side still needs to be verified. The issue is currently under
investigation, with attempts to reproduce the problem on the INT8R testbed.
2) Partial replication of the ATLAS_MDT_DCS schema.
=> This turned out to be due to silent suppression of tables that have
partitions with compressed data. A workaround was implemented last Friday
(20th Jan), and since then the replication for this account has been in sync.
3) Increased rate of disk reads on the ADCR2 instance (PanDA).
=> Under investigation now. The focus is on a few queries with a high
execution rate, a changed execution plan, or a changed time range of
interest (for example, the ‘wnList’ query of the PanDA monitor).
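The triage described in item 3 can be sketched as a comparison of per-query statistics taken before and after the migration. The example below is a hedged illustration only: the QueryStats record, the shortlist function, the threshold rate_factor and all numbers are hypothetical and do not come from the ADCR2 instance or from any Oracle view.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    sql_id: str       # hypothetical query identifier
    executions: int   # executions in the snapshot interval
    disk_reads: int   # disk reads in the snapshot interval
    plan_hash: int    # identifier of the execution plan in use


def shortlist(before, after, rate_factor=2.0):
    """Return (sql_id, reasons) for queries whose behaviour changed."""
    flagged = []
    for sql_id, new in after.items():
        old = before.get(sql_id)
        if old is None:
            continue  # no baseline to compare against
        reasons = []
        if new.plan_hash != old.plan_hash:
            reasons.append("changed execution plan")
        if new.executions > rate_factor * old.executions:
            reasons.append("higher execution rate")
        old_per_exec = old.disk_reads / max(old.executions, 1)
        new_per_exec = new.disk_reads / max(new.executions, 1)
        if new_per_exec > rate_factor * old_per_exec:
            reasons.append("more disk reads per execution")
        if reasons:
            flagged.append((sql_id, reasons))
    return flagged


# Made-up snapshots for two hypothetical queries.
before = {
    "query_a": QueryStats("query_a", 1000, 5000, 111),
    "query_b": QueryStats("query_b", 500, 1000, 222),
}
after = {
    "query_a": QueryStats("query_a", 1000, 50000, 333),
    "query_b": QueryStats("query_b", 520, 1050, 222),
}

for sql_id, reasons in shortlist(before, after):
    print(sql_id, "->", ", ".join(reasons))
```

A changed time range of interest would typically show up here as more disk reads per execution with an unchanged plan, which is why that ratio is checked separately from the plan identifier.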
ADCR2 (PanDA) megabytes read per sec
ADCR1 (DQ2) megabytes read per sec
Conclusions (+ a reminder)
• Successful DB interventions with the minimum possible
database downtime, thanks to the efforts of many
people. Special thanks to the PhyDB support.
• Ongoing work on evaluating the user workload and the
Oracle 11g behaviour.
• A reminder: a short downtime of the databases is needed
to change the Oracle ‘compatible’ parameter.
Below is an extract from Dario’s presentation at the ADC weekly,
20th Dec 2011.