TAG Monitoring, Performance and Scalability
Florbela Viegas, CERN ADP
11/04/2016
Monitoring
- The path to performance is aided by monitoring. There are two sources of information:
  - Watch -> log what the users are doing.
  - Ask -> understand what the usage patterns are and will be, by knowing the users and knowing the data.
- If we can't see what users are doing, we don't know what to improve.
- At present, we watch the usage at several levels:
  - Database SQL
  - Web services statistics (AWStats)
  - Service usage queries and statistics (Logging WebService)
TAG Logging Service
- Logging activity from:
  - iELSSI, Extract, Event Lookup / GUID Counting, Histogramming
- What's being logged:
  - Use cases (which action?), logical queries, DB connections, timing, users, etc.
- Analysis jobs on raw logging data:
  - Aggregate data per collection / per run -> which collections, runs, passes, etc. are accessed?
- Example statistics:
  - Number of (distinct) users per time period
  - Number of queries per service/deployment per time period
  - Data popularity
  - Usage of resources
- What can be taken out of it:
  - Optimization of data distribution
  - Performance of sites -> improved site selection
- An interface to the logging information is under development.
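The per-period statistics listed above amount to grouping raw log records by time period and counting. A minimal Python sketch, assuming a hypothetical record layout (`LogRecord` with `timestamp`, `user`, `service` and `collection` fields) and a calendar day as the time period; the real Logging Service schema is not described in these slides:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    """One raw logging entry (hypothetical layout, not the real TAG schema)."""
    timestamp: datetime
    user: str
    service: str     # e.g. "iELSSI", "Extract", "Event Lookup"
    collection: str  # TAG collection touched by the query


def aggregate(records):
    """Return per-day distinct users, per-(service, day) query counts and collection popularity."""
    users_per_day = defaultdict(set)
    queries = Counter()
    popularity = Counter()
    for rec in records:
        day = rec.timestamp.date()  # time period = calendar day in this sketch
        users_per_day[day].add(rec.user)
        queries[(rec.service, day)] += 1
        popularity[rec.collection] += 1
    distinct_users = {day: len(users) for day, users in users_per_day.items()}
    return distinct_users, queries, popularity


if __name__ == "__main__":
    logs = [
        LogRecord(datetime(2011, 4, 11, 9, 0), "alice", "iELSSI", "collection_A"),
        LogRecord(datetime(2011, 4, 11, 9, 5), "bob", "Extract", "collection_A"),
        LogRecord(datetime(2011, 4, 11, 10, 0), "alice", "iELSSI", "collection_B"),
    ]
    users, queries, popularity = aggregate(logs)
    print(users)                      # one day with two distinct users
    print(queries)
    print(popularity.most_common(1))  # collection_A is the most accessed
```

The popularity counter is the raw input for the "optimization of data distribution" item: collections that are rarely accessed are candidates for fewer replicas.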
Performance
- Database performance (see https://savannah.cern.ch/task/?19056):
  - SQL has been analyzed regularly to check execution plans.
  - Services have implemented improved queries.
  - OPTIMIZER_INDEX_COST_ADJ is lowered in a logon trigger to obtain better optimizer plans.
  - 11g databases have produced better plans thanks to better default statistics gathering: better global statistics on partitioned tables and good histogram buckets.
- Service performance:
  - Services have been streamlined for network traffic, with result caching in memcached and data pruning at the database level instead of at the client level.
  - Firefox 4.0 has improved JavaScript speed: huge gains for ELSSI Web.
  - Further improvements depend on knowledge of user patterns: which bits are used most? Which attributes are used most? What can we cache?
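The OPTIMIZER_INDEX_COST_ADJ change can be sketched in two parts. The first function renders the kind of logon trigger mentioned above; the trigger name, the schema-level `AFTER LOGON ON <schema>.SCHEMA` scope, the schema name and the value 20 used below are hypothetical, since the slides do not give them. The second function is a deliberately simplified model of the parameter's effect: Oracle treats it as a percentage (default 100, range 1 to 10000) applied to the cost of index access paths, so lowering it makes index access look cheaper relative to a full scan. The real cost-based optimizer weighs far more than these two numbers.

```python
def logon_trigger_ddl(schema: str, value: int) -> str:
    """Render an Oracle schema-level logon trigger that sets optimizer_index_cost_adj.

    The trigger name and scope are hypothetical placeholders.
    """
    if not 1 <= value <= 10000:  # documented range of the parameter
        raise ValueError("optimizer_index_cost_adj must be between 1 and 10000")
    return (
        f"CREATE OR REPLACE TRIGGER {schema}.set_oica_on_logon\n"
        f"AFTER LOGON ON {schema}.SCHEMA\n"
        "BEGIN\n"
        f"  EXECUTE IMMEDIATE 'ALTER SESSION SET optimizer_index_cost_adj = {value}';\n"
        "END;"
    )


def preferred_access_path(index_cost: float, full_scan_cost: float, oica: int = 100) -> str:
    """Simplified model: index access cost is scaled by oica/100 before comparison."""
    adjusted_index_cost = index_cost * oica / 100
    return "index" if adjusted_index_cost < full_scan_cost else "full scan"


if __name__ == "__main__":
    print(logon_trigger_ddl("TAGS_READER", 20))      # hypothetical schema and value
    print(preferred_access_path(500, 300))           # default 100 -> full scan
    print(preferred_access_path(500, 300, oica=50))  # 250 < 300 -> index
```

For an index path costed at 500 against a full scan at 300, the default setting favors the full scan, while `oica=50` brings the adjusted index cost to 250 and tips the choice to the index. Setting the value in a logon trigger applies it to every new session without changing the instance-wide default.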
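Result caching and database-level pruning combine naturally in a cache-aside lookup: the filter runs inside the SQL statement, so only matching rows leave the database, and the pruned result is stored under a key derived from the query. A minimal sketch, assuming a hypothetical `tags` table with `run`, `event` and `n_muons` columns; `sqlite3` stands in for the Oracle backend, and `TTLCache` is an in-process stand-in for memcached (real memcached clients offer similar get/set calls with an expiry):

```python
import hashlib
import sqlite3
import time


class TTLCache:
    """In-process stand-in for memcached: get/set with a per-entry expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # expired entries behave as misses
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=300):
        self._store[key] = (value, time.monotonic() + ttl)


def cache_key(sql, params):
    """Hash the query so the key stays short and free of whitespace."""
    raw = repr((sql, params)).encode()
    return "tagq:" + hashlib.sha1(raw).hexdigest()


def select_events(conn, cache, run, min_muons, ttl=300):
    """Cache-aside lookup; the WHERE clause prunes rows inside the database."""
    sql = "SELECT event FROM tags WHERE run = ? AND n_muons >= ? ORDER BY event"
    params = (run, min_muons)
    key = cache_key(sql, params)
    rows = cache.get(key)
    if rows is None:  # cache miss: query the database and remember the result
        rows = conn.execute(sql, params).fetchall()
        cache.set(key, rows, ttl)
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (run INTEGER, event INTEGER, n_muons INTEGER)")
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)",
                     [(1, 1, 0), (1, 2, 2), (1, 3, 1), (2, 4, 3)])
    cache = TTLCache()
    print(select_events(conn, cache, run=1, min_muons=1))  # [(2,), (3,)] from the DB
    print(select_events(conn, cache, run=1, min_muons=1))  # same rows, served from cache
```

Hashing the query text keeps keys within memcached's 250-byte key limit. Pruning before caching also keeps cached values small, which matters because memcached limits the size of a single item.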
Scalability
- TAG is a truly distributed database: the data is currently spread across 5 sites.
- Distributed storage capacities:
DB                        | Used  | Free  | Total
ATLARC                    | 10 TB | 10 TB | 20 TB
PIC                       |  4 TB |  1 TB |  5 TB
DESY                      |  7 TB | 10 TB | 17 TB
RAL                       |  2 TB |  8 TB | 10 TB
TRIUMF                    |  7 TB | 20 TB | 27 TB
TAG Distributed Global DB | 30 TB | 49 TB | 79 TB
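The totals row of the capacity table can be checked, and per-site utilization derived, with a few lines of Python. The figures are taken directly from the table; only the variable and function names are introduced here:

```python
# Per-site TAG storage in TB, from the capacity table: (used, free).
SITES = {
    "ATLARC": (10, 10),
    "PIC": (4, 1),
    "DESY": (7, 10),
    "RAL": (2, 8),
    "TRIUMF": (7, 20),
}


def capacity_summary(sites):
    """Return {site: (total, used_fraction)} and the global (used, free, total)."""
    per_site = {}
    for name, (used, free) in sites.items():
        total = used + free
        per_site[name] = (total, used / total)
    used = sum(u for u, _ in sites.values())
    free = sum(f for _, f in sites.values())
    return per_site, (used, free, used + free)


if __name__ == "__main__":
    per_site, (used, free, total) = capacity_summary(SITES)
    # Print sites from fullest to emptiest.
    for name, (site_total, frac) in sorted(per_site.items(), key=lambda kv: -kv[1][1]):
        print(f"{name:8s} {site_total:3d} TB  {frac:6.1%} used")
    print(f"Global   {total:3d} TB  {used / total:6.1%} used ({free} TB free)")
```

This confirms the 30 / 49 / 79 TB totals and shows PIC as the fullest site (80% used), while the global database is about 38% used.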
TAG Data Distribution
[Diagram: TAG data distribution across sites. Sites shown: CERN, DESY, PIC, TRIUMF, RAL and CNAF (future site, June 2011). Components shown: ELSSI Suite (two instances), COMA DB (six instances), TASK DB (two instances). Data holdings shown: all data except Monte Carlo (two sites); Monte Carlo and some recent data; most recent data (no MC); most recent data (preparing for MC).]
TAG Geo Coverage
- CERN ELSSI Suite: Europe & West Asia
- TRIUMF ELSSI Suite: Americas, East Asia & Australia
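If users are directed to the ELSSI Suite covering their region, the routing reduces to a lookup. A hypothetical sketch: the region labels are transcribed from the coverage map, and the fallback to CERN for unlisted regions is an assumption, not the services' documented behavior:

```python
# Hypothetical mapping of regions to the ELSSI Suite that covers them.
COVERAGE = {
    "CERN ELSSI Suite": {"Europe", "West Asia"},
    "TRIUMF ELSSI Suite": {"Americas", "East Asia", "Australia"},
}


def elssi_suite_for(region: str, default: str = "CERN ELSSI Suite") -> str:
    """Return the suite covering `region`; unlisted regions get `default` (an assumption)."""
    for suite, regions in COVERAGE.items():
        if region in regions:
            return suite
    return default


if __name__ == "__main__":
    print(elssi_suite_for("Australia"))  # TRIUMF ELSSI Suite
    print(elssi_suite_for("Europe"))     # CERN ELSSI Suite
```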
Scalability & Performance
- Catalog integration in the services has given them the capability to parallelize queries across remote sites.
- We can take advantage of this to query faster, even at the expense of more data redundancy across sites.
- The ELSSI Suite of services is now in a position to be "distributed-aware" and take full advantage of the data and service catalogs.
- Queries can be broken down across servers and parallelized for optimization.
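Catalog-driven parallel querying can be sketched as a planner that assigns each requested run to a site holding it, followed by a fan-out of one sub-query per site on a thread pool. Everything here is hypothetical: `CATALOG` stands in for the data catalog, `query_site` for a remote database query, and choosing the least-loaded replica is one possible policy, picked to show how extra redundancy gives the planner more options:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalog: site -> runs whose TAG data it holds.
CATALOG = {
    "CERN": {1, 2, 3},
    "TRIUMF": {3, 4},
    "DESY": {5},
}


def plan(runs, catalog):
    """Assign each run to the least-loaded site holding it (ties: catalog order)."""
    assignment = {}
    for run in sorted(runs):
        holders = [site for site, held in catalog.items() if run in held]
        if not holders:
            raise LookupError(f"no site holds run {run}")
        site = min(holders, key=lambda s: len(assignment.get(s, ())))
        assignment.setdefault(site, set()).add(run)
    return assignment


def query_site(site, runs):
    """Stand-in for a remote query; a real service would query the site's database."""
    return [(site, run) for run in sorted(runs)]


def parallel_query(runs, catalog=CATALOG, max_workers=4):
    """Fan one sub-query out per site in parallel, then merge the results by run."""
    assignment = plan(runs, catalog)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query_site, site, site_runs)
                   for site, site_runs in assignment.items()]
        results = [row for fut in futures for row in fut.result()]
    return sorted(results, key=lambda row: row[1])


if __name__ == "__main__":
    print(plan([1, 2, 3, 4, 5], CATALOG))
    print(parallel_query([1, 2, 3, 4, 5]))
```

With run 3 replicated at CERN and TRIUMF, the planner sends it to TRIUMF because CERN already has two runs assigned, so the three site queries run concurrently with a more even load.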