CHEP2009-BDII - Indico

Information System Evolution
Enabling Grids for E-sciencE
Overview
The information system is a mission-critical component of the EGEE
production infrastructure. It provides the detailed information about Grid
services that is required to discover, select and use them during
Grid-related activities such as job and data management. The information
system components are found throughout the infrastructure and are
especially sensitive to the information volume and query rate. It must
therefore be ensured that current components can meet the scalability
requirements arising from the growth of the infrastructure. An improved
Berkeley Database Information Index (BDII) [1] architecture is presented
that has the potential to meet these future requirements.
BDII v5
The new architecture for the BDII consists of a standard LDAP database
which is updated by an external process. The update process obtains
LDIF from a number of sources and merges them. It then compares this
to the contents of the database and creates an LDIF file of the
differences. This is then used to update the database. The aim of this
approach is to reduce complexity within the BDII and speed up the
update cycle, therefore enabling more data to be handled in a given
time period. The increased efficiency can be seen directly in the graph
below, which shows the one-minute load average before and after
upgrading from BDII v4 to BDII v5.
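The update cycle described above (merge, compare, apply only the differences) can be illustrated with a minimal Python sketch. It is not the BDII implementation: the in-memory dictionaries stand in for LDIF and the LDAP database, the function names are hypothetical, the rule that later sources win on a clash is an assumption, and the handling of deleted entries is an assumption as well, since the poster only names add and modify operations.

```python
from typing import Dict, List, Tuple

# Assumed in-memory model: an entry maps attribute names to value lists,
# and a snapshot maps distinguished names (DNs) to entries.
Entry = Dict[str, List[str]]
Snapshot = Dict[str, Entry]


def merge_sources(sources: List[Snapshot]) -> Snapshot:
    """Merge snapshots from several sources into one.

    Assumption: if two sources publish the same DN, the later one wins.
    """
    merged: Snapshot = {}
    for source in sources:
        merged.update(source)
    return merged


def diff_snapshots(
    current: Snapshot, new: Snapshot
) -> Tuple[Snapshot, Snapshot, List[str]]:
    """Compare the database contents with the merged data.

    Returns (entries to add, entries to modify, DNs to delete).
    """
    to_add = {dn: e for dn, e in new.items() if dn not in current}
    to_modify = {
        dn: e for dn, e in new.items() if dn in current and current[dn] != e
    }
    to_delete = [dn for dn in current if dn not in new]
    return to_add, to_modify, to_delete


def apply_diff(
    current: Snapshot,
    to_add: Snapshot,
    to_modify: Snapshot,
    to_delete: List[str],
) -> None:
    """Apply only the differences to the database, in place."""
    for dn in to_delete:
        del current[dn]
    current.update(to_add)
    current.update(to_modify)


if __name__ == "__main__":
    # Hypothetical DNs and values, for illustration only.
    database = {"cn=ce1": {"GlueCEStateRunningJobs": ["10"]}}
    provider = {"cn=ce1": {"GlueCEStateRunningJobs": ["12"]}}
    plugin = {"cn=ce2": {"GlueCEStateRunningJobs": ["0"]}}
    changes = diff_snapshots(database, merge_sources([provider, plugin]))
    apply_diff(database, *changes)
    print(sorted(database))
```

Entries that did not change never reach the database, which is the point of the approach: the cost of a cycle follows the size of the difference and not the size of the full data set.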
[Figure: BDII v5 architecture diagram. Labels: Provider, Plugin, LDIF,
Merge, New LDIF, LDIF DIFF, Update LDIF, LDAP_ADD, LDAP_MODIFY, Update,
LDAP, Query.]
[Figure: One minute load average before and after upgrading. Annotation:
Improved Performance!]
Infrastructure Growth
[Figure: The growth of the number of sites, cores and jobs per day.
Y-axis: number of cores/jobs/sites, log scale from 1 to 1000000. X-axis:
Sep 03 to Jun 08. Series: No. Cores, No. Sites, No. Jobs.]
The graph above shows that the rate at which sites join the
infrastructure is slowing, whereas the rate of increase in the number
of cores and jobs per day is still rising. Assuming a growth rate of
50 sites per year, by 2015 there could potentially be 550 sites.
Each new site would contribute more fundamental services, users and
resources. Assuming an exponential growth rate for the number of
cores and computing activities (jobs), by 2015 the number of cores in
the EGEE infrastructure could reach 500,000 and the number of jobs
per day could reach 2 million.
GLUE 2.0
The Glue [2] information model version 2.0 is an official recommendation
from the Open Grid Forum [3]. It consolidates over 4 years of
production experience with the Glue 1.x series. A common information
model is required to facilitate interoperation between Grid
infrastructures, and the definition of version 2.0 in an open forum will
increase its adoption by other infrastructures. Migrating the EGEE
information system from Glue 1.3 to 2.0 will occur in three stages.
Firstly the information system will be updated to support both versions.
Secondly the information providers will be updated to produce both 1.3
and 2.0 information. Finally, applications can start migrating from using
version 1.3 to 2.0. Glue 1.3 information will only be removed once
applications have migrated to version 2.0.
[Figure: GLUE 2.0 model diagram. Entities: Admin Domain, User Domain,
Service, Manager, End Point, Share, Resource, Activity, Access Policy,
Mapping Policy. Relationship labels: Negotiates Share with, Provides,
Contacts, Manages, Maps User to, Defined on, Has, Runs.]
Investigation into the frequency of changes
[Figure: Histogram of the number of changes per update cycle. X-axis:
Number of Entries Modified, 0 to 16000. Y-axis: Number of Update Cycles,
0 to 350.]
Attribute                           Share of changes
GlueCEStateTotalJobs                 9.41%
GlueCEStateFreeCpus                  9.52%
GlueSAStateUsedSpace                 5.38%
GlueCEStateFreeJobslots             19.36%
GlueCEStateWorstResponseTime        11.79%
GlueSAStateAvailableSpace            6.57%
GlueCEStateEstimatedResponseTime    12.50%
GlueCEStateRunningJobs               7.90%
GlueCEInfoTotalCpus                  4.67%
GlueCEStateWaitingJobs               6.37%
GlueCEPolicyAssignedJobSlots         0.90%
GlueServiceStartTime                 0.71%
GlueSAUsedOnlineSize                 1.34%
GlueSAFreeOnlineSize                 1.37%
The information changes in the information system were monitored by
recording the modified entries during each BDII update. Over a period
of 9 days the changes for 1932 update cycles were recorded, which
corresponds to approximately one update cycle every 7 minutes. A
graph of the number of changes per cycle can be seen above. The
average number of entries modified per update cycle was 12771 which
corresponds to 21.8% of the total number of entries. A further
investigation was conducted to find out how often each attribute type
was changed and the results can be found in the table above. 97.8% of
the changes are confined to 14 attributes, which is only 4% of the total
attributes used. In the current implementation all the entries are
transported and updated during each cycle, which is inefficient given
that only about a fifth of the entries change in an average cycle.
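The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text and the table; the variable names are illustrative, the pairing of attributes with percentages follows the order in which they appear in the table, and the total entry and attribute counts are estimates derived from the quoted percentages, not values reported by the authors.

```python
# Figures quoted in the text.
DAYS_MONITORED = 9
UPDATE_CYCLES = 1932
AVG_ENTRIES_MODIFIED = 12771
FRACTION_OF_ENTRIES_MODIFIED = 0.218  # 21.8%
FRACTION_OF_ATTRIBUTES = 0.04         # 14 attributes are "only 4%"

# Share of all changes per attribute, in percent (from the table).
ATTRIBUTE_SHARE = {
    "GlueCEStateTotalJobs": 9.41,
    "GlueCEStateFreeCpus": 9.52,
    "GlueSAStateUsedSpace": 5.38,
    "GlueCEStateFreeJobslots": 19.36,
    "GlueCEStateWorstResponseTime": 11.79,
    "GlueSAStateAvailableSpace": 6.57,
    "GlueCEStateEstimatedResponseTime": 12.50,
    "GlueCEStateRunningJobs": 7.90,
    "GlueCEInfoTotalCpus": 4.67,
    "GlueCEStateWaitingJobs": 6.37,
    "GlueCEPolicyAssignedJobSlots": 0.90,
    "GlueServiceStartTime": 0.71,
    "GlueSAUsedOnlineSize": 1.34,
    "GlueSAFreeOnlineSize": 1.37,
}

# Average time between update cycles, in minutes (about 6.7).
minutes_per_cycle = DAYS_MONITORED * 24 * 60 / UPDATE_CYCLES

# Estimated total number of entries implied by 12771 being 21.8%.
estimated_total_entries = AVG_ENTRIES_MODIFIED / FRACTION_OF_ENTRIES_MODIFIED

# Estimated total number of attribute types implied by 14 being 4%.
estimated_total_attributes = len(ATTRIBUTE_SHARE) / FRACTION_OF_ATTRIBUTES

# Combined share of the 14 listed attributes (the text quotes 97.8%).
top_share = sum(ATTRIBUTE_SHARE.values())

print(f"Minutes per update cycle: {minutes_per_cycle:.1f}")
print(f"Estimated total entries: {estimated_total_entries:.0f}")
print(f"Estimated total attributes: {estimated_total_attributes:.0f}")
print(f"Share of changes in listed attributes: {top_share:.2f}%")
```

The table percentages sum to 97.79%, consistent with the 97.8% quoted in the text, and 12960 minutes over 1932 cycles gives roughly one cycle every 7 minutes.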
Future Directions
With the information being inserted into the resource BDIIs as
modifications to the database, a number of possibilities open up.
One possibility is to use LDAP replication mechanisms to automatically
propagate these changes to the higher levels in the system. This would
be an option for the site-level BDIIs and would reduce the latency
between the update of the resource BDII and the site-level BDII. Due to
the use of the Freedom of Choice for Resources (FCR) [4] mechanism,
it may not be possible to use LDAP replication technologies. To improve
efficiency in this case, a compressed content exchange mechanism
could be employed, or the FCR mechanism may need to be reevaluated.
References:
[1] http://twiki.cern.ch/twiki//bin/view/EGEE/BDII
[2] http://forge.gridforum.org/sf/projects/glue-wg
[3] http://www.ogf.org
[4] https://lcg-fcr.cern.ch:8443/fcr/fcr.cgi
Authors:
M. W. Schulz and L. Field CERN-IT
EGEE-III INFSO-RI-222667