GridICE
The eyes of the grid
A monitoring tool for a
Grid Operation Center by
DataTAG WP4
Sergio Fantinel, INFN LNL/PD
DataTAG is a project funded by the European Union
CERN, 8 May 2003 – no 1 / 29
Sergio Fantinel’s Outline
• Discovery: resources, components
• Server side processes layout
• DB schema
• Graphs/data presentation
• Next steps
• Packaging/installation
Discovery entities
• RESOURCES: the entities discovered from the GIS (Grid Information System), e.g.:
  • Cluster Head Nodes
  • Storage Services
  • Cluster Worker Nodes
• COMPONENTS: the entities that belong to a resource and are discovered directly from the resource itself, e.g.:
  • Computing Elements
  • Storage Space
  • Network Adapters
Discovery purposes
• Track the life of the entities: each one is characterized by a status (new, available, disappeared, dead).
• Configure the monitoring system according to the status of these entities, so that it collects metrics, status and other info.
Discovery entities list
This is the list of entities currently tracked by the monitoring system:
• Clusters
• Storage Services
• Worker Nodes (CL)
• Computing Elements (CL)
• Run Time Environments (CL)
• Virtual Organizations (CE)
• Storage Extents (WN)
• Network Adapters (WN)
• Storage Space (SE)
• Storage Protocols (SE)
CL = Cluster
WN = Worker Node/host
SE = Storage Service
Discovery entities (2)
• Every entity (resource or component) is described by a number of characterizing attributes.
• Entities may be linked together, e.g. Network Adapter -> Worker Node -> Cluster.
• An SQL database is used to track the life of the entities; it also stores all the information related to every single entity.
Discovery entities (3)
[Diagram: the Monitoring Server queries the GIIS Server and the GRIS of each Computing Element/Storage Element over LDAP, and talks to the Monitoring DB over SQL]
1: LDAP Query (to the GIIS)
2: available CE/SE
3: LDAP Query (to the GRIS)
4: CEIDs, WNs, ...
Steps 3,4 repeated for every CE/SE
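The discovery steps can be sketched as a simple loop. This is only an illustration: query_giis and query_gris are hypothetical stand-ins for the two LDAP queries (no real LDAP library is used), and the host and component names are invented.

```python
from typing import Callable, Dict, Iterable, List


def discover(
    query_giis: Callable[[], Iterable[str]],
    query_gris: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Return {resource host: [component names]} for every CE/SE.

    query_giis and query_gris are hypothetical stand-ins for the LDAP
    queries of steps 1-2 and 3-4; they are not part of any real API.
    """
    entities: Dict[str, List[str]] = {}
    # Steps 1-2: ask the GIIS which CE/SE are available.
    for host in query_giis():
        # Steps 3-4: ask that resource's GRIS for its components;
        # repeated for every CE/SE.
        entities[host] = list(query_gris(host))
    return entities


# Usage with canned answers instead of a live Grid Information System.
fake_gris = {
    "ce01.example.org": ["ceid:short", "wn:node1", "wn:node2"],
    "se01.example.org": ["sa:vo-a"],
}
found = discover(lambda: fake_gris.keys(), lambda host: fake_gris[host])
```

Passing the queries in as callables keeps the loop testable without a running GIIS or GRIS.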
Discovery config/check processes
• All the info collected is used to generate a number of Nagios configuration files (configuration process).
• Nagios schedules, at different time intervals set by other parameters stored in the DB, the execution of a number of scripts (Nagios plug-ins written by DataTAG WP4) that collect the info associated with every entity (check process) and store it in the DB.
Server Side processes layout
[Diagram: server-side processes. Boxes: GRID (Grid Information System with GIIS and GRIS, LDAP Interface), Discovery, Config, Check, Nagios/scheduler, Monitoring DB, Gfx/Presentation, WEB. Arrows are labelled 1A, 1B, 1C, 2A, 2B, 3, 4A, 4B, 5A, 5B after the steps below. A legend marks the components developed by DataTAG WP4.]
1: entities discovery
2: generation of config files
3: check scheduling
4: entities info collection
5: DB info rendering
Scheduling
• Discovery and config generation run as cron
jobs; although the two processes can be
scheduled independently at different time
intervals, a discovery is just followed by a
config generation.
• Check plug-ins are scheduled by Nagios; the
interval for each one is set by a
corresponding parameter in the DB.
DataBase stored info
• Three types of information are stored in the database (~50 tables):
  • Entities: current status and historical status (fed by the discovery process)
  • Info about entities (fed by the check process)
  • Monitoring configuration parameters (fed manually by the monitoring administrator)
DataBase date/time attributes
• Many record types have two associated date/time attributes (Date, LastCheck) that determine the time interval in which the record is valid.
• By checking the time gap between the Date attribute of a tuple and the LastCheck of the tuple that immediately precedes it, we can detect problems with the resource or with the monitoring system (VETOES generation, see later).
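A minimal sketch of this gap check, assuming each tuple is reduced to its (Date, LastCheck) pair and that a veto is raised whenever the gap exceeds a caller-chosen limit. The function name find_vetoes and the max_gap parameter are hypothetical, not taken from the tool.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Record = Tuple[datetime, datetime]  # (Date, LastCheck) of one tuple


def find_vetoes(records: List[Record], max_gap: timedelta) -> List[Record]:
    """Return the (start, end) of every gap longer than max_gap.

    A gap runs from the LastCheck of one tuple to the Date of the next.
    max_gap is a hypothetical veto threshold chosen by the caller.
    """
    ordered = sorted(records)  # oldest tuple first
    vetoes = []
    for (_, prev_last_check), (next_date, _) in zip(ordered, ordered[1:]):
        if next_date - prev_last_check > max_gap:
            vetoes.append((prev_last_check, next_date))
    return vetoes


def minutes(n: int) -> datetime:
    """Helper: a timestamp n minutes after 10:00 on 8 May 2003."""
    return datetime(2003, 5, 8, 10, 0) + timedelta(minutes=n)


history = [
    (minutes(0), minutes(10)),   # valid 10:00-10:10
    (minutes(11), minutes(20)),  # 1 minute after the previous tuple
    (minutes(50), minutes(60)),  # 30 minutes of silence before this one
]
gaps = find_vetoes(history, max_gap=timedelta(minutes=5))
```

With a 5 minute limit only the 30 minute silence between the second and third tuple is reported.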
DataBase (2)
• For many entities the related info is grouped into two sets, slow and fast parameters: the first group holds the parameters that change very slowly over time (e.g. OS version, RAM size of a node, bogoMIPS,…); the second holds all the other parameters.
DataBase check process
• For a set of parameters, the check process gets the info from the GRIS and compares the values against the last record in the DB. If none has changed, it updates only the LastCheck attribute of that tuple; otherwise it inserts a new tuple holding the new info, with the current date/time in both the Date and LastCheck attributes.
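The update-or-insert rule can be sketched in a few lines. This is an in-memory illustration only: the real tool stores tuples in an SQL database, and the names ParamTuple and store_check are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class ParamTuple:
    date: datetime        # start of validity (Date)
    last_check: datetime  # last time the values were confirmed (LastCheck)
    values: Dict[str, object]


def store_check(history: List[ParamTuple], values: Dict[str, object],
                now: datetime) -> bool:
    """Record one check result; return True if a new tuple was inserted."""
    if history and history[-1].values == values:
        # Nothing changed: only extend the validity of the last tuple.
        history[-1].last_check = now
        return False
    # First check or changed values: start a new tuple at `now`.
    history.append(ParamTuple(date=now, last_check=now, values=dict(values)))
    return True


history: List[ParamTuple] = []
store_check(history, {"ClockSpeed": 1000}, datetime(2003, 5, 8, 10, 0))
store_check(history, {"ClockSpeed": 1000}, datetime(2003, 5, 8, 10, 5))
store_check(history, {"ClockSpeed": 1200}, datetime(2003, 5, 8, 10, 10))
```

Three checks leave two tuples: the first is valid from 10:00 to 10:05, the second starts at 10:10.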
DataBase thresholds
• We are introducing a threshold mechanism:
  • For a selected set of attributes (nearly all the fast parameters) we define a threshold
  • If the new values for a tuple do not differ from the old values by more than the related threshold, we change only the LastCheck attribute
DataBase thresholds (2)
• This threshold mechanism is very useful for saving DB space.
• For example:
  • We do not need to record a CPU load variation of less than 5-10% (think of the fluctuations around 100% idle when no job is running on a CPU)
  • The same concept may be applied to a Storage Area with a 5-10 MB threshold
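A sketch of the threshold comparison, assuming thresholds are absolute differences per attribute. The function name exceeds_threshold and the FreeSpaceMB attribute are hypothetical; the limit values are only in the spirit of the examples.

```python
from typing import Dict


def exceeds_threshold(old: Dict[str, float], new: Dict[str, float],
                      thresholds: Dict[str, float]) -> bool:
    """True if any attribute moved by more than its threshold.

    Attributes without an entry in `thresholds` use 0, i.e. any change
    counts. Thresholds are absolute differences (an assumption).
    """
    for name, new_value in new.items():
        limit = thresholds.get(name, 0.0)
        if name not in old or abs(new_value - old[name]) > limit:
            return True
    return False


# Hypothetical limits: percentage points for CPU idle time, megabytes
# for the free space of a storage area.
limits = {"Idle": 5.0, "FreeSpaceMB": 10.0}
last = {"Idle": 99.0, "FreeSpaceMB": 2048.0}

small_change = exceeds_threshold(
    last, {"Idle": 97.0, "FreeSpaceMB": 2041.0}, limits)
large_change = exceeds_threshold(
    last, {"Idle": 40.0, "FreeSpaceMB": 2041.0}, limits)
```

When the function returns False only LastCheck would be updated; when it returns True a new tuple would be written.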
DB Example: GE simple info
RESOURCE
  Date          date
  LastCheck     date
  IP            string(16)
  ResName       string(64)
  TimeInterval  int
  ID_GeType     int
  Group         string(64)
  Cluster       int

RL_GE_CPU_SLOWPARAM
  Date          date
  LastCheck     date
  Vendor        string(32)
  Model         string(64)
  Version       string(32)
  ClockSpeed    int

RL_GE_CPU_FASTPARAM
  Date          date
  LastCheck     date
  User          int
  Nice          int
  System        int
  Idle          int
  Load1Min      int
  Load5Min      int
  Load15Min     int

NB: this is a simplified structure
DB Example: queues
RESOURCE

RL_CE_CEID
  Date                   date
  LastCheck              date
  CeidName               string(256)

RL_GE_CPU_FASTPARAM
  Date                   date
  LastCheck              date
  FreeCpus               int
  EstimatedResponseTime  int
  RunningJobs            int
  WaitingJobs            int
  WorstResponseTime      int

RL_GE_CPU_SLOWPARAM
  Date                   date
  LastCheck              date
  MaxTotalJobs           int
  MaxWallClockTime       int
  CeidStatus             int
  Priority               int
  TotalCpus              int
  LRMSType               string(16)
  MaxCpuTime             int
  LRMSVersion            string(16)
  MaxRunningJobs         int

NB: this is a simplified structure
DB Example: status history
Tables: RESOURCE, RL_SE_SA, RL_SA_SLOWPARAM, RL_SA_FASTPARAM

RL_RES_CHANGE_STATUS
  ID_ChgStatus  int
  Date          date

RL_SA_CHANGE_STATUS
  ID_ChgStatus  int
  Date          date

NB: this is a simplified structure
DB Example: configuration
Tables: GETYPE, RL_GETYPE_SERVICE, SERVICE

SERVICE
  ServiceName    string(64)
  CheckInterval  int
  Template       string(32)
  Command        string(32)

NB: this is a simplified structure

Service List:
CheckCeidFP, CheckCeidSP, CheckSeFP, CheckSeSP, CheckWnCpu, CheckWnCtxc, CheckWnInfo, CheckWnIntc, CheckWnMem, CheckWnMpc, CheckWnNet, CheckWnPrcc, CheckWnSckc, CheckWnSpc, CheckWnStg, CheckWnVirtual
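As an illustration of how a row of the service configuration table might be turned into a Nagios service definition, here is a minimal sketch. It is not the tool's actual generator. The mapping of Template to `use`, Command to `check_command` and CheckInterval to `normal_check_interval` is an assumption, the directive names follow the object-definition syntax of early Nagios releases and should be checked against the installed version, and the host, template and command values are invented.

```python
from typing import Dict

# Doubled braces are literal braces in str.format templates.
SERVICE_TEMPLATE = """define service{{
    use                     {template}
    host_name               {host}
    service_description     {name}
    check_command           {command}
    normal_check_interval   {interval}
    }}
"""


def render_service(row: Dict[str, object], host: str) -> str:
    """Render one configuration row as a Nagios service block.

    `row` uses the column names of the simplified table above; the
    mapping onto Nagios directives is an assumption for illustration.
    """
    return SERVICE_TEMPLATE.format(
        template=row["Template"],
        host=host,
        name=row["ServiceName"],
        command=row["Command"],
        interval=row["CheckInterval"],
    )


# Hypothetical row and host name.
row = {
    "ServiceName": "CheckWnCpu",
    "CheckInterval": 5,
    "Template": "generic-service",
    "Command": "check_wn_cpu",
}
text = render_service(row, host="wn01.example.org")
```

A config generation run would emit one such block per discovered entity and applicable service.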
Data presentation
• We based our work on JpGraph (PHP v4), a very powerful API for generating many types of graph that directly uses the GDLib API (v2 is needed for accurate colour management).
• To overcome some JpGraph limitations in managing date/time labels and in merging and plotting large sets of data, and to have finer control in general, we developed a decoupling layer between the DB and the rendering engine.
Data presentation (2)
[Diagram: the Main Analysis Process, with the stages Data Load, Resample, Data Merging and Graph generation, sits between the Monitoring DB and JpGraph/GDLib. A legend marks the components developed by DataTAG WP4.]
Data presentation (3)
[Diagram: the Data Load stage]
Inputs: Monitoring DB; Veto threshold; Time interval; Object descriptors (Table, fields,…)
Outputs: Array of homogeneous tuples with their time interval validity (ALV); Array of VETOES (if option enabled) (AVT)
Data presentation (4)
[Diagram: the Resample stage]
Inputs: (ALV); (AVT); Time interval; X-axis plot (dot) size (xSize)
Outputs:
• Array with xSize elements; every element is a record of metric values, and each value is the average of the metric function over the time slot represented by the x point considered
• Graph Labels
• Graph VETO bars (if enabled)
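The resampling step can be sketched as a time-weighted average per slot. This is only an illustration under stated assumptions: times are plain numbers, each tuple is taken to hold its value from Date to LastCheck, a slot with no data yields None, and the function name resample is hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float, float]  # (Date, LastCheck, metric value)


def resample(alv: List[Interval], t_start: float, t_end: float,
             x_size: int) -> List[Optional[float]]:
    """Average a piecewise-constant metric over x_size equal time slots.

    Times are plain numbers (e.g. seconds). Each tuple is assumed to hold
    its value from Date to LastCheck; slots with no data yield None.
    """
    slot = (t_end - t_start) / x_size
    result: List[Optional[float]] = []
    for i in range(x_size):
        lo, hi = t_start + i * slot, t_start + (i + 1) * slot
        weighted_sum = covered = 0.0
        for start, end, value in alv:
            # Length of the part of this tuple that falls inside the slot.
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                weighted_sum += value * overlap
                covered += overlap
        result.append(weighted_sum / covered if covered else None)
    return result


# Value is 1.0 for t in [0, 30) and 3.0 for t in [30, 60); nothing after.
points = resample([(0, 30, 1.0), (30, 60, 3.0)], 0, 80, 4)
```

The second slot straddles both tuples and averages to 2.0; the last slot has no data, which is where a VETO bar could be drawn.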
Data presentation (5)
• The presentation of the data addresses different user types (see the live demo session later):
  • VO views, for a VO manager
  • Site views, for a system manager
  • Single entity views
Next steps, integration
• The next HIGH PRIORITY step is the integration with the EDG 2.0 information system. Thanks to the structure of our monitoring system, the migration should be very smooth and fast.
Next steps, long term
• Glue Network Service (NE)
• Job monitoring
• Collective services (RB, LB, ROS, RLS,…)
• We plan to evaluate different technologies that could help us improve our tool; we have in mind, for example, OGSA and JINI.
• Major issue: distribution of the monitoring DB to get more scalability
• Authentication/authorization
• Remote system management
• Better analysis of monitoring data and its presentation
• Personal monitoring
Packaging/installation
• Server side
  • Linux Red Hat 7.3
  • PostgreSQL 7.2.3
  • PHP 4.3.1 (GDLib v2)
  • JpGraph 1.2
  • Nagios
Packaging/installation (2)
• Server side:
  • Guide to installation and configuration
• Client side:
  • Packaged RPMs to install on the Head Node and on the Worker Nodes (fmon server/agents)
  • Replacement of the Glue-CE.schema file on the Head Node (GLUE+)