R-GMA - SNS Courseware
Performance of the Relational Grid
Monitoring Architecture (R-GMA)
• CMS data challenges. The nature of the problem.
• What is GMA?
• And what is R-GMA?
• Performance test description
• Performance test results
• Conclusions
The Nature of the problem
• As part of the preparations for data taking CMS is
performing DATA CHALLENGES.
• Large number of simulated events to
optimise detectors and prepare software
• Enormous processing requirements
BUT
each event is independent of all the others
each event can be generated on a machine
without any interaction with any other
The local solution
Work split between farms.
How to handle the book-keeping ?
a data-base automatically
updated
Implemented via a job wrapper BOSS
Output to <stdout> and <stderr> is intercepted and the
information is recorded in a mySQL production database.
Event generation and job accounting decoupled
The local solution (schematic)
[Schematic: worker nodes (WN) on the farm, the submission machine (UI) and the database machine.]
The grid solution (schematic)
[Schematic showing the submission machine (UI) and the database machine.]
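Grid Monitoring Architecture (GMA) of the GGF
[Diagram: three components and their interactions.]
• Producer: registers with the Registry ("register producer").
• Registry (directory services): the Consumer asks it to locate a producer and receives the address of the producer.
• Consumer: asks the Producer for data; the data then flows directly from Producer to Consumer.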
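The three GMA roles can be sketched in a few lines of Python. This is only an illustration of the register / locate / fetch pattern, not the GGF specification or any R-GMA API; the class names, method names, the table name job_status and the host farm-a.example.org are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Producer:
    """Holds the data itself; the registry only knows where to find it."""
    address: str
    rows: List[dict] = field(default_factory=list)

    def publish(self, row: dict) -> None:
        self.rows.append(row)

    def get_data(self) -> List[dict]:
        return list(self.rows)


@dataclass
class Registry:
    """Directory service: maps a data name to the producers that offer it."""
    producers: Dict[str, List[Producer]] = field(default_factory=dict)

    def register_producer(self, name: str, producer: Producer) -> None:
        self.producers.setdefault(name, []).append(producer)

    def locate_producers(self, name: str) -> List[Producer]:
        return list(self.producers.get(name, []))


class Consumer:
    def __init__(self, registry: Registry) -> None:
        self.registry = registry

    def query(self, name: str) -> List[dict]:
        # Step 1: ask the registry where the producers are.
        # Step 2: fetch the data directly from each producer.
        result: List[dict] = []
        for producer in self.registry.locate_producers(name):
            result.extend(producer.get_data())
        return result


registry = Registry()
farm_a = Producer(address="farm-a.example.org")  # hypothetical host
registry.register_producer("job_status", farm_a)
farm_a.publish({"job": 1, "state": "running"})

consumer = Consumer(registry)
rows = consumer.query("job_status")
```

The point of the design is that the registry never carries the data: it only answers "who publishes this?", so the data path stays between producer and consumer.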
R-GMA (Relational GMA)
Developed for E(uropean) D(ata) G(rid)
Extends the GMA in three important ways
1. Introduces a time stamp on the data.
Can be used for information
and monitoring
2. A relational implementation
3. Hides the registry behind the API
Each Virtual Organisation appears
to have one RDBMS
The syntax of R-GMA
The user interface to R-GMA is via SQL statements
(not all SQL statements and structures are supported)
Information is advertised via a table create
Information is published via insert
Information is read via select … from table
The first read request registers the consumer as interested
in this data.
Relational queries are supported
NOTE: SQL is the interface; it should not be supposed that an
actual database lies behind it.
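The three kinds of statement can be tried out with any SQL engine. The sketch below uses Python's built-in sqlite3 purely as a stand-in: it is not the R-GMA API, and, as the note above says, R-GMA itself need not have a real database behind the interface. The table name job_message and its columns are invented for the example.

```python
import sqlite3
import time

# Stand-in only: an in-memory SQLite database plays the role of the
# virtual "one RDBMS per Virtual Organisation".
conn = sqlite3.connect(":memory:")

# Advertise: declare the table that will carry the information.
conn.execute(
    "CREATE TABLE job_message (job_id INTEGER, message TEXT, measured_at REAL)"
)

# Publish: each tuple carries a time stamp.
conn.execute(
    "INSERT INTO job_message VALUES (?, ?, ?)",
    (44, "event 100 done", time.time()),
)

# Read: an ordinary select.
rows = conn.execute(
    "SELECT job_id, message FROM job_message WHERE job_id = 44"
).fetchall()
```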
Fit between R-GMA and BOSS
R-GMA can be dropped into the framework with very little
disruption
1. Set up calls for mySQL are replaced by those for R-GMA
producers
2. An archiver (joint consumer/producer) runs on a single
machine which collects the data from all the running jobs
and writes it to a local database (and possibly republishes it).
The data can then be queried either by direct mySQL calls or via
R-GMA consumer (a distributed database has been
created)
Fit between R-GMA and BOSS (i)
[Diagram: BOSS jobs publish through R-GMA over LAN and WAN connections to the database.]
R-GMA Measurements
• The architecture of GMA clearly provides a putative
solution to the wide area monitoring problem.
BUT
Does a specific implementation provide a practical solution?
Before entrusting CMS production to R-GMA, we must be
confident that it will perform.
What load will it fail at and why ?
Message time distribution from 44 jobs
<Message length> 35 chars.
Simulation of a CMS job
Multi-threaded job
each thread produces messages. Length 35 chars,
suitable distribution.
Threads starting time distribution can be altered.
One machine delivers the R-GMA load of a farm.
[Diagram: the simulated job feeds an R-GMA servlet, which is read by an R-GMA consumer.]
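A load generator of this kind might look like the sketch below. It is not the test harness used for the measurements: the exponential gap between messages is an assumption (the slide only says "suitable distribution"), a queue.Queue stands in for the R-GMA producer call, and all function and parameter names are hypothetical.

```python
import queue
import random
import threading
import time

MESSAGE_LENGTH = 35  # characters, the mean message length quoted above


def simulated_job(job_id, n_messages, out, mean_gap_s, rng):
    """One thread stands in for one CMS job publishing status messages."""
    for seq in range(n_messages):
        # Assumed gap distribution: exponential with the given mean.
        time.sleep(rng.expovariate(1.0 / mean_gap_s))
        text = f"job={job_id} seq={seq}".ljust(MESSAGE_LENGTH, ".")
        out.put(text[:MESSAGE_LENGTH])


def run_farm(n_jobs, n_messages, mean_gap_s=0.001, seed=0):
    """Run n_jobs threads on one machine and collect everything they publish."""
    out = queue.Queue()  # stand-in for the R-GMA producer
    threads = [
        threading.Thread(
            target=simulated_job,
            args=(job, n_messages, out, mean_gap_s, random.Random(seed + job)),
        )
        for job in range(n_jobs)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # All writers have finished, so the queue size is now stable.
    return [out.get() for _ in range(out.qsize())]


messages = run_farm(n_jobs=44, n_messages=5)
```

Changing mean_gap_s or staggering the thread start times is how one machine can be made to deliver the load of a whole farm.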
Simulation of the CMS Grid
One machine per grid cluster, each providing a load greater
than that of the cluster
[Diagram: four R-GMA servlets, one per simulated cluster, feeding a single R-GMA consumer.]
Current status
R-GMA can survive loads of around 20% of the current
CMS requirements and does provide a grid method
for monitoring. An overload of a factor of 2 in the number of jobs causes
problems after about five minutes of running.
We believe these instabilities are soluble.
When production starts in earnest we will compare reality with our
model.
GridICE Server
Installation
Brief Introduction
GridICE:
– is a distributed monitoring tool for grid
systems
– integrates with local monitoring systems
– offers a web interface for publishing
monitoring data at the Grid level
– fully integrated in the LCG-2 Middleware
• gridice-clients data collector installation and
configuration for each site realized by the YAIM
scripts.
System Requirements
• Suggested Operating system is Scientific
Linux with a minimal installation
• The GridICE server should be installed on a
performant machine
– PostgreSQL service - RAM intensive demand
– Apache web server - RAM-CPU intensive
demand
Core Packages & Dependencies
The GridICE server software is composed of three core
packages:
1. gridice-core
(setup and maintenance scripts / discovery components)
2. gridice-www
(web interface scripts and components)
3. gridice-plugins
(monitoring scripts)
Plus several dependencies:
– Apache http web server
– PostgreSQL database server
– Nagios monitoring tool
– ...
The Four Main Phases of Monitoring
1. Generation: sensors inquiring entities and encoding the measurements
according to a schema
2. Processing: e.g., filtering according to some predefined criteria, or
summarising a group of events
3. Distributing: transmission of the events from the source to any
interested parties (data delivery model: push vs. pull; periodic vs.
aperiodic)
4. Presenting: processing and abstracting the number of received events in
order to enable the consumer to draw conclusions about the operation of
the monitored system
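As a rough illustration, the four phases can be chained as plain functions. This is a toy pipeline under invented names (generate, process, distribute, present) with made-up readings; it is not how GridICE is implemented.

```python
from statistics import mean


def generate(readings):
    """Generation: encode raw sensor readings according to a fixed schema."""
    return [
        {"host": host, "metric": "cpu_load", "value": value}
        for host, value in readings
    ]


def process(events, threshold):
    """Processing: filter by a predefined criterion, then summarise the group."""
    kept = [e for e in events if e["value"] >= threshold]
    return {
        "metric": "cpu_load",
        "hosts": [e["host"] for e in kept],
        "mean_value": mean(e["value"] for e in kept) if kept else None,
    }


def distribute(summary, subscribers):
    """Distributing: push delivery to every interested party."""
    for deliver in subscribers:
        deliver(summary)


def present(summary):
    """Presenting: abstract the events into something a person can read."""
    if summary["mean_value"] is None:
        return f"{summary['metric']}: no hosts at or above threshold"
    return (
        f"{summary['metric']}: {len(summary['hosts'])} hosts at or above "
        f"threshold, mean {summary['mean_value']:.2f}"
    )


inbox = []
events = generate([("wn1", 0.2), ("wn2", 1.5), ("wn3", 2.5)])
distribute(process(events, threshold=1.0), subscribers=[inbox.append])
report = present(inbox[0])
```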
The GridICE Approach
Generating Events
• Generation of events:
– Sensors: typically perl scripts or c programs.
– Schema:
• GLUE Schema v.1.1 + GridICE extension.
– System related (e.g., CPU load, CPU Type, Memory size).
– Grid service related (e.g., CE ID, queued jobs).
– Network related (e.g., Packet loss).
– Job usage (e.g., CPU Time, Wall Time).
– All sensors are executed in a periodic fashion.
Distributing Events
• Distribution of events:
– Hierarchical model.
• Intra-site: by means of the local monitoring service
– default choice, LEMON (http://www.cern.ch/lemon).
• Inter-site: by offering data through the Grid Information Service.
• Final Consumer: depending on the client application.
– Mixed data delivery model.
• Intra-site: depending on the local monitoring service (push for
lemon).
• Inter-site: depending on the GIS (current choice, MDS 2.x, pull).
• Final consumer: pull (browser/application), push
(publish/subscribe notification service coming on the next
release).
Presenting Events
• Data stored in a RDBMS used to build
aggregated statistics.
• Data retrieved from the RDBMS are
encoded in XML files.
• XSL to XHTML transformations to publish
aggregated data in a Web context.
Monitoring a Grid
Challenges for Data Collection
• The distribution of monitoring data is strongly
characterised by significant requirements
(e.g., Scalability, Heterogeneity, Security, System Health)
• None of the existing tools satisfy all of these
requirements
• Grid data collection should be customized according to
the needs of your selected Grid users
Challenges for Data Presentation
• Different Grid users are interested in different subsets of
Grid data and different aggregation levels
• Usability principles should be taken into account to help
users find relevant Grid monitoring information
• A synthetic data aggregation is crucial to permit drill-down
navigation (from the general to the detailed) of the
Grid data
How is Ganglia different from
Nagios
• Ganglia is architecturally designed to perform efficiently in very large
monitoring environments: each Ganglia gmond performs its service checks
locally, reporting in at a regular interval to the gmetad. Nagios performs its
service checks by polling each device across a network connection and waiting
for a response (known as "active checks"), which can be more resource and
bandwidth intensive.
• Nagios uses the results of its active checks to determine state by comparing the
metrics it polls to thresholds. These state changes can in turn be used to
generate notifications and customizable corrective actions. Ganglia, by
contrast, has no built-in thresholds, and so does not generate events or
notifications.
• The general rule of thumb has been: if you need to monitor a limited number
of aspects of a large number of identical devices, use Ganglia; if you want to
monitor lots of aspects of a smaller number of different devices, use Nagios.
But those distinctions are blurring as Ganglia supports more and more devices,
and as Nagios' scalability improves.
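The threshold idea in the second bullet can be sketched as follows. The state names mirror the ones Nagios uses, but the code is not Nagios's implementation or its plugin API; the function names, the sample values and the choice to start in the OK state are assumptions made for the example.

```python
OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


def classify(value, warn, crit):
    """Map a polled metric onto a state by comparing it to two thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def state_changes(samples, warn, crit):
    """Return (sample_index, old_state, new_state) for every state change.

    Only changes are reported, which is what makes notification possible
    without anyone having to watch a web page.
    """
    changes = []
    state = OK  # assumed initial state
    for index, value in enumerate(samples):
        new_state = classify(value, warn, crit)
        if new_state != state:
            changes.append((index, state, new_state))
            state = new_state
    return changes


load_samples = [0.5, 0.7, 4.2, 9.1, 8.7, 1.0]
notifications = state_changes(load_samples, warn=4.0, crit=8.0)
```

A tool without built-in thresholds would stop at collecting load_samples; it is the comparison step that turns metrics into events.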
4/2/2016
T.R.LEKHAA/AP/IT/SNSCE
How is Ganglia different from
Nagios
• The problem with Ganglia and all the other
external web pages we have been looking at
is that you have to look at them!
• If all is well with your system you don’t
want to have to look.
• This is where Nagios comes in. It can be
set up to alert you when something goes
wrong, or a value passes a threshold.
Monitoring: What?
Metric              Relevance
Packet loss         Lost packets, re-Tx, traffic jams
Connectivity        Can you connect?
RTT                 TCP = send-acknowledge protocol;
                    delayed acknowledge = delayed traffic
TCP/UDP thru'put    Network view? Application view?
Jitter (UDP only)   Variation in delay - UDP & multicast
Monitoring: How(1)?
Tools installed on dedicated & similar node at each centre
[Diagram: at each monitor node, IperfER, PingER, UDPmon, MiperfER and bbcp/ftp (label: 30 mins) feed a publication service (www, visualisation) and the Grid middleware monitoring architecture; the monitor nodes form a MESH.]
Monitoring: How(2)?
Metric                                    Tool       Origin
RTT, packet loss, connectivity            PingER     Ping; SLAC et al
TCP thru'put                              IperfER    NCSA's iperf; SLAC and UCL
                                          bbcp       SLAC
                                          bbftp      Tool = IN2P3, Monitoring = SLAC
UDP thru'put                              UDPmon     [email protected], EDG
Multicast thru’put, packet loss, jitter   MiperfER   IperfER; Manchester Computing
Network Weather Service
Introduction
• “NWS provides accurate forecasts of
dynamically changing performance
characteristics from a distributed set of
metacomputing resources”
• What will be the future load (not current
load) when a program is executed?
• Producing short-term performance
forecasts based on historical performance
measurements
• The forecasts can be used by dynamic
scheduling agents
Introduction
• Resource allocation and scheduling
decisions must be based on
predictions of resource performance
during a timeframe
• NWS takes periodic measurements of
performance and, using numerical
models, forecasts resource
performance
NWS Goals
• Components
– Persistent state
– Name server
– Sensors
• Passive (CPU availability)
• Active (Network measurements)
– Forecaster
Architecture
Performance measurements
• Using sensors
• CPU sensors
– Measures CPU availability
– Uses
• uptime
• vmstat
• Active probes
• Network sensors
– Measures latency and bandwidth
• Each host maintains
– Current data
– One-step ahead predictions
– Time series of data
Network Measurements
Issues with Network Sensors
• Appropriate
transfer size for
measuring
throughput
• Collision of network
probes
• Solutions
– Tokens and
hierarchical trees
with cliques
Available CPU measurement
• The formulae shown
do not take into
account job
priorities
• Hence periodically an
active probe is run
to adjust the
estimates
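The formulae themselves are not in this transcript. As a hedged illustration only, the sketch below uses the load-average estimate commonly attributed to the NWS literature, availability = 1 / (load average + 1), and a simple additive bias taken from an active probe. Both the formula choice and the bias scheme are assumptions here, and the function names are invented.

```python
def availability_from_load(load_average):
    """Passive estimate: fraction of the CPU a new process could expect.

    With load_average jobs already runnable, a new job is assumed to get
    an equal share, i.e. 1 / (load_average + 1). Job priorities are ignored,
    which is exactly the weakness the bullet above points out.
    """
    if load_average < 0:
        raise ValueError("load average cannot be negative")
    return 1.0 / (load_average + 1.0)


def probe_bias(passive_estimate, probe_result):
    """Difference between what an active probe got and what was predicted."""
    return probe_result - passive_estimate


def corrected(passive_estimate, bias):
    """Apply the most recent probe bias and clamp to a valid fraction."""
    return min(1.0, max(0.0, passive_estimate + bias))


# A probe run when the load average was 1.0 actually obtained 75% of the CPU.
bias = probe_bias(availability_from_load(1.0), 0.75)

# Later passive readings are shifted by that bias until the next probe.
adjusted = corrected(availability_from_load(3.0), bias)
```

On a Unix host the input could come from os.getloadavg()[0]; the probe is run only periodically because it consumes the resource it measures.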
Predictions
• To generate a forecast, forecaster requests
persistent state data
• When a forecast is requested, forecaster makes
predictions for existing measurements using different
forecast models
• Dynamic choice of forecast models based on the best
Mean Absolute Error, Mean Square Prediction Error,
Mean Percentage Prediction Error
• Forecasts requested by:
– InitForecaster()
– RequestForecasts()
• Forecasting methods
– Mean-based
– Median-based
– Autoregressive
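The dynamic choice of model can be sketched as below: every model is replayed over the measurement history, its mean absolute error (MAE) is accumulated, and the model with the lowest MAE so far supplies the next forecast. This is a simplified illustration, not the NWS forecaster: the four models, the window size of 5 and all names are assumptions, and only MAE is used for ranking.

```python
from statistics import mean, median


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def window_mean(history, k=5):
    return mean(history[-k:])


def window_median(history, k=5):
    return median(history[-k:])


MODELS = {
    "last_value": last_value,
    "running_mean": running_mean,
    "window_mean": window_mean,
    "window_median": window_median,
}


def forecast(series):
    """Return (best_model_name, forecast_for_next_value, mae_by_model).

    Needs at least two measurements so that every model has been
    scored on at least one one-step-ahead prediction.
    """
    if len(series) < 2:
        raise ValueError("need at least two measurements")
    abs_err = {name: 0.0 for name in MODELS}
    for t in range(1, len(series)):
        history = series[:t]
        for name, model in MODELS.items():
            abs_err[name] += abs(model(history) - series[t])
    n_predictions = len(series) - 1
    mae = {name: total / n_predictions for name, total in abs_err.items()}
    best = min(mae, key=mae.get)
    return best, MODELS[best](series), mae


# A steady series with one spike: the median ignores the outlier.
measurements = [10, 10, 10, 50, 10, 10, 10, 10]
best_model, prediction, mae_by_model = forecast(measurements)
```

Because the ranking is recomputed each time a forecast is requested, the chosen model can change as the behaviour of the resource changes.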
Forecasting Methods
Notations:
Prediction Method:
Prediction Accuracy:
Mean Absolute Error (MAE) is the average of the above
Forecasting Methods – Mean-based
[Formulas 1 to 5 not preserved.]
Forecasting Methods – Median-based
[Formulas 1 to 3 not preserved.]
Autoregression
a_i is found such that it minimizes the overall error.
r_{i,j} is the autocorrelation function for the series of N
measurements.
Forecasting Methodology
Forecast Results
Forecasting Complexity vs Accuracy
• Semi Non-parametric Time Series Analysis (SNP)
– an accurate but complicated model
• Model fit using iterative search
• Calculation of conditional expected value using
conditional probability density
Sensor Control
• Each sensor connects to
other sensors and performs
measurements: O(N^2)
• To reduce the time
complexity, sensors
organized in hierarchy
called cliques
• To avoid collisions, tokens
are used
• Adaptive control using
adaptive token timeouts
• Adaptive time-out
discovery and distributed
leader election protocol
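The saving from cliques and the role of the token can be illustrated with a small sketch. It assumes one representative per clique probes the representatives of the other cliques, which is only one reading of the hierarchy; timeouts and leader election are left out, and the function names are invented.

```python
def full_mesh_probes(n_hosts):
    """Every sensor probes every other sensor: N * (N - 1), i.e. O(N^2)."""
    return n_hosts * (n_hosts - 1)


def clique_probes(clique_sizes):
    """Probes inside each clique plus probes among one representative each."""
    within = sum(full_mesh_probes(size) for size in clique_sizes)
    between = full_mesh_probes(len(clique_sizes))
    return within + between


def token_schedule(clique):
    """Pass a single token around the clique; only the holder may probe.

    Because the probes are serialised, two measurements never collide
    on the network.
    """
    schedule = []
    for holder in clique:
        for target in clique:
            if target != holder:
                schedule.append((holder, target))
    return schedule


flat = full_mesh_probes(100)            # 100 hosts, no hierarchy
hierarchical = clique_probes([10] * 10)  # 10 cliques of 10 hosts
order = token_schedule(["a", "b", "c"])
```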