Monitoring Grid Services
Yin Chen
June 2003
1
Contents
Issues of Monitoring
Project Proposal
2
Issues of Monitoring
What are the goals of Grid monitoring
What are the characteristics of a Grid system
What may need to be monitored
What are the characteristics of monitoring data
Related Work
3
What are the goals of Grid monitoring
Propagate errors to users/management
Performance monitoring, in order to
- tune the application
- use the Grid more efficiently
The question is
- not how to measure resources
- but how to deliver information to end users and to the system/Grid
4
What are the characteristics of a Grid system
Complex distributed system => unexpectedly low performance is often observed
Where is the bottleneck?
- application
- operating system
- disks
- network adapters on either the sending or the receiving host
- network switches, routers
Experience of the NetLogger group
- 40% network, 40% application, 20% host problems
- application: 50% client, 50% server process problems
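As a small worked example of the NetLogger figures above, the sketch below combines the two bullets into overall shares. It assumes that the 50/50 client/server split applies within the 40% application share; the dictionary names are illustrative only.

```python
# Shares reported on the slide, in percent of observed problems.
top_level = {"network": 40, "application": 40, "host": 20}

# Assumed to be a breakdown of the "application" share only.
application_split = {"client": 50, "server": 50}

# Overall share of each application sub-category (integer percent).
overall = {
    f"application/{side}": top_level["application"] * pct // 100
    for side, pct in application_split.items()
}
overall["network"] = top_level["network"]
overall["host"] = top_level["host"]

for category, share in sorted(overall.items()):
    print(f"{category}: {share}%")
```

Under that assumption, client and server processes each account for 20% of all observed problems, the same share as host problems.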
5
What are the characteristics of a Grid system (cont.)
Dynamic environment
World-wide distributed environment with
- high latency
- frequent faults
- very heterogeneous resources
6
What may need to be monitored
Disk space, processor speed, network bandwidth,
CPU load, memory load, network load, network
communication time, number of parallel streams, stripes,
TCP/IP buffer size, and disk access time, which includes
the time to copy data to or from the local hard disk on the
server. [2][3]
Some of this information is relatively static,
while the rest is dynamic run-time information.
7
What are the characteristics of monitoring data
Run-time monitoring data goes "old" quickly
- Producers should be near the monitored entities.
- Data must be transported rapidly and efficiently from producer to
consumer.
- Information should be explicit, e.g. carry timestamps
Updates are frequent
Performance information is often stochastic
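The points above about data ageing quickly and carrying timestamps can be sketched as a small record type. This is a minimal illustration, not part of the original slides: the class name `Measurement`, its fields, and the 30-second default lifetime are all assumptions chosen for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measurement:
    """One monitoring sample, stamped when the producer creates it."""
    resource: str
    metric: str
    value: float
    timestamp: float = field(default_factory=time.time)
    max_age_s: float = 30.0  # assumed lifetime; depends on the metric

    def age(self, now=None):
        """Seconds elapsed since the sample was taken."""
        now = time.time() if now is None else now
        return now - self.timestamp

    def is_stale(self, now=None):
        """True once the sample has outlived its useful lifetime."""
        return self.age(now) > self.max_age_s


sample = Measurement("hostA", "cpu_load", 0.7, timestamp=100.0)
print(sample.is_stale(now=120.0))  # still within 30 s of the timestamp
print(sample.is_stale(now=131.0))  # older than 30 s
```

A consumer can then discard or down-weight stale samples instead of treating every value it receives as current.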
8
Related Work
Monitoring and Discovery Service (MDS)
Grid Monitoring Architecture (GMA)
Relational Grid Monitoring Architecture
(R-GMA)
Hawkeye
Globus Heartbeat Monitor (HBM)
Network Weather Service (NWS)
GridRM
9
MDS Architecture
10
GMA Architecture
11
R-GMA Architecture
12
Hawkeye Architecture
13
HBM Architecture
14
NWS Architecture
15
The Global Layer of GridRM
16
The Local GridRM Layer
17
Summary and Conclusion
A variety of monitoring systems exist
Each system has its own strengths and
weaknesses
Systems tend to use standard, open
components
The GGF advocates the GMA architecture
18
Summary and Conclusion (cont.)
Similarities in architecture
- At the lowest level, a sensor or other program
generates a piece of data.
- Some systems allow data to be aggregated from a set
of resources
- At the resource level, the data from several information
collectors is gathered into one component
- Directory component
- Decentralised hierarchical structure, which gives better
fault tolerance
Differences in the use of push or pull mechanisms
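The layering shared by these systems (sensor, resource-level collector, directory) can be sketched roughly as follows. None of the surveyed systems exposes this exact interface; `ResourceCollector`, `Directory`, and their methods are hypothetical names used only to show how the levels fit together.

```python
from typing import Callable, Dict

# Lowest level: a sensor is any callable that produces one value.
Sensor = Callable[[], float]


class ResourceCollector:
    """Resource level: gathers the readings of several sensors."""

    def __init__(self, name: str, sensors: Dict[str, Sensor]):
        self.name = name
        self.sensors = sensors

    def collect(self) -> Dict[str, float]:
        return {metric: read() for metric, read in self.sensors.items()}


class Directory:
    """Directory component: lets consumers find collectors by name."""

    def __init__(self):
        self._collectors: Dict[str, ResourceCollector] = {}

    def register(self, collector: ResourceCollector) -> None:
        self._collectors[collector.name] = collector

    def lookup(self, name: str) -> ResourceCollector:
        return self._collectors[name]

    def names(self):
        return sorted(self._collectors)


directory = Directory()
directory.register(
    ResourceCollector("hostA", {"cpu_load": lambda: 0.25, "disk_free_gb": lambda: 120.0})
)
print(directory.lookup("hostA").collect())
```

In this sketch the directory stores only references to collectors, not the data itself, so data still flows directly from the resource to the consumer.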
19
Project Proposal
Goal
Requirement
Architecture -- Pull Model
Specification
Implementation
Testing
Schedule
20
Goal
Realisation
Lightweight & Simple design
Reliability & Robustness
21
Architecture
What is the Pull model
The monitor sends requests to the service for
information. This implies repeated queries of resource
attributes over some time period at a specific frequency.
In a Push model, on the other hand, the service sends
out notifications to a subscribed sink.
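The difference between the two models can be sketched in a few lines. This is an illustration under assumptions, not the project's design: `Service`, `query`, `subscribe`, `set_load`, and `poll` are hypothetical names, and a single `cpu_load` attribute stands in for real resource attributes.

```python
import time


class Service:
    """A monitored service exposing one attribute, 'cpu_load'."""

    def __init__(self):
        self._load = 0.0
        self._sinks = []

    def query(self, attribute):
        """Pull: the monitor asks, the service answers."""
        if attribute != "cpu_load":
            raise KeyError(attribute)
        return self._load

    def subscribe(self, sink):
        """Push: register a sink to be notified of changes."""
        self._sinks.append(sink)

    def set_load(self, value):
        """Update the attribute and notify every subscribed sink."""
        self._load = value
        for sink in self._sinks:
            sink("cpu_load", value)


def poll(service, attribute, samples, interval_s=0.0):
    """Pull model: repeated queries at a fixed frequency."""
    readings = []
    for _ in range(samples):
        readings.append(service.query(attribute))
        time.sleep(interval_s)
    return readings


service = Service()
received = []
service.subscribe(lambda attribute, value: received.append((attribute, value)))
service.set_load(0.5)                  # push: one notification per change
print(poll(service, "cpu_load", 3))    # pull: one request per sample
print(received)
```

With pull the monitor controls when and how often data is collected; with push the service decides, and the monitor only reacts.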
22
Benefits of Pull
Less network traffic: collection is initiated only from the top
No time synchronisation problem: data is collected from
resources at the same time.
The server can determine the size of the file, select the
appropriate alternate server, and passively control the
bandwidth and storage space.
According to Globus, the "push" model "generates a large
amount of data and results in constant updates to the
MDS."
Standard LDAP databases are not designed to handle
frequent updates.
23
Benefits of Pull (cont.)
The Pull model is based on distributing intelligence to the
asset site, so that the asset becomes automated.
Using machine-to-machine communications with
connected sensors and autonomic computing, the asset
performs self-diagnostics, maintains and repairs itself, re-routes
energy flows, schedules non-routine maintenance and
reports any out-of-the-ordinary activity that poses a
security threat.
IBM calls this autonomic computing: machine-to-machine
communications take place to optimise the
performance of computing and network resources.
24
Problems of Pull
Must gather current measurements from all
resources.
If the real-time data volume is large, this may cause a
bottleneck.
May not be useful for fault detection -- heartbeat events
are valid only for a short time interval and should be
delivered within this time constraint.
May not be useful for dynamic sensor management.
The push model is the most efficient in terms of
bandwidth, as no requests are sent, only responses from
the service.
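The heartbeat point can be made concrete with a small timeout-based detector. This is a sketch under assumptions rather than how HBM or any other surveyed system works: `HeartbeatMonitor`, `beat`, and `suspected_failed` are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Flags resources whose last heartbeat is older than a timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}

    def beat(self, resource, now):
        """Record a heartbeat pushed by a resource at time `now`."""
        self._last_seen[resource] = now

    def suspected_failed(self, now):
        """Resources whose latest heartbeat is older than the timeout."""
        return sorted(
            resource
            for resource, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.beat("hostA", now=0.0)
monitor.beat("hostB", now=4.0)
print(monitor.suspected_failed(now=6.0))  # hostA has been silent for 6 s
```

Here the resources push their heartbeats. A pull-based monitor would have to query every resource at least once per timeout interval to reach the same conclusion, which is the cost the slide points at.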
25
Monitoring Grid Services
Thanks
26