Monitoring Grid Services - Informatics Homepages Server

Monitoring Grid Services
Yin Chen
[email protected]
June 2003
1
Contents
Issues of Monitoring
Project Proposal
2
Issues of Monitoring
What are the goals of Grid monitoring
What are the characteristics of a Grid system
What may need to be monitored
What are the characteristics of monitoring data
Related Work
3
What are the goals of Grid monitoring
Propagate errors to users/management
Performance monitoring to
- tune the application
- use the Grid more efficiently
The question is
- not how to measure resources
- but how to deliver information to end-users and the system/Grid
4
What are the characteristics of a Grid system
Complex distributed system => unexpectedly low performance is often observed
Where is the bottleneck?
- application
- operating system
- disks
- network adapters on either the sending or the receiving host
- network switches, routers
Experience of the NetLogger group
- 40% network, 40% application, 20% host problems
- application: 50% client, 50% server process problems
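Reading the two figures together, and assuming the 50/50 client/server split applies to the application share of all problems (an interpretation, not something the slide states), the overall shares work out as:

```latex
\begin{align*}
\text{client process problems} &= 0.40 \times 0.50 = 0.20 \\
\text{server process problems} &= 0.40 \times 0.50 = 0.20 \\
\text{network} + \text{client} + \text{server} + \text{host} &= 0.40 + 0.20 + 0.20 + 0.20 = 1.00
\end{align*}
```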
5
What are the characteristics of a Grid system (cont.)
Dynamic environment
World-wide distributed environment with
- high latency
- frequent faults
- very heterogeneous resources
6
What may need to be monitored
- Disk space, processor speed, network bandwidth, CPU load, memory load, network load, network communication time, number of parallel streams, stripes, TCP/IP buffer size, and disk access time, including the time to copy data to or from the local hard disk on the server. [2][3]
- Some of this information is relatively static, while the rest is dynamic run-time information.
7
What are the characteristics of monitoring data
Run-time monitoring data goes "old" quickly
- Producers should be near the entities they monitor.
- Data must be transported rapidly and efficiently from producer to consumer.
- The age of the information should be explicit, e.g. through timestamps.
Updates are frequent
Performance information is often stochastic
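One way to make the age of run-time data explicit is to attach a producer-side timestamp to every measurement and let the consumer reject stale values. The sketch below is illustrative only: the `Measurement` type, the `is_fresh` helper, and the 30-second threshold are assumptions, not part of any of the systems discussed here.

```python
from dataclasses import dataclass
from typing import Optional
import time


@dataclass(frozen=True)
class Measurement:
    """A single monitoring sample (hypothetical type for illustration)."""
    resource: str
    metric: str
    value: float
    timestamp: float  # seconds since the epoch, set by the producer


def is_fresh(m: Measurement, max_age_s: float, now: Optional[float] = None) -> bool:
    """Return True if the measurement is no older than max_age_s seconds."""
    now = time.time() if now is None else now
    return (now - m.timestamp) <= max_age_s


# Fixed clock values are used so the example is deterministic.
sample = Measurement("node01", "cpu_load", 0.73, timestamp=1000.0)
print(is_fresh(sample, max_age_s=30.0, now=1020.0))  # True: 20 s old
print(is_fresh(sample, max_age_s=30.0, now=1045.0))  # False: 45 s old
```

The threshold would in practice depend on how quickly each metric changes; static information such as disk size can tolerate a much larger `max_age_s` than CPU load.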
8
Related Work
Monitoring and Discovery Service (MDS)
Grid Monitoring Architecture (GMA)
Relational Grid Monitoring Architecture
(R-GMA)
Hawkeye
Globus Heartbeat Monitor (HBM)
Network Weather Service (NWS)
GridRM
9
MDS Architecture
10
GMA Architecture
11
R-GMA Architecture
12
Hawkeye Architecture
13
HBM Architecture
14
NWS Architecture
15
The Global Layer of GridRM
16
The Local GridRM Layer
17
Summary and Conclusion
A variety of monitoring systems exist
Each system has its own strengths and weaknesses
They tend to use standard, open components
The GGF advocates the GMA architecture
18
Summary and Conclusion (cont.)
Similarities in architecture
- At the lowest level, a sensor or other program generates a piece of data.
- Some systems allow data to be aggregated from a set of resources.
- At the resource level, the data from several information collectors is gathered into one component.
- A directory component
- A decentralised hierarchical structure, which gives better fault tolerance
- They differ in using a push or a pull mechanism
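The common layering (sensors at the bottom, a per-resource collector, and a directory for lookup) can be sketched as follows. This is a minimal illustration under assumed names (`Collector`, `Directory`, and the lambda sensors are all hypothetical); it does not reproduce the API of MDS, GMA, or any other system listed above.

```python
from typing import Callable, Dict, List

# A sensor is any callable that returns one numeric reading.
Sensor = Callable[[], float]


class Collector:
    """Gathers the readings of several sensors on one resource."""

    def __init__(self, resource: str) -> None:
        self.resource = resource
        self._sensors: Dict[str, Sensor] = {}

    def add_sensor(self, metric: str, sensor: Sensor) -> None:
        self._sensors[metric] = sensor

    def collect(self) -> Dict[str, float]:
        # Invoke every sensor and return the readings keyed by metric name.
        return {metric: sensor() for metric, sensor in self._sensors.items()}


class Directory:
    """Lets consumers find the collector for a given resource."""

    def __init__(self) -> None:
        self._collectors: Dict[str, Collector] = {}

    def register(self, collector: Collector) -> None:
        self._collectors[collector.resource] = collector

    def lookup(self, resource: str) -> Collector:
        return self._collectors[resource]

    def resources(self) -> List[str]:
        return sorted(self._collectors)


directory = Directory()
node = Collector("node01")
node.add_sensor("cpu_load", lambda: 0.5)        # stand-in for a real probe
node.add_sensor("free_disk_gb", lambda: 120.0)  # stand-in for a real probe
directory.register(node)

print(directory.resources())
print(directory.lookup("node01").collect())
```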
19
Project Proposal
Goal
Requirement
Architecture -- Pull Model
Specification
Implementation
Testing
Schedule
20
Goal
Realisation
Lightweight & Simple design
Reliability & Robustness
21
Architecture
What is the Pull model
- The monitor sends requests to the service for information. This implies repeated queries of resource attributes over some time period at a specific frequency.
- In a Push model, on the other hand, the service sends out notifications to a subscribed sink.
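The difference between the two models can be sketched in a few lines. Everything here is an assumption made for illustration: the `Service` class, its `query` and `subscribe` methods, and the `pull` helper are hypothetical and stand in for whatever interface a real Grid service exposes.

```python
import time
from typing import Callable, List


class Service:
    """A monitored service exposing both access styles (hypothetical)."""

    def __init__(self) -> None:
        self._load = 0.0
        self._sinks: List[Callable[[float], None]] = []

    def query(self) -> float:
        """Pull: the monitor asks, the service answers with the current value."""
        return self._load

    def subscribe(self, sink: Callable[[float], None]) -> None:
        """Push: register a sink to be notified on every change."""
        self._sinks.append(sink)

    def set_load(self, value: float) -> None:
        self._load = value
        for sink in self._sinks:
            sink(value)  # notification is sent without any request


def pull(service: Service, n_polls: int, interval_s: float) -> List[float]:
    """Query the service n_polls times at a fixed interval."""
    samples = []
    for i in range(n_polls):
        samples.append(service.query())
        if i < n_polls - 1:
            time.sleep(interval_s)
    return samples


service = Service()
pushed: List[float] = []
service.subscribe(pushed.append)

service.set_load(0.2)
service.set_load(0.9)
pulled = pull(service, n_polls=3, interval_s=0.0)

print(pushed)  # [0.2, 0.9]: every change was delivered
print(pulled)  # [0.9, 0.9, 0.9]: only the value current at poll time is seen
```

Note that the pulling monitor never observes the intermediate value 0.2, because it changed before the first poll; the subscribed sink receives every update.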
22
Benefits of Pull
- Less network traffic: collections are initiated only from the top.
- No time synchronisation problem: data is collected from all resources at the same time.
- The server can determine the size of the file, select the appropriate alternate server, and passively control the bandwidth and storage space.
- According to Globus, the "push" model "generates a large amount of data and results in constant updates to the MDS".
- Standard LDAP databases are not designed to handle frequent updates.
23
Benefits of Pull (cont.)
- The Pull model is based on distributing intelligence to the asset site, so that it becomes automated.
- Using machine-to-machine communications with connected sensors and autonomic computing, the asset performs self-diagnostics, self-maintenance and repair, re-routes energy flows, schedules non-routine maintenance, and reports on any out-of-the-ordinary activity that poses a security threat.
- IBM calls this autonomic computing: machine-to-machine communications take place to optimise the performance of computing and network resources.
24
Problems of Pull
- Must gather current measurements from all resources.
- A large real-time data volume may cause a bottleneck.
- May not be useful for fault detection: heartbeat events are valid only for a short time interval and should be delivered within that interval.
- May not be useful for dynamic sensor management.
- The push model is the most efficient in terms of bandwidth, since no requests are sent, only responses from the service.
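The bandwidth trade-off can be made concrete with a rough message count. This is a simplified model under stated assumptions, not a measurement: each poll is assumed to cost one request plus one response, each push notification one message, and all messages are treated as equal in size. The function names are hypothetical.

```python
def pull_messages(n_resources: int, n_polls: int) -> int:
    """Each poll of each resource costs one request and one response."""
    return 2 * n_resources * n_polls


def push_messages(n_resources: int, n_updates_per_resource: int) -> int:
    """Each update costs one notification; no request is sent."""
    return n_resources * n_updates_per_resource


# 10 resources observed for one minute.
print(pull_messages(10, 60))  # 1200: polled once per second
print(push_messages(10, 60))  # 600: same update rate, half the messages
print(push_messages(10, 5))   # 50: values that rarely change cost even less
```

Under this model push only loses when a resource changes more than twice as often as the monitor would have polled it, which is consistent with the earlier concern that push can generate a large amount of data when updates are constant.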
25
Monitoring Grid Services
Thanks
26