MonALISA - Internet2


An Agent Based, Dynamic Service System to Monitor,
Control and Optimize Distributed Systems
February 2006
Iosif Legrand
California Institute of Technology
The MonALISA Framework

MonALISA is a Dynamic, Distributed Service System capable of
collecting any type of information from different systems, analyzing it
in near real time, and providing support for automated control
decisions and global optimization of workflows in complex grid
systems.

The MonALISA system is designed as an ensemble of autonomous,
multi-threaded, self-describing agent-based subsystems which are
registered as dynamic services and are able to collaborate and
cooperate in performing a wide range of monitoring tasks. These
agents can analyze and process the information in a distributed
way and provide optimization decisions in large-scale
distributed applications.
MonALISA is a Dynamic, Distributed
Service Architecture

The framework is based on a hierarchical structure of loosely
coupled agents acting as distributed services, which are
independent and autonomous entities able to discover one another
and to cooperate using a dynamic set of proxies or self-describing
protocols.

An agent-based architecture provides the ability to invest the
system with increasing degrees of intelligence, to reduce
complexity, and to make global systems manageable in real time. For
an effective use of distributed resources, these services provide
adaptability and self-organization.
MonALISA service & Data Handling
[Diagram: data flow around a MonALISA Service. Labels: Data Cache Service & DB (Postgres, MySQL); Data Stores; WEB Service (WSDL, SOAP) for Web clients; Java clients and other services; Lookup Services; communications via the ML Proxy; Predicates & Agents; Applications; Configuration Control (SSL); user-defined loadable modules to write/send data.]
The MonALISA Discovery System & Services
Fully Distributed System with no Single Point of Failure
[Diagram: layers of the system, top to bottom]
Clients, HL services, repositories: Global Services or Clients
Proxies: Dynamic load balancing; Scalability & Replication; Security AAA for Clients
MonALISA services (AGENTS): Distributed System for gathering and Analyzing Information
Network of JINI-LUSs (Secure & Public): Distributed Dynamic Discovery, based on a lease Mechanism and REN
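
The lease-based discovery named on this slide can be illustrated with a minimal sketch. This is not the Jini lookup service API: the class `LeaseRegistry` and its methods are hypothetical names chosen for illustration. The idea shown is only that a service registers for a limited time and must renew before expiry, so a crashed service disappears from the registry without explicit deregistration.

```python
import time


class LeaseRegistry:
    """Toy lookup registry: entries vanish unless their lease is renewed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock, handy for testing
        self._expiry = {}            # service name -> absolute expiry time

    def register(self, name, lease_seconds):
        """Register (or renew) a service for lease_seconds from now."""
        self._expiry[name] = self._clock() + lease_seconds

    renew = register                 # renewing is re-registering with a new lease

    def lookup(self):
        """Return the names of services whose lease has not expired."""
        now = self._clock()
        # Drop expired entries first so stale services are never returned.
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

A service would call `renew` periodically at an interval shorter than its lease; the lease length trades detection speed for renewal traffic.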
Monitoring the Internet2 Backbone Network
- Test for a Land Speed Record
- ~7 Gb/s in a single TCP stream from Geneva to Caltech
The UltraLight Network
[Figure: BNL ESnet IN/OUT]
Monitoring Network Topology
Latency, Routers
[Figure panels: NETWORKS, ROUTERS, AS]
Monitoring The GLORIAD Ring
Monitoring Grid sites, Running Jobs,
Network Traffic, and Connectivity
[Figure panels: JOBS, TOPOLOGY, ACCOUNTING]
Monitoring OSG: Resources, Jobs & Accounting
[Figure panels: Running Jobs, Accounting]
42 sites
~4,000 nodes (10,000 CPUs)
Thousands of jobs
60,000 parameters
FTP Data Transfer between GRID sites
Total FTP Traffic per VO
Bandwidth Challenge at SC2005
151 Gb/s
~500 TB total in 4 h
End User / Client Agent
LISA - Localhost Information Service Agent
- Authorization
- Service discovery
- Local detection of the hardware and software configuration
- Complete end-system monitoring: per-process load, I/O and network throughput, etc.
- End-to-end performance measurements
- Will act as an active listener for all events related to the requests generated by its local applications
Host Monitoring at SC2005
- Many “network” problems are actually end-host problems: misconfigured or underpowered end-systems.
- The LISA application was designed to monitor the end host and its view of the network.
- For SC|05 we used LISA to gather the relevant host details related to network performance.
- System information, TCP configuration and network device setup were gathered and made accessible from one site.
- Future plans are to coordinate this with LISA and deploy this as part of OSG. The Tier-2 centers are a primary target.
[Diagram labels: Network Device Information; TCP Settings; Host/System Information]
Available Bandwidth Measurements
Embedded Pathload module.
Coordination Service for Available Bandwidth Measurements
- Enforces measurement fairness
- Avoids multiple probes on shared network segments
- Dynamic configuration of measurement timing
- Logs events
- Provides service redundancy by using a master-slave model
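
The first two points (fairness and no concurrent probes on a shared segment) can be sketched as a small scheduler. This is an illustration under stated assumptions, not the MonALISA coordination service: `ProbeCoordinator`, its methods, and the segment names are hypothetical, fairness is approximated by first-come-first-served ordering, and the master-slave redundancy is not modeled.

```python
from collections import deque


class ProbeCoordinator:
    """Grants measurement slots so no network segment carries two probes at once."""

    def __init__(self):
        self._busy = set()        # segments used by running probes
        self._running = {}        # probe id -> frozenset of its segments
        self._waiting = deque()   # FIFO queue of (probe id, segments)
        self.log = []             # event log, as (event, probe id) tuples

    def request(self, probe_id, segments):
        """Queue a probe over the given segments; return ids started now."""
        self._waiting.append((probe_id, frozenset(segments)))
        self.log.append(("queued", probe_id))
        return self._schedule()

    def finish(self, probe_id):
        """Release a finished probe's segments; return ids started now."""
        self._busy -= self._running.pop(probe_id)
        self.log.append(("finished", probe_id))
        return self._schedule()

    def _schedule(self):
        started = []
        still_waiting = deque()
        blocked = set()   # segments wanted by earlier waiters
        for probe_id, segs in self._waiting:
            # Start only if free of running probes AND of earlier waiters,
            # so a later request cannot overtake one that shares a segment.
            if segs.isdisjoint(self._busy) and segs.isdisjoint(blocked):
                self._busy |= segs
                self._running[probe_id] = segs
                self.log.append(("started", probe_id))
                started.append(probe_id)
            else:
                blocked |= segs
                still_waiting.append((probe_id, segs))
        self._waiting = still_waiting
        return started
```

Probes on disjoint paths still run in parallel; only requests that share a segment are serialized, in arrival order.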
Monitoring the Execution of Jobs
and the Time Evolution
[Figure panels: SPLIT JOBS; LIFELINES for JOBS]
[Diagram labels: Submit a Job; DAG; Job; Job1, Job2, Job3; Job 31, Job 32]
ApMon – Application Monitoring
Library of APIs (C, C++, Java, Perl, Python) that can be used to send any
information to MonALISA services
Flexibility, dynamic configuration, high communication performance
[Diagram: each APPLICATION embeds ApMon, which sends monitoring data over UDP/XDR to MonALISA Services. The ApMon configuration is generated automatically by a servlet / CGI script (Config Servlet) and supports dynamic reloading.]
Examples of data sent:
- App. Monitoring: Time; IP; procID; parameter1: value; parameter2: value
- App. Monitoring: Mbps_out: 0.52; Status: reading; MB_inout: 562.4
- System Monitoring (automated): load1: 0.24; processes: 97; pages_in: 83
- Accounting information
[Plot: MonALISA CPU Usage (%), 0 to 70, versus Messages per second, 0 to 6000. Annotation: No Lost Packages]
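
To make the UDP/XDR transport concrete, here is a minimal sketch of packing named parameters with XDR primitives and sending them in one datagram. It does not use the ApMon library, and the record layout (cluster name, node name, parameter count, then name/value pairs) is an assumption made for illustration, not the actual ApMon wire format. The function names, cluster and node names are hypothetical; the parameter names are taken from the slide.

```python
import socket
import struct


def xdr_string(text):
    """XDR string: 4-byte big-endian length, bytes, zero-padded to 4 bytes."""
    raw = text.encode("utf-8")
    padding = (4 - len(raw) % 4) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding


def xdr_double(value):
    """XDR double: 8-byte big-endian IEEE 754."""
    return struct.pack(">d", value)


def encode_datagram(cluster, node, params):
    """Pack a hypothetical (cluster, node, {name: float}) record."""
    parts = [xdr_string(cluster), xdr_string(node),
             struct.pack(">I", len(params))]
    for name, value in params.items():
        parts.append(xdr_string(name))
        parts.append(xdr_double(float(value)))
    return b"".join(parts)


def send_datagram(payload, host, port):
    """Fire-and-forget UDP send; no delivery guarantee, no reply expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Values from the "System Monitoring" example on the slide.
payload = encode_datagram("TestCluster", "node1",
                          {"load1": 0.24, "processes": 97, "pages_in": 83})
```

Sending would then be a single call such as `send_datagram(payload, host, port)` with the address of a listening service. Because UDP is connectionless, the application never blocks on the monitoring system, which is consistent with the high message rates the slide reports.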
MonALISA agents to create on demand
an optical path or tree
[Diagram: three Optical Switches, each controlled and monitored by an ML Agent running in a MonALISA service. Numbered labels 1 to 4 include: Control and Monitor the switch; Discovery & Secure Connection; ML proxy services used in Agent Communication. An end host runs an ML Daemon.]
>ml_path IP1 IP4 “copy file IP4”
Time to create a path on demand: <1 s, independent of the location and the number of connections
Monitoring and Controlling Optical Planes
[Figure panels: Controlling; Port power monitoring]
Monitoring Optical Switches
Agents to Create on Demand an Optical Path
Communities using MonALISA
Major Communities
- OSG
- CMS
- ALICE
- D0
- STAR
- VRVS
- LGC RUSSIA
- SE Europe GRID
- APAC Grid
- UNAM Grid
- ABILENE
- ULTRALIGHT
- GLORIAD
- LHC Net
- RoEduNET
MonALISA: Running 24 X 7 at 250 Sites
- Collecting 250,000 parameters in near real-time
- Update rate of 25,000 parameter updates per second
- Monitoring 12,000 computers
- > 100 WAN Links
- Thousands of Grid jobs running concurrently
Demonstrated at:
- SC2003
- Telecom World 2003
- WSIS 2003
- SC 2004
- I2 2005
- TERENA 2005
- IGrid 2005
- SC 2005
[Additional labels on the slide: ABILENE, CMS-DC04, GRID3, VRVS, ALICE]
The MonALISA Architecture Provides:
- Distributed registration and discovery for services and applications.
- Monitoring of all aspects of complex systems:
  - System information for computer nodes and clusters
  - Network information: WAN and LAN
  - Performance of applications, jobs or services
  - End-user systems and their performance
  - Video streaming
- Interaction with any other services to provide, in near real time, customized information based on monitoring data.
- Secure, remote administration for services and applications.
- Agents to supervise applications, trigger alarms, restart or reconfigure them, and notify other services when certain conditions are detected.
- A framework for developing higher-level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.
- Graphical user interfaces to visualize complex information.
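
The supervising-agent idea (watch a condition, then trigger an alarm or restart) can be sketched as a simple threshold rule. This is a generic illustration, not MonALISA's agent API: `ThresholdAgent`, its parameters, and the requirement of several consecutive samples are assumptions introduced here.

```python
class ThresholdAgent:
    """Fires an action when a metric stays above a limit for several samples."""

    def __init__(self, limit, consecutive, action):
        self.limit = limit
        self.consecutive = consecutive   # samples required before acting
        self.action = action             # callable invoked with the last value
        self._streak = 0

    def observe(self, value):
        """Feed one monitoring sample; return True if the action fired."""
        if value > self.limit:
            self._streak += 1
        else:
            self._streak = 0             # condition cleared, start over
        if self._streak >= self.consecutive:
            self._streak = 0             # re-arm after acting
            self.action(value)
            return True
        return False
```

The `action` callable could raise an alarm, restart an application, or notify another service; requiring consecutive samples avoids reacting to a single transient spike.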