On-line Farm Monitor and Control


LHCb PC Farm Monitoring
and Control System
Domenico Galli, Bologna
JCOP Project Team Meeting
Genève, 28 April 2005
Outline

Overview, aim of the system, software framework,
software components and their deployment, releases.

Guidelines followed in software development.

Main Components:


Task Manager

Monitoring System

IPMI Power Manager
Utilities:

Logger

Process Controller
LHCb PC Farm Monitoring and Control System. 2
Domenico Galli
Overview



LHCb PC Farm Monitoring and Control System has
been designed mainly for the LHCb L1/HLT event
filter farm.
The system has been developed for Linux PCs but will
be ported to MS Windows in order to be used also
on the PCs involved in LHCb detector monitoring and control.
It uses DIM (Distributed Information Management
System) as communication layer and is accessible
both through a command line interface and through a
PVSS graphical interface.
Aim of the System

Monitoring



Display of relevant parameters concerning the
status of the farm.
Induce the transition of a state machine to an
alarm state when the monitored parameters
indicate error/warning conditions.
Control

Action execution (system reboot, process
start/stop, etc.) triggered by manual command or
by a state machine transition.
PVSS (Prozessvisualisierungs- und
Steuerungs-System)


To build an interface of the farm monitor system
coherent with the monitor of the detector hardware,
we make use of PVSS SCADA (Supervisory Control
and Data Acquisition) tool.
PVSS provides:




runtime DB, automatic archiving of data to permanent storage;
alarm generation;
easy realization of graphical panels;
various protocols to communicate via network.
Sensors and Actuators



PVSS needs to be interfaced with the farm nodes:

to receive monitor data;

to issue commands to the nodes.
On each node a few light-weight processes (light-weight
servers) run:

monitor sensors;

command actuators.
PVSS-to-nodes interface is achieved using DIM lightweight network communication layer.
DIM (Distributed Information
Management System)

DIM network communication layer is already
integrated with PVSS:

It is light-weight and efficient.

It allows bi-directional communication.

It uses a name server for services/commands
publication and subscription.
[Figure: PVSS, sensor and actuator on a farm node.]
Monitoring and Control System
Components




Light-weight servers (to be installed on each farm node):

Task Manager Server

9 Monitor Servers

Logger Server
Control software to be installed on each control PC:

Power Manager (IPMI) Server

Process Controller
Command line clients (can run on any node on the network):

3 Task Manager Clients

2 Power Manager Clients

1 Logger Client
PVSS clients (can run on any node on the network):

Task Manager & logger Panel.

9 Monitor Panels.

Power Manager Panel.
Monitoring and Control System
Component Deployment
[Figure: deployment diagram. Labels: Sub-Farm 1 Control PC, Sub-Farm 2 Control PC, Global Control PC, Monitor Console PC, PVSS Remote UI, Sub-Farm nodes (DIM server, sensor, actuator, IPMI BMC firmware), IPMI DIM Server, DIM-DNS, PVSS-DIM, PVSS-FSM, PVSS EVM, PVSS DBM, PVSS ARCH, PVSS CTRL, PVSS Dist.]
Releases

0.1 – 22 November 2004

0.2 – 13 January 2005


New Task Manager features

Process Controller
1.0 – before June 2005

IPMI Power Manager.

New PVSS panels.

PVSS panels to set the threshold for Finite State Machine.

PVSS trend plots and archiving.
Sensor Access to System Data



In Linux kernel 2.4 procfs is the only file-system-like interface to internal data structures in the
kernel (the other interface is ioctl()/sysctl()).
In Linux kernel 2.6 sysfs has been added to show
all of the devices (virtual and real) and their interconnectedness within the system.
In forthcoming kernel versions:


procfs: will contain only process statistics.
sysfs: will contain device statistics (network interfaces,
TCP/IP stack, temperatures and fan speed, SCSI interfaces,
etc.).
Sensor Access to System Data (II)



At present, in kernel 2.6:

network interfaces on both procfs and sysfs;

temperatures and fan speed on sysfs;

TCP/IP stack, CPU states, memory, on procfs.
Monitor sensors at present use procfs (excluding
temperature sensor which uses sysfs in the version
for the kernel 2.6).
In the future probably most of the sensors (all but
the process sensor) should be modified to access sysfs.
Sensor Access to System Data (III)



In MS Windows the interface to internal data
structures in the kernel is different.
A procfs interface is provided by cygwin, but is
rather poor.
We are investigating the use of the WMI API (Windows
Management Instrumentation, .NET platform) to access
internal data structures in the kernel for monitoring.
Guidelines for Light-Weight Server
Development

Guidelines followed in sensors and actuators
development:


Functions written in plain C with particular control of memory
allocation (e.g., if possible, memory is allocated once and for
all during sensor initialization).
Low-level access (not stream access) to procfs and sysfs and
one-shot data read.


Each time an (even partial) read operation is performed on
procfs/sysfs, the kernel takes time to produce the entire set
of data provided in the file.
When possible for complex tasks use maintained libraries
(like libprocps) to cope with changes in kernel version.
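The one-shot, low-level read described in these guidelines can be sketched as follows. This is only a Python illustration (the servers themselves are written in C); the function name `read_once`, the buffer size and the default path are assumptions made for the example, and a single read is assumed to return the whole file, which holds for small files.

```python
import os

# Buffer allocated once, at "sensor initialization", and reused for every read.
# The size is an assumed upper bound on the file size.
_BUF = bytearray(64 * 1024)


def read_once(path="/proc/stat"):
    """Read a procfs/sysfs file with a single low-level read call.

    No buffered stream is used, so the kernel is asked to generate
    the file content only once per sample.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        n = os.readv(fd, [_BUF])  # one system call into the preallocated buffer
    finally:
        os.close(fd)
    return bytes(_BUF[:n])
```

A buffered stream could instead issue several small reads, each of which would make the kernel regenerate the data.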
Task Manager

It is a tool to start, list and stop processes on every
farm node from a central console.

It uses TCP as transport protocol, through the DIM
network communication layer.

To track processes (e.g. to list or to stop them) it sets
an additional environment variable (a descriptive
string, named UTGID, User Assigned Thread Group
Identifier) on every started process.

No more than one process can be started with the same
UTGID.

This way, the stamp attached to the processes survives an
accidental Task Manager crash.

UTGID can be defined by the user or can be automatically
generated as <binary executable image>_<instance #>
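A minimal sketch of the UTGID idea, in Python for illustration only. The helper names `auto_utgid` and `utgid_of` are hypothetical, and starting the instance counter at 0 is an assumption (the slides do not say where numbering starts). On Linux the initial environment of a process can be read from /proc/<pid>/environ as NUL-separated entries, which is why the stamp does not depend on the Task Manager staying alive.

```python
import os


def auto_utgid(executable, existing):
    """Return '<binary executable image>_<instance #>' not present in `existing`."""
    image = os.path.basename(executable)
    instance = 0  # assumed starting value
    while f"{image}_{instance}" in existing:
        instance += 1
    return f"{image}_{instance}"


def utgid_of(environ_bytes):
    """Extract UTGID from the NUL-separated content of /proc/<pid>/environ."""
    for entry in environ_bytes.split(b"\0"):
        if entry.startswith(b"UTGID="):
            return entry[len(b"UTGID="):].decode()
    return None  # process not started by the Task Manager
```

For example, `auto_utgid("/usr/bin/sleep", {"sleep_0"})` yields `"sleep_1"`, so two instances of the same binary never share a UTGID.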
Task Manager Components



tmSrv: Task Manager Server (to be executed on each
farm node).
tmStart, tmLs, tmKill, tmStop: Task Manager
command-line clients (can be executed on any PC on
the network).
tableViewUTGID.pnl, startPanel.pnl,
killPanel.pnl, stopPanel.pnl: PVSS GUI clients
(can be executed on any PC on the network with a
PVSS remote UI).
Task Manager Features





The started process can have a clean environment
(except UTGID) or can inherit the task manager
environment.
An arbitrary number of new environment variables can
be added to started processes.
The stdout and stderr of started processes can be
thrown to /dev/null or can be redirected to the
logger.
It can start processes as daemons (process group
leader, i.e. new session ID, umask reset, no controlling
tty, ignored SIGCLD).
It can set the scheduler of the started processes
(time sharing, fifo or round-robin).
Task Manager Features (II)




It can set the nice level (for the evaluation of the
dynamic priority) to the processes started with the
time sharing scheduler.
It can set the static (real-time) priority to processes
started with fifo and round-robin scheduler.
It can set the username of the started processes.
It is immediately signaled about the termination of
a started process (through the SIGCHLD signal) and:


It logs either the exit code or the number of the
signal which caused the process to terminate;
It immediately refreshes the list of started processes
(published using DIM).
Task Manager Features (III)


The kill DIM CMD can send a chosen signal to a single
process or to all the processes whose UTGID
matches a certain POSIX.2 wildcard pattern.
The stop DIM CMD sends a chosen signal to a process
and returns immediately but it also triggers off the
deferred sending of a SIGKILL signal to the same
process (in a different thread).

This way the process is given the chance to exit gracefully
on receiving a SIGTERM but, if it fails to do so, it is killed
abruptly by a SIGKILL after a chosen delay.
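The stop behaviour (signal now, return immediately, deferred SIGKILL from another thread) can be sketched as below. This is a Python illustration under stated assumptions, not the Task Manager code: the function name `stop` and the default delay are hypothetical, and the sketch ignores the possibility that the PID is reused by another process during the delay.

```python
import os
import signal
import threading


def stop(pid, sig=signal.SIGTERM, delay=2.0):
    """Send `sig` now, return at once, and schedule a SIGKILL after `delay` seconds."""
    os.kill(pid, sig)

    def force_kill():
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process already exited gracefully

    timer = threading.Timer(delay, force_kill)  # runs in a different thread
    timer.daemon = True
    timer.start()
    return timer
```

The caller is not blocked for `delay` seconds, because only the timer thread waits.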
Task Manager Command Line Interface

tmStart [-m hostname_pattern][-c][-D NAME=value...][-d]
[-s scheduler][-p nice_level][-r rt_priority]
[-n user_name][-u utgid][-o][-e][-w wd] path [arg...]

Starts a new process on one or more farm PCs.

tmLs [-m hostname_pattern][utgid_pattern]

Lists processes which have UTGID set.

tmKill [-m hostname_pattern][-s sig] utgid_pattern

Sends a signal to process(es).

tmStop [-m hostname_pattern][-s sig][-d delay] utgid_pattern

Sends a signal to process(es).
If the process(es) is not dead after delay seconds, sends a SIGKILL signal,
without blocking the client.

Recognizes POSIX.2 wildcard patterns (*, ?, character classes [027], ranges
[3-7], complementation [!027] or [!3-7]) in hostname and UTGID.
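The wildcard syntax listed above is the one implemented by the POSIX fnmatch() function in C and by the `fnmatch` module in Python. A short illustration follows; the host names are made up for the example and the helper `select` is hypothetical.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names matching a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


nodes = ["farm0", "farm2", "farm5", "farm7", "ctrl1"]  # hypothetical host names
print(select(nodes, "farm[027]"))    # character class
print(select(nodes, "farm[3-7]"))    # range
print(select(nodes, "farm[!027]"))   # complementation
print(select(nodes, "*1"))           # any prefix ending in 1
```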
Task Manager PVSS interface
Task Manager PVSS interface (II)
Task Manager PVSS interface (III)
Task Manager PVSS interface (IV)
Task Manager PVSS interface (V)
Monitor Sensors

9 light-weight monitor sensors for nodes developed:

Temperatures and fans speeds;

CPU info (CPU number, brand and model, cache size, clock).

CPU states (user, system, nice, idle, iowait, irq, softirq);




Hardware interrupt rates (separately per CPU and per irq
source);
Memory usage;
Process status (including scheduling class and real time
priority);
Network Interface Card counters’ rates and error
fractions;

Network Interface Interrupt Coalescence

TCP/IP stack rates and error fraction.
1 - Temperature and Fan Speed Sensor

It collects temperature of the sensors integrated on
the motherboard and fan speeds.


Uses lm_sensors software.
Needs bus drivers (for ISA or I2C/SMBus) and sensor chip
drivers.


Integrated in Linux kernel tree since kernel 2.6.
This server has been developed in 2 versions: one for
Linux kernel 2.4, which gets data from procfs, and the
other for Linux kernel 2.6, which gets data from
sysfs.

E.g., on sysfs:

/sys/devices/platform/i2c-0/0-0290/temp1_input

/sys/devices/platform/i2c-0/0-0290/fan1_input
1 - Temperature and Fan Speed Sensor
(II)


Hardware compatibility issues:

Most bus drivers have been ported to kernel 2.6;

However, only 43% of the chip drivers have been ported.
The IPMI alternative:



IPMI (Intelligent Platform Management Interface) v1.5 can
get the same information in a more portable way (it is OS-independent).
The configuration of the IPMI environment monitoring
system is the responsibility of the hardware vendor (more
uniformity and reliability of data is expected).
Probably, in new software releases, temperatures and fan
speeds will be collected using the IPMI LAN interface.
1 – Temperature and Fan Speed Sensor:
PVSS Interface
Node hostname
1 – Temperature and Fan Speed Sensor:
PVSS Interface (II)
2 – CPU Information Sensor

This server provides static information
about the CPU(s), e.g.:





The CPU brand and model identifier string;
The CPU family, model and sub-version (revision)
identifier;
The clock frequency of the CPU and the CPU
cache size.
The number of hyper-threading cores in that
physical CPU.
The CPU computational power in bogomips.
2 – CPU information Sensor: PVSS
Interface
right-click
3 - CPU States Sensor


It collects, and evaluates as a percentage, both the
aggregate values and the specific per-CPU values of
the fraction of time spent by the CPUs performing
different kinds of work:

user: normal processes executing in user mode.

nice: niced processes executing in user mode.

system: processes executing in kernel mode.

idle: not working.

iowait: waiting for I/O to complete.

irq: servicing interrupts (only in kernel ≥ 2.6).

softirq: servicing softirqs (only in kernel ≥ 2.6).
Moreover, it collects the global context switch rate
(useful to check the operation of process scheduling).
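The percentages can be obtained from two successive samples of the cumulative per-state counters. The sketch below is a Python illustration, assuming the counters come from the "cpu" lines of /proc/stat, whose first seven fields are user, nice, system, idle, iowait, irq and softirq; the function names are hypothetical.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Parse one 'cpu...' line of /proc/stat into (label, {state: ticks})."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return parts[0], dict(zip(FIELDS, values))


def percentages(before, after):
    """Percentage of time spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in before}
    total = sum(delta.values())
    if total == 0:
        return {k: 0.0 for k in delta}
    return {k: 100.0 * v / total for k, v in delta.items()}
```

The same two functions apply to the aggregate line ("cpu") and to the per-CPU lines ("cpu0", "cpu1", ...).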
3 - CPU States Sensor: PVSS Interface
3 - CPU States Sensor: PVSS Interface
4 - Hardware Interrupt Sensor

It collects the interrupt rates issued by
hardware device drivers.



Data are partitioned per-CPU and per-driver
(timer/PIC/local APIC, rtc, eth0, eth1, etc.).
Average values and maximum values (since DIM
server startup) are evaluated.
Useful to control IRQ-to-CPU affinity of the
network interfaces.
4 - Hardware Interrupt Sensor: PVSS
Interface
kernel 2.6
5 - Memory Usage Sensors

It collects memory usage statistics.


Available quantities depend on kernel version.
Main collected quantities (more details):

Total/Low/High Memory occupation.

Disk cache.

Virtual memory management.

Swapping and paging.

Vmalloc.
5 - Memory Usage Sensors: PVSS
Interface
More recently used
Still not copied to disk
over-committing
5 - Memory Usage Sensors: PVSS
Interface (II)
right-click
6 - Process Status Sensor

It collects for each task the process status, like “top”
or “ps”.

Must cope with deep changes in Linux threading model.

LinuxThreads, kernel ≤ 2.4.19:



Each thread has a unique process ID (PID).
getpid() function therefore returns different values (PID) for
the different threads of the same process.
NPTL, Native POSIX Threading Library, kernel ≥ 2.4.20


Each thread has a unique identifier called TID (Thread
Identifier) while the PID has been replaced by TGID (Thread
Group Identifier).
getpid() function therefore returns the same value (TGID) for
all threads in a process.
6 - Process Status Sensor (II)


Following changes in threading model, process data
format in procfs depends on kernel version:

/proc/<pid>/… up to kernel 2.4.19.

/proc/<tgid>/task/<tid>/… starting with kernel 2.4.20.
To cope with changes in kernel version, the process
status sensor accesses kernel data in procfs by means of
the maintained (but undocumented) library libproc3.2.3.so:


from http://procps.sourceforge.net/
(more details)
6 - Process Status Sensor: PVSS
Interface (basic panel)
TS: time sharing (Linux Default)
No UTGID (not started by the Task Manager)
3-thread process:
same TGID, same UTGID, different TIDs
6 - Process Status Sensor: PVSS
Interface (advanced panel I)
6 - Process Status Sensor: PVSS
Interface (advanced panel II)
SIZE: size of the core image of the task (code+data+stack)
VSIZE: virtual memory usage (lib+exe+data+stack)
RSS: Non-swapped physical memory
6 - Process Status Sensor: PVSS
Interface (advanced panel III)
signals
6 - Process Status Sensor: PVSS
Interface (advanced panel IV)
processor
S: Interruptible sleep
s: session leader
l: multi-threaded
7 - Network Interface Sensor


It collects Network Interface Card counters and
evaluates transmission rates and error fractions.
It reads also interface name (eth0, eth1, etc.), IP
address and MAC address of the system boards.



Copes with the wrap-around of the 32-bit rx_bytes/tx_bytes
kernel counters when a Gigabit Ethernet interface is
transmitting/receiving at full speed (but the sensor must be
called at least once every 34 s).
Average values and maximum values (since DIM server
startup) are evaluated and can be reset.
(more details)
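The 34 s figure follows from the counter width: a 32-bit byte counter wraps after 2^32 bytes, i.e. 2^32 × 8 bits, which a 1 Gbit/s link at full speed transfers in about 34.4 s. A minimal sketch of a rate computation that tolerates one wrap per interval is given below (Python for illustration; the function name `byte_rate` is hypothetical).

```python
WRAP = 2 ** 32  # rx_bytes/tx_bytes treated as 32-bit counters


def byte_rate(prev, curr, dt):
    """Bytes per second between two counter readings taken `dt` seconds apart.

    Correct only if the counter wrapped at most once in the interval,
    hence the need to sample more often than the wrap time.
    """
    return ((curr - prev) % WRAP) / dt


# Time for a 1 Gbit/s link at full speed to wrap the counter once.
wrap_time = WRAP * 8 / 1e9
```

If two readings are more than `wrap_time` apart, a second wrap can go unnoticed and the computed rate is too low.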
7 - Network Interface Sensor: PVSS
Interface
bit/s
frame/s
bytes/frame
If ≠ 0, cable problem or loose connector
7 - Network Interface Sensor: PVSS
Interface (II)
8 – Network Interface Interrupt
Coalescence Sensor

Evaluate the interrupt coalescence ratio of the
network interface, i.e., the ratio between the number
of frames received/transmitted by a network
interface and the number of interrupts raised by the
network interface card.


Usually Gigabit Ethernet NICs store the received
frames in a buffer and raise an interrupt after an
appropriate delay, so as to deliver more than one frame for each
execution of the interrupt handler (thus reducing CPU utilization up
to 30% in frame receiving and up to 11% in frame sending).
This mechanism can be tuned by setting appropriate
parameters on the NIC (e.g., for Intel e1000,
InterruptThrottleRate, RxIntDelay, RxAbsIntDelay,
TxIntDelay, TxAbsIntDelay).
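The coalescence ratio defined on this slide is a simple quotient of two counter differences over a sampling interval. A sketch in Python follows; the function name and the choice of returning None when no interrupt occurred are assumptions, and where the frame and interrupt counters are read from is left open.

```python
def coalescence_ratio(frames_before, frames_after, irqs_before, irqs_after):
    """Frames handled per interrupt over a sampling interval (None if no interrupts)."""
    irqs = irqs_after - irqs_before
    if irqs <= 0:
        return None
    return (frames_after - frames_before) / irqs
```

A ratio close to 1 means almost every frame raises its own interrupt; larger values mean that several frames are delivered per interrupt.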
8 – Network Interface Interrupt
Coalescence Sensor: PVSS Interface
1.8 Ethernet frames/interrupt
9 - TCP/IP Stack Sensor

It collects TCP/IP stack counters and evaluates
rates and error fraction.



Essentially routing errors and fragmentation/reassembly
errors.
Average values and maximum values (since DIM server
startup or since the last reset) are evaluated and can be
reset.
Collected quantities (more details):

IP, TCP, UDP I/O rates;

IP forwarding and fragmentation/reassembling rates;

IP, TCP, UDP error fractions;

IP forwarding and fragmentation/reassembling fractions;

IP forwarding and fragmentation/reassembling error
fractions;
9 - TCP/IP Stack Sensor: PVSS Interface
9 - TCP/IP Stack Sensor: PVSS Interface (II)
9 - TCP/IP Stack Sensor: PVSS Interface (III)
9 - TCP/IP Stack Sensor: PVSS Interface (IV)
9 - TCP/IP Stack Sensor: PVSS Interface (V)
9 - TCP/IP Stack Sensor: PVSS Interface (VI)
Finite State Machine Alarm Generation


A PVSS script periodically compares the monitored
value with its threshold and, if it is exceeded, a
state machine transition is triggered.
A button on the monitor panels allows configuring the
thresholds for the warning and error states of the state
machine.
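The threshold comparison can be sketched as below. In the real system this logic lives in a PVSS script; the Python function, the state names OK/WARNING/ERROR and the use of strict "greater than" comparisons are assumptions made for the illustration.

```python
def alarm_state(value, warning, error):
    """Map a monitored value to a state, assuming warning <= error."""
    if value > error:
        return "ERROR"
    if value > warning:
        return "WARNING"
    return "OK"


def transition(previous_state, value, warning, error):
    """Return the new state and whether a state machine transition is triggered."""
    new_state = alarm_state(value, warning, error)
    return new_state, new_state != previous_state
```

For instance, with a warning threshold of 60 and an error threshold of 75, a reading of 65 moves a node from OK to WARNING, and a later reading of 65 triggers no further transition.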
Finite State Machine Alarm Generation (II)

If the button is pressed, a new panel opens, in which
an “expert user” can set up the alarm thresholds.
Power Manager





It is a tool to switch on, switch off, power-cycle,
shut down and show the power status of every farm node
from a central console.
Runs on the Control PCs.
To communicate with clients (which send commands and
check status), it uses the DIM network communication
layer, as server.
To operate on nodes (which are switched on and off),
it uses IPMI (Intelligent Platform Management
Interface), as client.
Makes use of IPMItool’s libintf_lan.so library, modified
in order to make it thread-safe (no more global
variables, no more signals & longjmps to time out).
[Figure: power manager DIM client, DIM, Power Manager Server on the Control PC, IPMI, BMC of farm node SFN-001-03.]
Power Manager – IPMI Interfaces

IPMI has two kinds of interfaces:


KCS (Keyboard Controller Style) interface (AKA open
interface)

Local interface (interface to the host OS), unauthenticated.

Can be accessed through the OpenIPMI Linux software.

Can’t be used to switch on a PC or to power-cycle a hung PC.
LAN interface


Network interface, session-based, authenticated.
Designed to be always available (even when the system is
powered down or when the OS is hung or inactive).

Hardware implementation.

OS independent.
Power Manager – IPMI LAN Interface

Server side (farm node):






Hardware implementation.
NIC hardware also redirects to the BMC the Ethernet frames
containing datagrams destined to UDP port 623.
Configured by means of the PC startup configuration utility.
May use DHCP to set up network parameters.
No need of additional software.
Client side (control PC).

Client software, e.g.: IPMItool, freeIPMI, IPMIsh, LHCb Power
Manager Server.
[Figure: Control PC (IPMI client), LAN, farm node Network Controller, Baseboard Management Controller (BMC); labels: UDP port 623, other Ethernet frames.]
Power Manager Deployment

Each Control PC runs a DIM server interfaced to
IPMI and publishes, for each node, a command and a
service.
DIM Services:
/SFN-001-01/power_status
/SFN-001-02/power_status
/SFN-001-03/power_status
DIM Commands:
/SFN-001-01/power_switch on|off|soft_off|cycle
/SFN-001-02/power_switch on|off|soft_off|cycle
/SFN-001-03/power_switch on|off|soft_off|cycle
[Figure: PVSS GUI (PVSS-DIM) and command-line DIM clients, Power Manager Server on the Control PC, IPMI, BMCs of farm nodes SFN-001-01 to SFN-001-05.]
Power Manager Commands

DIM CMD: /<HOSTNAME>/power_switch takes an
argument, which can be (from IPMI specifications):






on: power-up the chassis.
off: power-down the chassis (without a clean shut-down of
the OS).
cycle: power-down, wait 1 second, and power-up again.
soft_off: initiate a soft-shutdown of OS via ACPI by
emulating a fatal over-temperature condition.
hard_reset: pulse the system reset signal.
pulse_diag: pulse a version of a diagnostic interrupt that
goes directly to the processor(s). This is typically used to
cause the operating system to do a diagnostic dump (OS
dependent).
Power Manager Features

It copes with long IPMI response times (> 0.7 s) and
with very long timeout times (~ 16 s) in case of a
disconnected node:


by using one thread for each node to be contacted, in
order to parallelize IPMI connections.
It copes with IPMI ability to receive only one
command at a time:

if the NIC BMC is processing a command, it is not able to
receive or queue other commands. The second command
fails.


E.g.: while the Power Manager is doing a periodic update of the
power status of a certain node, it is not able to switch on/off
the same node.
Power Manager arbitrates between commands sent to the
same node and is able to defer overlapped commands.
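The two mechanisms above (one thread per node, and arbitration of commands addressed to the same node) can be sketched as follows. This is a Python illustration, not the Power Manager code: `transport` is a hypothetical stand-in for the IPMI exchange, and deferring an overlapping command is modelled simply as waiting on a per-node lock.

```python
import threading
from collections import defaultdict

_node_locks = defaultdict(threading.Lock)  # one lock per node
_registry_lock = threading.Lock()          # protects the dictionary itself


def send(node, command, transport):
    """Run `transport(node, command)` with at most one command in flight per node."""
    with _registry_lock:
        lock = _node_locks[node]
    with lock:  # an overlapping command for the same node waits here
        return transport(node, command)


def send_to_all(nodes, command, transport):
    """Contact every node in its own thread so slow nodes do not delay the others."""
    results = {}

    def worker(node):
        results[node] = send(node, command, transport)

    threads = [threading.Thread(target=worker, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With this layout a disconnected node costs one long timeout in its own thread, while the other nodes answer in parallel.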
Power Manager Features (II)

It copes with IPMI configuration in which OS and
BMC have the same IP address:



Two answers are sent back from the node:

one from the OS (ECONNREFUSED);

one from the BMC.
ECONNREFUSED takes priority over any other received
datagram;
that means that the Connection Refused shows up before the
response packet, regardless of the order they were sent out.
(unless the response is read before the connection refused is
returned)
Power Manager Development Status

A power manager DIM server and 2 command-line
DIM clients are ready and working:



ipmiSrv: Power Manager Server (to be executed on each Control
PC). At present it recognizes only the on and off command arguments.
pwSwitch: Power Manager command-line client (can be executed on
any PC on the network).
pwStatus: Power Manager tty-oriented client (can be executed on
any PC on the network).

Tested on a Dell PowerEdge SC 1425 without OS.

A PVSS DIM client is under development.

Basically one PVSS panel showing:


A list of the controlled nodes with their power status (on, off).
Buttons for power on / off / soft_off / cycle / power_reset /
pulse_diag.
Power Manager Command-Line Clients

pwSwitch [-m hostname] on|off|cycle|soft_off

issues a switch command to host hostname;

hostname can be a POSIX.2 wild-card pattern;


if hostname is not specified command is issued to all nodes found
on DIM DNS.
pwStatus [-m hostname]



returns the power status of the host hostname (on, off, not
reachable);
hostname can be a POSIX.2 wild-card pattern;
if hostname is not specified the status of all nodes found on
DIM DNS is returned.
Power Manager Command-Line Clients
service time out
command time out
command time out
N.B.: node lhcbcn2 is disconnected!
DIM Logger



It is a tool for centrally collecting messages
(debug/info/warning/error/fatal) from applications
running on the farm.
Uses TCP as transport layer through the DIM network
communication layer.
Uses a POSIX.1 FIFO (aka named pipe) as local buffer.
[Figure: logger screenshot; callout: message sent to stderr by ld.so.]
DIM Logger Features


It can collect messages sent to stderr/stdout by
system software (e.g. the dynamic linker, see screenshot)
by redirecting stderr to the FIFO.
It is congestion-proof.


In a quasi-congested network all applications trying to send
log messages via TCP could hang.
DIM logger can optionally use a Linux
extension of POSIX.1 FIFO (non-blocking RW open of the
FIFO, O_RDWR|O_NONBLOCK|O_APPEND) which allows:


making non-blocking writes to the FIFO;
automatically dropping messages if the FIFO fills up
completely due to network congestion.
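The non-blocking FIFO technique can be sketched as below, in Python for illustration. The function names `open_fifo` and `log` are hypothetical; the open flags are the ones quoted on the slide. Opening a FIFO with O_RDWR so that the open succeeds without a reader is Linux behaviour, not something POSIX guarantees.

```python
import os


def open_fifo(path):
    """Open (creating if needed) a FIFO for non-blocking writes."""
    if not os.path.exists(path):
        os.mkfifo(path)
    return os.open(path, os.O_RDWR | os.O_NONBLOCK | os.O_APPEND)


def log(fd, message):
    """Write one message; return False if it was dropped because the FIFO is full."""
    try:
        os.write(fd, message.encode() + b"\n")
        return True
    except BlockingIOError:  # EAGAIN: FIFO full, drop instead of blocking
        return False
```

Keeping each message no longer than PIPE_BUF bytes makes the write atomic, so a message is either queued whole or dropped whole, and the application is never blocked by a congested network.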
DIM Logger Components



logSrv: Logger Server (to be executed on each farm
node).
logViewer: Logger tty-oriented client (can be
executed on any PC on the network).
loggerPanel.pnl: Logger PVSS GUI client (can be
executed on any PC on the network with a PVSS
remote UI).
Process Controller

It is a process to be executed on the control PCs,
which controls the processes executing on all the
farm nodes, immediately restarting them in case of
crash.




Reads from an XML file (in future through DIM) the list of
the processes which must be executed on each farm node
and their execution environment (command-line arguments,
environment variables, user, scheduler type, priority,
respawn parameters, etc.).
Works by contacting the Task Manager on each farm node.
A process crash triggers the process respawn in a few
tenths of seconds (through a mechanism based on SIGCHLD
asynchronous signal).
Moreover, a respawn control mechanism is implemented:

If a process is respawned more than N times in T1 seconds,
respawn is disabled for T2 seconds.
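The respawn control rule can be sketched as a small sliding-window limiter. This is a Python illustration with hypothetical names (`RespawnControl`, `may_respawn`); it interprets "more than N times in T1 seconds" as allowing N respawns within any T1-second window and refusing the next one, and it takes the clock as a parameter so that the behaviour can be checked without waiting.

```python
import time
from collections import deque


class RespawnControl:
    """Disable respawn for t2 seconds after more than n respawns within t1 seconds."""

    def __init__(self, n, t1, t2, clock=time.monotonic):
        self.n, self.t1, self.t2 = n, t1, t2
        self.clock = clock
        self.history = deque()       # times of recent respawns
        self.disabled_until = None

    def may_respawn(self):
        now = self.clock()
        if self.disabled_until is not None:
            if now < self.disabled_until:
                return False
            self.disabled_until = None
            self.history.clear()
        while self.history and now - self.history[0] > self.t1:
            self.history.popleft()   # forget respawns older than t1
        if len(self.history) >= self.n:
            self.disabled_until = now + self.t2
            return False
        self.history.append(now)
        return True
```

This keeps a process that crashes immediately at every start from being restarted in a tight loop.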
More Details
5 - Memory Usage Sensors (I)

It collects memory usage statistics.


Available quantities depend on kernel version.
Main collected quantities:



Total/Low/High Memory occupation. High memory (above
~896 MiB of physical memory) can’t be used for kernel data
structures and is slower to access than low memory (the
kernel must use tricks to access this memory). A system crashes
if it runs out of low memory.
Disk cache: Buffers (relatively temporary storage for raw
disk blocks), Cached (in-memory cache for files read from the
disk), Mapped (files which have been mmaped, such as
libraries).
Slab: in-kernel data structures cache.
5 - Memory Usage Sensors (II)

Main collected quantities (cont’d):

Virtual memory management. Active (used more recently and
usually not reclaimed unless absolutely necessary), Inactive
(less recently used and more eligible to be reclaimed for other
purposes), Dirty (waiting to get written back to the disk),
Committed_AS (related to Linux memory over-committing
policy: estimate of how much RAM is needed to make a 99.99%
guarantee that the OOM killer, Out Of Memory killer, is not
invoked).
5 - Memory Usage Sensors (III)

Main collected quantities (cont’d):



Swapping and paging: SwapTotal, SwapFree and SwapUsed
(memory which has been evicted from RAM, and is
temporarily on the disk); SwapCached (memory that once was
swapped out, is swapped back in, but still also is in the
swapfile; if memory is needed, it doesn't need to be swapped
out AGAIN because it is already in the swapfile; this saves
I/O); PageTables: amount of memory dedicated to the
lowest level of page tables.
Vmalloc: total/used/chunk: vmalloc is a mechanism to map
physically non-contiguous memory areas to a contiguous area
in virtual memory. Used for storing the swap map information
and for loading kernel modules into memory.
6 - Process Status Sensor (I)

Collected quantities for each task:

CMD: command name (only the executable name).

CMDLINE: command with all its arguments as a string.

USER (alias EUSER): effective user name.

GROUP (alias EGROUP): effective group ID of the process.

TGID: (Was PID) thread group ID number of the process.

UTGID: User assigned unique Thread Group Identifier.

TID (alias LWP, SPID): Thread ID, aka light-weight process
ID.

PPID: Parent process ID.

NLWP (alias THCNT): number of light-weight processes
(threads) in the process.
6 - Process Status Sensor (II)

Collected quantities for each task (cont’d):





SIZE (alias SZ): size (in kB) of the core image of the task
(code+data+stack).
RSS (alias RSZ, RES): resident set size (in kB): the
non-swapped physical memory that a task has used.
SHARE (alias SHR): the amount of shared memory (in kB)
used by the task.
VSIZE (alias VSZ, VIRT): virtual memory usage (in kB) of
entire process (lib+exe+data+stack).
PSR (alias P): processor that process is currently assigned
to (useful to check the operation of process CPU affinity).
6 - Process Status Sensor (III)

Collected quantities for each task (cont’d):

STAT (alias S): multi-character process state:

First character:







D Uninterruptible sleep (usually I/O);
R Running or runnable (on run queue);
S Interruptible sleep (waiting for an event to complete);
T Stopped, either by a job control signal or because it is being
traced;
X Dead (should never be seen);
Z Defunct ("zombie") process, terminated but not reaped by its
parent.
Following characters:






< high-priority (not nice to other users);
N low-priority (nice to other users);
L has pages locked into memory (for real-time and custom IO);
s is a session leader;
l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do);
+ is in the foreground process group.
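The two-part STAT encoding above can be decoded mechanically. The sketch below is only an illustration of the table on this slide; the function and dictionary names are hypothetical, not part of the monitoring system.

```python
# Meaning of the first STAT character, as listed on the slide.
FIRST_CHAR = {
    "D": "uninterruptible sleep",
    "R": "running or runnable",
    "S": "interruptible sleep",
    "T": "stopped",
    "X": "dead",
    "Z": "zombie",
}

# Meaning of the following STAT characters, as listed on the slide.
FLAGS = {
    "<": "high-priority",
    "N": "low-priority",
    "L": "pages locked into memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "foreground process group",
}


def decode_stat(stat):
    """Return (state, [flags]) for a STAT string such as 'Ss+'."""
    if not stat:
        raise ValueError("empty STAT string")
    state = FIRST_CHAR.get(stat[0], "unknown state")
    flags = [FLAGS.get(ch, "unknown flag") for ch in stat[1:]]
    return state, flags


if __name__ == "__main__":
    print(decode_stat("Ss+"))
```

For example, "Ss+" decodes to an interruptible sleep, for a session leader in the foreground process group.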
6 - Process Status Sensor (IV)

Collected quantities for each task (cont’d):



%CPU: The task's share of the CPU time since the last update
(like top, not like ps), expressed as a percentage of total CPU
time per processor.
%MEM: ratio of the process's resident set size to the physical
memory on the machine, expressed as a percentage.
CLS: scheduling class of the process:

–  not reported;
TS SCHED_OTHER (time-sharing, dynamic priority, Linux default);
FF SCHED_FIFO (real-time, static priority, first in first out);
RR SCHED_RR (real-time, static priority, round robin);
?  unknown value.
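The %CPU and %MEM definitions given on this slide reduce to simple arithmetic on two samples. The following is a sketch under stated assumptions, not the sensor's code: it assumes CPU time is sampled as cumulative clock ticks (as Linux exposes it per process) and that 100 ticks correspond to one second; the function and parameter names are hypothetical.

```python
def cpu_percent(ticks_before, ticks_after, seconds_elapsed, ticks_per_second=100):
    """CPU share since the last update, as a percentage of one processor.

    ticks_before and ticks_after are cumulative CPU-time samples for the
    task; ticks_per_second=100 is an assumed clock-tick rate.
    """
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    cpu_seconds = (ticks_after - ticks_before) / ticks_per_second
    return 100.0 * cpu_seconds / seconds_elapsed


def mem_percent(rss_kb, mem_total_kb):
    """Resident set size as a percentage of physical memory."""
    if mem_total_kb <= 0:
        raise ValueError("mem_total_kb must be positive")
    return 100.0 * rss_kb / mem_total_kb


if __name__ == "__main__":
    print(cpu_percent(1000, 1050, 2.0))   # 50 ticks used over 2 s
    print(mem_percent(51200, 2048000))    # 50 MB resident on a 2 GB node
```

Because the percentage is relative to one processor, a multi-threaded task running on several processors can exceed 100% with this formula.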
6 - Process Status Sensor (V)

Collected quantities for each task (cont’d):



RTPRIO: real-time (static) priority. Defined only for
real-time tasks (scheduled with SCHED_FIFO or SCHED_RR). It
is set to “N/A” for time-sharing tasks (SCHED_OTHER).
NI: nice value. This ranges from 19 (nicest) to -20. Defined
only for time-sharing tasks (SCHED_OTHER). It is set to
“N/A” for real-time tasks (SCHED_FIFO or SCHED_RR).
PRI (alias PR): the (dynamic) priority of the task. Defined
only for time-sharing tasks. It is set to “RT” for real-time
tasks.

STARTED (alias START): time the command started.

ELAPSED: elapsed time since the process was started.

TIME: cumulative CPU time.
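The rules for RTPRIO, NI and PRI on this slide depend only on the scheduling class of the task. A minimal sketch of that rule is given below; the function name and signature are hypothetical, and the class codes are the CLS values TS, FF and RR.

```python
def priority_columns(cls, rtprio=None, nice=None, pri=None):
    """Return the (RTPRIO, NI, PRI) strings for one task.

    cls is the scheduling class code: "TS" (SCHED_OTHER),
    "FF" (SCHED_FIFO) or "RR" (SCHED_RR).
    """
    if cls in ("FF", "RR"):
        # Real-time task: only the static priority is defined.
        return (str(rtprio), "N/A", "RT")
    if cls == "TS":
        # Time-sharing task: nice value and dynamic priority are defined.
        return ("N/A", str(nice), str(pri))
    raise ValueError("unsupported scheduling class: %r" % (cls,))


if __name__ == "__main__":
    print(priority_columns("FF", rtprio=50))
    print(priority_columns("TS", nice=0, pri=20))
```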
6 - Process Status Sensor (VI)


Collected quantities for each task (cont’d):

TTY (alias TT): controlling tty (terminal).

PENDING: mask of the pending signals.

CATCHED (alias CAUGHT): mask of the caught signals.

IGNORED: mask of the ignored signals.

BLOCKED: mask of the blocked signals.
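On Linux, signal masks of this kind are conventionally written as hexadecimal bit masks in which bit n-1 stands for signal number n. Assuming the sensor publishes the masks in that form, they can be expanded into signal numbers as sketched below; the function name is hypothetical.

```python
def signals_in_mask(mask_hex):
    """Expand a hexadecimal signal mask into a sorted list of signal numbers.

    Bit 0 of the mask corresponds to signal 1, bit 1 to signal 2, and so on.
    """
    mask = int(mask_hex, 16)
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # 0x4002 has bits 1 and 14 set: signals 2 (SIGINT) and 15 (SIGTERM).
    print(signals_in_mask("0000000000004002"))
```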
7 - Network Interface Sensor (I)

Collected quantities:






rx_bitRate, tx_bitRate: the total number of bits received
and transmitted in a second.
rx_packetsRate, tx_packetsRate: the total number of
Ethernet frames received and transmitted in a second.
rx_multicastRate: The total number of multicast Ethernet
frames received in a second.
rx_bytes4packet, tx_bytes4packet: the average number of
bytes contained in a received or transmitted Ethernet
frame.
rx_errorsFrac: fraction of bad Ethernet frames received.
tx_errorsFrac: fraction of transmitted Ethernet frames
with packet transmit problems.
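Rates and fractions like these are typically derived from two successive readings of cumulative interface counters. The sketch below illustrates that arithmetic under assumptions: the counter key names are hypothetical, and the error fraction is assumed to be errors divided by frames over the same interval, which the slide does not state explicitly.

```python
def ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when there was no traffic."""
    return numerator / denominator if denominator else 0.0


def direction_rates(before, after, seconds, prefix):
    """Derive rates for one direction ('rx' or 'tx') from two counter samples.

    before and after are dicts of cumulative counters with the hypothetical
    keys '<prefix>_bytes', '<prefix>_packets' and '<prefix>_errors'.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = {key: after[key] - before[key] for key in after}
    nbytes = delta[prefix + "_bytes"]
    packets = delta[prefix + "_packets"]
    return {
        prefix + "_bitRate": 8 * nbytes / seconds,
        prefix + "_packetsRate": packets / seconds,
        prefix + "_bytes4packet": ratio(nbytes, packets),
        # Assumed definition: bad frames over frames in the same interval.
        prefix + "_errorsFrac": ratio(delta[prefix + "_errors"], packets),
    }


if __name__ == "__main__":
    t0 = {"rx_bytes": 1000, "rx_packets": 10, "rx_errors": 0}
    t1 = {"rx_bytes": 151000, "rx_packets": 110, "rx_errors": 1}
    print(direction_rates(t0, t1, 2.0, "rx"))
```

The zero-traffic guard in ratio() avoids a division by zero on idle interfaces.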
7 - Network Interface Sensor (II)

Collected quantities (cont’d):


rx_fifo_errorsFrac, tx_fifo_errorsFrac: fraction of
received and transmitted Ethernet frames which
encountered the condition of receiver/transmitter fifo
overrun.

rx_frame_errorsFrac: fraction of received Ethernet frames
with frame alignment errors.

collisionsFrac: fraction of transmitted frames which
generate Ethernet collisions on a half-duplex network.


rx_droppedFrac, tx_droppedFrac: fraction of received and
transmitted Ethernet frames dropped by the operating system
due to buffer overflows or throttling policy.
tx_carrier_errorsFrac: fraction of transmitted Ethernet
frames which encountered a condition of transmission errors
due to loss of carrier. If this ratio is greater than zero
there is probably a cable/connector problem or a bad duplex
setting.
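The raw counters behind these fractions (fifo, frame, collision, drop and carrier errors) are the ones Linux lists per interface in /proc/net/dev. Assuming the sensor reads them from there, a parser could look like the sketch below; the function name and the sample values are hypothetical, while the column order follows the /proc/net/dev header.

```python
# Column order of /proc/net/dev for the receive and transmit halves.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed")


def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: {'rx': {...}, 'tx': {...}}}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        if not sep or len(values) != 16:
            continue  # not an interface line in the expected layout
        counters[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, values[:8])),
            "tx": dict(zip(TX_FIELDS, values[8:])),
        }
    return counters


# Hypothetical sample in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1500000    1000    0    2    0     1          0        10  900000     800    0    0    0     4       3          0
"""

if __name__ == "__main__":
    eth0 = parse_net_dev(SAMPLE)["eth0"]
    print("tx carrier errors:", eth0["tx"]["carrier"])
```

Since the counters are cumulative, a fraction over a time interval needs the difference between two such readings.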
9 - TCP/IP Stack Sensor (I)

Collected quantities: IP rates:



InReceivesRate, InDeliversRate, ForwDatagramsRate: the
rate of IP datagrams received, delivered (to IP
user-protocols) and forwarded (to their final destination) from all
the network interfaces.
OutRequestsRate: the rate of IP datagrams which local IP
user-protocols supplied to IP in requests for transmission.
ReasmReqdsRate, FragReqdsRate: the rate of IP fragments
received which needed to be reassembled at this entity, and
the rate of datagrams which needed to be fragmented.
9 - TCP/IP Stack Sensor (II)

Collected quantities: TCP rates:


InSegsRate, OutSegsRate: the total number of TCP
segments received or sent in a second.
Collected quantities: UDP rates:

InDatagramsRate, OutDatagramsRate: the rate of UDP
datagrams delivered to UDP users or sent.
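The counter names behind these rates (InReceives, OutRequests, InSegs, OutSegs, InDatagrams, OutDatagrams) are the ones Linux exposes in /proc/net/snmp, where each protocol has a header line of names followed by a line of values. Assuming that file is the source, a rate could be obtained as sketched below; the function names and the abbreviated sample are hypothetical.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    tables = {}
    lines = text.splitlines()
    # Lines come in pairs: "Proto: name name ..." then "Proto: value value ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, _, names = header.partition(":")
        value_proto, _, numbers = values.partition(":")
        if proto != value_proto:
            continue  # malformed pair, skip it
        tables[proto] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return tables


def rate(before, after, seconds):
    """Counter increase per second between two cumulative readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after - before) / seconds


# Hypothetical sample with only a few of the real columns.
SAMPLE = """\
Ip: InReceives InDelivers ForwDatagrams OutRequests
Ip: 5000 4900 0 4500
Tcp: InSegs OutSegs RetransSegs
Tcp: 4000 3800 12
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 900 3 0 700
"""

if __name__ == "__main__":
    first = parse_snmp(SAMPLE)
    # A second reading would normally be taken some seconds later.
    print("OutSegs so far:", first["Tcp"]["OutSegs"])
    print("InSegsRate:", rate(first["Tcp"]["InSegs"], 4500, 5.0))
```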
9 - TCP/IP Stack Sensor (III)

Collected quantities: IP error fractions:






InHdrErrorsFrac: fraction of input IP datagrams discarded
due to errors in their IP headers.
InAddrErrorsFrac: fraction of input IP datagrams discarded
because the IP address was not a valid address to be
received at this entity.
InUnknownProtosFrac: fraction of input IP datagrams
discarded because of an unknown or unsupported protocol.
InDiscardsFrac: fraction of valid input IP datagrams which
were discarded (e.g., for lack of buffer space).
OutNoRoutesFrac: fraction of output IP datagrams
discarded because no route could be found to transmit them
to their destination.
OutDiscardsFrac: fraction of valid output IP datagrams
discarded (e.g., for lack of buffer space).
9 - TCP/IP Stack Sensor (IV)

Collected quantities: IP
forwarding/fragmentation/reassembling fractions:






ForwDatagramsFrac: fraction of input datagrams which are
forwarded.
InDeliversFrac: fraction of input datagrams which are
successfully delivered to IP user-protocols.
ReasmReqdsFrac: average number of fragments received for
each datagram received.
ReasmOKsFrac: fraction of received IP datagrams needing
reassembly which were successfully reassembled.
FragReqdsFrac: fraction of output datagrams that needed to be
fragmented.
FragCreatesFrac: average number of fragments created for
each datagram to be sent.
9 - TCP/IP Stack Sensor (V)

Collected quantities. IP fragmentation/reassembling
error fractions (cont’d):



ReasmTimeoutFrac: fraction of failures detected by the IP
re-assembly algorithm due to reassembling time-out.
ReasmFailsFrac: fraction of failures detected by the IP
re-assembly algorithm for whatever reason.
FragFailsFrac: fraction of output datagrams that needed to
be fragmented whose fragmentation failed.
9 - TCP/IP Stack Sensor (VI)

Collected quantities. TCP error fractions:



RetransSegsFrac: fraction of output segments which are
retransmitted.
OutRstsFrac: fraction of output segments containing the
RST flag.
InErrsFrac: fraction of input segments received in error.
9 - TCP/IP Stack Sensor (VII)

Collected quantities. UDP error fractions:



NoPortsFrac: fraction of received UDP datagrams for which
there was no application at the destination port.
InErrorsFrac: fraction of received UDP datagrams that
could not be delivered for reasons other than the lack of an
application at the destination port.