LHCb Trigger and Data Acquisition System
Management of the LHCb Online
Network Based on SCADA System
Guoming Liu*†, Niko Neufeld†
* University of Ferrara, Italy
† CERN, Geneva, Switzerland
Outline
Introduction to LHCb Online system
LHCb online network
Network management based on SCADA system
Summary
ICALEPCS2009
Guoming Liu
2
LHCb online system
LHCb is one of the large particle physics experiments at the LHC at CERN
The Online system is part of the LHCb infrastructure, providing IT services for the entire experiment
Three major components:
Data Acquisition (DAQ)
Transfers the event data from the detector front-end electronics to permanent storage
Timing and Fast Control (TFC)
Provides the fast clock and drives all stages of the data readout of the LHCb detector between the front-end electronics and the online processing farm
Experiment Control System (ECS)
Controls and monitors all parts of the experiment
LHCb online system
[Figure: data flow in the LHCb online system. Labels in the diagram: detector sub-systems (VELO, ST, OT, RICH, ECal, HCal, Muon) and L0 trigger; front-end electronics (FEE); readout boards; readout network (event building, MEP request); switches; CPUs of the HLT farm and the MON farm; CASTOR; TFC system fed by the LHC clock and the L0 trigger; Experiment Control System (ECS). Legend: event data, timing and fast control signals, control and monitoring data.]
LHCb Online Network
Two dedicated networks:
Control network: general-purpose network for the experiment control system
Connects all the Ethernet devices in LHCb
Data network: dedicated to data acquisition
Performance critical
LHCb Online Network
Two geographic parts: surface and underground
Connected by two 10G links
LHCb Online Network
On the surface
Core CTRL routers
Core DAQ router
DAQ access switches (~50)
CTRL access switches (~100)
Network Monitoring System based on SCADA
Motivation
This large network needs sophisticated monitoring
Coherent integration into the LHCb ECS
Homogeneous interfaces for the non-expert shift crew
Commercial network management software?
Expensive
Integration?
Network Monitoring System: Architecture
Supervisory layer
PVSS II: commercial SCADA system
JCOP: Joint Controls Project for the LHC experiments
DIM: Distributed Information Management
Front-end processes:
SNMP
sFlow
syslog
Data communication
SNMP / sFlow / Syslog
Network Monitoring System: FSM
All behaviors are modeled as Finite State Machines (FSM)
Hierarchical structure: status propagates upward, commands propagate downward
Device Units:
Device description
Device access
Based on PVSS II datapoints: alarm handling, archiving, trending etc.
Control Units:
Abstract behavior modeling
Each one represents its associated sub-tree
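The hierarchy described on this slide can be illustrated with a small sketch. This is not the JCOP FSM toolkit or PVSS II code: the class names, the state names (OK, WARNING, ERROR), the ENABLE/DISABLE commands and the device names are all hypothetical, chosen only to show status moving up the tree and commands moving down.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical state names, ordered from best to worst.
SEVERITY = {"OK": 0, "WARNING": 1, "ERROR": 2}


@dataclass
class DeviceUnit:
    """Leaf node: wraps one monitored device (for example a switch)."""
    name: str
    state: str = "OK"
    enabled: bool = True

    def command(self, action: str) -> None:
        # Hypothetical commands; a real device unit would act on hardware.
        if action == "DISABLE":
            self.enabled = False
        elif action == "ENABLE":
            self.enabled = True


@dataclass
class ControlUnit:
    """Inner node: summarises the sub-tree below it."""
    name: str
    children: List[Union["ControlUnit", DeviceUnit]] = field(default_factory=list)

    @property
    def state(self) -> str:
        # Status propagates upward: the worst state among enabled children wins.
        states = [c.state for c in self.children if getattr(c, "enabled", True)]
        return max(states, key=SEVERITY.__getitem__, default="OK")

    def command(self, action: str) -> None:
        # Commands propagate downward to every child.
        for child in self.children:
            child.command(action)


daq = ControlUnit("DAQ_NETWORK", [DeviceUnit("daq-switch-01"),
                                  DeviceUnit("daq-switch-02", state="ERROR")])
ctrl = ControlUnit("CTRL_NETWORK", [DeviceUnit("ctrl-switch-01")])
top = ControlUnit("NETWORK", [daq, ctrl])
print(top.state)  # ERROR, inherited from daq-switch-02
```

Computing the state on demand keeps the sketch short; a production system would more likely push state changes upward as events.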
Network Monitoring System
Major monitored items
Physical topology
Discovery of the network topology, based on the Link Layer Discovery Protocol (LLDP)
Discovery of the network nodes, based on the information in the switches (ARP and MAC forwarding tables)
Traffic
Octet / packet counters
Discard / error counters
...
Switch status: CPU/memory, temperature, power supply, ...
Data paths for DAQ
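Octet counters only become useful once two successive readings are turned into a rate. The sketch below shows that arithmetic, including the wrap-around of a fixed-width counter. The function names are hypothetical and the counter width is a parameter; nothing here is taken from the LHCb implementation.

```python
def counter_delta(previous: int, current: int, bits: int = 64) -> int:
    """Difference between two readings of a counter that wraps at 2**bits."""
    return (current - previous) % (1 << bits)


def rate_bits_per_second(prev_octets: int, curr_octets: int,
                         interval_s: float, bits: int = 64) -> float:
    """Average throughput between two octet-counter readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(prev_octets, curr_octets, bits) * 8 / interval_s


# A 32-bit counter that wrapped between two polls taken 10 s apart.
print(rate_bits_per_second(4294967290, 10, interval_s=10.0, bits=32))
```

The modulo handles a single wrap between polls; if the polling interval is long enough for the counter to wrap more than once, the result is an underestimate.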
Network Monitoring Snapshot(1): Topology
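A topology view of this kind can be derived from LLDP neighbour information collected from each switch. The following is a minimal sketch of that step, assuming the neighbour tables have already been read into tuples; the record layout, the function name and the device names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record: (local device, local port, remote device, remote port)
LldpRecord = Tuple[str, str, str, str]


def build_topology(records: Iterable[LldpRecord]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from LLDP neighbour records."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for local_dev, _local_port, remote_dev, _remote_port in records:
        # Both ends usually report the same link; sets remove the duplicate.
        graph[local_dev].add(remote_dev)
        graph[remote_dev].add(local_dev)
    return dict(graph)


records = [
    ("core-daq", "1/1", "daq-access-01", "49"),
    ("daq-access-01", "49", "core-daq", "1/1"),
    ("core-daq", "1/2", "daq-access-02", "49"),
]
for device, neighbours in sorted(build_topology(records).items()):
    print(device, sorted(neighbours))
```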
Network Monitoring Snapshot(2): traffic
Summary
The network management system has been implemented on top of the commercial SCADA system PVSS II and the JCOP framework
It provides the sophisticated network monitoring that is essential for our operation, e.g. switch status and traffic
It also provides a homogeneous operation interface and an intuitive display
Currently only monitoring is provided; some switch control commands remain to be integrated
Thanks for your attention!
Backup
NMS Architecture: front-end processes
SNMP: Simple Network Management Protocol
Used for general network monitoring and configuration
sFlow:
A sampling mechanism to capture traffic data
Hardware-based
Two kinds of sFlow samples: flow samples and counter samples
Used on the core switch to collect traffic counters: SNMP is too slow there and consumes too much CPU and memory
Syslog: event notification messages
Three distinct parts: priority, header and message
The priority part encodes both the facility and the severity of the message
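The way one priority value carries both facility and severity can be shown in a few lines. In the BSD syslog format (RFC 3164) the priority is written as `<N>` at the start of the message, with N = facility × 8 + severity. The sketch below decodes it; the function name is hypothetical, and the sample line is the example message from RFC 3164.

```python
import re
from typing import Tuple

# Severity names in numeric order (0 = most severe), as defined for syslog.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]
PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_priority(line: str) -> Tuple[int, str]:
    """Return (facility number, severity name) from a syslog line."""
    match = PRI_RE.match(line)
    if match is None:
        raise ValueError("line does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid value
        raise ValueError("priority value out of range")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


line = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick"
print(decode_priority(line))  # (4, 'crit')
```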
Network Monitoring: hardware/system
Syslog can collect some information not covered by SNMP
A syslog server is set up to receive the syslog messages from the network devices and parse them
Alarm information:
Hardware: temperature, fan status, power supply status
System: CPU, memory, login authentication etc.
All messages with a priority higher than warning are sent to PVSS for further processing
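The forwarding rule can be sketched as a simple filter. In syslog a lower severity number means a more severe message (warning is 4), so "higher than warning" is read here as severities 0 to 3 (emerg, alert, crit, err); that reading is an assumption, as are the function names. How messages actually reach PVSS is not shown.

```python
from typing import Iterable, List, Tuple

WARNING = 4  # syslog severity code for "warning"


def should_forward(pri: int) -> bool:
    """True if the message is more severe than warning."""
    severity = pri % 8  # the low three bits of the priority value
    return severity < WARNING


def select_for_pvss(messages: Iterable[Tuple[int, str]]) -> List[str]:
    """Keep the text of messages that pass the severity threshold."""
    return [text for pri, text in messages if should_forward(pri)]


# Hypothetical (priority, text) pairs as produced by a syslog parser.
messages = [
    (34, "fan tray 2 failure"),       # severity 2 (crit)
    (36, "temperature above limit"),  # severity 4 (warning)
    (165, "user login"),              # severity 5 (notice)
]
print(select_for_pvss(messages))
```

If warnings themselves should also be forwarded, the comparison becomes `severity <= WARNING`.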
Network Monitoring: IP routing
Monitoring the status of the routing using "ping"/"arping"
Three stages for the DAQ:
1. From the readout boards to the HLT farm
2. From the HLT farm to the LHCb online storage
3. From the online storage to CERN CASTOR
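A per-stage reachability check along these lines could look like the sketch below. It assumes a Linux host with the iputils `ping` command (`-c` for the packet count, `-W` for the timeout in seconds). The host names are placeholders and the function names are hypothetical. The probe is passed in as a parameter so that `arping`, or a fake probe for testing, can be substituted.

```python
import subprocess
from typing import Callable, Dict, List


def ping(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request using the system ping (Linux iputils)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_stages(stages: Dict[str, List[str]],
                 probe: Callable[[str], bool] = ping) -> Dict[str, bool]:
    """A stage is healthy only if every listed host answers the probe."""
    return {name: all(probe(host) for host in hosts)
            for name, hosts in stages.items()}


# Placeholder host names, one list per DAQ stage.
stages = {
    "1: readout boards -> HLT farm": ["hlt-node-a", "hlt-node-b"],
    "2: HLT farm -> online storage": ["storage-a"],
    "3: online storage -> CASTOR": ["castor-gateway"],
}

# Demonstration with a fake probe, so no packets are sent.
unreachable = {"storage-a"}
print(check_stages(stages, probe=lambda host: host not in unreachable))
```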
[Figure: the LHCb online system data-flow diagram (detector sub-systems, FEE, readout boards, readout network with event building, switches, HLT farm, MON farm, CASTOR), with the three DAQ stages marked 1, 2 and 3. Legend: event data, timing and fast control signals, control and monitoring data.]