Middleware renovation – technical overview

Wojciech Sliwinski BE-CO-IN
for the Middleware team:
Felix Ehm, Kris Kostro, Joel Lauener,
Radoslaw Orecki, Ilia Yastrebov, [Andrzej Dworak]
Special thanks to: Vito Baggiolini and Pierre Charrue
Agenda

Context & Motivation for Renovation

Middleware Review process

Technical evaluation of the transport layer

Changes in the MW Architecture in LS1

Conclusions
16th April 2013
Wojciech Sliwinski, Middleware Renovation: Technical Overview
Context & Motivation for Renovation
MW Mandate & Scope
 Standard set of MW solutions
 Centrally managed services
 Track & optimize runtime parameters
 Well defined feedback channel for users
 Provide support & follow-up issues
[Figure: Control System stack: GUI Applications, Control Logic, Middleware]

Scope: CERN Accelerator Complex
 Operational 24*7*365
 Must be Reliable & High Quality
 73’000 HW devices, 3’150 servers
 In all Eqp. groups (4 dpts: BE, EN, GS, TE)
16th April 2013
Wojciech Sliwinski, Middleware Renovation: Technical Overview
4
CMW in the Controls System
[Figure: three-tier controls architecture, linked by TCP/IP communication services over the CERN Gigabit Ethernet technical network and the general purpose network]
PRESENTATION TIER: operator consoles, fixed displays
 CMW client (C++/Java): JAPC, GUIs, LabView, RADE
 JMS client (Java): GUIs
MIDDLE TIER: application servers, SCADA servers, file servers
 CMW client (Java): JAPC, Logging, LSA, InCA, SIS
 CMW client/server (C++/Java): Proxy, DIP, AlarmMon, AQ
 JMS client (Java): servers (Logging, InCA, SIS)
RESOURCE TIER: VME front ends (RT Lynx/OS), WorldFIP front ends, PLCs
 CMW server (C++): FESA, FGC, GM
 CMW server (C++): PVSS (Cryo, Vacuum)
Timing and field connections: timing generation, WorldFIP segment (1, 2.5 MBits/sec), FIP/IO, PROFIBUS, optical fibers, direct I/O
LHC MACHINE: actuators and sensors (cryogenics, vacuum, etc.); beam position monitors, beam loss monitors, beam interlocks, RF systems, etc.; quench protection agents, power converters functions generators, cryo temperature sensors
Motivations for MW Renovation

Current CORBA-based CMW-RDA
 Integrated in the Control system
 Used to operate all CERN accelerators
 Provides widely accepted Device/Property model
 > 10 years old

Why review & upgrade MW?
 CORBA was chosen 15 years ago
 Technical limitations of CORBA-based transport
 Functional limitations of the current CMW-RDA
 Codebase with long history → difficult to maintain, needs architecture review
 Major issue of long-term support & future evolution
 Evolution of technology over last 10 years: HW, OS, middleware, 3rd party libraries
 Human factor → less & less CORBA expertise on the market
Technical limitations of CORBA transport

Became legacy, not actively supported → maintenance issue
 Shrinking community, slow response time
 omniORB (C++) – 1 developer/maintainer, last release mid-2011
 JacORB (Java) – few developers, small community

Major technical limitations
 Lack of fully asynchronous processing channel
 Blocking communication → infamous JacORB blocking issue
 Lack of low-level control of IO resources (sockets, request queues)

Development issues
 Difficult to extend the wire protocol → backward compatibility issue
 Complex, error prone API
 Heavy in memory usage
Summary: Why change CORBA?
CORBA was chosen 15 years ago
 Not actively maintained → big risk for the MW project
 Better solutions exist on the market
 Invest in future solution rather than maintaining old one

Functional limitations of CMW-RDA

Several pending operational issues
 Difficult (or hardly possible) to resolve with current library
 Any major change very difficult to introduce
○ Technical Stops & Xmas breaks too short for massive deployment
○ High risk → major impact on front-end frameworks and applications

No protection against ’slow/bad’ client applications
 Misbehaving application may destabilise front-end server
 Affects reliability of the subscription channel
 Workaround: introduction of Proxy

Poor scalability when many clients subscribed
 Stability issues observed when >200 clients subscribed (even for Proxy)
 Threading model doesn’t scale well with many clients

Missing support for priority clients (e.g. SIS, PM, InCA, Logging)
 Non-critical clients (e.g. GUIs) have the same communication priority

+ others …
Summary: Why change CMW-RDA?
With the current CORBA-based middleware we can’t solve the pending operational issues
 We can’t provide better scalability & reliability
 CMW-RDA is difficult to evolve & extend

Middleware Review process
Middleware Renovation process

MW Renovation = MW Review + MW Upgrade
 MW Review aims to provide the most appropriate technical solution satisfying the user requirements
 MW Upgrade establishes the plan & strategy for introduction of the new MW
 Objective: LS1 is the unique opportunity for a major MW upgrade

Middleware Review Process
 Gathering of user feedback and requirements (2010-11)
 Review of communication and serialization libraries (2011-12)
 Prototyping using selected communication products (2012)
 Design & impl. of new RDA3: Data, Client & Server (2012-13)
 Testing & validation of core MW infrastructure (summer’13)
 Upgrade of all dependent MW libraries & services (2013-14)
○ JAPC, Directory Service, Proxy, DIP Gateway
Review of user requirements

2010-11 – series of interviews with major users
 Lars Jensen, Stephen Jackson (BI)
 Andy Butterworth, Frode Weierud, Roman Sorokoletov (RF)
 Brice Copy, Clara Gaspar (DIP, DIM)
 Frederic Bernard, Herve Milcent, Alexander Egorov (PVSS)
 Alexey Dubrovskiy (CTF), Kris Kostro (DIP gateways)
 Marine Gourber-Pace, Nicolas Hoibian (Logging)
 Nicolas De Metz-Noblat (Front-Ends), Alastair Bland (Infrastructure)
 Michel Arruat (FESA), Stephen Page (FGC)
 Niall Stapley, Mark Buttner, Marek Misiowiec (LASER & DIAMON)
 Nicolas Magnin, Christophe Chanavat (ABT)
 Stephane Deghaye, Jakub Wozniak (InCA, SIS)
 Vito Baggiolini, Roman Gorbonosov (JAPC & DA systems)
 + regular feedback from OP
 + internal team input

http://wikis/display/MW/Interviews+with+Experts
New RDA3: Accepted requirements

General
 Java & C++ API, Win (64-bit) & Linux (SLC5 32-bit & SLC6 64-bit)
 Accelerator Device Model (i.e. Device/Property)
 Get, Set, Async-Get, Async-Set, Subscribe
 Early detection of communication failures
 Improve error reporting in all the layers: client, server, gateways
 Admin interface & runtime diagnostics & statistics

Data support
 Data object: primitives, n-dim arrays, data structures

Subscription mechanism
 Subscription behaviour the same regardless of the condition of the server (active, down)
 Several client subscription policies (default: continuous)
 Provide subscription notification ordering
 First-Update enforced via CMW on server-side
○ Provide callback to front-end framework for the server-side Get
 Drop support for on-change flag
 Standardise use of subscription filters and update flags (e.g. immediate update)
 Add header for acquired Data → common metadata (e.g. acq. stamp, cycle name)
 All loss of data (dropped updates) must be notified to clients
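
Two of the subscription requirements above (First-Update enforced on the server side through a callback for the server-side Get, and a metadata header on acquired data) can be illustrated with a small in-memory sketch. This is not the RDA3 API: the names `AcquiredData`, `PropertyServer`, the `first_update` flag and the callback signature are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class AcquiredData:
    """A value plus a header of common metadata (field names are hypothetical)."""
    value: Any
    acq_stamp: int            # acquisition timestamp
    cycle_name: str           # cycle the value belongs to
    first_update: bool = False


Listener = Callable[[AcquiredData], None]


class PropertyServer:
    """Toy server side: the middleware layer, not the user code, owns subscriptions."""

    def __init__(self, server_side_get: Callable[[str, str], AcquiredData]):
        # Callback supplied by the front-end framework for the server-side Get.
        self._get = server_side_get
        self._listeners: Dict[Tuple[str, str], List[Listener]] = {}

    def subscribe(self, device: str, prop: str, listener: Listener) -> None:
        self._listeners.setdefault((device, prop), []).append(listener)
        # First-Update: deliver the current value as soon as the client subscribes.
        current = self._get(device, prop)
        listener(AcquiredData(current.value, current.acq_stamp,
                              current.cycle_name, first_update=True))

    def notify(self, device: str, prop: str, data: AcquiredData) -> None:
        # Listeners are served in the order in which they subscribed.
        for listener in self._listeners.get((device, prop), []):
            listener(data)
```

A client that subscribes receives one update flagged as the first update, followed by regular notifications in publication order.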
New RDA3: Accepted requirements (cont.)

Client side
 RDA3 client API connects with both: RDA2 (old) & RDA3 (new) servers
 Efficient mechanism for: connection, disconnection & reconnection
 Must be able to recover from any interruption of communication with the server
○ Server restarts, IP address change, rename/move of a device to another server
 Improved semantics of Array Calls, i.e. handling of individual parameters
 Enhanced diagnostics & collection of statistics
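
The recovery requirement above (server restarts, IP address change, a device moved to another server) suggests that a client should look the server up again on every reconnection attempt rather than reuse a cached address. The sketch below shows one possible shape of such a loop with exponential backoff. The function name `reconnect`, its parameters and the default delays are hypothetical and are not taken from RDA3.

```python
import time
from typing import Any, Callable


def reconnect(resolve: Callable[[], str],
              connect: Callable[[str], Any],
              max_attempts: int = 5,
              initial_delay: float = 0.1,
              max_delay: float = 5.0,
              sleep: Callable[[float], None] = time.sleep) -> Any:
    """Retry resolve-then-connect, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            # Resolve again each time: the device may now live on another server.
            address = resolve()
            return connect(address)
        except ConnectionError:
            if attempt == max_attempts:
                raise                      # give up and report the failure
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Here `resolve` stands in for a lookup in a directory service and `connect` for opening the transport connection; passing `sleep` as a parameter keeps the loop testable without real waiting.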

Server side
 Policies for discarding notifications, i.e. deal with overflows and ’bad clients’
○ Instrument with counters & timings allowing to diagnose the notifications delivery
 Prioritisation of Get/Set requests for high-priority clients
 Server-side subscription tree fully managed by CMW
○ Server does not need to manage client subscriptions any more
 Manage the client connections, e.g. forced disconnect of a client
 Client lifetime callbacks (i.e. connected, disconnected)
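
A policy for discarding notifications, instrumented with counters, could look like the sketch below: a bounded queue per client that drops the oldest update on overflow, so that a slow client cannot block the server, and that reports the loss to the client on the next delivery. The "drop oldest" choice, the class name `ClientNotificationQueue` and the `DATA_LOSS` marker are assumptions for illustration, not the policy actually implemented in RDA3.

```python
from collections import deque
from typing import Any, Deque, List


class ClientNotificationQueue:
    """Bounded per-client queue with a hypothetical 'drop oldest' policy."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._pending: Deque[Any] = deque()
        self.enqueued = 0            # counter: updates accepted for this client
        self.dropped = 0             # counter: updates discarded on overflow
        self._unreported_drops = 0   # drops the client has not been told about

    def push(self, update: Any) -> None:
        """Called by the server; never blocks, whatever the client does."""
        if len(self._pending) >= self._capacity:
            self._pending.popleft()  # discard the oldest pending update
            self.dropped += 1
            self._unreported_drops += 1
        self._pending.append(update)
        self.enqueued += 1

    def drain(self) -> List[Any]:
        """Called when the client is ready; any data loss is reported first."""
        out: List[Any] = []
        if self._unreported_drops:
            out.append(("DATA_LOSS", self._unreported_drops))
            self._unreported_drops = 0
        out.extend(self._pending)
        self._pending.clear()
        return out
```

The `enqueued` and `dropped` counters are the kind of per-client figures a diagnostics interface could expose to identify a misbehaving client.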
New RDA3: Accepted requirements (cont.)

Server side (cont.)
 Client discovery for the diagnostics purposes (i.e. connected clients with payload)
 Enhanced diagnostics & collection of statistics
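
Among the server-side requirements is the prioritisation of Get/Set requests for high-priority clients. One simple way to picture it is a priority queue that serves high-priority requests first and keeps arrival order within the same priority. The two priority levels, the class name `RequestQueue` and the request labels used below are hypothetical.

```python
import heapq
import itertools
from typing import Any, List, Tuple

HIGH, NORMAL = 0, 1   # hypothetical priority levels; the lower value is served first


class RequestQueue:
    """Serves Get/Set requests by priority, first-in first-out within a priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = itertools.count()   # tie-breaker that preserves arrival order

    def submit(self, request: Any, priority: int = NORMAL) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def next_request(self) -> Any:
        _priority, _seq, request = heapq.heappop(self._heap)
        return request

    def __len__(self) -> int:
        return len(self._heap)
```

With this ordering, a request from a high-priority client such as SIS or InCA is taken before pending GUI requests that were submitted earlier.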

Ongoing discussions (not accepted yet)
 Prioritisation of subscription notifications for high-priority clients

Technical notes
 Invest in asynchronous & non-blocking communication
 Prefer 0-copy & lock-free data structures, message queues

http://wikis/display/MW/Design+of+New+RDA
New RDA3: Summary of requirements

Unchanged
 Device/Property model
 Set of basic operations (Get, Set, Subscribe)

Fixes & improvements
 Subscription mechanism
 Connection management
 Diagnostics & statistics

New functionality
 Policies for subscription management (client & server)
 Client priorities
 Server-side subscription tree
 Extended Data support
 Standardise First-Update concept
Technical evaluation of the transport layer
Middleware transport requirements
[Figure: requirements arranged in three levels: Desirable, Mandatory, Fundamental]
 Lightweight
 Friendly API, documentation
 Request/reply & pub/sub patterns
 Asynchronous
 Performance & Scalability
 Stability, Maturity & Longevity
 Active community
 Open source license
 C++/Java
 Linux/Windows
 Over TCP/IP LAN
Evaluation process –> our criteria
Appearance
• Creators: specification, documentation
• Users: forums, bug reports
• Internet
Simple usage
• Download: licensing
• Compile: Linux & gcc
• Run examples
Testing
• Communication patterns
• Performance
• Exceptional situations
• QoS
• Configuration
CRITERIA
• API, look & feel, documentation
• Resources, binary size, memory
• Community, maturity
• Communication patterns
• QoS
• Performance
Andrzej Dworak, ICALEPCS 2011
Evaluated middleware products
All opinions are based only on our knowledge and evaluation. Each of the
products, depending on the requirements, may constitute a good solution.
CoreDX, OpenAMQ, RTI DDS, Qpid, ZeroMQ, OpenSplice DDS, RabbitMQ, YAMI, Ice, omniORB, JacORB, MQTT RSMB, Thrift, Mosquitto
Andrzej Dworak, ICALEPCS 2011
Products comparison (according to the criteria)
Criteria: sync, async & msg patterns; QoS; dependencies & memory footprint; performance; look & feel, API, docs; community & maturity

Product   Score
ZeroMQ    6
Ice       5
YAMI4     4
RTI       3
Qpid      3
CORBA     2
Thrift    2

Andrzej Dworak, ICALEPCS 2011
Conclusions




Several good middleware solutions available
The choice is dictated by the most critical requirements
Not easy → performance matters, but so do ease of use, community, …
Prototyping was done with the most promising candidates:
 ZeroMQ, Ice & YAMI

Finally, we chose ZeroMQ (http://www.zeromq.org/)
 Asynchronous & non-blocking communication
 0-copy & lock-free data structures, message queues
 Nice API, good documentation & active community
New RDA3 Java – Sync Get round-trip time
[Figure: Sync Get round-trip time in ms (max and average; axis 0 to 18 ms) against number of clients (0 to 1000)]
Test setup: 1kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts
New RDA3 Java – subscription notification latency
[Figure: subscription notification latency in ms (min, max and average; axis 0 to 250 ms) against number of clients (0 to 1000)]
Test setup: 1kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts
New RDA3 Java – subscription notification latency (a closer look)
[Figure: subscription notification latency in ms (min, max and average; axis 0 to 6 ms) against number of clients (0 to 200)]
Test setup: 1kB message payload, cs-ccr-* machines, 1 server host & 10 client hosts
Changes in the MW Architecture in LS1
Current MW Architecture
[Figure: layered architecture diagram; legend: User written, Middleware, Central services]
Clients: Java Control Programs; C++ Programs; VB, Excel, LabView; Passerelle C++
Client APIs: JAPC API; RDA Client API (C++/Java); Device/Property Model
CMW Infrastructure: CORBA-IIOP
Central services: Administration console; Configuration Database (CCDB); Directory Service; RBAC A1 Service
Server API: RDA Server API (C++/Java); Device/Property Model
Servers (each with CMW integration): Virtual Devices (Java); FESA Server; FGC Server; PS-GM Server; PVSS Gateway; More Servers
Physical Devices (BI, BT, CRYO, COLL, QPS, PC, RF, VAC, …)
Changes in MW Architecture in LS1
[Figure: layered architecture diagram; legend: User written, Middleware, Central services, Upgrade in LS1]
Clients: Java Control Programs; C++ Programs; VB, Excel, LabView; Passerelle C++
Client APIs: JAPC API; RDA Client API (C++/Java); Device/Property Model
CMW Infrastructure: ZeroMQ
Central services: Administration console; Configuration Database (CCDB); Directory Service; RBAC A1 Service
Server API: RDA Server API (C++/Java); Device/Property Model
Servers (each with CMW integration): Virtual Devices (Java); FESA Server; FGC Server; PS-GM Server; PVSS Gateway; More Servers
Physical Devices (BI, BT, CRYO, COLL, QPS, PC, RF, VAC, …)
LS1: Changes in RDA

New major version: RDA3 (June’13 – alpha version)
 Public API NOT backward compatible
 New protocol, new architecture, new design
 Same Device/Property model & Get/Set/Subscribe calls
 Announcement via accsoft-java-announce list

Required Actions for RDA Users
 For Java: Use new version of JAPC (API unchanged)
 For Java: New JAPC will support communication with RDA2 & RDA3 servers
 For C++: Upgrade user code to new RDA3 API
 For C++: RDA3 will support communication with RDA2 & RDA3 servers

Consequences if NO Action → staying with old RDA2
 NOT possible to communicate with new RDA3 servers (FESA3, FGC, etc.)
LS1: Changes in JAPC

New major JAPC version → upgrade for RDA3 (September’13)
 Public API backward compatible
 Possible API extensions, but always compatible
 Announcement via accsoft-java-announce list

Required Actions for JAPC Users
 Update JAPC jars (via CommonBuild)
 Re-release your product (via CommonBuild)
 New JAPC will support communication with RDA2 & RDA3 servers
Middleware Team
 Wojtek Sliwinski (Lead) 100% – Directory, RDA, Proxy, RBAC
 Felix Ehm 30% – JMS, Log/Tracing, Feedback/Metrics
 Joel Lauener 90% – CMW Admin, Directory, RDA, GM, DIP Gw.
 Kris Kostro 20% – DIP Gateways, RDA3
 Wojtek Buczak 30% – JAPC Core
 Ilia Yastrebov 100% – RDA, RBAC, Passerelle, Proxy, Log
 Radoslaw Orecki 100% – Directory, RDA3

Support: [email protected], [email protected]

Docs: http://wikis/display/MW
Conclusions

We have to replace CORBA with a new solution

We want to resolve the pending operational issues

We collected updated user requirements

New product (ZeroMQ) was chosen to replace CORBA

We will minimize changes to client SW during LS1