Network Processors for a 1 MHz Trigger-DAQ System
RT2003, Montreal
Artur Barczyk, Jean-Pierre Dufey, Beat Jost and Niko Neufeld
CERN-EP & Université de Lausanne
Network Processors
• Developed for high-end routers, on the market since 1999
• Dedicated processors optimised for high-speed packet processing
• Large I/O capabilities (up to 10 Gigabit/s), and up to 10 Mp/s
• Large and fast buffer memories
• Software programmable
⇒ Use them in a network-based DAQ system, wherever PCs can’t do it (easily)
Niko NEUFELD
CERN, EP
Anatomy of a Network Processor
[Block diagram: the processor complex contains multiple RISC processor cores, each with several hardware threads, plus coprocessors for many common networking tasks. Around it: on-chip memory with interfaces for external memories, a general-purpose CPU, control and monitoring, a scheduler, routing and bridging tables, a search engine, hardware assists, packet buffer memory, and a MAC/frame processor with integrated network interfaces to and from the PHYs.]
NP module as PCI card
• All infrastructure to operate one IBM PowerNP NP4GS3
• 3 x 1000BaseT ports
• One port converted into PCI, for development purposes
• 2 NPs can be connected via a special cable
• Built by S3 corp., Ireland
Network Processors in a 1 MHz DAQ
[Architecture diagram: front-end electronics (FE, plus TRM) send 125-239 links at 1.1 MHz (8.8-16.9 GB/s) into a multiplexing/frame-merging layer of 77-135 NPs; 77-135 links (6.4-13.6 GB/s) and 73-140 links (7.9-15.1 GB/s) feed the multi-stage switching readout network of 30 switches; event-building NPs (37-70 NPs; 349 links at 40 kHz, 2.3 GB/s) deliver events over 50-100 links (5.5-10 GB/s) to 50-100 SFCs; decision-sorting NPs (24 NPs; 24 links, 1.5 GB/s) connect the L1-Decision Sorter to the TFC system.]
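As a quick plausibility check of the diagram's aggregate figures (my own arithmetic, not on the slide, and assuming the low and high ends of the link and bandwidth ranges pair up), both ends of the frame-merging layer's quoted ranges correspond to roughly the same per-link load, comfortably below Gigabit Ethernet capacity:

```python
# Back-of-the-envelope check of the quoted aggregates (invented helper name).
def per_link_mb(total_gb_s, n_links):
    """Average load per link in MB/s, given an aggregate rate in GB/s."""
    return total_gb_s * 1000.0 / n_links

low = per_link_mb(8.8, 125)    # ~70 MB/s
high = per_link_mb(16.9, 239)  # ~71 MB/s
# Both are well below Gigabit Ethernet's ~125 MB/s payload capacity.
```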
Frame Merging
[Diagram: two applications, each structured as input, event builder, output.
RU/FEM application: works up to 4 MHz of incoming packets (see A. Barczyk’s presentation).
EB application: works for at least 2 x 100 MB streams.]
Frame Merging
• Helps to optimise link usage
• Reduces the number of links into the readout network
• Can re-format data – e.g. protocol adaptation (raw Ethernet → IP)
• Can change the Maximum Transmission Unit (MTU)
  – some Ethernet segments provide payload > 1500 bytes
• Reduces the packet rate at the output – important for receiving PCs (interrupt rate!)
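The rate reduction above comes from coalescing many small fragments into one MTU-sized frame. A minimal sketch of that idea in Python (an illustration of greedy packing, not the NP microcode; the fragment sizes are invented):

```python
MTU = 1500  # standard Ethernet payload limit in bytes

def merge_fragments(fragments, mtu=MTU):
    """Greedily pack byte-string fragments into frames of at most `mtu` bytes."""
    frames, current, size = [], [], 0
    for frag in fragments:
        if size + len(frag) > mtu and current:
            frames.append(b"".join(current))  # flush the full frame
            current, size = [], 0
        current.append(frag)
        size += len(frag)
    if current:
        frames.append(b"".join(current))      # flush the last, partial frame
    return frames

# 1000 fragments of 100 bytes become 67 frames: ~15x fewer packets,
# same total payload, so the receiver's interrupt rate drops accordingly.
fragments = [b"x" * 100] * 1000
frames = merge_fragments(fragments)
```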
Building your own switching
network from NP modules
• Using NP modules gives you full freedom in doing the switching
• Large output buffers
• Disadvantage:
  – a module has only eight ports (otherwise a switch chip is needed) → a large number of modules is needed to build a big network
• Solution:
  – use optimised connection topologies to reduce the number of elementary modules, while keeping the load on interconnecting links acceptable
Network Topologies
• Banyan topology: 64 x 64 port configuration
• “Fully connected” topology: basic structure, 63 sources x 72 destinations
[Diagrams: a banyan network connecting sources 0-15 through switching stages to destinations 0-15, and a fully connected arrangement of source (S) and destination (D) modules.]
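The module count for a banyan built from the 8-port NP modules can be estimated with a little arithmetic (my own sketch, assuming each module is used as a 4x4 switching element, 4 ports in and 4 ports out):

```python
import math

def banyan_modules(n_ports, k):
    """Number of k x k elements in an n_ports x n_ports banyan network:
    log_k(n_ports) stages of n_ports/k elements each."""
    stages = round(math.log(n_ports, k))
    assert k ** stages == n_ports, "n_ports must be a power of k"
    return stages * (n_ports // k)

# With the 8-port module as a 4x4 element, a 64 x 64 banyan network
# needs 3 stages of 16 elements, i.e. 48 modules.
modules = banyan_modules(64, 4)
```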
Decision Sorting
• In the LHCb trigger, decisions are generated as small Ethernet packets in one of ~1400 PCs → 1 MHz of un-ordered decisions in
• Processing time is limited but unknown → decisions are taken and sent in arbitrary order
• The front-end electronics requires decisions to be ordered before they are sent to the trigger distribution system → 1 MHz of ordered decisions out
• Limited buffer size entails a maximum trigger latency
• Each event entering is made known to a central entity (the Decision Sorter) → 1 MHz of frames
Decision Sorting
[Diagram: front-end electronics (FE, TRM) feed the readout network; events travel via switches over 90-153 links (5.5-10 GB/s) to 90-153 SFCs and on to the ~1400-CPU farm; decisions return through the L1-Decision Sorter to the TFC system. Numbered markers (1-6) trace the successive steps of one event and its decision.]
Test Set-up
IBM NP4GS3 R2.0 Reference Kit:
• Dual NP connected via a back-plane to form an 8-port module
• 8 x 1000 SX full-duplex ports
• 4 x Tigon2 1000 SX NICs
• PPC 750 control point
• RISC Watch = JTAG via Ethernet
Tigon2 NIC features:
• Up to 620 kHz fragment rate
• 1 µs resolution timer
Measurement procedure:
• Download code into the NP4GS3 via RISC Watch (JTAG) or the PCI bus of the PPC control point
• Generate traffic either via Gigabit Ethernet NICs (Tigon) or using one NP to feed the other
• Can use the internal timers of the NP and/or the NICs
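Given timers with microsecond resolution, the fragment rate follows directly from a fragment count and two timer reads. A trivial sketch with invented numbers (assuming a 1 µs timer tick; the helper name is hypothetical):

```python
def fragment_rate_khz(n_fragments, t_start_us, t_end_us):
    """Fragment rate in kHz from a count and two microsecond timestamps."""
    return n_fragments / (t_end_us - t_start_us) * 1000.0

# 620 fragments counted over 1000 microseconds -> 620 kHz,
# the maximum fragment rate quoted for the Tigon2 NIC.
rate = fragment_rate_khz(620, 0, 1000)
```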
Network Processors
☺ Packet processing for several millions of packets per second
☺ Fast and big buffer memories
☺ Hardware assists for many common tasks, like check-summing, re-framing, tree look-ups
☺ Software programmable
☹ Processing power optimised for the header region of packets
☹ Memory model optimised for the hardware (no linear addressing)
☹ Programs need to be written in proprietary assembly language
Conclusions
• Network Processors are a powerful tool for packet processing
• They are especially useful whenever very high rates of packets need to be coped with
• We have found many useful applications, all of which could be done with the same standard NP module – the software defines the functionality
• But what if PCs can do it too…?
Backup Slides
Data flow in the NP4GS3
[Diagram: DASL interfaces on either side, access to frame data, and ingress and egress event building inside the NP4GS3.]