Network Performance for ATLAS Real-Time Remote Computing Farm Study
Alberta, CERN, Cracow, Manchester, NBI
TCP/IP behaviour of the ATLAS Request-Response Application Protocol observed with Web100
MOTIVATION

Several experiments, including ATLAS at the Large Hadron Collider (LHC) and D0 at
Fermilab, have expressed interest in using remote computing farms for processing and
analysing, in real time, the information from particle collision events. Different
architectures have been suggested, from pseudo-real-time file transfer and subsequent
remote processing to the real-time requesting of individual events, as described here.

To test the feasibility of using remote farms for real-time processing, a collaboration
was set up between members of the ATLAS Trigger/DAQ community, with support from
several national research and education network operators (DARENET, CANARIE, Netera,
PSNC, UKERNA and DANTE), to demonstrate a Proof of Concept and measure end-to-end
network performance. The testbed was centred at CERN and used three different types of
wide-area high-speed network infrastructure to link the remote sites:
• an end-to-end lightpath (SONET circuit) to the University of Alberta in Canada
• standard Internet connectivity to the University of Manchester in the UK and the
Niels Bohr Institute in Denmark
• a Virtual Private Network (VPN) composed of an MPLS tunnel over the GÉANT network
and an Ethernet VPN over the PIONIER network, to IFJ PAN Krakow in Poland.

Remote Computing Concepts

[Figure: the ATLAS Detectors feed the Level 1 Trigger; ROBs supply the Data Collection
Network, where L2PUs form the Level 2 Trigger; Event Builders (SFIs) pass complete
events over the Back End Network to the local Event Processing Farms (PFs), to the
SFOs and to mass storage at CERN B513; from the Experimental Area, lightpaths and
GÉANT connect to the Remote Event Processing Farms at Copenhagen, Edmonton, Krakow
and Manchester.]
The ATLAS Application Protocol

[Figure: time sequence of the protocol between the Event Filter Daemon (EFD) and the
SFI and SFO: request event, send event data, process event, request buffer, send OK,
send processed event; the request-response time is measured as a histogram.]

Event Request:
• EFD requests an event from SFI
• SFI replies with the event data
• Processing of the event occurs

Return of Computation:
• EF asks SFO for buffer space
• SFO sends OK
• EF transfers the results of the computation
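This request-response pattern is straightforward to sketch in code. Below is a
minimal, illustrative client loop (the endpoint name and the length-prefixed framing
are assumptions; the real EFD-SFI wire protocol is not reproduced here). The point to
notice is the idle period while the event is processed: the TCP connection carries no
data then, which is what triggers the congestion-window behaviour examined below.

import socket
import struct
import time

SFI_HOST, SFI_PORT = "sfi.example.org", 9000   # hypothetical SFI endpoint
REQUEST = b"GET_EVENT"                         # stand-in for the real 64 byte request

def recv_exact(sock, n):
    """Read exactly n bytes; a ~1 Mbyte event arrives as many TCP segments."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("SFI closed the connection")
        buf.extend(chunk)
    return bytes(buf)

with socket.create_connection((SFI_HOST, SFI_PORT)) as sock:
    for _ in range(10):
        t0 = time.monotonic()
        sock.sendall(REQUEST)                               # small request
        (size,) = struct.unpack("!I", recv_exact(sock, 4))  # assumed length prefix
        event = recv_exact(sock, size)                      # ~1 Mbyte of event data
        time.sleep(0.050)  # stand-in for the ~50 ms event processing;
                           # the connection is idle here, sending no data
        print(f"request-response took {time.monotonic() - t0:.3f} s")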
Observation of the Status of Standard TCP with web100

The Web100 parameters were read on the server located at CERN (the data source). In
the TCP activity plots the 64 byte requests are shown in green, the 1 Mbyte responses
in blue and the TCP congestion window (CurCwnd) in red; TCP ACK packets are also
counted in each direction. One response = 1 MB ~ 380 packets.
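Web100 needs a patched kernel, but a comparable view of the congestion window can be
had on an unpatched modern Linux host by sampling the TCP_INFO socket option on the
data-source socket. A minimal sketch; the byte offset assumes the Linux struct
tcp_info layout (eight 1-byte fields followed by 32-bit fields, with tcpi_snd_cwnd
the 19th of them), which should be verified against your kernel headers.

import socket
import struct

def snd_cwnd(sock):
    """Sample the sender's congestion window, in segments, of a connected socket.

    Offset 8 + 18*4 = 80 assumes the Linux struct tcp_info field order
    (tcpi_rto ... tcpi_snd_ssthresh, tcpi_snd_cwnd); check linux/tcp.h.
    """
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return struct.unpack_from("I", info, 80)[0]

Sampling this once per request-response cycle on the server would reproduce the red
CurCwnd traces shown in the plots.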
CERN-Kracow TCP Activity

[Figure: packets in/out and Cwnd versus time in ms.]

64 byte request, 1 Mbyte response. The first event takes 600 ms due to TCP slow
start; the steady-state request-response latency is ~140 ms, a rate of
~7.2 events/s.

CERN-Manchester TCP Activity

[Figure: packets in/out and Cwnd versus time in ms.]

64 byte request in green, 1 Mbyte response in blue; TCP in slow start takes 19 round
trips, or ~380 ms. The TCP congestion window (red) is then reset by TCP on each
request, due to the lack of data sent by the application over the network: TCP obeys
RFC 2581 and RFC 2861.
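The quoted round-trip counts are consistent with simple slow-start arithmetic: the
congestion window starts near one segment and grows roughly geometrically each round
trip (a factor of about 2 in textbook slow start, closer to 1.5 with delayed ACKs),
so a ~380 packet response needs on the order of ten round trips, and the wall-clock
cost scales with the path RTT. A back-of-the-envelope check (initial window and
growth factor are assumptions):

def slow_start_rounds(total_packets, initial_cwnd=1, growth=1.5):
    """Round trips for slow start to deliver total_packets segments."""
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < total_packets:
        sent += cwnd      # one congestion window of packets per round trip
        cwnd *= growth    # geometric window growth during slow start
        rounds += 1
    return rounds

print(slow_start_rounds(380, growth=1.5))  # 13 round trips (delayed ACKs)
print(slow_start_rounds(380, growth=2.0))  # 9 round trips (textbook doubling)

The observed 12 and 19 round trips bracket these idealised figures; real traces also
depend on the initial window, the MSS and ACK timing. The time cost is rounds x RTT:
12 round trips at ~140 ms per round trip gives the ~1.67 s seen on the long path,
while a ~20 ms path pays far less wall time for the same number of rounds.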
Observation of TCP with no Congestion window reduction with web100

[Figure: data bytes in/out and Cwnd versus time in ms.]

With the congestion window reduction disabled, the TCP congestion window (red) grows
nicely: a request-response takes 2 round trips after 1.5 s, giving a rate of
~10 events/s with a 50 ms processing time.

CERN-Alberta TCP Activity

[Figure: data bytes in/out, Cwnd and achievable throughput (TCPAchive, Mbit/s)
versus time in ms.]

64 byte request in green, 1 Mbyte response in blue; TCP in slow start takes 12 round
trips, or ~1.67 s. The congestion window (red) then grows gradually, and a
request-response takes 2 round trips after ~2.5 s, a rate of ~2.2 events/s with a
50 ms processing time. The achievable transfer throughput grows from 250 to
800 Mbit/s, and data are transferred when the application requires the data.
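A final consistency check: once the congestion window is large enough to carry the
whole response in flight, the steady-state rate follows from the per-event cost,
rate ~ 1 / (2 x RTT + processing time), while for standard TCP it is simply the
inverse of the steady-state latency. The RTT values below are inferred from the
measurements above, not taken from the source:

def rate(per_event_seconds):
    """Events per second for a given per-event cost."""
    return 1.0 / per_event_seconds

# ~7.1 events/s from the ~140 ms steady-state latency with standard TCP:
print(rate(0.140))
# ~10 events/s: 2 RTT (~25 ms RTT, inferred) + 50 ms processing:
print(rate(2 * 0.025 + 0.050))
# ~3 events/s at a ~140 ms RTT; the observed ~2.2 events/s suggests an
# effective per-event cost slightly above 2 RTT on the Alberta path:
print(rate(2 * 0.140 + 0.050))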