Transcript inaba

Round the World Data Transfer
End of LSR (Land Speed Record)
in 10Gbps era
Data Reservoir Project
Mary Inaba
University of Tokyo
First of all
Big apologies for Kei’s absence
Hero of this year’s LSR achievement
Takeshi in his experiment
What is Data Reservoir?
• Share Scientific Data over long distance
– Physics, astronomy, earth science, biology
• High-speed data transfer on Long Fat pipe Network
• Easy to use
– File system transparent
Data Reservoir System
• Using iSCSI protocol
• Without any modification to
applications
[Diagram: User Programs, File Servers, IP Switches, and Disk Servers; iSCSI Bulk Transfer between Disk Servers over the Global Network]
History of Data Reservoir and
SC Bandwidth Challenge
• 1st Generation (SC02)
– 26 to 26 servers, 1GbE interface
– RTT 200ms, 90% usage of bottleneck OC-12
• 2nd Generation (SC03)
– Aggregated 10Gbps
– 24,000km, 1 and a half round trips between U.S. and Tokyo
– 32 to 32 servers, too many :-<
• 3rd Generation (SC04, SC05)
– Round the World 31,248km
– 1 to 1, memory to memory transfer
– Single Stream, Longest Path, Standard MTU TCP Throughput Award
– Fastest IPv6
• 4th Generation (SC06)
– A pair of machines, disk to disk transfer
– Single 7.2Gbps, Dual 8.65Gbps
Once upon a time,
There started an ambitious project
to construct an L2 network
between CERN and Tokyo
via Amsterdam, Canada, and U.S.
Fortunately ( ! ),
our team got a chance to try it ♪
Network
[Map: networks CANARIE, IEEAF/Tyco/WIDE, SURFnet, Abilene, WIDE, APAN/JGN II; sites Tokyo, Seattle, Vancouver, Calgary, Minneapolis, Chicago, Pittsburgh, Amsterdam, Geneva (CERN)]
3rd Generation Data Reservoir
started
Background
• WAN PHY over the world
• Programmable 10GbE NIC
is available
Challenge
How much bandwidth can we use
with a single stream?
Struggles during the 1st experiment
Almost no information
– Ping + loopback is the only source
– Different network, different timezone
– TELEPHONE must be the most important
equipment.
Over 7Gbps between Tokyo and CERN
A nice part of this experiment
was making a lot of new friends!
We really appreciate the nice advice.
Submission to
Internet2 Land Speed Record
Experiments during the Xmas vacation,
the season with the least traffic!
Some Results
SC04 Bandwidth Challenge
U.S. – Tokyo – U.S. – CERN 31,248km,
RTT 433ms, 7.57Gbps
Xmas Experiment
Season with the smallest network traffic.
Very, very strict deadline for preparation
Tokyo – Chicago – Amsterdam – Seattle – Tokyo 33,979km,
RTT 498ms, 7.21Gbps
:
Updated LSR 8 times.
Challenge in 2006
To attain 90% of 10Gbps
The difficulty
WAN PHY (max 9.6Gbps) ⇔ LAN PHY
The difference is only 4% of 10Gbps,
but if RTT = 500ms,
it amounts to 25MBytes per round trip
(TCP can control its transmission rate only at RTT granularity)
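The 25MByte figure is the rate gap multiplied by the round-trip time. A minimal sketch of that arithmetic, using the slide's round numbers (10Gbps LAN PHY, 9.6Gbps WAN PHY, 500ms RTT); the function name is illustrative only:

```python
def excess_bytes_per_rtt(lan_bps, wan_bps, rtt_s):
    """Bytes by which a LAN PHY rate sender overshoots a WAN PHY hop in one RTT."""
    return (lan_bps - wan_bps) * rtt_s / 8  # bits -> bytes


gap = excess_bytes_per_rtt(lan_bps=10e9, wan_bps=9.6e9, rtt_s=0.5)
print(f"{gap / 1e6:.0f} MBytes per round trip")
```

Since TCP adjusts its rate roughly once per RTT, an overshoot of this size has to be absorbed by buffers or it is lost before the sender can react.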
Another difficulty
PCI-X bottleneck → now cleared
LSR in 2006 -- New players
• Circuit
-- NetIron 40G → NetIron RX-4 in Seattle
• GSO (Generic Segmentation Offload)
– Offloading CRC calculation
• Chelsio T310 -- PCI-X 2.0 support
– IPG tuning is available
• Iperf modification with sendfile()
• Hardware Approach for 10Gbit Network
– TAPEE: Network Analyzer
2006 LSR Challenge,
again on X’mas
• Around Dec/10: Seattle line test
• Around Dec/20: Round-The-World up
• Dec/31: Submission
• Jan/8/2007: Round-The-World down
Host
• Xeon 5160 * 1
– Woodcrest core
– Dual core
• DDR400 2GB
• Chelsio T310-SR on PCI-Express x8
– There is no longer a bus speed bottleneck
• Linux 2.6.18
Circuit
• Round The World circuit
– 522ms RTT
– Trans Pacific & Trans Atlantic
– WAN PHY & LAN PHY mixed
– Tokyo – [Los Angeles] – Chicago – Amsterdam
– Amsterdam – [Chicago] – Seattle – Tokyo
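With a 522ms RTT, a single TCP stream can only fill the path if its window covers the bandwidth-delay product. A rough sketch of that calculation, assuming target rates of 9Gbps (the 90% goal) and 10Gbps; the helper name is hypothetical:

```python
RTT_S = 0.522  # round-trip time of the Round The World circuit, from the slide


def bdp_bytes(rate_bps, rtt_s=RTT_S):
    """Bandwidth-delay product: bytes in flight needed to sustain rate_bps."""
    return rate_bps * rtt_s / 8  # bits -> bytes


for rate in (9.0e9, 10.0e9):
    print(f"{rate / 1e9:.1f} Gbps needs about {bdp_bytes(rate) / 1e6:.0f} MB in flight")
```

The results fall in the same range as the 600MB RWIN used in the pacing experiments.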
LSR 200612-2 Network Topology
[Figure: topology of the Round The World circuit.
Sites: Tokyo (T-LEX), Seattle (Pacific Northwest Gigapop), Los Angeles, Chicago StarLight, NYC MANLAN, Amsterdam NetherLight at SARA.
Networks: WIDE, IEEAF, JGN2, CANARIE (CA*net 4), SURFnet, TransLight, others.
Equipment: ONS 15454, Foundry RX-4, Foundry NI40G, CISCO 7609, Fujitsu XG800, HDXc, Force10 E1200, Force10 E300, GS4000.
Hosts: Age-1 and Age-2 (Intel Xeon).
Links are marked WAN PHY or LAN PHY and cross the Pacific and Atlantic Oceans; the legend distinguishes L1, L2, and L3 switches.]
LSR distance
From                            To                              Distance
HND (35°33'08"N 139°46'47"E)    ORD (41°58'43"N 87°54'17"W)     10147 km
ORD (41°58'43"N 87°54'17"W)     AMS (52°18'31"N 04°45'50"E)      6630 km
AMS (52°18'31"N 04°45'50"E)     SEA (47°26'56"N 122°18'34"W)     7864 km
SEA (47°26'56"N 122°18'34"W)    HND (35°33'08"N 139°46'47"E)     7730 km
4 segment path:                                                 32372 km
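The segment distances can be approximated from the airport coordinates with a great-circle calculation. The sketch below uses the haversine formula on a sphere of mean radius 6371km. Both the formula and the radius are assumptions, since the slide does not say how its figures were computed, so the results should be close to the table's values rather than identical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not from the slide


def dms(deg, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = deg + minutes / 60 + seconds / 3600
    return -value if hemisphere in "SW" else value


# Coordinates as listed on the slide: (latitude, longitude)
AIRPORTS = {
    "HND": (dms(35, 33, 8, "N"), dms(139, 46, 47, "E")),
    "ORD": (dms(41, 58, 43, "N"), dms(87, 54, 17, "W")),
    "AMS": (dms(52, 18, 31, "N"), dms(4, 45, 50, "E")),
    "SEA": (dms(47, 26, 56, "N"), dms(122, 18, 34, "W")),
}


def great_circle_km(a, b):
    """Haversine great-circle distance between two airports, in km."""
    lat1, lon1 = map(math.radians, AIRPORTS[a])
    lat2, lon2 = map(math.radians, AIRPORTS[b])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


PATH = ["HND", "ORD", "AMS", "SEA", "HND"]
segments = [great_circle_km(a, b) for a, b in zip(PATH, PATH[1:])]
for (a, b), km in zip(zip(PATH, PATH[1:]), segments):
    print(f"{a} -> {b}: {km:.0f} km")
print(f"4 segment path: {sum(segments):.0f} km")
```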
IPG Tuning
• Chelsio T310 has a special function for
setting the IPG (Inter Packet Gap)
– Makes it possible to control the Ethernet NIC
transmission rate
– Up to 2048 octets (IEEE standard IPG is 12 octets)
• Fine Grain Tuning
– For standard frames, the rate can be set within 50 ~ 100%
– For 8000B jumbo frames, within 80 ~ 100%
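The effect of a larger IPG on the sending rate can be estimated from the number of octets each frame occupies on the wire. This is a simplified model, not the NIC's documented behaviour: it assumes 8 octets of preamble, 18 octets of Ethernet header plus FCS, no VLAN tag, and that "8000B" refers to the MTU.

```python
PREAMBLE = 8        # preamble + start-of-frame delimiter, octets
ETH_OVERHEAD = 18   # MAC header (14) + FCS (4), octets; assumes no VLAN tag
STANDARD_IPG = 12   # IEEE standard inter-packet gap, octets


def relative_rate(mtu, ipg):
    """Sending rate with the given IPG, as a fraction of the rate at the standard IPG."""
    on_wire = PREAMBLE + mtu + ETH_OVERHEAD
    return (on_wire + STANDARD_IPG) / (on_wire + ipg)


for mtu in (1500, 8000):
    print(f"MTU {mtu}: IPG 2048 gives {relative_rate(mtu, 2048):.1%} of full rate")
```

Under these assumptions the maximum IPG of 2048 octets gives roughly 80% for the 8000B case, in line with the slide. For standard frames the model gives a value somewhat below 50%, so the slide's figure is presumably rounded or based on a different accounting.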
[Graphs, each with 600MB RWIN:]
Without pacing (IPG 136)
Pacing (IPG 800)
Pacing (IPG 700)
Pacing (IPG 720)
Iperf modification
• We have been using Iperf
• Iperf transmission flow
– Allocate a buffer of several kB
– Initialize the buffer with random data
– while() { write(sock, buffer) }
• This incurs a copy between user and
kernel space
Iperf modification (cont’d)
• Advice from Chelsio
– “Use netperf’s sendfile mode to confirm receiver performance”
• Modification
– Iperf-zerocopy transmission flow
• open(temporary file) → file descriptor fd
• buffer = mmap(fd)
• initialize buffer with random data
• while() { sendfile(sock, fd) }
– sendfile(2) sends data from the kernel
• After some discussion, we concluded that using this
version of Iperf meets the LSR rules
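The zero-copy flow above can be sketched as follows. This is not the modified Iperf itself, only an illustration of the same four steps using Python's `os.sendfile` on Linux. The local socket pair, the 64kB file size, and all function names are assumptions made for the example.

```python
import mmap
import os
import socket
import tempfile
import threading


def send_with_sendfile(sock, fd, size, rounds):
    """Repeatedly send the whole file; the payload never enters user space."""
    total = 0
    for _ in range(rounds):
        offset = 0
        while offset < size:
            n = os.sendfile(sock.fileno(), fd, offset, size - offset)
            if n == 0:
                break
            offset += n
        total += offset
    return total


def drain(sock, counter):
    """Receiver side: read and count bytes until the sender closes."""
    while True:
        chunk = sock.recv(1 << 16)
        if not chunk:
            break
        counter[0] += len(chunk)


def demo(size=64 * 1024, rounds=16):
    tx, rx = socket.socketpair()            # stand-in for the real TCP connection
    received = [0]
    reader = threading.Thread(target=drain, args=(rx, received))
    reader.start()
    with tempfile.TemporaryFile() as f:     # open(temporary file) -> fd
        f.truncate(size)
        buf = mmap.mmap(f.fileno(), size)   # buffer = mmap(fd)
        buf[:] = os.urandom(size)           # initialize buffer with random data
        buf.flush()
        buf.close()
        sent = send_with_sendfile(tx, f.fileno(), size, rounds)
    tx.close()
    reader.join()
    rx.close()
    return sent, received[0]


if __name__ == "__main__":
    print(demo())
```

Compared with the `write(sock, buffer)` loop, the sender never copies the payload from a user-space buffer into the kernel, which is the cost the modification is meant to remove.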
[Graphs: GSO; GSO + zerocopy]
New submission
• 7.67Gbps average
– Standard-Iperf
– Peak 8.10Gbps, 20 minutes, No packet loss
• 9.08Gbps average
– Iperf-zerocopy
– Peak 9.11Gbps, 5 hours, No packet loss
History of single-stream IPv4 Land Speed Record
[Chart: distance bandwidth product in Pbit m/s (axis ticks 1, 10, 100, 1,000) against year, 2000 to 2007; reference line at 10 Gbps * 30,000km]
2004/11/9: Data Reservoir project / WIDE project, 149 Pbit m/s
2004/12/24: 216 Pbit m/s
2005/11/10: 240 Pbit m/s
2006/2/20: 264 Pbit m/s
History of single-stream IPv6 Land Speed Record
[Chart: distance bandwidth product in Pbit m/s (axis ticks 1, 10, 100, 1,000) against year, 2000 to 2007; reference line at 10 Gbps * 30,000km]
2004/10/29: Data Reservoir project / WIDE project, 167 Pbit m/s
2005/11/13: Data Reservoir project / WIDE project, 208 Pbit m/s
2006/12/28: Data Reservoir project / WIDE project, 272 Pbit m/s