Overview of GNET-1

Network measurement, emulation, protocol benchmarking and scheduling
Tomohiro Kudoh
Grid Technology Research Center
National Institute of Advanced Industrial Science and Technology (AIST)
Outline
Network measurement, emulation, protocol
benchmarking
GtrcNET
PSPacer
Network scheduling
G-lambda project
GtrcNET
Measurement
Microscopic behavior observation
Burstiness (i.e. burst transfers within an RTT period) cannot be observed by software tools (such as iperf)
Emulation
Reproducible environment
An emulated WAN environment is preferable to a "real network" for reproducible experiments.
Software delay emulation is not stable
Clock-accurate delay emulation can be achieved by hardware
Hardware Network Test-bed
GtrcNET-1
[Photo/diagram: the GNET-1 hardware, controlled through GtrcNET-1 Control software and an SNMP agent]
Block Diagram of GtrcNET-1
- Clock-accurate Behavior
- Network emulation, Traffic measurement, …
GtrcNET usage
Measurement
Sub-ms bandwidth measurement
μs-accurate one-way delay measurement using GPS (with GNET-1 boxes at each end of an Internet path)
Emulation
WAN delay and error emulation (Gbps, 300ms RTT)
Stable environment for software development
Protocol benchmarking
Emulation, measurement and bandwidth control for protocol benchmarking, using a pair of GNET-1 boxes
Pure Grid
Pure Grid is a network emulation environment
High Controllability: various precise parameters
High Performance: 10GbE wire-rate operation
High Resolution: less than 1ms measurement interval
[Diagram: PC Cluster 1 and PC Cluster 2 (GbE hosts) connected through GtrcNET over 10GbE; GtrcNET emulates the bottleneck network and measures precise network behavior]
Emulation parameters:
One-way latency (0 - 800ms)
Bandwidth (1Mbps - 10Gbps)
Buffer size (0 - 1GBytes)
Buffer control (Tail-drop, random, RED)
Frame loss (5.0x10^-10 step)
Measurement:
Bandwidth (interval 100μs - 32s)
Stream-wise bandwidth
Frame capture
GPS-synchronized time
New protocol prototyping:
Smooth traffic shaping (1Mbps - 10Gbps)
Multi-path transfer for dependable communication
Example Usage of GtrcNET-1 (1)
GridMPI has been evaluated on Pure Grid
Latency: 0ms, 4ms, 20ms, 200ms
The performance of GridMPI is evaluated on various network parameters using the NAS Parallel Benchmarks (NPB2.3).
http://www.gridmpi.org/
[Figure: two 8-node GbE clusters connected through GtrcNET-1; relative performance of the NPB kernels (BT, CG, FT, IS, LU, MG, SP) under GridMPI with 0ms, 4ms, 20ms and 200ms latency]
Example Usage of GtrcNET-1 (2)
Measuring fine-grain bandwidth
Latency: 100ms
Measure bandwidth every 1ms
1 stream with a 16MB socket buffer (GbE end hosts)
Real-time measurement: the 1ms average bandwidth is bursty.
The 200ms average bandwidth is about 500Mbps because the socket buffer size is small. It looks stable.
[Figure: bandwidth (Mbps) vs. time (sec) for the 16M-WADIFQ run, plotting the bursty 1ms average against the smooth longer-window average]
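As an aside, the effect of the averaging window on how such a trace looks can be reproduced offline. Below is a minimal sketch, assuming per-millisecond byte counts like those GtrcNET-1 reports are available as a list; the trace itself is synthetic, chosen only to echo the burst-then-stall pattern described above:

```python
# Sketch: how the same trace looks at 1ms vs. 200ms averaging windows.
# bytes_per_ms[i] holds the bytes observed during millisecond i; the
# trace below is synthetic, not measured data.

def windowed_mbps(bytes_per_ms, window_ms):
    """Average bandwidth in Mbps over consecutive windows of window_ms."""
    rates = []
    for start in range(0, len(bytes_per_ms) - window_ms + 1, window_ms):
        total_bytes = sum(bytes_per_ms[start:start + window_ms])
        rates.append(total_bytes * 8 / (window_ms / 1000.0) / 1e6)
    return rates

# A source that sends at GbE wire rate for 100ms of every 200ms RTT,
# then stalls waiting for ACKs: 5 seconds of 1ms samples.
trace = ([125_000] * 100 + [0] * 100) * 25
print(max(windowed_mbps(trace, 1)))    # ~1000 Mbps peaks: bursty
print(windowed_mbps(trace, 200)[0])    # 500 Mbps: looks stable
```

The point of the slide falls out of the arithmetic: the coarse window hides exactly the within-RTT burstiness that a sub-ms hardware measurement reveals.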
Example Usage of GtrcNET-1 (3)
Measuring per-stream bandwidth
Latency: 100ms
Bandwidth: 500Mbps
Two streams (PC-1→PC-3 and PC-2→PC-4, GbE hosts) share the emulated bottleneck.
Case 1: the bandwidth of each stream is controlled by the socket buffer size (8MB, 250Mbps). Each stream exceeds the bottleneck link capacity, and frames are dropped.
Case 2: the bandwidth of each stream is controlled by PSPacer to 256Mbps. PSPacer realizes precise pacing in software. Each stream is within the specified rate, and the streams are very stable.
http://www.gridmpi.org/pspacer-1.0/
[Figure: per-stream bandwidth (Mbps) vs. time (sec) for both cases, showing Txall, TxCH1, TxCH3 and 1sec averages]
GtrcNET-10
Two types of GtrcNET-10 have been developed
GtrcNET-10p2
10GbE (MSA300) x 2 ports with 1GByte / port
GtrcNET-10p3
10GbE (XENPAK) x 3 ports with 1GByte / port
Shown on the desk
Architecture of GtrcNET-10p3
[Block diagram: an FPGA (XC2VP100) with three DDR333 SO-DIMMs (1GByte each, 64bit x 333MHz); three 10GbE MACs connected to XENPAK transceivers over 4bit x 3.125GHz links; plus System ACE/CF, MICTOR, USB2.0 and GPS interfaces]
Currently implemented functions of GtrcNET-10
Delay Emulation (up to 800ms)
Precise Bandwidth Measurement (1ms interval)
Port Replication
Output Rate Control with Pacing (64Kbps - 10Gbps)
Random Frame Loss (min-rate 4.7E-10; see the sketch after this list)
Buffer Size Control (1KB - 1GB)
All the functions currently implemented on GtrcNET-1 will be implemented soon
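The quoted minimum loss rate of 4.7E-10 is very close to 2^-31, which suggests a drop decision made by comparing a 31-bit pseudo-random value against a programmable threshold. That mechanism is my assumption, not something stated in the slides; a minimal sketch of such threshold-based loss:

```python
import random

# Sketch of threshold-based random frame loss (assumed mechanism, not
# the documented GtrcNET-10 design). With a 31-bit comparison, the
# achievable loss rates are multiples of 2**-31 ~= 4.66e-10, which
# matches the quoted min-rate of ~4.7E-10.

LOSS_BITS = 31

def make_dropper(loss_rate):
    """Return a per-frame drop decision for the given loss rate."""
    threshold = round(loss_rate * 2**LOSS_BITS)   # programmable threshold
    def should_drop(_frame):
        return random.getrandbits(LOSS_BITS) < threshold
    return should_drop

drop = make_dropper(1e-4)   # drop roughly one frame in ten thousand
lost = sum(drop(None) for _ in range(1_000_000))
print(f"dropped {lost} of 1M frames")   # expect around 100
```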
Real Network Measurement using GtrcNET-10
Two 8-PC clusters at Tsukuba and Akihabara (GbE x8 each), connected by switches over a 10GbE JGN II link, 60km apart (RTT 1.36ms)
GtrcNET-10 measures bandwidth every 100ms
[Figure: NPB class B over JGN II; transmit bandwidth (Gbps) vs. time (sec) for BT, CG, EP, FT, IS, LU, MG and SP]
Network Emulation using GtrcNET-10
Two 8-PC clusters (GbE) connected through switches and GtrcNET-10 over 10GbE
Emulate a network with 1.36ms RTT
Measure bandwidth every 100ms
[Figure: NPB class B under emulation; transmit bandwidth (Gbps) vs. time (sec) for BT, CG, EP, FT, IS, LU, MG and SP]
PSPacer
A quite accurate software pacing mechanism
Works on Linux
A classful queuing discipline for tc
Effective for TCP transfers over long fat pipes
Can be used for per-flow traffic engineering
Pacing
Pacing has been proposed to avoid burstiness of TCP traffic over long fat networks
The sender adjusts the Inter Packet Gap (IPG) to smooth the traffic bandwidth (a worked example of the IPG arithmetic follows this slide)
Without pacing: bursty traffic occurs during an RTT
With pacing: packets are spread out evenly over the RTT, separated by the IPG
[Diagram: packet timing over one RTT, with and without pacing]
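The relationship between target rate and IPG follows directly from the packet size: to send packets of P bytes at rate R on a link of capacity C, each packet must occupy C/R times its own slot on the wire, so the gap is P(C/R - 1) bytes. A small sketch of this arithmetic (the function name is mine; per-frame preamble and minimum inter-frame gap are deliberately ignored):

```python
# Sketch: the IPG needed to pace a stream down to a target rate.
# On a link of capacity C, a packet of P bytes followed by a gap of G
# bytes occupies (P + G) byte-times, so the achieved rate is C*P/(P+G).
# Ethernet preamble and mandatory inter-frame gap are not modeled.

def ipg_bytes(packet_size, target_rate, link_rate=1e9):
    """Gap in bytes between packets to hit target_rate on link_rate."""
    return packet_size * (link_rate / target_rate - 1)

# Pace 1500-byte packets to 500 Mbps on GbE: a 1500-byte gap halves
# the rate, because every packet is followed by an equal-sized gap.
print(ipg_bytes(1500, 500e6))   # -> 1500.0
```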
Pacing (cont.)
Hardware approach
Use special hardware to realize precise pacing
We have evaluated the effects of pacing using hardware (GtrcNET) in the SC2003 Bandwidth Challenge.
Bandwidth Challenge with pacing
With pacing, the total bandwidth is quite stable, and the bandwidth utilization is high
More than 95% efficiency
[Figure: two zoomed views of total transmit bandwidth (Mbps, 0 - 3000) over roughly 3-second windows]
Pacing (cont.)
Software approach
Software-based pacing mechanisms use a software timer to adjust the IPG
This approach has some problems: coarse resolution (1-10ms), fluctuation, and increased system load
A precise software pacing mechanism is needed
Gap Packet: Virtual Inter Packet Gap
A dummy packet (gap packet) is inserted between
real packets to control the IPG
Without pacing: bursty traffic occurs during an RTT
With pacing: we insert a gap packet between real packets
[Diagram: packet streams from senders A and B over one RTT, with gap packets interleaved between the real packets]
Gap Packet (cont.)
A gap packet should be an actual packet which is transmitted from a network interface
A gap packet should not propagate beyond switches or routers
Sender: transmits real packets and gap packets
Switch: gap packets are discarded at the input port
Result: the interval (IPG) between real packets is preserved
Gap Packet Format
We employ a PAUSE packet as a gap packet
The gap packet size is set to the required gap size in byte units (at GbE, one byte corresponds to 8ns)
Pause time = 0
Frame layout: MAC header | MAC control EtherType (88 08) | MAC opcode (00 01) | pause time = 0 | padding (variable); the total gap packet size equals the IPG
IEEE 802.3x flow control: if a host receives a PAUSE packet, it suspends packet transmission until the pause time expires. With pause time = 0, the gap packet itself pauses nothing. (A construction sketch follows this slide.)
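To make the layout concrete, here is a minimal sketch that builds such a gap packet in Python. The 01-80-C2-00-00-01 destination is the standard MAC-control multicast address; the source address is a placeholder, and the CRC appended by the NIC is not modeled:

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")   # MAC control multicast address
SRC_MAC   = bytes.fromhex("020000000001")   # placeholder source address

def build_gap_packet(gap_bytes):
    """Build a PAUSE frame padded so the whole frame is gap_bytes long."""
    header = PAUSE_DST + SRC_MAC + struct.pack("!H", 0x8808)  # EtherType
    body = struct.pack("!HH", 0x0001, 0)     # opcode=PAUSE, pause time=0
    frame = header + body
    if gap_bytes < len(frame):
        raise ValueError("gap smaller than a minimal PAUSE frame")
    return frame + b"\x00" * (gap_bytes - len(frame))  # variable padding

pkt = build_gap_packet(1500)   # a 1500-byte virtual gap
print(len(pkt), pkt[:20].hex())
```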
Effects of Gap Packets
Bandwidth while varying the IPG (Packet size = 1500B)
We can transmit real packets to meet the target rate accurately
[Figure: theoretical vs. actual bandwidth (MB/s) as the inter packet gap varies from 0 to 12000 bytes; the measured bandwidth tracks the theoretical curve]
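The theoretical curve in that figure should follow from the ratio of real bytes to total bytes on the wire. A sketch of that formula, ignoring per-frame preamble and inter-frame gap overhead (an approximation I am making, not a statement from the slides):

```python
# Sketch: theoretical paced bandwidth as a function of the IPG,
# approximating the wire as carrying P real bytes per (P + IPG) bytes.

LINK_MBPS = 125.0   # GbE capacity in MB/s (approximate, overhead ignored)

def paced_bandwidth(packet_size, ipg):
    """Theoretical bandwidth in MB/s for a given packet size and IPG."""
    return LINK_MBPS * packet_size / (packet_size + ipg)

for ipg in (0, 1500, 6000, 12000):
    print(f"IPG={ipg:5d}B -> {paced_bandwidth(1500, ipg):6.1f} MB/s")
# IPG=0 gives the full link rate; IPG=1500B halves it, and so on,
# reproducing the shape of the curve in the figure above.
```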
Evaluation using Emulated WAN
Two streams share a single GigE bottleneck link
Scalable TCP
Execute iperf with pacing and without pacing, alternately.
Emulated WAN: RTT 200ms, BW 125 MB/s
[Setup: iperf client and server hosts connected through Catalyst 3750 switches, with GtrcNET-1 emulating the WAN and performing microscopic BW measurement]
[Figure: bandwidth of Stream 1, Stream 2 and Total over ~50s intervals, alternating without and with pacing, against the 125MB/s link capacity]
Network scheduling and the G-lambda project
Co-scheduling of computing and network resources
Advance reservation
A network service interface (i.e. an interface to reserve network resources) is being defined in the G-lambda project
Grid
A Grid provides a single system image to users by virtualizing service infrastructure such as computing, data and network resources from multiple domains.
Users do not need to care about the actual resources they are using. Grid middleware (such as planners, brokers and schedulers) coordinates resources and provides the virtual infrastructure.
[Diagram: Grid middleware virtualizing computers, sensor nets, data archives and software catalogs for the user]
Network service for Grid
To realize such a virtual infrastructure for the Grid, resource management is one of the key issues.
Grid middleware should allocate appropriate resources, including network resources, according to the user's request.
The network resource manager should provide a resource management service to the Grid middleware.
Network Service
A standard open interface between Grid middleware and the network resource manager is required, but has not yet been established.
Requirements for the network service interface
Web Service
The Grid is being built based on Web Services technology
The network service should be provided as a "Web Service".
SLA support
Bandwidth, latency, etc.
Advance reservation
Reserve bandwidth
G-lambda project overview
The goal of this project is to establish a standard web services interface (GNS-WSI) between the Grid resource manager and network resource managers provided by network operators.
The G-lambda project started in December 2004.
It is a joint project of KDDI R&D Labs., NTT, and AIST.
We have defined a preliminary interface and, in cooperation with NICT, conducted an experiment using a JGN II GMPLS-based network test-bed.
Live demonstrations at iGrid2005 and SC|05
System overview
[Diagram: a Grid application sends a request through a Grid Portal to the Grid Resource Scheduler (GRS) via WSRF; the GRS talks to Computing Resource Managers for the cluster computers and, through GNS-WSI, to the Network Resource Management System (NRM), which drives the GMPLS network via a network control I/F]
Grid Resource Scheduler (GRS)
A Grid scheduler developed by AIST
Implemented using GT4 (Globus Toolkit 4)
According to users' requests, it reserves computing and network resources (lambda paths) in advance
Accepts requests which specify the required # of clusters, # of CPUs at each cluster, and the bandwidth between clusters
The GRS selects appropriate clusters by interworking with the NRM and multiple CRMs (Computing Resource Managers); a sketch of this co-allocation follows.
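A minimal sketch of that co-allocation loop, under my own simplifying assumptions: the CRM/NRM classes and method names are hypothetical toys, the time dimension of advance reservation is ignored for brevity, and the real GRS logic is certainly more involved.

```python
# Sketch of the GRS co-allocation idea: find two clusters whose CRMs can
# supply the requested CPUs and whose NRM can supply a lambda path of the
# requested bandwidth between them; reserve both, rolling back on failure.
from itertools import combinations

class CRM:                                  # toy computing resource manager
    def __init__(self, site, free_cpus):
        self.site, self.free_cpus = site, free_cpus
    def reserve(self, n):
        if self.free_cpus < n:
            return None
        self.free_cpus -= n
        return (self.site, n)               # opaque reservation handle
    def cancel(self, handle):
        self.free_cpus += handle[1]

class NRM:                                  # toy network resource manager
    def __init__(self, links):              # {(a, b): free Mbps}
        self.links = links
    def reserve_path(self, a, b, mbps):
        key = tuple(sorted((a, b)))
        if self.links.get(key, 0) < mbps:
            return None
        self.links[key] -= mbps
        return (key, mbps)

def co_reserve(crms, nrm, n_cpus, mbps):
    for x, y in combinations(crms, 2):
        rx, ry = x.reserve(n_cpus), y.reserve(n_cpus)
        if rx and ry:
            path = nrm.reserve_path(x.site, y.site, mbps)
            if path:
                return rx, ry, path         # co-allocation succeeded
        for crm, r in ((x, rx), (y, ry)):   # roll back partial holds
            if r:
                crm.cancel(r)
    raise RuntimeError("no feasible cluster pair / path combination")

crms = [CRM("TKB", 16), CRM("AKB", 8), CRM("KMF", 32)]
nrm = NRM({("AKB", "TKB"): 1000, ("KMF", "TKB"): 2000})
print(co_reserve(crms, nrm, n_cpus=16, mbps=1000))
```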
Network Resource Management System (NRM)
The current implementation was developed by KDDI R&D Labs.
Responds to requests from the GRS through GNS-WSI
Hides the detailed path implementation and provides a path between end points (path virtualization)
Schedules and manages lambda paths; when the reserved time arrives, activates the paths using the GMPLS protocol
GNS-WSI (Grid Network Service / Web Services Interface)
Web services interface between the GRS and the NRM
KDDI R&D Labs, NTT and AIST are working together to define the specification of the interface.
Standardization
A preliminary interface has been defined (a client-side sketch follows this list):
Polling-based operations
Advance reservation of a path between end points
Modification of a reservation (i.e. reservation time or duration)
Query of reservation status
Cancellation of a reservation
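The actual GNS-WSI message formats are not given in these slides, so the following is only a shape sketch of a polling-based reservation client; every class, method and field name here is my invention, not the defined interface:

```python
# Shape sketch of a polling-based reservation client in the spirit of
# GNS-WSI. The operations mirror the list above; names are hypothetical.
import time

class PathReservationClient:
    def __init__(self, transport):
        self.transport = transport          # e.g. a SOAP/WSRF binding

    def reserve(self, src, dst, mbps, start, duration):
        """Request an advance reservation; returns a reservation id."""
        return self.transport.call("reserve", src=src, dst=dst,
                                   bandwidth=mbps, start=start,
                                   duration=duration)

    def modify(self, rsv_id, start=None, duration=None):
        return self.transport.call("modify", id=rsv_id,
                                   start=start, duration=duration)

    def status(self, rsv_id):
        return self.transport.call("query", id=rsv_id)

    def cancel(self, rsv_id):
        return self.transport.call("cancel", id=rsv_id)

    def wait_until_confirmed(self, rsv_id, poll_s=5.0):
        """Polling-based: the client asks repeatedly; no server callback."""
        while True:
            state = self.status(rsv_id)
            if state in ("CONFIRMED", "FAILED"):
                return state
            time.sleep(poll_s)
```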
Overview of Demonstration
① A user requests service via a GUI, specifying the required number of computers and the network bandwidth needed.
② The computing resources and GMPLS network resources are reserved as the result of interworking between the GRS and the NRM using GNS-WSI (Grid Network Service / Web Services Interface).
③ A molecular dynamics simulation is executed using the reserved computers and lambda paths. Ninf-G2 and Globus Toolkit 2 (GT2) are used at each cluster.
[Diagram: the GRS (GT4, WSRF) coordinating CRMs and the NRM; GMPLS routers and optical cross-connects link clusters at JGN II Kanazawa, JGN II Fukuoka, KDDI Labs. Kamifukuoka, AIST Tsukuba, JGN II Osaka Research Center and AIST Akihabara over Gigabit Ethernet (x n streams)]
[Map: the six demonstration sites (KAN, TKB, KMF, AKB, FUK, OSA) with inter-site distances ranging from roughly 40 to 410 miles]
Demo Environment
[Diagram: GMPLS routers and optical cross-connects at JGN II Fukuoka (FUK), JGN II Kanazawa (KAN), KDDI Kamifukuoka (KMF), AIST Tsukuba (TKB), JGN II Osaka (OSA) and AIST Akihabara (AKB), with clusters of 2 to 32 processors joined by lambda paths (GbE)]
Clusters distributed over six locations in Japan are connected over a GMPLS network test-bed deployed by JGN II
Overview of the Demo Application
A molecular dynamics simulation implemented with a Grid middleware called Ninf-G2, developed by AIST, Japan
Ninf-G2 conforms to the GridRPC API, a Global Grid Forum standard programming API for Grids
It uses Globus Toolkit 2 for job invocation and communication
Demonstration replay
Thank you
GtrcNET:
Yuetsu Kodama
PSPacer:
Ryousei Takano, Yuetsu Kodama, Motohiko Matsuda, Yutaka Ishikawa
G-lambda:
Hidemoto Nakada, Atsuko Takefusa, Yoshio Tanaka, Fumihiro Okazaki, Satoshi Sekiguchi (AIST)
Masatoshi Suzuki, Hideaki Tanaka, Tomohiro Otani, Munefumi Tsurusawa, Michiaki Hayashi, Takahiro Miyamoto (KDDI R&D Labs.)
Akira Hirano, Yasunori Sameshima, Wataru Imajuku, Takuya Ohara, Yukio Tsukishima, Atsushi Taniguchi, Masahiko Jinno, Yoshihiro Takigawa (NTT)
Shuichi Okamoto, Shinji Shimojo (NICT)
For more information:
GtrcNET: http://gtrc.aist.go.jp/gnet/
PSPacer: http://www.gridmpi.org/
G-lambda: http://www.g-lambda.net/