Reduced Communication Protocol (RCP) for Clusters
Clunix Inc.
Donghyun Kim
2000.9
Introduction
Communication subsystem performance is determined by the following:
• Transmission speed of physical network
• I/O handling capability
• Overheads of the communication protocol
Communication using traditional protocols is the bottleneck of parallel systems
• Myrinet with TCP/IP is not fast
• Small-granularity or communication-dense applications show poor performance
Introduction – cont’d
A high proportion of applications do not need very complicated communication functions
• Shown by both practice and theoretical analysis
Overhead analysis of traditional protocols
Overheads of traditional protocols
• Time for context switching
• Time for data copying (between user space and system space, and between adjacent protocol layers)
• Time for data partitioning, reconstruction, and analysis
• Time for transmitting packet headers
• Time for routing, connection maintenance, traffic control, error detection, recovery, and buffer management
Overhead analysis of traditional protocols - cont’d
Modeling of end-to-end latency L and bandwidth W
• Assumptions: homogeneous, low network traffic
$$L = T(0) \ \text{or} \ T(1), \qquad W = \frac{n_{\max}}{T(n_{\max})} \tag{1}$$
$$T(n) = T_0(n) + 2\sum_{i=1}^{m} \bigl(\tau + T_i(n)\bigr) \tag{2}$$
$T(n)$ : transmission time for n bytes
$n_{\max}$ : maximum packet length of the communication subsystem
$m$ : number of protocol layers
$T_i(n)$ : processing time of the i-th protocol layer
$T_0(n)$ : physical network transmission time
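The model in equations (1) and (2) can be evaluated numerically with a small sketch like the one below. The function names, the callable-based interface, and the use of T(1) as the latency probe are illustrative assumptions, not part of the original slides; the per-layer times must be supplied by the caller.

```python
from typing import Callable, Sequence

# A time function maps a message length in bytes to a time in seconds.
TimeFn = Callable[[int], float]


def transmission_time(n: int, t0: TimeFn, layers: Sequence[TimeFn], tau: float) -> float:
    """Equation (2): T(n) = T0(n) + 2 * sum over layers of (tau + Ti(n))."""
    return t0(n) + 2.0 * sum(tau + t_i(n) for t_i in layers)


def latency(t0: TimeFn, layers: Sequence[TimeFn], tau: float, probe_bytes: int = 1) -> float:
    """Equation (1), latency part: L = T(0) or T(1); probe_bytes selects which."""
    return transmission_time(probe_bytes, t0, layers, tau)


def bandwidth(n_max: int, t0: TimeFn, layers: Sequence[TimeFn], tau: float) -> float:
    """Equation (1), bandwidth part: W = n_max / T(n_max), in bytes per second."""
    return n_max / transmission_time(n_max, t0, layers, tau)
```

For instance, with a hypothetical 10 Mbps link (`t0 = lambda n: n / 1.25e6`), two layers that each cost a constant 100 µs, and a 50 µs context switch, the fixed per-layer costs dominate the latency of a 1-byte message, which is the effect the slides attribute to traditional protocol stacks.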
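Overhead analysis of traditional protocols - cont’d
$$n_i = n_{i+1} + \left( \left\lfloor \frac{n_{i+1}}{\pi_i} \right\rfloor + 1 \right) \rho_i, \qquad 1 \le i \le m \tag{3}$$
$$T_i(n) = \tau_i + \frac{n_{i+1}}{\omega} + \left\lfloor \frac{n_{i+1}}{\pi_i} \right\rfloor T_{i-1}(\pi_i + \rho_i) + T_{i-1}(n_{i+1} \bmod \pi_i) \tag{4}$$
$$T_0(n) = \frac{n}{\omega_0} \tag{5}$$
$\tau$ : context switching time
$\omega$ : memory bandwidth
$\omega_0$ : physical network transmission bandwidth
$\pi_i$ : maximum packet length of the i-th layer
$\rho_i$ : packet header length of the i-th layer
$n_i$ : data length of the i-th layer
$\tau_i$ : calling expense of the i-th layer (routing, traffic control, error detection, buffer management, connection maintenance)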
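To make the header overhead concrete, the sketch below accumulates data lengths layer by layer. It assumes equation (3) reads n_i = n_{i+1} + (floor(n_{i+1} / π_i) + 1) · ρ_i and that the topmost input n_{m+1} is the application payload length n; both are readings of the slide rather than confirmed definitions, and the function and parameter names are hypothetical.

```python
from typing import List, Sequence


def layer_lengths(n: int, max_packet: Sequence[int], header: Sequence[int]) -> List[int]:
    """Return [n_1, ..., n_m] for an application payload of n bytes.

    max_packet[i-1] holds pi_i and header[i-1] holds rho_i (layer 1 is lowest).
    Assumed recurrence: n_i = n_{i+1} + (n_{i+1} // pi_i + 1) * rho_i.
    """
    if len(max_packet) != len(header):
        raise ValueError("max_packet and header must have one entry per layer")
    m = len(max_packet)
    lengths = [0] * m
    upper = n  # n_{m+1}: assumed to be the application payload length
    for i in range(m - 1, -1, -1):  # walk from the top layer down to layer 1
        packets = upper // max_packet[i] + 1  # packet count as written in eq. (3)
        lengths[i] = upper + packets * header[i]
        upper = lengths[i]
    return lengths
```

With illustrative values (a 3000-byte payload, an upper layer with 1480-byte packets and 20-byte headers, a lower layer with 1500-byte packets and 14-byte headers), the payload grows to 3060 bytes at the upper layer and 3102 bytes at the lower one.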
Overhead analysis of traditional protocols - cont’d
Analytical & testing results
Protocol     Analytical             Testing
layer        L (µs)    W (Mbps)     L (µs)    W (Mbps)
TCP          1350      8.5          1450      8.6
UDP          1110      9.5          1150      9.5
DLPI         450       10.0         650       10.0
Conclusions
• Protocol layers above IP add very large overhead
• Memory-to-memory copying cannot be neglected: when the transmission bandwidth is the same as the memory bandwidth, the data-copying term ($n_{i+1}/\omega$) matters even more
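A rough worked illustration of the data-copying point, using the model's symbols (ω for memory bandwidth, ω_0 for physical network bandwidth). It ignores headers, context switches, and calling expense, which is a simplification made here and not in the slides; k, the number of memory-to-memory copies along the path, is a symbol introduced only for this illustration.

```latex
T_0(n) = \frac{n}{\omega_0}, \qquad
T_{\text{copy}}(n) \approx k \cdot \frac{n}{\omega}
\quad\Longrightarrow\quad
W_{\text{eff}} \approx \frac{n}{T_0(n) + T_{\text{copy}}(n)}
= \frac{\omega_0}{1 + k\,\omega_0 / \omega}
```

Under these assumptions, when ω_0 is much smaller than ω the copies cost almost nothing and the effective bandwidth stays near ω_0, but when ω_0 = ω a single copy (k = 1) already halves it.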
Design Strategies for RCP
• Support reliable, synchronous, and asynchronous communications
• Implement reliable broadcast and multicast based directly on the physical layer
• Place the protocol below the IP layer, directly above the physical or data link layer
• Avoid data copying as far as possible
• If possible, avoid buffer management by using hardware buffering
• Run the protocol entirely in user space, in the form of libraries
Implementation of RCP
OSI-DLPI version
• Standard, physical-device-independent data link layer interface: uniform programs can be written for different machines and network devices
Myrinet version
• Provides a user interface similar to the TCP socket
Implementation of RCP – cont’d
RCP supports unicast, broadcast, multicast
RCP addressing
• Unique source/destination using hostname + port number
• Static address configuration
Supports heterogeneous machines
No connection maintenance or error detection
• Assumes that the underlying network is reliable
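The addressing scheme above (hostname + port number, statically configured) could be modeled as a fixed lookup table. This is a minimal sketch under assumptions: the table contents, the `Endpoint` type, and `resolve` are hypothetical names introduced for illustration, and the slides do not say how RCP actually stores its configuration.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (hostname, port number)

# Hypothetical static configuration: hostname -> link-layer (MAC) address.
# A real deployment would presumably load this from a configuration file.
STATIC_HOSTS: Dict[str, bytes] = {
    "node01": bytes.fromhex("020000000001"),
    "node02": bytes.fromhex("020000000002"),
}


def resolve(endpoint: Endpoint) -> Tuple[bytes, int]:
    """Map a (hostname, port) endpoint to (link-layer address, port).

    No discovery protocol is involved: unknown hosts are a configuration
    error and raise KeyError.
    """
    hostname, port = endpoint
    return STATIC_HOSTS[hostname], port
```

Because the table is static, resolving a destination costs a single dictionary lookup and needs no network round trip, which fits the goal of keeping protocol processing small.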
Implementation of RCP – cont’d
Sequencing control, traffic control
• Sliding-window algorithm + selective retransmission
• Window size is adjusted according to retransmission frequency
Fast-Adapt and Slow-Recover algorithm
• Very efficient traffic control
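The slides do not spell out the Fast-Adapt and Slow-Recover rule, so the following is only one plausible reading: shrink the window quickly when the retransmission frequency is high, and grow it back slowly when it is low. The class name, the halving and plus-one steps, and the 10% threshold are all assumptions chosen for illustration.

```python
class AdaptiveWindow:
    """Sliding-window size controller driven by retransmission frequency.

    Assumed policy (not confirmed by the source): halve the window when the
    retransmission rate exceeds a threshold, otherwise grow it by one packet.
    """

    def __init__(self, initial: int = 8, minimum: int = 1,
                 maximum: int = 64, threshold: float = 0.1) -> None:
        self.size = initial
        self.minimum = minimum
        self.maximum = maximum
        self.threshold = threshold

    def update(self, sent: int, retransmitted: int) -> int:
        """Adjust the window after a round of `sent` packets; return the new size."""
        if sent <= 0:
            return self.size  # nothing was sent, so there is nothing to learn
        rate = retransmitted / sent
        if rate > self.threshold:
            self.size = max(self.minimum, self.size // 2)  # adapt fast
        else:
            self.size = min(self.maximum, self.size + 1)   # recover slowly
        return self.size
```

A sender would call `update` once per round and then allow at most `size` unacknowledged packets in flight; selective retransmission itself (resending only the missing sequence numbers) is outside this sketch.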
Data partitioning and packaging algorithm
• Almost no data copying; works in user space
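One way to partition a message with almost no copying is to hand out views into the caller's buffer instead of new byte strings. The sketch below does this with Python's `memoryview`; the header layout (sequence number, fragment index, fragment count) and the function name are hypothetical, since the slides do not describe RCP's packet format.

```python
import struct
from typing import Iterator, Tuple

# Hypothetical header: 32-bit sequence number, 16-bit index, 16-bit count.
HEADER = struct.Struct("!IHH")


def fragment(data: bytes, max_payload: int, seq: int) -> Iterator[Tuple[bytes, memoryview]]:
    """Yield (header, payload) pairs; each payload is a view, not a copy."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    view = memoryview(data)
    count = max(1, -(-len(view) // max_payload))  # ceiling division, at least 1
    for index in range(count):
        start = index * max_payload
        # Slicing a memoryview shares the underlying buffer with `data`.
        yield HEADER.pack(seq, index, count), view[start:start + max_payload]
```

Only the small headers are newly allocated. On platforms that provide it, each pair could be passed to a gather-style send such as `socket.sendmsg([header, payload])`, so the header and payload never need to be joined into one buffer in user space.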
RCP Testing results
[Figure: Bandwidth (W)]
[Figure: Latency (L)]
Conclusions and future issues
RCP design considerations
• How to reduce the overheads: over-complicated protocol processing, context switching, data copying
• How to use the transmission control functions supported by hardware to reduce protocol processing
Future work
• To guarantee the quality of the communication