Transcript TCP

TDC561
Network Programming
Week 10: Performance Aspects of End-to-End (Transport) Protocols and API Programming
Camelia Zlatea, PhD
Email: [email protected]
End-to-End (Transport) Protocols
 Underlying best-effort network
– drops messages
– re-orders messages
– delivers duplicate copies of a given message
– limits messages to some finite size
– delivers messages after an arbitrarily long delay
 Common end-to-end services
– guarantee message delivery
– deliver messages in the same order they are sent
– deliver at most one copy of each message
– support arbitrarily large messages
– support synchronization
– allow the receiver to apply flow control to the sender
– support multiple application processes on each host
Simple Demultiplexor (UDP)
 Unreliable and unordered datagram service
 Adds multiplexing
 No flow control
 Endpoints identified by ports
– servers have well-known ports
– see /etc/services on Unix
 Optional checksum
– pseudo header + udp header + data
 Header format
0                16               31
+----------------+----------------+
|     SrcPort    |     DstPort    |
+----------------+----------------+
|     Length     |    Checksum    |
+----------------+----------------+
|              Data               |
+---------------------------------+
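
A minimal sketch of this header as a C struct (the struct and field names are illustrative, not from the slides; all fields travel in network byte order):

#include <stdint.h>

/* UDP header: four 16-bit fields (RFC 768), network byte order. */
struct udp_hdr {
    uint16_t src_port;  /* SrcPort: sending process's port */
    uint16_t dst_port;  /* DstPort: receiving process's port */
    uint16_t length;    /* Length: header plus data, in bytes */
    uint16_t checksum;  /* Checksum: pseudo header + udp header + data */
};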
Simple Demultiplexor (UDP)
[Figure: UDP demultiplexing. UDP demultiplexes arriving packets on the destination port and appends them to per-port queues, one per application process.]
Reliable Byte-Stream (TCP)
 Connection-oriented
 Byte-stream
– sending process writes some number of bytes
– TCP breaks into segments and sends via IP
– receiving process reads some number of bytes
 Full duplex
 Flow control: keep sender from overrunning receiver
 Congestion control: keep sender from overrunning the network
Reliable Byte-Stream (TCP)
[Figure: TCP byte-stream management. The sending application writes bytes into TCP's send buffer; TCP transmits them as segments; the receiving TCP appends arriving bytes to its receive buffer, from which the receiving application reads.]
End-to-End Issues
TCP is based on the sliding window protocol used at the data link level, but the situation is very different.
 Potentially connects many different hosts
– need explicit connection establishment and termination
 Potentially different RTT (Round Trip Time)
– need adaptive timeout mechanism
 Potentially long delay in network
– need to be prepared for arrival of very old packets
 Potentially different capacity at destination
– need to accommodate different amounts of buffering
 Potentially different network capacity
– need to be prepared for network congestion
Segment Format
 Each connection identified with 4-tuple:
– <SrcPort, SrcIPAddr, DstPort, DstIPAddr>
 Sliding window + flow control
– Acknowledgment, SequenceNum, AdvertisedWindow
 Flags: SYN, FIN, RESET, PUSH, URG, ACK
 Checksum: pseudo header + tcp header + data
 Header format

0        4          10       16              31
+--------------------+---------------------+
|       SrcPort      |       DstPort       |
+--------------------+---------------------+
|                SequenceNum               |
+------------------------------------------+
|              Acknowledgment              |
+------+-------+------+--------------------+
|HdrLen|   0   |Flags |  AdvertisedWindow  |
+------+-------+------+--------------------+
|      Checksum      |        UrgPtr       |
+--------------------+---------------------+
|             Options (variable)           |
+------------------------------------------+
|                   Data                   |
+------------------------------------------+

[Figure: the sender transmits Data (SequenceNum); the receiver responds with Acknowledgment + AdvertisedWindow.]
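
A minimal sketch of this header as a C struct (options omitted; the struct and field names are illustrative; all fields travel in network byte order):

#include <stdint.h>

/* TCP header (RFC 793 layout, as in the figure above). */
struct tcp_hdr {
    uint16_t src_port;     /* SrcPort */
    uint16_t dst_port;     /* DstPort */
    uint32_t seq_num;      /* SequenceNum: first data byte in this segment */
    uint32_t ack_num;      /* Acknowledgment: next byte expected by receiver */
    uint16_t hdrlen_flags; /* 4-bit HdrLen, 6 reserved bits, then 6 flag
                              bits: URG, ACK, PUSH, RESET, SYN, FIN */
    uint16_t adv_window;   /* AdvertisedWindow: receiver's remaining buffer */
    uint16_t checksum;     /* pseudo header + tcp header + data */
    uint16_t urg_ptr;      /* UrgPtr: used together with the URG flag */
};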
TCP Sliding Window Revisited
 Relationship between TCP send buffer (a) and receive buffer (b)

[Figure: (a) the send buffer, tracked by LastByteAcked <= LastByteSent <= LastByteWritten; (b) the receive buffer, tracked by LastByteRead < NextByteExpected <= LastByteRcvd.]
 Each byte has a sequence number
 ACKs are cumulative
TCP Sliding Window Revisited
 Sending side
– LastByteAcked <= LastByteSent
– LastByteSent <= LastByteWritten
– bytes between LastByteAcked and LastByteWritten must be
buffered
 Receiving side
– LastByteRead < NextByteExpected
– NextByteExpected <= LastByteRcvd + 1
– bytes between LastByteRead and LastByteRcvd must be buffered
Flow Control
 Sender buffer size: MaxSendBuffer
 Receive buffer size: MaxRcvBuffer
 Receiving side
– LastByteRcvd - NextByteRead <= MaxRcvBuffer
– NextByteExpected <= LastByteRcvd + 1
– AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - NextByteRead)
 Sending side
– LastByteSent - LastByteAcked <= AdvertisedWindow
– EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
– LastByteWritten - LastByteAcked <= MaxSendBuffer
– block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the size of the write
 Always send ACK in response to an arriving data segment
 Persist when AdvertisedWindow=0
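
A minimal sketch of these rules in C (variable and function names are illustrative; 32-bit byte counters are assumed and sequence-number wraparound is ignored):

#include <stdint.h>

/* Receiving side: buffer space left to advertise to the sender. */
uint32_t advertised_window(uint32_t max_rcv_buffer,
                           uint32_t last_byte_rcvd,
                           uint32_t next_byte_read)
{
    return max_rcv_buffer - (last_byte_rcvd - next_byte_read);
}

/* Sending side: how many more bytes may be put on the wire now. */
uint32_t effective_window(uint32_t advertised,
                          uint32_t last_byte_sent,
                          uint32_t last_byte_acked)
{
    uint32_t in_flight = last_byte_sent - last_byte_acked;
    return (advertised > in_flight) ? (advertised - in_flight) : 0;
}

/* Sending side: block an application write of y bytes if it would
   overflow MaxSendBuffer. */
int must_block_sender(uint32_t last_byte_written,
                      uint32_t last_byte_acked,
                      uint32_t y,
                      uint32_t max_send_buffer)
{
    return (last_byte_written - last_byte_acked) + y > max_send_buffer;
}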
Network as a Pipe
[Figure: the network as a pipe; latency is the pipe's length and bandwidth its diameter.]
Keeping the Pipe Full
 TCP Correctness and Performance Aspect
– The sizes of the SequenceNum and AdvertisedWindow fields affect the correctness and performance of TCP
– Wrap Around: 32-bit SequenceNum
– Bandwidth & Time Until Wrap Around (see the arithmetic sketch after the table)
Bandwidth            Time Until Wrap Around
T1 (1.5Mbps)         6.4 hours
Ethernet (10Mbps)    57 minutes
T3 (45Mbps)          13 minutes
FDDI (100Mbps)       6 minutes
STS-3 (155Mbps)      4 minutes
STS-12 (622Mbps)     55 seconds
STS-24 (1.2Gbps)     28 seconds
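
The table entries follow from the 32-bit sequence space: wrap time is 2^32 bytes divided by the link's byte rate. A quick sketch (function name is illustrative):

#include <stdio.h>

/* Seconds until the 32-bit sequence space (2^32 bytes) wraps at a
   given bandwidth in bits per second. */
double wrap_around_seconds(double bandwidth_bps)
{
    return 4294967296.0 / (bandwidth_bps / 8.0);   /* 2^32 bytes */
}

int main(void)
{
    printf("T1:     %.1f hours\n", wrap_around_seconds(1.5e6) / 3600.0); /* ~6.4 */
    printf("STS-12: %.0f seconds\n", wrap_around_seconds(622e6));        /* ~55 */
    return 0;
}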
Keeping the Pipe Full
 TCP Correctness and Performance Aspect
– Bytes in Transit: 16-bit AdvertisedWindow (up to 64KBytes)
– AdvertisedWindow must be large enough to allow the sender to keep the pipe full: the (Delay x Bandwidth) product
– Bandwidth and the (Delay x Bandwidth) product dictate how big AdvertisedWindow needs to be
– Required window size for 100ms RTT (see the arithmetic sketch after the table):
Bandwidth            Delay x Bandwidth Product
T1 (1.5Mbps)         18KB
Ethernet (10Mbps)    122KB
T3 (45Mbps)          549KB
FDDI (100Mbps)       1.2MB
STS-3 (155Mbps)      1.8MB
STS-12 (622Mbps)     7.4MB
STS-24 (1.2Gbps)     14.8MB
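
The table entries are just Bandwidth x RTT converted to bytes. A quick sketch (function name is illustrative):

#include <stdio.h>

/* Delay x Bandwidth product in bytes: the amount of data the sender
   must keep in flight to fill the pipe for one RTT. */
double delay_bandwidth_bytes(double bandwidth_bps, double rtt_s)
{
    return bandwidth_bps * rtt_s / 8.0;
}

int main(void)
{
    /* 100ms RTT, as in the table above. */
    printf("Ethernet: %.0f KB\n", delay_bandwidth_bytes(10e6, 0.1) / 1024.0);           /* ~122KB */
    printf("STS-12:   %.1f MB\n", delay_bandwidth_bytes(622e6, 0.1) / (1024.0*1024.0)); /* ~7.4MB */
    return 0;
}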
 TCP protocol extensions (e.g., window scaling and timestamps) address these limits
Application Programming Interface
 Separate the implementation of protocols from the interface they
export.
 Important at the transport layer since this defines the point where
application programs typically access the network.
 This interface is often called the application programming interface, or
API.
[Figure: the API sits between the application and the transport protocol.]
Notes
 The API is usually defined by the OS.
 Example API: sockets
 Defined by BSD Unix, but ported to other systems
Socket Operations
 Creating a socket
– int socket(int domain, int type, int protocol)
– domain=AF_INET, AF_UNIX
– type=SOCK_STREAM, SOCK_DGRAM
 Passive open on server
int bind(int socket, struct sockaddr *address, int addr_len)
int listen(int socket, int backlog)
int accept(int socket, struct sockaddr *address, int *addr_len)
 Active open on client
int connect(int socket, struct sockaddr *address, int addr_len)
 Sending and receiving messages
int send(int socket, char *message, int msg_len, int flags)
int recv(int socket, char *buffer, int buf_len, int flags)
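
A minimal sketch tying these calls together: a TCP client that performs an active open and sends one message. The server address 127.0.0.1 and port 5432 are assumptions for illustration:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    /* Create a TCP socket in the Internet domain. */
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    /* Fill in the server's address; the values here are examples. */
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(5432);
    inet_pton(AF_INET, "127.0.0.1", &sin.sin_addr);

    /* Active open, then send one message. */
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        perror("connect");
        return 1;
    }
    const char *msg = "hello";
    send(s, msg, strlen(msg), 0);
    close(s);
    return 0;
}

A server would instead bind() a well-known port, listen(), and accept() each incoming connection.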
Performance
Performance Overview
 Bandwidth
– a measure of the width of a frequency band
» a voice-grade telephone line supports a frequency band ranging from 300Hz to 3300Hz; it is said to have a bandwidth of 3000Hz; when given in Hz, bandwidth refers to the range of signals that can be accommodated
– Bandwidth of a communication link = number of bits per second that can be transmitted on that link
» ETH (Ethernet) bandwidth is 10Mbps
» available bandwidth
» Measured bandwidth = number of bits per second that can actually be transmitted on that link, in practice
» Throughput = measured performance of a system
• A pair of nodes connected by a link with a bandwidth of 10Mbps might achieve a throughput of 2Mbps; an application on one host could send data to the other host at 2Mbps
– Bandwidth requirements for an application
» number of bits per second that it needs to transmit over the network to perform acceptably
Latency (Response Time, delay) vs. RTT
Latency = Propagation + Transmit + Queue
– Propagation = Distance / SpeedOfLight
– Transmit = Size / Bandwidth
[Figure: perceived latency (ms, log scale from 1 to 10,000) vs. RTT (10 to 100 ms) for 1-byte, 2-KB, and 1-MB objects over 1.5-Mbps and 10-Mbps links. For small objects perceived latency tracks the RTT; for the 1-MB object the transmit time dominates.]
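
A minimal sketch of the formula behind this plot, ignoring the queueing term (function and parameter names are illustrative):

/* Perceived latency in seconds: one round trip plus the time to push
   the object's bits onto the link; queueing delay is ignored. */
double perceived_latency_s(double rtt_s, double object_bytes,
                           double bandwidth_bps)
{
    double transmit_s = object_bytes * 8.0 / bandwidth_bps;
    return rtt_s + transmit_s;
}

For 1-byte objects the RTT term dominates; for the 1-MB object on the 1.5-Mbps link the transmit term (about 5.6 seconds) dominates.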
Performance Overview
Latency and Bandwidth
[Figure: an 8.4-Mb file crossing the country: (a) on a 1-Mbps link it amounts to 84 pipes full of data, 0.1 Mb per pipe; (b) on a 1-Gbps link it fills only 1/12 of one pipe.]
Performance Overview
 1Mbps and 1Gbps links have the same latency
– limited by the speed of light
 To transfer a 1MB file takes...
– 80 RTTs of 100ms (8 seconds) on a 1Mbps link
– doesn't even fill a 1Gbps link once (12.5MB delay x bandwidth product)
 In other words:
– 1MB file is to 1Gbps network what 1KB packet is to 1Mbps
network
Latency/Bandwidth Tradeoff
 Throughput = TransferSize / TransferTime
– if it takes 10ms to transfer 1MB, then the effective
throughput is 1MB/10ms = 100MBps = 800Mbps
 TransferTime = Latency + 1/Bandwidth x TransferSize
– if network bandwidth is 1Gbps (it takes 1MB/1Gbps = 8ms to transmit 1MB), an end-to-end transfer that requires 1 RTT of 100ms has a total transfer time of 108ms
– effective throughput is 1MB/108ms = 74.1Mbps, not 1Gbps
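
A minimal sketch of this calculation (names are illustrative):

/* Effective throughput for a transfer that pays one RTT of latency
   plus the transmit time of the data. */
double effective_throughput_bps(double transfer_bytes, double rtt_s,
                                double bandwidth_bps)
{
    double transfer_time_s = rtt_s + transfer_bytes * 8.0 / bandwidth_bps;
    return transfer_bytes * 8.0 / transfer_time_s;
}

With transfer_bytes = 1MB, rtt_s = 0.1, and bandwidth_bps = 1Gbps this returns roughly 74Mbps, matching the example above.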
Round-Trip Latency (microseconds)

Message size (bytes)    UDP     TCP
1                        297     365
100                      413     519
200                      572     691
300                      732     853
400                      898    1016
500                     1067    1185
600                     1226    1354
700                     1386    1514
800                     1551    1676
900                     1719    1845
1000                    1878    2015
[Figure: measurement stack: App1 and App2 over TCP and UDP, over IP, ETH, PHY.]
Per-Layer Latency (1-byte latency)
 ETH + wire: 216 microseconds
 UDP/IP: 58 microseconds
Throughput (UDP/IP/ETH)
Throughput improves as messages get larger, up to the point (~16KB) where per-message overhead, amortized over the bytes of each message, becomes insignificant.
It flattens at ~9.5Mbps, below the 10Mbps Ethernet bandwidth.
[Figure: throughput (Mbps) vs. message size, 1KB to 32KB; throughput rises with message size and flattens near 9.5Mbps by about 16KB.]
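
The shape of the curve follows from a simple model: each message pays a fixed per-message processing overhead plus its transmit time. A sketch (the overhead parameter is an assumed, illustrative value, not a measured one):

/* Modeled application throughput for messages of msg_bytes over a
   link of bandwidth_bps with a fixed per-message overhead. */
double model_throughput_bps(double msg_bytes, double per_msg_overhead_s,
                            double bandwidth_bps)
{
    double time_per_msg_s = per_msg_overhead_s + msg_bytes * 8.0 / bandwidth_bps;
    return msg_bytes * 8.0 / time_per_msg_s;
}

As msg_bytes grows, the fixed overhead is amortized away and the result approaches bandwidth_bps, which is why the measured curve flattens.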
 Notes
– transferring a large amount of data helps improve the effective
throughput; in the limit, an infinitely large transfer size causes the
effective throughput to approach the network bandwidth
– having to endure more than one RTT will hurt the effective
throughput for any transfer of finite size, and will be most
noticeable for small transfers
Implications
 Congestion control
– feedback-based mechanisms require an RTT to adjust
– can send 10MB in one 100ms RTT on a 1Gbps network
– that 10MB might congest a router and lead to massive losses
– can lose half a delay x bandwidth product's worth of data during slow start
– reservations work for continuous streams (e.g., video), but require an extra RTT for bulk transfers
 Retransmissions
– retransmitting a packet costs 1 RTT
– dropping even one packet (cell) halves effective bandwidth
– retransmission also implies buffering at the sender
– possible solution: forward error correction (FEC)
 Trading bandwidth for latency
– each RTT is precious
– willing to “waste” bandwidth to save latency
– example: pre-fetching
Host Memory Bottleneck
 Issue
– turning host-to-host bandwidth into application-to-application bandwidth
– have to deliver data across I/O and memory buses into cache and registers
 Memory bandwidth
– I/O bus must keep up with network speed (currently does for STS-12, assuming the peak rate is achievable)
– 114MBps (measured number) is only slightly faster than the I/O bus; can't afford to go across the memory bus twice
– caches are of questionable value (rather small)
– lots of reasons to access buffers
» user/kernel boundary
» certain protocols (reassembly, check-summing)
» network device and its driver
 Same latency/bandwidth problems as high-speed networks
Integrated Services
 High-speed networks have enabled new applications
– they also need “deliver on time” assurances from the network
 Applications that are sensitive to the timeliness of data
are called real-time applications
– voice
– video
– industrial control
 Timeliness guarantees must come from inside the
network
– end-hosts cannot correct late packets like they can correct for
lost packets
 Need more than best-effort
– IETF is standardizing extensions to best-effort model