TCP - Rudra Dutta


TCP - Transport in the Internet
Rudra Dutta
ECE/CSC 570 - Fall 2010, Section 001, 601
Transport Layer

First end-to-end layer
Functions
– Endpoint abstraction
– Multiplexing on a host
– Context establishment
Enhancements
– Reliability (ARQ)
– Flow Control (ARQ)
– Other services?
Copyright Rudra Dutta, NCSU, Fall 2010
Communication Endpoints
End-to-End Transport

Peer layer is only at the remote host
– Point-to-point cooperation, like DLC
– Hence DLC mechanisms like ARQ can be applied
  • For flow control
  • For reliability (error control - retransmission)
– However, the challenge is more complex
  • Multiple applications at each endpoint
  • Network may lose, reorder, or duplicate packets
– Need:
  • Context
  • Explicit addressing, both for destination host and endpoint
Endpoint Access

Transport software built in two parts
– Host specific part - multiplexes network layer, global context
– Application specific part - maintains flow state and provides network
Also provides access point for higher layers (sockets in TCP)
Transmission Control Protocol

Transport layer of the Internet
Complements the weaknesses of IP
End-to-end goals
– Ordered delivery
– Guaranteed delivery
Drawback - time-related metrics may suffer as a result
Refinements
– Flow control
– Congestion control
Example of how new functionality is introduced where it is easiest (not necessarily where it is most logical)
TCP Overview

Transmission Control Protocol (RFCs 793, 1122, 1323)
Segment: unit of transfer, usually contained in a single IP datagram
Reliability achieved using
– Acknowledgments (therefore, seq. no.s)
– Timeouts, retransmissions
– Checksums on header and data
Sliding window mechanism for
– efficient transmission
– flow control
– congestion control
Important: several flavors (largely interoperable)
Properties of TCP Reliable Service

Point-to-point
Stream orientation: TCP thinks of data as a stream of bytes
Unstructured stream: TCP does not honor structured data
Connection orientation: state maintained at both ends
Buffered transfer: the software divides data stream into segments independent of application program transfers
Full duplex connection: concurrent data transfer in both directions
TCP Connections and Endpoints

TCP uses destination port to identify ultimate destination
TCP uses connection as fundamental abstraction
– connection: a pair of endpoints
– endpoint: (host_IP_address, port #) (socket)
TCP ports
– a port # does not correspond to a single object
– a TCP port number can be shared by multiple connections on the same machine (remote endpoints differ)
Sockets
– A “pipe” primitive whose endpoints correspond to communication endpoints
– Application programmer’s interface to TCP functionality
  • Source code: “Open a TCP socket to communicate to 152.1.226.10, port 80”
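The endpoint and connection ideas above can be sketched with Python's standard socket module. This is only an illustration: to stay self-contained it connects to a listener on the loopback address instead of 152.1.226.10, and the helper name open_connections is hypothetical.

```python
import socket

def open_connections(n=2):
    """Open n loopback TCP connections to one listening port; return endpoint pairs."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    listener.listen(n)
    server_endpoint = listener.getsockname()   # (host_IP_address, port #)

    clients, accepted, pairs = [], [], []
    for _ in range(n):
        clients.append(socket.create_connection(server_endpoint))
        conn, _addr = listener.accept()
        accepted.append(conn)
        # A connection is identified by the pair (local endpoint, remote endpoint).
        pairs.append((conn.getsockname(), conn.getpeername()))

    for sock in clients + accepted + [listener]:
        sock.close()
    return server_endpoint, pairs

server_endpoint, pairs = open_connections()
for local, remote in pairs:
    print("connection:", local, "<->", remote)
```

Both connections share the same local port on the server side; only the remote endpoints differ, which is what lets one port number serve several connections.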
TCP Header Format
TCP Header Fields

Sequence number
– every byte in data stream is numbered
– sequence number = number of the first data byte in the segment in the sender's byte stream
– wraps around after 2^32 - 1
ACK number
– next sequence number the sender of the acknowledgment expects to receive
– i.e., sequence number + 1 of last successfully received consecutive data byte
– valid only when ACK flag = 1
Header length
– length of header (in bytes) / 4
– With maximum value of 15 (2^4 - 1), header cannot exceed 60 bytes
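A small sketch of how these fields can be read from the fixed 20-byte part of a TCP header, assuming the standard field layout (ports, sequence number, ACK number, header length and flags, window, checksum, urgent pointer). The function names parse_tcp_header and next_seq are hypothetical.

```python
import struct

def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (options, if any, are ignored)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, _urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    header_words = off_flags >> 12          # 4-bit header length, in 32-bit words
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,                          # meaningful only if ack_flag is set
        "header_bytes": header_words * 4,    # at most 15 * 4 = 60
        "ack_flag": bool(off_flags & 0x010),
        "syn_flag": bool(off_flags & 0x002),
        "fin_flag": bool(off_flags & 0x001),
        "window": window,
        "checksum": checksum,
    }

def next_seq(seq: int, payload_len: int) -> int:
    """Sequence number of the byte after this segment, with 32-bit wrap-around."""
    return (seq + payload_len) % 2**32
```

For example, a segment carrying 20 bytes that starts at sequence number 2^32 - 10 is followed by sequence number 10.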
TCP Header Fields

Window size
– number of bytes (starting with the one specified in the ACK field) that receiver is willing to accept
– 16 bits long; max value is 65535
– used for flow control
Checksum
– Used to detect corruption of header and payload
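The checksum arithmetic can be sketched as the 16-bit ones' complement sum described in RFC 1071. This sketch covers only the arithmetic over a byte string; real TCP also includes a pseudo-header with the IP addresses, which is left out here. The function name internet_checksum is an illustrative choice.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

segment = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(segment)
# A receiver that checksums the data together with the transmitted checksum gets 0.
ok = internet_checksum(segment + csum.to_bytes(2, "big")) == 0
```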
Connection Establishment

Naive approach:
– Connection Request message
– Connection Accepted message
– Sequence numbers always start at 0
Problem: delayed duplicates
TCP approach
– Use long sequence numbers
– Choose a different Initial Sequence Number for each connection, ignore duplicate requests
– Estimate the time a duplicate packet may “live” in the network, and avoid sending for that long after booting up
Three-Way Handshake
Three-Way Handshake (cont'd)
Three-Way Handshake (cont'd)
TCP Connection Termination
1. A sender closes its part of the connection by sending a FIN segment
2. After ACKing the FIN, the receiver can still send data on its part of the connection (half-close)
3. Finally, the receiver closes its part of the connection by sending a FIN segment (graceful close)
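The three termination steps can be tried out over a loopback connection with Python's socket module, where shutdown(SHUT_WR) closes only the sending direction. This is a minimal sketch; the function name half_close_demo and the payloads are made up for illustration.

```python
import socket

def read_until_eof(sock):
    """Read until the peer's FIN shows up as an empty recv()."""
    data = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return data
        data += chunk

def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _addr = listener.accept()

    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)        # step 1: client sends FIN for its direction

    received = read_until_eof(server)      # server sees the data, then end of stream
    server.sendall(b"late reply")          # step 2: the other direction is still open
    server.close()                         # step 3: server sends its own FIN

    reply = read_until_eof(client)
    client.close()
    listener.close()
    return received, reply

received, reply = half_close_demo()
```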
TCP Connection Termination (cont'd)
TCP Connection States
[Figure: TCP state transition diagram. States: CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK]
TCP: Normal Client Open and Close
[Figure: TCP state transition diagram for a normal client open and close]
TCP: Normal Server Open and Close
[Figure: TCP state transition diagram for a normal server open and close]
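The usual client and server paths through these states can be written down as a transition table. This is a sketch based on the standard TCP state machine (RFC 793); the event names are illustrative labels, and it assumes the client performs both the active open and the active close.

```python
# (state, event) -> next state
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",         # send SYN
    ("CLOSED", "passive_open"): "LISTEN",
    ("LISTEN", "recv_syn"): "SYN_RCVD",            # send SYN+ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",   # send ACK
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",        # send FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",     # send ACK
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",         # both sides closed at once
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",       # send ACK
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("TIME_WAIT", "timeout"): "CLOSED",
    ("CLOSE_WAIT", "close"): "LAST_ACK",           # send FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}

def walk(events, state="CLOSED"):
    """Return the list of states visited while applying the events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path

CLIENT_EVENTS = ["active_open", "recv_syn_ack", "close",
                 "recv_ack", "recv_fin", "timeout"]
SERVER_EVENTS = ["passive_open", "recv_syn", "recv_ack",
                 "recv_fin", "close", "recv_ack"]

print(" -> ".join(walk(CLIENT_EVENTS)))
print(" -> ".join(walk(SERVER_EVENTS)))
```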
Data Transfer in TCP

Data received from an application usually sent in segments of size MSS (Maximum Segment Size)
ACKs carry the sequence number of the next byte receiver expects to receive - this is a cumulative count
Unacknowledged data = data sent by sender, but not yet acknowledged by the receiver
– i.e., data not yet received, or acknowledgment not yet received
Sender is allowed to send a certain amount of unacknowledged data
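A sketch of how a receiver could derive the cumulative ACK number as segments arrive, possibly out of order. It assumes segments do not partially overlap and ignores sequence number wrap-around; the function name cumulative_acks is hypothetical.

```python
def cumulative_acks(initial_seq, segments):
    """segments: list of (seq, length) in arrival order.
    Returns the ACK number the receiver would send after each arrival."""
    next_expected = initial_seq
    buffered = {}                      # out-of-order segments: seq -> length
    acks = []
    for seq, length in segments:
        if seq >= next_expected:       # older data is a duplicate and is ignored
            buffered[seq] = length
        # Advance over every segment that is now contiguous with the stream.
        while next_expected in buffered:
            next_expected += buffered.pop(next_expected)
        acks.append(next_expected)
    return acks

# Segment 1500 arrives after segment 2000: the ACK stays at 1500, then jumps.
acks = cumulative_acks(1000, [(1000, 500), (2000, 500), (1500, 500)])
```

Here acks is [1500, 1500, 2500]: the repeated 1500 is a duplicate ACK, and the final value acknowledges both buffered segments at once.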
TCP Windowing Mechanism
[Figure: sender/receiver timing diagram with axes “Distance” and “Time”]
Like GBN, but some differences:
Buffered data is bytes, not segments
Size of buffer does not solely drive maximum outstanding data
– Each ACK carries a window advertisement from receiver
– window = how many additional bytes (after last ACK’d byte) the receiver is prepared to accept
– After this, the sender must stop and wait for an acknowledgment, even if buffer is available
  • Naturally, if buffer is unavailable then this dominates
Allows adjustment to “window size” to approach “full window”
– For TCP, “propagation delay” is through entire network
Sliding Windows

TCP sliding window mechanism allows multiple segments to be sent before an ACK is returned
Left boundary of window = earliest unacknowledged byte
– An acknowledgment advances this left boundary
Right boundary of window = latest byte that can be sent (and may have been)
– An updated window advertisement advances this right boundary
– Subtle consequence: window can shrink
Sliding Window
Dynamic Sliding Window - Shrinking
[Figure: sliding window shrinking example]
Bytes 9-13 are “canceled retroactively”
– Later, when window slides right, these bytes treated by sender as if never sent
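A sketch of the sender-side window bookkeeping, including the shrinking case. The class name SendWindow and the numbers are made up (chosen so that bytes 9-13 fall outside the shrunken window); they are not taken from the missing figure. The right boundary is kept as an exclusive bound for convenience.

```python
from dataclasses import dataclass

@dataclass
class SendWindow:
    left: int          # earliest unacknowledged byte
    right: int         # first byte beyond the window (exclusive bound)
    next_to_send: int  # first byte not yet sent

    def on_ack(self, ack_no: int, advertised: int) -> None:
        if ack_no < self.left:
            return                             # stale ACK, ignore
        self.left = ack_no                     # ACK advances the left boundary
        self.right = ack_no + advertised       # advertisement sets the right boundary
        # Bytes already sent beyond a shrunken right boundary are treated
        # as if never sent, so they are sent again once the window reopens.
        self.next_to_send = max(self.left, min(self.next_to_send, self.right))

    def usable(self) -> int:
        """Bytes the sender may still transmit right now."""
        return self.right - self.next_to_send

w = SendWindow(left=4, right=14, next_to_send=14)   # bytes 4..13 all sent
w.on_ack(ack_no=5, advertised=4)    # window is now bytes 5..8: it has shrunk
shrunk = (w.left, w.right, w.next_to_send, w.usable())
w.on_ack(ack_no=9, advertised=10)   # window slides right to bytes 9..18
reopened = (w.left, w.right, w.next_to_send, w.usable())
```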
TCP Window Management Example
Performance Issues

Small packets, and small window advertisements, create efficiency problems
– “Silly Window” Syndrome
These can be solved by
– Delaying sending of data
  • sender “voluntarily” consolidates multiple small packets into a single larger packet
– Delaying sending of ACKs/window advertisements
  • sender “strongly encouraged” to consolidate multiple small packets into a single larger packet
TCP “probes” the network for bandwidth (later)
– Attempts to adjust throughput to available b/w
– This involves latency, and embeds assumptions
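The first remedy above (delaying the sending of data) is commonly realized by Nagle's algorithm (RFC 896), which the slide does not name. A simplified sketch follows, assuming a single ACK covers all outstanding data; the class name NagleSender is hypothetical.

```python
class NagleSender:
    """Hold back small writes while earlier data is still unacknowledged."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""
        self.in_flight = False     # is any sent data still unacknowledged?
        self.sent = []             # segments handed to the network, in order

    def _try_send(self) -> None:
        # Send if a full segment is ready, or if nothing is in flight.
        while self.buffer and (len(self.buffer) >= self.mss or not self.in_flight):
            segment, self.buffer = self.buffer[:self.mss], self.buffer[self.mss:]
            self.sent.append(segment)
            self.in_flight = True

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        self.in_flight = False     # simplification: the ACK covers everything sent
        self._try_send()

s = NagleSender(mss=8)
for byte in b"hello":
    s.write(bytes([byte]))         # five one-byte writes from the application
s.on_ack()
```

After this run, s.sent holds two segments, b"h" and b"ello", instead of five one-byte segments.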
TCP Error Control
1. Error detection
– checksum: to check for corrupted segments at destination
– ACK: to confirm receipt of segment by destination
– time-out: one retransmission timer for each segment sent
2. Error correction – no FEC
– source retransmits segments for which retransmission timer expired
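The per-segment timer rule can be sketched as a simple check over the send times. The function name expired_segments and the millisecond values are illustrative assumptions, and a fixed timeout is used for simplicity.

```python
def expired_segments(sent_at_ms, acked, now_ms, rto_ms):
    """Return sequence numbers of unacknowledged segments whose timer has expired.

    sent_at_ms: dict mapping segment sequence number -> time it was sent (ms)
    acked:      set of sequence numbers already acknowledged
    """
    return sorted(
        seq for seq, sent in sent_at_ms.items()
        if seq not in acked and now_ms - sent >= rto_ms
    )

# Segment 0 was ACKed, segment 1000 has waited 1100 ms, segment 2000 only 300 ms.
to_retransmit = expired_segments({0: 0, 1000: 100, 2000: 900}, {0},
                                 now_ms=1200, rto_ms=1000)
```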
Congestion Control

Congestion - overloaded network
– Applies to part of the network - the “pain point” or “bottleneck”
  • Some link(s) in the network being used by a disproportionately large amount of traffic
  • Some node(s) in the network at the head of many highly loaded links
– Either way, store-and-forward buffer will build up at node and eventually overflow
Congestion is undesirable - wastes network resources for no gain, causes oscillations
– Can try adaptively re-routing traffic, or notifying sources to slow down
TCP Congestion Control
Strictly, not a transport layer issue
• Originally not considered a function of the Internet
• “Congestion collapse” grew more serious in late 1980’s
• TCP was simply the easiest place to attempt to exercise control
• (Read “Why The Internet Only Just Works”, Mark Handley, BT Technology Journal, Vol 24, No 3, July 2006 for a discussion)
1. Uses a closed-loop approach
2. Feedback is implicit (absence of ACKs)
   • Loss is assumed to indicate congestion
3. Adapts rate when congestion is sensed
4. Control mechanism is end-to-end
   – no connection state maintained by the network!
   – “Keep the network stupid” - carried perhaps to unreasonable lengths
Flow vs. Congestion Control
TCP Implicit Feedback

TCP relies on implicit feedback from the network to detect congestion
– Timeout caused by a lost packet (used by TCP-Tahoe, TCP-Reno)
– Duplicate ACK (used by TCP-Reno)
Assumption: packet loss is always due to congested routers, not transmission errors
– As wired transmission technology had grown more sophisticated, reasonable assumption
When congestion is detected, slow down rate
– Mechanism - make window smaller
– smaller window = lower rate, larger window = higher rate
– (For TCP, propagation delay is likely to be very much larger than transmission delay of MSS for long TCP connections)
TCP Dynamic Windows

Window #1: advertised by receiver
– purpose: avoid overrunning a slow receiver (i.e., for flow control)
Window #2: maintained by sender
– purpose: avoid network overload (i.e., for congestion control)
– Called the congestion window, or cwnd
– the sending TCP dynamically manipulates cwnd
“The” window size = MIN(receiver_advertisement, cwnd)
TCP Congestion Control Overview

Probe for available bandwidth by increasing cwnd
– “slow” start = initially window is small
  • But every ACK adds 1 MSS to window
  • exponential increase phase - every RTT doubles window
– Congestion avoidance = linear increase phase
Slow start threshold (= ssthresh)
– controls transition from exponential to linear phase
Upon packet loss (timeout, duplicate acknowledgment), assume congestion
– retransmit the packet
– reduce the window size - reduce rate
– Also re-start probing
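The phases above can be sketched as a per-RTT simulation with cwnd counted in MSS units. The loss reaction used here (set ssthresh to half of cwnd and restart from 1 MSS) is the Tahoe-style rule and is an assumption, since the slide only says the window is reduced. The function name simulate_cwnd and all parameter values are illustrative.

```python
def simulate_cwnd(rtts, ssthresh, loss_rounds, rwnd):
    """Return the effective window, min(rwnd, cwnd), at the start of each RTT."""
    cwnd = 1
    history = []
    for rtt in range(rtts):
        history.append(min(rwnd, cwnd))        # "the" window size
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 2)       # assumed Tahoe-style reaction
            cwnd = 1                           # re-start probing
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)     # slow start: doubles every RTT
        else:
            cwnd += 1                          # congestion avoidance: +1 MSS per RTT
    return history

unlimited = simulate_cwnd(rtts=10, ssthresh=8, loss_rounds={6}, rwnd=64)
limited = simulate_cwnd(rtts=10, ssthresh=8, loss_rounds={6}, rwnd=6)
```

With a large receiver window the result is [1, 2, 4, 8, 9, 10, 11, 1, 2, 4]; with rwnd=6 the receiver advertisement caps it at [1, 2, 4, 6, 6, 6, 6, 1, 2, 4].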
Evolution of TCP's Congestion Window
Problems with Assumptions

TCP embeds assumptions
– Loss indicates congestion
– Available bw changes over time
– (No other network mechanism controls loss or congestion)
– (Bandwidth is low in general – easy to overwhelm)
“Pipe thickness” - product of transmission rate and propagation delay
– Amount of bits required to fill pipe
TCP works well over IP, especially when pipe thickness values are comparatively low
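A short worked example of pipe thickness as defined above. The link rates, the 50 ms delay, and the 1460-byte MSS are assumed values for illustration only.

```python
def pipe_thickness_bits(rate_bps: int, delay_ms: int) -> int:
    """Bits needed to fill the pipe: transmission rate times propagation delay."""
    return rate_bps * delay_ms // 1000

def segments_to_fill(rate_bps: int, delay_ms: int, mss_bytes: int = 1460) -> int:
    """Number of MSS-sized segments needed to fill the pipe (rounded up)."""
    bits = pipe_thickness_bits(rate_bps, delay_ms)
    return -(-bits // (mss_bytes * 8))         # ceiling division

slow_link = segments_to_fill(1_000_000, 50)         # 1 Mb/s, 50 ms
fast_link = segments_to_fill(10_000_000_000, 50)    # 10 Gb/s, 50 ms
```

The 1 Mb/s path is full after 5 segments, while the 10 Gb/s path needs 42809, which is why window probing takes so much longer on thick pipes.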
Pitfalls and Solutions

For high speed optical backbones, TCP takes long time to probe and find b/w
– Solution: change b/w probing mechanism (more aggressive), but need to stay reactive
For lossy wireless medium, TCP mistakes wireless loss for limit of b/w
– Solution: hide wireless hop from TCP (split connection), but provide stable b/w info
For networks that intentionally form bursts, such as Optical Burst Switching, TCP retreats to very low rate
– Solution: balance need to form bursts and remain responsive, and/or hide from TCP
For networks in which links may disconnect for significant time, TCP gives up
– Solution: Change (or remove) congestion semantics from TCP
Summary

Transport layer is logically application’s interface to network
– Must create endpoint abstractions (ports)
– Must maintain state
In the Internet,
– TCP attempts to impose reliability on unreliable network layer
  • Requires sliding window management
– TCP attempts to perform congestion control
  • Slow down transmission rate in response to lost segments