TCP - Rudra Dutta
TCP - Transport in the Internet
Rudra Dutta
ECE/CSC 570 - Fall 2010, Section 001, 601
Transport Layer
First end-to-end layer
Functions
– Endpoint abstraction
– Multiplexing on a host
– Context establishment
Enhancements
– Reliability (ARQ)
– Flow control (ARQ)
– Other services?
Copyright Rudra Dutta, NCSU, Fall 2010
Communication Endpoints
End-to-End Transport
Peer layer is only at the remote host
– Point-to-point cooperation, like DLC
– Hence DLC mechanisms such as ARQ can be applied
  – For flow control
  – For reliability (error control through retransmission)
However, the challenge is more complex
– Multiple applications at each endpoint
– Network may lose, reorder, or duplicate packets
Need:
– Context
– Explicit addressing, for both the destination host and the endpoint
Endpoint Access
Transport software built in two parts
– Host-specific part: multiplexes the network layer; global context
– Application-specific part: maintains flow state and provides network
Also provides access point for higher layers (sockets in TCP)
Transmission Control Protocol
Transport layer of the Internet
Complements the weaknesses of IP
End-to-end goals
– Ordered delivery
– Guaranteed delivery
Drawback: time-related metrics (such as delay) may suffer as a result
Refinements
– Flow control
– Congestion control
Example of how new functionality is introduced where it is easiest (not necessarily where it is most logical)
TCP Overview
Transmission Control Protocol (RFCs 793, 1122, 1323)
Segment: unit of transfer, usually contained in a single IP datagram
Reliability achieved using
– Acknowledgments (therefore, sequence numbers)
– Timeouts, retransmissions
– Checksums on header and data
Sliding window mechanism for
– efficient transmission
– flow control
– congestion control
Important: several flavors (largely interoperable)
Properties of TCP Reliable Service
Point-to-point
Stream orientation: TCP treats data as a stream of bytes
Unstructured stream: TCP does not honor structure in the data (boundaries between the application's writes are not preserved)
Connection orientation: state maintained at both ends
Buffered transfer: the software divides the data stream into segments independently of the application program's transfers
Full-duplex connection: concurrent data transfer in both directions
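Because TCP delivers an unstructured byte stream, an application that needs message boundaries must add its own framing. A minimal sketch in Python, assuming a 4-byte big-endian length prefix (a common convention, not something TCP defines); `frame` and `Deframer` are hypothetical names:

```python
import struct

HEADER = struct.Struct("!I")  # assumed convention: 4-byte big-endian length prefix


def frame(msg: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its boundary."""
    return HEADER.pack(len(msg)) + msg


class Deframer:
    """Reassembles length-prefixed messages from arbitrarily split stream chunks."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Append newly received bytes; return every message now complete."""
        self.buf += chunk
        messages = []
        while len(self.buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self.buf)
            if len(self.buf) < HEADER.size + length:
                break  # the rest of this message has not arrived yet
            start = HEADER.size
            messages.append(bytes(self.buf[start:start + length]))
            del self.buf[:start + length]
        return messages


# TCP may hand the bytes to the receiver split anywhere, e.g. in 3-byte reads:
stream = frame(b"hello") + frame(b"world!")
d = Deframer()
received = []
for i in range(0, len(stream), 3):
    received += d.feed(stream[i:i + 3])
print(received)  # [b'hello', b'world!']
```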
TCP Connections and Endpoints
TCP uses the destination port to identify the ultimate destination
TCP uses the connection as its fundamental abstraction
– connection: a pair of endpoints
– endpoint: (host_IP_address, port #) (socket)
TCP ports
– a port # does not correspond to a single object
– a TCP port number can be shared by multiple connections on the same machine (remote endpoints differ)
Sockets
– A “pipe” primitive whose endpoints correspond to communication endpoints
– Application programmer’s interface to TCP functionality
– Source code: “Open a TCP socket to communicate to 152.1.226.10, port 80”
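To see that a connection, not a port, is what TCP distinguishes, the sketch below uses Python's standard `socket` module on loopback only: two clients connect to one listening port, and both accepted sockets share the server's local endpoint while their remote endpoints differ. The function name `connection_endpoints` is hypothetical; a client for the slide's example would instead call `socket.create_connection(("152.1.226.10", 80))`.

```python
import socket


def connection_endpoints():
    """Accept two connections on one server port; return the port and, for
    each accepted socket, its (local endpoint, remote endpoint) pair."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen()
    port = server.getsockname()[1]

    # Each client gets its own ephemeral source port from the OS.
    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(2)]
    accepted = [server.accept()[0] for _ in clients]
    pairs = [(s.getsockname(), s.getpeername()) for s in accepted]

    for s in clients + accepted + [server]:
        s.close()
    return port, pairs


port, pairs = connection_endpoints()
for local, remote in pairs:
    print(local, "<->", remote)  # same local endpoint, different remote ports
```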
TCP Header Format
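As a sketch of this layout, the fixed 20-byte part of the header defined in RFC 793 can be unpacked with Python's `struct` module. Only the six original control flags are decoded (later RFCs assigned some reserved bits, e.g. for ECN, which this sketch ignores), options are returned undecoded, and the function and key names are hypothetical.

```python
import struct

# Fixed part of the TCP header: 20 bytes in network byte order.
TCP_FIXED = struct.Struct("!HHIIHHHH")
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(data: bytes) -> dict:
    """Decode the fixed TCP header fields from the start of a segment."""
    if len(data) < TCP_FIXED.size:
        raise ValueError("a TCP header is at least 20 bytes")
    src, dst, seq, ack, off_flags, window, checksum, urgent = TCP_FIXED.unpack_from(data)
    header_len = (off_flags >> 12) * 4  # header length field counts 32-bit words
    if header_len < 20 or header_len > len(data):
        raise ValueError("invalid header length field")
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
        "options": data[20:header_len],  # left undecoded
    }


# A SYN with no options: header length 5 words = 20 bytes.
syn = TCP_FIXED.pack(40000, 80, 1000, 0, (5 << 12) | FLAG_BITS["SYN"], 65535, 0, 0)
print(parse_tcp_header(syn)["flags"])  # {'SYN'}
```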
TCP Header Fields
Sequence number
– every byte in the data stream is numbered
– sequence number = number of the first data byte in the segment in the sender's byte stream
– wraps around after 2^32 - 1
ACK number
– next sequence number the sender of the acknowledgment expects to receive
– i.e., sequence number + 1 of the last successfully received consecutive data byte
– valid only when ACK flag = 1
Header length
– length of header (in bytes) / 4
– With a maximum value of 15 (2^4 - 1), the header cannot exceed 60 bytes
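Because sequence numbers wrap after 2^32 - 1, arithmetic and comparisons on them have to be done modulo 2^32. A minimal Python sketch, assuming that any two numbers being compared are less than 2^31 apart (the same idea as serial-number arithmetic, RFC 1982); all function names are hypothetical:

```python
SEQ_SPACE = 2**32  # sequence numbers are 32 bits and wrap around


def seq_add(seq: int, n: int) -> int:
    """Advance a sequence number by n bytes, wrapping modulo 2**32."""
    return (seq + n) % SEQ_SPACE


def seq_before(a: int, b: int) -> bool:
    """True if a comes before b, assuming the two are less than 2**31 apart."""
    return a != b and (b - a) % SEQ_SPACE < SEQ_SPACE // 2


def ack_for(seq: int, length: int) -> int:
    """ACK number for an in-order segment of `length` bytes starting at seq:
    the number of the next byte the receiver expects."""
    return seq_add(seq, length)


def header_len_bytes(header_len_field: int) -> int:
    """The 4-bit header length field counts 32-bit (4-byte) words."""
    return header_len_field * 4


print(ack_for(2**32 - 10, 20))    # 10: wrapped past 2**32 - 1
print(seq_before(2**32 - 10, 5))  # True, despite the smaller number
print(header_len_bytes(15))       # 60: the largest possible header
```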
TCP Header Fields
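Window size
– number of bytes (starting with the one specified in the ACK field) that the receiver is willing to accept
– 16 bits long; max field value is 65535 (the RFC 1323 window scale option allows larger windows)
– used for flow control
Checksum
– Detects corruption of the header and payload (computed over a pseudo-header of IP-layer fields, the TCP header, and the data)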
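The TCP checksum is the 16-bit one's-complement Internet checksum, computed over a pseudo-header (source and destination IP addresses, protocol number, TCP length), the TCP header, and the data. A Python sketch for IPv4 following the algorithm in RFC 1071; the function names are illustrative:

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP header and data."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))  # zero, protocol 6, TCP length
    return internet_checksum(pseudo + segment)


# Sender: compute with the checksum field (bytes 16-17) zeroed, then fill it in.
seg = bytearray(struct.pack("!HHIIHHHH", 40000, 80, 1, 0, (5 << 12) | 0x18, 1024, 0, 0) + b"hi!")
seg[16:18] = struct.pack("!H", tcp_checksum("10.0.0.1", "10.0.0.2", bytes(seg)))
# Receiver: recomputing over the whole segment yields 0 if nothing was corrupted.
print(tcp_checksum("10.0.0.1", "10.0.0.2", bytes(seg)))  # 0
```

A zero result only means no corruption was detected; some error patterns cancel out in a one's-complement sum.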
Connection Establishment
Naive approach:
– Connection Request message
– Connection Accepted message
– Sequence numbers always start at 0
Problem: delayed duplicates (an old segment still in the network can be mistaken for part of a new connection)
TCP approach
– Use long sequence numbers
– Choose a different Initial Sequence Number for each connection; ignore duplicate requests
– Estimate the time a duplicate packet may “live” in the network, and stay quiet for that long when booting up
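The slide does not say how an Initial Sequence Number is chosen. RFC 793 tied the ISN to a clock incremented roughly every 4 microseconds, and RFC 6528 later recommended adding a keyed hash of the connection identifier so ISNs are hard to predict. A minimal sketch of that combination in Python; the use of SHA-256, the per-boot key, and the function name are assumptions, not anything the slides specify:

```python
import hashlib
import os
import time

SEQ_SPACE = 2**32
SECRET = os.urandom(16)  # assumed: a random key chosen once per boot


def initial_sequence_number(local_ip, local_port, remote_ip, remote_port, now_us=None):
    """ISN = (M + F(connection id, secret)) mod 2**32: M is a clock ticking
    every 4 microseconds, so successive connections start at different points
    in the sequence space, and F is a keyed hash of the connection id."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    m = now_us // 4
    conn_id = f"{local_ip}:{local_port}>{remote_ip}:{remote_port}".encode()
    f = int.from_bytes(hashlib.sha256(SECRET + conn_id).digest()[:4], "big")
    return (m + f) % SEQ_SPACE


a = initial_sequence_number("10.0.0.1", 40000, "10.0.0.2", 80, now_us=0)
b = initial_sequence_number("10.0.0.1", 40000, "10.0.0.2", 80, now_us=4)
print((b - a) % SEQ_SPACE)  # 1: same connection id, one clock tick later
```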
Three-Way Handshake
Three-Way Handshake (cont'd)
Three-Way Handshake (cont'd)
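The three-way handshake can be summarized as three segments, parameterized by the client and server ISNs. A Python sketch; the `Segment` record is a hypothetical stand-in for a real header, and the key point it encodes is that a SYN consumes one sequence number, so each side acknowledges the other's ISN + 1:

```python
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2**32


@dataclass(frozen=True)
class Segment:
    sender: str
    flags: frozenset
    seq: int
    ack: Optional[int]  # None when the ACK flag is not set


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """SYN, SYN+ACK, ACK: the segments that open a connection."""
    syn = Segment("client", frozenset({"SYN"}), client_isn, None)
    syn_ack = Segment("server", frozenset({"SYN", "ACK"}), server_isn,
                      (client_isn + 1) % SEQ_SPACE)
    ack = Segment("client", frozenset({"ACK"}), (client_isn + 1) % SEQ_SPACE,
                  (server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for seg in three_way_handshake(client_isn=100, server_isn=300):
    print(seg.sender, sorted(seg.flags), "seq =", seg.seq, "ack =", seg.ack)
```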
TCP Connection Termination
1. A sender closes its part of the connection by sending a FIN segment
2. After ACKing the FIN, the receiver can still send data on its part of the connection (half-close)
3. Finally, the receiver closes its part of the connection by sending a FIN segment (graceful close)
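In the sockets API, the half-close of step 1 corresponds to `shutdown(SHUT_WR)`, which sends a FIN but leaves the socket open for reading. A loopback sketch in Python's standard `socket` module; the `recv_all` helper is illustrative, and the ACKs themselves are sent by the operating system's TCP, not by this code:

```python
import socket


def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer's FIN arrives (recv returns b"")."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

client.sendall(b"request")
client.shutdown(socket.SHUT_WR)  # step 1: client sends FIN (half-close)
request = recv_all(conn)         # server reads until it sees that FIN
conn.sendall(b"reply")           # step 2: server can still send data
conn.close()                     # step 3: server sends its own FIN
reply = recv_all(client)         # client still receives after sending its FIN
print(request, reply)  # b'request' b'reply'

for s in (client, server):
    s.close()
```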
TCP Connection Termination (cont'd)
TCP Connection States
CLOSED
LISTEN
SYN_RCVD
SYN_SENT
ESTABLISHED
FIN_WAIT_1
CLOSING
FIN_WAIT_2
TIME_WAIT
CLOSE_WAIT
LAST_ACK
TCP: Normal Client Open and Close
TCP: Normal Server Open and Close
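The normal client and server paths through these states can be traced with a small transition table. A sketch in Python covering a subset of the RFC 793 transitions (resets, aborts, and some close paths are omitted, and the event strings are informal labels):

```python
# (state, event) -> next state
TRANSITIONS = {
    ("CLOSED", "active open / send SYN"): "SYN_SENT",
    ("CLOSED", "passive open"): "LISTEN",
    ("LISTEN", "recv SYN / send SYN+ACK"): "SYN_RCVD",
    ("SYN_SENT", "recv SYN+ACK / send ACK"): "ESTABLISHED",
    ("SYN_SENT", "recv SYN / send SYN+ACK"): "SYN_RCVD",  # simultaneous open
    ("SYN_RCVD", "recv ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close / send FIN"): "FIN_WAIT_1",
    ("ESTABLISHED", "recv FIN / send ACK"): "CLOSE_WAIT",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv FIN / send ACK"): "CLOSING",  # simultaneous close
    ("FIN_WAIT_1", "recv FIN+ACK / send ACK"): "TIME_WAIT",
    ("FIN_WAIT_2", "recv FIN / send ACK"): "TIME_WAIT",
    ("CLOSING", "recv ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    ("CLOSE_WAIT", "close / send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(events, state="CLOSED"):
    """Apply events in order and return the list of states visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError: event not allowed here
        path.append(state)
    return path


client = run(["active open / send SYN", "recv SYN+ACK / send ACK",
              "close / send FIN", "recv ACK", "recv FIN / send ACK", "2MSL timeout"])
server = run(["passive open", "recv SYN / send SYN+ACK", "recv ACK",
              "recv FIN / send ACK", "close / send FIN", "recv ACK"])
print(" -> ".join(client))
print(" -> ".join(server))
```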
Data Transfer in TCP
Data received from an application is usually sent in segments of size MSS (Maximum Segment Size)
ACKs carry the sequence number of the next byte the receiver expects to receive; this is a cumulative count
Unacknowledged data = data sent by the sender, but not yet acknowledged by the receiver
– i.e., data not yet received, or acknowledgment not yet received
Sender is allowed to send a certain amount of unacknowledged data
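Segmentation at MSS boundaries and cumulative ACKs can be sketched together in Python. The receiver buffers an out-of-order segment but keeps acknowledging the first missing byte until the gap is filled. Names are hypothetical, and the sketch assumes no sequence-number wraparound and no overlapping segments:

```python
def segment(data: bytes, mss: int, first_seq: int) -> list:
    """Split application data into (seq, payload) segments of at most MSS bytes."""
    return [(first_seq + off, data[off:off + mss]) for off in range(0, len(data), mss)]


class CumulativeReceiver:
    """Buffers out-of-order segments and returns a cumulative ACK."""

    def __init__(self, first_seq: int) -> None:
        self.next_expected = first_seq
        self.out_of_order = {}        # seq -> payload, held until the gap fills
        self.delivered = bytearray()  # bytes passed to the application, in order

    def receive(self, seq: int, payload: bytes) -> int:
        if seq >= self.next_expected:
            self.out_of_order[seq] = payload
        while self.next_expected in self.out_of_order:
            data = self.out_of_order.pop(self.next_expected)
            self.delivered += data
            self.next_expected += len(data)
        return self.next_expected  # ACK = next byte expected


segs = segment(b"abcdefghij", mss=4, first_seq=1000)
r = CumulativeReceiver(1000)
print([r.receive(*segs[i]) for i in (1, 0, 2)])  # [1000, 1008, 1010]
```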
TCP Windowing Mechanism
(Figure: time-distance diagram between Sender and Receiver)
Like Go-Back-N (GBN), but with some differences:
– Buffered data is counted in bytes, not segments
– Size of the buffer does not solely determine the maximum outstanding data
  – Each ACK carries a window advertisement from the receiver
  – window = how many additional bytes (after the last ACK’d byte) the receiver is prepared to accept
  – After this, the sender must stop and wait for an acknowledgment, even if buffer space is available
  – Naturally, if buffer space is unavailable, that limit dominates instead
– Allows adjustment of the “window size” to approach a “full window”
  – For TCP, “propagation delay” is through the entire network
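The window advertisement rule reduces to one subtraction: the receiver's limit is counted from the last acknowledged byte, and bytes already in flight use part of it. A Python sketch; the function name and the `unsent` parameter (application data waiting to be sent) are hypothetical, and sequence-number wraparound is ignored:

```python
def usable_window(last_ack: int, advertised: int, next_seq: int, unsent: int) -> int:
    """Bytes the sender may transmit now: the receiver allows bytes up to
    last_ack + advertised, bytes before next_seq are already in flight,
    and only `unsent` bytes of new data are waiting."""
    allowed_by_receiver = last_ack + advertised - next_seq
    return max(0, min(allowed_by_receiver, unsent))


# ACK = 1000 with a 4000-byte window; bytes 1000-2999 are already in flight.
print(usable_window(last_ack=1000, advertised=4000, next_seq=3000, unsent=10_000))  # 2000
```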
Sliding Windows
TCP sliding window mechanism allows multiple segments to be sent before an ACK is returned
Left boundary of window = earliest unacknowledged byte
– An acknowledgment advances this left boundary
Right boundary of window = latest byte that can be sent (and may have been)
– An updated window advertisement moves this right boundary
– Subtle consequence: window can shrink
Sliding Window
Dynamic Sliding Window - Shrinking
Bytes 9-13 are “canceled retroactively”
– Later, when the window slides right, the sender treats these bytes as if they had never been sent
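The edge movements on these slides, including the retroactive cancellation when the window shrinks, can be sketched in Python. The class is hypothetical, sequence-number wraparound is ignored, and the numbers in the usage lines are chosen so that bytes 9-13 are the ones canceled:

```python
class SenderWindow:
    """Sender-side window edges, in byte sequence numbers.
    left:     earliest unacknowledged byte
    right:    one past the last byte the receiver's advertisement allows
    next_seq: next byte to be sent"""

    def __init__(self, first_seq: int, advertised: int) -> None:
        self.left = first_seq
        self.next_seq = first_seq
        self.right = first_seq + advertised

    def send(self, n: int) -> tuple:
        """Send up to n new bytes within the window; return (first seq, count)."""
        count = max(0, min(n, self.right - self.next_seq))
        start = self.next_seq
        self.next_seq += count
        return start, count

    def on_ack(self, ack: int, advertised: int) -> int:
        """The ACK moves the left edge; the advertisement sets the right edge.
        Returns how many sent bytes were canceled because the window shrank."""
        self.left = max(self.left, ack)
        self.right = self.left + advertised
        canceled = max(0, self.next_seq - self.right)
        self.next_seq -= canceled  # canceled bytes will be sent again later
        return canceled


w = SenderWindow(first_seq=1, advertised=13)  # bytes 1-13 may be sent
print(w.send(13))                             # (1, 13): bytes 1-13 in flight
print(w.on_ack(ack=4, advertised=5))          # 5: window now ends after byte 8
```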
TCP Window Management Example
Performance Issues
Small packets and small window advertisements create efficiency problems
– “Silly Window” Syndrome (SWS)
These can be solved by
– Delaying sending of data
  – sender “voluntarily” consolidates multiple small packets into a single larger packet
– Delaying sending of ACKs/window advertisements
  – sender “strongly encouraged” to consolidate multiple small packets into a single larger packet
TCP “probes” the network for bandwidth (later)
– Attempts to adjust throughput to available b/w
– This involves latency, and embeds assumptions
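One common form of each delaying strategy, sketched in Python: the sender-side rule follows Nagle's algorithm (RFC 896) and the receiver-side rule the silly-window avoidance of RFC 1122, both simplified. The function names and parameters are hypothetical:

```python
def nagle_should_send(pending: int, mss: int, unacked: int, usable: int) -> bool:
    """Sender side: send a full-sized segment whenever the window allows;
    send a small one only if no data is outstanding, otherwise keep
    collecting bytes until an ACK arrives."""
    if pending == 0 or usable == 0:
        return False
    if pending >= mss and usable >= mss:
        return True
    return unacked == 0


def window_to_advertise(free: int, advertised: int, mss: int, buffer_size: int) -> int:
    """Receiver side: don't announce a window increase smaller than
    min(MSS, half the receive buffer); keep the old advertisement instead."""
    if free - advertised >= min(mss, buffer_size // 2):
        return free
    return advertised


print(nagle_should_send(pending=10, mss=1460, unacked=0, usable=65535))         # True
print(nagle_should_send(pending=10, mss=1460, unacked=500, usable=65535))       # False
print(window_to_advertise(free=100, advertised=0, mss=1460, buffer_size=8192))  # 0
```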
TCP Error Control
1. Error detection
– checksum: to check for corrupted segments at the destination
– ACK: to confirm receipt of a segment by the destination
– time-out: one retransmission timer for each segment sent
2. Error correction: no FEC
– source retransmits segments whose retransmission timer expired
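The per-segment timers described here can be sketched in Python. The fixed `rto` value (seconds) is a placeholder: real TCPs adapt the timeout to measured round-trip times, and implementations commonly keep a single timer per connection instead (RFC 6298). The class name is hypothetical:

```python
class RetransmissionTimers:
    """One retransmission timer per outstanding segment."""

    def __init__(self, rto: float) -> None:
        self.rto = rto
        self.timers = {}  # seq -> (deadline, length)

    def sent(self, seq: int, length: int, now: float) -> None:
        self.timers[seq] = (now + self.rto, length)

    def on_ack(self, ack: int) -> None:
        """A cumulative ACK cancels the timers of all segments it covers."""
        for seq in [s for s, (_, n) in self.timers.items() if s + n <= ack]:
            del self.timers[seq]

    def expired(self, now: float) -> list:
        """Return segments to retransmit, restarting their timers."""
        due = sorted(s for s, (deadline, _) in self.timers.items() if deadline <= now)
        for s in due:
            self.timers[s] = (now + self.rto, self.timers[s][1])
        return due


t = RetransmissionTimers(rto=1.0)
t.sent(0, 100, now=0.0)
t.sent(100, 100, now=0.1)
t.on_ack(100)          # first segment acknowledged, its timer canceled
print(t.expired(1.2))  # [100]: second segment timed out, retransmit it
```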
Congestion Control
Congestion: overloaded network
– Applies to part of the network: the “pain point” or “bottleneck”
  – Some link(s) in the network being used by a disproportionately large amount of traffic
  – Some node(s) in the network at the head of many highly loaded links
– Either way, the store-and-forward buffer builds up at the node and eventually overflows
Congestion is undesirable: it wastes network resources for no gain and causes oscillations
– Can try adaptively re-routing traffic, or notifying sources to slow down
TCP Congestion Control
Strictly, not a transport layer issue
– Originally not considered a function of the Internet
– “Congestion collapse” grew more serious in the late 1980s
– TCP was simply the easiest place to attempt to exercise control
– (Read “Why The Internet Only Just Works”, Mark Handley, BT Technology Journal, Vol 24, No 3, July 2006 for a discussion)
Uses a closed-loop approach
– Feedback is implicit (absence of ACKs)
– Loss is assumed to indicate congestion
– Adapts rate when congestion is sensed
Control mechanism is end-to-end
– no connection state maintained by the network!
– “Keep the network stupid”, carried perhaps to unreasonable lengths
Flow vs. Congestion Control
TCP Implicit Feedback
TCP relies on implicit feedback from the network to detect congestion
– Timeout caused by a lost packet (used by TCP Tahoe, TCP Reno)
– Duplicate ACKs (used by TCP Reno)
Assumption: packet loss is always due to congested routers, not transmission errors
– A reasonable assumption, as wired transmission technology had grown more sophisticated (and less error-prone)
When congestion is detected, slow down the rate
– Mechanism: make the window smaller
– smaller window = lower rate, larger window = higher rate
– (For long TCP connections, propagation delay is likely to be much larger than the transmission delay of one MSS, so the sender sends roughly one window per round-trip time and rate ≈ window / RTT)
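Duplicate-ACK detection can be sketched in Python. When a segment is lost, every later segment that arrives triggers another ACK for the same missing byte, so repeated ACK numbers signal a probable loss. The threshold of 3 duplicates is the usual fast-retransmit trigger (RFC 5681) and is an assumption here; the class name is hypothetical:

```python
DUPACK_THRESHOLD = 3  # assumed: the common fast-retransmit trigger


class DupAckDetector:
    """Flags a probable loss when the same cumulative ACK repeats."""

    def __init__(self) -> None:
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack: int) -> bool:
        """Return True exactly when the duplicate count reaches the threshold."""
        if ack == self.last_ack:
            self.dup_count += 1
            return self.dup_count == DUPACK_THRESHOLD
        self.last_ack, self.dup_count = ack, 0
        return False


d = DupAckDetector()
# The segment starting at byte 200 was lost; later arrivals keep producing ACK 200.
signals = [d.on_ack(a) for a in (100, 200, 200, 200, 200, 200)]
print(signals)  # [False, False, False, False, True, False]
```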
TCP Dynamic Windows
Window #1: advertised by receiver
– purpose: avoid overrunning a slow receiver (i.e., for flow control)
Window #2: maintained by sender
– purpose: avoid network overload (i.e., for congestion control)
– Called the congestion window, or cwnd
– the sending TCP dynamically manipulates cwnd
“The” window size = MIN(receiver_advertisement, cwnd)
TCP Congestion Control Overview
Probe for available bandwidth by increasing cwnd
– “Slow” start: initially the window is small
  – But every ACK adds 1 MSS to the window
  – Exponential increase phase: the window doubles every RTT
– Congestion avoidance: linear increase phase
– Slow start threshold (ssthresh)
  – controls the transition from the exponential to the linear phase
Upon packet loss (timeout, duplicate acknowledgment), assume congestion
– retransmit the packet
– reduce the window size, reducing the rate
– also restart probing
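The evolution of cwnd can be simulated one RTT at a time, in units of MSS. This Python sketch reacts to every loss by restarting slow start from 1 MSS, which is TCP Tahoe's behavior (Reno instead halves the window on duplicate ACKs). Capping the doubling at ssthresh, the lower bound of 2 MSS on the new ssthresh (as in RFC 5681), and the parameter values are simplifications or assumptions:

```python
def effective_window(rwnd: int, cwnd: int) -> int:
    """'The' window: the sender obeys the smaller of the two limits."""
    return min(rwnd, cwnd)


def simulate_cwnd(rounds: int, ssthresh: int, loss_rounds: set) -> list:
    """cwnd (in MSS) at the start of each RTT: slow start doubles cwnd each
    RTT up to ssthresh, congestion avoidance adds 1 MSS per RTT, and a loss
    sets ssthresh to half of cwnd and restarts slow start from 1 MSS."""
    cwnd, history = 1, []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)
        else:
            cwnd += 1
    return history


print(simulate_cwnd(rounds=12, ssthresh=8, loss_rounds={6}))
# [1, 2, 4, 8, 9, 10, 11, 1, 2, 4, 5, 6]
```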
Evolution of TCP's Congestion Window
Problems with Assumptions
TCP embeds assumptions
– Loss indicates congestion
– Available b/w changes over time
– (No other network mechanism controls loss or congestion)
– (Bandwidth is low in general, so it is easy to overwhelm)
“Pipe thickness”: product of transmission rate and propagation delay
– Amount of bits required to fill the pipe
TCP works well over IP, especially when pipe thickness values are comparatively low
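A worked comparison in Python shows why low pipe thickness suits TCP. The link rates and delays are illustrative values, not from the slides. A 65,535-byte window (the largest without the RFC 1323 window scale option) holds 524,280 bits: about half of a 10 Mb/s, 100 ms pipe, but roughly 0.05% of a 10 Gb/s one, so one window per RTT caps throughput near 5.2 Mb/s regardless of link speed.

```python
def pipe_bits(rate_bps: int, delay_ms: int) -> int:
    """'Pipe thickness': bits in flight needed to keep the path full."""
    return rate_bps * delay_ms // 1000


def window_limited_rate_bps(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


print(pipe_bits(10_000_000, 100))            # 1000000: 10 Mb/s, 100 ms
print(pipe_bits(10_000_000_000, 100))        # 1000000000: 10 Gb/s, 100 ms
print(window_limited_rate_bps(65_535, 100))  # 5242800: max unscaled window
```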
Pitfalls and Solutions
For high-speed optical backbones, TCP takes a long time to probe and find the available b/w
– Solution: change the b/w probing mechanism (more aggressive), but it needs to stay reactive
For lossy wireless media, TCP mistakes wireless loss for the limit of b/w
– Solution: hide the wireless hop from TCP (split connection), but provide stable b/w info
For networks that intentionally form bursts, such as Optical Burst Switching, TCP retreats to a very low rate
– Solution: balance the need to form bursts against remaining responsive, and/or hide the bursts from TCP
For networks in which links may disconnect for significant time, TCP gives up
– Solution: change (or remove) the congestion semantics of TCP
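A back-of-the-envelope calculation in Python illustrates the first pitfall. The 10 Gb/s rate, 100 ms RTT, and 1500-byte MSS are illustrative assumptions. After one loss halves the window, growing it back by one MSS per RTT takes tens of thousands of round trips, over an hour, on the high-speed path, versus a few seconds at 10 Mb/s:

```python
def linear_recovery(rate_bps: int, rtt_ms: int, mss_bytes: int) -> tuple:
    """RTTs (and seconds) for additive increase of 1 MSS per RTT to regrow
    a window that was halved after a single loss, on a path whose full
    window is rate * RTT."""
    full_window = rate_bps * rtt_ms // (1000 * 8 * mss_bytes)  # in segments
    rtts = full_window - full_window // 2
    return rtts, rtts * rtt_ms / 1000


print(linear_recovery(10_000_000_000, 100, 1500))  # 41667 RTTs, about 4167 s
print(linear_recovery(10_000_000, 100, 1500))      # 42 RTTs, 4.2 s
```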
Summary
Transport layer is logically the application’s interface to the network
– Must create endpoint abstractions (ports)
– Must maintain state
In the Internet,
– TCP attempts to impose reliability on an unreliable network layer
  – Requires sliding window management
– TCP attempts to perform congestion control
  – Slows down the transmission rate in response to lost segments