Transmission Control Protocol (TCP)


TCP/UDP/IP
Courtesy of Kevin Fall at UC Berkeley
& Raghupathy Sivakumar at GATECH
TCP/IP Protocol Suite
Physical layer
Data-link layer – ARP, RARP
Network layer – IP, ICMP, IGMP
Transport layer – TCP, UDP, RTP
Application layer – http, smtp, ftp
[Figure: layer stack, top to bottom: Application, Transport, IP, DataLink, Physical]
TCP/IP Protocol Suite
IP is used for each network node (or router)
[Figure: Source and Dest hosts each run the full stack (Application, Transport, IP, DataLink, Physical); the two Routers between them run only IP, DataLink, Physical]
Internet Protocol (IP) service model
best-effort datagram model
error detection in header only
addressing, routing
signaling (ICMP)
Fragmentation and reassembly
Multiplexing and Demultiplexing
Addressing
Need a unique identifier for every host
in the Internet (analogous to postal
address)
IP addresses are 32 bits long
Hierarchical addressing scheme
Conceptually …

IPaddress = (NetworkAddress, HostAddress)
Address Classes
Class    Leading bits    netId      hostId
A        0               7 bits     24 bits
B        10              14 bits    16 bits
C        110             21 bits    8 bits
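The class layout above can be checked with a few bit operations. This is a minimal sketch: the function name `classify` and the sample addresses are made up for illustration, and only classes A, B and C are handled.

```python
import ipaddress

# (class name, leading bits, number of leading bits, netId bits, hostId bits)
CLASSES = [
    ("A", 0b0, 1, 7, 24),
    ("B", 0b10, 2, 14, 16),
    ("C", 0b110, 3, 21, 8),
]

def classify(addr: str):
    """Return (class, netId, hostId) for a class A, B or C address."""
    value = int(ipaddress.IPv4Address(addr))
    for name, prefix, prefix_bits, net_bits, host_bits in CLASSES:
        if value >> (32 - prefix_bits) == prefix:
            # netId excludes the leading class bits, as in the table
            net_id = (value >> host_bits) & ((1 << net_bits) - 1)
            host_id = value & ((1 << host_bits) - 1)
            return name, net_id, host_id
    raise ValueError("not a class A, B or C address")

print(classify("10.1.2.3"))
print(classify("172.16.5.4"))
print(classify("192.168.1.7"))
```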
Addresses and Hosts
Since netId is encoded into IP address,
each host will have a unique IP address
for each of its network connections
Hence, IP addresses refer to network
connections and not hosts
Why will hosts have multiple network
connections?
Special Addresses
Prefix (netID)    Suffix (hostID)    Type of Address      Purpose
All 0s            All 0s             This computer        Used during bootstrap
Network           All 0s             Network              Identifies a network
Network           All 1s             Direct broadcast     Broadcast on a specified net
All 1s            All 1s             Limited broadcast    Broadcast on a local net
127               any                Loopback             Testing
Exceptions to Addressing
Subnetting
  Splitting hostId into subnetId and hostId
  Achieved using subnet masks
Supernetting (Classless Inter-Domain Routing or CIDR)
  Combining multiple lower class address ranges into one range
  Achieved using 32 bit masks and longest-prefix-match routing
Examples
Subnetting (B class)
  Before: Network | Host
  After:  Network | Subnet | Host
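Subnetting and supernetting can be tried out with Python's standard `ipaddress` module. The address ranges below are arbitrary examples chosen for this sketch, not values from the slides.

```python
import ipaddress

# Subnetting: split a class-B-sized network into /24 subnets
net = ipaddress.ip_network("172.16.0.0/16")
subnets = list(net.subnets(new_prefix=24))
subnet = subnets[5]                      # 172.16.5.0/24

# Applying the subnet mask to a host address yields the subnet address
host = ipaddress.ip_address("172.16.5.4")
masked = ipaddress.ip_address(int(host) & int(subnet.netmask))

# Supernetting (CIDR): merge four class-C-sized ranges into one range
blocks = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(4)]
merged = list(ipaddress.collapse_addresses(blocks))

print(len(subnets), subnet, subnet.netmask, masked)
print(merged)
```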
IP Routing
Direct
  If source and destination hosts are connected directly
  Still need to perform IP address to physical address translation
Indirect
  Table driven routing
  Each entry: (NetId, RouterId)
    Default router
    Host-specific routes
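Table driven routing with a default router and a host-specific route can be sketched as a longest-prefix lookup. The table entries and router names (`R1`, `R-host`, `R-default`) are hypothetical.

```python
import ipaddress

# Each entry: (NetId, RouterId). A /32 entry is a host-specific route,
# and 0.0.0.0/0 matches everything, so it acts as the default router.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "R-default"),
    (ipaddress.ip_network("10.1.0.0/16"), "R1"),
    (ipaddress.ip_network("10.1.2.3/32"), "R-host"),
]

def next_hop(dest: str, routes=ROUTES) -> str:
    """Pick the router of the most specific (longest prefix) matching entry."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, router) for net, router in routes if addr in net]
    return max(matches, key=lambda match: match[0])[1]

print(next_hop("10.1.2.3"))   # host-specific route wins
print(next_hop("10.1.9.9"))   # network route
print(next_hop("8.8.8.8"))    # falls back to the default router
```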
IP Fragmentation
The physical network layers of different
networks in the Internet might have different
maximum transmission units (MTUs)
The IP layer performs fragmentation when
the next network has a smaller MTU than the
current network
[Figure: IP fragmentation, a datagram passing from a network with MTU = 1500 to a network with MTU = 500]
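To make the MTU = 1500 to MTU = 500 case concrete, the sketch below splits one datagram into fragments. It assumes a 20 byte IP header without options and relies on the fact that fragment offsets are counted in 8 byte units; the function name `fragment` is invented for this example.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Split an IP payload into fragments that fit the given MTU."""
    # Data in every fragment except the last must be a multiple of 8 bytes
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        fragments.append({
            "offset_units": offset // 8,        # value of the fragment offset field
            "data_len": size,
            "total_len": size + header_len,
            "more_fragments": offset + size < payload_len,
        })
        offset += size
    return fragments

# A 1500 byte datagram carries 1480 bytes of payload after its 20 byte header
for frag in fragment(1480, 500):
    print(frag)
```

Each fragment carries its own header, which is the "many headers" inefficiency noted for reassembly at the final destination.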
IP Reassembly
Fragmented packets need to be put together
Where does reassembly occur?
  The router at the other end of the smaller MTU network
    Router overhead: complexity, buffering
    More than one path
  The final destination
    Many fragments on the path
    more chance of missing packets
    Utilization inefficiency (many headers)
IP Header
Used for conveying information to peer
IP layers
[Figure: protocol stacks of Source, two Routers and Dest; the IP layers at each node are the peers that read the IP header]
IP Header (contd.)
4 bit version | 4 bit hdr length | 8 bit TOS | 16 bit total length
16 bit identification | 3 bit flags | 13 bit fragment offset
8 bit TTL | 8 bit protocol | 16 bit header checksum
32 bit source IP address
32 bit destination IP address
Options (if any) (maximum 40 bytes)
data
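The header layout can be exercised by packing and unpacking the 20 fixed bytes with `struct`. This is a sketch only: the helper names and the field values (addresses, TTL, identification) are illustrative, and options are not handled.

```python
import socket
import struct

IP_FORMAT = "!BBHHHBBH4s4s"   # the 20 fixed header bytes, network byte order

def checksum(data: bytes) -> int:
    """16 bit ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_header(src, dst, payload_len, proto=17, ttl=64, ident=1) -> bytes:
    ver_ihl = (4 << 4) | 5            # version 4, header length 5 words
    header = struct.pack(IP_FORMAT, ver_ihl, 0, 20 + payload_len, ident, 0,
                         ttl, proto, 0, socket.inet_aton(src), socket.inet_aton(dst))
    # The checksum covers the header only and is computed with the field zeroed
    return header[:10] + struct.pack("!H", checksum(header)) + header[12:]

def parse_header(header: bytes) -> dict:
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, csum, src, dst) = struct.unpack(IP_FORMAT, header[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,
        "total_len": total_len,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

hdr = build_header("10.0.0.1", "10.0.0.2", payload_len=100)
print(parse_header(hdr))
print(checksum(hdr))   # a header with a valid checksum sums to 0
```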
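Multiplexing
[Figure: at the sender, Web and Email run over TCP and MP3 over UDP, and both are multiplexed into IP datagrams; at the receiver, IP demultiplexes the datagrams to TCP and UDP, which deliver them to Web, Email and MP3]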
Endpoint identification
how to identify a remote
application/service on the Internet?
[IP_address, port number, protocol]
expect to find a process listening for
incoming packets
Port numbers
port numbers are in range [0..64K-1], i.e. 0 to 65535
ports below 1024 are known as well-known ports and are reserved by IANA
ports in range [1024..64K-1] may be registered, but registration is not enforced
User datagram protocol (UDP)
UDP
provides a datagram service model

Additional intelligence built at the
application layer if needed
Error detection
header (8 bytes)
Sending a UDP datagram
application supplies the dest IP address
and port number to send to
application chooses message size,
requests send using API (e.g. sockets)
API allocates OS-level buffer, leaving
room for headers, copies data from
user-level buffer to OS-level buffer,
gives to UDP module
Sending a UDP datagram
UDP module receives data and
prepends IP and UDP headers
fills in IP header info
  proto, len, src, dst,…
fills in UDP header
  src_port, dst_port, len,…
sets TTL and TOS
sends UDP/IP packet to IP module
Ethernet header | IP header | UDP header | Application data | Ethernet trailer
Sending a UDP datagram
IP module receives packet
inserts options if enabled
sets IP vers, IHL, offset, ID fields
determines an interface/MTU
fragments if needed and sends to link layer
Receiving a UDP datagram
network adapter receives a frame, interrupts
processor
device driver determines frame contains IP
type data, strips link layer header and gives
to IP module
IP checks IP header, processes options
IP checks IP address (unicast, multicast, …)
IP reassembles if necessary, gives the whole
packet to UDP based on protocol field
Receiving a UDP datagram
UDP receives IP/UDP packet
checks length and checksum
locates OS PCB based on dest port,
which identifies the receiving process;
generates ICMP unreachable if nobody
is listening there
copies to receiving process’ buffer
wakes up the receiving process so it can read the data
*PCB: protocol control block
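From the application's side, the whole send and receive path is hidden behind the sockets API. The sketch below sends one datagram over the loopback interface; the function name `udp_round_trip` is made up, and port 0 asks the OS to pick any free port.

```python
import socket

def udp_round_trip(message: bytes):
    """Send one UDP datagram to ourselves over loopback and read it back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))      # the OS now has a PCB for this port
        receiver.settimeout(2.0)
        dest = receiver.getsockname()        # (dest IP address, port number)

        sender.sendto(message, dest)         # no connection establishment needed

        data, (src_ip, src_port) = receiver.recvfrom(2048)
        return data, src_ip
    finally:
        sender.close()
        receiver.close()

print(udp_round_trip(b"hello"))
```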
Why use UDP?
downsides
  No error correction
  No flow control
  No congestion control
  App picks packet size
upsides
  No connection establishment
    stateless
  Broadcast/multicast more straightforward
  App picks packet size
Transmission Control Protocol (TCP)
TCP
End-to-end transport protocol
Responsible for reliability, congestion
control, flow control, and sequenced
delivery
Applications that use TCP: http (web),
telnet, ftp (file transfer), smtp (email),
chat
Applications that don’t: multimedia
(typically) – use UDP instead
Ports, End-points, & Connections
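[Figure: applications A1, A2, A3 (e.g. http, ftp, smtp, telnet) sit above the transport layer (TCP, UDP), which sits above the IP Layer; the Port selects the application, the Protocol ID selects the transport, and the IP address selects the host]
Thus, an end-point is represented by (IP address, Port)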
Ports can be re-used between transport protocols
A connection is (SRC IP address, SRC port, DST IP
address, DST port)
Same end-point can be used in multiple connections
TCP
Connection Establishment
Connection Maintenance
  Reliability
    by acknowledgement packet (ACK)
  Congestion control
  Flow control
  Sequencing
Connection Termination
Fundamental Mechanism
Simple stop and go protocol: send data, wait for ack, send next data
Timeout based reliability (loss recovery): if no ack arrives within RTO, retransmit (retx)
Multiple unacknowledged packets (W)
Sliding Window Protocol: 1 2 3 4 5 6 7 8 9 10 11 12 ….
[Figure: sliding window for flow control; when the window is full, the sender cannot send more data]
Active and Passive Open
How do applications initiate a
connection?
One end (server) registers with the TCP
layer instructing it to “accept”
connections at a certain port
The other end (client) initiates a
“connect” request which is “accept”-ed
by the server
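With the sockets API, the passive open is `listen` and the active open is `connect`. A minimal single-process sketch over loopback follows; the function name `tcp_round_trip` is invented for this example. It works without threads because the OS completes the handshake for a listening socket before `accept` is called.

```python
import socket

def tcp_round_trip(message: bytes) -> bytes:
    """Open a TCP connection to ourselves over loopback and send one message."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    server.listen(1)                     # passive open: ready to accept
    server.settimeout(2.0)

    # Active open: connect() triggers the connection request
    client = socket.create_connection(server.getsockname(), timeout=2.0)
    conn, _peer = server.accept()        # server accepts the connection
    conn.settimeout(2.0)
    try:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # tell the peer no more data will come
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:                # empty read: the sender has finished
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        conn.close()
        client.close()
        server.close()

print(tcp_round_trip(b"GET / HTTP/1.0\r\n\r\n"))
```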
Reliability (Loss Recovery)
[Figure: data/ack exchange for packets 1 to 4, in which packet 3 is lost and retransmitted]
Sequence Numbers
TCP uses cumulative Acknowledgments (ACKs)
  Next expected in-sequence packet sequence number
  Pros and cons?
Piggybacking
Timeout calculation


Rttavg = k*Rttavg + (1-k)*Rttsample
RTO = Rttavg + 4*Rttdeviation
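The two formulas translate directly into a small estimator. The slides do not say how Rttdeviation is maintained or what k is, so this sketch assumes the deviation is smoothed the same way as the average and uses k = 0.875 and k_dev = 0.75; the class name `RtoEstimator` and the initial deviation are also assumptions.

```python
class RtoEstimator:
    """Smoothed RTT average and deviation, combined into a retransmission timeout."""

    def __init__(self, first_sample: float, k: float = 0.875, k_dev: float = 0.75):
        self.k = k
        self.k_dev = k_dev
        self.rtt_avg = first_sample
        self.rtt_dev = first_sample / 2      # assumed starting deviation

    @property
    def rto(self) -> float:
        # RTO = Rttavg + 4*Rttdeviation
        return self.rtt_avg + 4 * self.rtt_dev

    def update(self, sample: float) -> float:
        # Deviation is updated first, against the previous average (assumption)
        self.rtt_dev = self.k_dev * self.rtt_dev + (1 - self.k_dev) * abs(sample - self.rtt_avg)
        # Rttavg = k*Rttavg + (1-k)*Rttsample
        self.rtt_avg = self.k * self.rtt_avg + (1 - self.k) * sample
        return self.rto

est = RtoEstimator(100.0)                    # times in milliseconds
for sample in (100.0, 100.0, 300.0, 100.0):
    print(round(est.update(sample), 2))
```

A steady RTT shrinks the deviation term, so the RTO approaches the average; a single outlier sample widens it again.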
Retransmission (fast retransmit)
after 3 duplicate ACKs, TCP sender
infers that the packet is lost and retransmits it without waiting for the RTO
Congestion control: slow start
Initial window size W = 1 (can be bottleneck!)
Each ACK will increase W by 1
Congestion Control
Slow Start
  Start with W=1
  For every ACK, W = W+1
Congestion Avoidance (linear increase)
  For every ACK, W = W + 1/W
Congestion Control (multiplicative decrease), on loss
  ssthresh = W/2
  W = 1
Alternative: Fall to W/2 and start congestion avoidance directly
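The per-ACK rules can be simulated round by round. This sketch assumes that W ACKs arrive in each round trip and that slow start ends once W reaches ssthresh; the initial ssthresh of 8 and the function names are arbitrary choices for illustration.

```python
def on_ack(w: float, ssthresh: float) -> float:
    if w < ssthresh:
        return w + 1             # slow start: W = W + 1 per ACK
    return w + 1.0 / w           # congestion avoidance: W = W + 1/W per ACK

def on_loss(w: float):
    """Multiplicative decrease: returns (new W, new ssthresh)."""
    return 1.0, w / 2.0

def simulate(rounds: int, ssthresh: float = 8.0, loss_rounds=()):
    """Return W at the start of every round trip."""
    w = 1.0
    trace = []
    for r in range(rounds):
        trace.append(w)
        if r in loss_rounds:
            w, ssthresh = on_loss(w)
            continue
        for _ in range(int(w)):  # one ACK per packet sent in this round
            w = on_ack(w, ssthresh)
    return trace

print([round(w, 2) for w in simulate(8, loss_rounds={5})])
```

In slow start W doubles every round trip (1, 2, 4, 8); in congestion avoidance it grows by roughly one packet per round trip.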
Why LIMD? (fairness)
Two flows have windows 100 and 10 (diff = 90) when a loss occurs
• W=1
    100 -> 1, 10 -> 1, diff = 0
• Problem? – inefficient
• W=W/2
    Flow 1    Flow 2    diff
    100       10        90
    50        5         45      (halve)
    51        6         45
    52        7         45
    ..        ..        45
    73        28        45
    36.5      14        22.5    (halve)
    ..        ..        22.5
    Linear increase leaves diff unchanged; every halving cuts it in half (90, 45, 22.5, 11.25, ..), so the flows converge to a fair share
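The convergence argument can be reproduced numerically. The sketch assumes both flows lose a packet at the same moment, whenever their combined window exceeds a capacity of 100; that capacity and the function name `limd_diffs` are assumptions made for this example.

```python
def limd_diffs(w1: float, w2: float, capacity: float = 100.0, losses: int = 3):
    """Track the window difference of two flows across synchronized losses."""
    diffs = [abs(w1 - w2)]
    for _ in range(losses):
        # Linear increase: both windows grow by 1 until the link overflows
        while w1 + w2 <= capacity:
            w1 += 1
            w2 += 1
        # Multiplicative decrease: both flows halve their window
        w1 /= 2
        w2 /= 2
        diffs.append(abs(w1 - w2))
    return diffs, (w1, w2)

diffs, final = limd_diffs(100.0, 10.0)
print(diffs)
print(final)
```

Resetting both windows to 1 would also equalise them, but it throws away the whole window on every loss, which is the inefficiency noted for W=1.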
Flow Control
Prevent sender from overwhelming the
receiver
Receiver in every ACK advertises the
available buffer space at its end
Window calculation

MIN(congestion control window, flow control window)
Sequencing
[Figure: packets 1 to 4 are sent and packet 3 is lost; 3 and 4 are then sent again]
  1 given to app
  2 given to app
  Loss (of 3)
  4 buffered (not given to app)
  3 & 4 given to app
  4 discarded (duplicate)
Byte sequence numbers
TCP receiver buffers out of order segments and reassembles them later
Starting sequence number randomly chosen during connection establishment
  Why?
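The receiver behaviour in this example (deliver in order, buffer out of order data, discard duplicates, send cumulative ACKs) fits in a few lines. For readability the sketch numbers whole packets rather than bytes; the class name `Receiver` is made up.

```python
class Receiver:
    """In-order delivery with buffering of out-of-order packets."""

    def __init__(self, first_seq: int = 1):
        self.expected = first_seq    # next in-sequence packet we are waiting for
        self.buffer = {}             # out-of-order packets, keyed by sequence number
        self.delivered = []          # what has been given to the app, in order

    def receive(self, seq: int, data: str) -> int:
        """Process one packet and return the cumulative ACK (next expected)."""
        if seq < self.expected or seq in self.buffer:
            pass                     # duplicate: discard
        else:
            self.buffer[seq] = data
            # Give the app everything that is now in sequence
            while self.expected in self.buffer:
                self.delivered.append(self.buffer.pop(self.expected))
                self.expected += 1
        return self.expected

rx = Receiver()
# Packet 3 is lost at first; 3 and 4 are sent again later
acks = [rx.receive(seq, f"p{seq}") for seq in (1, 2, 4, 3, 4)]
print(acks)
print(rx.delivered)
```

The repeated ACK value while packet 3 is missing is the duplicate ACK that fast retransmit counts.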
Connection Establishment & Termination
Client does active open: sends connection request (SYN)
Server does passive open: accepts connection request, sends acceptance (SYN+ACK)
Client replies with ACK; the connection starts and DATA can be sent
3-way handshake used for connection establishment
  Delay!
Randomly chosen sequence number (why?) is conveyed to the other end
Similar FIN, FIN+ACK exchange used for connection termination
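The sequence and acknowledgement numbers carried by the three handshake segments can be written out explicitly. The dictionaries below are an illustrative model, not a wire format; each side picks a random initial sequence number, and a SYN consumes one sequence number.

```python
import random

MOD = 2 ** 32    # sequence numbers are 32 bits and wrap around

def three_way_handshake(rng: random.Random):
    client_isn = rng.randrange(MOD)      # randomly chosen by the client
    server_isn = rng.randrange(MOD)      # randomly chosen by the server
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {"flags": "SYN+ACK", "seq": server_isn,
               "ack": (client_isn + 1) % MOD}
    ack = {"flags": "ACK", "seq": (client_isn + 1) % MOD,
           "ack": (server_isn + 1) % MOD}
    return [syn, syn_ack, ack]

for segment in three_way_handshake(random.Random()):
    print(segment)
```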
TCP Segment Format
16 bit SRC Port | 16 bit DST Port
32 bit sequence number
32 bit ACK number
HL | Rsv’d | flags | 16 bit window size
16 bit TCP checksum | 16 bit urgent pointer
Options (if any)
Data
Flags: URG, ACK, PSH, RST, SYN, FIN
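The fixed 20 byte part of this format can be decoded with `struct`. The sample segment below (ports, sequence numbers, window) is invented for illustration, and options are ignored.

```python
import struct

TCP_FORMAT = "!HHIIHHHH"     # the 20 fixed header bytes, network byte order
FLAG_BITS = [("URG", 0x20), ("ACK", 0x10), ("PSH", 0x08),
             ("RST", 0x04), ("SYN", 0x02), ("FIN", 0x01)]

def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack, hl_flags,
     window, csum, urgent) = struct.unpack(TCP_FORMAT, segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (hl_flags >> 12) * 4,   # HL counts 32 bit words
        "flags": [name for name, bit in FLAG_BITS if hl_flags & bit],
        "window": window,
        "checksum": csum,
        "urgent": urgent,
    }

# A hypothetical SYN+ACK segment with no options and no data
sample = struct.pack(TCP_FORMAT, 80, 50000, 1000, 2001,
                     (5 << 12) | 0x12, 65535, 0, 0)
print(parse_tcp_header(sample))
```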
Silly window syndrome (SWS)
TCP is a window-based protocol
TCP receiver advertises a small window,
so TCP sender transmits
only a short packet each time
Inefficient utilization of network BW
So what to do?
  Save up enough to send
Nagle’s algorithm
Buffer all user data if any
unacknowledged data is outstanding
Ok to send if all is ACK’d or there is an MSS
worth of data
If small delay is wanted, Nagle’s
algorithm should be disabled
MSS: maximum TCP payload size
MTU: maximum PDU size supported by link layer
MTU = MSS + 20 (TCP header) + 20 (IP header), assuming no IP or TCP options
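The send rule and the MSS arithmetic can be written as two small functions. This is a simplified sketch of the decision only, with made-up function names; real stacks add further conditions. `TCP_NODELAY` is the standard socket option that turns Nagle's algorithm off.

```python
import socket

IP_HEADER = 20     # bytes, no options
TCP_HEADER = 20    # bytes, no options

def mss_for(mtu: int) -> int:
    """MSS = MTU - 20 (TCP header) - 20 (IP header)."""
    return mtu - IP_HEADER - TCP_HEADER

def nagle_can_send(buffered: int, unacked: int, mss: int) -> bool:
    """Send if everything is ACK'd, or if a full MSS of data is waiting."""
    return buffered > 0 and (unacked == 0 or buffered >= mss)

def low_delay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

mss = mss_for(1500)
print(mss)
print(nagle_can_send(buffered=1, unacked=0, mss=mss))      # idle connection: send
print(nagle_can_send(buffered=1, unacked=100, mss=mss))    # small data, wait for ACK
print(nagle_can_send(buffered=mss, unacked=100, mss=mss))  # full segment: send
```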
Interactive applications: Telnet
• Remote terminal applications (e.g., Telnet) send characters
to a server. The server interprets the character and sends
the output at the server to the client.
• For each character typed, you see three packets:
1. Client -> Server: Send typed character
2. Server -> Client: Echo of character (or user output) and
acknowledgement for first packet
3. Client -> Server: Acknowledgement for second packet
Why 3 packets per character?
We would expect four packets per character:
  character
  ACK of character
  echo of character
  ACK of echoed character
However, tcpdump shows this pattern:
  character
  ACK and echo of character
  ACK of echoed character
What has happened?
TCP has delayed the transmission of an ACK
Delayed ACKs
Problem:
  In request/response programs, you send
  separate ACK and Data packets for each
  transaction
Solution:
  Don’t ACK data immediately
  Wait 200ms (must be less than 500ms – why?)
  Must ACK every other packet
  Must not delay duplicate ACKs
UDP-lite
Error-resilient CODECs appear
Over wireless links, BER is not negligible
Checksumming drops corrupted packets (even 1 bit error)
[Figure: a Mobile Host and a Fixed Host (both Unix BSDi 3.0) each run H.263+ Encoder/Decoder, Packetization/De-packetization, RTP, Socket Interface, UDP / UDP Lite, IP and PPP; they are connected through a GSM Network (e.g. cellular networks), a GSM Base Station and the PSTN]
UDP-lite
Error-resilient CODECs use redundancy or FEC
It may be better to use packets with some errors than to drop them
In UDP-lite, there are error-sensitive and insensitive parts
  The size of the former part is called coverage
  E.g. application header can be error-sensitive part
Implemented in BSDi 3.0 kernel
Requires MAC-lite as well
  Receiver MAC should pass the data to upper layer despite errors
Header:
source port # | dest port #
length / coverage | checksum
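The effect of coverage can be shown with a checksum computed over only the first `coverage` bytes of a packet. This is a deliberately simplified sketch: it omits details of the real protocol such as the pseudo-header, and the packet contents and function names are invented.

```python
import struct

def checksum16(data: bytes) -> int:
    """16 bit ones' complement checksum, as used by IP, UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def partial_checksum(packet: bytes, coverage: int) -> int:
    """Checksum only the error-sensitive part (the first `coverage` bytes)."""
    return checksum16(packet[:coverage])

def flip_bit(packet: bytes, index: int) -> bytes:
    """Simulate a single bit error in the byte at `index`."""
    damaged = bytearray(packet)
    damaged[index] ^= 0x01
    return bytes(damaged)

packet = b"HDR1" + b"video-payload-bytes"   # 4 byte application header + media data
coverage = 4
good = partial_checksum(packet, coverage)

# A bit error in the insensitive part is not detected, so the packet is kept
print(partial_checksum(flip_bit(packet, 10), coverage) == good)
# A bit error in the covered header changes the checksum, so the packet is dropped
print(partial_checksum(flip_bit(packet, 1), coverage) == good)
```

With coverage equal to the full packet length, the check behaves like an ordinary full checksum and any single bit error causes a drop.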
