
EE 122: Transport Protocols
Kevin Lai
October 16, 2002
Motivation

IP provides a weak, but efficient service model
(best-effort)
- packets can be delayed, dropped, reordered, duplicated
- packets have limited size (why?)

IP packets are addressed to a host
- how to decide which application gets which packets?

How should hosts send into the network?
- if everyone sends as fast as they can → many packets are dropped, network is under-utilized (congestion collapse)
[email protected]
Transport Protocol
TCP/UDP

Provides more than the underlying network protocol
- more reliability, in-order delivery, at most once delivery
- supports messages of arbitrary length
- provides a way to decide which packets go to which applications (multiplexing/demultiplexing)
- governs how hosts should send data to prevent congestion collapse (congestion control and avoidance)
[Figure: protocol stack, top to bottom: Transport Layer, Networking Layer (IP), Link Layer, Physical Layer]
UDP




User Datagram Protocol
minimalistic transport protocol
same best-effort service model as IP
messages can be larger than one packet, but still
limited (64KB)
- uses fragmentation




provides multiplexing/demultiplexing to IP
does not provide congestion control
advantage over TCP: does not increase end-to-end delay over IP
application example: video/audio streaming
TCP







Transmission Control Protocol
reliable, in-order, and at most once delivery
messages can be of arbitrary length
provides multiplexing/demultiplexing to IP
provides congestion control and avoidance
increases end-to-end delay over IP
e.g., file transfer, chat
Headers



IP header → used for IP routing, fragmentation, error detection…
UDP header → used for multiplexing/demultiplexing, error detection
TCP header → used for multiplexing/demultiplexing, flow and congestion control
[Figure: encapsulation at Sender and Receiver. Application data is prefixed with a TCP/UDP header at the transport layer, then with an IP header at the IP layer.]
IP Header
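IP header layout (bit offsets 0, 4, 8, 16, 19, 31; fixed part is 20 bytes):
  Version (4 bits) | HLen (4 bits) | TOS (8 bits) | Length (16 bits)
  Identification (16 bits) | Flags (3 bits) | Fragment offset (13 bits)
  TTL (8 bits) | Protocol (8 bits) | Header checksum (16 bits)
  Source address (32 bits)
  Destination address (32 bits)
  Options (variable)
  Payload

Comments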
- HLen – header length only, in 32-bit words (5 <= HLen <= 15)
- TOS (Type of Service): now split into Differentiated Services (6 bits) and Explicit Congestion Notification (ECN) (2 bits)
- Length – the length of the entire datagram, header + data, in bytes
- Flags: Don’t Fragment (DF) and More Fragments (MF)
- Protocol: identifies the transport protocol (e.g., 6 = TCP, 17 = UDP)
- Header checksum – 1’s complement sum over the header only
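The field descriptions above can be made concrete with a short parsing sketch. This is a minimal illustration that assumes a 20-byte header without options; the function names (`internet_checksum`, `parse_ipv4_header`, `build_example_header`) and the example addresses are hypothetical and not part of the lecture.

```python
import struct

IP_FIXED = "!BBHHHBBH4s4s"  # the 20-byte fixed header, network byte order


def internet_checksum(data: bytes) -> int:
    """1's complement of the 1's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed part of an IP header (options are ignored here)."""
    (ver_hlen, tos, length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(IP_FIXED, packet[:20])
    return {
        "version": ver_hlen >> 4,
        "header_bytes": (ver_hlen & 0x0F) * 4,  # HLen counts 32-bit words
        "tos": tos,
        "length": length,
        "id": ident,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": ".".join(map(str, src)),
        "dst": ".".join(map(str, dst)),
    }


def build_example_header() -> bytes:
    """Build a made-up header (DF set, protocol 6) with a valid checksum."""
    fields = [0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
              bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])]
    fields[7] = internet_checksum(struct.pack(IP_FIXED, *fields))
    return struct.pack(IP_FIXED, *fields)
```

A receiver can check a header by running `internet_checksum` over all 20 bytes, checksum field included: the result is 0 when the header is intact.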
Fragmentation


What happens if a router has to forward an IP packet that is larger than the data link layer allows?
Break the IP packet into smaller IP packets and provide a way to reassemble them
- set the “more fragments” bit in all fragments but the last
- set the fragment offset of each fragment to its offset (in 8-byte units) from the beginning of the original packet
- set the packet length to the length of this fragment
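The three steps above can be sketched as a small planning function. This is an illustration only: it assumes a 20-byte header on every fragment and works on lengths rather than real packets, and the name `fragment` and its parameters are hypothetical.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Plan the fragments for one IP packet.

    Returns a list of (offset_in_8_byte_units, fragment_length, more_fragments)
    tuples, where fragment_length includes the per-fragment header.
    """
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the offset field counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < payload_len:
        size = min(max_data, payload_len - sent)
        more = sent + size < payload_len  # MF is set on all but the last
        fragments.append((sent // 8, header_len + size, more))
        sent += size
    return fragments
```

For example, `fragment(1400, 532)` plans three fragments carrying 512, 512, and 376 data bytes, at offsets 0, 64, and 128.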
Fragmentation Issues



Sending host must use a different IP ID for each packet (fragments are matched up by ID)
Lose one fragment, lose them all
Reassembly is complex
- requires per packet state


Only reassemble at destination
Fragmentation can be avoided using Path
Maximum Transmission Unit Discovery (PMTU)
- most TCP implementations use PMTU
UDP Header
UDP header layout (bit offsets 0, 16, 31):
  Source port (16 bits) | Destination port (16 bits)
  UDP length (16 bits) | UDP checksum (16 bits)
  Payload (variable)



Source and destination ports use the UDP port address space
UDP length is the UDP packet length (including UDP header and payload, but not IP header)
Optional UDP checksum is over the UDP packet
- why have UDP checksum in addition to IP checksum?
- why not have just the UDP checksum?
- why is the UDP checksum optional?
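The header layout described on this slide is small enough to build and parse by hand. The following is a minimal sketch, not a real stack: `build_udp` and `parse_udp` are hypothetical helper names, and the checksum is left at 0, which in UDP over IPv4 means that no checksum was computed.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # src port, dst port, length, checksum


def build_udp(src_port: int, dst_port: int, payload: bytes,
              checksum: int = 0) -> bytes:
    """Prefix the payload with an 8-byte UDP header."""
    length = UDP_HEADER.size + len(payload)  # header + payload, no IP header
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp(datagram: bytes) -> dict:
    """Split a UDP packet into its header fields and payload."""
    src, dst, length, checksum = UDP_HEADER.unpack(datagram[:UDP_HEADER.size])
    return {
        "src_port": src,
        "dst_port": dst,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }
```

Because the length field is 16 bits, a UDP packet tops out at 65535 bytes, which is where the 64KB message limit comes from.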
Port Addressing




Need to decide which application gets which packets
Solution: map each socket to a port
Client must know server’s port
Separate 16-bit port address spaces for UDP and TCP
- (src IP, src port, dst IP, dst port) uniquely identifies a TCP connection

Well-known ports (0-1023): everyone agrees which services run on these ports
- e.g., ssh:22, http:80
- on UNIX, must be root to gain access to these ports (why?)

Ephemeral ports (mostly 1024-65535): given to clients
- e.g., a chat client gets one of these
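One way to picture demultiplexing is as two lookup tables, one per protocol. The sketch below is a toy model under that assumption: the `Demux` class and its methods are hypothetical, UDP is keyed by destination port only, and TCP is keyed by the full 4-tuple.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[bytes], None]
FourTuple = Tuple[str, int, str, int]  # (src IP, src port, dst IP, dst port)


class Demux:
    """Toy demultiplexer with separate UDP and TCP port spaces."""

    def __init__(self) -> None:
        self.udp: Dict[int, Handler] = {}        # dst port -> socket handler
        self.tcp: Dict[FourTuple, Handler] = {}  # connection -> socket handler

    def bind_udp(self, port: int, handler: Handler) -> None:
        if port in self.udp:
            raise OSError("port already in use")
        self.udp[port] = handler

    def register_tcp(self, conn: FourTuple, handler: Handler) -> None:
        self.tcp[conn] = handler

    def deliver(self, proto: str, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int, data: bytes) -> bool:
        """Hand data to the matching socket; return False if there is none."""
        if proto == "udp":
            handler = self.udp.get(dst_port)
        else:
            handler = self.tcp.get((src_ip, src_port, dst_ip, dst_port))
        if handler is None:
            return False
        handler(data)
        return True
```

Two clients talking to the same server port end up in different table entries because their source ports (or source IPs) differ.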
TCP Header
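TCP header layout (bit offsets 0, 4, 10, 16, 31):
  Source port (16 bits) | Destination port (16 bits)
  Sequence number (32 bits)
  Acknowledgement (32 bits)
  HdrLen (4 bits) | reserved (6 bits) | Flags (6 bits) | Advertised window (16 bits)
  Checksum (16 bits) | Urgent pointer (16 bits)
  Options (variable)
  Payload (variable)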


Sequence number, acknowledgement, and advertised window – used by sliding-window based flow control
Flags:
- SYN, FIN – establishing/terminating a TCP connection
- ACK – set when Acknowledgement field is valid
- URG – urgent data; Urgent Pointer says where non-urgent data starts
- PUSH – don’t wait to fill segment
- RESET – abort connection
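The flags listed above occupy the low six bits of the 16-bit word that also holds HdrLen. A minimal decoding sketch, assuming a 20-byte header without options; `parse_tcp_header` and `FLAG_BITS` are hypothetical names, and the flag names follow the slide rather than the usual PSH/RST abbreviations.

```python
import struct

TCP_FIXED = "!HHIIHHHH"  # the 20-byte fixed TCP header, network byte order

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RESET": 0x04,
             "PUSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed part of a TCP header (options are ignored here)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack(TCP_FIXED, segment[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_bytes": (off_flags >> 12) * 4,  # HdrLen counts 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "advertised_window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```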
TCP Challenges


how to provide reliable, in-order, and at most once
delivery? (sliding window)
need to synchronize sender and receiver (connection
establishment)
- e.g., exchange initial sequence numbers





prevent sender from sending too fast for receiver (flow
control)
estimate RTT for flow control and timeouts
how to initially decide on sending rate (slow start)
estimate how much bandwidth is available in network
(congestion avoidance)
slow down sending rate when we were sending too fast
(congestion control)
Connection Establishment: How it works

Three-way handshake
- Goal: agree on a set of parameters: the start sequence
number for each side
- Starting sequence numbers are random.
[Figure: three-way handshake timeline. Client (initiator): active open via connect(). Server: passive open via listen(), then accept() and allocate buffer space.]
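The sequence-number exchange in the handshake can be sketched as three messages. This is a simplified model that only tracks flags, sequence numbers, and acknowledgements; the function name and the dictionary layout are hypothetical, and the random initial sequence numbers come from a seeded generator so the example is repeatable.

```python
import random

SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around


def three_way_handshake(rng: random.Random) -> list:
    """Return the SYN, SYN+ACK, and ACK messages of one handshake."""
    client_isn = rng.randrange(SEQ_SPACE)  # client's random start number
    server_isn = rng.randrange(SEQ_SPACE)  # server's random start number

    # 1. Client -> server: announce the client's starting sequence number.
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # 2. Server -> client: announce the server's number, acknowledge the SYN.
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
               "ack": (syn["seq"] + 1) % SEQ_SPACE}
    # 3. Client -> server: acknowledge the server's SYN.
    ack = {"flags": {"ACK"}, "seq": syn_ack["ack"],
           "ack": (syn_ack["seq"] + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]
```

Each side acknowledges the other's starting number plus one, so after the third message both ends agree on where each byte stream begins.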
Three-way Handshake: Rationale


Three-way handshake adds 1 RTT delay
Why not just start sending data immediately?
- congestion control
• network could be congested
• SYN = 40 bytes, Data < 1500 bytes
• packets which are dropped at a link waste the
bandwidth of all previous links
• smaller packets waste less bandwidth
• SYN acts as cheap probe of network conditions
More Rationale
- protection from denial of service (1)
• attacker could use one host to fake many source IP addresses (spoofing) and send many SYNs to server
• server must devote resources (e.g., buffer space)
for open connections
• server would run out of resources and become very
slow or crash
• 3-way handshake requires client to reply before
server allocates significant resources
- protection from denial of service (2)
• suppose client and server begin the connection using a well-known sequence number instead of a random one
• attacker guesses sequence number, inserts bogus
packets into stream
Even More Rationale
- protection from delayed packets
• client connects to server twice in succession using
the same port
• a packet from the first connection is delayed and
arrives during the second connection
• if sequence numbers are close, old packet could be
accepted

Flow control: Window Size and Throughput

Sliding-window based flow control:
- Higher window → higher throughput
• Throughput = wnd/RTT

Remember: window size controls throughput
How to determine effective window size?
How to detect packet loss?
[Figure: sender/receiver timeline with wnd = 3, annotated with RTT (Round Trip Time)]
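The relation Throughput = wnd/RTT is easy to check numerically. The helpers below are a minimal sketch with hypothetical names; the 64 KB window and 100 ms RTT in the usage note are made-up example values.

```python
def window_throughput(wnd_bytes: float, rtt_seconds: float) -> float:
    """Bytes per second when one full window is sent per round trip."""
    return wnd_bytes / rtt_seconds


def window_for_rate(rate_bytes_per_s: float, rtt_seconds: float) -> float:
    """Window size needed to sustain a target rate (the inverse relation)."""
    return rate_bytes_per_s * rtt_seconds
```

With a 65536-byte window and a 0.1 s RTT, `window_throughput(65536, 0.1)` is about 655360 bytes/s, or roughly 5.2 Mbit/s, no matter how fast the underlying link is.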
Effective Window Size

Receiver window (MaxRcvBuffer – maximum buffer size at receiver)
AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead)

Sender window (MaxSendBuffer – maximum buffer size at sender)
EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
MaxSendBuffer >= LastByteWritten - LastByteAcked
[Figure: send buffer at the Sending Application (pointers LastByteAcked, LastByteSent, LastByteWritten; size MaxSendBuffer) and receive buffer at the Receiving Application (pointers LastByteRead, NextByteExpected, LastByteRcvd; size MaxRcvBuffer); sequence number increases along each buffer]
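The window formulas on this slide translate directly into code. A minimal sketch, assuming the buffer pointers are plain byte counters; the class names `Receiver` and `Sender` and the method `can_buffer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    max_rcv_buffer: int
    last_byte_rcvd: int
    last_byte_read: int

    def advertised_window(self) -> int:
        # Free space left after data that arrived but was not yet read.
        return self.max_rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)


@dataclass
class Sender:
    max_send_buffer: int
    last_byte_written: int
    last_byte_sent: int
    last_byte_acked: int

    def effective_window(self, advertised_window: int) -> int:
        # What the receiver allows, minus what is already in flight.
        return advertised_window - (self.last_byte_sent - self.last_byte_acked)

    def can_buffer(self, nbytes: int) -> bool:
        # The application may write only while the send buffer has room.
        return (self.last_byte_written + nbytes
                - self.last_byte_acked) <= self.max_send_buffer
```

For instance, a receiver with a 4096-byte buffer that has received up to byte 3000 and read up to byte 1000 advertises 2096 bytes; a sender with 1000 bytes in flight may then send 1096 more.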
Advertised Window = 0


Sender cannot send any data → receiver will not send acks → receiver cannot notify sender that advertised window has grown
Solution: TCP Persist Timer
- when sender gets advertised window == 0, it sets timer
- if sender receives advertised window > 0, cancels timer
- when timer expires, sender sends 1 byte payload to
receiver
• receiver must accept data 1 byte past window
- receiver sends ack for the byte before the 1-byte probe
- sender gets new advertised window
Silly Window Syndrome (SWS)
[Figure: SWS example with advWin = w and Maximum Segment Size (MSS) = w. Sending app: send 1, then send w+1. Receiving app: read 1, then read w-1.]
App sends small segments and/or receiver advertises small window
- causes small packets to be sent in network
- small packets have high header overhead
SWS Solution

Sender only sends if
- no unacknowledged data (Nagle’s algorithm), or
- full packet to send

Receiver only sends new advertised window if
- newAdvWin – oldAdvWin > min(MSS, 0.5*maxRcvBuf)
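Both rules on this slide are simple predicates. The sketch below states them as two functions with hypothetical names and parameters; it covers only the decisions listed here and leaves out the rest of a real sender or receiver.

```python
def sender_should_send(bytes_ready: int, bytes_unacked: int, mss: int) -> bool:
    """Sender side: send if nothing is in flight, or a full packet is ready."""
    if bytes_ready <= 0:
        return False
    return bytes_unacked == 0 or bytes_ready >= mss


def receiver_should_advertise(new_adv_win: int, old_adv_win: int,
                              mss: int, max_rcv_buf: int) -> bool:
    """Receiver side: announce a new window only after it has grown enough."""
    return new_adv_win - old_adv_win > min(mss, 0.5 * max_rcv_buf)
```

With an MSS of 1460 bytes, a sender holding 10 bytes while earlier data is still unacknowledged waits, so small writes are combined into one packet instead of many tiny ones.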
Set timeout


If no ack has been received by the timeout, retransmit the packet after the last acked packet
How to set timeout?
- Too long: connection has low throughput
- Too short: retransmit packet that was just delayed
• packet was probably delayed because of
congestion
• sending another packet too soon just makes
congestion worse

Solution: make timeout proportional to RTT
RTT Estimation
Use exponential averaging:
$\text{SampleRTT} = \text{AckRcvdTime} - \text{SendSegmentTime}$
$\text{EstimatedRTT} = \alpha \cdot \text{EstimatedRTT} + (1 - \alpha) \cdot \text{SampleRTT}$
$\text{TimeOut} = 2 \cdot \text{EstimatedRTT}$
$0 \le \alpha \le 1$
[Figure: EstimatedRTT and SampleRTT plotted against Time]
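The exponential average on this slide can be sketched as a small class. This is an illustration under stated assumptions: the class name `RttEstimator` is hypothetical, the estimate is seeded with the first sample, and the default weight of 0.875 is an assumed value, since the slide only requires the weight to lie between 0 and 1.

```python
class RttEstimator:
    """Exponentially weighted moving average of RTT samples."""

    def __init__(self, first_sample: float, alpha: float = 0.875) -> None:
        if not 0 <= alpha <= 1:
            raise ValueError("alpha must be between 0 and 1")
        self.alpha = alpha                 # weight given to the old estimate
        self.estimated_rtt = first_sample  # seed with the first measurement

    def update(self, sample_rtt: float) -> float:
        """Fold one new sample into the estimate and return the timeout."""
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)
        return self.timeout

    @property
    def timeout(self) -> float:
        return 2 * self.estimated_rtt
```

A weight close to 1 makes the estimate smooth but slow to react; a weight close to 0 makes it track every sample, noise included.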
Problem
How to differentiate between the ACK of the original transmission and the ACK of the retransmitted packet?
[Figure: two Sender/Receiver timelines, each marking a SampleRTT interval]
Karn/Partridge Algorithm


Measure SampleRTT only for original
transmissions
Exponential backoff → for each retransmission, double the timeout (TimeOut)
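The two rules of this slide can be combined with the exponential average in one sketch. It is an illustration only: the class `KarnTimer` and its method names are hypothetical, the weight 0.875 is an assumed value, and the backoff is applied to the timeout value, which is how this backoff is usually described.

```python
class KarnTimer:
    """Timeout tracking that ignores ambiguous samples and backs off."""

    def __init__(self, initial_rtt: float, alpha: float = 0.875) -> None:
        self.alpha = alpha
        self.estimated_rtt = initial_rtt
        self.timeout = 2 * initial_rtt

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            # The ACK could belong to either transmission: skip the sample.
            return
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)
        self.timeout = 2 * self.estimated_rtt

    def on_timeout(self) -> None:
        # Exponential backoff: wait twice as long before the next retry.
        self.timeout *= 2
```

The backed-off timeout stays in effect until an ACK for a segment that was sent only once supplies a fresh, unambiguous sample.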
Jacobson/Karels Algorithm

Problem: exponential average is not enough
- one solution: use standard deviation (requires expensive
square root computation)
- use mean deviation instead
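$\text{Difference} = \text{SampleRTT} - \text{EstimatedRTT}$
$\text{EstimatedRTT} = \text{EstimatedRTT} + \delta \cdot \text{Difference}$
$\text{Deviation} = \text{Deviation} + \delta \cdot (|\text{Difference}| - \text{Deviation})$
$\text{TimeOut} = \mu \cdot \text{EstimatedRTT} + \phi \cdot \text{Deviation}$
$0 \le \delta \le 1, \quad \mu = 1, \quad \phi = 4$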
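The Jacobson/Karels update can be sketched in a few lines. The parameter names `delta`, `mu`, and `phi` are assumed labels for the three constants (a gain between 0 and 1, and the multipliers 1 and 4 on the estimate and the deviation); the default gain of 0.125 and the class name `JacobsonKarels` are likewise assumptions made for this illustration.

```python
class JacobsonKarels:
    """RTT estimate plus mean deviation, combined into a timeout."""

    def __init__(self, first_sample: float, delta: float = 0.125,
                 mu: float = 1.0, phi: float = 4.0) -> None:
        self.delta = delta
        self.mu = mu
        self.phi = phi
        self.estimated_rtt = first_sample  # seed with the first measurement
        self.deviation = 0.0

    def update(self, sample_rtt: float) -> float:
        """Fold in one sample and return the new timeout."""
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.delta * difference
        self.deviation += self.delta * (abs(difference) - self.deviation)
        return self.timeout

    @property
    def timeout(self) -> float:
        return self.mu * self.estimated_rtt + self.phi * self.deviation
```

When samples are steady the deviation decays toward zero and the timeout sits close to the estimate; when samples jump around, the deviation term widens the timeout.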
Summary

IP
- routing, fragmentation

UDP
- Multiplexing/demultiplexing using ports
- error detection

TCP
- reliable, in-order, at most once delivery
- Connection establishment → three-way handshake
- RTT → exponential averaging and variance
- Flow control → based on sliding window protocol
- Congestion control → next lecture