ECE544 - WINLAB


ECE544: Communication Networks-II
Spring 2009
Hang Liu
Lecture 8
Includes teaching materials from D. Raychaudhuri, S. Gopal, L. Peterson,
Today’s Lecture
• Introduction to transport protocols
• UDP
• TCP
• RTP
Protocol Stack
[Figure: Host1 and Host8 each run Appl. / TCP/UDP / IP / ETH; routers R1, R2, and R3 run IP over ETH, FDDI, and PPP links.]
• Transport protocol
– Enable communication between 2 or more processes (which may be on different hosts in different networks)
– The Transport Layer is the lowest Layer in the network stack that is an end-to-end protocol
Transport Protocols
• Different applications have different requirements for transport
protocols.
– Guarantee message delivery
• Network may drop messages
– Deliver messages in the same order they are sent
• Messages may be reordered in the network or incur a long delay
– Deliver at most one copy of each message
• Messages may be duplicated in the network
– Support arbitrarily large messages
• Networks may limit message size
– Support synchronization between sender and receiver
– Allow the receiver to apply flow control to the sender
– Support multiple application processes on each host
– ……
• Design just a few transport protocols to meet most of the
current and future application requirements
– Each satisfies the requirements for a class of appls
– Many applications=>few transport protocols
Most Popular Transport Protocols
• User Datagram Protocol (UDP):
– Supports multiple application processes on each host
– Option to check messages for correctness with a checksum
• Transmission Control Protocol (TCP):
– Ensures reliable delivery of packets between source and destination processes
– Ensures in-order delivery of packets to destination process
– Other options
• Real-time Transport Protocol (RTP):
– Serves real-time multimedia applications
– Header contains sequence number, timestamp, marker bit, etc.
– Runs over UDP
• TCP, UDP and RTP satisfy needs of the most common
applications
– Applications requiring other functionality usually use UDP as the transport protocol and implement the additional features as part of the application
User Datagram Protocol (UDP)
• De-multiplex the applications/processes on a host
– Port: an identifier for a communicating process
– Each process-to-process communication is identified by a 4-tuple connection identifier <SrcPort, SrcIPAddr, DestPort, DestIPAddr>
– Well-known port: Unix talk: port 517
• UDP header format (bit positions 0, 16, 31):
– SrcPort (bits 0-15) | DestPort (bits 16-31)
– Length (bits 0-15) | Checksum (bits 16-31)
– Data
• Length: UDP header + data bytes
• Checksum: computed over <UDP header, data, pseudoheader (protocol ID, src IP addr and dest IP addr from the IP header, plus UDP length)>
[Figure: application processes in user space attach to ports; in the kernel, TCP and UDP sit above IP and deliver data to the ports.]
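The Length and Checksum rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (`ones_complement_sum16`, `udp_checksum`, `build_udp_segment`) are made up for this example, and IPv4 addresses are assumed.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Sum 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudoheader + UDP header + data (checksum field zeroed)."""
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,             # zero padding byte
        17,            # protocol ID for UDP
        len(segment),  # UDP length
    )
    checksum = ~ones_complement_sum16(pseudo + segment) & 0xFFFF
    return checksum or 0xFFFF  # a computed zero is transmitted as all ones


def build_udp_segment(src_ip: str, dst_ip: str,
                      src_port: int, dst_port: int, data: bytes) -> bytes:
    length = 8 + len(data)  # Length = UDP header + data bytes
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = udp_checksum(src_ip, dst_ip, header + data)
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + data
```

A receiver can verify a segment by summing the pseudoheader and the received segment (checksum included); an undamaged segment sums to 0xFFFF.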
Transmission Control Protocol (TCP)
• First proposed by Vinton Cerf and Robert Kahn, 1974
– TCP/IP enabled computers of all sizes, from different vendors, different OSs, to communicate with each other.
– Carries 80% of all traffic on the Internet
• Reliable, in-order delivery, connection-oriented, byte-stream service
A Simple File Transfer Application
• Server:
– Passive open and wait for connection
• Client:
– Active open and initiate connection establishment
• After connection establishment
– Reliable data transport
• Terminate connection
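The four steps above map directly onto the Berkeley socket calls. The sketch below uses Python's standard `socket` module over the loopback interface; the function names (`serve_once`, `fetch`, `transfer_over_loopback`) and the 5-second timeouts are choices made for this example only.

```python
import socket
import threading


def serve_once(listener: socket.socket, payload: bytes) -> None:
    """Server side: accept one connection, send the payload, then close."""
    conn, _addr = listener.accept()  # the passive open completes here
    with conn:
        conn.sendall(payload)        # reliable byte-stream transfer
    # leaving the with-block closes the server's end of the connection


def fetch(host: str, port: int) -> bytes:
    """Client side: active open, then read until the server closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            chunk = sock.recv(4096)
            if not chunk:            # empty read: the peer has closed
                break
            chunks.append(chunk)
    return b"".join(chunks)


def transfer_over_loopback(payload: bytes) -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)           # do not block forever if no client arrives
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)               # passive open: wait for a connection
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener, payload))
    server.start()
    try:
        return fetch("127.0.0.1", port)
    finally:
        server.join()
        listener.close()
```

Note that the application never sees segments, sequence numbers, or retransmissions; it only writes and reads bytes.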
TCP Operation
• 4-tuple connection identifier: <SrcPort, SrcIPAddr, DestPort, DestIPAddr>
• Sender application process only needs to provide a byte stream to the kernel
• Kernels on sending and receiving hosts operate TCP processes
• Receiver application process only needs to read received bytes from the assigned TCP buffers
TCP Operation Sequence
• TCP protocol completely implemented at the end
hosts
• Sequence numbers maintained in bytes (remember,
TCP serves a byte stream)
• Start of operation:
– Connection Establishment by a Three-Way Handshake
algorithm
– Consensus on Initial Sequence Number (ISN)
• Data Transfer:
– Sends the data in packets, reliably and as fast as the
network/receiver permits
• Finish:
– Both sides independently close their half of the connection
TCP Header Format
• Flags:
– SYN
– FIN
– RESET
– PUSH
– URG
– ACK
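Since the header figure is not reproduced here, the following sketch shows how the fixed 20-byte TCP header and the six flags above could be decoded. The field layout follows the standard TCP header; the function name `parse_tcp_header` and the dictionary keys are illustrative only.

```python
import struct

# Bit masks of the six flags in the low bits of the 16-bit offset/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RESET": 0x04,
             "PUSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,                               # SequenceNum (in bytes)
        "ack": ack,                               # Acknowledgment
        "header_len": (offset_flags >> 12) * 4,   # offset is in 32-bit words
        "flags": {n for n, bit in FLAG_BITS.items() if offset_flags & bit},
        "advertised_window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }
```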
Connection Establishment
• Three-Way Handshake Algorithm
– SYN and ACK flags in the header used
– Initial sequence numbers x and y selected at random
• Required to avoid same number for previous incarnation on the same connection
[Figure: timeline between the active participant (client) and the passive participant (server), showing connection establishment, data transport, and termination.]
Connection tear-down
[Figure: timeline showing data write and data ACK followed by connection tear-down.]
• Any side can terminate the connection
• Each side closes its half of the connection independently
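As a rough illustration of the three-way handshake, the sketch below lists the three segments with their flags and sequence numbers. The rule that each side acknowledges the peer's ISN plus one comes from standard TCP behavior rather than from the slide text; the function name and the tuple layout are invented for this example.

```python
import random

SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide


def three_way_handshake(rng: random.Random) -> list:
    """Return the three handshake segments as (direction, flags, fields)."""
    x = rng.randrange(SEQ_SPACE)  # client's initial sequence number
    y = rng.randrange(SEQ_SPACE)  # server's initial sequence number
    return [
        ("client->server", {"SYN"}, {"seq": x}),
        ("server->client", {"SYN", "ACK"},
         {"seq": y, "ack": (x + 1) % SEQ_SPACE}),
        ("client->server", {"ACK"}, {"ack": (y + 1) % SEQ_SPACE}),
    ]
```

After the third segment both sides agree on x and y, which is the "consensus on ISN" mentioned earlier.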
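TCP State-Transition
Transitions are labeled Event/Action:
• CLOSED -> LISTEN: Passive open
• CLOSED -> SYN_SENT: Active open/SYN
• LISTEN -> SYN_RCVD: SYN/SYN+ACK
• LISTEN -> SYN_SENT: Send/SYN
• LISTEN -> CLOSED: Close
• SYN_SENT -> SYN_RCVD: SYN/SYN+ACK
• SYN_SENT -> ESTABLISHED: SYN+ACK/ACK
• SYN_SENT -> CLOSED: Close
• SYN_RCVD -> ESTABLISHED: ACK
• SYN_RCVD -> FIN_WAIT_1: Close/FIN
• ESTABLISHED -> FIN_WAIT_1: Close/FIN
• ESTABLISHED -> CLOSE_WAIT: FIN/ACK
• FIN_WAIT_1 -> FIN_WAIT_2: ACK
• FIN_WAIT_1 -> CLOSING: FIN/ACK
• FIN_WAIT_2 -> TIME_WAIT: FIN/ACK
• CLOSING -> TIME_WAIT: ACK
• TIME_WAIT -> CLOSED: Timeout after 2 x MSL
• CLOSE_WAIT -> LAST_ACK: Close/FIN
• LAST_ACK -> CLOSED: ACK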
• Max segment lifetime (MSL): recommendation 120 sec
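One way to make the state-transition diagram concrete is a lookup table keyed by (state, event). The sketch below encodes the standard TCP transitions; the event spellings and the `run` helper are assumptions of this example, and timers other than the TIME_WAIT timeout are left out.

```python
# (state, event) -> (segment sent in response, next state); None = nothing sent.
TRANSITIONS = {
    ("CLOSED", "passive open"): (None, "LISTEN"),
    ("CLOSED", "active open"): ("SYN", "SYN_SENT"),
    ("LISTEN", "SYN"): ("SYN+ACK", "SYN_RCVD"),
    ("LISTEN", "send"): ("SYN", "SYN_SENT"),
    ("LISTEN", "close"): (None, "CLOSED"),
    ("SYN_SENT", "SYN"): ("SYN+ACK", "SYN_RCVD"),
    ("SYN_SENT", "SYN+ACK"): ("ACK", "ESTABLISHED"),
    ("SYN_SENT", "close"): (None, "CLOSED"),
    ("SYN_RCVD", "ACK"): (None, "ESTABLISHED"),
    ("SYN_RCVD", "close"): ("FIN", "FIN_WAIT_1"),
    ("ESTABLISHED", "close"): ("FIN", "FIN_WAIT_1"),
    ("ESTABLISHED", "FIN"): ("ACK", "CLOSE_WAIT"),
    ("FIN_WAIT_1", "ACK"): (None, "FIN_WAIT_2"),
    ("FIN_WAIT_1", "FIN"): ("ACK", "CLOSING"),
    ("FIN_WAIT_2", "FIN"): ("ACK", "TIME_WAIT"),
    ("CLOSING", "ACK"): (None, "TIME_WAIT"),
    ("TIME_WAIT", "timeout"): (None, "CLOSED"),  # after 2 x MSL
    ("CLOSE_WAIT", "close"): ("FIN", "LAST_ACK"),
    ("LAST_ACK", "ACK"): (None, "CLOSED"),
}


def run(events, state="CLOSED"):
    """Feed events to the state machine; return the final state and segments sent."""
    sent = []
    for event in events:
        action, state = TRANSITIONS[(state, event)]
        if action is not None:
            sent.append(action)
    return state, sent
```

For example, a client that opens, exchanges data, and closes first walks CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, and back to CLOSED.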
TCP Functions
• Goal of TCP: Deliver data reliably and in order as fast as
possible (Throughput = bytes delivered/ time taken)
• Flow Control:
– Prevent the sender from sending data so fast that the TCP receiver cannot reliably receive and process it
• Error Control and Congestion Control:
– When packets are lost, it implies that one or more queues in intermediate routers have overflowed
– Retransmit lost packets
– Scale back flow rate to reduce congestion
– Congestion control and error control are intertwined using the congestion window (cwnd)
• TCP increases the sending rate to use the network (the route) to full capacity or receiver capability
– But scales back if congestion occurs or if the receiver is flooded
Flow Control
• TCP uses a sliding window flow control protocol
– the receiver specifies the amount of data (in bytes) willing to buffer
in the AdvertisedWindow field of each segment
– The sender can send only up to that amount of data before it must
wait for an ack and window update from the receiver.
Sending side (Sending Appl writes into TCP): LastByteAcked, LastByteSent, LastByteWritten
– LastByteAcked <= LastByteSent <= LastByteWritten
– LastByteSent - LastByteAcked <= AdvertisedWindow
– EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
– LastByteWritten - LastByteAcked <= MaxSendBuffer
Receiving side (Receiving Appl reads from TCP): LastByteRead, NextByteExpected, LastByteRcvd
– LastByteRead < NextByteExpected <= LastByteRcvd + 1
– LastByteRcvd - LastByteRead <= MaxRcvBuffer
– AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)
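The two window formulas can be checked with a small numeric sketch. The function names and the sample byte counts are made up for illustration; the arithmetic is exactly the AdvertisedWindow and effective-window relations from this slide.

```python
def advertised_window(max_rcv_buffer: int, next_byte_expected: int,
                      last_byte_read: int) -> int:
    """Receiver side: free buffer space left after buffered, unread bytes."""
    return max_rcv_buffer - ((next_byte_expected - 1) - last_byte_read)


def effective_window(adv_window: int, last_byte_sent: int,
                     last_byte_acked: int) -> int:
    """Sender side: how many more bytes may be sent right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, adv_window - in_flight)  # never negative


# Receiver has a 4096-byte buffer, holds bytes 1..1000, and has read none.
adv = advertised_window(4096, next_byte_expected=1001, last_byte_read=0)
# Sender has sent up to byte 2500, of which bytes up to 1000 are acked.
eff = effective_window(adv, last_byte_sent=2500, last_byte_acked=1000)
```

Here the receiver advertises 3096 bytes, and with 1500 bytes in flight the sender may send 1596 more.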
Sequence Number
• Protect against SequenceNum wraparound
– Sliding window
• Seq # space >= 2 x WinSize
• For TCP: 2^32 >> 2 x 2^16
– Seq # should not wrap around within an MSL (120 sec) period of time
– For OC-48 (2.5 Gbps), time until wraparound: 14 sec
• TCP extension to the sequence # space for protecting against seq # wrapping around
– Add 32-bit timestamp as optional header
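The wraparound time is simply the size of the sequence space in bytes divided by the link rate in bytes per second. A small sketch, with the function name chosen for this example:

```python
MSL_SECONDS = 120  # recommended maximum segment lifetime


def wraparound_seconds(bandwidth_bps: float, seq_bits: int = 32) -> float:
    """Time to send 2**seq_bits bytes at the given link rate."""
    bytes_per_second = bandwidth_bps / 8
    return 2 ** seq_bits / bytes_per_second


def wraps_within_msl(bandwidth_bps: float) -> bool:
    """True if the sequence space can wrap while old segments may still be alive."""
    return wraparound_seconds(bandwidth_bps) < MSL_SECONDS
```

At 2.5 Gbps this gives roughly 13.7 seconds, well under the 120-second MSL, which is why the timestamp extension is needed on fast links; at 10 Mbps the wraparound time is close to an hour.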
Keep the pipe full
• AdvertisedWindow: 16-bit field => 2^16 bytes = 64 KB
– Big enough to allow the sender to keep the pipe
full (assume that the receiver has enough buffer
to handle the data)
– If RTT = 100 ms,
• Delay x Bandwidth = 122 KB for 10 Mbps Ethernet
• Delay x Bandwidth = 1.2 MB for 100 Mbps Ethernet
(AdvertisedWindow is not large enough)
• TCP Extension:
– Scaling factor option for AdvertisedWindow,
• e.g. use 16-byte units of data
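The delay x bandwidth figures above, and the scaling needed to cover them, can be reproduced as follows. This is a sketch under the assumption that the scaling factor is a power of two (as in the 16-byte-units example); `min_window_shift` is a hypothetical helper, not part of any TCP API.

```python
def delay_bandwidth_bytes(rtt_seconds: float, bandwidth_bps: float) -> float:
    """Bytes that must be in flight to keep the pipe full."""
    return rtt_seconds * bandwidth_bps / 8


def min_window_shift(bdp_bytes: float, field_bits: int = 16) -> int:
    """Smallest power-of-two scale (as a shift count) whose maximum
    scaled window covers the delay x bandwidth product."""
    max_unscaled = 2 ** field_bits - 1
    shift = 0
    while (max_unscaled << shift) < bdp_bytes:
        shift += 1
    return shift


bdp_10m = delay_bandwidth_bytes(0.100, 10e6)    # about 125,000 bytes (122 KB)
bdp_100m = delay_bandwidth_bytes(0.100, 100e6)  # about 1,250,000 bytes (1.2 MB)
```

With a 100 ms RTT, even 10 Mbps Ethernet already exceeds the unscaled 64 KB window, and 100 Mbps needs a scale of 32-byte units (shift of 5) under this calculation.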
Triggering Transmission
• When to transmit a segment:
– Small segments are subject to large overhead
• Send when the data reaches the max segment size (MSS): the size of the largest segment TCP can send without causing the local IP to fragment
– MSS = local MTU - IP & TCP header
• Send when the sending process explicitly asks TCP to transmit (“push”)
TCP Deadlock
• TCP Deadlock
– receiver advertises a window size of 0, the sender
stops sending data
– the window size update from the receiver is lost
• To solve it:
– the sender starts the persist timer when
AdvertisedWindow = 0
– When the persist timer expires, the sender sends a small packet (a window probe) so that the receiver responds with its current window
TCP Silly Window Syndrome
• TCP Silly Window Syndrome
– Sender has MSS bytes of data to send, but window is closed
– ACK arrives with a small window
– Sender sends a small segment (high overhead)
– Receiver advertises a small window
– Sender sends another small segment
– Repeat the above
• To solve: Nagle’s Algorithm
– When the application has data to send
• If both available data and the window >= MSS
– Send a full segment
• Else
– If there is unACKed data in flight
» Buffer the new data until an ACK arrives
– Else
» Send all the new data now
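The decision rule of Nagle's algorithm, as listed above, fits in a single function. The function name and the returned strings are illustrative; a real TCP would act on the decision rather than describe it.

```python
def nagle_decision(available: int, window: int, mss: int,
                   unacked_in_flight: bool) -> str:
    """Decide what to do when the application hands TCP new data.

    available: bytes waiting to be sent; window: bytes the sender may send.
    """
    if available >= mss and window >= mss:
        return "send full segment"
    if unacked_in_flight:
        return "buffer until ACK"      # the returning ACK acts as the timer
    return "send all new data now"
```

The effect is self-clocking: at most one small segment is outstanding per round trip, so small writes are coalesced while an ACK is pending.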
TCP Error Control
• Cumulative acknowledgment: ack the next expected seq # in the Ack field
– Extension: selective ack (SACK), ack additional blocks of received data in a TCP header option
• Adaptive retransmission
– Adapt the retransmission timer to the RTT
TCP Timeout
• Original Algorithm:
– EstimatedRTT = a x EstimatedRTT + (1 - a) x SampleRTT, with 0 <= a <= 1
– Timeout = 2 x EstimatedRTT
– Issue: does not distinguish whether the ACK is for original
transmission or retransmission
• Karn/Partridge Algorithm
– Whenever TCP retransmits a segment, it stops taking
samples of the RTT
• Only measure SampleRTT for segments that have been sent only once
– Each time TCP retransmits, set the next timeout to be twice
the last timeout
• Relieve congestion
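A compact sketch of the original estimator combined with the Karn/Partridge rules might look like this. The class name is invented, and the default smoothing factor a = 0.875 is an assumption chosen for illustration; the slide does not give a value.

```python
class OriginalRttEstimator:
    """EWMA RTT estimate with Karn/Partridge sampling and backoff."""

    def __init__(self, initial_rtt: float, a: float = 0.875):
        self.a = a                        # assumed smoothing factor
        self.estimated_rtt = initial_rtt
        self.timeout = 2 * initial_rtt

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            return                        # ambiguous sample: do not use it
        self.estimated_rtt = (self.a * self.estimated_rtt
                              + (1 - self.a) * sample_rtt)
        self.timeout = 2 * self.estimated_rtt

    def on_retransmit(self) -> None:
        self.timeout *= 2                 # exponential backoff
```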
TCP Timeout (Cont)
• TCP Timeout
– If timeout too soon, unnecessarily retransmit and
add load to network
– If timeout too late, increase latency
• Jacobson/Karels Algorithm: better RTT
estimation by considering the variance
– Difference = SampleRTT - EstimatedRTT
– EstimatedRTT = EstimatedRTT + (d x Difference)
– Deviation = Deviation + d x (|Difference| - Deviation)
– Timeout = m x EstimatedRTT + f x Deviation
– 0 <= d <= 1
– (default: set m = 1 and f = 4)
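The Jacobson/Karels update is a direct transcription of the four equations. In the sketch below the function name is invented and the default d = 0.125 is an assumed value for illustration only; m = 1 and f = 4 are the defaults stated above.

```python
def jacobson_karels_update(estimated_rtt: float, deviation: float,
                           sample_rtt: float, d: float = 0.125,
                           m: float = 1, f: float = 4):
    """Return the updated (EstimatedRTT, Deviation, Timeout)."""
    difference = sample_rtt - estimated_rtt
    estimated_rtt = estimated_rtt + d * difference
    deviation = deviation + d * (abs(difference) - deviation)
    timeout = m * estimated_rtt + f * deviation
    return estimated_rtt, deviation, timeout
```

When samples are stable, Deviation shrinks and the timeout stays close to EstimatedRTT; when samples vary widely, the Deviation term dominates and the timeout grows accordingly.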
Congestion
[Figure: Source 1, Source 2, and Source 3 send through a shared network to Dest 1 and Dest 2.]
• TCP interprets packet loss as a sign of congestion
TCP Congestion Control
• TCP sends packets into the network without reservation
– Tries to use as much of the network resources (bandwidth, buffer) as it can
• As congestion occurs, it scales back
• Strategy:
– Conservatively increase the packet sending rate if there is no congestion
– Quickly reduce the sending rate when congestion is detected (timeout)
Additive increase/multiplicative decrease (AIMD)
• Maintain a CongestionWindow
– MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
– EffectiveWin = MaxWindow – (LastByteSent –
LastByteAcked)
• Decrease congestion window aggressively and
increase it conservatively
– A simple algorithm: Additive increase/multiplicative decrease
(AIMD)
– Each time a timeout occurs, halve the congestion window size (cwnd), but not below one MSS
• cwnd = cwnd/2
• cwnd = Max(cwnd, MSS)
– Each time an ACK is received
• Increment = MSS x (MSS/cwnd)
• cwnd += Increment
• (CongestionWindow increases by one packet, or MSS, after all packets sent out during the last RTT have been acked)
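The AIMD rules can be written as two small functions, with cwnd and MSS in bytes. The function names are chosen for this sketch.

```python
def aimd_on_ack(cwnd: float, mss: int) -> float:
    """Additive increase: a window's worth of ACKs adds about one MSS."""
    return cwnd + mss * (mss / cwnd)


def aimd_on_timeout(cwnd: float, mss: int) -> float:
    """Multiplicative decrease: halve cwnd, but never below one MSS."""
    return max(cwnd / 2, mss)


# One round trip with cwnd = 4 segments: four ACKs arrive.
cwnd = 4000.0
for _ in range(4):
    cwnd = aimd_on_ack(cwnd, mss=1000)
```

Because the increment shrinks as cwnd grows within the round trip, the window after the four ACKs ends up slightly below 5000 bytes rather than exactly at it.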
Additive increase/multiplicative decrease (Cont)
• Issues with additive increase:
– Takes too long to ramp up a connection from the beginning
– The entire advertised window may be reopened when a lost packet is retransmitted and a single cumulative ACK is received by the sender
• TCP sawtooth pattern
[Figure: CongestionWindow size versus time, showing the sawtooth pattern.]
TCP “Slow Start”
• On timeout
– Slow Start Threshold (SSThresh) = cwnd/2
– cwnd = MSS
• On receiving an ACK
– If cwnd <= SSThresh (slow start phase): incr = MSS (exponential growth, double cwnd every RTT)
– Else (congestion avoidance mode): incr = MSS x MSS/cwnd (linear growth, add 1 MSS per RTT)
– cwnd = min(cwnd + incr, TCP_MAXWIN)
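The slow-start rules translate to the sketch below, in bytes. `TCP_MAXWIN` is set to 65535 here purely as an assumed cap for the example, and `acks_for_one_rtt` is a hypothetical helper that models one ACK per full segment in flight.

```python
TCP_MAXWIN = 65535  # assumed cap on the window for this example


def slow_start_on_timeout(cwnd: float, mss: int):
    """Return (new cwnd, new ssthresh) after a timeout."""
    return mss, cwnd / 2


def slow_start_on_ack(cwnd: float, ssthresh: float, mss: int) -> float:
    if cwnd <= ssthresh:
        incr = mss               # slow start: cwnd doubles every RTT
    else:
        incr = mss * mss / cwnd  # congestion avoidance: +1 MSS per RTT
    return min(cwnd + incr, TCP_MAXWIN)


def acks_for_one_rtt(cwnd: float, ssthresh: float, mss: int) -> float:
    """Apply one ACK per full segment that was in flight during this RTT."""
    for _ in range(int(cwnd // mss)):
        cwnd = slow_start_on_ack(cwnd, ssthresh, mss)
    return cwnd
```

Starting from one MSS, successive round trips give 1, 2, 4, 8 segments until cwnd passes SSThresh, after which growth becomes linear.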
TCP Slow Start (Cont)
• Slow start
[Figure: CongestionWindow size versus time during slow start.]
Slow-start and congestion avoidance
• A closer look
Fast Retransmit
• TCP timeout issue:
– There may be long periods of time during which the connection goes dead while waiting for a timer to expire
• Solution:
– Add fast retransmit (in addition to, not replacing, the regular timeout):
• Every time the receiver receives an out-of-order packet, it sends a duplicate ACK
• The sender retransmits the missing packet after it receives some number of duplicate ACKs (e.g. 3 duplicate ACKs)
[Figure: the sender transmits PKT 1 through PKT 6 and PKT 3 is lost; the receiver returns ACK 1 and then ACK 2 four times; the sender retransmits PKT 3; the receiver returns ACK 6.]
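The packet exchange in the fast-retransmit example can be replayed with a short simulation. It follows the slide's numbering, where "ACK n" acknowledges packet n as the last one received in order; both function names are invented for this sketch.

```python
def receiver_acks(arrivals):
    """Return the cumulative ACK sent for each arriving packet number."""
    received = set()
    highest_in_order = 0
    acks = []
    for pkt in arrivals:
        received.add(pkt)
        while highest_in_order + 1 in received:
            highest_in_order += 1
        acks.append(highest_in_order)  # out-of-order arrival repeats the ACK
    return acks


def fast_retransmit_triggers(acks, dup_threshold=3):
    """Return the packet numbers retransmitted after dup_threshold duplicates."""
    retransmits = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                retransmits.append(ack + 1)  # first packet not yet acked
        else:
            last_ack, dup_count = ack, 0
    return retransmits
```

With PKT 3 lost, packets 4, 5, and 6 each produce a duplicate ACK 2; the third duplicate triggers the retransmission of PKT 3, and its arrival lets the receiver jump straight to ACK 6.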
Fast Recovery
• During congestion avoidance mode, when a packet loss is detected through 3 duplicate ACKs, the congestion window is reduced to the slow-start threshold rather than to the small initial value of 1 MSS.
– cwnd = SSThresh = cwnd/2
• (The slow start phase is skipped when fast retransmit detects a lost packet, and additive increase begins)
TCP Sender Operation
• TCP operation is paced by its ACKs
• The sender waits for an ACK and reacts to one of three events:
– New ACK (ACKs indicate lossless operation; operate in slow-start or congestion avoidance)
• Measure RTT if applicable
• Set cwnd_start = purge_acked_pkts()
• Set cwnd = cwnd + increment_value()
• Send new packets
• Restart timer if applicable
– Duplicate ACK (ACKs indicate possible losses)
• If 3rd duplicate ACK
• Fast_retransmit()
• Fast_recovery()
– Timeout (no ACKs)
• Set ssthresh = cwnd/2, cwnd = 1
• Retransmit cwnd_start packet
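The three sender events can be sketched as handlers on a small state object, with cwnd counted in segments. This is a simplified model under stated assumptions: it omits the retransmission queue, timers, and the temporary window inflation that real Reno implementations apply during fast recovery, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    cwnd: float       # congestion window, in segments
    ssthresh: float   # slow-start threshold, in segments
    dup_acks: int = 0


def on_new_ack(s: SenderState) -> None:
    """Lossless operation: grow cwnd by slow start or congestion avoidance."""
    s.dup_acks = 0
    s.cwnd += 1 if s.cwnd <= s.ssthresh else 1 / s.cwnd


def on_duplicate_ack(s: SenderState) -> bool:
    """Return True when the caller should fast-retransmit the missing packet."""
    s.dup_acks += 1
    if s.dup_acks == 3:
        s.ssthresh = s.cwnd / 2
        s.cwnd = s.ssthresh   # fast recovery: skip the return to cwnd = 1
        return True
    return False


def on_timeout(s: SenderState) -> None:
    """No ACKs at all: fall back to slow start from one segment."""
    s.ssthresh = s.cwnd / 2
    s.cwnd = 1
    s.dup_acks = 0
```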
TCP Reno
• Most popular TCP flavor; implemented in most
operating systems
• Includes Fast Retransmit and Fast Recovery
Real-time Traffic
• Quality of Service (QoS) factors: Reliability, Delay and Jitter
– Late arrival = loss
– Because of possibly unbounded retransmissions in TCP, large delay
and jitter may ensue.
• Real-time applications prefer UDP instead.
• Jitter can be smoothed out by a playback buffer, but the initial buffering introduces a longer delay and reduces interactivity.
– Depends on application delay tolerance
• Real-time transport protocol (RTP) operates over UDP
– RTP modules run in user space; RTP libraries are included in the application
– RTP provides no error recovery or congestion control; applications handle all aspects themselves
[Figure: protocol stack, top to bottom: Application, RTP, UDP, IP, Subnet.]
• RTP packet format:
– V=2, P, X, CC, M, Payload Type, Sequence number
– Timestamp
– Sync. source (SSRC) identifier
– Contributing source (CSRC) identifier (optional)
– ..........
– Extension header
– RTP payload
– Padding length
• RTP standard: RFC 3550
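To show how the header fields fit together, here is a minimal decoder for the fixed 12-byte RTP header plus the optional CSRC list, following the RFC 3550 layout. The function name and dictionary keys are illustrative; the extension header and padding are not interpreted in this sketch.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header and the CSRC list."""
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                         # CC: number of CSRC identifiers
    csrc_end = 12 + 4 * cc
    csrcs = struct.unpack(f"!{cc}I", packet[12:csrc_end])
    return {
        "version": b0 >> 6,                # V, expected to be 2
        "padding": bool(b0 & 0x20),        # P
        "extension": bool(b0 & 0x10),      # X
        "csrc_count": cc,
        "marker": bool(b1 & 0x80),         # M
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": list(csrcs),
        # Everything after the CSRC list; may begin with an extension header.
        "rest": packet[csrc_end:],
    }
```

A receiver would typically use the sequence number to detect loss and reordering and the timestamp to schedule playout.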
RTCP
• Real-time transport control protocol (RTCP)
– provides out-of-band statistics and control
information for an RTP flow
– partners RTP in the delivery and packaging of
multimedia data, but does not transport any
media streams itself.
– RTP uses even port number, whereas RTCP uses
the next higher odd port number.
RTCP Functions
• RTCP Functions
– Gather statistics on quality aspects of the media distribution
• can be used by the source for adaptive media encoding (codec)
and detection of transmission faults
– Convey the canonical end-point identifiers (CNAME) to all
session participants
• SSRC may change during a session, but CNAME represents the
unique identity of a sender
– Correlate and synchronize different media streams
• RTCP messages
– Sender reports
– Receiver reports
– Source descriptions
– Application-specific control packets
Homework
• 5.13
• 5.16
• 5.28
• 5.34
• 5.39
Due 4/17