TCP Congestion control, UDP and IP multicast.
UDP, TCP/IP, and IP Multicast
COM S 414
Sunny Gleason, Vivek Uppal
Tuesday, October 23rd, 2001
In This Lecture
• We will build on understanding of IP (Internet
Protocol)
– UDP: User Datagram Protocol
• Unreliable, packet-based protocol
– TCP: Transmission Control Protocol
• Reliable, connection-oriented, stream-based protocol
– IP Multicast (if time allows…)
• Facilities for delivering datagrams to multiple recipients
– We won’t discuss ICMP (Internet Control Message
Protocol), but you can look it up if you want
Where To Find More Info
• For More “Practical” Information
– Network Programming in Java
• The Java Custom Networking Trail
http://java.sun.com/docs/books/tutorial/networking/sockets/
http://java.sun.com/docs/books/tutorial/networking/datagrams/
– Network Programming in C
• Books by W. Richard Stevens [HIGHLY recommended!]
– “TCP/IP Illustrated” Series
– UNIX Network Programming, Vol. 1
– Kernel Source – “Real” Protocol Stacks
• Linux TCP/IP Stack
– http://www.kernel.org/pub/linux/kernel/v2.4/
• OpenBSD TCP/IP Stack
– ftp://ftp.openbsd.org/pub/OpenBSD/src/sys/netinet/
Where to Find More Info
• Papers, Lecture Notes and RFCs
– TCP Congestion Control
• Van Jacobson, “Congestion Avoidance and
Control”, 1988
• Internet RFC Series:
http://www.rfc-editor.org/
– CS514 - Fall 2000 Lecture Notes
– Birman, Kenneth. Building Secure and
Reliable Network Applications. 1995.
First, some definitions…
• Keep the OSI Layers in mind!
• Address
– An identifier, following an addressing convention,
which allows a machine to be uniquely identified
• MAC Address, or Hardware Address
– Numeric address used by Ethernet (data-link
layer)
– Might look like: “00:02:2D:08:68:F8”
• IP Address
– Numeric address used by IP (network layer)
– Might look like: “128.84.133.221”
First, some definitions…
• Packet, or Datagram
– self-contained unit of information
– consists of a header and body
• Packet Header
– For now, realize that it includes source
address, destination address
– With layered model, “nesting” of headers
First, some definitions…
• Local Area Network (LAN)
– Group of machines sharing a common
communications medium (such as Ethernet)
– High data rates, “private wires”, shorter distances
• Wide Area Network (WAN)
– spans a greater geographic area, may depend on
publicly available network structures
(telephone system, leased lines, satellites…)
First, some definitions…
• Router
– Machine that moves packets from one network to
a network that is closer to the destination
– (Based on a routing table, which may change)
• Bridge
– A machine that “indiscriminately” replicates
packets between two LANs
– typically “not as smart” but faster than a router
• Gateway
– A machine that routes packets from the LAN to
the WAN (What is a Firewall?)
First, some definitions…
• Port
– In UDP and TCP, a number which the kernel uses
to deliver datagrams to the appropriate application
– For instance: HTTP is port 80, SMTP is port 25,
Telnet is port 23, DNS is port 53, FTP is port 21
• In this model, receivers agree to wait for
datagrams on a specified port
• Socket: {address, port}
The Internet
• A network based on the Internet
Protocol (IP)
[Figure: network diagram; legend: router]
The Internet
• Routes IP Datagrams from point A to
point B … [unreliably]
[Figure: routers connecting A: 171.64.14.203 and B: 128.84.154.132; legend: router]
Unreliably?
• What good is that?
• Packet loss rate is extremely low (<<
1%)
• Packets usually dropped by overloaded
routers (as we’ll see later)
• This is good enough for us to build the
User Datagram Protocol (UDP)
UDP
• For applications where IP’s best-effort
delivery is good enough
– Streaming multimedia, stock quotes…
• Extends IP packet with source port,
destination port
• In addition, provides a checksum; large
datagrams are fragmented by the IP layer
Fragmentation of UDP Datagrams
• Very simple: IP splits a large UDP datagram into
multiple IP fragments, each carrying a fragment
offset
• All fragments but the last set the “more fragments” flag in the IP header
• If one fragment is lost, the whole UDP datagram
is discarded
• UDP datagrams are discarded if checksum
fails
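The all-or-nothing behaviour of fragmentation can be sketched in a few lines. This is an illustration only, not how any real stack is written: the names `fragment` and `reassemble` are hypothetical, offsets are counted in bytes (real IP counts them in 8-byte units), and headers and checksums are left out.

```python
def fragment(payload: bytes, max_chunk: int):
    """Split a payload into (offset, more_fragments, chunk) tuples.

    Hypothetical helper: offsets are plain byte offsets for readability.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    fragments = []
    for offset in range(0, len(payload), max_chunk):
        chunk = payload[offset:offset + max_chunk]
        more = offset + len(chunk) < len(payload)  # False only on the last fragment
        fragments.append((offset, more, chunk))
    return fragments


def reassemble(fragments):
    """Return the original payload, or None if any fragment is missing."""
    expected_offset = 0
    parts = []
    saw_last = False
    for offset, more, chunk in sorted(fragments):
        if offset != expected_offset:
            return None  # a gap: the whole datagram is discarded
        parts.append(chunk)
        expected_offset += len(chunk)
        saw_last = not more
    return b"".join(parts) if saw_last else None


if __name__ == "__main__":
    pieces = fragment(bytes(range(10)), 4)
    print(reassemble(pieces))                  # all fragments arrived
    print(reassemble(pieces[:1] + pieces[2:])) # middle fragment lost
```

Fragments may arrive in any order, so the receiver sorts by offset before checking for gaps.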
The UDP API
• No-frills! Basically, you:
– Create a socket {address, port}
– Send data to a remote socket
– Receive data on a given socket
• No guarantees about reliability, or even
the ordering in which datagrams are
received
• How can we get around this?
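A minimal sketch of the three steps above, using Python's standard `socket` module. Both ends run in one process on the loopback interface, and the function name `udp_round_trip` is made up for this example; asking for port 0 is an assumption that lets the OS pick any free port.

```python
import socket


def udp_round_trip(message: bytes) -> bytes:
    """Send one datagram over loopback and return what the receiver got."""
    # Receiver: create a socket and bind it to {address, port}.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
    receiver.settimeout(2.0)          # UDP may lose data, so never wait forever
    address = receiver.getsockname()

    # Sender: no connection set-up, just address the datagram to the remote socket.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(message, address)
        data, _source = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_round_trip(b"hello"))
```

The timeout on the receiving socket is the only concession to unreliability here; anything more (acks, ordering) has to be built by the application.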
Adding Reliability to UDP
• Timeouts & Acknowledgements
– Receiver sends acks of received datagrams
– If sender does not receive ack within a certain
time, retransmit the packet
• Sequence Numbers
– Sender marks datagrams with sequence numbers
– Receiver uses sequence numbers to restore order
to the datagrams, and ignore duplicates
• What if we have 100 or more concurrent
applications? Is this efficient?
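The two mechanisms above can be tried out without a network. The sketch below simulates a stop-and-wait sender over a lossy channel; the function name `stop_and_wait` and the `lost_data` / `lost_acks` parameters (indices of the transmissions the channel drops) are invented for illustration, and a timeout is modelled simply as "try again".

```python
def stop_and_wait(messages, lost_data=(), lost_acks=(), max_tries=10):
    """Deliver messages in order over a simulated lossy channel.

    Returns (delivered_in_order, transmissions, duplicates_ignored).
    """
    delivered = {}      # receiver side: sequence number -> payload
    duplicates = 0
    attempt = 0         # global count of data transmissions
    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            this_attempt = attempt
            attempt += 1
            if this_attempt in lost_data:
                continue                # datagram lost: sender times out, retransmits
            if seq in delivered:
                duplicates += 1         # receiver ignores the duplicate
            else:
                delivered[seq] = payload
            if this_attempt in lost_acks:
                continue                # ack lost: sender times out, retransmits
            break                       # ack received, move to next message
        else:
            raise TimeoutError(f"gave up on sequence number {seq}")
    in_order = [delivered[s] for s in sorted(delivered)]
    return in_order, attempt, duplicates


if __name__ == "__main__":
    print(stop_and_wait(["a", "b", "c"], lost_data={0}, lost_acks={2}))
```

Every application that wants reliability over UDP would have to carry logic like this, which is the motivation for putting it in one shared protocol.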
TCP
• A TCP connection is defined by:
– { src_addr, src_port, dst_addr, dst_port }
– Note symmetry at both ends of connection
– Thus, sender is a receiver and vice-versa
• The goal: a reliable, stream-based,
connection-oriented protocol
– Reliable: data gets through [or connection breaks]
– Stream-based: imagine reading a file in-order
– Connection-oriented: point-to-point
• How is it all done?
Vivek Presents …
• The inner workings of the TCP
protocol…
• Any questions before we move on?
TCP
• TCP – Stream Protocol
• 3-way Handshake
• Closing a connection
• Acknowledgments
• Sliding Window
• Flow Control
• RED
TCP -- Stream Protocol
• Connection oriented
• like a telephone connection
• Needs set up before the transfer starts.
• Reliable, point to point communication.
• In order delivery
• No loss or duplication.
• Flow Control and error correction
• Duplex connections
3 Way Hand Shake
• TCP is connection oriented
• Connection initiated by a 3-way handshake
• Takes 3 packets
• Protection against duplicate Syn packets
[Figure: A sends Syn to B; B replies with Syn plus Ack of Syn; A replies with Ack of Syn]
Basic 3 Way Handshake
    TCP A        SEQ    ACK    CTL        TCP B
1.  CLOSED                                LISTEN
2.  SYN-SENT     <100>         <SYN>      SYN-RECV
3.  ESTABLISH    <300>  <101>  <SYN,ACK>  SYN-RECV
4.  ESTABLISH    <101>  <301>  <ACK>      ESTABLISH
Duplicate Recovery
    TCP A        SEQ    ACK    CTL        TCP B
1.  CLOSED                                LISTEN
2.  SYN-SENT     <100>         <SYN>      ...
3.  (duplicate)  <90>          <SYN>      SYN-RECV
4.               <300>  <91>   <SYN,ACK>  (duplicate)
5.               <91>          <RST>      LISTEN
6.  ...          <100>         <SYN>      SYN-RECV
7.  SYN-SENT     <400>  <101>  <SYN,ACK>  SYN-RECV
8.  ESTABLISH    <101>  <401>  <ACK>      ESTABLISH
3 Way Handshake
It ensures that both sides are ready to
transmit data, and that both ends know
that the other end is ready before
transmission actually starts.
It allows both sides to pick the initial
sequence number to use.
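The sequence and acknowledgement numbers in the handshake follow a simple rule: each side acknowledges the other's initial sequence number plus one. The sketch below reproduces that arithmetic; the function name and the dictionary layout are illustrative choices, not part of any real TCP implementation.

```python
def three_way_handshake(isn_a: int, isn_b: int):
    """Return the three handshake segments for initial sequence numbers isn_a, isn_b."""
    # 1. A picks its initial sequence number and sends SYN.
    syn = {"from": "A", "seq": isn_a, "ack": None, "ctl": ("SYN",)}
    # 2. B picks its own initial sequence number and acknowledges A's SYN.
    syn_ack = {"from": "B", "seq": isn_b, "ack": syn["seq"] + 1, "ctl": ("SYN", "ACK")}
    # 3. A acknowledges B's SYN; both ends now know the other is ready.
    ack = {"from": "A", "seq": syn_ack["ack"], "ack": syn_ack["seq"] + 1, "ctl": ("ACK",)}
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(100, 300):
        print(segment)
```

With initial sequence numbers 100 and 300 this yields the values used in the handshake table on the earlier slide.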
Closing a Connection
• Send a Fin packet before tearing down the connection
• Both processes must send Fin packets separately to close the connection in that direction
[Figure: A and B exchange “Fin, Ack” and “Ack of Fin”]
Closing a Connection
    TCP A         SEQ    ACK    CTL        TCP B
1.  ESTABLISHED                            ESTABLISHED
2.  (Close)
    FIN-WAIT-1    <100>  <300>  <FIN,ACK>  CLOSE-WAIT
3.  FIN-WAIT-2    <300>  <101>  <ACK>
4.                                         (Close)
                  <300>  <101>  <FIN,ACK>  LAST-ACK
5.                <101>  <301>  <ACK>      CLOSED
Acknowledgements
• Receiver acks only the last in-order
packet received (a cumulative ack)
• An out-of-order packet triggers a duplicate ack, which acts as an
implicit nack (TCP has no explicit nack message)
• Sender resends the first
unacknowledged packet
• Timeout typically set to 1.5 × the round-trip
time
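A small sketch of the acknowledgement rule, assuming packets are numbered 0, 1, 2, … rather than by byte offset as in real TCP. `cumulative_acks` and `retransmission_timeout` are hypothetical names; an ack of -1 stands for "nothing received in order yet".

```python
def cumulative_acks(arrivals):
    """For each arriving packet number, return the ack the receiver sends.

    The ack always names the last packet received in order, so a gap
    shows up as the same ack repeated.
    """
    next_expected = 0
    buffered = set()    # packets that arrived ahead of a gap
    acks = []
    for seq in arrivals:
        if seq >= next_expected:
            buffered.add(seq)
        while next_expected in buffered:
            buffered.remove(next_expected)
            next_expected += 1
        acks.append(next_expected - 1)
    return acks


def retransmission_timeout(round_trip_time: float, factor: float = 1.5) -> float:
    """Timeout as a fixed multiple of the measured round-trip time."""
    return factor * round_trip_time


if __name__ == "__main__":
    print(cumulative_acks([0, 1, 3, 4, 2]))
    print(retransmission_timeout(2.0))
```

In the example run, packets 3 and 4 arrive before packet 2, so the receiver keeps acknowledging packet 1 until the gap is filled.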
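Sliding Window
The sender window has k segments (buffers)
[Figure: sender window initially empty; receiver window initially empty]
Sliding Window
Send message m[i]
[Figure: sender window holds m[i]; m[i] in transit; receiver window empty]
Sliding Window
[Figure: sender window holds m[i] m[i+1] … … m[i+k]; receiver window holds m[i] m[i+1]; an ack travels back]
Sliding Window
[Figure: m[i] and m[i+1] have been acked; sender window holds m[i+2] m[i+3] … … m[i+k+1]; receiver window holds m[i+2] m[i+3]]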
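The window mechanics can be traced with a short simulation. This sketch assumes a lossless channel on which acks come back one at a time and in order, which is a simplification; the function name `sliding_window_trace` is made up for the example.

```python
def sliding_window_trace(num_messages: int, k: int):
    """Return the send/ack events for a sender window of k segments."""
    base = 0        # oldest unacknowledged message
    next_seq = 0    # next message to send
    events = []
    while base < num_messages:
        # Fill the window: at most k messages may be outstanding.
        while next_seq < num_messages and next_seq - base < k:
            events.append(("send", next_seq))
            next_seq += 1
        # The oldest outstanding message is acked and the window slides by one.
        events.append(("ack", base))
        base += 1
    return events


if __name__ == "__main__":
    for event in sliding_window_trace(4, 2):
        print(event)
```

Each ack frees one buffer at the left edge of the window, which lets exactly one new message be sent at the right edge.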
TCP Congestion Control
• Dynamically adjust window size
• Sender should not swamp the receiver – both
sides advertise maximum window size
• Linear increase -- while packets are getting
through, increment the window size by 1 each round trip.
• When a packet is dropped, halve the window
size and double the retransmission timeout
-- exponential backoff.
• Competing flows that all behave this way share bandwidth
fairly; also called TCP fairness/friendliness
TCP Slow start
• Linear increase alone might take some time to reach the
maximum possible window size
Optimization:
• Exponential increase to start with.
• Switch to linear increase with
exponential backoff once the first
packet is lost
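The window behaviour described on the last two slides can be traced with a toy model. The sketch assumes one event per round trip ("ok" or "loss") and ignores details of real TCP such as the slow-start threshold and timeout handling; `window_trace` and its parameters are illustrative names.

```python
def window_trace(events, initial_window=1, max_window=64):
    """Return the window size after each round-trip event.

    Starts with exponential increase; after the first loss it switches to
    linear increase, halving the window on every loss.
    """
    window = initial_window
    slow_start = True
    trace = [window]
    for event in events:
        if event == "loss":
            window = max(1, window // 2)            # multiplicative decrease
            slow_start = False
        elif slow_start:
            window = min(max_window, window * 2)    # exponential increase
        else:
            window = min(max_window, window + 1)    # linear increase
        trace.append(window)
    return trace


if __name__ == "__main__":
    print(window_trace(["ok"] * 4 + ["loss"] + ["ok"] * 3))
```

The trace rises quickly, drops by half at the loss, then climbs one step per round trip.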
RED
• Random Early Detection
• Idea is very simple
• Router senses that load is increasing
• It simply notices that it has less
available memory for buffering
• This is because packets are entering
faster than they can be forwarded
RED …
• Picks a packet at random and discards it
• Even though perhaps it could be forwarded
• Receiver detects the loss and sends a NACK (in TCP, duplicate acks)
• The network isn’t completely overloaded yet
so the NACK gets through
• Sender chokes back
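One way to express "discard early, at random" is to let the drop probability grow with the queue length. The sketch below uses the instantaneous queue length and two thresholds; published RED uses an averaged queue length, so treat this as a simplified illustration. All names and the default `max_p` value are assumptions.

```python
import random


def red_drop_probability(queue_len, min_threshold, max_threshold, max_p=0.1):
    """Drop probability as a function of how full the router's buffer is."""
    if queue_len < min_threshold:
        return 0.0      # plenty of buffer space: forward everything
    if queue_len >= max_threshold:
        return 1.0      # buffer nearly full: drop every arrival
    # In between, the probability rises linearly from 0 to max_p.
    return max_p * (queue_len - min_threshold) / (max_threshold - min_threshold)


def should_drop(queue_len, min_threshold, max_threshold, rng, max_p=0.1):
    """Randomly decide whether to discard an arriving packet."""
    return rng.random() < red_drop_probability(
        queue_len, min_threshold, max_threshold, max_p
    )


if __name__ == "__main__":
    rng = random.Random(0)
    drops = sum(should_drop(10, 5, 15, rng) for _ in range(1000))
    print(f"dropped {drops} of 1000 arrivals at queue length 10")
```

Dropping a few packets while the queue is only part full signals senders to slow down before the buffer overflows.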
Sunny Presents
• IP Multicast …
• Any questions before we move on?
• Note: Slides were stolen from CS514
FA2000 Web site
Unicast to multiple hosts
Multicast to multiple hosts
“to group”
Why do multicast?
• Send to a group, not to individual hosts
– Reduces overhead in sender
– Reduces bandwidth consumption in
network
– Reduces latency seen by receivers (all
receive “at the same time”, in theory)
Logical addressing
• Multicast groups “handled by network”
• Senders, receivers do not need to know
each others’ identities
• Group persists as long as it has at least
one member
• a “rendezvous” mechanism
Applications
• Teleconferencing
• Distance learning
• Multimedia streaming
• Directory service lookup
• ...
Multicasting for resource
location
• Expanding-ring search
• We want to find an instance of a
resource (database, etc) which is close
by
• Use multicast with IP time-to-live (TTL)
values
Time-to-live and hop counts
• TTL is a counter in the packet header
– Decrement at each “hop” through a router
– When TTL reaches zero, the packet is
dropped
– special values for “global” and “regional”
TTL (use with care!)
Expanding-ring search
“Find me a database”, TTL=1
Expanding-ring search
“Find me a database”, TTL=2
“I’m a database, what can I do for you?”
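Expanding-ring search can be simulated on a toy topology. In this sketch the network is an adjacency dictionary, every node counts as one hop (a simplification, since only routers decrement TTL), and the names `reachable_within` and `expanding_ring_search` are invented for the example.

```python
from collections import deque


def reachable_within(graph, source, ttl):
    """Nodes a packet from source can reach before its TTL runs out."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == ttl:
            continue    # TTL reached zero: the packet is dropped, not forwarded
        for neighbour in graph[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return set(hops) - {source}


def expanding_ring_search(graph, source, has_resource, max_ttl=8):
    """Retry with TTL = 1, 2, ... until some node with the resource answers."""
    for ttl in range(1, max_ttl + 1):
        hits = sorted(n for n in reachable_within(graph, source, ttl) if n in has_resource)
        if hits:
            return ttl, hits
    return None


if __name__ == "__main__":
    net = {
        "me": ["r1"],
        "r1": ["me", "r2", "h1"],
        "r2": ["r1", "db"],
        "h1": ["r1"],
        "db": ["r2"],
    }
    print(expanding_ring_search(net, "me", {"db"}))
```

Because the TTL grows one step at a time, the first answer comes from the nearest instance of the resource.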
Multicast addresses
• Class D IP addresses for group
– 224.0.0.0 to 239.255.255.255
• Treated like any other IP address: can
send from it or listen to it
• In practice, use UDP as well (more on
this later)
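The Class D range is the block 224.0.0.0/4, so membership is a one-line check with Python's standard `ipaddress` module. The helper name `is_multicast_group` is made up for this example.

```python
import ipaddress

# Class D: 224.0.0.0 through 239.255.255.255
CLASS_D = ipaddress.ip_network("224.0.0.0/4")


def is_multicast_group(address: str) -> bool:
    """True if the address falls in the Class D (multicast) range."""
    return ipaddress.ip_address(address) in CLASS_D


if __name__ == "__main__":
    for candidate in ("224.0.0.1", "239.255.255.255", "128.84.133.221"):
        print(candidate, is_multicast_group(candidate))
```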
Multicast at the LAN level
• Ethernet is a broadcast medium: all
network cards see all packets
• Register the multicast address in the
network card
– only pass matching packets to OS
– all other packets are ignored
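What gets registered in the network card is an Ethernet address derived from the IP group address. The mapping sketched below (prefix 01:00:5e followed by the low 23 bits of the group address) comes from RFC 1112 rather than from these slides, and the function name `multicast_mac` is hypothetical.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast address."""
    address = ipaddress.IPv4Address(group)
    if not address.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(address) & 0x7FFFFF            # keep the low 23 bits of the group
    mac = (0x01005E << 24) | low23             # fixed 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


if __name__ == "__main__":
    print(multicast_mac("224.0.0.1"))
    print(multicast_mac("224.128.0.1"))  # different group, same Ethernet address
```

Since only 23 bits survive the mapping, several groups share one Ethernet address; the card's filter is therefore a first pass, and the OS still has to check the IP destination.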
Multicast beyond the LAN
• We would like to multicast between
hosts on different LANs
– LANs are joined together directly by
bridges
– or can be connected through the Internet
by a sequence of routers
– need an inter-LAN (WAN) protocol
• (in fact, this is rarely enabled!)
A naive approach
• We want to send multicasts everywhere
where there are group members
– use flooding to send multicast between
routers
– when we get to a LAN, use regular
(Ethernet) multicast
Multicast by flooding
[Figure: network of routers, group members and non-members]
Why simple flooding doesn’t
work
[Figure: network of routers, group members and non-members; some traffic is marked “wasted!”]
Multicast flooding
• Not a scalable mechanism
– every LAN sees every multicast
– every WAN router sees every multicast:
wastes bandwidth, CPU
• Requires a two-part solution
– determining LAN group members
– omitting WAN routers from multicast
Multicast trees
• Shortest-path tree to all multicast
members, rooted at sender
• But must be computed independently
by each router
• And must be dynamically adjusted for
joins and leaves
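A centralised sketch of the idea: compute shortest paths from the sender, then keep only the links that lead to a group member. It assumes every link has the same cost (so breadth-first search finds shortest paths) and that one program sees the whole topology, whereas real routers each compute their part independently. The function name `multicast_tree` is invented here.

```python
from collections import deque


def multicast_tree(graph, sender, members):
    """Return the set of (parent, child) links in the pruned shortest-path tree."""
    # Breadth-first search from the sender records each node's parent.
    parent = {sender: None}
    queue = deque([sender])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    # Walk back from every member to the sender, keeping the links used.
    links = set()
    for member in members:
        node = member
        while node in parent and parent[node] is not None:
            links.add((parent[node], node))
            node = parent[node]
    return links


if __name__ == "__main__":
    net = {
        "s": ["r1"],
        "r1": ["s", "r2", "r3"],
        "r2": ["r1", "a"],
        "r3": ["r1", "b"],
        "a": ["r2"],
        "b": ["r3"],
    }
    print(sorted(multicast_tree(net, "s", {"a"})))
```

A join or leave changes the `members` set, so the pruned tree has to be recomputed (or incrementally patched) whenever membership changes.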
A multicast tree
IGMP
• Internet Group Management Protocol
(Deering and Cheriton)
• Developed from work in V distributed
operating system
– introduced notion of process groups
(Cheriton and Zwaenepoel)
– groups for services, e.g. name resolution,
remote paging
IGMP
• Detects if a multicast group has any
members within a LAN
• Query and report messages
– router sends query of group membership
periodically
– hosts report groups they’re in
IGMP
[Figure: the router between the LAN and the Internet asks “Who is a member?”; three hosts each answer “I am”]
Avoiding overloading
• Report packets may overload router
– upon getting a query, each group member
sets a timer
– if it sees a report for its group before the
timer expires, it suppresses its report
– otherwise reports on expiration
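Report suppression can be simulated by giving each member a random timer and letting only the earliest one in each group speak. The sketch assumes every host on the LAN sees every report instantly; the function name `igmp_reports` and the `max_delay` parameter are made up for this illustration.

```python
import random


def igmp_reports(members_by_group, rng, max_delay=10.0):
    """Return, per group, the single host whose report reaches the router."""
    reports = {}
    for group, members in members_by_group.items():
        # On receiving the query, each member of the group sets a random timer.
        timers = {host: rng.uniform(0, max_delay) for host in members}
        if timers:
            # The first timer to expire sends the report; everyone else
            # sees it and suppresses their own.
            reports[group] = min(timers, key=timers.get)
    return reports


if __name__ == "__main__":
    lan = {
        "224.1.1.1": ["h1", "h2", "h3"],
        "224.2.2.2": ["h2"],
        "224.3.3.3": [],
    }
    print(igmp_reports(lan, random.Random(1)))
```

The router only needs to know whether a group has at least one member on the LAN, so one report per group is enough.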
THE END!
• Any questions?
• Slides will be put up on the web
• If interested, check out the sources for
more information