Internet Protocols - NYU Computer Science Department
Data Communication and Networks
Lecture 9/10
Internet Protocols
November 6, 2003
Joseph Conron
Computer Science Department
New York University
What’s the Internet: Components view
millions of connected computing devices: hosts, end-systems
PCs, workstations, servers
PDAs, phones, toasters
running network apps
communication links
fiber, copper, radio, satellite
routers: forward packets (chunks) of data through the network
What’s the Internet:
Components view
protocols: control sending, receiving of msgs
e.g., TCP, IP, HTTP, FTP, PPP
Internet: “network of networks”
loosely hierarchical
public Internet versus private intranet
Internet standards
RFC: Request for comments
IETF: Internet Engineering Task Force
What’s the Internet: a service view
communication infrastructure enables
distributed applications:
WWW, email, games, e-commerce, databases, voting, more?
communication services provided:
connectionless
connection-oriented
Internet structure: network of networks
roughly hierarchical
national/international backbone providers (NBPs)
e.g. BBN/GTE, Sprint, AT&T, IBM, UUNet
interconnect (peer) with each other privately, or at public Network Access Points (NAPs)
regional ISPs
connect into NBPs
local ISP, company
connect into regional ISPs
[Figure: local ISPs connect to regional ISPs; regional ISPs connect to NBP A and NBP B; the NBPs interconnect through NAPs]
Connectionless Operation
Corresponds to datagram mechanism in packet switched
network
Each NPDU treated separately
Network layer protocol common to all DTEs and routers
Known generically as the internet protocol
Internet Protocol
One such internet protocol developed for ARPANET
RFC 791 (Get it and study it)
Lower layer protocol needed to access particular
network
Connectionless Internetworking
Advantages
Flexibility
Robust
No unnecessary overhead
Unreliable
Not guaranteed delivery
Not guaranteed order of delivery
Packets can take different routes
Reliability is responsibility of next layer up (e.g. TCP)
Internet protocol stack
application: supporting network applications
ftp, smtp, http
transport: host-host data transfer
tcp, udp
network: routing of datagrams from source to
destination
ip, routing protocols
link: data transfer between neighboring network
elements
ppp, ethernet
physical: bits “on the wire”
[Figure: the protocol stack, top to bottom: application, transport, network, link, physical]
Protocol layering and data
Each layer takes data from above
adds header information to create new data unit
passes new data unit to layer below
[Figure: the same five-layer stack at source and destination; headers are added going down the stack at the source and removed going up at the destination]
application: message = M
transport: segment = Ht M
network: datagram = Hn Ht M
link: frame = Hl Hn Ht M
physical
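The layering above can be sketched in a few lines. This is only an illustration: the header contents (the byte strings Ht|, Hn|, Hl|) and the function names encapsulate and decapsulate are placeholders invented for the example, not real protocol headers.

```python
def encapsulate(message: bytes) -> bytes:
    """Source side: each layer prepends its own (placeholder) header."""
    segment = b"Ht|" + message     # transport layer: Ht M  (segment)
    datagram = b"Hn|" + segment    # network layer:   Hn Ht M  (datagram)
    frame = b"Hl|" + datagram      # link layer:      Hl Hn Ht M  (frame)
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Destination side: strip the headers in the reverse order."""
    data = frame
    for header in (b"Hl|", b"Hn|", b"Ht|"):
        if not data.startswith(header):
            raise ValueError(f"expected header {header!r}")
        data = data[len(header):]
    return data


print(encapsulate(b"M"))                # b'Hl|Hn|Ht|M'
print(decapsulate(encapsulate(b"M")))   # b'M'
```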
Internet Protocol (IP)
Only protocol at Layer 3
Defines
Internet addressing
Internet packet format
Internet routing
RFC 791 (1981)
IP Address Details
32 Bits - divided into two parts
Prefix identifies network
Suffix identifies host
Global authority assigns unique prefix to network (IANA)
Local administrator assigns unique suffix to host
IP Addresses
given notion of “network”, let’s examine IP addresses:
“class-full” addressing (32-bit addresses):
class   leading bits   remaining bits      address range
A       0              network, host       1.0.0.0 to 127.255.255.255
B       10             network, host       128.0.0.0 to 191.255.255.255
C       110            network, host       192.0.0.0 to 223.255.255.255
D       1110           multicast address   224.0.0.0 to 239.255.255.255
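A minimal sketch of how the leading bits select the class. The function name address_class and the sample addresses are made up for illustration; addresses whose first four bits are 1111 fall outside the four classes listed here and are reported as "other".

```python
def address_class(dotted: str) -> str:
    """Classify a dotted-decimal IPv4 address by the leading bits of its first octet."""
    first_octet = int(dotted.split(".")[0])
    if first_octet >> 7 == 0b0:      # 0xxxxxxx
        return "A"
    if first_octet >> 6 == 0b10:     # 10xxxxxx
        return "B"
    if first_octet >> 5 == 0b110:    # 110xxxxx
        return "C"
    if first_octet >> 4 == 0b1110:   # 1110xxxx
        return "D"
    return "other"                   # 1111xxxx: not covered by classes A to D


for example in ("10.1.2.3", "128.122.1.1", "200.23.16.0", "224.0.0.1"):
    print(example, address_class(example))
```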
Classes and Network Sizes
Maximum network size determined by class of
address
Class A large
Class B medium
Class C small
IP Addressing Example
Subnets and Subnet Masks
Allow arbitrary complexity of internetworked LANs within organization
Insulate overall internet from growth of network numbers and routing complexity
Site looks to rest of internet like single network
Each LAN assigned subnet number
Host portion of address partitioned into subnet number and host number
Local routers route within subnetted network
Subnet mask indicates which bits are subnet number and which are host number
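As a rough illustration of what the mask does, the sketch below ANDs an address with a mask to get the network-plus-subnet part and keeps the remaining bits as the host number. The function name split_address and both sample addresses are assumptions chosen for the example.

```python
import ipaddress


def split_address(address: str, mask: str):
    """Return (network+subnet part as dotted decimal, host number as int)."""
    addr = int(ipaddress.IPv4Address(address))
    m = int(ipaddress.IPv4Address(mask))
    subnet = addr & m                  # bits selected by the mask
    host = addr & ~m & 0xFFFFFFFF      # bits the mask leaves out
    return str(ipaddress.IPv4Address(subnet)), host


print(split_address("128.122.20.15", "255.255.255.0"))     # ('128.122.20.0', 15)
print(split_address("192.168.17.130", "255.255.255.192"))  # ('192.168.17.128', 2)
```

A router inside the site would compare the first element against the subnets it knows about; hosts outside the site only look at the network number.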
Routing Using Subnets
IP addressing: CIDR
classful addressing:
inefficient use of address space, address space exhaustion
e.g., class B net allocated enough addresses for 65K hosts, even if only 2K hosts in that network
CIDR: Classless InterDomain Routing
network portion of address of arbitrary length
address format: a.b.c.d/x, where x is # bits in network portion of address
example: 200.23.16.0/23
11001000 00010111 0001000 | 0 00000000
network part (first 23 bits) | host part (remaining 9 bits)
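The /23 example can be checked with Python's standard ipaddress module. The helper name in_block and the two test addresses are assumptions for illustration only.

```python
import ipaddress


def in_block(address: str, block: str) -> bool:
    """True if address falls inside the CIDR block a.b.c.d/x."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(block)


block = ipaddress.ip_network("200.23.16.0/23")
print(block.netmask)                              # 255.255.254.0
print(block.num_addresses)                        # 512 (9 host bits)
print(in_block("200.23.17.42", "200.23.16.0/23")) # True
print(in_block("200.23.18.1", "200.23.16.0/23"))  # False
```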
Internet Packets
Contains sender and destination addresses
Size depends on data being carried
Called IP datagram
Two Parts Of An IP Datagram
Header
Contains source and destination address
Fixed-size fields
Data Area (Payload)
Variable size up to 64K
No minimum size
IP datagram format
Header fields, in order (each row of the header is 32 bits wide):
ver: IP protocol version number
head. len: header length (stored as a count of 32-bit words)
type of service: “type” of data
length: total datagram length (bytes)
16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
time to live: max number of remaining hops (decremented at each router)
upper layer: upper layer protocol to deliver payload to
Internet checksum: checksum over the header
32 bit source IP address
32 bit destination IP address
Options (if any): e.g. timestamp, record route taken, specify list of routers to visit
data (variable length, typically a TCP or UDP segment)
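A sketch of reading the fixed 20-byte part of the header with the standard struct module. The function name parse_ipv4_header, the dictionary keys, and the sample header bytes are assumptions made up for the example; options, if present, are ignored.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the first 20 bytes of an IPv4 datagram (network byte order)."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,         # field counts 32-bit words
        "type_of_service": tos,
        "total_length": total_length,
        "identifier": ident,
        "flags": flags_frag >> 13,                     # top 3 bits
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # field counts 8-byte units
        "ttl": ttl,
        "protocol": protocol,                          # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hypothetical header: version 4, 5-word header, TTL 64, carrying TCP.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
print(parse_ipv4_header(sample))
```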
IP Fragmentation & Reassembly
network links have MTU (max. transfer size): largest possible link-level frame
different link types, different MTUs
large IP datagram divided (“fragmented”) within net
one datagram becomes several datagrams
“reassembled” only at final destination
IP header bits used to identify, order related fragments
[Figure: fragmentation (in: one large datagram, out: 3 smaller datagrams), followed by reassembly at the destination]
IP Fragmentation and Reassembly
One large datagram becomes several smaller datagrams (offsets are shown in bytes; the header field itself stores the offset in 8-byte units)
datagram          length   ID   fragflag   offset
original          4000     x    0          0
first fragment    1500     x    1          0
second fragment   1500     x    1          1480
third fragment    1040     x    0          2960
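The numbers in this example can be reproduced with a short calculation, assuming a 20-byte header with no options and an MTU of 1500 bytes (both are assumptions; the slide does not state them). The function name fragment is hypothetical, and offsets are reported in bytes as on the slide.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """Return a list of (length, fragflag, byte_offset), one tuple per fragment."""
    payload = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        data = min(max_data, payload - offset)
        more = 1 if offset + data < payload else 0   # fragflag: more fragments follow
        fragments.append((header_len + data, more, offset))
        offset += data
    return fragments


for length, flag, offset in fragment(4000, 1500):
    print(f"length={length} fragflag={flag} offset={offset}")
```

Each fragment repeats the 20-byte header, which is why the three lengths add up to 4040 rather than 4000.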
IP Semantics
IP is connectionless
Datagram contains identity of destination
Each datagram sent/handled independently
Routes can change at any time
IP Semantics (continued)
IP allows datagrams to be
Delayed
Duplicated
Delivered out-of-order
Lost
Called best effort delivery
Motivation: accommodate all possible
networks
Datagram Lifetime
Datagrams could loop indefinitely
Consumes resources
Transport protocol may need upper bound on
datagram life
Datagram marked with lifetime
Time To Live field in IP
Once lifetime expires, datagram discarded (not
forwarded)
Hop count
Decrement time to live on passing through each router
Time count
Need to know how long since last router
ICMP
Internet Control Message Protocol
RFC 792
Transfer of (control) messages from routers and
hosts to hosts
Feedback about problems
e.g. time to live expired
Encapsulated in IP datagram
Not reliable
ICMP Error Messages
When an ICMP error message is sent, the
message always contains the IP header and the
first 8 bytes of the payload of the IP datagram
that caused the problem
ICMP has rules regarding error message
generation to prevent broadcast storms
ICMP Echo Command
Used by “ping” and “tracert”
When a destination IP host receives an ICMP
echo command, it returns an ICMP “echo
reply”
Ping uses this to determine if a path to a
destination (and its return path) are “up”
Tracert uses echo in a clever way to determine
the identities of the routers along the path (by
“scoping” TTL).
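The TTL trick can be illustrated with a small simulation that sends no real packets. The path is modelled as a list of router names ending with the destination; the names probe and trace and the router labels are hypothetical.

```python
def probe(path, ttl):
    """Simulate one echo probe; return (node that answers, kind of ICMP answer)."""
    for hop, node in enumerate(path, start=1):
        if hop == len(path):
            return node, "echo reply"        # reached the destination host
        ttl -= 1                             # each router decrements TTL
        if ttl == 0:
            return node, "time exceeded"     # router discards and reports back
    raise ValueError("empty path")


def trace(path):
    """Raise TTL one step at a time, collecting whoever answers each probe."""
    hops = []
    ttl = 1
    while True:
        node, answer = probe(path, ttl)
        hops.append(node)
        if answer == "echo reply":
            return hops
        ttl += 1


print(trace(["r1", "r2", "r3", "dest"]))   # ['r1', 'r2', 'r3', 'dest']
```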
Address Resolution Problem
Suppose we know the IP Address of a local
system (one to which we are connected)
We would like to send an IP packet to that
system.
The link layer (Ethernet, for instance) only
knows about MAC addresses!
How do we determine the MAC address
associated with the IP address?
ARP
Address resolution provides a mapping between
two different forms of addresses
32-bit IP addresses and whatever the data link uses
ARP (address resolution protocol) is a protocol
used to do address resolution in the TCP/IP
protocol suite (RFC 826)
ARP provides a dynamic mapping from an IP
address to the corresponding hardware address
ARP Protocol
A knows B's IP address, wants to learn physical address of B
A broadcasts ARP query pkt, containing B's IP address
all machines on LAN receive ARP query
B receives ARP packet, replies to A with its (B's) physical layer address
A caches (saves) IP-to-physical address pairs until information becomes old (times out)
soft state: information that times out (goes away) unless refreshed
ARP Cache
The cache maintains the recent IP to physical
address mappings
Each entry is aged (usually the lifetime is 20
minutes) forcing periodic updates of the cache
ARP replies are often broadcast so that all hosts
can update their caches
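A minimal sketch of such a cache as soft state, using the 20-minute lifetime mentioned above as the default. The class name ArpCache, its methods, and the injectable clock parameter are assumptions made for the example; a real implementation would also send a new ARP query when a lookup misses.

```python
import time


class ArpCache:
    """IP-to-hardware address mappings that expire unless refreshed."""

    def __init__(self, lifetime_seconds=20 * 60, clock=time.monotonic):
        self.lifetime = lifetime_seconds
        self.clock = clock          # injectable so the aging can be tested
        self.entries = {}           # IP address -> (hardware address, time stored)

    def update(self, ip, mac):
        """Store or refresh a mapping, restarting its lifetime."""
        self.entries[ip] = (mac, self.clock())

    def lookup(self, ip):
        """Return the cached hardware address, or None if absent or expired."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, stored = entry
        if self.clock() - stored >= self.lifetime:
            del self.entries[ip]    # entry aged out: must be resolved again
            return None
        return mac
```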
ARP Packet Format
[Figure: ARP packet layout, drawn 32 bits wide (bits 0 to 31)]
Hardware Type (16 bits)
Protocol Type (16 bits)
Hardware Size (8 bits)
Protocol Size (8 bits)
Operation (16 bits)
Sender’s Hardware Address (for Ethernet 6 bytes)
Sender’s Protocol Address (for IP 4 bytes)
Target Hardware Address
Target Protocol Address (the destination IP address)
Internet Transport Protocols
Two Transport Protocols Available
Transmission Control Protocol (TCP)
connection oriented
most applications use TCP
RFC 793
User Datagram Protocol (UDP)
Connectionless
RFC 768
Transport layer addressing
Communications endpoint addressed by:
IP address (32 bit) in IP Header
Port number (16 bit) in TP Header (TP = Transport Protocol: UDP or TCP)
Transport protocol (TCP or UDP) in IP Header
Standard services and port numbers
service     tcp   udp
echo        7     7
daytime     13    13
netstat     15
ftp-data    20
ftp         21
telnet      23
smtp        25
time        37    37
domain      53    53
finger      79
http        80
pop-2       109
pop         110
sunrpc      111   111
uucp-path   117
nntp        119
talk              517
TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
point-to-point: one sender, one receiver
reliable, in-order byte stream: no “message boundaries”
pipelined: TCP congestion and flow control set window size
send & receive buffers
full duplex data: bi-directional data flow in same connection
MSS: maximum segment size
connection-oriented: handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
flow controlled: sender will not overwhelm receiver
[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data through its socket door]
TCP Header
TCP segment structure (each row is 32 bits wide):
source port #, dest port #
sequence number, acknowledgement number: counting by bytes of data (not segments!)
head len, not used, flag bits U A P R S F
rcvr window size: # bytes rcvr willing to accept
checksum: Internet checksum (as in UDP)
ptr urgent data
Options (variable length)
application data (variable length)
Flag bits:
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
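A sketch of decoding the fixed 20-byte TCP header with the standard struct module. The function name parse_tcp_header, the dictionary keys, and the sample segment (a hypothetical SYN+ACK from port 80) are assumptions for illustration; options are not decoded.

```python
import struct

# Flag bits from least significant upward within the 16-bit word that also
# holds the header length.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the first 20 bytes of a TCP segment (network byte order)."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    flag_bits = offset_flags & 0x3F
    return {
        "source_port": src_port,
        "dest_port": dst_port,
        "sequence_number": seq,
        "ack_number": ack,
        "header_length": (offset_flags >> 12) * 4,   # field counts 32-bit words
        "flags": [name for bit, name in enumerate(FLAG_NAMES)
                  if flag_bits & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


sample = struct.pack("!HHIIHHHH", 80, 49152, 1000, 2001, (5 << 12) | 0x12,
                     8192, 0, 0)
print(parse_tcp_header(sample)["flags"])   # ['SYN', 'ACK']
```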
Reliability in an Unreliable World
IP offers best-effort (unreliable) delivery
TCP uses IP
TCP provides completely reliable transfer
How is this possible? How can TCP realize:
Reliable connection startup?
Reliable data transmission?
Graceful connection shutdown?
Reliable Data Transmission
Positive acknowledgment
Receiver returns short message when data arrives
Called acknowledgment
Retransmission
Sender starts timer whenever message is transmitted
If timer expires before acknowledgment arrives, sender retransmits
message
THIS IS NOT A TRIVIAL PROBLEM! – more on this later.
TCP Flow Control
Receiver
Advertises available buffer space
Called window
This is known as a credit policy
Sender
Can send up to entire window before ACK arrives
Each acknowledgment carries new window
information
Called window advertisement
Can be zero (called closed window)
Interpretation: I have received up through X, and can
take Y more octets
Credit Scheme
Decouples flow control from ACK
May ACK without granting credit and vice versa
Each octet has sequence number
Each transport segment has seq number, ack
number and window size in header
Use of Header Fields
When sending, seq number is that of first octet
in segment
ACK includes AN=i, W=j
All octets through SN=i-1 acknowledged
Next expected octet is i
Permission to send additional window of W=j
octets
i.e. octets through i+j-1
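The arithmetic for AN=i, W=j can be written out as a small helper. The function name sendable_range, the next_seq parameter (the first octet the sender has not yet transmitted), and the sample numbers are assumptions for the example.

```python
def sendable_range(an: int, w: int, next_seq: int):
    """Given an ACK with AN=an and W=w, return (first, last) octet numbers
    the sender may still transmit, or None if no credit remains."""
    last_allowed = an + w - 1          # octets through i+j-1 are permitted
    first = max(next_seq, an)          # everything below AN is already ACKed
    if first > last_allowed:
        return None                    # credit used up, or window closed (W=0)
    return first, last_allowed


# Receiver says AN=1001, W=1400; sender has already sent through octet 1600.
print(sendable_range(1001, 1400, 1601))   # (1601, 2400)
print(sendable_range(1001, 0, 1001))      # None: closed window
```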
Credit Allocation
TCP Flow Control
flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast
RcvBuffer = size of TCP Receive Buffer
RcvWindow = amount of spare room in Buffer
receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
RcvWindow field in TCP segment
sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow
[Figure: receiver buffering]
TCP seq. #’s and ACKs
Seq. #’s: byte stream “number” of first byte in segment’s data
ACKs: seq # of next byte expected from other side
cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn’t say; up to implementor
[Figure: simple telnet scenario over time. User at Host A types ‘C’; Host B ACKs receipt of ‘C’ and echoes back ‘C’; Host A ACKs receipt of echoed ‘C’]
TCP ACK generation
[RFC 1122, RFC 2581]
Event: in-order segment arrival, no gaps, everything else already ACKed
TCP Receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
Event: in-order segment arrival, no gaps, one delayed ACK pending
TCP Receiver action: immediately send single cumulative ACK
Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected
TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte
Event: arrival of segment that partially or completely fills gap
TCP Receiver action: immediate ACK if segment starts at lower end of gap
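A simplified sketch of the ACK number a receiver would report in the gap cases above. It leaves out the delayed-ACK timer entirely and assumes the receiver buffers out-of-order segments, which is one possible implementation choice rather than something the specification requires. The class name Receiver and the sequence numbers are made up for the example.

```python
class Receiver:
    """Tracks the next expected byte and answers each arrival with a cumulative ACK."""

    def __init__(self, initial_seq: int):
        self.expected = initial_seq
        self.buffered = {}     # seq -> length of out-of-order segments held back

    def receive(self, seq: int, length: int) -> int:
        if seq == self.expected:
            self.expected += length
            # Filling a gap may make buffered segments contiguous as well.
            while self.expected in self.buffered:
                self.expected += self.buffered.pop(self.expected)
        elif seq > self.expected:
            self.buffered[seq] = length   # gap detected: hold the segment
        # Old duplicates (seq < expected) fall through and are re-ACKed.
        return self.expected              # seq. # of next expected byte


r = Receiver(92)
print(r.receive(92, 8))     # 100: in-order arrival
print(r.receive(120, 20))   # 100: duplicate ACK, bytes 100..119 missing
print(r.receive(100, 20))   # 140: gap filled, buffered segment counted too
```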
TCP: retransmission scenarios
[Figure: two timelines between Host A and Host B. Left: lost ACK scenario (an ACK is lost, marked X, followed by a timeout). Right: premature timeout with cumulative ACKs (Seq=92 timeout, Seq=100 timeout).]
Why Startup/ Shutdown Difficult?
Segments can be
Lost
Duplicated
Delayed
Delivered out of order
Either side can crash
Either side can reboot
Need to avoid duplicate ‘‘shutdown’’ message from affecting
later connection
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
initialize TCP variables:
seq. #s
buffers, flow control info (e.g. RcvWindow)
client: connection initiator
Socket clientSocket = new Socket("hostname", portNumber);
server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client end system sends TCP SYN control segment to server
specifies initial seq #
Step 2: server end system receives SYN, replies with SYNACK control segment
ACKs received SYN
allocates buffers
specifies server's initial seq. #
Step 3: client end system receives SYNACK, replies with ACK segment
TCP Connection Management (OPEN)
[Figure: client and server timelines for opening a connection: opening, established, closed]
TCP Connection Management (cont.)
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK.
Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client and server timelines for closing a connection: close, closing, timed wait, closed]
TCP Connection Management (cont.)
[Figures: TCP server lifecycle and TCP client lifecycle]
Timing Problem!
The delay required for data to reach a destination and an
acknowledgment to return depends on traffic in the internet as
well as the distance to the destination. Because it allows
multiple application programs to communicate with multiple
destinations concurrently, TCP must handle a variety of delays
that can change rapidly.
How does TCP handle this .....
Solving Timing Problem
Keep estimate of round trip time on each connection
Use current estimate to set retransmission timer
Known as adaptive retransmission
Key to TCP’s success
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
longer than RTT
note: RTT will vary
too short: premature timeout
unnecessary retransmissions
too long: slow reaction to segment loss
Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
ignore retransmissions, cumulatively ACKed segments
SampleRTT will vary, want estimated RTT “smoother”
use several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
Exponential weighted moving average
influence of given sample decreases exponentially fast
typical value of x: 0.1
Setting the timeout
EstimatedRTT plus “safety margin”
large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
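The two update rules and the timeout formula translate directly into code. This sketch makes three assumptions the slides do not state: the first sample initializes EstimatedRTT, Deviation starts at zero, and Deviation is computed from the already-updated EstimatedRTT (the formulas are applied in the order listed). The class name RttEstimator and the sample values are hypothetical.

```python
class RttEstimator:
    """Exponential weighted moving average of RTT plus a deviation-based margin."""

    def __init__(self, first_sample: float, x: float = 0.1):
        self.x = x                          # typical value from the slide
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.deviation = 0.0                # assumption: start with no deviation

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new timeout."""
        x = self.x
        self.estimated_rtt = (1 - x) * self.estimated_rtt + x * sample_rtt
        self.deviation = ((1 - x) * self.deviation
                          + x * abs(sample_rtt - self.estimated_rtt))
        return self.timeout()

    def timeout(self) -> float:
        return self.estimated_rtt + 4 * self.deviation


est = RttEstimator(first_sample=100.0)     # milliseconds
for sample in (200.0, 100.0, 100.0):
    print(round(est.update(sample), 2))
```

A single slow sample raises the timeout sharply through the deviation term, and the effect then decays as further samples arrive.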
Implementation Policy Options
Send
Deliver
Accept
Retransmit
Acknowledge
Send
If no push or close TCP entity transmits at its
own convenience (IFF send window allows!)
Data buffered at transmit buffer
May construct segment per data batch
May wait for certain amount of data
Deliver (to application)
In absence of push, deliver data at own
convenience
May deliver as each in-order segment received
May buffer data from more than one segment
Accept
Segments may arrive out of order
In order
Only accept segments in order
Discard out of order segments
In windows
Accept all segments within receive window
Retransmit
TCP maintains queue of segments transmitted
but not acknowledged
TCP will retransmit if not ACKed in given time
First only
Batch
Individual
Acknowledgement
Immediate
as soon as segment arrives.
will introduce extra network traffic
Keeps sender’s pipe open
Cumulative
Wait a bit before sending ACK (called “delayed ACK”)
Must use timer to ensure ACK is sent
Less network traffic
May let sender’s pipe fill if not timely!
UDP: User Datagram Protocol
[RFC 768]
“no frills,” “bare bones” Internet transport protocol
“best effort” service, UDP segments may be:
lost
delivered out of order to app
connectionless:
no handshaking between UDP sender, receiver
each UDP segment handled independently of others
Why is there a UDP?
no connection establishment (which can add delay)
simple: no connection state at sender, receiver
small segment header
no congestion control: UDP can blast away as fast as desired
UDP: more
often used for streaming multimedia apps
loss tolerant
rate sensitive
other UDP uses
DNS
SNMP
reliable transfer over UDP: add reliability at application layer
application-specific error recovery!
UDP segment format (each row is 32 bits wide):
source port #, dest port #
length, checksum
Application data (message)
length: length, in bytes, of UDP segment, including header
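The four 16-bit header fields can be packed and unpacked with the standard struct module. In this sketch the checksum is simply left at zero, which UDP over IPv4 treats as "no checksum computed"; the function names, port numbers, and payload are assumptions for illustration.

```python
import struct


def build_udp_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend the 8-byte UDP header to the application data."""
    length = 8 + len(payload)    # length field covers header plus data
    checksum = 0                 # assumption: checksum not computed in this sketch
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def parse_udp_segment(segment: bytes):
    """Return (source port, dest port, length, checksum, application data)."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", segment[:8])
    return src_port, dst_port, length, checksum, segment[8:length]


seg = build_udp_segment(5353, 53, b"query")
print(len(seg))                  # 13
print(parse_udp_segment(seg))    # (5353, 53, 13, 0, b'query')
```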
UDP Uses
Inward data collection
Outward data dissemination
Request-Response
Real time application