Course Review

Reviewing the Course
Overview, TCP/IP Stack, ARP, LAN Switching, IP, Subnetting, UDP, TCP,
NAT
1
Networking Concepts
• Protocol Architecture (Stack or Suite)
• Protocol Layers
• Encapsulation
• Network Abstractions
2
TCP/IP Stack and OSI Reference Model
• The TCP/IP protocol stack does not define the lower layers of a complete protocol stack
TCP/IP Suite:
– Application Layer
– Transport Layer
– Network Layer
– (Data) Link Layer
OSI Reference Model:
– Application Layer
– Presentation Layer
– Session Layer
– Transport Layer
– Network Layer
– (Data) Link Layer
– Physical Layer
3
TCP/IP Protocol Stack
• IP is the waist of the hourglass of the Internet protocol architecture
• Multiple higher-layer protocols
• Multiple lower-layer protocols
• Only one protocol at the network layer.
[Figure: hourglass, from top to bottom: Applications (HTTP, FTP, SMTP); TCP, UDP; IP; data link layer protocols; physical layer protocols]
4
Assignment of Protocols to Layers
[Figure: assignment of protocols to layers]
• Application Layer: ping application, HTTP, Telnet, FTP, DNS, SNMP
• Transport Layer: TCP, UDP
• Network Layer: IP, ICMP, IGMP
• Data Link Layer: ARP, Ethernet, Network Interface
• Also shown: Routing Protocols (RIP, OSPF, PIM) and DHCP
5
Sending a packet from Argon to Neon
argon.tcpip-lab.edu
"Argon"
128.143.137.144
neon.tcpip-lab.edu
"Neon"
128.143.71.21
router137.tcpip-lab.edu
"Router137"
128.143.137.1
router71.tcpip-lab.edu
"Router71"
128.143.71.1
Router
Ethernet Network
Ethernet Network
6
Sending a packet from Argon to Neon
• DNS: What is the IP address of “neon.tcpip-lab.edu”?
• DNS: The IP address of “neon.tcpip-lab.edu” is 128.143.71.21
• Argon: 128.143.71.21 is not on my local network. Therefore, I need to send the packet to my default gateway with address 128.143.137.1
• ARP: What is the MAC address of 128.143.137.1?
• ARP: The MAC address of 128.143.137.1 is 00:e0:f9:23:a8:20
• Router: 128.143.71.21 is on my local network. Therefore, I can send the packet directly.
• ARP: What is the MAC address of 128.143.71.21?
• ARP: The MAC address of 128.143.71.21 is 00:20:af:03:98:28
[Figure: Argon (argon.tcpip-lab.edu, 128.143.137.144) sends a frame over its Ethernet network to the router (Router137, 128.143.137.1; Router71, 128.143.71.1), which sends a frame over the second Ethernet network to Neon (neon.tcpip-lab.edu, 128.143.71.21)]
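The decision "deliver directly or send to the default gateway" can be sketched in a few lines. This is only an illustration: the function name `next_hop` is hypothetical, and the /24 prefix length of the two Ethernet networks is an assumption (the slide does not state the netmask).

```python
import ipaddress

def next_hop(destination, local_network, default_gateway):
    """Return the IP address the packet is handed to on the local network."""
    # Destination on the local network: send the packet directly.
    if ipaddress.ip_address(destination) in ipaddress.ip_network(local_network):
        return destination
    # Otherwise the packet goes to the default gateway.
    return default_gateway

# Argon's view (assumed network 128.143.137.0/24): Neon is remote.
argon_hop = next_hop("128.143.71.21", "128.143.137.0/24", "128.143.137.1")
# The router's view on its second interface (assumed 128.143.71.0/24): Neon is local.
router_hop = next_hop("128.143.71.21", "128.143.71.0/24", "128.143.71.1")
```

The address returned is the one that ARP then has to resolve into a MAC address.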
7
Communications Architecture
• The complexity of the communication task is reduced by
using multiple protocol layers:
• Each protocol is implemented independently
• Each protocol is responsible for a specific subtask
• Protocols are grouped in a hierarchy
• A structured set of protocols is called a communications
architecture or protocol suite or stack
8
TCP/IP Protocol Suite
• The TCP/IP protocol suite is the protocol architecture of the Internet
• The TCP/IP suite has four layers: Application, Transport, Network, and Data Link Layer
• End systems (hosts) implement all four layers. Gateways (Routers) only have the bottom two layers.
[Figure: Application layer (user-level programs); Transport, Network, and Data Link layers (operating system); in Local Area Networks the Data Link layer contains the Media Access Control (MAC) sublayer]
9
Functions of the Layers
• Data Link Layer:
– Service:
Reliable transfer of frames over a link
Media Access Control on a LAN
– Functions: Framing, media access control, error checking
• Network Layer:
– Service:
Move packets from source host to destination host
– Functions: Routing, addressing
• Transport Layer:
– Service:
Delivery of data between hosts
– Functions: Connection establishment/termination, error
control, flow control
• Application Layer:
– Service:
Application specific (delivery of email, retrieval of HTML
documents, reliable transfer of file)
– Functions: Application specific
10
Layered Communications
• An entity of a particular layer can only communicate with:
1. a peer layer entity using a common protocol (Peer
Protocol)
2. adjacent layers to provide services and to receive
services
[Figure: N+1, N, and N-1 layer entities on two systems. Peer entities communicate through the N+1 Layer Protocol, N Layer Protocol, and N-1 Layer Protocol; adjacent layers meet at the layer N+1/N interface and the layer N/N-1 interface]
11
Layers in the Example
[Figure: protocol stacks along the path. Argon (argon.tcpip-lab.edu, 128.143.137.144) and Neon (neon.tcpip-lab.edu, 128.143.71.21) each run HTTP, TCP, IP, and Ethernet. The router (router137.tcpip-lab.edu, 128.143.137.1, 00:e0:f9:23:a8:20; router71.tcpip-lab.edu, 128.143.71.1) runs only IP and Ethernet. The HTTP protocol and the TCP protocol run between Argon and Neon; the IP protocol runs between each host and the router]
12
Layers in the Example
At Argon (argon.tcpip-lab.edu, 128.143.137.144):
• HTTP: Send HTTP Request to neon
• TCP: Establish a connection to 128.143.71.21 at port 80 (Open TCP connection to 128.143.71.21 port 80)
• IP: Send a datagram (which contains a connection request) to 128.143.71.21. Send the datagram to 128.143.137.1
• Ethernet: Send Ethernet frame to 00:e0:f9:23:a8:20
At the router (router137.tcpip-lab.edu, 128.143.137.1, 00:e0:f9:23:a8:20; router71.tcpip-lab.edu, 128.143.71.1):
• Ethernet: Frame is an IP datagram
• IP: Send IP datagram to 128.143.71.21. Send the datagram to 128.143.71.21
• Ethernet: Send Ethernet frame to 00:20:af:03:98:28
At Neon (neon.tcpip-lab.edu, 128.143.71.21):
• Ethernet: Frame is an IP datagram
• IP: IP datagram is a TCP segment for port 80
13
Layers and Services
• Service provided by TCP to HTTP:
– reliable transmission of data over a logical connection
• Service provided by IP to TCP:
– unreliable transmission of IP datagrams across an IP
network
• Service provided by Ethernet to IP:
– transmission of a frame across an Ethernet segment
• Other services:
– DNS: translation between domain names and IP addresses
– ARP: Translation between IP addresses and MAC addresses
14
Encapsulation and Demultiplexing
• As data is moving down the protocol stack, each protocol is
adding layer-specific control information
– HTTP: HTTP Header | User data
– TCP (TCP segment): TCP Header | HTTP Header | User data
– IP (IP datagram): IP Header | TCP Header | HTTP Header | User data
– Ethernet (Ethernet frame): Ethernet Header | IP Header | TCP Header | HTTP Header | User data | Ethernet Trailer
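The nesting of headers can be illustrated with a toy sketch. The bracketed byte strings are placeholders standing in for real headers, and the function names `encapsulate` and `decapsulate` are hypothetical; no real protocol format is implemented here.

```python
# Placeholder headers, listed in the order in which they are added
# as the data moves down the stack (HTTP first, Ethernet last).
HEADERS = [b"[HTTP hdr]", b"[TCP hdr]", b"[IP hdr]", b"[Eth hdr]"]
TRAILER = b"[Eth trl]"  # only Ethernet adds a trailer

def encapsulate(user_data: bytes) -> bytes:
    """Moving down the stack: each layer prepends its own header."""
    data = user_data
    for header in HEADERS:
        data = header + data
    return data + TRAILER

def decapsulate(frame: bytes) -> bytes:
    """Moving up the stack: each layer strips the header it understands."""
    data = frame[:-len(TRAILER)]
    for header in reversed(HEADERS):
        if not data.startswith(header):
            raise ValueError("unexpected header")
        data = data[len(header):]
    return data

frame = encapsulate(b"GET /")
```

Here `frame` starts with the Ethernet header and ends with the Ethernet trailer, matching the last line of the list above.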
15
Different Views of Networking
• Different Layers of the protocol stack have a different view of
the network. This is HTTP’s and TCP’s view of the network.
Argon
128.143.137.144
Neon
128.143.71.21
HTTP client
HTTP
server
HTTP
server
TCP client
TCP server
TCP server
IP Network
16
Network View of IP Protocol
17
Network View of Ethernet
• Ethernet’s view of the network
18
Address Resolution Protocol
(ARP)
19
Overview
TCP
UDP
ICMP
IP
IGMP
ARP
Network
Access
RARP
Transport
Layer
Network
Layer
Link Layer
Media
20
ARP and RARP
• Note:
– The Internet is based on IP addresses
– Data link protocols (Ethernet, FDDI, ATM) may have
different (MAC) addresses
• The ARP and RARP protocols perform the translation
between IP addresses and MAC layer addresses
• We will discuss ARP for broadcast LANs, particularly Ethernet
LANs
IP address
(32 bit)
ARP
RARP
Ethernet MAC
address
(48 bit)
21
Processing of IP packets by network drivers
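[Figure: flowchart of packet processing by the network drivers]
• IP Output:
– IP destination = multicast or broadcast? Yes: a copy is put on the IP input queue (loopback driver) and the packet is also passed to the Ethernet driver
– IP destination of packet = local IP address? Yes: the loopback driver puts the packet on the IP input queue
– No: the Ethernet driver gets the MAC address with ARP and sends the Ethernet frame
• IP Input:
– The Ethernet driver demultiplexes each received Ethernet frame: an ARP packet is passed to ARP, an IP datagram is put on the IP input queue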
22
Sending a packet from Argon to Neon
argon.tcpip-lab.edu
"Argon"
128.143.137.144
neon.tcpip-lab.edu
"Neon"
128.143.71.21
router137.tcpip-lab.edu
"Router137"
128.143.137.1
router71.tcpip-lab.edu
"Router71"
128.143.71.1
Router
Ethernet Network
Ethernet Network
23
Address Translation with ARP
ARP Request:
Argon broadcasts an ARP request to all stations on the
network: “What is the hardware address of Router137?”
Argon
128.143.137.144
00:a0:24:71:e4:44
Router137
128.143.137.1
00:e0:f9:23:a8:20
ARP Request:
What is the MAC address
of 128.143.137.1?
24
Address Translation with ARP
ARP Reply:
Router 137 responds with an ARP Reply which contains the
hardware address
Argon
128.143.137.144
00:a0:24:71:e4:44
Router137
128.143.137.1
00:e0:f9:23:a8:20
ARP Reply:
The MAC address of 128.143.137.1
is 00:e0:f9:23:a8:20
25
ARP Cache
• Since sending an ARP request/reply for each IP datagram is
inefficient, hosts maintain a cache (ARP Cache) of current
entries. The entries expire after 20 minutes.
• Contents of the ARP Cache:
(128.143.71.37) at 00:10:4B:C5:D1:15 [ether] on eth0
(128.143.71.36) at 00:B0:D0:E1:17:D5 [ether] on eth0
(128.143.71.35) at 00:B0:D0:DE:70:E6 [ether] on eth0
(128.143.136.90) at 00:05:3C:06:27:35 [ether] on eth1
(128.143.71.34) at 00:B0:D0:E1:17:DB [ether] on eth0
(128.143.71.33) at 00:B0:D0:E1:17:DF [ether] on eth0
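A minimal sketch of such a cache is shown below, assuming the 20 minute expiry stated above. The class name `ArpCache` and the injectable `clock` parameter are hypothetical and exist only to make the sketch easy to try out; real operating systems manage their ARP tables differently.

```python
import time

class ArpCache:
    """Maps IP addresses to MAC addresses; entries expire after a timeout."""

    def __init__(self, timeout_seconds=20 * 60, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock          # function returning the current time in seconds
        self.entries = {}           # IP address -> (MAC address, time added)

    def add(self, ip, mac):
        self.entries[ip] = (mac, self.clock())

    def lookup(self, ip):
        """Return the cached MAC address, or None if absent or expired."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, added = entry
        if self.clock() - added >= self.timeout:
            del self.entries[ip]    # expired: a new ARP request is needed
            return None
        return mac
```

A `None` result corresponds to the case where the host has to broadcast an ARP request before it can send the frame.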
26
Things to know about ARP
• What happens if an ARP Request is made for a non-existing
host?
Several ARP requests are made with increasing time
intervals between requests. Eventually, ARP gives up.
• What if a host sends an ARP request for its own IP address (gratuitous ARP)?
Any other machine that is configured with this IP address responds as if it
was a normal ARP request.
This is useful for detecting if an IP address has already
been assigned.
27
Proxy ARP
• Proxy ARP: Host or router responds to ARP Request that
arrives from one of its connected networks for a host that is
on another of its connected networks.
28
LAN Switching and Bridges
29
Outline
• Interconnection Devices
• Bridges/LAN Switches vs. Routers
• Bridges
• Learning Bridges
• Transparent bridges
30
Introduction
• There are many different devices for interconnecting networks
Ethernet
Hub
Ethernet
Hub
Hosts
Hosts
Bridge
Router
X.25
Network
Tokenring
Gateway
31
Ethernet Hub
• Used to connect hosts to Ethernet LAN and to connect multiple Ethernet
LANs
• Collisions are propagated
Ethernet
Hub
Ethernet
Hub
Host
Host
IP
IP
LLC
LLC
802.3 MAC
Hub
Hub
802.3 MAC
32
Bridges/LAN switches
• A bridge or LAN switch is a device that interconnects two or more Local
Area Networks (LANs) and forwards packets between these networks.
• Bridges/LAN switches operate at the Data Link Layer (Layer 2)
Tokenring
Bridge
IP
IP
Bridge
LLC
802.3 MAC
LLC
LAN
802.3 MAC
LLC
802.5 MAC
LAN
802.5 MAC
Terminology: Bridge, LAN switch, Ethernet switch
There are different terms to refer to a data-link layer interconnection device:
• The term bridge was coined in the early 1980s.
• Today, the terms LAN switch or (in the context of Ethernet) Ethernet
switch are used.
Convention:
• Since many of the concepts, configuration commands, and protocols for
LAN switches were developed in the 1980s, and commonly use the old
term “bridge”, we will, with few exceptions, refer to LAN switches as
bridges.
34
Ethernet Hubs vs. Ethernet Switches
• An Ethernet switch is a packet switch for Ethernet frames
• Buffering of frames prevents collisions.
• Each port is isolated and builds its own collision domain
• An Ethernet Hub does not perform buffering:
• Collisions occur if two frames arrive at the same time.
[Figure: Hub vs. Switch. In the hub, all ports share a single CSMA/CD domain. In the switch, each port runs CSMA/CD on its own, and the ports are connected through input buffers, a high-speed backplane, and output buffers]
35
Routers
• Routers operate at the Network Layer (Layer 3)
• Interconnect IP networks
IP network
IP network
IP network
Host
Router
Host
Router
Application
Application
TCP
TCP
IP
Network
Access
Host
IP
IP protocol
Data
Link
Network
Access
IP
IP protocol
Network
Access
Router
Data
Link
Network
Access
IP protocol
Network
Access
Router
Data
Link
IP
Network
Access
Host
36
Gateways
• The term “Gateway” is used with different meanings in
different contexts
• “Gateway” is a generic term for routers (Layer 3)
• “Gateway” is also used for a device that interconnects
different Layer 3 networks and which performs translation of
protocols (“Multi-protocol router”)
SNA
Network
X.25
Network
IP Network
Host
Gateway
Host
Gateway
37
Interconnecting networks:
Bridges versus Routers
Routers:
• Each host’s IP address must be configured
• If network is reconfigured, IP addresses may need to be reassigned
• Routing done via RIP or OSPF
• Each router manipulates packet header (e.g., reduces TTL field)
Bridges/LAN switches:
• MAC addresses of hosts are hardwired
• No network configuration needed
• Routing done by
– learning bridge algorithm
– spanning tree algorithm
• Bridges do not manipulate frames
38
Bridges
Overall design goal: Complete transparency
“Plug-and-play”
Self-configuring without hardware or software changes
Bridges should not impact operation of existing LANs
Three parts to understanding bridges:
(1) Forwarding of Frames
(2) Learning of Addresses
(3) Spanning Tree Algorithm
39
Need for forwarding between networks
• What do bridges do if some LANs are reachable only in multiple hops?
• What do bridges do if the path between two LANs is not unique?
[Figure: LAN 1 to LAN 5 interconnected by Bridge 1 to Bridge 5]
40
Transparent Bridges
• Three principal approaches can be found:
– Fixed Routing
– Source Routing
– Spanning Tree Routing (IEEE 802.1d)
• We only discuss the last one in detail.
• Bridges that execute the spanning tree algorithm are called
transparent bridges
41
(1) Frame Forwarding
• Each bridge maintains a MAC forwarding table
• Forwarding table plays the same role as the routing table of an IP router
• Entries have the form (MAC address, port, age), where
MAC address: host name or group address
port: port number of bridge
age: aging time of entry (in seconds)
with interpretation:
a machine with MAC address lies in direction of the port number
from the bridge. The entry is age time units old.
MAC forwarding table:
MAC address          port   age
a0:e1:34:82:ca:34    1      10
45:6d:20:23:fe:2e    2      20
42
(1) Frame Forwarding
• Assume a MAC frame arrives on port x of Bridge 2, which also has ports A, B, and C.
• Is the MAC address of the destination in the forwarding table for ports A, B, or C?
– Found: Forward the frame on the appropriate port
– Not found: Flood the frame, i.e., send the frame on all ports except port x.
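The forwarding decision can be sketched as a small function. The name `forward` and the dictionary layout of the table are hypothetical. One step is added that the slide does not show: if the destination is known to lie on the arrival port itself, the frame is not forwarded at all (filtering).

```python
def forward(table, dst_mac, in_port, all_ports):
    """Return the list of ports on which the bridge sends the frame.

    table maps a MAC address to the port behind which that machine lies.
    """
    out_port = table.get(dst_mac)
    if out_port is None:
        # Not found: flood on all ports except the arrival port.
        return [p for p in all_ports if p != in_port]
    if out_port == in_port:
        # Assumption not on the slide: destination is on the arrival
        # segment already, so the frame is filtered.
        return []
    # Found: forward on the appropriate port only.
    return [out_port]

# Forwarding table from the previous slide (age column omitted).
table = {"a0:e1:34:82:ca:34": 1, "45:6d:20:23:fe:2e": 2}
ports = [1, 2, 3]
```

For example, a frame for `45:6d:20:23:fe:2e` arriving on port 1 leaves on port 2 only, while a frame for an unknown address arriving on port 1 is flooded on ports 2 and 3.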
43
(2) Address Learning (Learning Bridges)
• Forwarding table entries are set automatically with a simple
heuristic:
The source field of a frame that arrives on a port tells
which hosts are reachable from this port.
[Figure: bridge with Ports 1 to 6. A frame with Src=x, Dest=y arrives on Port 3 and a frame with Src=y, Dest=x arrives on Port 4, so the bridge learns “x is at Port 3” and “y is at Port 4”]
44
(2) Address Learning (Learning Bridges)
Learning Algorithm:
• For each frame received, the bridge stores the source
field in the forwarding database together with the port
where the frame was received.
• All entries are deleted after some time (default is 15
seconds).
[Figure: bridge with Ports 1 to 6 that has learned “x is at Port 3” and “y is at Port 4”; a frame with Src=y, Dest=x arriving on Port 4 is forwarded on Port 3 only]
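Learning and forwarding together fit in a short sketch. The class name `LearningBridge` is hypothetical, time is passed in explicitly to keep the example deterministic, and the 15 second default follows the slide above. Dropping a frame whose destination lies on the arrival port is an added assumption.

```python
class LearningBridge:
    def __init__(self, ports, max_age=15):
        self.ports = list(ports)
        self.max_age = max_age      # seconds until an entry is deleted
        self.table = {}             # MAC address -> (port, time learned)

    def receive(self, src, dst, in_port, now):
        """Process one frame; return the ports on which it is sent."""
        # Learning: the source field tells which host is reachable via in_port.
        self.table[src] = (in_port, now)
        # Aging: delete entries that are too old.
        self.table = {mac: (port, t) for mac, (port, t) in self.table.items()
                      if now - t < self.max_age}
        # Forwarding: known destination -> one port, unknown -> flood.
        entry = self.table.get(dst)
        if entry is None:
            return [p for p in self.ports if p != in_port]
        port, _ = entry
        return [] if port == in_port else [port]

bridge = LearningBridge(ports=range(1, 7))
first = bridge.receive("x", "y", in_port=3, now=0)    # y unknown: flood
second = bridge.receive("y", "x", in_port=4, now=1)   # x was learned on port 3
third = bridge.receive("x", "y", in_port=3, now=2)    # y was learned on port 4
```

After the first two frames the table holds "x is at Port 3" and "y is at Port 4", as in the figure.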
45
Flooding Can Lead to Loops
• Switches sometimes need to broadcast frames
– Upon receiving a frame with an unfamiliar destination
– Upon receiving a frame sent to the broadcast address
• Broadcasting is implemented by flooding
– Transmitting frame out every interface
– … except the one where the frame arrived
• Flooding can lead to forwarding loops
– E.g., if the network contains a cycle of switches
– Either accidentally, or by design for higher reliability
46
Solution: Spanning Trees
• Ensure the topology has no loops
– Avoid using some of the links when flooding
– … to avoid forming a loop
• Spanning tree
– Sub-graph that covers all vertices but contains no cycles
– Links not in the spanning tree do not forward frames
48
Constructing a Spanning Tree
• Need a distributed algorithm
– Switches cooperate to build the spanning tree
– … and adapt automatically when failures occur
• Key ingredients of the algorithm
– Switches need to elect a “root”
• The switch with the smallest identifier
– For each of its interfaces, a switch identifies
if the interface is on the shortest path from the root
• And it excludes an interface from the tree if not
49
Constructing a Spanning Tree (cont. I)
root
One hop
Three hops
50
Constructing a Spanning Tree (cont. II)
• Use broadcast messages; e.g. (Y, d, X)
– From node X
– Claiming Y is the root
– And the distance from X to root is d
51
Steps in Spanning Tree Algorithm
• Initially, each switch thinks it is the root
– Switch sends a message out every interface identifying
itself as the root
– Example: switch X announces (X, 0, X)
• Switches update their view of the root
– Upon receiving a message, check the root id
– If the new id is smaller, start viewing that switch as root
• Switches compute their distance from the root
– Add 1 to the distance received from a neighbor
– Identify interfaces not on a shortest path to the root
– … and exclude them from the spanning tree
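The root election and distance computation can be simulated in rounds. This is a simplified sketch, not the IEEE 802.1d protocol: messages are exchanged synchronously, there are no timers, and the function name `elect_root` and the edge list `EDGES` are assumptions made for illustration (the link set is a guess and is not taken from any figure).

```python
def elect_root(edges):
    """Return {switch: (root id, distance to root)} after convergence."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    # Initially, each switch X thinks it is the root: view (X, 0).
    view = {x: (x, 0) for x in neighbors}
    changed = True
    while changed:
        changed = False
        # Every switch X announces (Y, d, X): "Y is the root, I am d hops away".
        messages = [(root, dist, x) for x, (root, dist) in view.items()]
        for root, dist, sender in messages:
            for receiver in neighbors[sender]:
                candidate = (root, dist + 1)   # add 1 to the received distance
                # Smaller root id wins; for the same root, the shorter path wins.
                if candidate < view[receiver]:
                    view[receiver] = candidate
                    changed = True
    return view

# Hypothetical seven-switch topology (assumed links).
EDGES = [(1, 3), (1, 5), (2, 3), (2, 4), (2, 7), (4, 7), (5, 6), (6, 7)]
view = elect_root(EDGES)
```

In this assumed topology every switch ends up with switch 1 as root, and switch 4 is three hops away from it. A link whose two ends are at the same distance from the root, such as 4-7 here, is not on a shortest path and would be excluded from the tree.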
52
Example From Switch #4’s Viewpoint
• Switch #4 thinks it is the root
– Sends (4, 0, 4) message to 2 and 7
• Then, switch #4 hears from #2
– Receives (2, 0, 2) message from 2
– … and thinks that #2 is the root
– And realizes it is just one hop away
• Then, switch #4 hears from #7
– Receives (2, 1, 7) from 7
– And realizes this is a longer path
– So, prefers its own one-hop path
– And removes 4-7 link from the tree
[Figure: network of switches 1 to 7]
53
Example From Switch #4’s Viewpoint
• Switch #2 hears about switch #1
– Switch 2 hears (1, 1, 3) from 3
– Switch 2 starts treating 1 as root
– And sends (1, 2, 2) to neighbors
• Switch #4 hears from switch #2
– Switch 4 starts treating 1 as root
– And sends (1, 3, 4) to neighbors
• Switch #4 hears from switch #7
– Switch 4 receives (1, 3, 7) from 7
– And realizes this is a longer path
– So, prefers its own three-hop path
– And removes 4-7 link from the tree
[Figure: network of switches 1 to 7]
54
Robust Spanning Tree Algorithm
• Algorithm must react to failures
– Failure of the root node
• Need to elect a new root, with the next lowest identifier
– Failure of other switches and links
• Need to recompute the spanning tree
• Root switch continues sending messages
– Periodically reannouncing itself as the root (1, 0, 1)
– Other switches continue forwarding messages
• Detecting failures through timeout (soft state!)
– Switch waits to hear from others
– Eventually times out and claims to be the root
55
Spanning Tree Protocol (IEEE 802.1d)
• The Spanning Tree Protocol (SPT) is a solution to prevent loops when
forwarding frames between LANs
• The SPT is standardized as the IEEE 802.1d protocol
• The SPT organizes bridges and LANs as a spanning tree in a dynamic
environment
– Frames are forwarded only along the branches of the spanning tree
– Note: Trees don’t have loops
• Bridges that run the SPT are called transparent bridges
• Bridges exchange messages (Configuration Bridge Protocol Data Units, or
BPDUs) to configure the bridges and to build the tree.
[Figure: LAN 1 to LAN 5 interconnected by Bridge 1 to Bridge 5]
56
Configuration BPDUs
Frame layout: Destination MAC address | Source MAC address | Configuration Message
Fields of the Configuration Message:
• protocol identifier: Set to 0
• version: Set to 0
• message type: Set to 0
• flags: lowest bit is the “topology change bit” (TC bit)
• root ID: ID of root
• Cost: Cost of the path from the bridge sending this message
• bridge ID: ID of bridge sending this message
• port ID: ID of port from which message is sent
• message age: time since root sent a message on which this message is based
• maximum age
• hello time: Time between BPDUs from the root (default: 1sec)
• forward delay: Time between recalculations of the spanning tree (default: 15 secs)
57
What do the BPDUs do?
With the help of the BPDUs, bridges can:
• Elect a single bridge as the root bridge.
• Calculate the distance of the shortest path to the root bridge
• Each LAN can determine a designated bridge, which is the
bridge closest to the root. The designated bridge will forward
packets towards the root bridge.
• Each bridge can determine a root port, the port that gives the
best path to the root.
• Select ports to be included in the spanning tree.
58
Concepts
• Each bridge has a unique identifier: Bridge ID
Bridge ID = Priority (2 bytes) : Bridge MAC address (6 bytes)
– Priority is configured
– Bridge MAC address is the lowest MAC address of all ports
• Each port of a bridge has a unique identifier (port ID).
• Root Bridge: The bridge with the lowest identifier is the root
of the spanning tree.
• Root Port:
Each bridge has a root port which identifies the
next hop from a bridge to the root.
59
Concepts
• Root Path Cost: For each bridge, the cost of the min-cost
path to the root.
• Designated Bridge, Designated Port: Single bridge on a
LAN that provides the minimal cost path to the
root for this LAN:
- if two bridges have the same cost, select the
one with highest priority (i.e., the lowest bridge ID)
- if the min-cost bridge has two or more ports
on the LAN, select the port with the lowest
identifier
• Note: We assume that “cost” of a path is the number of “hops”.
60
Steps of Spanning Tree Algorithm
• Each bridge is sending out BPDUs that contain the following
information:
(root ID, cost, bridge ID, port ID)
– root ID: root bridge (what the sender thinks it is)
– cost: root path cost for sending bridge
– bridge ID: identifies sending bridge
– port ID: identifies the sending port
• The transmission of BPDUs results in the distributed
computation of a spanning tree
• The convergence of the algorithm is very quick
61
Ordering of Messages
• We define an ordering of BPDU messages
M1 = (ID R1, C1, ID B1, ID P1)
M2 = (ID R2, C2, ID B2, ID P2)
We say M1 advertises a better path than M2 (“M1<<M2”) if
(R1 < R2),
Or (R1 == R2) and (C1 < C2),
Or (R1 == R2) and (C1 == C2) and (B1 < B2),
Or (R1 == R2) and (C1 == C2) and (B1 == B2) and (P1 < P2)
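The ordering “<<” is a lexicographic comparison of the four fields, which Python tuples provide directly. The names `BPDU` and `better` below are hypothetical, and plain integers stand in for the real 8-byte bridge IDs.

```python
from typing import NamedTuple

class BPDU(NamedTuple):
    root_id: int     # R
    cost: int        # C
    bridge_id: int   # B
    port_id: int     # P

def better(m1: BPDU, m2: BPDU) -> bool:
    """True if m1 advertises a better path than m2 (M1 << M2)."""
    # Tuples compare field by field: R first, then C, then B, then P.
    return tuple(m1) < tuple(m2)

m1 = BPDU(root_id=1, cost=2, bridge_id=5, port_id=1)
m2 = BPDU(root_id=1, cost=2, bridge_id=7, port_id=1)
m3 = BPDU(root_id=2, cost=0, bridge_id=2, port_id=1)
```

Here `m1` is better than `m2` because the root IDs and costs are equal and bridge 5 is smaller than bridge 7; both are better than `m3`, whose root ID is larger even though its cost is lower.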
62
Initializing the Spanning Tree Protocol
• Initially, all bridges assume they are the root bridge.
• Each bridge B sends BPDUs of this form on its LANs from
each port P:
(B, 0, B, P)
• Each bridge looks at the BPDUs received on all its ports and
its own transmitted BPDUs.
• The root bridge is the bridge with the smallest root ID that has been
received so far (Whenever a smaller ID arrives, the root is
updated)
63
Operations of Spanning Tree Protocol
• Each bridge B looks on all its ports for BPDUs that are better than its own
BPDUs
• Suppose a bridge with BPDU:
M1 = (R1, C1, B1, P1)
receives a “better” BPDU:
M2 = (R2, C2, B2, P2)
Then it will update the BPDU to:
(R2, C2+1, B1, P1)
• However, the new BPDU is not necessarily sent out
• On each bridge, the port where the “best BPDU” (via relation “<<“) was
received is the root port of the bridge.
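The update rule can be written as a small function over tuples (root ID, cost, bridge ID, port ID). The function name `receive_bpdu` is hypothetical and the small integer IDs are made up for illustration.

```python
def receive_bpdu(own, received):
    """Return the bridge's BPDU after it has seen a received BPDU.

    Both arguments are tuples (root ID, cost, bridge ID, port ID).
    """
    if received < own:                 # received is "better" (relation <<)
        r2, c2, _, _ = received
        _, _, b1, p1 = own
        # Adopt the sender's root, add one hop, keep own bridge and port ID.
        return (r2, c2 + 1, b1, p1)
    return own                         # otherwise nothing changes

bpdu = (4, 0, 4, 1)                          # bridge 4 starts as its own root
bpdu = receive_bpdu(bpdu, (2, 0, 2, 3))      # hears from root candidate 2
after_first = bpdu
bpdu = receive_bpdu(bpdu, (2, 1, 7, 2))      # not better: same root and cost, larger bridge ID
after_second = bpdu
bpdu = receive_bpdu(bpdu, (1, 2, 2, 3))      # a smaller root ID arrives
after_third = bpdu
```

The port on which the best BPDU arrived would be recorded as the root port; that bookkeeping is left out of the sketch.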
64
When to send a BPDU
• Say, B has generated a BPDU for each port x:
(R, Cost, B, x)
• B will send this BPDU on port x only if its
BPDU is better (via relation “<<“) than any
BPDU that B received from port x.
• In this case, B also assumes that it
is the designated bridge for the
LAN to which the port connects
• And port x is the designated port of that LAN
[Figure: Bridge B with Port x, Port A, Port B, and Port C]
65
Selecting the Ports for the Spanning Tree
• Each bridge makes a local decision about which of its ports are
part of the spanning tree
• Now B can decide which ports are in the spanning tree:
• B’s root port is part of the spanning tree
• All designated ports are part of the spanning tree
• All other ports are not part of the spanning tree
• B’s ports that are in the spanning tree will forward packets
(=forwarding state)
• B’s ports that are not in the spanning tree will not forward
packets (=blocking state)
66
Building the Spanning Tree
• Consider the network in the figure.
• Assume that the bridges have
calculated the designated ports
(D) and the root ports (R) as
indicated.
• What is the spanning tree?
– On each LAN, connect R ports
to the D ports on this LAN
[Figure: five bridges interconnecting LAN 1 to LAN 5; each bridge port is marked D (designated port) or R (root port)]
67
IP - The Internet Protocol
68
Orientation
• IP (Internet Protocol) is a Network Layer Protocol.
TCP
UDP
ICMP
IP
ARP
Network
Access
IGMP
Transport
Layer
Network
Layer
Link Layer
Media
• IP’s current version is Version 4 (IPv4). It is specified in RFC
791.
69
IP: The waist of the hourglass
• IP is the waist of the hourglass of the Internet protocol architecture
• Multiple higher-layer protocols
• Multiple lower-layer protocols
• Only one protocol at the network layer.
[Figure: hourglass, from top to bottom: Applications (HTTP, FTP, SMTP); TCP, UDP; IP; data link layer protocols; physical layer protocols]
70
Application protocol
• IP is the highest layer protocol which is implemented at both
routers and hosts
[Figure: two hosts connected through two routers. The hosts run Application, TCP, IP, and Data Link; the routers run only IP and Data Link. The application protocol and the TCP protocol run between the hosts; the IP protocol runs on every hop between host and router and between the routers]
71
IP Service
• Delivery service of IP is minimal
• IP provides an unreliable connectionless best effort service (also called:
“datagram service”).
– Unreliable: IP does not make an attempt to recover lost packets
– Connectionless: Each packet (“datagram”) is handled independently.
IP is not aware that packets between hosts may be sent in a logical
sequence
– Best effort: IP does not make guarantees on the service (no
throughput guarantee, no delay guarantee,…)
• Consequences:
• Higher layer protocols have to deal with losses or with duplicate
packets
• Packets may be delivered out-of-sequence
72
IP Service
• IP supports the following services:
• one-to-one (unicast)
• one-to-all (broadcast)
• one-to-several (multicast)
• IP multicast also supports a many-to-many service.
• IP multicast requires support of other protocols (IGMP, multicast routing)
73
IP Datagram Format
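Each row is 4 bytes (bits 0 to 31):
Row 1: version (4 bits) | header length (4 bits) | DS (6 bits) | ECN (2 bits) | total length (in bytes) (16 bits)
Row 2: Identification (16 bits) | flags: 0, DF, MF (3 bits) | Fragment offset (13 bits)
Row 3: time-to-live (TTL) (8 bits) | protocol (8 bits) | header checksum (16 bits)
Row 4: source IP address (32 bits)
Row 5: destination IP address (32 bits)
Then: options (0 to 40 bytes)
Then: payload
• 20 bytes ≤ Header Size ≤ (2^4 - 1) x 4 bytes = 60 bytes
• 20 bytes ≤ Total Length < 2^16 bytes = 65536 bytes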
74
IP Datagram Format
• Question: In which order are the bytes of an IP datagram
transmitted?
• Answer:
• Transmission is row by row
• For each row:
1. First transmit bits 0-7
2. Then transmit bits 8-15
3. Then transmit bits 16-23
4. Then transmit bits 24-31
• This is called network byte order or big endian byte
ordering.
• Note: some computers store 32-bit words in little endian format.
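The difference between network byte order and little endian storage can be seen with Python's `struct` module. The 32-bit word used here is the address 128.143.137.144 from the earlier examples; the variable names are arbitrary.

```python
import struct

# 128.143.137.144 as one 32-bit word: 0x80 0x8f 0x89 0x90
word = 0x808F8990

# "!" selects network byte order (big endian): most significant byte first.
network_order = struct.pack("!I", word)

# "<" selects little endian: least significant byte first.
little_endian = struct.pack("<I", word)

# Reading the bytes back with the matching format recovers the word.
(decoded,) = struct.unpack("!I", network_order)
```

On the wire the bytes appear as 128, 143, 137, 144; a little endian machine stores the same word in memory as 144, 137, 143, 128, so it has to convert before sending.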
75
Fields of the IP Header
• Version (4 bits): current version is 4, next version will be 6.
• Header length (4 bits): length of IP header, in multiples of 4
bytes
• DS/ECN field (1 byte)
– This field was previously called the Type-of-Service (TOS)
field. The role of this field has been re-defined, but is
“backwards compatible” with the TOS interpretation
– Differentiated Service (DS) (6 bits):
• Used to specify service level (currently not supported in
the Internet)
– Explicit Congestion Notification (ECN) (2 bits):
• New feedback mechanism used by TCP
76
Fields of the IP Header
• Identification (16 bits): Unique identification of a datagram
from a host. Incremented whenever a datagram is transmitted
• Flags (3 bits):
– First bit always set to 0
– DF bit (Do not fragment)
– MF bit (More fragments)
Will be explained later (see IP Fragmentation)
77
Fields of the IP Header
• Time To Live (TTL) (1 byte):
– Specifies the longest path (in hops) a datagram can travel before it is dropped
– Role of TTL field: Ensure that packet is eventually dropped
when a routing loop occurs
Used as follows:
– Sender sets the value (e.g., 64)
– Each router decrements the value by 1
– When the value reaches 0, the datagram is dropped
78
Fields of the IP Header
• Protocol (1 byte):
• Specifies the higher-layer protocol.
• Used for demultiplexing to higher layers.
1 = ICMP
2 = IGMP
4 = IP-in-IP encapsulation
6 = TCP
17 = UDP
• Header checksum (2 bytes): A simple 16-bit long checksum
which is computed for the header of the datagram.
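A sketch of the checksum computation follows. It uses the standard Internet checksum (the one's complement of the one's complement sum of all 16-bit words of the header, with the checksum field set to 0 while computing). The header bytes are assembled from the example IP header shown in the IP Addresses part of these notes, with no options; the function name `internet_checksum` is hypothetical.

```python
import struct

def internet_checksum(header: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"                  # pad to a whole number of words
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# version 4, header length 5, DS/ECN 0x00, total length 44,
# identification 0x9d08, flags 010 + offset 0, TTL 128, protocol 6,
# checksum field 0, source 128.143.137.144, destination 128.143.71.21
header = bytes.fromhex(
    "4500002c" "9d084000" "80060000" "808f8990" "808f4715"
)
checksum = internet_checksum(header)

# A receiver recomputes over the header including the checksum; 0 means "ok".
filled = header[:10] + struct.pack("!H", checksum) + header[12:]
```

For this header the result is 0x8bff, which matches the checksum value shown in the example header.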
79
Fields of the IP Header
• Options:
• Security restrictions
• Record Route: each router that processes the packet adds its IP
address to the header.
• Timestamp: each router that processes the packet adds its IP
address and time to the header.
• (loose) Source Routing: specifies a list of routers that must be
traversed.
• (strict) Source Routing: specifies a list of the only routers that
can be traversed.
• Padding: Padding bytes are added to ensure that header
ends on a 4-byte boundary
80
Maximum Transmission Unit
• Maximum size of IP datagram is 65535, but the data link layer protocol
generally imposes a limit that is much smaller
• Example:
– Ethernet frames have a maximum payload of 1500 bytes
→ IP datagrams encapsulated in an Ethernet frame cannot be longer than
1500 bytes
• The limit on the maximum IP datagram size, imposed by the data link
protocol is called maximum transmission unit (MTU)
• MTUs for various data link protocols:
Ethernet: 1500
802.3: 1492
802.5: 4464
FDDI: 4352
ATM AAL5: 9180
PPP: negotiated
81
IP Fragmentation
• What if the size of an IP datagram exceeds the MTU?
IP datagram is fragmented into smaller units.
• What if the route contains networks with different MTUs?
[Figure: Host A on an FDDI ring (MTU 4352) is connected through a router to Host B on an Ethernet (MTU 1500)]
• Fragmentation:
• IP router splits the datagram into several datagrams
• Fragments are reassembled at receiver
82
Where is Fragmentation done?
• Fragmentation can be done at the sender or at
intermediate routers
• The same datagram can be fragmented several times.
• Reassembly of original datagram is only done at
destination hosts (except in NAT’s case) !!
IP datagram
H
Fragment 2
H2
Fragment 1
H1
Router
83
What’s involved in Fragmentation?
• The following fields in the IP header are involved: Identification, Flags
• Identification: When a datagram is fragmented, the
identification is the same in all fragments
• Flags:
– DF bit is set: Datagram cannot be fragmented and must
be discarded if MTU is too small
– MF bit set: This datagram is a fragment and an
additional fragment follows this one
84
What’s involved in Fragmentation?
• The following fields in the IP header are involved: Fragment offset, Total length
• Fragment offset: Offset of the payload of the current
fragment in the original datagram
• Total length: Total length of the current fragment
85
Example of Fragmentation
• A datagram with size 2400 bytes must be fragmented according to an
MTU limit of 1000 bytes
The router receives the IP datagram on a link with MTU 4000 and forwards it on a link with MTU 1000:
                   IP datagram   Fragment 1   Fragment 2   Fragment 3
Header length:     20            20           20           20
Total length:      2400          996          996          448
Identification:    0xa428        0xa428       0xa428       0xa428
DF flag:           0             0            0            0
MF flag:           0             1            1            0
Fragment offset:   0             0            122          244
86
IP Addressing
IP Addresses
• Structure of an IP address
• Subnetting
• CIDR
IP Addresses
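[Figure: Ethernet frame = Ethernet Header | IP Header | TCP Header | Application data | Ethernet Trailer, with the fields of the IP header (rows of 32 bits) filled in as follows]
version: 0x4 | header length: 0x5 | DS/ECN: 0x00 | total length: 44 (decimal)
identification: 9d08 | flags: 010 (binary) | fragment offset: 0000000000000 (binary)
TTL: 128 (decimal) | protocol: 0x06 | header checksum: 8bff
source IP address: 128.143.137.144
destination IP address: 128.143.71.21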
What is an IP Address?
• An IP address is a unique global address for a network
interface
• An IP address:
- is a 32 bit long identifier
- encodes a network number (network prefix)
and a host number
Dotted Decimal Notation
• IP addresses are written in a so-called dotted decimal
notation
• Each byte is identified by a decimal number in the range
[0..255]:
• Example: 128.143.137.144
1st Byte: 10000000 = 128
2nd Byte: 10001111 = 143
3rd Byte: 10001001 = 137
4th Byte: 10010000 = 144
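The conversion between the 32-bit form and dotted decimal notation takes only a few lines. The function names are hypothetical.

```python
def to_dotted_decimal(bits: str) -> str:
    """Convert a string of 32 binary digits to dotted decimal notation."""
    if len(bits) != 32:
        raise ValueError("an IPv4 address has exactly 32 bits")
    octets = [bits[i:i + 8] for i in range(0, 32, 8)]   # four bytes
    return ".".join(str(int(octet, 2)) for octet in octets)

def to_bits(address: str) -> str:
    """Convert dotted decimal notation back to 32 binary digits."""
    return "".join(format(int(part), "08b") for part in address.split("."))

address = to_dotted_decimal("10000000" "10001111" "10001001" "10010000")
```

Applied to the four bytes of the example, this yields 128.143.137.144.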
Network prefix and Host number
• The network prefix identifies a network and the host number
identifies a specific host (actually, interface on the network).
network prefix
host number
• How do we know how long the network prefix is?
– The network prefix is implicitly defined (see class-based
addressing)
– The network prefix is indicated by a netmask.
Example
• Example: ellington.cs.virginia.edu (128.143.137.144)
• Network id is: 128.143.0.0
• Host number is: 137.144
• Network mask is: 255.255.0.0 or ffff0000
• Prefix notation: 128.143.137.144/16
» Network prefix is 16 bits long
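The same values can be derived with Python's standard `ipaddress` module; the variable names are arbitrary.

```python
import ipaddress

# Address in prefix notation: 16-bit network prefix.
iface = ipaddress.ip_interface("128.143.137.144/16")

network = iface.network                 # network id with prefix length
netmask = iface.netmask                 # network mask in dotted decimal
netmask_hex = format(int(netmask), "08x")

# The host number is the part of the address not covered by the netmask.
host_number = int(iface.ip) & int(iface.hostmask)
host_bytes = (host_number >> 8, host_number & 0xFF)
```

This gives the network 128.143.0.0/16, the mask 255.255.0.0 (ffff0000 in hexadecimal), and the host number bytes 137 and 144.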
The old way: Classful IP Addresses
• When Internet addresses were standardized (early 1980s),
the Internet address space was divided up into classes:
– Class A: Network prefix is 8 bits long
– Class B: Network prefix is 16 bits long
– Class C: Network prefix is 24 bits long
• Each IP address contained a key which identifies the class:
– Class A: IP address starts with “0”
– Class B: IP address starts with “10”
– Class C: IP address starts with “110”
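The class key can be read off the leading bits of the first byte. A minimal sketch, assuming only classes A, B, and C need to be distinguished here (the function name `address_class` is hypothetical):

```python
def address_class(dotted):
    """Return (class, prefix length) for class A/B/C, or None otherwise."""
    first = int(dotted.split(".")[0])
    if first & 0b10000000 == 0b00000000:   # starts with "0"
        return ("A", 8)
    if first & 0b11000000 == 0b10000000:   # starts with "10"
        return ("B", 16)
    if first & 0b11100000 == 0b11000000:   # starts with "110"
        return ("C", 24)
    return None                            # not class A, B, or C

print(address_class("128.143.137.144"))
```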
The old way: Internet Address Classes
Class A: bit 0 is "0"; Network Prefix: 8 bits (bits 0-7); Host Number: 24 bits (bits 8-31)
Class B: bits 0-1 are "10"; Network Prefix: 16 bits (bits 0-15); Host Number: 16 bits (bits 16-31)
Class C: bits 0-2 are "110"; Network Prefix: 24 bits (bits 0-23); Host Number: 8 bits (bits 24-31)
The old way: Internet Address Classes
Class D: bits 0-3 are "1110"; bits 4-31: multicast group id
Class E: bits 0-4 are "11110"; bits 5-31: (reserved for future use)
• We will learn about multicast addresses later in this course.
Subnetting
• Problem: Organizations have multiple networks which are independently managed
– Solution 1: Allocate one or more addresses for each network
• Difficult to manage
• From the outside of the organization, each network must be addressable.
– Solution 2: Add another level of hierarchy to the IP addressing structure: Subnetting
[Figure: University Network containing Engineering School, Medical School, Library]
Basic Idea of Subnetting
• Split the host number portion of an IP address into a
subnet number and a (smaller) host number.
• Result is a 3-layer hierarchy
Before: network prefix | host number
• Then: network prefix | subnet number | host number
(network prefix + subnet number = extended network prefix)
• Subnets can be freely assigned within the organization
• Internally, subnets are treated as separate networks
• Subnet structure is not visible outside the organization
Typical Addressing Plan for an Organization that
uses subnetting
• Each layer-2 network (Ethernet segment, FDDI segment) is
allocated a subnet address.
128.143.71.0 / 24
128.143.0.0/16
128.143.7.0 / 24
128.143.16.0 / 24
128.143.8.0 / 24
128.143.17.0 / 24
128.143.22.0 / 24
128.143.136.0 / 24
Advantages of Subnetting
• With subnetting, IP addresses use a 3-layer hierarchy:
» Network
» Subnet
» Host
• Improves efficiency of IP addresses by not consuming an
entire address space for each physical network.
• Reduces router complexity. Since external routers do not
know about subnetting, the complexity of routing tables at
external routers is reduced.
• Note: Length of the subnet mask need not be identical at all
subnetworks.
CIDR - Classless Interdomain Routing
• Goals:
– Restructure IP address assignments to increase efficiency
– Hierarchical routing aggregation to minimize route table
entries
Key Concept: The length of the network id (prefix) in the IP
addresses is kept arbitrary
• Consequence: Routers advertise the IP address and the
length of the prefix
CIDR Example
• CIDR notation of a network address:
192.0.2.0/18
• "18" says that the first 18 bits are the network part of the
address (and 14 bits are available for specific host
addresses)
• The network part is called the prefix
• Assume that a site requires a network address with 1000 addresses
• With CIDR, the network is assigned a continuous block of 1024 addresses
with a 22-bit long prefix
CIDR: Prefix Size vs. Network Size
CIDR Block Prefix    # of Host Addresses
/27                  32 hosts
/26                  64 hosts
/25                  128 hosts
/24                  256 hosts
/23                  512 hosts
/22                  1,024 hosts
/21                  2,048 hosts
/20                  4,096 hosts
/19                  8,192 hosts
/18                  16,384 hosts
/17                  32,768 hosts
/16                  65,536 hosts
/15                  131,072 hosts
/14                  262,144 hosts
/13                  524,288 hosts
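Every row of this table follows from one formula: a prefix of length p leaves 32 - p bits for the host part, so the block holds 2^(32-p) addresses. A small sketch (the function name `block_size` is hypothetical):

```python
def block_size(prefix_len):
    """Number of addresses in an IPv4 CIDR block with the given prefix length."""
    return 2 ** (32 - prefix_len)

for prefix_len in range(27, 12, -1):
    print(f"/{prefix_len}: {block_size(prefix_len):,}")
```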
CIDR and Address assignments
• Backbone ISPs obtain large block of IP addresses space and
then reallocate portions of their address blocks to their
customers.
Example:
• Assume that an ISP owns the address block 206.0.64.0/18, which
represents 16,384 (2^14) IP addresses
• Suppose a client requires 800 host addresses
• With classful addresses: need to assign a class B address (and
waste ~64,700 addresses) or four individual Class Cs (and introduce 4
new routes into the global Internet routing tables)
• With CIDR: Assign a /22 block, e.g., 206.0.68.0/22, which allocates a
block of 1,024 (2^10) IP addresses.
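The choice of a /22 for 800 hosts can be checked mechanically. The sketch below is an illustration, not part of the original example: the function name `smallest_prefix` is hypothetical, and it assumes two addresses per block are set aside for the network and broadcast addresses.

```python
import ipaddress

def smallest_prefix(hosts_needed):
    """Longest prefix whose block still holds hosts_needed + 2 addresses."""
    needed = hosts_needed + 2               # assumption: reserve network + broadcast
    host_bits = (needed - 1).bit_length()   # smallest n with 2**n >= needed
    return 32 - host_bits

prefix_len = smallest_prefix(800)
client = ipaddress.ip_network(f"206.0.68.0/{prefix_len}")
isp = ipaddress.ip_network("206.0.64.0/18")
print(client, client.num_addresses, client.subnet_of(isp))
```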
CIDR and Routing Information
[Figure: address hierarchy below the Internet Backbone]
ISP X owns: 206.0.64.0/18, 204.188.0.0/15, 209.88.232.0/21
Company X: 206.0.68.0/22
ISP y: 209.88.237.0/24
Organization z1: 209.88.237.192/26
Organization z2: 209.88.237.0/26
CIDR and Routing Information
Backbone routers do not know anything about Company X, ISP Y, or Organizations z1, z2.
Backbone sends everything which matches the prefixes 206.0.64.0/18, 204.188.0.0/15, 209.88.232.0/21 to ISP X.
ISP X does not know about Organizations z1, z2.
ISP X sends everything which matches the prefix:
206.0.68.0/22 to Company X,
209.88.237.0/24 to ISP y
ISP y sends everything which matches the prefix:
209.88.237.192/26 to Organization z1
209.88.237.0/26 to Organization z2
[Figure: Internet Backbone; ISP X owns 206.0.64.0/18, 204.188.0.0/15, 209.88.232.0/21; Company X: 206.0.68.0/22; ISP y: 209.88.237.0/24; Organization z1: 209.88.237.192/26; Organization z2: 209.88.237.0/26]
You can find out about the ownership of IP addresses in
North America via http://www.arin.net/whois/
Example
• The IP Address: 207.2.88.170
207      2        88       170
11001111 00000010 01011000 10101010
Belongs to:
City of Charlottesville, VA: 207.2.88.0 - 207.2.92.255
11001111 00000010 01011000 00000000
Belongs to:
Cable & Wireless USA 207.0.0.0 - 207.3.255.255
11001111 00000000 00000000 00000000
Subnetting in Details
The Catch
Before subnetting:
• In any network (or subnet) one can use most of the IP
addresses for host addresses.
• One loses two addresses for every network or subnet.
1. Network Address - One address is reserved as the address of the
network itself.
2. Broadcast Address – One address is reserved to address all
hosts in that network or subnet.
Subnet Example
Network address 172.19.0.0 with /16 network mask
Network   Network   Host      Host
172       19        0         0
Subnet Example
Network address 172.19.0.0 with /16 network mask
Network   Network   Host      Host
172       19        0         0
Using Subnets: subnet mask 255.255.255.0 or /24
Network   Network   Subnet    Host
Network Mask: 255.255.0.0 or /16
11111111  11111111  00000000  00000000
Subnet Mask: 255.255.255.0 or /24
11111111  11111111  11111111  00000000
• Applying a mask that is longer than the default network mask
divides your network into subnets.
• Subnet mask used here is 255.255.255.0 or /24
Subnet Example
Network address 172.19.0.0 with /16 network mask
Using Subnets: subnet mask 255.255.255.0 or /24
Network   Network   Subnet    Host
172       19        0         Host
172       19        1         Host
172       19        2         Host
172       19        3         Host
172       19        etc.      Host
172       19        254       Host
172       19        255       Host
Subnets: 255 subnets (2^8 - 1). Cannot use last subnet as it contains broadcast address
Subnet Example
Network address 172.19.0.0 with /16 network mask
Using Subnets: subnet mask 255.255.255.0 or /24
Subnet Addresses:
Network   Network   Subnet    Host
172       19        0         0
172       19        1         0
172       19        2         0
172       19        3         0
172       19        etc.      0
172       19        254       0
172       19        255       0
Subnets: 255 subnets (2^8 - 1). Cannot use last subnet as it contains broadcast address
Subnet Example
Class B address 172.19.0.0 with /16 network mask
Using Subnets: subnet mask 255.255.255.0 or /24
Host Addresses:
Network   Network   Subnet    Hosts
172       19        0         1 to 254
172       19        1         1 to 254
172       19        2         1 to 254
172       19        3         1 to 254
172       19        etc.      1 to 254
172       19        254       1 to 254
172       19        255       Host (1 to 254)
Each subnet has 254 hosts, 2^8 – 2
Subnet Example
Network address 172.19.0.0 with /16 network mask
Using Subnets: subnet mask 255.255.255.0 or /24
Broadcast Addresses:
Network   Network   Subnet    Host
172       19        0         255
172       19        1         255
172       19        2         255
172       19        3         255
172       19        etc.      255
172       19        254       255
172       19        255       255
Subnets: 255 subnets (2^8 - 1). Cannot use last subnet as it contains broadcast address
Subnet Example
Network address 172.19.0.0 with /16 network mask
Using Subnets: subnet mask 255.255.255.0 or /24
172.19.0.0/24
172.19.5.0/24
172.19.10.0/24
172.19.25.0/24
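The subnet, host, and broadcast addresses from the preceding slides can be listed with the standard `ipaddress` module. This is a sketch only. Note that `subnets()` returns all 256 possible /24 subnets, including the last one (172.19.255.0/24) that the slides exclude.

```python
import ipaddress

network = ipaddress.ip_network("172.19.0.0/16")
subnets = list(network.subnets(new_prefix=24))   # every /24 inside the /16

for subnet in (subnets[0], subnets[5], subnets[10], subnets[25]):
    hosts = list(subnet.hosts())                 # excludes subnet and broadcast address
    print(subnet, hosts[0], hosts[-1], subnet.broadcast_address)
```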
Important things to remember about Subnetting
• You can only subnet the host portion, you do not have control of the
network portion.
• Subnetting does not give you more hosts, it only allows you to divide your
larger network into smaller networks.
• When subnetting, you will actually lose host addresses:
– For each subnet you lose the address of that subnet
– For each subnet you lose the broadcast address of that subnet
– You “may” lose the first and last subnets
• Why would you want to subnet?
– Divide larger network into smaller networks
– Limit layer 2 and layer 3 broadcasts to their subnet.
– Better management of traffic.
Subnetting – Example
• Host IP Address: 138.101.114.250
• Network Mask: 255.255.0.0 (or /16)
• Subnet Mask: 255.255.255.192 (or /26)
Given the Host IP Address, Network Mask and Subnet Mask above, find the
following information:
• Major Network Information
– Major Network Address
– Major Network Broadcast Address
– Range of Hosts if not subnetted
• Subnet Information
– Subnet Address
– Range of Host Addresses (first host and last host)
– Broadcast Address
• Other Subnet Information
– Total number of subnets
– Number of hosts per subnet
Major Network Information
• Host IP Address: 138.101.114.250
• Network Mask: 255.255.0.0
• Subnet Mask: 255.255.255.192
• Major Network Address: 138.101.0.0
• Major Network Broadcast Address: 138.101.255.255
• Range of Hosts if not Subnetted: 138.101.0.1 to 138.101.255.254
Step 1: Convert to Binary
Bit values within each byte: 128 64 32 16 8 4 2 1
              138.      101.      114.      250
IP Address    10001010  01100101  01110010  11111010
Mask          11111111  11111111  11111111  11000000
              255.      255.      255.      192
Step 1:
Translate Host IP Address and Subnet Mask into binary notation
Step 2: Find the Subnet Address
              138.      101.      114.      250
IP Address    10001010  01100101  01110010  11111010
Mask          11111111  11111111  11111111  11000000
Network       10001010  01100101  01110010  11000000
              138       101       114       192
Step 2:
Determine the Network (or Subnet) where this Host address lives:
1. Draw a line under the mask
2. Perform a bit-wise AND operation on the IP Address and the Subnet
Mask
Note: 1 AND 1 results in a 1, 0 AND anything results in a 0
3. Express the result in Dotted Decimal Notation
4. The result is the Subnet Address of this Subnet or “Wire” which is
138.101.114.192
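The bit-wise AND from Step 2 can be carried out directly on the 32-bit values. A minimal sketch; the helper names `to_int` and `to_dotted` are hypothetical.

```python
def to_int(dotted):
    """Dotted decimal -> 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def to_dotted(value):
    """32-bit integer -> dotted decimal."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

# 1 AND 1 gives 1; 0 AND anything gives 0.
subnet_address = to_int("138.101.114.250") & to_int("255.255.255.192")
print(to_dotted(subnet_address))
```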
Step 2: Find the Subnet Address
              138.      101.      114.      250
IP Address    10001010  01100101  01110010  11111010
Mask          11111111  11111111  11111111  11000000
Network       10001010  01100101  01110010  11000000
              138       101       114       192
Step 2:
Determine the Network (or Subnet) where this Host address lives:
Quick method:
1. Find the last (right-most) 1 bit in the subnet mask.
2. Copy all of the bits in the IP address up to and including that position to the Network Address
3. Add 0’s for the rest of the bits in the Network Address
Step 3: Subnet Range / Host Range
                                G.D.            S.D.
IP Address    10001010 01100101 | 01110010 11 | 111010
Mask          11111111 11111111 | 11111111 11 | 000000
Network       10001010 01100101 | 01110010 11 | 000000
                                | subnet counting range | host counting range
Step 3:
Determine which bits in the address contain Network (subnet)
information and which contain Host information:
• Use the Network Mask: 255.255.0.0 and divide (Great Divide, G.D.) the
network from the rest of the address.
• Use Subnet Mask: 255.255.255.192 and divide (Small Divide, S.D.) the
subnet from the hosts between the last “1” and the first “0” in the
subnet mask.
Step 4: First Host / Last Host
                                G.D.            S.D.
IP Address    10001010 01100101 | 01110010 11 | 111010
Mask          11111111 11111111 | 11111111 11 | 000000
Network       10001010 01100101 | 01110010 11 | 000000
                                | subnet counting range | host counting range
First Host    10001010 01100101 | 01110010 11 | 000001   = 138.101.114.193
Last Host     10001010 01100101 | 01110010 11 | 111110   = 138.101.114.254
Broadcast     10001010 01100101 | 01110010 11 | 111111   = 138.101.114.255
Host Portion
• Subnet Address: all 0’s
• First Host: all 0’s and a 1
• Last Host: all 1’s and a 0
• Broadcast: all 1’s
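The four host-portion patterns translate into simple bit operations on the subnet address. This is a sketch under the assumptions of the running example (6 host bits, subnet 138.101.114.192); the helper `show` is hypothetical.

```python
import ipaddress

HOST_BITS = 6                                           # bits right of the Small Divide
subnet = int(ipaddress.ip_address("138.101.114.192"))   # host portion all 0's

first_host = subnet | 1                                 # all 0's and a 1
broadcast = subnet | (2 ** HOST_BITS - 1)               # all 1's
last_host = broadcast - 1                               # all 1's and a 0

def show(value):
    """32-bit integer -> dotted decimal string."""
    return str(ipaddress.ip_address(value))

print(show(first_host), show(last_host), show(broadcast))
```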
Step 5: Total Number of Subnets
                                G.D.            S.D.
IP Address    10001010 01100101 | 01110010 11 | 111010
Mask          11111111 11111111 | 11111111 11 | 000000
Network       10001010 01100101 | 01110010 11 | 000000
                                | subnet counting range | host counting range
First Host: 138.101.114.193, Last Host: 138.101.114.254, Broadcast: 138.101.114.255
• Total number of subnets
– Number of subnet bits: 10
– 2^10 = 1,024
– 1,024 total subnets
• Subtract one “if” all-zeros subnet cannot be used
• Subtract one “if” all-ones subnet cannot be used
Step 6: Total Number of Hosts per Subnet
                                G.D.            S.D.
IP Address    10001010 01100101 | 01110010 11 | 111010
Mask          11111111 11111111 | 11111111 11 | 000000
Network       10001010 01100101 | 01110010 11 | 000000
                                | subnet counting range | host counting range
First Host: 138.101.114.193, Last Host: 138.101.114.254, Broadcast: 138.101.114.255
• Total number of hosts per subnet
– Number of host bits: 6
– 2^6 = 64
– 64 addresses per subnet
• Subtract one for the subnet address
• Subtract one for the broadcast address
– 62 hosts per subnet
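Steps 5 and 6 reduce to counting bits on either side of the two divides. A short sketch, with variable names chosen for this illustration:

```python
NETWORK_PREFIX = 16   # Network Mask 255.255.0.0
SUBNET_PREFIX = 26    # Subnet Mask 255.255.255.192

subnet_bits = SUBNET_PREFIX - NETWORK_PREFIX   # between Great Divide and Small Divide
host_bits = 32 - SUBNET_PREFIX                 # right of the Small Divide

total_subnets = 2 ** subnet_bits               # before any all-zeros/all-ones exclusion
hosts_per_subnet = 2 ** host_bits - 2          # minus subnet and broadcast address

print(subnet_bits, total_subnets, host_bits, hosts_per_subnet)
```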
IP Forwarding
129
Orientation
• Internet is a collection of networks
• IP provides an end-to-end delivery service for IP datagrams
between hosts
• The delivery service is realized with the help of IP routers
130
Delivery of an IP datagram
• View at the data link layer:
– Internetwork is a collection of LANs or point-to-point links or switched
networks that are connected by routers
IP
131
Delivery of an IP datagram
• View at the IP layer:
– An IP network is a logical entity with a network number
– We represent an IP network as a “cloud”
– The IP delivery service takes the view of clouds, and ignores the data
link layer view
IP
132
Tenets of end-to-end delivery of datagrams
The following conditions must hold so that an IP datagram can
be successfully delivered
1. The network prefix of an IP destination address must
correspond to a unique data link layer network (=LAN or
point-to-point link or switched network).
2. Routers and hosts that have a common network prefix
must be able to exchange IP datagrams using a data link
protocol (e.g., Ethernet, PPP)
3. An IP network is formed when a data link layer network is
connected to at least one other data link layer network via
a router.
133
Routing tables
• Each router and each host keeps a routing table which
tells it how to process an outgoing packet
• Main columns:
1. Destination address: where is the IP datagram going to?
2. Next hop or interface: how to send the IP datagram?
• Routing tables are set so that a datagram gets closer to
its destination
Routing table of a host or router
Destination     Next Hop
20.2.1.0/28     R4
10.1.0.0/24     direct
10.1.2.0/24     direct
10.2.1.0/24     R4
10.3.1.0/24     direct
20.1.0.0/16     R4
IP datagrams can be directly delivered (“direct”) or are sent to a next hop
router (“R4”)
134
Delivery of IP datagrams
• There are two distinct processes to delivering IP datagrams:
1. Forwarding: How to pass a packet from an input
interface to the output interface?
2. Routing: How to find and setup the routing tables?
• Forwarding must be done as fast as possible:
– on routers, is often done with support of hardware
– on PCs, is done in kernel of the operating system
• Routing is less time-critical
– On a PC, routing is done as a background process
135
Processing of an IP datagram in IP
[Figure: flowchart of the IP module. A datagram arrives from the Data Link Layer in the Input queue. Destination address local? Yes: Demultiplex to UDP or TCP. No: IP forwarding enabled? Yes: Lookup next hop in the routing table (filled by a Routing Protocol or Static routing), then Send datagram. No: Discard.]
IP router: IP forwarding enabled
Host: IP forwarding disabled
136
Processing of an IP datagram in IP
• Processing of IP datagrams is very similar on an IP router and
a host
• Main difference:
“IP forwarding” is enabled on router and disabled on host
• IP forwarding enabled
-> if a datagram is received, but it is not for the local system,
the datagram will be sent to a different system
• IP forwarding disabled
-> if a datagram is received, but it is not for the local system,
the datagram will be discarded
137
Processing of an IP datagram at a router
Receive an IP datagram
1. IP header validation
2. Process options in IP header
3. Parsing the destination IP address
4. Routing table lookup
5. Decrement TTL
6. Perform fragmentation (if necessary)
7. Calculate checksum
8. Transmit to next hop
9. Send ICMP packet (if necessary)
138
Routing table lookup
• When a router or host needs to
transmit an IP datagram, it
performs a routing table lookup
• Routing table lookup: Use the
IP destination address as a key to
search the routing table.
• Result of the lookup is the IP
address of a next hop router, or
the name of a network interface
Destination address column: network prefix, or host IP address, or loopback address, or default route
Next hop column: IP address of next hop router*, or name of a network interface
*Note: A router has many IP addresses. The IP
address in the routing table refers to the address
of the network interface on the same directly
connected network.
139
Type of routing table entries
• Network route
– Destination address is a network address (e.g., 10.0.2.0/24)
– Most entries are network routes
• Host route
– Destination address is an interface address (e.g., 10.0.1.2/32)
– Used to specify a separate route for certain hosts
• Default route
– Used when no network or host route matches
– The router that is listed as the next hop of the default route is the
default gateway (for Cisco: “gateway of last resort”)
• Loopback address
– Routing table entry for the loopback address (127.0.0.1)
– The next hop lists the loopback (lo0) interface as outgoing interface
140
Longest Prefix Match
• Longest Prefix Match: Search for the
routing table entry that has the longest
match with the prefix of the destination
IP address
1. Search for a match on all 32 bits
2. Search for a match for 31 bits
…..
32. Search for a match on 0 bits
Host route, loopback entry
-> 32-bit prefix match
Default route is represented as 0.0.0.0/0
-> 0-bit prefix match
Lookup for 128.143.71.21:
Destination address   Next hop
10.0.0.0/8            R1
128.143.0.0/16        R2
128.143.64.0/20       R3
128.143.192.0/20      R3
128.143.71.0/24       R4
128.143.71.55/32      R3
default               R5
The longest prefix match for
128.143.71.21 is for 24 bits
with entry 128.143.71.0/24
Datagram will be sent to R4
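The lookup can be sketched as a linear scan that keeps the matching entry with the longest prefix. This is an illustration only: real routers use specialized data structures, and the names `TABLE` and `lookup` are hypothetical. The default route is written as 0.0.0.0/0, as stated on the slide.

```python
import ipaddress

TABLE = [
    ("10.0.0.0/8", "R1"),
    ("128.143.0.0/16", "R2"),
    ("128.143.64.0/20", "R3"),
    ("128.143.192.0/20", "R3"),
    ("128.143.71.0/24", "R4"),
    ("128.143.71.55/32", "R3"),
    ("0.0.0.0/0", "R5"),          # default route
]

def lookup(destination, table=TABLE):
    """Return (matching prefix, next hop) with the longest prefix length."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table:
        network = ipaddress.ip_network(prefix)
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return best

network, next_hop = lookup("128.143.71.21")
print(network, next_hop)
```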
141
Route Aggregation
• Longest prefix match algorithm permits the aggregation of
prefixes with identical next hop address to a single entry
• This contributes significantly to reducing the size of routing
tables of Internet routers
Before aggregation:
Destination     Next Hop
10.1.0.0/24     R3
10.1.2.0/24     direct
10.2.1.0/24     direct
10.3.1.0/24     R3
20.2.0.0/16     R2
20.1.1.0/28     R2
After aggregation:
Destination     Next Hop
10.1.0.0/24     R3
10.1.2.0/24     direct
10.2.1.0/24     direct
10.3.1.0/24     R3
20.0.0.0/14     R2
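That the single entry 20.0.0.0/14 covers both original R2 entries can be verified with the `ipaddress` module. This is a sketch; note that the aggregate also matches addresses that neither original entry covered (for example 20.0.0.1), which is harmless only if no other route is needed for them.

```python
import ipaddress

aggregate = ipaddress.ip_network("20.0.0.0/14")
originals = [ipaddress.ip_network(p) for p in ("20.2.0.0/16", "20.1.1.0/28")]

# Each original prefix must lie entirely inside the aggregate.
covered = [network.subnet_of(aggregate) for network in originals]
extra = ipaddress.ip_address("20.0.0.1")
extra_matches = extra in aggregate and not any(extra in n for n in originals)

print(covered, extra_matches)
```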
142
Transport Protocols
(UDP)
143
Orientation
• We move one layer up and look at the transport layer.
[Figure: protocol stack. Application Layer: User Processes; Transport Layer: TCP, UDP; Network Layer: ICMP, IP, IGMP; Link Layer: ARP, Hardware Interface, RARP; Media]
144
Orientation
• Transport layer protocols are end-to-end protocols
• They are only implemented at the hosts
[Figure: two HOSTs, each with Application, Transport, Network, and Data Link layers; the intermediate system between them has only Network and Data Link layers]
145
Transport Protocols in the Internet
The Internet supports 2 transport protocols
UDP - User Datagram Protocol
• datagram oriented
• unreliable, connectionless
• simple
• unicast and multicast
• useful only for few applications, e.g., multimedia applications
• Used a lot for services
– network management (SNMP), routing (RIP), naming (DNS), etc.
TCP - Transmission Control Protocol
• stream oriented
• reliable, connection-oriented
• complex
• only unicast
• used for most Internet applications:
– web (http), email (smtp), file transfer (ftp), terminal (telnet), etc.
146
UDP - User Datagram Protocol
• UDP supports unreliable transmissions of datagrams
• UDP merely extends the host-to-host delivery service of IP datagrams to
an application-to-application service
• The only thing that UDP adds is multiplexing and demultiplexing
[Figure: Applications on top of UDP on top of IP at each end host; IP only at the intermediate nodes]
147
Port Numbers
• UDP (and TCP) use port numbers to identify applications
• A globally unique address at the transport layer (for both UDP
and TCP) is a tuple <IP address, port number>
• Port numbers are 16 bits long, so there are 65,535 usable UDP ports per host (port 0 is reserved).
[Figure: User Processes on top of TCP and UDP, which sit on top of IP. TCP and UDP demultiplex based on port number; IP demultiplexes based on the Protocol field in the IP header]
148
Transport Protocols
(TCP)
149
Overview
TCP = Transmission Control Protocol
• Connection-oriented protocol
• Provides a reliable unicast end-to-end byte stream over an
unreliable internetwork.
[Figure: two TCP endpoints exchanging a Byte Stream in each direction over an IP Internetwork]
150
Connection-Oriented
• Before any data transfer, TCP establishes a connection:
• One TCP entity is waiting for a connection (“server”)
• The other TCP entity (“client”) contacts the server
• The actual procedure for setting up connections is more
complex.
• Each connection is full duplex
[Figure: the SERVER is waiting for a connection request; the CLIENT sends "Request a connection", the SERVER answers "Accept a connection", followed by Data Transfer and Disconnect]
151
Reliable
Byte stream is broken up into chunks which are called segments
Receiver sends acknowledgements (ACKs) for segments
TCP maintains a timer. If an ACK is not received in time,
the segment is retransmitted
Detecting errors:
TCP has checksums for header and data. Segments with
invalid checksums are discarded
Each byte that is transmitted has a sequence number
152
Byte Stream Service
• To the lower layers, TCP handles data in blocks called
segments.
• To the higher layers TCP handles data as a sequence of
bytes and does not identify boundaries between bytes
• So: Higher layers do not know about the beginning and
end of segments !
[Figure: the sending Application performs 1. write 100 bytes, 2. write 20 bytes into TCP's queue of bytes to be transmitted; TCP sends Segments; the receiving Application performs 1. read 40 bytes, 2. read 40 bytes, 3. read 40 bytes from TCP's queue of bytes that have been received]
153
TCP Format
TCP segments have a minimum 20 byte header with >= 0 bytes of data.
IP header (20 bytes) | TCP header (20 bytes) | TCP data
TCP header layout (32 bits per row, bits 0-15 and 16-31):
Source Port Number | Destination Port Number
Sequence number (32 bits)
Acknowledgement number (32 bits)
header length | 0 | Flags | window size
TCP checksum | urgent pointer
Options (if any)
DATA
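The fixed 20-byte part of the header can be unpacked with Python's `struct` module. This is a sketch: the sample header is constructed here purely for illustration (port numbers and window size are assumptions), and `parse_tcp_header` is a hypothetical helper. The flag bit positions follow the standard order FIN, SYN, RST, PSH, ACK, URG from the least significant bit.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5

def parse_tcp_header(data):
    """Unpack the first 20 bytes of a TCP header (network byte order)."""
    (src, dst, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", data[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,   # field counts 32-bit words
        "flags": [n for bit, n in enumerate(FLAG_NAMES) if offset_flags & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }

# Illustrative SYN segment: header length 5 words, only the SYN bit set.
sample = struct.pack("!HHIIHHHH", 1234, 80, 1031880193, 0, (5 << 12) | 0x02, 16384, 0, 0)
parsed = parse_tcp_header(sample)
print(parsed)
```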
154
TCP header fields
• Port Number:
• A port number identifies the endpoint of a connection.
• A pair <IP address, port number> identifies one
endpoint of a connection.
• Two pairs <client IP address, client port number>
and <server IP address, server port number> identify
a TCP connection.
[Figure: two hosts, each with Applications on top of TCP on top of IP; Ports on one host: 23, 80, 104; Ports on the other host: 7, 80, 16]
155
TCP header fields
• Sequence Number (SeqNo):
– Sequence number is 32 bits long.
– So the range of SeqNo is
0 <= SeqNo <= 2^32 - 1 (about 4.3 Gbyte)
– Each sequence number identifies a byte in the byte
stream
– Initial Sequence Number (ISN) of a connection is set
during connection establishment
156
TCP header fields
• Acknowledgement Number (AckNo):
– Acknowledgements are piggybacked, i.e.,
a segment from A -> B can contain an
acknowledgement for a segment sent in the B -> A
direction.
– A host uses the AckNo field to send acknowledgements. (If
a host sends an AckNo in a segment it sets the “ACK flag”)
– The AckNo contains the next SeqNo that a host wants to
receive
Example: The acknowledgement for a segment with
sequence number 0 and 1500 data bytes (bytes 0..1499) is
AckNo=1500
157
TCP header fields
• Acknowledgement Number (cont’d)
– TCP uses the sliding window flow control protocol to regulate the
flow of traffic from sender to receiver
– TCP uses the following variation of sliding window:
– no NACKs (Negative ACKnowledgement)
– only cumulative ACKs
• Example:
Assume: Sender sends two segments with “0..1500” and
“1501..3000”, but receiver only gets the second segment.
In this case, the receiver cannot acknowledge the second
packet. It can only send AckNo=0, the first byte that is still missing.
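The effect of cumulative ACKs can be sketched as follows. This is a simplified illustration, not TCP's actual receive logic: the function name `cumulative_ack` is hypothetical, segments are given as (SeqNo, length) pairs, and the byte stream is assumed to start at SeqNo 0.

```python
def cumulative_ack(next_expected, received):
    """Return the AckNo after buffering the given (seq_no, length) segments."""
    for seq_no, length in sorted(received):
        if seq_no > next_expected:
            break                              # gap: cannot acknowledge past it
        if seq_no + length > next_expected:
            next_expected = seq_no + length    # in-order data extends the ACK
    return next_expected

print(cumulative_ack(0, [(0, 1500)]))                 # one in-order segment
print(cumulative_ack(0, [(1500, 1500)]))              # first segment lost
print(cumulative_ack(0, [(1500, 1500), (0, 1500)]))   # gap filled later
```

When the first segment is missing, the AckNo stays at the first missing byte no matter how much later data has arrived. Once the gap is filled, a single ACK covers everything.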
158
TCP header fields
• Header Length (4 bits):
– Length of header in 32-bit words
– Note that TCP header has variable length (with minimum
20 bytes)
159
TCP header fields
• Flag bits:
– URG: Urgent pointer is valid
– If the bit is set, the following bytes contain an urgent message in
the range:
SeqNo <= urgent message <= SeqNo+urgent pointer
– ACK: Acknowledgement Number is valid
– PSH: PUSH Flag
– Notification from sender to the receiver that the receiver should
pass all data that it has to the application.
– Normally set by sender when the sender’s buffer is empty
160
TCP header fields
• Flag bits:
– RST: Reset the connection
– The flag causes the receiver to reset the connection
– Receiver of a RST terminates the connection and informs the
higher-layer application about the reset
– SYN: Synchronize sequence numbers
– Sent in the first packet when initiating a connection
– FIN: Sender is finished with sending
– Used for closing a connection
– Both sides of a connection must send a FIN
161
TCP header fields
• Window Size:
– Each side of the connection advertises the window size
– Window size is the maximum number of bytes that a
receiver can accept.
– Maximum window size is 2^16 - 1 = 65535 bytes
• TCP Checksum:
– TCP checksum covers TCP segment and IP pseudo
header (see discussion on UDP).
• Urgent Pointer:
– Only valid if URG flag is set
162
Connection Management in TCP
• Opening a TCP Connection
• Closing a TCP Connection
• Special Scenarios
163
TCP Connection Establishment
• TCP uses a three-way handshake to open a connection:
(1) ACTIVE OPEN: Client sends a segment with
– SYN bit set
– port number of client
– initial sequence number (ISN) of client
(2) PASSIVE OPEN: Server responds with a segment with
– SYN bit set
– initial sequence number of server
– ACK for ISN of client
(3) Client acknowledges by sending a segment with:
– ACK ISN of server
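The sequence and acknowledgement numbers of the three segments follow directly from the two ISNs. A minimal sketch, with a hypothetical function name and dictionary layout, and with arithmetic taken modulo 2^32 because sequence numbers are 32 bits long:

```python
MOD = 2 ** 32  # sequence numbers wrap around after 32 bits

def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {"flags": "SYN+ACK", "seq": server_isn,
               "ack": (client_isn + 1) % MOD}          # ACK for ISN of client
    ack = {"flags": "ACK", "seq": (client_isn + 1) % MOD,
           "ack": (server_isn + 1) % MOD}              # ACK for ISN of server
    return [syn, syn_ack, ack]

for segment in three_way_handshake(1031880193, 172488586):
    print(segment)
```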
164
Three-Way Handshake
aida.poly.edu -> mng.poly.edu: SYN (SeqNo = x)
mng.poly.edu -> aida.poly.edu: SYN (SeqNo = y, AckNo = x+1)
aida.poly.edu -> mng.poly.edu: (SeqNo = x+1, AckNo = y+1)
165
Three-Way Handshake
aida.poly.edu -> mng.poly.edu: S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
mng.poly.edu -> aida.poly.edu: S 172488586:172488586(0) ack 1031880194 win 8760 <mss 1460>
aida.poly.edu -> mng.poly.edu: ack 172488587 win 17520
166
Why is a Two-Way Handshake not enough?
aida.poly.edu -> mng.poly.edu (red line): S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
aida.poly.edu -> mng.poly.edu: S 15322112354:15322112354(0) win 16384 <mss 1460, ...>
mng.poly.edu -> aida.poly.edu: S 172488586:172488586(0) win 8760 <mss 1460>
The red line is a delayed duplicate packet.
Will be discarded as a duplicate SYN
When aida initiates the data transfer (starting with SeqNo=15322112355),
mng will reject all data.
167
TCP Connection Termination
• Each end of the data flow must be shut down independently
(“half-close”)
• If one end is done it sends a FIN segment. This means that
no more data will be sent
• Four steps involved:
(1) X sends a FIN to Y (active close)
(2) Y ACKs the FIN,
(at this time: Y can still send data to X)
(3) and Y sends a FIN to X (passive close)
(4) X ACKs the FIN.
168
TCP Connection Termination
Trace between aida.poly.edu and mng.poly.edu:
F 172488734:172488734(0) ack 1031880221 win 8733
. ack 172488735 win 17484
F 1031880221:1031880221(0) ack 172488735 win 17520
. ack 1031880222 win 8733
169
TCP States
State          Description
CLOSED         No connection is active or pending
LISTEN         The server is waiting for an incoming call
SYN RCVD       A connection request has arrived; wait for Ack
SYN SENT       The client has started to open a connection
ESTABLISHED    Normal data transfer state
FIN WAIT 1     Client has said it is finished
FIN WAIT 2     Server has agreed to release
TIME WAIT      Wait for pending packets (“2MSL wait state”)
CLOSING        Both Sides have tried to close simultaneously
CLOSE WAIT     Server has initiated a release
LAST ACK       Wait for pending packets
170
TCP States in “Normal” Connection Lifetime
Left side (active open)            Segment                            Right side (passive open)
                                                                      LISTEN
SYN_SENT                           -> SYN (SeqNo = x)
                                   <- SYN (SeqNo = y, AckNo = x + 1)  SYN_RCVD
ESTABLISHED                        -> (AckNo = y + 1)                 ESTABLISHED
FIN_WAIT_1 (active close)          -> FIN (SeqNo = m)
FIN_WAIT_2                         <- (AckNo = m + 1)                 CLOSE_WAIT (passive close)
                                   <- FIN (SeqNo = n)                 LAST_ACK
TIME_WAIT                          -> (AckNo = n + 1)                 CLOSED
171
2MSL Wait State
2MSL Wait State = TIME_WAIT
• When TCP does an active close, and sends the final ACK, the
connection must stay in the TIME_WAIT state for twice
the maximum segment lifetime.
2MSL= 2 * Maximum Segment Lifetime
• Why?
TCP is given a chance to resend the final ACK. (Server will
timeout after sending the FIN segment and resend the FIN)
• The MSL is set to 2 minutes or 1 minute or 30 seconds.
172
Resetting Connections
• Resetting connections is done by setting the RST flag
• When is the RST flag set?
– Connection request arrives and no server process is
waiting on the destination port
– Abort (Terminate) a connection
Causes the receiver to throw away buffered data. Receiver
does not acknowledge the RST segment
173
Interactive and bulk data in TCP
TCP applications can be put into the following categories
• bulk data transfer - ftp, mail, http
• interactive data transfer - telnet, rlogin
TCP has algorithms to deal with each type of application
efficiently.
174
Delayed Acknowledgement
• TCP delays transmission of ACKs for up to 200ms
• The hope is to have data ready in that time frame. Then, the
ACK can be piggybacked with the data segment.
• Delayed ACKs explain why the ACK and the “echo of
character” are sent in the same segment.
175
Nagle’s Algorithm
• There are fewer transmissions than there are characters.
• Aida never has multiple segments outstanding.
• This is due to Nagle’s Algorithm:
Each TCP connection can have only one small (1-byte)
segment outstanding that has not been acknowledged.
• Implementation: Send one byte and buffer all subsequent bytes
until acknowledgement is received. Then send all buffered bytes in a
single segment. (Only enforced if data is arriving from application one
byte at a time)
• Nagle’s rule reduces the number of small segments.
The algorithm can be disabled.
176
TCP:
Flow Control
Congestion Control
Error Control
177
What is Flow/Congestion/Error Control ?
Flow Control: Algorithms to prevent the sender from
overrunning the receiver with information
Congestion Control: Algorithms to prevent the sender from
overloading the network
Error Control: Algorithms to recover or conceal the
effects from packet losses
-> The goal of each of the control mechanisms is different.
-> But the implementation is combined
178
TCP Flow Control
TCP implements sliding window flow control
Sending acknowledgements is separated from setting
the window size at sender.
Acknowledgements do not automatically increase the
window size
Acknowledgements are cumulative
179
Sliding Window Flow Control
Sliding Window Protocol is performed at the byte level:
Bytes 1-3: sent and acknowledged
Bytes 4-5: sent but not acknowledged
Bytes 6-8: can be sent (USABLE WINDOW)
Bytes 9-11: can't be sent
Advertised window: bytes 4-8
Here: Sender can transmit sequence numbers 6,7,8.
180
Sliding Window: “Window Closes”
Transmission of a single byte (with SeqNo = 6) and acknowledgement is
received (AckNo = 5, Win=4):
[Figure: bytes 1 to 11 drawn three times: the initial window, the window after "Transmit Byte 6", and the window after "AckNo = 5, Win = 4 is received"]
181
Sliding Window: “Window Opens”
Acknowledgement is received that enlarges the window to the right (AckNo
= 5, Win=6):
[Figure: bytes 1 to 11 drawn twice: the window before, and the window after "AckNo = 5, Win = 6 is received"]
A receiver opens a window when TCP buffer empties (meaning that data is
delivered to the application).
182
Sliding Window: “Window Shrinks”
Acknowledgement is received that reduces the window from the right
(AckNo = 5, Win=3):
[Figure: bytes 1 to 11 drawn twice: the window before, and the window after "AckNo = 5, Win = 3 is received"]
Shrinking a window should not be used
183
Window Management in TCP
• The receiver is returning two parameters to the sender:
AckNo (32 bits) and window size (win, 16 bits)
• The interpretation is:
• I am ready to receive new data with
SeqNo = AckNo, AckNo+1, ..., AckNo+Win-1
• Receiver can acknowledge data without opening the window
• Receiver can change the window size without acknowledging
data
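The interpretation of (AckNo, Win) can be turned into a small calculation of the usable window. This is a sketch: `usable_window` is a hypothetical helper, and the starting state (AckNo = 4, Win = 5, next byte to send is 6) is inferred from the sliding window figures in the preceding slides.

```python
def usable_window(ack_no, win, next_seq):
    """Sequence numbers the sender may still transmit."""
    right_edge = ack_no + win               # first SeqNo outside the window
    return list(range(max(next_seq, ack_no), right_edge))

print(usable_window(4, 5, 6))    # initial state: bytes 4-5 sent, not yet ACKed
print(usable_window(5, 4, 7))    # window closes: byte 6 sent, AckNo = 5, Win = 4
print(usable_window(5, 6, 7))    # window opens: AckNo = 5, Win = 6
print(usable_window(5, 3, 7))    # window shrinks: AckNo = 5, Win = 3
```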
184
Sliding Window: Example
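[Figure: Sender on the left, Receiver Buffer (0 to 4K) on the right]
Sender sends 2K of data: 2K, SeqNo=0. Receiver buffer: 2K
Receiver replies: AckNo=2048, Win=2048
Sender sends 2K of data: 2K, SeqNo=2048. Receiver buffer: 4K
Receiver replies: AckNo=4096, Win=0
Sender blocked
Receiver buffer: 3K. Receiver sends: AckNo=4096, Win=1024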
185
TCP Congestion Control
• TCP has a mechanism for congestion control. The
mechanism is implemented at the sender
• The window size at the sender is set as follows:
Send Window = MIN (flow control window, congestion window)
where
• flow control window is advertised by the receiver
• congestion window is adjusted based on feedback from the
network
186
TCP Congestion Control
• The sender has two additional parameters:
– Congestion Window (cwnd)
Initial value is 1 MSS (=maximum segment size) counted as bytes
– Slow-start threshold Value (ssthresh)
Initial value is the advertised window size
• Congestion control works in two modes:
– slow start (cwnd < ssthresh)
– congestion avoidance (cwnd >= ssthresh)
187
Slow Start
• Initial value:
– cwnd = 1 segment
• Note: cwnd is actually measured in bytes:
1 segment = MSS bytes
• Each time an ACK is received, the congestion window is increased by
MSS bytes.
– cwnd = cwnd + MSS
– If an ACK acknowledges two segments, cwnd is still increased by only 1
segment.
– Even if ACK acknowledges a segment that is smaller than MSS bytes long,
cwnd is increased by MSS.
• Does Slow Start increment slowly? Not really.
In fact, the increase of cwnd can be exponential
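A small simulation can make the exponential growth visible. The sketch below assumes that every segment in a window is acknowledged by its own ACK and that nothing is lost; the value of `MSS` and the helper name `slow_start_round` are assumptions made for illustration.

```python
MSS = 1460  # assumed segment size in bytes; the slides do not fix a value


def slow_start_round(cwnd: int, mss: int = MSS) -> int:
    """Return cwnd after one round trip in which every segment is ACKed separately."""
    segments_in_flight = cwnd // mss
    for _ in range(segments_in_flight):  # one ACK per segment sent
        cwnd += mss                      # cwnd = cwnd + MSS for each ACK
    return cwnd


cwnd = MSS
history = [cwnd // MSS]                  # cwnd in segments, one entry per round trip
for _ in range(4):
    cwnd = slow_start_round(cwnd)
    history.append(cwnd // MSS)
print(history)
```

Under these assumptions cwnd doubles every round trip. If the receiver acknowledges several segments with one ACK, the growth is slower but still rapid.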
188
Slow Start Example
• The congestion window size grows very rapidly
– For every ACK, we increase cwnd by 1 irrespective of the number of
segments ACK’ed
• TCP slows down the increase of cwnd when cwnd > ssthresh
[Figure: timeline between sender and receiver. cwnd = 1xMSS: segment 1 is
sent and ACKed. cwnd = 2xMSS: segments 2 and 3 are sent and ACKed.
cwnd = 4xMSS: segments 4, 5 and 6 are sent and ACKed. cwnd = 7xMSS.]
189
Congestion Avoidance
• Congestion avoidance phase is started if cwnd has reached
the slow-start threshold value
• If cwnd >= ssthresh then each time an ACK is received,
increment cwnd as follows:
• cwnd = cwnd + MSS * (MSS / cwnd)
• So cwnd is increased by one segment (= MSS bytes) only after all
segments in the current window have been acknowledged.
190
Slow Start / Congestion Avoidance
• Here we give a more accurate version than in our earlier
discussion of Slow Start:
If cwnd <= ssthresh then
    Each time an Ack is received:
        cwnd = cwnd + MSS
else /* cwnd > ssthresh */
    Each time an Ack is received:
        cwnd = cwnd + MSS * MSS / cwnd
endif
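The rule above translates almost directly into code. The following is a minimal sketch, not an excerpt from any TCP implementation; the function name `on_ack` and the values chosen for `MSS` and `ssthresh` are assumptions for illustration.

```python
def on_ack(cwnd: float, ssthresh: float, mss: int) -> float:
    """Return the new cwnd (in bytes) after one ACK arrives."""
    if cwnd <= ssthresh:
        return cwnd + mss               # slow start: one MSS per ACK
    return cwnd + mss * mss / cwnd      # congestion avoidance: about one MSS per window


MSS = 1000                              # assumed value for illustration
cwnd, ssthresh = 1 * MSS, 8 * MSS
for _ in range(20):                     # process 20 ACKs
    cwnd = on_ack(cwnd, ssthresh, MSS)
print(round(cwnd))
```

The first eight ACKs add a full MSS each. After cwnd exceeds ssthresh, each further ACK adds only a fraction of an MSS.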
191
Example of
Slow Start/Congestion Avoidance
Assume that ssthresh = 8
[Plot: cwnd (in segments) versus time in round-trip times. cwnd grows
1, 2, 4, 8 in slow start until it reaches ssthresh, then 9, 10 in
congestion avoidance.]
192
Responses to Congestion
• Most often, a packet loss in a network is due to an overflow at
a congested router (rather than due to a transmission error)
• So, TCP assumes there is congestion if it detects a packet
loss
• A TCP sender can detect lost packets via:
• Timeout of a retransmission timer
• Receipt of a duplicate ACK
• When TCP assumes that a packet loss is caused by
congestion it reduces the size of the sending window
193
TCP Tahoe
• Congestion is assumed if the sender has a timeout or receives a
duplicate ACK
• Each time congestion occurs,
– ssthresh is set to half the current size of the congestion
window:
ssthresh = cwnd / 2
– cwnd is reset to one segment:
cwnd = MSS
– and slow-start is entered
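The Tahoe reaction can be sketched as a single function. The name `tahoe_on_congestion` is hypothetical, and the sketch assumes that ssthresh is computed from the value of cwnd before cwnd is reset, which is the only order in which "half the current size" is meaningful.

```python
def tahoe_on_congestion(cwnd: int, mss: int) -> tuple[int, int]:
    """Return (new_cwnd, new_ssthresh) after a timeout or duplicate ACK."""
    ssthresh = cwnd // 2   # half the congestion window at the time of the loss
    cwnd = mss             # back to one segment; slow start follows
    return cwnd, ssthresh


print(tahoe_on_congestion(cwnd=16000, mss=1000))  # → (1000, 8000)
```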
194
Slow Start / Congestion Avoidance
• A typical plot of cwnd for a TCP connection (MSS = 1500
bytes) with TCP Tahoe:
195
Background: ARQ Error Control
• Two types of errors:
– Lost packets
– Damaged packets
• Most Error Control techniques are based on:
1. Error Detection Scheme (Parity checks, CRC).
2. Retransmission Scheme.
• Error control schemes that involve error detection and
retransmission of lost or corrupted packets are referred to as
Automatic Repeat Request (ARQ) error control.
196
Background: ARQ Error Control
All retransmission schemes use all or a subset of the following
procedures:
Positive acknowledgments (ACK)
Negative acknowledgment (NACK)
All retransmission schemes (using ACK, NACK or both)
rely on the use of timers
The most common ARQ retransmission schemes are:
Stop-and-Wait ARQ
Go-Back-N ARQ
Selective Repeat ARQ
197
Error Control in TCP
• TCP implements a variation of the Go-back-N retransmission
scheme
• TCP maintains a Retransmission Timer for each
connection:
– The timer is started during a transmission. A timeout
causes a retransmission
• TCP couples error control and congestion control (i.e., it
assumes that errors are caused by congestion)
• TCP allows accelerated retransmissions (Fast Retransmit)
198
TCP Retransmission Timer
• Retransmission Timer:
– The setting of the retransmission timer is crucial for
efficiency
– Timeout value too small → results in unnecessary
retransmissions
– Timeout value too large → long waiting time before a
retransmission can be issued
– A problem is that the delays in the network are not fixed
– Therefore, the retransmission timers must be adaptive
199
Round-Trip Time Measurements
• The retransmission mechanism of TCP is adaptive
• The retransmission timers are set based on round-trip time
(RTT) measurements that TCP performs
[Figure: timeline of Segments 1 to 5 and their ACKs, including one ACK
that covers Segments 2 + 3, with the measured intervals RTT #1, RTT #2
and RTT #3.]
The RTT is based on the time difference
between segment transmission and
ACK
But:
• TCP does not ACK each segment
• Each connection has only one timer
200
Round-Trip Time Measurements
• Retransmission timer is set to a Retransmission Timeout
(RTO) value.
• RTO is calculated based on the RTT measurements.
• The RTT measurements are smoothed by the following
estimators srtt and rttvar:
srtt_{n+1} = a * RTT + (1 - a) * srtt_n
rttvar_{n+1} = b * |RTT - srtt_{n+1}| + (1 - b) * rttvar_n
RTO_{n+1} = srtt_{n+1} + 4 * rttvar_{n+1}
• The gains are set to a = 1/8 and b = 1/4
• srtt_0 = 0 sec, rttvar_0 = 3 sec. Also: RTO_1 = srtt_1 + 2 * rttvar_1
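One update step of the estimators can be written out as follows. This is a sketch of the three formulas exactly as stated on this slide, with the gains passed in as parameters; the function name `update_rto` is hypothetical, and the gain values in the usage line (1/8 for srtt, 1/4 for rttvar) are the ones recommended in RFC 6298.

```python
def update_rto(srtt: float, rttvar: float, rtt: float,
               a: float, b: float) -> tuple[float, float, float]:
    """Apply one RTT measurement and return (srtt, rttvar, rto)."""
    srtt = a * rtt + (1 - a) * srtt                   # smoothed round-trip time
    rttvar = b * abs(rtt - srtt) + (1 - b) * rttvar   # smoothed deviation, uses the new srtt
    rto = srtt + 4 * rttvar                           # retransmission timeout
    return srtt, rttvar, rto


# Example values in seconds, chosen only for illustration
print(update_rto(srtt=1.0, rttvar=0.5, rtt=2.0, a=1 / 8, b=1 / 4))
# → (1.125, 0.59375, 3.5)
```

A single large sample moves srtt only slightly but raises rttvar, so the timeout grows mainly through the variance term.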
201
Karn’s Algorithm
[Figure: a segment is sent, a timeout occurs, the segment is
retransmitted, and then an ACK arrives. Two candidate RTT intervals
are marked "RTT ?".]
• If an ACK for a retransmitted
segment is received, the sender
cannot tell if the ACK belongs to
the original or the
retransmission.
Karn’s Algorithm:
Don’t update srtt on any segments that have been retransmitted.
Each time TCP retransmits, it sets:
RTO_{n+1} = min(2 * RTO_n, 64 sec)
(exponential backoff)
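Both parts of Karn's algorithm are short enough to sketch directly. The function names below are hypothetical, and the cap of 64 is taken here to be in seconds.

```python
MAX_RTO = 64.0  # cap from the backoff formula, taken here to be seconds


def backoff(rto: float) -> float:
    """RTO to use after a retransmission (exponential backoff)."""
    return min(2 * rto, MAX_RTO)


def accept_rtt_sample(was_retransmitted: bool) -> bool:
    """Karn's rule: ignore RTT samples from retransmitted segments."""
    return not was_retransmitted


rto = 1.5                      # starting value chosen only for illustration
timeouts = []
for _ in range(7):             # seven consecutive retransmissions
    rto = backoff(rto)
    timeouts.append(rto)
print(timeouts)  # → [3.0, 6.0, 12.0, 24.0, 48.0, 64.0, 64.0]
```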
202
Network Address Translation (NAT)
203
Network Address Translation (NAT)
• RFC 1631
• A short term solution to the problem of the depletion of IP
addresses
– Long term solution is IPv6
– CIDR (Classless Inter-Domain Routing) is a possible short
term solution
– NAT is another
• NAT is a way to conserve IP addresses
– Can be used to hide a number of hosts behind a single IP
address
– Uses private addresses:
• 10.0.0.0-10.255.255.255,
• 172.16.0.0-172.31.255.255 or
• 192.168.0.0-192.168.255.255
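A quick way to test whether an address is private is to write the three blocks in CIDR form (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, as defined in RFC 1918) and use Python's standard `ipaddress` module. The helper name `is_rfc1918` is an assumption made for this sketch.

```python
import ipaddress

# The three private address blocks in CIDR notation
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr lies in one of the private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in PRIVATE_BLOCKS)


print(is_rfc1918("10.0.1.2"))        # → True
print(is_rfc1918("128.143.71.21"))   # → False
```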
204
Network Address Translation (NAT)
• NAT is a router function where IP addresses (and possibly
port numbers) of IP datagrams are replaced at the boundary
of a private network
• NAT is a method that enables hosts on private networks to
communicate with hosts on the Internet
• NAT is run on routers that connect private networks to the
public Internet, to replace the IP address-port pair of an IP
packet with another IP address-port pair.
205
Basic Operation of NAT
• NAT device has address translation table
• One to one address translation
206
Pooling of IP Addresses
• Scenario: Corporate network has many hosts but only a
small number of public IP addresses
• NAT solution:
– Corporate network is managed with a private address
space
– NAT device, located at the boundary between the
corporate network and the public Internet, manages a pool
of public IP addresses
– When a host from the corporate network sends an IP
datagram to a host in the public Internet, the NAT device
picks a public IP address from the address pool, and binds
this address to the private address of the host
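The pooling behaviour can be sketched as a small table that binds a private source address to the next free public address. The class name `NatPool` and its methods are hypothetical, and the sketch never releases bindings, which a real NAT device would do after an idle period.

```python
import ipaddress


class NatPool:
    """Dynamic one-to-one NAT: binds private addresses to pooled public ones."""

    def __init__(self, first: str, last: str) -> None:
        lo = int(ipaddress.ip_address(first))
        hi = int(ipaddress.ip_address(last))
        # All public addresses in the pool, lowest first
        self.free = [str(ipaddress.ip_address(n)) for n in range(lo, hi + 1)]
        self.bindings: dict[str, str] = {}   # private address -> public address

    def translate_outgoing(self, private_src: str) -> str:
        """Return the public source address to use for this private host."""
        if private_src not in self.bindings:
            if not self.free:
                raise RuntimeError("public address pool exhausted")
            self.bindings[private_src] = self.free.pop(0)
        return self.bindings[private_src]


nat = NatPool("128.143.71.0", "128.143.71.30")
print(nat.translate_outgoing("10.0.1.2"))
```

Which pool address a host receives depends on the allocation policy; this sketch simply hands out the lowest free one.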
207
Pooling of IP Addresses
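[Figure: host H1 (private address 10.0.1.2) in the private network sends
a datagram through the NAT device to host H5 (public address
213.168.112.3) on the Internet.
Before the NAT device: Source = 10.0.1.2, Destination = 213.168.112.3.
After the NAT device: Source = 128.143.71.21, Destination = 213.168.112.3.
The NAT table has the columns Private Address and Public Address, with an
entry for 10.0.1.2.
Pool of addresses: 128.143.71.0-128.143.71.30]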
208
Supporting Migration between Network Service
Providers
• Scenario: In CIDR, the IP addresses in a corporate network are obtained
from the service provider. Changing the service provider requires
changing all IP addresses in the network.
• NAT solution:
– Assign private addresses to the hosts of the corporate network
– NAT device has static address translation entries which bind the
private address of a host to the public address.
– Migration to a new network service provider merely requires an update
of the NAT device. The migration is not noticeable to the hosts on the
network.
Note:
– The difference from the use of NAT with IP address pooling is that the
mapping of public and private IP addresses is static.
209
Supporting Migration between network service
Providers
210
IP Masquerading
• Also called: Network address and port translation
(NAPT), port address translation (PAT).
• Scenario: Single public IP address is mapped to multiple
hosts in a private network.
• NAT solution:
– Assign private addresses to the hosts of the corporate
network
– NAT device modifies the port numbers for outgoing traffic
211
IP Masquerading
212
Load Balancing of Servers
• Scenario: Balance the load on a set of identical servers,
which are accessible from a single IP address
• NAT solution:
– Here, the servers are assigned private addresses
– NAT device acts as a proxy for requests to the server from
the public network
– The NAT device changes the destination IP address of
arriving packets to one of the private addresses for a
server
– A sensible strategy for balancing the load of the servers is
to assign the addresses of the servers in a round-robin
fashion.
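A round-robin assignment can be sketched as follows. The class and method names are hypothetical. The sketch also assumes that the NAT device remembers the server chosen for each client connection, so that all packets of one connection reach the same server.

```python
from itertools import cycle


class LoadBalancingNat:
    """Assigns new client connections to servers in round-robin order."""

    def __init__(self, servers: list[str]) -> None:
        self._next_server = cycle(servers)                 # endless round-robin iterator
        self.connections: dict[tuple[str, int], str] = {}  # client pair -> server

    def translate_destination(self, client_ip: str, client_port: int) -> str:
        """Return the private server address for this client connection."""
        key = (client_ip, client_port)
        if key not in self.connections:
            self.connections[key] = next(self._next_server)
        return self.connections[key]


lb = LoadBalancingNat(["10.0.1.2", "10.0.1.3", "10.0.1.4"])
for port in (4001, 4002, 4003, 4004):
    print(port, lb.translate_destination("213.168.112.3", port))
```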
213
Load Balancing of Servers
214
Concerns about NAT
• Performance:
– Modifying the IP header by changing the IP address
requires that NAT boxes recalculate the IP header
checksum
– Modifying port number requires that NAT boxes recalculate
TCP checksum
• Fragmentation
– Care must be taken that a datagram that is fragmented
before it reaches the NAT device, is not assigned a
different IP address or different port numbers for each of
the fragments.
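The checksum cost mentioned above can be illustrated with the standard Internet checksum (16-bit one's complement sum). The sketch below recomputes the whole IP header checksum after rewriting the source address; the function names are hypothetical, and a real NAT device would more likely update the checksum incrementally rather than recompute it.

```python
def ip_checksum(header: bytes) -> int:
    """16-bit one's complement checksum over an IP header."""
    if len(header) % 2:
        header += b"\x00"                              # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]      # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)       # fold the carries back in
    return ~total & 0xFFFF


def rewrite_source(header: bytes, new_src: bytes) -> bytes:
    """Replace the source address (bytes 12..15) and recompute the checksum."""
    h = bytearray(header)
    h[12:16] = new_src
    h[10:12] = b"\x00\x00"                             # checksum field is zero while summing
    h[10:12] = ip_checksum(bytes(h)).to_bytes(2, "big")
    return bytes(h)


# Example 20-byte header with source 192.168.0.1 and a valid checksum
original = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
translated = rewrite_source(original, bytes([128, 143, 71, 21]))
print(translated.hex())
```

A header whose checksum field is correct sums to zero under `ip_checksum`, which gives a simple way to check the rewritten header.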
215
Concerns about NAT
• End-to-end connectivity:
– NAT destroys universal end-to-end reachability of hosts on
the Internet.
– A host in the public Internet often cannot initiate
communication to a host in a private network.
– The problem is worse when two hosts that are each in a private
network need to communicate with each other.
216
Concerns about NAT
• IP address in application data:
– Applications that carry IP addresses in the payload of the
application data generally do not work across a private-public
network boundary.
– Some NAT devices inspect the payload of widely used
application layer protocols and, if an IP address is detected
in the application-layer header or the application payload,
translate the address according to the address translation
table.
217