PPT - Electrical and Computer Engineering


Chapter 4
Network Layer
A note on the use of these ppt slides:
We’re making these slides freely available to all (faculty, students, readers).
They’re in PowerPoint form so you can add, modify, and delete slides
(including this one) and slide content to suit your needs. They obviously
represent a lot of work on our part. In return for use, we only ask the
following:
 If you use these slides (e.g., in a class) in substantially unaltered form, that
you mention their source (after all, we’d like people to use our book!)
 If you post any slides in substantially unaltered form on a www site, that
you note that they are adapted from (or perhaps identical to) our slides, and
note our copyright of this material.
Computer Networking: A Top Down Approach, 5th edition.
Jim Kurose, Keith Ross. Addison-Wesley, April 2009.
Thanks and enjoy! JFK/KWR
All material copyright 1996-2010
J.F Kurose and K.W. Ross, All Rights Reserved
Network Layer
4-1
Chapter 4: Network Layer
4.1 Introduction
4.2 Packet forwarding
4.3 What’s inside a router
4.4 IP: Internet Protocol
- Datagram format
- IPv4 addressing
- ICMP
- IPv6
4.5 Routing algorithms
- Distance Vector
- Link state
- Hierarchical routing
4.6 Routing in the Internet
- RIP
- OSPF
- BGP
Network layer
- transport segment from sending host to receiving host
- on sending side, encapsulate segments into datagrams
- on receiving side, deliver segments to transport layer
- network layer protocols in every host, router
- router examines header fields in all IP datagrams passing through it
[Figure: end hosts run the full stack (application, transport, network, data link, physical); each router on the path runs only network, data link and physical. In a router, the network layer sits above Link1 to Link4 and PHY1 to PHY4.]
The Canadian Network
Links: OC-192 (10 Gbps)
ORION (Ontario Research and Innovation Optical Network)
ORION Office: 34 King Street East, Suite 800, 8th Floor, Toronto, Ontario, M5C 2X8
COGENT Network
See a fuller map at: http://www.submarinecablemap.com/
Network Connection to UW
UW Network
Many routing protocols
[Figure: inside a router, RIP, BGP, OSPF, UDP, TCP and ICMP sit above IP and the routing table(s) of the network layer; below are Link1 to Link4 over PHY1 to PHY4 (interfaces 1 to 4).]
OSPF: Open Shortest Path First
RIP: Routing Information Protocol
BGP: Border Gateway Protocol
ICMP: Internet Control Message Protocol
TCP: Transmission Control Protocol
UDP: User Datagram Protocol
Hop-by-hop routing
IP address
In IPv4, an IP address is 32 bits long.
Example: 10000001 01100001 01011100 00100101
This is also written as: 129.97.92.37
(10000001)2 = 129
(01100001)2 = 97
(01011100)2 = 92
(00100101)2 = 37
Dotted decimal notation (easy to enter ...)
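The conversion above can be sketched in a few lines of Python. The function name `bits_to_dotted` is an illustrative assumption, not something from the slides.

```python
def bits_to_dotted(bits: str) -> str:
    """Convert a 32-bit binary string (spaces allowed) to dotted decimal."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    # Each group of 8 bits becomes one decimal number.
    octets = [int(bits[i:i + 8], 2) for i in range(0, 32, 8)]
    return ".".join(str(o) for o in octets)


print(bits_to_dotted("10000001 01100001 01011100 00100101"))
```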
Network Layer 4-11
IP address
Switches and hubs do not have IP addresses
End-devices generally have one IP address each …..
(Laptops, desktops, servers, … have one IP address each)
Routers have multiple IP addresses: one or more for each physical interface (IP1 to IP4 in the figure).
Network Layer 4-12
Who gives ISPs IP address blocks?
ICANN allocates IP address blocks to regional Internet registries:
- AFRINIC: African Network Information Centre (Mauritius)
- APNIC: Asia-Pacific Network Information Centre (South Brisbane)
- ARIN: American Registry for Internet Numbers (Virginia)
- LACNIC: Latin American and Caribbean Network Info. Centre (Uruguay)
- RIPE NCC: Réseaux IP Européens Network Coordination Centre (Amsterdam)
ICANN: Internet Corp. for Assigned Names and Numbers
Network Layer 4-13
The concept of network prefix
All the IP addresses on the same link (IP1 to IP4 in the figure) have a common portion in their most significant bits:
IP1: x.y.z.101010 00
IP2: x.y.z.101010 01
IP3: x.y.z.101010 10
IP4: x.y.z.101010 11
The common portion (x.y.z.101010) is the network prefix; the remaining bits are the host ID.
14
Remember:
Routers do not store routing info for individual destination IPs. (There are billions of IP addresses; storing and searching all of them would need much more CPU power and memory.)
Instead, routers store aggregated IP addresses (network prefixes) in their routing tables.
Example: x.y.z.168/30
(10101000)2 = 168
30 = length of the network prefix in bits
New IP addressing: CIDR
CIDR: Classless InterDomain Routing
- subnet portion (prefix portion) of address is of arbitrary length
- Addr format: a.b.c.d/x, where x is # bits in subnet portion
Example: 11001000 00010111 0001000|0 00000000 = 200.23.16.0/23 (subnet part | host part)
VLSM: Variable Length Subnet Mask
Old / Classful addressing: Fixed length subnet mask
Class A: Addr begins with 0 and is of the form Prefix.Host.Host.Host
Class B: Addr begins with 10 and is of the form Prefix.Prefix.Host.Host
Class C: Addr begins with 110 and is of the form Prefix.Prefix.Prefix.Host
The concept of network prefix
#1: Net
Prefix
#2: Autonomous
Systems
The concept of network prefix is very simple,
yet it is a powerful concept that makes the Internet scalable…..
Routers in Toronto, Tokyo, New York, … know all
UWaterloo hosts by 129.97.0.0/16. The whole UW is one dest.
Tokyo
Toronto
UW
NY
Network Layer 4-17
The concept of network prefix
For discussion purposes, we use the notation 129.97.0.0/16.
However, routers use the following notation:
Dest. Address: 129.97.0.0
Network Mask: 11111111.11111111.00000000.00000000 = 255.255.0.0
Destination address: 129.97.0.0 / 255.255.0.0
The concepts of network ID and broadcast address
- Network ID appears in routing tables. Individual host IP addresses do NOT.
- Broadcast address is used to perform IP-level broadcast. If you send an IP packet with a broadcast addr as the destination, the IP packet is delivered to ALL the nodes on the network.
The concepts of network ID and broadcast address
The network has an ID called the network ID, expressed in the form of an IP address. The network also has a broadcast address.
Consider the network 10.2.5.16/28 (with nodes IP1 to IP5 in the figure).
/28 = 11111111 . 11111111 . 11111111 . 11110000 = 255.255.255.240
The FIRST addr in the net 10.2.5.16/28 is 10.2.5.00010000 = 10.2.5.16 (net ID)
The next addr in the net 10.2.5.16/28 is 10.2.5.00010001 = 10.2.5.17 (assign to router (con.))
The next addr in the net 10.2.5.16/28 is 10.2.5.00010010 = 10.2.5.18 (assign to hosts)
  :
The next addr in the net 10.2.5.16/28 is 10.2.5.00011110 = 10.2.5.30 (assign to hosts)
The LAST addr in the net 10.2.5.16/28 is 10.2.5.00011111 = 10.2.5.31 (broadcast addr)
Net ID and Broadcast address are NOT
assigned to any host/router.
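As a quick check of the 10.2.5.16/28 example, here is a minimal sketch using Python's standard `ipaddress` module. The variable names are illustrative assumptions.

```python
import ipaddress

net = ipaddress.ip_network("10.2.5.16/28")

network_id = net.network_address    # first address of the block
broadcast = net.broadcast_address   # last address of the block
usable = list(net.hosts())          # everything in between

print("mask      :", net.netmask)
print("network ID:", network_id)
print("broadcast :", broadcast)
print("assignable:", usable[0], "to", usable[-1], f"({len(usable)} addresses)")
```

`hosts()` leaves out the network ID and the broadcast address, which matches the rule that neither is assigned to a host or router.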
Given a network ID and a mask:
Find the next network ID with the same mask.
Find the previous network ID with the same mask.
Given an IP address and a mask:
Find the network ID
and
the Broadcast address
Network Layer 4-21
Given a network ID and a mask:
Find the next network ID with the same mask.
Find the previous network ID with the same mask.
Example: 10.2.5.16/28
32 - 28 = 4 host bits (32 is the length of an IP addr in IPv4)
Block size = 2^4 = 16
Previous network ID = 10.2.5.16 - 16 = 10.2.5.0
Next network ID = 10.2.5.16 + 16 = 10.2.5.32
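The block-size arithmetic can be sketched as follows. The helper name `neighbours` is an assumption made for illustration; it does no bounds checking at the very start or end of the IPv4 address space.

```python
import ipaddress


def neighbours(network: str):
    """Return (block size, previous network ID, next network ID) for the same mask."""
    net = ipaddress.ip_network(network)
    block = 2 ** (32 - net.prefixlen)   # number of addresses per block
    base = int(net.network_address)
    prev_id = ipaddress.ip_address(base - block)
    next_id = ipaddress.ip_address(base + block)
    return block, prev_id, next_id


block, prev_id, next_id = neighbours("10.2.5.16/28")
print(block, prev_id, next_id)
```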
Network Layer 4-22
Given an IP address and a mask:
Find the network ID
and
the Broadcast address
Example: 10.2.5.20/28
32 - 28 = 4 host bits (32 is the length of an IP addr in IPv4)
Block size = 2^4 = 16
For network ID:
10.2.5.20 = 10.2.5.00010100
(Identify the network portion: the left-most 28 bits)
=> 10.2.5.0001|0100
(Reset the host portion to 0’s)
=> 10.2.5.00010000
= 10.2.5.16
Broadcast Address = Network ID + Block size - 1
= 10.2.5.16 + 16 - 1
= 10.2.5.31
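The same steps can be written with plain bit operations. This is a sketch under the assumption of IPv4 and a prefix length between 0 and 32; the helper names are made up for the example.

```python
def to_int(dotted: str) -> int:
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value: int) -> str:
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def network_and_broadcast(ip: str, prefix_len: int):
    """Return (mask, network ID, broadcast address) as dotted-decimal strings."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    block = 1 << (32 - prefix_len)       # block size = 2^(32 - prefix length)
    network_id = to_int(ip) & mask       # reset the host portion to 0's
    broadcast = network_id + block - 1   # network ID + block size - 1
    return to_dotted(mask), to_dotted(network_id), to_dotted(broadcast)


print(network_and_broadcast("10.2.5.20", 28))
```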
Network Layer
4-23
Public IP addr vs. Private IP addr
Public IP addresses:
- These addresses are globally unique.
- Routers everywhere recognize these.
- Any host can open a TCP connection with a machine with a public IP addr.
Private IP addresses:
- These are not globally unique. (Unique within an org.)
- Routers outside the org. do not recognize these.
- If a host has a private IP addr, a host outside the org. cannot open a TCP conn with it.
- Adv.: reuse of IP addresses; added security.
Private IP addresses (RFC 1918)
10.0.0.0/8: Valid IP addresses are 10.0.0.1 -- 10.255.255.254.
172.16.0.0/12: Valid IP addresses are 172.16.0.1 -- 172.31.255.254.
192.168.0.0/16: Valid IP addresses are 192.168.0.1 -- 192.168.255.254.
Note: UW uses the 172.16.0.0/12 and 10.0.0.0/8 blocks (one block for wireless).
RFC: Request For Comments -- a kind of IETF (Internet Eng. Task Force) doc.
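A membership test against the three RFC 1918 blocks can be sketched like this. The function name `is_rfc1918` is an assumption; the standard library's `is_private` attribute is deliberately not used because it also covers other reserved ranges.

```python
import ipaddress

RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """True if addr falls inside one of the three private blocks listed above."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)


for a in ("10.0.0.1", "172.31.255.254", "172.32.0.1", "129.97.92.37"):
    print(a, is_rfc1918(a))
```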
Network Layer 4-25
Partitioning an IP address block into different networks
An ISP (UW) gets a block of public IP addresses (129.97.0.0/16) from IANA/ARIN
Public IP address space
ECE
CS
Private IP address space
Human Resource
ECE
Finance
Quest
WiFi
Optometry
Network Layer 4-26
Simple structure of a routing table (at B): Example
Dest. Address   Mask            Next hop      Interface   Metric   Note
73.2.0.0        255.255.0.0     IP1           1                    * Learn this entry by running routing protocols.
129.97.8.0      255.255.255.0   “connected”   3                    A network connected to the router (configure it)
0.0.0.0         0.0.0.0         IP2                                Default entry (configure it)
[Figure: router B with interfaces 1, 2 and 3, neighbouring addresses IP1 to IP4, and the networks 73.2.0.0/16 and 129.97.8.0/24.]
How does a router choose the next hop for a packet?
Routing table of A (one entry shown):
Address   Mask      Next hop   Interface   Metric
x.y.z.w   a.b.c.d   IPn        I           m
An IP packet with dest. addr IP1 arrives at router A. Next hop?
If dest. addr (IP1) “matches” an entry in the RT, choose interface I of that entry (next hop IPn).
How is address “matching” performed?
Given an entry x.y.z.w/n and a destination address IP1, the “matching” condition is:
IF (n MS-bits of x.y.z.w) == (n MS-bits of IP1): True, matching occurred
ELSE: False, matching failed
Equivalently, with mask a.b.c.d:
If ( (x.y.z.w AND a.b.c.d) == (IP1 AND a.b.c.d) ), matching occurs
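The AND-and-compare test can be sketched in Python as below. The function name `matches` and the sample destination 129.97.8.44 are assumptions chosen for illustration; the two table entries come from the routing table example earlier in these slides.

```python
import ipaddress


def matches(entry_addr: str, entry_mask: str, dest: str) -> bool:
    """(entry AND mask) == (dest AND mask), all treated as 32-bit integers."""
    mask = int(ipaddress.ip_address(entry_mask))
    entry = int(ipaddress.ip_address(entry_addr))
    target = int(ipaddress.ip_address(dest))
    return (entry & mask) == (target & mask)


print(matches("129.97.8.0", "255.255.255.0", "129.97.8.44"))  # same /24
print(matches("73.2.0.0", "255.255.0.0", "129.97.8.44"))      # different /16
print(matches("0.0.0.0", "0.0.0.0", "129.97.8.44"))           # default entry
```

With an all-zero mask both sides of the comparison are 0, which is why the default entry matches every destination.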
Network Layer 4-29
For an RT and an IP address, many entries may match
Destination Address                  Interface #
11001000 00010111 00010/21           0
11001000 00010111 00011000/24        1
11001000 00010111 00011/21           2
Otherwise (default): 0.0.0.0/0       3
Examples:
IP1: 11001000 00010111 00010 110 10100001 matches the 1st entry.
IP2: 11001000 00010111 00011 000 10101010 matches both the 2nd entry and the 3rd entry.
Note: If several entries match, the entry with the “longest” prefix is chosen (longest prefix matching). For IP2 that is the 2nd entry (/24), so interface 1.
Matching and forwarding algorithm
Inputs:
IP address from packet header (call it IP1)
Routing Table (call it RT)
Processing:
if (matching occurs between IP1 and RT), {
- find the matching entry with the longest prefix
- forward the packet via the appropriate interface
}
else if (default entry exists in RT) { // default: 0.0.0.0/0
forward the packet via the appropriate interface
}
else {
send error message (ICMP message) to the source
of the IP packet
}
ICMP: Internet Control Message Protocol
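The matching and forwarding algorithm can be sketched with the table from the longest-prefix example, rewritten in dotted decimal (200.23.16.0/21, 200.23.24.0/24, 200.23.24.0/21). This is a linear scan for illustration only, not how a real router implements lookup. The name `lookup` and the table layout are assumptions, and returning `None` stands in for "send an ICMP error to the source".

```python
import ipaddress

# (prefix, outgoing interface); the /0 entry is the default route.
ROUTING_TABLE = [
    (ipaddress.ip_network("200.23.16.0/21"), 0),
    (ipaddress.ip_network("200.23.24.0/24"), 1),
    (ipaddress.ip_network("200.23.24.0/21"), 2),
    (ipaddress.ip_network("0.0.0.0/0"), 3),
]


def lookup(dest: str, table=ROUTING_TABLE):
    """Return the interface of the longest matching prefix, or None if nothing matches."""
    ip = ipaddress.ip_address(dest)
    candidates = [(net, iface) for net, iface in table if ip in net]
    if not candidates:
        return None  # no match and no default entry: report an error to the source
    best_net, best_iface = max(candidates, key=lambda entry: entry[0].prefixlen)
    return best_iface


print(lookup("200.23.22.161"))  # IP1 from the example
print(lookup("200.23.24.170"))  # IP2 from the example
```

Because the default route is stored as 0.0.0.0/0, it matches every address with prefix length 0 and is chosen only when no longer prefix matches, which gives the same behaviour as the separate `else if` branch in the pseudocode.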
Network Layer 4-31
Longest Prefix Matching / packet routing
[Figure: an IP packet with dest. addr IP1 travels from the source network to the dest. network via Tokyo, Can./Vancouver, Can./Toronto1, Ont./Toronto2, UW/IST, ECE/EIT and the 4th Floor network, to which the destination is connected. The 32 address bits are shown in groups of 12, 4, 6, 4, 3 and 3 bits.]
Implementation of Routing Tables
- RT in RAM
- RT in TCAM (Ternary Content Addressable Memory)
Chapter 4: Network Layer
4.1 Introduction
4.2 Packet forwarding
4.3 What’s inside a router?
4.4 IP: Internet Protocol
- Datagram format
- IPv4 addressing
- ICMP
- IPv6
4.5 Routing algorithms
- Distance Vector
- Link state
- Hierarchical routing
4.6 Routing in the Internet
- RIP
- OSPF
- BGP
4.7 Broadcast and multicast routing
Router Architecture Overview
Two key router functions:
- run routing algorithms/protocols (RIP, OSPF, BGP)
- forward datagrams from incoming to outgoing links
[Figure: logical view of a router: router input ports and router output ports joined by a switching fabric, plus a routing processor running the OS and IP, RIP, OSPF, BGP.]
Input Port Functions
[Figure: line termination, link layer protocol (receive), lookup/forwarding/queueing, switch fabric]
Decentralized switching:
- Given datagram dest. IP addr, look up output port using RT
- Queueing occurs at both input ports and output ports
- If fabric is slower than input ports combined, queueing may occur at input queues: queueing delay and loss due to input buffer overflow!
Switching fabrics
[Figure: router input ports, switching fabric, router output ports, routing processor]
- transfer packet from input buffer to appropriate output buffer
- (Performance of switching fabric) switching rate: rate at which packets can be transferred from inputs to outputs
  - often measured as multiple of input/output line rate
  - N inputs: ideally, switching rate is N times line rate
Example: 4 input lines with 10 Gbps per line
Desired switching rate: 4 x 10 Gbps = 40 Gbps
Switching fabrics
Three types of switching fabrics: memory, bus, crossbar
Switching Via Memory
First generation routers:
- traditional computers with switching under direct control of CPU
- packets are copied to system’s memory
- speed is limited by memory bandwidth (2 bus crossings per datagram)
[Figure: input port (e.g., Ethernet), memory, output port (e.g., Ethernet), all on the system bus]
Switching Via a Bus
- datagram from input port memory to output port memory via a shared bus
- bus contention: switching speed limited by bus bandwidth
- Example: Cisco 5600 router: 32 Gbps bus
Switching Via An Interconnection Network
- Overcome bus bandwidth limitations
- Banyan networks and other interconnection nets (e.g., crossbar) initially developed for parallel processing
- Example: Cisco 12000 router: switches 60 Gbps through the interconnection network
[Figure: an 8x8 banyan network]
Output Ports Functions
[Figure: switch fabric, datagram buffer (queueing), link layer protocol (send), line termination]
- buffering required when datagrams arrive from fabric faster than the transmission rate
- scheduling discipline chooses among queued datagrams for transmission
Output port queueing
[Figure: at time t, packets move from input to output through the switch fabric; the same ports one packet time later]
- buffering when arrival rate via switch exceeds output line speed
- queueing (delay) and loss due to output port buffer overflow!
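How much buffering?
RFC 3439 rule of thumb:
- average buffering = “typical” RTT x link capacity C (RTT: round-trip time)
Example: RTT = 250 ms; C = 10 Gbps
Buffer size = 250 ms x 10 Gbps = 2.5 Gbits
[Figure: routers R1 and R2 joined by a link of capacity C, with the buffer at the link and the RTT marked]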
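The rule of thumb is a single multiplication. The sketch below only makes the units explicit (seconds times bits per second gives bits); the function name is an assumption.

```python
def buffer_bits(rtt_seconds: float, capacity_bps: float) -> float:
    """Rule-of-thumb buffer size in bits: RTT (s) x link capacity (bits/s)."""
    return rtt_seconds * capacity_bps


size = buffer_bits(0.250, 10e9)  # RTT = 250 ms, C = 10 Gbps
print(size / 1e9, "Gbits")
```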
Network Layer 4-44
Chapter 4: Network Layer
4.1 Introduction
4.2 Virtual circuit and datagram networks
4.3 What’s inside a router
4.4 IP: Internet Protocol
- Datagram (IPv4 pkt) format
- IPv4 addressing
- ICMP
- IPv6
4.5 Routing algorithms
- Link state
- Distance Vector
- Hierarchical routing
4.6 Routing in the Internet
- RIP
- OSPF
- BGP
4.7 Broadcast and multicast routing
IP Packet Format
Header layout (each row is 32 bits):
ver | head. len | DSCP | length
16-bit ID | flgs | fragment offset
TTL | upper layer | header checksum
source IP address
destination IP address
Options (if any)
Data (variable length, typically a TCP or UDP segment)

Version (4 bits): 4 (= 0100)
Header length (4 bits): unit is 4 bytes (Ex.: a 20-byte header is represented by 5 (= 0101))
DSCP (Differentiated Services Code Point, 8 bits): type of data carried (6-bit DSCP + 2-bit) (DSCP = 46 -> High Priority; 0 -> Low)
Length (16 bits): packet length in bytes
16-bit ID: a long IP packet is fragmented into smaller packets. All those small packets carry the same 16-bit ID.
3-bit flags: <Not used, Don’t frag., More frags. to follow>
Fragment offset x 2^3: gives the position of the fragment in the original packet.
IP Packet Format
TTL: Time To Live. Max # of remaining hops. TTL is decremented by 1 at each router. TTL = 0 -> router discards the packet.
Upper layer: upper layer protocol to deliver payload to (Ex.: TCP = 6, UDP = 17)
Header Checksum: to detect bit errors in packet header (errors in “data” are ignored.)
Source IP address: 32-bit IP address of the node that (originally) created the packet.
Destination IP address: 32-bit IP address of the destination node of the packet.
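To make the field layout concrete, here is a sketch that unpacks the fixed 20-byte part of an IPv4 header with Python's `struct` module. The sample bytes are the header used in the checksum example in these slides; the function name and dictionary keys are assumptions, and options are not parsed.

```python
import ipaddress
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    flags = flags_frag >> 13                  # top 3 bits of the 16-bit word
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,    # unit is 4 bytes
        "dscp": tos >> 2,                            # upper 6 bits of the 8-bit field
        "total_length": total_len,
        "id": ident,
        "dont_fragment": bool(flags & 0b010),
        "more_fragments": bool(flags & 0b001),
        "offset_bytes": (flags_frag & 0x1FFF) * 8,   # fragment offset x 2^3
        "ttl": ttl,
        "protocol": proto,                           # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": str(ipaddress.ip_address(src)),
        "dst": str(ipaddress.ip_address(dst)),
    }


sample = bytes.fromhex("4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7")
for field, value in parse_ipv4_header(sample).items():
    print(f"{field:17} {value}")
```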
Network Layer 4-48
IP Packet Format
Options: time stamp, record route taken, specify list of routers to visit, ...
Data: from the upper layer. Usually one TCP (Transmission Control Protocol) or one UDP (User Datagram Protocol) segment.
IPv4 header length without options is 20 bytes.
How much to remember?
DSCP: Diff. Services Code Point (for Quality of Service)
Header Checksum Calculation (at the Sender)
Example : IP header (in Hex) with checksum set to 0000
4500 0073 0000 4000 4011
0000 c0a8 0001 c0a8 00c7
Step 1: Add all 16-bit blocks of the header
Result = 0010 0100 0111 1001 1100
Add the carry (0010) to the rest to get
Temp = 0100 0111 1001 1110
Step 2: Take 1’s complement of Temp to get the checksum
Checksum = 1’s complement(Temp)
= 1011 1000 0110 0001
= b861
Header with checksum =
4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7
Send the IP packet with this header ….
50
Header Checksum Re-calculation (at the Receiver)
Assume that the header is received without bit error.
4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7
Step 1: Add all the 16-bit blocks of the header
Result = 2 fffd
Add carry (2) to fffd to get ffff
Step 2: Take 1’s complement of the final result from step 1,
1’s complement of ffff = 0000.
Step 3: Decision
If the result from step 2 is 0000: No error
Else: bit-error; drop packet
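Both the sender and receiver calculations can be reproduced with a short sketch of the 16-bit one's-complement sum. The function names are assumptions; the header bytes are the ones from the example above.

```python
def ones_complement_sum(data: bytes) -> int:
    """Add all 16-bit blocks, folding any carry back into the low 16 bits."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # add the carry to the rest
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(header: bytes) -> int:
    """1's complement of the folded sum, as a 16-bit value."""
    return ~ones_complement_sum(header) & 0xFFFF


# Sender: checksum field set to 0000.
zeroed = bytes.fromhex("4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7")
print(hex(checksum(zeroed)))

# Receiver: recompute over the header as received; 0 means no error detected.
received = bytes.fromhex("4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7")
print(hex(checksum(received)))
```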
51
IP Fragmentation & Reassembly
- network links have MTU (Max. Transfer Unit) limitations, e.g. 1500 bytes
- large IP packets are divided (“fragmented”) within net
  - one IP pkt becomes several IP pkts
  - “reassembled” only at final destination
  - IP header bits used to identify and order related IP packets: <ID, Flags, Offset>
[Figure: fragmentation (in: one large datagram, out: 3 smaller datagrams) followed by reassembly]
IP Fragmentation and Reassembly
Example: 4000 byte IP pkt, MTU = 1500 bytes
Data size = 4000 - 20 = 3980 bytes
One large IP pkt becomes several smaller IP pkts:
Packet       length   ID   fragflag   offset
Original     4000     x    0          0
Fragment 1   1500     x    1          0
Fragment 2   1500     x    1          185
Fragment 3   1040     x    0          370
Fragments 1 and 2 each carry 1480 bytes in the data field, so:
offset of fragment 2 = 1480/8 = 185
offset of fragment 3 = (1480 + 1480)/8 = 370
Verification: Data size = 1500 + 1500 + 1040 - (20 + 20 + 20) = 3980 bytes
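The fragment lengths and offsets in this example can be recomputed with a small sketch. It assumes a 20-byte header with no options on every fragment; the function name and dictionary keys are illustrative.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """Split one IP packet into fragments that fit the MTU.

    Returns a list of dicts with the fields shown on the slide:
    length (bytes, header included), fragflag (more fragments) and
    offset (in units of 8 bytes).
    """
    payload = total_length - header_len
    max_data = (mtu - header_len) // 8 * 8   # data per fragment, multiple of 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < payload:
        size = min(max_data, payload - sent)
        more = 1 if sent + size < payload else 0
        fragments.append({"length": size + header_len,
                          "fragflag": more,
                          "offset": sent // 8})
        sent += size
    return fragments


for f in fragment(4000, 1500):
    print(f)
```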
Network Layer 4-53
Subnets
IP address:
- subnet part (high order bits) (Recall: Network Prefix)
- host part (low order bits)
What’s a subnet?
- device interfaces with same subnet part of IP address
- can physically reach each other without an intervening router
[Figure: network consisting of 3 subnets, with interfaces 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4; 223.1.2.1, 223.1.2.2, 223.1.2.9; and 223.1.3.1, 223.1.3.2, 223.1.3.27]
IP addresses: how to get one?
- hard-coded by system admin in a file
- DHCP: Dynamic Host Configuration Protocol: dynamically get address from a server
DHCP: Dynamic Host Configuration Protocol
Goal: allow host to dynamically obtain its IP addr. from network
server when it joins network
 Allows reuse of addresses (only hold addr while connected)
 Support for mobile users …..
DHCP overview:
 host broadcasts “DHCP discover” msg
 DHCP server responds with “DHCP offer” msg
 host requests IP address: “DHCP request” msg
 DHCP server sends address: “DHCP ack” msg
Network Layer 4-57
DHCP client-server scenario
[Figure: the three subnets 223.1.1.x, 223.1.2.x and 223.1.3.x joined by a router; a DHCP server; an arriving DHCP client that needs an address in its network; a DHCP Relay Agent (RA) on the router.]
DHCP Relay Agents (RA) run on routers.
Note: A DHCP server need not be a separate machine. Many routers run DHCP servers for small networks.
DHCP client-server scenario
DHCP runs over UDP over IP: the DHCP server uses port #67, the DHCP client uses port #68.
DHCP server: 223.1.2.5. Broadcast IP addr: 255.255.255.255.
Messages between the arriving client and the server, in time order:
1. DHCP Discover (client): src: 0.0.0.0, 68; dest: 255.255.255.255, 67; yiaddr: 0.0.0.0; transaction ID: 654 (example)
2. DHCP Offer (server): src: 223.1.2.5, 67; dest: 255.255.255.255, 68; yiaddr: 223.1.2.4 (note this); transaction ID: 654; lifetime: 3600 secs
3. DHCP Request (client): src: 0.0.0.0, 68; dest: 255.255.255.255, 67; yiaddr: 223.1.2.4; transaction ID: 655; lifetime: 3600 secs
4. DHCP ACK (server): src: 223.1.2.5, 67; dest: 255.255.255.255, 68; yiaddr: 223.1.2.4; transaction ID: 655; lifetime: 3600 secs
Client’s IP addr = 223.1.2.4
DHCP: returns more than an IP address
It returns:
- IP address of first-hop router for client (aka default gateway)
- name and IP address of DNS server
  DNS: Domain Name System
  Function: machine name -> IP address
  Example: naik3.uwaterloo.ca -> 129.97.10.192
- network mask
  IP addr block: 11001000 00010111 00010000 00000000 = 200.23.16.0/20
  Net. mask:     11111111 11111111 11110000 00000000 = 255.255.240.0
Hierarchical addressing: address aggregation
Hierarchical addressing allows efficient advertisement of routing information:
Without aggregation: R1 advertises 8 routes
With aggregation: R1 advertises 1 route
Fly-By-Night-ISP serves:
  Organization 0: 200.23.16.0/23
  Organization 1: 200.23.18.0/23
  Organization 2: 200.23.20.0/23
  ...
  Organization 7: 200.23.30.0/23
Fly-By-Night-ISP to the Internet: “Send me anything with addresses beginning 200.23.16.0/20”
ISPs-R-Us to the Internet: “Send me anything with addresses beginning 199.31.0.0/16”
[Figure: routers R1 and R2 connect the two ISPs to the Internet]
Address Aggregation (a.k.a. supernetting)
4 entries on the RT of A:
192.168.0.0/24: 11000000.10101000.000000 00.00000000
192.168.1.0/24: 11000000.10101000.000000 01.00000000
192.168.2.0/24: 11000000.10101000.000000 10.00000000
192.168.3.0/24: 11000000.10101000.000000 11.00000000
1 entry on the RT of B: 192.168.0.0/22
Advertise: I can reach 192.168.0.0/22
Address aggregation (Cisco: Summary address)
Organization 0        11001000 00010111 0001 000 0 00000000   200.23.16.0/23
Organization 1        11001000 00010111 0001 001 0 00000000   200.23.18.0/23
Organization 2        11001000 00010111 0001 010 0 00000000   200.23.20.0/23
...
Organization 7        11001000 00010111 0001 111 0 00000000   200.23.30.0/23
ISP's address block   11001000 00010111 0001 0000 00000000    200.23.16.0/20   (summary address)
Possible Final Q.: Given a few IP address blocks, find their summary address.
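One way to approach such a question is to find the longest prefix shared by the lowest and highest addresses covered by the blocks. The sketch below does that for IPv4; the function name `summary_address` is an assumption. Note that the result may also cover addresses that none of the given blocks contain.

```python
import ipaddress


def summary_address(blocks):
    """Smallest single IPv4 prefix that covers every block in the list."""
    nets = [ipaddress.ip_network(b) for b in blocks]
    lowest = min(int(n.network_address) for n in nets)
    highest = max(int(n.broadcast_address) for n in nets)
    # Bits where lowest and highest differ cannot be part of the common prefix.
    prefix_len = 32 - (lowest ^ highest).bit_length()
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return ipaddress.ip_network((lowest & mask, prefix_len))


isp_blocks = [f"200.23.{third}.0/23" for third in range(16, 32, 2)]
print(summary_address(isp_blocks))
print(summary_address(["192.168.0.0/24", "192.168.1.0/24",
                       "192.168.2.0/24", "192.168.3.0/24"]))
```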
Network Layer 4-63
NAT: Network Address Translation
Motivation: one way to solve the address shortage problem. The 2nd way is to use IPv6.
[Figure: a local (say, home) network with private IP addrs 10.0.0.1 to 10.0.0.4 behind a NAT router whose public addr is 138.76.29.7; the rest of the Internet, with addresses 138.76.28.1 to 138.76.28.4, is on the other side.]
- All IP pkts leaving the local net have the same single source NAT IP addr, 138.76.29.7, but different source port #s (with unchanged dest. IP addr).
- IP pkts with source and destination in this network have 10.0.0.0/24 addresses for source, destination (as usual).
NAT in UW
Public IP address space
(globally unique)
Private IP address space
(not globally unique)
Remember
Network Layer 4-65
NAT: Network Address Translation
- Local network uses just one IP address as far as the outside world is concerned, so more devices are supported.
- A range of addrs is not needed from the ISP: just one IP addr for all.
- Devices inside the local net are not explicitly addressable (i.e. visible) by the outside world. This constraint is seen as a security plus.
[Figure: App / TCP port (16-bit #) / IP]
NAT: Network Address Translation
NAT translation table:
WAN side addr         LAN side addr
138.76.29.7, 5001     10.0.0.1, 3345
......                ......
1: Host 10.0.0.1 (with port # 3345) sends datagram to 128.119.40.186, 80. (S: 10.0.0.1, 3345; D: 128.119.40.186, 80)
2: NAT router changes IP pkt source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001 and updates the table. (S: 138.76.29.7, 5001; D: 128.119.40.186, 80)
3: Reply arrives with dest. address 138.76.29.7, 5001. (S: 128.119.40.186, 80; D: 138.76.29.7, 5001)
4: NAT router changes IP pkt dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345. (S: 128.119.40.186, 80; D: 10.0.0.1, 3345)
[Figure: hosts 10.0.0.1, 10.0.0.2, 10.0.0.3 and the router's LAN side 10.0.0.4; WAN side 138.76.29.7]
Implementation of NAT
- outgoing datagrams
  - replace (src IP addr, port #) of every outgoing datagram with (NAT IP addr, new port #)
    [remote clients/servers will respond using (NAT IP addr, new port #) as dest. addr.]
  - remember (in NAT translation table) every mapping (src IP addr, port #) -> (NAT IP addr, new port #)
- incoming datagrams
  - replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (src IP addr, port #) stored in NAT table
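A translation table of this kind can be sketched with two dictionaries. The class name `Nat`, the starting port 5001 and the sequential port allocation are assumptions for illustration; real NAT devices also track the protocol, the remote endpoint and timeouts.

```python
class Nat:
    """Toy NAT translation table keyed by (LAN addr, LAN port)."""

    def __init__(self, public_ip: str, first_port: int = 5001):
        self.public_ip = public_ip
        self.next_port = first_port
        self.lan_to_wan = {}   # (src IP addr, port #) -> new port #
        self.wan_to_lan = {}   # new port # -> (src IP addr, port #)

    def outgoing(self, src_ip: str, src_port: int):
        """Rewrite the source of an outgoing datagram, remembering the mapping."""
        key = (src_ip, src_port)
        if key not in self.lan_to_wan:
            self.lan_to_wan[key] = self.next_port
            self.wan_to_lan[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.lan_to_wan[key]

    def incoming(self, dst_port: int):
        """Rewrite the destination of an incoming datagram; None if no mapping exists."""
        return self.wan_to_lan.get(dst_port)


nat = Nat("138.76.29.7")
print(nat.outgoing("10.0.0.1", 3345))   # source seen by 128.119.40.186
print(nat.incoming(5001))               # where the reply is delivered
```

The `None` result for an unknown port is the situation behind the NAT traversal problem: without an existing mapping, an outside client has no way to reach an inside host.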
Network Layer 4-68
NAT traversal problem
- Client wants to connect to server with address 10.0.0.1
  - server address 10.0.0.1 is local to the LAN (client can’t use it as destination addr)
  - only one externally visible NATed address: 138.76.29.7
[Figure: client outside; NAT router (138.76.29.7 outside, 10.0.0.4 inside); server 10.0.0.1]
NAT traversal problem: solution #1
Statically configure NAT to forward incoming connection requests at a given port to the server
- e.g., (138.76.29.7, port 2500) always forwarded to 10.0.0.1, port 2500
NAT traversal problem: solution #2
Internet Gateway Device (IGD) Protocol (automates static NAT port map configuration) allows a NATed host to:
- learn the public IP address (138.76.29.7)
- add/remove port mappings (with lease times)
NAT traversal problem: solution #3
Relaying
1. Connection to relay initiated by NATed host
2. Connection to relay initiated by client
3. Relaying established
ICMP: Internet Control Message Protocol
- used by hosts & routers to communicate network-level information
  - error reporting: unreachable host, network, port, protocol
  - echo request/reply (used by ping)
- network-layer “above” IP:
  - ICMP msgs carried in IP datagrams
- ICMP message: type, code plus first 8 bytes of IP datagram causing error

Type   Code   Description
0      0      echo reply (ping)
3      0      dest. network unreachable
3      1      dest host unreachable
3      2      dest protocol unreachable
3      3      dest port unreachable
3      6      dest network unknown
3      7      dest host unknown
8      0      echo request (ping)
9      0      router advertisement
10     0      router discovery
11     0      TTL expired
12     0      bad IP header
Traceroute and ICMP
- Source sends series of UDP segments to dest
  - first has TTL = 1
  - second has TTL = 2, etc.
  - unlikely port number
- When nth datagram arrives at nth router:
  - router discards datagram
  - and sends to source an ICMP message (type 11, code 0)
  - ICMP message includes name of router & IP address
- When ICMP message arrives, source calculates RTT
  - traceroute does this 3 times
Stopping criterion
- UDP segment eventually arrives at destination host
- destination returns ICMP “port unreachable” packet (type 3, code 3)
- when source gets this ICMP, stops.
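The probing loop can be illustrated with a pure simulation. Nothing is sent on the network: the path is a hypothetical list of node names, and the function only models which ICMP reply each TTL value would trigger.

```python
def simulate_traceroute(path, max_ttl=30):
    """path: routers in order, followed by the destination host as last element.

    Returns one (ttl, responder, icmp_type, icmp_code) tuple per probe.
    """
    replies = []
    for ttl in range(1, max_ttl + 1):
        hop = ttl - 1
        if hop < len(path) - 1:
            # TTL reaches 0 at the nth router: type 11, code 0 (TTL expired).
            replies.append((ttl, path[hop], 11, 0))
        else:
            # Probe reaches the destination: type 3, code 3 (port unreachable).
            replies.append((ttl, path[-1], 3, 3))
            break   # stopping criterion
    return replies


for reply in simulate_traceroute(["R1", "R2", "R3", "dest-host"]):
    print(reply)
```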

Network Layer 4-75
IPv6
- Initial motivation: 32-bit address space soon to be completely allocated.
- Additional motivation:
  - header format helps speed processing/forwarding
  - header changes to facilitate QoS (Quality of Service)
- IPv6 datagram format:
  - fixed-length 40 byte header
  - no fragmentation allowed at intermediate routers
IPv6 Header (Cont)
Priority: identify priority among datagrams in flow
Flow Label: identify datagrams in same “flow” (concept of “flow” not well defined).
Next header: identify upper layer protocol for data
Header layout (each row is 32 bits):
ver | pri | flow label
payload len | next hdr | hop limit
source address (128 bits)
destination address (128 bits)
data
Other Changes from IPv4
- Checksum: removed entirely to reduce processing time at each hop
- Options: allowed, but outside of header, indicated by “Next Header” field
- ICMPv6: new version of ICMP
  - additional message types, e.g. “Packet Too Big”
  - multicast group management functions
Transition From IPv4 To IPv6
- Not all routers can be upgraded simultaneously
  - How will the network operate with mixed IPv4 and IPv6 routers?
- Tunneling: IPv6 carried as payload in IPv4 datagram among IPv4 routers
Tunneling
Logical view: A (IPv6), B (IPv6), tunnel, E (IPv6), F (IPv6)
Physical view: A (IPv6), B (IPv6), C (IPv4), D (IPv4), E (IPv6), F (IPv6)
A-to-B: IPv6 (Flow: X, Src: A, Dest: F, data)
B-to-C: IPv6 inside IPv4 (IPv4 header Src: B, Dest: E, carrying the IPv6 datagram Flow: X, Src: A, Dest: F, data)
D-to-E: IPv6 inside IPv4 (IPv4 header Src: B, Dest: E, carrying the IPv6 datagram Flow: X, Src: A, Dest: F, data)
E-to-F: IPv6 (Flow: X, Src: A, Dest: F, data)
Chapter 4: Network Layer
4.1 Introduction
4.2 Virtual circuit and datagram networks
4.3 What’s inside a router
4.4 IP: Internet Protocol
- Datagram format
- IPv4 addressing
- ICMP
- IPv6
4.5/4.6 Routing algorithms
- Distance vector + RIP
- Link state + OSPF
- Hierarchical routing
- BGP
Hierarchical Routing
Scale: with 200 million destination networks
- can’t store all dest’s in routing tables!
- routing table exchange would swamp links!
- Address aggregation alleviates the problem, but not enough.
Solution: Autonomous Systems (hierarchical routing)
Hierarchical organization of the Internet
Autonomous System
BGP (Border Gateway Protocol) Routers
(Ordinary) routers
Internet
Network Layer 4-85
An Autonomous System is a set of routers under a single technical admin, using an interior gateway protocol and common metrics to route packets within the AS, and using an exterior gateway protocol to route packets to other AS’s.
AS’s are identified by an AS number, originally a 16-bit ID (32-bit AS numbers are now also in use).
Top IPv6 providers
Example AS numbers and names
Num of Customers   AS number   Network Name
159                174         COGENT Cogent/PSI
113                577         BACOM – Bell Canada
105                15290       ALLST-15290 – Allstream Corp.
102                852         ASN852 – Telus Advanced Communications
88                 3356        LEVEL3 Level 3 Communications
84                 6539        GT-BELL – Bell Canada
84                 6327        SHAW – Shaw Comm. Inc.
64                 701         UUNET – MCI Comm. Services, Inc. d/b/a Verizon Business
63                 3257        TINET-BACKBONE Tinet SpA
57                 6453        GLOBEINTERNET TATA Communications
50                 3549        GBLX Global Crossing Ltd.
49                 13768       PEER1 – Peer 1 Network Inc.
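[Figure: Univ. of Waterloo (AS #12093) connected to four neighboring ASs: ORANO (#26677), COGENT (#174), Hydro One Telecom Inc (#19752), and Allstream Corp. (#15290). Two of the links are labeled IPv4/v6 and two are labeled IPv4.]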
Network Layer 4-88
UW AS #12093
Voskamp
Advertises 4 prefixes:
 198.96.155.0/24 (IPv4)
 129.97.0.0/16 (IPv4)
 129.97.248.0/21 (IPv4)
 /47 (IPv6)
Network Layer 4-89
Reason for advertising TWO UW IPv4 prefixes
[Figure: UW (129.97.0.0/16), containing the Student Residence (129.97.248.0/21), connected to COGENT (#174), ORANO (#26677), Hydro One Telecom Inc (#19752), and Allstream Corp. (#15290). Three links run at 1 Gbps and one at 10 Gbps. On three of the links UW advertises only 129.97.0.0/16; on one link it advertises both 129.97.0.0/16 and 129.97.248.0/21.]
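The slide does not spell out the reason, so the following is only one plausible reading: routers forward on the longest matching prefix, so a neighbor that hears both 129.97.0.0/16 and the more specific 129.97.248.0/21 will send residence-bound traffic over the link where the /21 was advertised. A minimal sketch using Python's standard `ipaddress` module (the two test addresses and the helper name `longest_match` are made up for illustration):

```python
import ipaddress

# Prefixes taken from the slide.
advertised = [
    ipaddress.ip_network("129.97.0.0/16"),
    ipaddress.ip_network("129.97.248.0/21"),
]


def longest_match(addr, prefixes):
    """Return the most specific prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [p for p in prefixes if ip in p]
    return max(matches, key=lambda p: p.prefixlen) if matches else None


residence = longest_match("129.97.250.10", advertised)  # hypothetical residence host
campus = longest_match("129.97.10.10", advertised)      # hypothetical campus host
```

Here `residence` matches the /21 while `campus` only matches the /16, which is how a more specific advertisement can steer one part of the address space onto a chosen link.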
Network Layer 4-90
Routing Protocols
Intra-AS routing (within an AS)
(IGP: Interior Gateway Protocols)
 RIP: Routing Information Protocol
or
 OSPF: Open Shortest Path First
or
 your own proprietary protocol (Cisco: EIGRP (Enhanced Interior Gateway Routing Protocol))
Inter-AS routing (among ASs)
(EGP: Exterior Gateway Protocols)
 BGP: Border Gateway Protocol
Choose one intra-AS routing protocol in a given AS.
Normally, two different intra-AS routing protocols do NOT run in the same AS.
If more than one intra-AS protocol does run in the same AS, (Cisco) routers can be configured to redistribute routes between them.
Network Layer 4-91
NOTE
[Figure: an AS running RIP for intra-AS routing. Each router holds a routing table (RT) and runs RIP; one router is additionally labeled BGP.]
Network Layer 4-92
A few things to remember ….
RIP
OSPF
BGP
Network Layer 4-93
Graph abstraction
Graph: G = (N,E)
N = set of routers = { u, v, w, x, y, z }
E = set of links = { (u,v), (u,x), (u,w), (v,x), (v,w), (x,w), (x,y), (w,y), (w,z), (y,z) }
[Figure: six-node graph with link costs c(u,v) = 2, c(u,x) = 1, c(u,w) = 5, c(v,x) = 2, c(v,w) = 3, c(x,w) = 3, c(x,y) = 1, c(w,y) = 1, c(w,z) = 5, c(y,z) = 2]
c(x,x’) = cost of link (x,x’)
Example: c(w,z) = 5
Cost of path (x1, x2, x3,…, xp) = c(x1,x2) + c(x2,x3) + … + c(xp-1,xp)
Cost: hop count, delay, ….
Network Layer 4-94
RIP
Routing Information Protocol
Network Layer 4-95
Distance Vector Algorithm
[Figure: node x reaches destination y through a neighbor v]
Let
dx(y) := cost of least-cost path from x to y
Bellman-Ford Equation
dx(y) = minv { c(x,v) + dv(y) }
where min is taken over all neighbors v of x
The neighbor that leads to the minimum cost is the next hop on the shortest path from x to y.
Network Layer 4-96
Bellman-Ford example
[Figure: the six-node graph from the graph abstraction slide, with c(u,v) = 2, c(u,x) = 1, c(u,w) = 5]
Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3
du(z) = min { c(u,v) + dv(z),
              c(u,x) + dx(z),
              c(u,w) + dw(z) }
      = min { 2 + 5,
              1 + 3,
              5 + 3 }
      = 4
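The same arithmetic can be written as a small Python sketch. The function name `bellman_ford_step` is hypothetical; the numbers are the ones given on the slide.

```python
# Link costs from u to each of its neighbors, as given on the slide.
c_u = {"v": 2, "x": 1, "w": 5}
# Least-cost distances from each neighbor to z, as given on the slide.
d_to_z = {"v": 5, "x": 3, "w": 3}


def bellman_ford_step(link_costs, neighbor_dists):
    """Apply dx(y) = min over v of { c(x,v) + dv(y) }.

    Returns (least cost, neighbor achieving it), i.e. the cost and the next hop.
    """
    next_hop = min(link_costs, key=lambda v: link_costs[v] + neighbor_dists[v])
    return link_costs[next_hop] + neighbor_dists[next_hop], next_hop


cost, next_hop = bellman_ford_step(c_u, d_to_z)
```

With these inputs the minimum is 1 + 3 = 4 through neighbor x, so x is u's next hop toward z.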
Network Layer 4-97
Distance Vector Algorithm

 Dx(y) = estimate of least cost from x to y
 x maintains distance vector Dx = {Dx(y): y ∈ N }
 node x:
   knows cost to each neighbor v: c(x, v)
   maintains its neighbors’ distance vectors. For each neighbor v, x maintains Dv = {Dv(y): y ∈ N }
Network Layer 4-98
Distance vector algorithm
Basic idea:
 from time to time, each node sends its own distance vector estimate to neighbors
 when x receives a new DV estimate from a neighbor, it updates its own DV using the B-F equation:
Dx(y) ← minv{c(x,v) + Dv(y)} for each node y ∊ N
 in steady state, the estimate Dx(y) converges to the actual least cost dx(y)
[Figure: node x reaches destination y through a neighbor v]
Network Layer 4-99
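[Figure: a network of routers between networks N1 and N2. The router attached to N1 is configured for N1 ("Config. this for N1"). Links are labeled with hop-count announcements ICR-N1(cost), with costs 1 to 3, and with the routes that neighbors learn, LHTR-N1(cost), with costs 2 to 6.]
I can reach N1 (Hopcount cost): ICR-N1(cost)
Learns how to reach N1 (Hopcount cost): LHTR-N1(cost)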
Network Layer 4-100
Distance Vector Algorithm: General Idea
Each node:
Wait for (change in local link cost or
msg from neighbor)
Recompute estimates
Notify neighbours, if DV to any dest has
changed
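The wait/recompute/notify loop can be illustrated with a small synchronous simulation. This is a sketch under simplifying assumptions: all nodes exchange vectors in lock-step rounds (real nodes are asynchronous), link costs never change, and the link costs are the ones I read off the graph abstraction figure earlier in this chapter. The names `build_neighbors` and `distance_vector` are hypothetical.

```python
INF = float("inf")

# Assumed link costs for the six-node example graph.
links = {("u", "v"): 2, ("u", "x"): 1, ("u", "w"): 5, ("v", "x"): 2,
         ("v", "w"): 3, ("x", "w"): 3, ("x", "y"): 1, ("w", "y"): 1,
         ("w", "z"): 5, ("y", "z"): 2}


def build_neighbors(links):
    """Turn the undirected link list into node -> {neighbor: cost}."""
    nbrs = {}
    for (a, b), cost in links.items():
        nbrs.setdefault(a, {})[b] = cost
        nbrs.setdefault(b, {})[a] = cost
    return nbrs


def distance_vector(links):
    nbrs = build_neighbors(links)
    nodes = sorted(nbrs)
    # Initially each node knows only itself and its direct neighbors.
    dv = {x: {y: (0 if x == y else nbrs[x].get(y, INF)) for y in nodes}
          for x in nodes}
    rounds = 0
    changed = True
    while changed:  # stop when no DV changed, i.e. nobody has news to send
        changed = False
        sent = {x: dict(dv[x]) for x in nodes}  # vectors sent this round
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                # B-F update: Dx(y) <- min over neighbors v of c(x,v) + Dv(y)
                best = min(nbrs[x][v] + sent[v][y] for v in nbrs[x])
                if best < dv[x][y]:
                    dv[x][y] = best
                    changed = True
        rounds += 1
    return dv, rounds


dv, rounds = distance_vector(links)
```

After convergence `dv["u"]["z"]` equals 4, matching the Bellman-Ford example.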
Network Layer 4-101
Distance Vector Algorithm

 In steady state, shortest paths are established.
 Routers learn from neighbors only → simplicity.
 Good news travels fast (and explicitly).
   When a router finds a shorter path to a dest., the news is broadcast to its neighbors in the next round of communication.
 Bad news travels slowly.
   When a router stops hearing from a neighbor, it does not explicitly tell its other neighbors about the failure.
   Routing loops can form.
102
DV: Two-node instability problem
Note: RT @ each node; entries are [Dest., NH, Cost]
Topology: X --1-- A --1-- B
Before failure: A [X, -, 1]; B [X, A, 2]
After failure (link X-A goes down): A [X, -, ∞]; B [X, A, 2]
After A receives update from B: A [X, B, 3]; B [X, A, 2]
After B receives update from A: A [X, B, 3]; B [X, A, 4]
:
:
Finally: A [X, -, ∞]; B [X, -, ∞]
103
DV: Solutions to the loop problem

 Redefine “infinity” to a smaller number (say, 16).
 Split horizon
   If node B learns about a path to X from A, subsequently, B does not tell A about this.
  • Taking information from A, modifying it, and sending it back to A creates confusion.
 Split horizon with poisoned reverse
   If A routes to X via B, A tells B that it has an infinity-cost path to X.
[Figure: X - A - B]
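The difference between the plain, split-horizon, and poisoned-reverse behaviours shows up in what a node puts into the vector it sends to one particular neighbor. A minimal sketch (the function `advertise`, the `mode` strings, and the table layout are hypothetical, chosen only for this illustration):

```python
INFINITY = 16  # "infinity" redefined to a small number, as on the slide


def advertise(table, neighbor, mode="plain"):
    """Build the distance vector a node sends to `neighbor`.

    table maps dest -> (next_hop, cost).
    mode is "plain", "split_horizon", or "poisoned_reverse".
    """
    vector = {}
    for dest, (next_hop, cost) in table.items():
        if next_hop == neighbor:
            if mode == "split_horizon":
                continue         # say nothing about routes learned from neighbor
            if mode == "poisoned_reverse":
                cost = INFINITY  # advertise the route back as unreachable
        vector[dest] = cost
    return vector


# B's routing table in the two-node example: B reaches X via A at cost 2.
table_b = {"X": ("A", 2)}

plain = advertise(table_b, "A")
split = advertise(table_b, "A", mode="split_horizon")
poisoned = advertise(table_b, "A", mode="poisoned_reverse")
```

In plain mode B tells A "X at cost 2", which is exactly the stale information that lets A pick B as next hop after the X-A link fails. With split horizon B says nothing to A about X, and with poisoned reverse it says "X at cost 16".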
104
DV: Three-node instability problem
Topology: X --1-- A; A, B, and C are connected pairwise by links of cost 1. Entries are [Dest., NH, Cost].
Before failure: A [X, -, 1]; B [X, A, 2]; C [X, A, 2]
Failure has occurred …
After A sends the update to B and C, but the update to C is lost: A [X, -, ∞]; B [X, -, ∞]; C [X, A, 2]
After C sends update info to B: A [X, -, ∞]; B [X, C, 3]; C [X, A, 2]
After B sends update info to A: A [X, B, 4]; B [X, C, 3]; C [X, A, 2]
105
RIP ( Routing Information Protocol)


 included in BSD-UNIX distribution in 1982
 uses Distance Vector algorithm
   cost metric: # hops (max = 15 hops), each link has cost 1
   DVs exchanged with neighbors every 30 sec in response message (aka advertisement)
   each advertisement: list of up to 25 destination subnets (in IP addressing sense)
[Figure: routers A, B, C, D interconnecting subnets u, v, w, x, y, z]
from router A to destination subnets:
subnet   hops
u        1
v        2
w        2
x        3
y        3
z        2
Network Layer 4-106
RIP: Example
[Figure: routers A, B, C, D and subnets w, x, y, z]
routing table in router D
destination subnet   next router   # hops to dest
w                    A             2
y                    B             2
z                    B             7
x                    --            1
….                   ….            ....
Network Layer 4-107
RIP: Example
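[Figure: routers A, B, C, D and subnets w, x, y, z]
A-to-D advertisement
dest   next   hops
w      -      1
x      -      1
z      C      4
….     …      ...
routing table in router D (after the advertisement; old values in parentheses)
destination subnet   next router   # hops to dest
w                    A             2
y                    B             2
z                    A (was B)     5 (was 7)
x                    --            1
….                   ….            ....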
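The table update in this example can be sketched as follows. The function `process_advertisement` and the table layout are hypothetical. The rule of always accepting news from the current next router, even if it is worse, is standard RIP behaviour but is not shown on the slide.

```python
def process_advertisement(table, neighbor, advert, infinity=16):
    """Merge a neighbor's advertisement into a RIP-style routing table.

    table:  dest -> (next_router, hops); next_router None means directly attached.
    advert: dest -> hops as seen from the neighbor.
    """
    updated = dict(table)
    for dest, hops in advert.items():
        new_hops = min(hops + 1, infinity)  # one extra hop to reach the neighbor
        current = updated.get(dest)
        if (current is None                  # new destination
                or new_hops < current[1]     # strictly better route
                or current[0] == neighbor):  # news from the current next router
            updated[dest] = (neighbor, new_hops)
    return updated


# Router D's table and A's advertisement, as on the slides.
table_d = {"w": ("A", 2), "y": ("B", 2), "z": ("B", 7), "x": (None, 1)}
advert_from_a = {"w": 1, "x": 1, "z": 4}

new_table = process_advertisement(table_d, "A", advert_from_a)
```

Only the entry for z changes: A offers z at 4 hops, so D can reach z via A in 5 hops instead of 7 via B.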
Network Layer 4-108
RIP: Link Failure and Recovery
If no advertisement heard from a neighbor after 180 sec -->
neighbor/link declared dead
 routes via neighbor invalidated
 new advertisements sent to neighbors
 neighbors in turn send out new advertisements (if tables
changed)
 link failure info quickly (?) propagates to entire net
 poisoned reverse used to prevent ping-pong loops (infinite
distance = 16 hops)
Network Layer 4-109
RIP Table processing


 RIP routing tables managed by application-level process called route-d (daemon)
 advertisements sent in UDP packets, periodically repeated
[Figure: two routers, each with the stack physical, link, network (IP), Transport (UDP). The routed process runs on top of UDP in each router and maintains that router's routing table at the network layer; the two routed processes exchange advertisements.]
Network Layer 4-110
OSPF
Open Shortest Path First
Network Layer 4-111
OSPF (Open Shortest Path First)


“open”: publicly available
 Uses Link State algorithm
   Each router learns the complete topology of the network (within the AS) via flooding of OSPF-advertisement messages
   Each router uses Dijkstra’s algorithm to compute the shortest paths from itself to all other nodes
   For a given source/dest. pair, find the “next hop” from the shortest path.
Network Layer 4-112
OSPF: Types of links
1. Point-to-Point Link
2. Transient Link
(Broadcast link)
LAN
3. Stub Link
LAN
By means of LS-advertisement, other routers know
what destinations are connected in the AS.
Network Layer 4-113
OSPF
Link-State
[Figure: five routers A, B, C, D, E with link costs A-B 5, A-C 2, A-D 3, B-C 4, B-E 3, C-E 4. Each router initially knows only its own links; combining the LS of all nodes reconstructs the whole topology.]
Link-State of A: (A, B, 5), (A, C, 2), (A, D, 3)
Link-State of B: (B, A, 5), (B, C, 4), (B, E, 3)
Link-State of C: (C, A, 2), (C, B, 4), (C, E, 4)
Link-State of D: (D, A, 3)
Link-State of E: (E, B, 3), (E, C, 4)
Network Layer 4-114
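Flooding of LS-advertisement
OSPF
Flooding example of LS of D: (D, A, 3)
[Figure: the same five-router topology (A-B 5, A-C 2, A-D 3, B-C 4, B-E 3, C-E 4); D's link state is passed from router to router until every node has received it]
Similarly, LS of other nodes are flooded ……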
Network Layer 4-115
A Link-State Routing Algorithm
Notation:
 c(x,y): link cost from node x to y; = ∞ if not direct neighbors
 D(v): current value of cost of path from source to dest. v
 p(v): predecessor node along path from source to v
 N': set of nodes whose least cost paths are definitively known
Cost?
Hop count, delay, loss, …
New: Cost computation (Cisco)
Default cost computation
Cost ∝ 1/Link speed
Cost = 100 Mbps/Link speed
Examples:
1 Mbps link → Cost = 100
10 Mbps link → Cost = 10
100 Mbps link → Cost = 1
1 Gbps link → Cost = 1 (the cost is never less than 1)
You can configure routers with other cost computation approaches.
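The default cost rule on this slide amounts to one line of arithmetic. A minimal sketch, assuming integer division and a floor of 1 (which is what the 1 Gbps example implies); the function name `ospf_cost` and the constant name are hypothetical:

```python
REFERENCE_BW_BPS = 100_000_000  # the 100 Mbps reference from the slide


def ospf_cost(link_speed_bps, reference_bps=REFERENCE_BW_BPS):
    """Cost = reference bandwidth / link speed, never below 1."""
    return max(1, reference_bps // link_speed_bps)


costs = {
    "1 Mbps": ospf_cost(1_000_000),
    "10 Mbps": ospf_cost(10_000_000),
    "100 Mbps": ospf_cost(100_000_000),
    "1 Gbps": ospf_cost(1_000_000_000),
}
```

Note the consequence: with the default reference, 100 Mbps and 1 Gbps links get the same cost, which is one reason to configure a different cost computation on faster networks.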
Network Layer 4-116
Dijkstra’s Algorithm
Initialization: u is assumed to be “self.”
  N' = {u}
  for all nodes v
    if v adjacent to u
      then D(v) = c(u,v) and p(v) = u
    else D(v) = ∞
Loop
  find w not in N' such that D(w) is a minimum
  add w to N'
  update D(v) for all v adjacent to w and not in N':
    D(v) = min( D(v), D(w) + c(w,v) ); p(v) = w if the path through w is chosen
    /* new cost to v is either old cost to v or known
       shortest path cost to w plus cost from w to v */
until all nodes in N'
All nodes run this algorithm.
[Figure: source u, the newly added node w, and a neighbor v of w]
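The pseudocode translates almost line for line into Python. The sketch below keeps the slide's names D, p, and N' (written `N_prime`) and runs on the six-node graph from the graph abstraction slide, with link costs as I read them from that figure (an assumption). It uses a plain linear scan for the minimum, as the pseudocode does, rather than a priority queue. The helper `make_graph` is hypothetical.

```python
INF = float("inf")

# Assumed link costs for the six-node example graph.
links = {("u", "v"): 2, ("u", "x"): 1, ("u", "w"): 5, ("v", "x"): 2,
         ("v", "w"): 3, ("x", "w"): 3, ("x", "y"): 1, ("w", "y"): 1,
         ("w", "z"): 5, ("y", "z"): 2}


def make_graph(links):
    """Turn the undirected link list into node -> {neighbor: cost}."""
    cost = {}
    for (a, b), c in links.items():
        cost.setdefault(a, {})[b] = c
        cost.setdefault(b, {})[a] = c
    return cost


def dijkstra(cost, u):
    nodes = set(cost)
    # Initialization: neighbors of u get the link cost, everything else infinity.
    D = {v: cost[u].get(v, INF) for v in nodes}
    D[u] = 0
    p = {v: u for v in cost[u]}
    N_prime = {u}
    while N_prime != nodes:
        # Find w not in N' such that D(w) is a minimum, and add it to N'.
        w = min(nodes - N_prime, key=lambda n: D[n])
        N_prime.add(w)
        # Update D(v) for all v adjacent to w and not in N'.
        for v, c_wv in cost[w].items():
            if v not in N_prime and D[w] + c_wv < D[v]:
                D[v] = D[w] + c_wv
                p[v] = w  # the path through w is chosen
    return D, p


D, p = dijkstra(make_graph(links), "u")
```

For source u this gives D(z) = 4 with predecessor chain z ← y ← x ← u, which agrees with the Bellman-Ford example.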
Network Layer 4-117
Dijkstra’s algorithm: example
Step   N'       D(v),p(v)   D(w),p(w)   D(x),p(x)   D(y),p(y)   D(z),p(z)
0      u        7,u         3,u         5,u         ∞           ∞
1      uw       6,w                     5,u         11,w        ∞
2      uwx      6,w                                 11,w        14,x
3      uwxv                                         10,v        14,x
4      uwxvy                                                    12,y
5      uwxvyz
Notes:
 construct shortest path tree by tracing predecessor nodes
 ties can exist (broken arbitrarily)
[Figure: six-node graph with link costs c(u,v) = 7, c(u,w) = 3, c(u,x) = 5, c(w,x) = 4, c(w,v) = 3, c(w,y) = 8, c(x,y) = 7, c(x,z) = 9, c(v,y) = 4, c(y,z) = 2]
Network Layer 4-118
Dijkstra’s algorithm: example
Resulting shortest-path tree from u:
[Figure: tree with edges u-x (5), u-w (3), w-v (3), v-y (4), y-z (2)]
Resulting routing table in u:
Destination   Next hop
v             w
x             x
y             w
w             w
z             w
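The routing table follows from the predecessor values p(·) in the final row of the Dijkstra table: walk back from the destination until the node whose predecessor is the source. A minimal sketch (the function name `next_hop` is hypothetical; the predecessor values are those from the example):

```python
def next_hop(p, source, dest):
    """Trace predecessors back from dest; return the first hop out of source."""
    node = dest
    while p[node] != source:
        node = p[node]
    return node


# Final predecessors from the example: D(v)=6,w  D(w)=3,u  D(x)=5,u  D(y)=10,v  D(z)=12,y
p = {"v": "w", "w": "u", "x": "u", "y": "v", "z": "y"}

routing_table = {dest: next_hop(p, "u", dest) for dest in sorted(p)}
```

For example, z traces back z ← y ← v ← w ← u, so the next hop from u toward z is w.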
Network Layer 4-119
OSPF “advanced” features (not in RIP)

security: all OSPF messages authenticated (to prevent
malicious intrusion)

multiple same-cost paths allowed (only one path in RIP)

hierarchical OSPF in large domains (next slide ….)
Network Layer 4-120
Hierarchical OSPF
A large AS is partitioned into several “Areas”.
[Figure: Backbone (Area 0) interconnecting Area 1, Area 2, and Area 3. Router roles shown: boundary router (BGP router), backbone router, area border routers, internal routers.]
Network Layer 4-121
Border Gateway Protocol
Network Layer 4-122
Inter-AS tasks

 suppose a router in AS1 receives a datagram destined outside of AS1:
   router should forward packet to edge router, but which one?
AS1 must:
1. learn which dests are reachable through AS2, which through AS3
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (routers 2a, 2b, 2c), and AS3 (routers 3a, 3b, 3c); AS2 and AS3 each also connect to other networks]
Network Layer4-123
Internet inter-AS routing: BGP

BGP (Border Gateway Protocol): the de facto inter-domain
routing protocol
 “glue that holds the Internet together”

allows subnets to advertise their existence to rest of Internet:
“I am here”

BGP provides each AS a means to:
 (eBGP) obtain subnet reachability information from neighboring ASs.
 (iBGP) propagate reachability information to other AS-internal BGP
routers.
 determine “good” routes to other networks based on reachability
information and policy
Network Layer 4-124
BGP basics

BGP session: two BGP routers (“peers”) exchange BGP msg:
 advertising paths to different destination network prefixes (“path
vector” protocol)
 exchanged over semi-permanent TCP connections

when AS3 advertises a prefix to AS1:
 AS3 promises that it will forward datagrams towards that prefix
 AS3 can aggregate prefixes in its advertisement
[Figure: AS3 (routers 3a, 3b, 3c), AS1 (routers 1a, 1b, 1c, 1d), and AS2 (routers 2a, 2b, 2c), with other networks beyond AS3 and AS2; a BGP message is sent from AS3 to AS1]
Network Layer 4-125
BGP basics: distributing path information

using eBGP session over 3a-to-1c, AS3 sends prefix
reachability info to AS1.
 1c can then use iBGP to distribute new prefix info to all other
BGP routers in AS1 (Note: with all-to-all TCP conns.)
 1b can then re-advertise new reachability info to AS2 over
1b-to-2a eBGP session

when a router learns of a new prefix, it creates an
entry for the prefix in its routing table.
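[Figure: AS3 (routers 3a, 3b, 3c), AS1 (routers 1a, 1b, 1c, 1d), and AS2 (routers 2a, 2b, 2c), with other networks beyond AS3 and AS2. Legend: eBGP sessions run between ASs; iBGP sessions run between routers inside AS1.]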
Network Layer 4-126
Advt. of “paths to destination nets” between neighboring AS’s

 Recall: Dest. nets are represented by prefixes.
 Route (Path) = prefix + attributes
 Two important attributes:
   AS-PATH: a sequence of ASs through which the prefix advertisement has passed: e.g., AS 67, AS 17, AS 205, …
  [Figure: network N1 is in AS 205. The advertisement ({67, 17, 205}; N1) arrives at AS 55, which prepends its own number and passes ({55, 67, 17, 205}; N1) on to AS 70.]
   NEXT-HOP: indicates the specific router IP addr in the next-hop AS.
Network Layer 4-127
Advt. of “paths to destination nets” between neighboring AS’s

gateway (BGP) router receiving route advt. uses import policy
to accept/decline a route
 Example: never route through AS 64496
Import policy:
Accept/decline a route
You receive an AS-PATH to dest. X: ({AS 12, AS 215, AS 64496, AS 99}; X)
You do not trust AS 64496.
So you decline the above path to X.
Note: AS numbers 64496 to 64511 are reserved for use in documentation and examples.
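The import-policy check and the AS-PATH prepending from the previous slide can be sketched as two small functions. This is a toy model, not how a real BGP implementation is configured; the names `accept_route` and `readvertise` are hypothetical.

```python
def accept_route(as_path, untrusted):
    """Import policy: decline any route whose AS-PATH crosses an untrusted AS."""
    return not any(asn in untrusted for asn in as_path)


def readvertise(as_path, own_asn):
    """Before passing a route on, an AS prepends its own number to the AS-PATH."""
    return [own_asn] + list(as_path)


untrusted = {64496}

declined = accept_route([12, 215, 64496, 99], untrusted)  # path from this slide
accepted = accept_route([67, 17, 205], untrusted)         # path from the previous slide
passed_on = readvertise([67, 17, 205], own_asn=55)
```

The first path is declined because it crosses AS 64496. The second is accepted and, when AS 55 passes it on, becomes {55, 67, 17, 205} as in the AS-PATH example.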
Network Layer 4-128
Remember
BGP prefers policy-based routing
as opposed to
cost-based routing in OSPF and RIP
Network Layer 4-129
BGP route selection

router may learn about more than 1 route to
destination AS; it selects a route based on:
1. policy decision
2. shortest AS-PATH
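The two-step selection can be sketched as a sort key: policy first, AS-PATH length as the tie-breaker. Here the policy decision is modeled as a numeric preference per neighboring AS, which is an assumption made for illustration (the slide does not say how policy is expressed). The name `select_route` and the AS numbers, taken from the documentation range, are hypothetical.

```python
def select_route(routes, preference):
    """Pick a route: highest policy preference first, then shortest AS-PATH.

    routes:     list of AS-PATHs; the first element is the neighboring AS.
    preference: neighboring AS -> numeric preference (default 0).
    """
    return max(routes, key=lambda path: (preference.get(path[0], 0), -len(path)))


routes = [
    [64497, 64498, 64510],
    [64499, 64510],
    [64500, 64501, 64502, 64510],
]

by_length = select_route(routes, preference={})
by_policy = select_route(routes, preference={64500: 200})
```

With no policy preferences the shortest AS-PATH wins. Once policy favors neighbor AS 64500, its route is selected even though its AS-PATH is the longest.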
Network Layer 4-130
BGP messages: exchanged between peers over TCP conn.
 BGP messages:
 OPEN
• opens TCP connection to peer and authenticates sender
 UPDATE: advertises new path (or withdraws old)
 KEEPALIVE: keeps connection alive in absence of
UPDATES
 NOTIFICATION: reports errors in previous msg; also used
to close connection
Network Layer 4-131
BGP routing policy
[Figure: provider networks A, B, C and customer networks W, X, Y. Legend: customer network; provider network.]
 Export policy: a router can hide a path
 X is dual-homed: attached to two networks (B and C)
   X does not want to route traffic from B to C
   so X will not advertise to B a route to C
Essentially, X does not tell B that it (X) has a path to C.
Network Layer 4-132
Why different Intra- and Inter-AS routing ?
Policy:
 Inter-AS: admin wants control over how its traffic is routed, who routes through its net.
 Intra-AS: single admin, so no policy decisions needed
Scale:
 hierarchical routing saves table size, reduces update traffic
Performance:
 Intra-AS: focuses on performance
 Inter-AS: policy may dominate over performance
Network Layer 4-133
Multi-hop communication done!
End of Network Layer
134