IP Address - Zoo - Yale University

Download Report

Transcript IP Address - Zoo - Yale University

Routing and Forwarding
11/16/2009
1
Exam 1
 Highest 96, lowest 79, medium 87
2
Recap
 Internet routing is divided into intra-AS
(intradomain) and inter-AS (interdomain)
routing
 BGP (Border Gateway Protocol), a path-
vector protocol, is the de facto standard
interdomain protocol
Recap: BGP Routing Decision Process
route
selection
policy:
rank paths
routing cache
select best
path
export path
to neighbors
export
policy:
which
paths
export to
which
neighbors
Recap: Understanding BGP Policies Using P-Graph
 Nodes in P-graph are feasible
2
paths
 A directed edge from path
N1P1 to P1
210
20
0
130
10
3
1
320
30
 A directed edge from a lower
ranked path to a higher
ranked path
 If the P-graph has no loop,
then BGP policy converges.
1 3 0
2 1 0
3 2 0
1 0
2 0
3 0
P-graph
Internet System and Protocol
 Assume a BGP path SABCD to destination AS
D. Consider the business relationship
between each pair:
S
A B
C
D
 Three types of business relationships:
 PC (provider-customer)
 CP (customer-provider)
 PP (peer-peer)
6
Typical Export Policies Imply Patterns of Routes
 Invariant 1 of valid BGP routes (with labels
representing business relationship)
P
C
P
C
P
C
?P
C
Dest
Reason: only route learned from customer is sent to provider; thus
after a PC, it is always PC to the destination
Typical Export Policies Imply Patterns of Routes
 Invariant 2 of valid BGP routes (with labels
representing business relationship)
CP
······
?
CP
CP/PP
Dest
Reason: routes learned from peer or provider are sent to only customers;
thus all relationship before is CP
Stability of BGP Routing
 Suppose
1.
2.
there is no loop formed by provider-customer
relationship in the Internet
each AS uses typical route selection policy:
C > E/P
3.
each AS uses the typical export policies
 Then BGP policy routing always converges !
Case 1: A Link is PC
Proof by contradiction. Assume a loop in P-graph. Consider a fixed link.
in the loop
PC
PC
PC
Case 2: Link is CP/PP
CP/PP
CP/PP
CP
CP
CP/PP
AS D
Routing: Example
E
d1
d
d2
AS A
(OSPF)
a2
F
i
AS C
a1
How to specify?
AS B
(OSPF intra
routing)
b
AS I
12
Outline
 Recap
 BGP policy routing
 Internet addressing
13
IP Addressing Scheme: Requirements
 We need an address to uniquely identify
each destination
 Routing scalability needs flexibility in
aggregation of destination addresses
 we
should be able to aggregate a set of
destinations as a single routing unit
 Preview: the unit of routing in the Internet
is a network---the destinations in the routing
protocols are networks
14
IP Address: An IP Address Identifies an Interface
 IP address: 32-bit
identifier for an
interface
 interface:
 routers typically
have multiple
interfaces
 host may have
multiple interfaces
223.1.1.1
223.1.2.1
223.1.1.2
223.1.1.4
223.1.1.3
223.1.2.9
223.1.3.27
223.1.2.2
223.1.3.2
223.1.3.1
%/sbin/ifconfig -a
223.1.3.2 = 11011111 00000001 00000011 00000010
223
1
3
2
15
IP Addressing
 IP address:
 network part
 host part
 What’s a network ?
(from IP address
perspective)
 is a unit of routing:
can be routed
together (depend
on the routing
protocol)
223.1.1.2
223.1.1.1
223.1.1.4
223.1.1.3
223.1.9.2
223.1.7.0
223.1.9.1
223.1.7.1
223.1.8.1
223.1.2.6
223.1.2.1 223.1.2.2
223.1.8.0
223.1.3.27
223.1.3.1
223.1.3.2
16
IP Addressing: Class-ful Addressing
given notion of “network”, let’s re-examine IP addresses:
“class-ful” addressing in the original IP design:
class
A
0 network
B
10
C
110
D
1110
1.0.0.0 to
127.255.255.255
host
128.0.0.0 to
191.255.255.255
host
network
network
multicast address
host
192.0.0.0 to
223.255.255.255
224.0.0.0 to
239.255.255.255
32 bits
Problem of class-ful addressing?
17
IP Addressing: CIDR
 (Static)classful addressing:

inefficient use of address space, address space exhaustion

not flexible for aggregation
• e.g., a class A net allocated enough addresses for 16 million
hosts; a class B address may also be too big
 CIDR: Classless InterDomain Routing


network portion of address of arbitrary length
address format: a.b.c.d/x, where x is # bits in network
portion of address
network
part
host
part
11001000 00010111 00010000 00000000
200.23.16.0/23
Some systems use mask (1’s to indicate network bits), instead of the /x format
18
CIDR Address Aggregation
AS D
d
d1
AS A
(OSPF)
130.132.1/24
i
i->a1: I can reach
130.132/22; my
path: I
a2
a1
130.132.2/24
AS I
intradomain
routing uses /24
130.132.3/24
19
CIDR Address Aggregation
B
x00/24: B
A
x01/24: C
C
x10/24: E
G
E
x11/24: F
F
20
IP Addressing: How to Get One?
Q: How does an ISP get its block of
addresses?
A: ICANN: Internet Corporation for Assigned
Names and Numbers
 allocates addresses
 manages DNS
 assigns domain names, resolves disputes
Use
%whois –h whois.arin.net “n <org>”
to check addresses allocated to <org>
21
Example: Yale
 Yale address blocks:
%whois –h whois.arin.net “n yale university”
 Yale BGP routing:

Using one of the looking glass servers:
http://www.bgp4.as/looking-glasses
22
Routing Table Size of BGP (number of globally
advertised networks)
Active BGP Entries (http://bgp.potaroo.net/as1221/bgp-active.html)
23
Routing Table Prefix Length Distr.
180000
160000
140000
120000
100000
80000
60000
40000
20000
0
8
13
18
23
28
24
IP addresses: How to Get One?
Q: How does a host get an IP address?
 Static configured
wintel: control-panel->network->configuration>tcp/ip->properties
 unix:
%/sbin/ifconfig eth0 inet 192.168.0.10 netmask
255.255.255.0

 DHCP: Dynamic Host Configuration Protocol:
dynamically get address from as server
 “plug-and-play”
25
DHCP: Dynamic Host Configuration Protocol
Goal: allow host to dynamically obtain its IP address
from network server when it joins network



can renew its lease on address in use
allows reuse of addresses (only hold address while
connected)
support for mobile users who want to join network
DHCP msgs:
 host broadcasts “DHCP discover” msg
 DHCP server responds with “DHCP offer” msg
 host requests IP address: “DHCP request” msg
 DHCP server sends address: “DHCP ack” msg
26
Network Address Translation: Motivation
 A local network uses just one public IP address as far as outside
world is concerned
 Each device on the local network is assigned a private IP address
rest of
Internet
local network
(e.g., home network)
192.168.1.0/24
192.168.1.1
192.168.1.2
192.168.1.3
138.76.29.7
192.168.1.4
All datagrams leaving local
network have same single source
NAT IP address: 138.76.29.7,
different source port numbers
Datagrams with source or
destination in this network
have 192.168.1/24 address for
source, destination (as usual)
27
NAT: Network Address Translation
Implementation: NAT router must:



outgoing datagrams: replace (source IP address, port
#) of every outgoing datagram to (NAT IP address,
new port #)
. . . remote clients/servers will respond using (NAT
IP address, new port #) as destination addr.
remember (in NAT translation table) every (source
IP address, port #) to (NAT IP address, new port #)
translation pair
incoming datagrams: replace (NAT IP address, new
port #) in dest fields of every incoming datagram
with corresponding (source IP address, port #)
stored in NAT table
28
NAT: Network Address Translation
NAT translation table
WAN side addr
LAN side addr
1: host 192.168.1.2
2: NAT router
sends datagram to
changes datagram
138.76.29.7, 5001 192.168.1.2, 3345
128.119.40.186, 80
source addr from
……
……
192.168.1.2, 3345 to
138.76.29.7, 5001,
S: 192.168.1.2, 3345
updates table
D: 128.119.40.186, 80
2
S: 138.76.29.7, 5001
D: 128.119.40.186, 80
138.76.29.7
S: 128.119.40.186, 80
D: 138.76.29.7, 5001
3: Reply arrives
dest. address:
138.76.29.7, 5001
3
192.168.1.2
1
192.168.1.1
192.168.1.3
S: 128.119.40.186, 80
D: 192.168.1.2, 3345
4
192.168.1.4
4: NAT router
changes datagram
dest addr from
138.76.29.7, 5001 to 192.168.1.2, 3345
29
Network Address Translation: Advantages
 No need to be allocated range of addresses
from ISP: - just one public IP address is
used for all devices
16-bit port-number field allows 60,000
simultaneous connections with a single LAN-side
address !
 can change ISP without changing addresses of
devices in local network
 can change addresses of devices in local network
without notifying outside world

 Devices inside local net not explicitly
addressable, visible by outside world (a
security plus)
30
Network Address Translation: Problems
 If both hosts are behind NAT, they will have
difficulty establishing connection
 NAT is controversial:
 routers
should process up to only layer 3
 violates end-to-end argument
• NAT possibility must be taken into account by app
designers, e.g., P2P applications
 address
shortage should instead be solved by
having more addresses --- IPv6 !
31
Outline
 Admin. and recap
 BGP
 IP addressing
 IP forwarding
32
IP Datagram Format
IP protocol version
number
header length
(bytes)
“type” of data
max number
remaining hops
(decremented at
each router)
upper layer protocol
to deliver payload to
how much overhead
with TCP?
 20 bytes of TCP
 20 bytes of IP
 = 40 bytes + app
layer overhead
32 bits
head. type of
length
ver
len service
fragment
16-bit identifier flgs
offset
time to upper
Internet
layer
live
checksum
total datagram
length (bytes)
for
fragmentation/
reassembly
32 bit source IP address
32 bit destination IP address
Options (if any)
data
(variable length,
typically a TCP
or UDP segment)
E.g. timestamp,
record route
taken, specify
list of routers
to visit.
33
Data Forwarding: Steps
 Error checking, e.g., check header checksum;
if error, set up error flag
 Decrement TTL; if TTL == 0, set error flag
 If error, drop the packet, and generate
ICMP report
34
The Network Layer
Host, router network layer functions:
Transport layer: TCP, UDP
Network
layer
The IP protocol
•addressing
•datagram format
Routing protocols
•path selection
•RIP, OSPF, BGP
forwarding
ICMP protocol
•error reporting
•router “signaling”
Link layer
Physical layer
35
ICMP: Internet Control Message Protocol
 communicate network-level
information
 error reporting:
unreachable host, network,
port, protocol
 echo request/reply (used by
ping)
 network-layer “above” IP:
 ICMP msgs carried in IP
datagrams
 ICMP message: type, code plus
first 8 bytes of IP datagram
causing error
type
code
checksum
ICMP message body
Type
0
3
3
3
3
3
3
4
Code
0
0
1
2
3
6
7
0
8
9
10
11
12
0
0
0
0
0
description
echo reply (ping)
dest network unreachable
dest host unreachable
dest protocol unreachable
dest port unreachable
dest network unknown
dest host unknown
source quench (congestion
control - not used)
echo request (ping)
route advertisement
router discovery
TTL expired
bad IP header
traceroute is developed by a clever use
of ICMP
36
Data Forwarding: Steps
 If no error, look up packet destination
address in forwarding table:


if datagram for a host on directly attached
network, it is the job of the link layer now
otherwise,
• lookup: find next-hop router, and its outgoing interface
• if needed, do fragmentation
• forward packet to outgoing interface (to the next hop
neighbor)
try %netstat –rn to see the forwarding table
37
Forwarding Look up
default: -
#
prefix
a)
b)
c)
d)
e)
f)
g)
h)
i)
j)
k)
00001
00010
00011
001
0101
011
10
100
1010
1011
1100
interface
0
1
a
g
f
d
e
abc
h
i
jk
The networks are represented by
a decision tree, e.g., a Patricia Trie
to look for the longest match of
the destination address
38
Example 1 (same network): A->B
src
forwarding table in A
dst
misc
data
fields 223.1.1.1 223.1.1.3
 Look up dest address
 find dest is on same net
 link layer will send the
datagram directly inside a
link-layer frame
Dest. Net. next router
Nhops
223.1.1/24
1
223.1.2/24 223.1.1.4
2
223.1.3/24 223.1.1.4
2
0.0.0.0/0 223.1.1.4
To Internet
A 223.1.1.1
223.1.4.1
223.1.2.1
B
223.1.1.2
223.1.1.4
223.1.2.9
223.1.2.2
223.1.1.3
223.1.3.1
223.1.3.27
E
223.1.3.2
39
Example 2 (Different Networks): A-> E
forwarding table in A
misc
data
fields 223.1.1.1 223.1.2.3
 look up dest address in
forwarding table
 routing table: next hop
router to dest is 223.1.1.4
 link layer sends datagram to
router 223.1.1.4 inside a linklayer frame

the dest. of the link layer
frame is 223.1.1.4
Dest. Net. next router
Nhops
223.1.1/24
1
223.1.2/24 223.1.1.4
2
223.1.3/24 223.1.1.4
2
0.0.0.0/0 223.1.1.4
To Internet
A 223.1.1.1
223.1.4.1
223.1.2.1
B
223.1.1.2
223.1.1.4
223.1.2.9
223.1.2.3
223.1.1.3
223.1.3.1
223.1.3.27
E
223.1.3.2
40
Example 2 (Different Networks): A-> E
misc
data
fields 223.1.1.1 223.1.2.3
Arriving at 223.1.1.4,
destined for 223.1.2.2
 look up dest address in
router’s forwarding table
 E on same network as router’s
interface 223.1.2.9
 router, E directly attached
 link layer sends datagram to
223.1.2.2 inside link-layer
frame via interface 223.1.2.9
 datagram arrives at
223.1.2.2!! (hooray!)
forwarding table in router
Dest. Net router Nhops interface
223.1.1/24
223.1.2/24
223.1.3/24
0.0.0.0/0
A
B
-
223.1.1.1
1
223.1.1.4
1
223.1.2.9
1
223.1.3.27
223.1.4.1
To Internet
223.1.4.1
223.1.2.1
223.1.1.2
223.1.1.4
223.1.2.9
223.1.2.3
223.1.1.3
223.1.3.1
223.1.3.27
E
223.1.3.2
41
Backup Slides
42
Another Method for Checking
Convergence: Dispute Wheels
 A dispute
wheel
prefers
R1Q2 over Q1
 u2 prefers
R2 Q3 over
Q2
 etc
 u1
2
210
20
R1
Q2
Q1
u1
130
10
1
3
Q3
d
u3
Qn
Rn
un
0
R2
u2
R3
320
30
43
Patterns of Valid Routes
 Consider how a path is
extended according to the
export policies

case 1 (format: Destination,
link type, …)
• Dest, …CP  Dest, … CP CP
• Dest, …CP  Dest, … CP PC
• Dest, …CP  Dest, … CP PP

case 2
 Valid consecutive link types
(starting from source to
destination, i.e., reserve)





PC PC
CP PC
PP PC
CP CP
CP PP
• Dest, …PC  Dest, … PC PC

case 3
• Dest, …PP  Dest, … PP PC
44
Case 1: The first link of Q1 is a PC link
 Then the first link of R1Q2 must be a PC
link

because u1 chooses R1Q2 and C > E/P
 Thus all remaining links along R1Q2 are PC
links

because only PC follows PC
u1
 Thus the first link of Q2 is
a PC link
 ……
 All links along R1, …, Rn are
PC links
 The network has a PC loop.
contradiction !
PC
Rn
R1
PC
Q1
u2
Qn
un
d
Rn-1
Q2
R2
45
Case 2: The first link of Q1 is a CP/PP link
 All links along Rn are CP links because all
links before CP/PP are CP
 The first link of Qn must be a CP/PP link

because C > E/P
 ……
CP
Rn
u1
CP/PP
Q1
 All links along R1, …, Rn are CP links
 The network has a PC loop.
contradiction !
u2
Qn
un
CP/PP
Rn-1
d
Q2
R2
46
Backup: IP Multicast
47
IP Fragmentation & Reassembly
 Network links have MTU
(max.transfer size) largest possible link-level
frame.
 different link types,
different MTUs, e.g.
Ethernet MTU is 1500
bytes
 Large IP datagram divided
(“fragmented”)
 one datagram becomes
several datagrams
 “reassembled” only at
final destination
 IP header bits used to
identify, order related
fragments
fragmentation:
in: one large datagram
out: 3 smaller datagrams
reassembly
48
IP Fragmentation and Reassembly
Example
 4000 byte
datagram
 MTU = 1500 bytes
length ID fragflag offset
=4000 =x
=0
=0
One large datagram becomes
several smaller datagrams
length ID fragflag offset
=1500 =x
=1
=0
length ID fragflag offset
=1500 =x
=1
=1480
length ID fragflag offset
=1040 =x
=0
=2960
49
IP Multicast: Service Model
128.59.16.12
128.119.40.186
multicast
group
226.17.30.197
128.34.108.63
128.34.108.60

Multicast group concept: use of indirection


Open group model



A group is identified by a location-independent
logical address (class D IP address: prefix 1110)
Anyone can send packets to the “logical” group address
Anyone can join a group and receive packets
Normal, best-effort delivery semantics of IP
Needed: infrastructure to deliver mcast-addressed datagrams to
all hosts that have joined that multicast group
50
Multicast Across LANs
 Goal: find a tree (or trees) connecting routers having
local mcast group members

source-based: different tree from sender to each receiver
– Distance-vector multicast routing protocol (DVMRP)
– Protocol-independent multicast-dense mode (PIM-DM)

shared-tree: same tree used by all group members
– Core-Based Tree (CBT)
– Protocol-independent multicast-sparse mode (PIM-SM)
shared tree
source-based trees
51
Source Tree:
Reverse Path Flooding (RPF)
 A router x forwards a packet from source (S)
iff it arrives via neighbor y, and y is on the
shortest path from x back to S
 A packet is replicated to all but the incoming
interface
1
S
1
y
x
1
1
z
t
1
a
52
Reverse Path Forwarding:
Improvement
 Basic idea: forward a packet from S only on
child links for S
 A child link of router x for source S
a link that has x as parent on
the shortest path from the
link to S
 a child x notifies its parent y
(through the routing protocol)
that it has selected y as its
parent

S
y
x
z
t
a
53
Reverse Path Forwarding: Pruning
 No
need to forward datagrams down subtree
with no mcast group members
 “prune” msgs sent upstream by router with
no downstream group members
LEGEND
S: source
R1
router with attached
group member
R4
R2
P
R5
R3
R6
P
R7
P
router with no attached
group member
prune message
links with multicast
forwarding
54
Pruning
 Prune (Source, Group) at a leaf router if no members
 send No-Membership Report (NMR) up tree
 If all children of router R prune (S,G)
 propagate prune for (S,G) to its parent
 What do you do when a member of a group (re)joins?
 send a Graft message to upstream parent
 How to deal with failures?
 prune dropped
 flow is reinstated
 down stream routers re-prune
 Note: again a soft-state approach
55
Implementation of
Source Trees in the Internet
 Multicast OSFP (MOSFP)
 Membership is part of the link state distribution; calculate
source specific, pre-pruned trees
 Reverse Path Forwarding
 Distance Vector Multicast Routing Protocol (DVMRP)
 Protocol Independent Multicast – Dense Mode (PIM-DM)
• very similar to DVMRP


Difference: PIM uses any unicast routing algorithm to
determine the path from a router to the source; DVMRP
uses distance vector
Question: the state requirement of Reverse Path
Forwarding
56
Building a Shared Tree
 Steiner Tree: minimum cost




tree connecting all routers
with attached group members
A Steiner tree is not a
spanning tree because you
do not need to connect all
nodes in the network
Problem is NP-hard
Excellent heuristics exists
Not used in practice:



computational complexity
information about entire network needed
monolithic: rerun whenever a router needs to join/leave
57
Center (Core) based Shared Tree
 Single delivery tree shared by all
 One router identified as “center” of tree
 Tree construction is receiver-based
 edge router sends unicast join-msg addressed to center
router
 join-msg “processed” by intermediate routers and
forwarded towards center
 join-msg either hits existing tree branch for this center,
or arrives at center
 path taken by join-msg becomes new branch of tree for
this router
 A sender unicasts a packet to center
 The packet is distributed on the tree when it hits the tree
58
Example: M3 Joins
 Group members: M1, M2
core
M1
M2
M3
shared tree
join message
S1
Discussion: what is property of the constructed tree?
59
Example: M1 Sends Data
 Group members: M1, M2, M3
 M1 sends data
core
M1
M2
control (join) messages
data
M3
S1
60
Shared Tree Protocols in the Internet
 Core Based Tree
 Protocol Independent Multicast (PIM)
Sparse mode
 The catch: how do you know the center?

session announcement
61
Mbone: Tunneling
Q: How to connect “islands” of multicast
routers in a “sea” of unicast routers?
physical topology
logical topology
 mcast datagram encapsulated inside “normal” (non-multicast-
addressed) datagram
 normal IP datagram sent thru “tunnel” via regular IP unicast to
receiving mcast router
 receiving mcast router unencapsulates to get mcast datagram
62