Network Layer Issues - SFU computer science
Download
Report
Transcript Network Layer Issues - SFU computer science
CMPT 771 Internet Architecture and
Protocols
Network Layer Issues
and Applications
CMPT771 Network Layer 1
Network Layer Issues and Applications
Contents
1. Network Service Models
2. Routing Principles
3.
4.
5.
6.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 2
Network layer functions
deliver packets from sending to
receiving hosts
network layer protocols in every
host, router
three important functions:
path determination: route taken
by packets from source to dest.
Routing algorithms
forwarding: move packets from
router’s input to appropriate
router output
call setup: some network
architectures require router call
setup along path before data
flows
application
transport
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
application
transport
network
data link
physical
CMPT771 Network Layer 3
Network service model
Q: What service model for
“channel” transporting
packets from sender to
receiver?
guaranteed bandwidth?
preservation of inter-packet
timing (no jitter)?
loss-free delivery?
in-order delivery?
congestion feedback to sender?
The most important
abstraction provided
by network layer:
? ?
?
virtual circuit
or
datagram?
CMPT771 Network Layer 4
Virtual circuits
“source-to-dest path behaves much like telephone circuit”
performance-wise
network actions along source-to-dest path
application
transport 5. Data flow begins
network 4. Call connected
data link 1. Initiate call
physical
6. Receive data application
3. Accept call transport
2. incoming call network
data link
physical
CMPT771 Network Layer 5
Virtual circuits
call setup, teardown for each call before data can flow
each packet carries VC identifier (not destination host ID)
every router on source-dest path maintains “state” for each
passing connection
transport-layer connection only involved two end systems
link, router resources (bandwidth, buffers) may be allocated to
VC
to get circuit-like perf.
used to setup, maintain teardown VC
used in ATM, frame-relay, X.25
not used in today’s Internet
CMPT771 Network Layer 6
Datagram networks: Internet’s model
no call setup at network layer
routers: no state about end-to-end connections
no network-level concept of “connection”
Forwarded: using destination host address
packets between same source-dest pair may take different
paths
application
transport
network
data link 1. Send data
physical
application
transport
2. Receive data network
data link
physical
CMPT771 Network Layer 7
Datagram or VC network: why?
Asynchronous Transfer Mode
- ATM (VC)
evolved from telephony
human conversation:
strict timing, reliability
requirements
need for guaranteed
service
“dumb” end systems
telephones
complexity inside network
Internet (Datagram)
data exchange among computers
“elastic” service, no strict
timing req.
“smart” end systems
(computers)
can adapt, perform control,
error recovery
simple inside network,
complexity at “edge”
heterogeneous link types
different characteristics
uniform service difficult
Hard state vs Soft state
CMPT771 Network Layer 8
Outline
1. Network Service Models
2. Routing Principles
3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 9
Routing
Routing protocol
5
Goal: determine a “good” path
(sequence of routers) thru
network from source to dest.
Graph abstraction for routing
algorithms:
graph nodes are routers
graph edges are physical
links
link cost: delay, $ cost, or
congestion level
2
A
B
2
1
D
3
C
3
1
5
F
1
E
2
“good” path:
typically means minimum
cost path
other def’s possible
CMPT771 Network Layer 10
Routing Algorithm classification
Global or decentralized
information?
Global:
all routers have complete
topology, link cost info
“link state” algorithms
Decentralized:
router knows physicallyconnected neighbors, link costs
to neighbors
iterative process of
computation, exchange of info
with neighbors
“distance vector” algorithms
Static or dynamic?
Static:
routes change slowly over
time
Dynamic:
routes change more quickly
periodic update
in response to link cost
changes
CMPT771 Network Layer 11
Link-State Routing Algorithm
Dijkstra’s algorithm
net topology, link costs known
to all nodes
accomplished via “link
state broadcast”
all nodes have same info
computes least cost paths
from one node (‘source”) to all
other nodes
gives routing table for
that node
Notation:
c(i,j): link cost from node i to
j. cost infinite if not direct
neighbors
D(v): current value of cost of
path from source to dest V
p(v): predecessor node along
path from source to v, that is
next v
N: set of nodes whose least
cost path definitively known
CMPT771 Network Layer 12
Distance Vector Routing Algorithm
Key Idea
Given my distance to a neighboring
node
Given the distances from the
neighboring nodes to remote nodes
My distances to remote nodes
iterative:
continues until no nodes
exchange info.
self-terminating: no “signal”
to stop
asynchronous:
nodes need not exchange
info/iterate in lock step!
distributed:
each node communicates only
with directly-attached
neighbors
CMPT771 Network Layer 13
Distance Vector Routing Algorithm
Distance Table data structure
each node has its own
row for each possible destination
column for each directly-attached
neighbor to node
example: in node X, for dest. Y via
neighbor Z:
X
D (Y,Z)
via
X
D ()
Y
Z
Y
1
2
Z
7
5
distance from X to
= Y, via Z as next hop
Z
= c(X,Z) + minw{D (Y,w)}
CMPT771 Network Layer 14
Distance Vector Routing: overview
Iterative, asynchronous: each
local iteration caused by:
message from neighbor: its
least cost path change from
neighbor
Distributed:
each node notifies neighbors
only when its least cost path
to any destination changes
neighbors then notify their
neighbors if necessary
Each node:
wait for (msg from neighbor)
recompute distance table
if least cost path to any dest
has changed, notify
neighbors
CMPT771 Network Layer 15
Distance Vector: Count-to-Infinity Problem
3
A
1
1
B
1
C
CMPT771 Network Layer 16
Comparison of LS and DV algorithms
Message complexity
LS: with n nodes, E links, O(nE)
msgs sent each
DV: exchange between
neighbors only
convergence time varies
Speed of Convergence
LS: O(n2) algorithm requires
O(nE) msgs
may have oscillations
DV: convergence time varies
may be routing loops
count-to-infinity problem
Robustness: what happens if
router malfunctions?
LS:
node can advertise incorrect
link cost
each node computes only its
own table
DV:
DV node can advertise
incorrect path cost
each node’s table used by
others
• error propagate thru
network
CMPT771 Network Layer 17
Routing in the Internet
The Global Internet consists of Autonomous Systems (AS)
interconnected with each other:
Stub AS: small corporation: one connection to other AS’s
Multihomed AS: large corporation (no transit): multiple connections
to other AS’s
Transit AS: provider, hooking many AS’s together
Two-level routing:
Intra-AS: administrator responsible for choice of routing
algorithm within network
Inter-AS: unique standard for inter-AS routing: BGP
CMPT771 Network Layer 18
Internet AS Hierarchy
Intra-AS border (exterior gateway) routers
Inter-AS interior (gateway) routers
CMPT771 Network Layer 19
Topology of Tier-1 ISP
CMPT771 Network Layer 20
Redundancy in AS Routes
CMPT771 Network Layer 21
Intra-AS Routing
Also known as Interior Gateway Protocols (IGP)
Most common Intra-AS routing protocols:
RIP: Routing Information Protocol
OSPF: Open Shortest Path First
IGRP: Interior Gateway Routing Protocol (Cisco
proprietary)
CMPT771 Network Layer 22
Internet inter-AS routing: BGP
BGP (Border Gateway Protocol): the de facto standard
Path Vector protocol:
similar to Distance Vector protocol
each Border Gateway broadcast to neighbors (peers)
entire path (i.e., sequence of AS’s) to destination
BGP routes to networks (ASs), not individual hosts
E.g., Gateway X may send its path to dest. Z:
Path (X,Z) = X,Y1,Y2,Y3,…,Z
CMPT771 Network Layer 23
Why different Intra-/Inter-AS routing ?
C. Huitema, Routing on the Internet. Prentice-Hall,
2000.
Telus
Shaw
Bell
CMPT771 Network Layer 24
Outline
1. Network Service Models
2. Routing Principles
3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 25
Multicast: one sender to many receivers
Multicast: act of sending datagram to multiple receivers with
single “transmit” operation
analogy: one teacher to many students
Question: how to achieve multicast (which layer ?)
Multicast via unicast
source sends N unicast
datagrams, one
addressed to each of N
receivers
routers
forward unicast
datagrams
multicast receiver (red)
not a multicast receiver (red)
CMPT771 Network Layer 26
Multicast: one sender to many receivers
Multicast: act of sending datagram to multiple receivers with
single “transmit” operation
analogy: one teacher to many students
Question: how to achieve multicast
Multicast via unicast:
Problems
1. Inefficient
2. Addressing
routers
forward unicast
datagrams
multicast receiver (red)
not a multicast receiver (red)
CMPT771 Network Layer 27
Multicast: one sender to many receivers
Multicast: act of sending datagram to multiple receivers with
single “transmit” operation
analogy: one teacher to many students
Question: how to achieve multicast
Network multicast
Router actively participate in
multicast, making copies of
packets as needed and
forwarding towards multicast
receivers
Multicast
routers (red) duplicate and
forward multicast datagrams
CMPT771 Network Layer 28
Multicast: one sender to many receivers
Multicast: act of sending datagram to multiple receivers with
single “transmit” operation
analogy: one teacher to many students
Question: how to achieve multicast
Application-layer multicast
end systems involved in
multicast copy and forward
unicast datagrams among
themselves
CMPT771 Network Layer 29
Multicast Routing: Problem Statement
Goal: find a tree (or trees) connecting routers having
local mcast group members
tree: not all paths between routers used
source-based: different tree from each sender to rcvrs
shared-tree: same tree used by all group members (senders)
Shared tree
Source-based trees
CMPT771 Network Layer 30
Approaches for building mcast trees
Approaches:
source-based tree: one tree per source
shortest path trees
reverse path forwarding
group-shared tree: group uses one tree
minimal spanning (Steiner)
center-based trees
…we first look at basic approaches, then specific
protocols adopting these approaches
CMPT771 Network Layer 31
Source based Tree:
Shortest Path Tree
mcast forwarding tree: tree of shortest path
routes from source to all receivers
Dijkstra’s algorithm
S: source
LEGEND
R1
1
2
R4
R2
3
R3
router with attached
group member
5
4
R6
router with no attached
group member
R5
6
R7
i
link used for forwarding,
i indicates order link
added by algorithm
CMPT771 Network Layer 32
Source based Tree:
Flooding
S: source
R1
R4
R2
R5
R3
R6
R7
• Problem: Broadcast storm
CMPT771 Network Layer 33
Source based Tree:
Reverse Path Forwarding
rely on router’s knowledge of unicast
shortest path from it to sender
each router has simple forwarding behavior:
if (mcast datagram received on incoming link on
shortest path back to center)
then flood datagram onto all outgoing links
else ignore datagram
CMPT771 Network Layer 34
Reverse Path Forwarding: example
S: source
LEGEND
R1
R4
router with attached
group member
R2
R5
R3
R6
R7
router with no attached
group member
datagram will be
forwarded
datagram will not be
forwarded
• result is a source-specific reverse SPT
– may be a bad choice with asymmetric links
CMPT771 Network Layer 35
Reverse Path Forwarding: pruning
forwarding tree contains subtrees with no mcast group
members
no need to forward datagrams down subtree
“prune” msgs sent upstream by router with no downstream
group members
LEGEND
S: source
R1
router with attached
group member
R4
R2
P
R5
R3
R6
P
R7
P
router with no attached
group member
prune message
links with multicast
forwarding
CMPT771 Network Layer 36
Reverse Path Forwarding:
Multiple trees for multi-sender
S: source
R1
R1
R4
R4
R2
R2
R5
R5
R3
R6
R7
R3
R6
R7
S: source
CMPT771 Network Layer 37
Shared-Tree: General Problem
Minimum Spanning Tree: minimum cost tree
connecting all routers with attached group
members
Algorithms ?
Steiner Tree : minimum cost tree connecting a set
of routers, which includes all that with attached
group members
CMPT771 Network Layer 38
Shared-Tree: General Problem
Minimum Spanning Tree: minimum cost tree
connecting all routers with attached group
members
Prim, Kurskal algorithms
Steiner Tree : minimum cost tree connecting a set
of routers, which includes all that with attached
group members
CMPT771 Network Layer 39
Spanning Tree vs Steiner Tree
CMPT771 Network Layer 40
An example of a minimum Steiner Tree
5
4
K
F
1
3
AG = 6
AE = 5
AD = 3
AC = 3
AI = 8
Shortest Paths from A
total distribution cost = 16
6
Relay Nodes
5
2
3*
1
D
A
E
C
4
I
1
2
5
J
2
B
Mcast group members
4
H
1
G
AG = 8
AE = 5
AD = 3
AC = 3
AI = 8
Steiner Tree paths from A
total distribution cost=13
CMPT771 Network Layer 41
Shared-Tree: General Problem
Minimum Spanning Tree: minimum cost tree connecting all
routers with attached group members
Prim, Kurskal algorithms
Minimum Steiner Tree : minimum cost tree connecting a set
of routers, which includes all that with attached group
members
problem is NP-complete
excellent heuristics exists
CMPT771 Network Layer 42
KMB - A Fast Algorithm for Steiner Trees
(Acta Informatica)
Output a Stiener Tree Th for G and the Z-vertices.
Step
1: Construct a complete directed distance graph G1=(V1,E1,c1) from
G and Z.
Step 2: Find the minimum spanning tree T1 of G1. (pick any to break
ties)
Step3: Construct a subgraph GS of G by replacing each edge in T1 by its
corresponding shortest path in G. (break ties arbitrarily).
Step 4: Find the minimum spanning tree TS of GS (break ties
arbitrarily).
Step 5: Construct a Steiner tree TH from TS by deleting edges in TS if
necessary, so that all the leaves in TH are Steiner points.
Worst case time complexity O(|S||V|2).
Cost no more than 2(1 - 1/l) *optimal cost
where l = number of leaves in the steiner tree.
CMPT771 Network Layer 43
Working of KMB
A
A
4
1
10
H
I
1/2
1/2
G
1
1
1
B
8
1
F
2
C
9
E
2
D
B
D
4
4
4
4
A
1
4
H
C
I
1/2
1/2
1
G
A
4
1
D
B
1
4
B
1
F
2
4
C
E
2
D
C
CMPT771 Network Layer 44
Working of KMB
A
A
1
1
H
1/2
1/2
G
B
1
1
F
2
C
I
I
1
1
E
B
1
1
F
2
2
C
D
E
2
D
Destination Nodes
Intermediate Nodes
CMPT771 Network Layer 45
From Theory to Practice (2)
Does the heuristic algorithm(s) work ?
Demands
Distributed implementation
Support reliable transmission
– link failure should not increase delay or reduce resource availability
Return optimal routes taking into consideration
• price to be paid (bandwidth consumed)
• end to end delay. (no. of links traversed)
Minimize network load.
– Avoid loops
– Avoid traffic concentration on a few links or sub-nets
Minimize the state stored in routers
Asymmetric links
Steiner tree is for undirectional
CMPT771 Network Layer 46
From Theory to Practice (2)
Performance metrics (costs?)
Delay
• End to end delay between source and receiver relative to
the shortest unicast path delay.
Cost
• Cost of total bandwidth consumption
• Cost of tree state info
Traffic Concentration
• Maximum number of flows on a unidirectional link.
• How evenly the routes are distributed.
CMPT771 Network Layer 47
Internet Multicast Service Model
128.59.16.12
128.119.40.186
multicast
group
226.17.30.197
128.34.108.63
128.34.108.60
multicast group concept: use of indirection
hosts addresses IP datagram to multicast group
routers forward multicast datagrams to hosts that have
“joined” that multicast group
Many-to-many communications
CMPT771 Network Layer 48
Multicast groups
class D Internet addresses reserved for multicast:
host group semantics:
o anyone can “join” (receive) multicast group
o anyone can send to multicast group
o no network-layer identification to hosts of members
needed: infrastructure to deliver mcast-addressed datagrams to
all hosts that have joined that multicast group
CMPT771 Network Layer 49
Joining a mcast group: two-step process
local: host informs local mcast router of desire to join group:
IGMP (Internet Group Management Protocol)
wide area: local router interacts with other routers to receive
mcast datagram flow
many protocols (e.g., DVMRP, MOSPF, PIM)
IGMP
IGMP
wide-area
multicast
routing
IGMP
CMPT771 Network Layer 50
IGMP: Internet Group Management Protocol
host: sends IGMP report when application joins mcast group
IP_ADD_MEMBERSHIP socket option
host need not explicitly “unjoin” group when leaving
router: sends IGMP query at regular intervals
host belonging to a mcast group must reply to query
query
report
CMPT771 Network Layer 51
IGMP
IGMP version 1
router: Host Membership
Query msg broadcast on
LAN to all hosts
host: Host Membership
Report msg to indicate
group membership
randomized delay before
responding
implicit leave via no reply
to Query
RFC 1112
IGMP v2: additions include
Leave Group msg
last host replying to Query
can send explicit Leave Group
msg
router performs groupspecific query to see if any
hosts left in group
RFC 2236
IGMP v3: under development as
Internet draft
CMPT771 Network Layer 52
Tunneling
Q: How to connect “islands” of multicast routers in a
“sea” of unicast routers?
physical topology
logical topology
mcast datagram encapsulated inside “normal” (non-multicast-
addressed) datagram
normal IP datagram sent thru “tunnel” via regular IP unicast to
receiving mcast router
receiving mcast router unencapsulates to get mcast datagram
CMPT771 Network Layer 53
Outline
1. Network Service Models
2. Routing Principles
3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 54
Video Multicast
Why attractive ?
Bandwidth efficiency
Multi-receiver nature of video programs
What is the support ?
IP multicast
– network layer
RTP protocol
– transport layer
But, generic interfaces only …
CMPT771 Network Layer 55
Challenges: Best-Effort Service
IP multicast is best-effort only
No quality-of-service guarantee
What if non-adaptive ?
- Unfair to adaptive traffic starve TCP
- Lead to congestion poor user experience
Solutions
Quality-of-service deployment
- IntServ (RSVP), DiffServ, QoS routing
- Long-term objective, will be addressed later
Adaptive Multicast
CMPT771 Network Layer 56
Challenges: Heterogeneity
Receivers
Bandwidth
- 56K Modem, 1.5 ADSL,10M/100M Ethernet …
Computation power
- PDA, Laptop, Workstation …
Sessions
Uneven session populations
CMPT771 Network Layer 57
Not an Atypical Network
CMPT771 Network Layer 58
Challenges: Heterogeneity
Receivers
Bandwidth
- 56K Modem, 1.5 ADSL,10M/100M Ethernet …
Computation power
- PDA, Laptop, Workstation …
Sessions
Uneven session populations
How can a single rate satisfy everyone simultaneous ?
Solution
Multi-rate Multicast
CMPT771 Network Layer 59
Solution 1: Simulcast
Replicated streams
Same video program
Different rates
Each stream for a cluster of receivers
Same access technology
Same bottleneck
1.5 Mbps encoding
28.8 Kbps encoding
CMPT771 Network Layer 60
Solution 1: Simulcast
Pros
Simplicity
Compatibility
Redundancy
Cons
Redundancy
Mismatch
Number of Streams
Single-rate Multicast
Simulcast
Multiple-unicast
CMPT771 Network Layer 61
Solution 2: Layered Multicast
Cumulative layered coding
(Scalable coding)
Base layer
- Most important feature
- Low rate, low quality
Enhancement layers
- Progressively refine the quality
Receiver-driven
Adaptation
Layer Multicast group
Congestion ?
- No: Join group
- Yes: Leave group
CMPT771 Network Layer 62
Solution 3: Proxy-based Video Filtering
Multiple video proxies (agents, gateways,..)
Each proxy for a confined region
Bandwidth monitoring
Video filtering
- Frame dropping, transcoding …
Pros
Fine-grained rate control
Work with no IP multicast
Cons
Expensive
High computation cost
Sender
Receiver
Router
Proxy
CMPT771 Network Layer 63
Active Node (Proxy) vs. Network Node
Application Level
Rate
Controller
Video
Transcoder
RTP Layer
Network Level
Packet
Scheduler
QoS
Monitor
IP Layer
IGMP
RTCP
Upstream
Node
Downstream
Node
UDP Layer
(b)
Packet
Scheduler
IP Layer
IGMP
Network Level
Upstream
Node
(a)
Downstream
Node
CMPT771 Network Layer 64
Taxonomy
Unicast
Video
Transmission
Single-Rate
Simulcasting
Multicast
Cumulative
Multi-Rate
Layering
Noncumulative
Proxy Filtering
CMPT771 Network Layer 65
Example: Receiver-Driven Layered Multicast
Server sends layered encoded stream where each layer is sent
to a separate multicast group
Receivers can control delivered quality by joining proper
number of multicast groups. i.e. controls no of delivered layers
Goal: to accommodate receiver bw heterogeneity!!
Basic idea:
When no cong. => receiver adds a new layer
When cong occurs => receiver drops top layer
CMPT771 Network Layer 66
RLM: example
Layer #
4
3
2
1
Time
CMPT771 Network Layer 67
Issues for adding layers
When?
Sustained performance with little to no loss.
Hysterics problems
When available bandwidth is between the rates of two
different layers.
Probing by new members.
Join latency
Takes a while for multicast joins to occur
CMPT771 Network Layer 68
Issues for dropping layers.
When?
Usually as a reaction to congestion.
Co-located receiver must drop the top layer at
about the same time!
New members probing for appropriate layer will
cause “false” congestion.
Leave latency
Similar to the problem with join latency.
CMPT771 Network Layer 69
Join Experiments
Receivers use “join-experiments” to determine if a
layer should be added.
A join timer is associated with each layer
• Initially, timers are fairly short.
If timer goes off and no congestion is experienced, then
next level joined.
If congestion is experienced, then level is dropped and join
timer is exponentially backed off.
Receiver learns from past experiences
CMPT771 Network Layer 70
RLM Problems (1)
Scaling issues.
Too many participants attempting join-experiments leads
to chronic congestion.
Possible fix?
Coordinate join experiments with others.
Shared learning
Still problematic
• Want to limit shared learning to topologically close
neighbors.
• Complexity of control mechanisms increases.
CMPT771 Network Layer 71
RLM Problems (2)
Bandwidth adaptation
How to allocate among sessions
- Uniform ? But session population is uneven
How to allocate among layers
- Heterogeneous bandwidth demands from receivers
- Non-linear video utility
- Num of layers
Objective: Better Fairness
Intra-session fairness
• Fairly treat all the users
• Ideal: each user receives video at a rate commensurate
with its own bandwidth demand
CMPT771 Network Layer 72
Control Bandwidth Scaling
Case Study: RTCP for Multicast
- To limit traffic, each participant reduces his RTCP traffic as the number
of conference participants increases.
CMPT771 Network Layer 73
RTCP Bandwidth Scaling
RTCP attempts to limit its
traffic to 5% of the
session bandwidth.
Example
Suppose one sender,
sending video at a rate of 2
Mbps. Then RTCP attempts
to limit its traffic to 100
Kbps.
RTCP gives 75% of this
rate to the receivers;
remaining 25% to the
sender
The 75 kbps is equally shared
among receivers:
With R receivers, each
receiver gets to send RTCP
traffic at 75/R kbps.
Sender gets to send RTCP
traffic at 25 kbps.
Participant determines RTCP
packet transmission period by
calculating avg RTCP packet
size (across the entire
session) and dividing by
allocated rate.
CMPT771 Network Layer 74
Randomization in Feedback
Case Study: Reliable Multicast (RM)
how to transfer data “reliably” from source to receivers ?
ACK/NAK ?
Recall
• TCP/UDP vs Multicast
CMPT771 Network Layer 75
Randomized Feedback Suppression
randomly delay NAK generation
“listen” to NAKs generated by others
if no NAK for lost pkt when delay
expires, transmit NAK
tradeoffs:
more complexity (backoff timer), more
pkts (data, NAKs) at rcvs
reduced feedback at sender
what’s the optimal delay distribution?
CMPT771 Network Layer 76
Local Recovery
Allow rcvr to recover lost pkt orthogonal (complementary)
from “nearby” rcvr
to feedback supression
“ask your neighbor”: send who to recover from?
localized NAK (repair
don’t want repair request
request)
to go to everyone
multicast: randomize local
scoping: how to restrict
repair transmission time to
how far request will
avoid too many replies
travel: IP time-to-live
field
CMPT771 Network Layer 77
Local Recovery: example
R2 detects lost pkt
waits random amount of
time, multicasts repair
request
limited scope
not seen by R4
R1 and R3 have pkt
R3 times out first and
multicasts repair, with
limited scope
Other possibilities
Feedback aggregation
CMPT771 Network Layer 78
Outline
1. Network Layer Service Models
2. Routing Principles
3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 79
More about Proxy: Caching
Goal: satisfy client request without involving origin server
user sets browser: Web
accesses via proxy
browser sends all HTTP
requests to proxy
object in cache: cache
returns object
else cache requests object
from origin server, then
returns object to client
origin
server
client
Proxy
server
Proxy: both client and server;
typically installed by ISP
(university, company,
residential ISP)
client
origin
server
CMPT771 Network Layer 80
Caching example (1)
Assumptions
average object size = 100,000
bits
avg. request rate from
institution’s browser to origin
serves = 15/sec
delay from institutional router to
any origin server and back to
router = 2 sec
Consequences
utilization on LAN = 15%
utilization on access link = 100%
total delay = Internet delay + access
delay + LAN delay
= 2 sec + minutes + milliseconds
origin
servers
public
Internet
1.5 Mbps
access link
institutional
network
10 Mbps LAN
institutional
cache
CMPT771 Network Layer 81
Caching example (2)
Possible solution
increase bandwidth of access link
to, say, 10 Mbps
origin
servers
public
Internet
Consequences (if 10 Mbps)
=
utilization on LAN = 15%
utilization on access link = 15%
Total delay = Internet delay + access institutional
delay + LAN delay
network
2 sec + msecs + msecs
often a costly upgrade
10 Mbps
access link
10 Mbps LAN
institutional
cache
CMPT771 Network Layer 82
Caching example (3)
origin
servers
Install cache
suppose hit rate is .4
Consequence
=
40% requests will be satisfied
almost immediately
60% requests satisfied by origin
server
utilization of access link reduced to
60%, resulting in negligible delays
(say 10 msec)
total delay = Internet delay +
access delay + LAN delay
.6*2 sec + .4*.01 secs + milliseconds
< 1.3 secs
public
Internet
1.5 Mbps
access link
institutional
network
10 Mbps LAN
institutional
cache
CMPT771 Network Layer 83
Hierarchical cache
origin
server
How to measure cache ?
Hit Ratio (HR)
Calculation
HR of Proxy 1 = 90%
HR of Proxy 2 = 95%
Independent
client
Joint HR
= 1-(1-0.9)*(1-0.95)*100%
= 99.5%
Proxy
Server 1
Proxy
Server 2
client
CMPT771 Network Layer 84
Why Web caching ?
Reduce response time for client request.
Reduce traffic on an institution’s access link.
Avoid overloading origin server
Internet dense with caches enables “poor” content
providers to effectively deliver content
Enhance data availability (Related: Google)
CMPT771 Network Layer 85
More about Web caching
Why caching is effective for Web, even if
cache space is quite limited ?
Access Probability
Zipf distribution
pj
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
1
2
3
4
5
6
7
(1 / j )
|S |
8
j 1
(1 / j )
9
,
j 1,2,..., | S |,
10
Obejct ID
CMPT771 Network Layer 86
Video Caching
Proxy caching
saving objects at proxies close to clients
Temporal locality
Geographical locality
Skewed access rates
• Zipf distribution, too
• Theta = 0.271
Proxy
Backbone Network
Video Server
Proxy
CMPT771 Network Layer 87
Video objects vs. Web objects
High data rate, yet adaptive
Long playback duration
► Various interactions:
• random access
• early termination
► Huge volume
• one-hour MPEG-1, about 675 MB
Semi-static
Low validation cost (consistency)
CMPT771 Network Layer 88
Revisit: Consistency of Web Objects
client
Inconsistency
HTTP request msg
out-of-date objects…
If-modified-since:
News, stock websites
<date>
Solution: Conditional GET
HTTP response
server
object
not
modified
HTTP/1.0
304 Not Modified
HTTP request msg
If-modified-since:
<date>
HTTP response
object
modified
HTTP/1.0 200 OK
<data>
CMPT771 Network Layer 89
Solution: Partial Video Caching
Consistency is not a big issue for video caching
But data volume is …
Client
Control Channel
(RTSP, RTCP)
Media Proxy
Control Channel
(RTSP,RTCP)
Proxy Manager
Enterprise
Network
Buffer
Data Channel
(RTP)
Assembler/Switcher
Player
Server
Scheduler
Backbone
Network
Cache
Media
Repository
Data Channel
(RTP)
CMPT771 Network Layer 90
Generalizing Proxy Functionalities (1):
Content Distribution Networks
Content replication
CDN company installs hundreds
of CDN servers throughout
Internet
in lower-tier ISPs, close to
users
CDN replicates its customers’
content in CDN servers. When
provider updates content, CDN
updates servers
origin server
in North America
CDN distribution node
CDN server
in S. America CDN server
in Europe
CDN server
in Asia
CMPT771 Network Layer 91
CDN example
HTTP request for
www.foo.com/sports/sports.html
Origin server
1
2
3
DNS query for www.cdn.com
CDNs authoritative
DNS server
HTTP request for
www.cdn.com/www.foo.com/sports/ruth.gif
origin server
www.foo.com
distributes HTML
Nearby
CDN server
Replaces:
http://www.foo.com/sports.ruth.gif
with
CDN company
cdn.com
distributes gif files
uses its authoritative DNS
server to route redirect
requests (CNAME)
http://www.cdn.com/www.foo.com/sports/ruth.gif
CMPT771 Network Layer 92
More about CDNs
routing requests
CDN creates a “map”, indicating distances from leaf
ISPs and CDN nodes
when query arrives at authoritative DNS server:
server determines ISP from which query originates
uses “map” to determine best CDN server
Caching vs. CDN
Pull: passive
Push: active
CMPT771 Network Layer 93
Outline
1. Network Layer Service Models
2. Routing Principles
3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 94
Generalizing Proxy Functionalities (2):
Peer-to-Peer Paradigm
A peer is both client and server
Recall Client/Server paradigm
Client:
application
transport
network
data link
physical
request
initiates contact with server
(“speaks first”)
typically requests service from
server,
Web: client implemented in
browser; e-mail: in mail reader
Server:
provides requested service to client
reply
application
transport
network
data link
physical
e.g., Web server sends requested Web
page, mail server delivers e-mail
CMPT771 Network Layer 95
Peer-to-Peer Communications
Example
Alice runs P2P client
application on her notebook
computer
Intermittently connects to
Internet; gets new IP
address for each
connection
Asks for “X.mp3”
Application displays other
peers that have copy of
X.mp3.
Alice chooses one of the
peers, Bob.
File is copied from Bob’s PC
to Alice’s notebook: HTTP
While Alice downloads,
other users uploading from
Alice.
Alice’s peer is both a Web
client and a transient Web
server.
All peers are servers = highly
scalable!
CMPT771 Network Layer 96
P2P: centralized directory
For file sharing
centralized
Key issue: File searching directory
server
- which peer has the file ?
original “Napster” design
1) when peer connects, it informs
central server:
IP address
content
2) Alice queries for “X.mp3”
3) Alice requests file from Bob
Bob
1
peers
1
3
1
2
1
Alice
CMPT771 Network Layer 97
P2P: problems with centralized directory
Single point of failure
Performance bottleneck
Copyright infringement
file transfer is
decentralized, but
locating content is highly
decentralized
CMPT771 Network Layer 98
P2P: decentralized directory
Each peer is either a group
leader or assigned to a
group leader.
Group leader tracks the
content in all its children.
Peer queries group leader;
group leader may query
other group leaders.
ordinary peer
group-leader peer
neighoring relationships
in overlay network
CMPT771 Network Layer 99
More about decentralized directory
advantages of approach
no centralized directory server
location service distributed over peers
more difficult to shut down
disadvantages of approach
bootstrap node needed
group leaders can get overloaded
CMPT771 Network Layer 100
P2P: Query flooding
Gnutella
no hierarchy
use bootstrap node to learn
about others
join message
Send query to neighbors
Neighbors forward query
If queried peer has object, it
sends message back to
querying peer
join
CMPT771 Network Layer 101
P2P: more on query flooding
Pros
peers have similar
responsibilities: no group
leaders
highly decentralized
no peer maintains directory
info
Cons
excessive query traffic
query radius: may not have
content when present
bootstrap node
maintenance of overlay
network
CMPT771 Network Layer 102
DHT: A New Story…
In 2000-2001, academic researchers said
“we want to play too!”
Motivation:
Frustrated by popularity of all these “half-baked” P2P apps
We can do better!
Guaranteed lookup success for files in system
Provable bounds on search time
Provable scalability to millions of node
Hot Topic in networking ever since
No practical use so far (?)
CMPT771 Network Layer 103
P2P: Content Addressing (Hash Routing)
Hash routing
Given an object identifier I, calculate its hash value
H=hash(I), and (hopefully) find it (or its location info) in
peer H
Not a new idea
Load balancing – hash IP address, re-direct to different
servers
application
put(key, data)
node
get (key)
hash table
node
….
data
node
CMPT771 Network Layer 104
Hash Routing
Two alternatives
Node can cache each (existing) object that hashes within
its range
Pointer-based: level of indirection - node caches pointer to
location(s) of object
1000-1999
What’s new in P2P?
Dynamic overlay
9000-9500
4500-6999
8000-8999
7000-8500
0-999
• peer join/leave
• number of peers is not fixed
1500-4999
9500-9999
Traditional hash function doesn’t work
• SHA-1
CMPT771 Network Layer 105
Distributed Hash Table (DHT)
Challenges
For each object, node(s) whose range(s) cover that
object must be reachable via a “short” path
# neighbors for each node should scale well (e.g., should
not be O(N))
Fully distributed (no centralized bottleneck/single point
of failure)
DHT mechanism should gracefully handle nodes
joining/leaving
need to repartition the range space over existing nodes
need to reorganize neighbor set
need bootstrap mechanism to connect new nodes into the
existing DHT infrastructure
CMPT771 Network Layer 106
Case Studies
Structure overlay (p2p) systems
Chord
CAN (Content Addressable Network)
Key Questions
Q1: How is hash space divided “evenly” among existing
nodes?
Q2: How is routing implemented that connects an
arbitrary node to the node responsible for a given
object?
Q3: How is the hash space repartitioned when nodes
join/leave?
Let N be the number of nodes in the overlay
Let H be the size of the range of the hash
function (when applicable)
CMPT771 Network Layer 107
Chord
Associate to each node and file a unique id in an uni-
dimensional space (a Ring)
E.g., pick from the range [0...2m]
Usually the hash of the file or IP address
Properties:
Routing table size is O(log N) , where N is the total number of
nodes
Guarantees that a file is found in O(log N) hops
from MIT in 2001
CMPT771 Network Layer 108
Consistent Hashing
Key 5
Node 105
K5
N105
K20
Circular ID space
N32
N90
K80
A key is stored at its successor: node with next higher ID
CMPT771 Network Layer 109
Chord Basic Lookup
N120
N10
N105
“N90 has K80”
“Where is key 80?”
N32
K80 N90
N60
CMPT771 Network Layer 110
Chord “Finger Table”
1/4
1/2
1/8
1/16
1/32
1/64
1/128
N80
Entry i in the finger table of node n is the first node that succeeds or
equals n + 2i
In other words, the ith finger points 1/2n-i way around the ring
CMPT771 Network Layer 111
Chord Join
Assume an identifier space [0..8]
Node n1 joins
Succ. Table
i id+2i succ
0 2
1
1 3
1
2 5
1
0
1
7
6
2
5
3
4
CMPT771 Network Layer 112
Chord Join
Node n2 joins
Succ. Table
i id+2i succ
0 2
2
1 3
1
2 5
1
0
1
7
6
2
Succ. Table
5
3
4
i id+2i succ
0 3
1
1 4
1
2 6
1
CMPT771 Network Layer 113
Chord Join
Succ. Table
i id+2i succ
0 1
1
1 2
2
2 4
6
Nodes n0, n6 join
Succ. Table
i id+2i succ
0 2
2
1 3
6
2 5
6
0
1
7
Succ. Table
i id+2i succ
0 7
0
1 0
0
2 2
2
6
2
Succ. Table
5
3
4
i id+2i succ
0 3
6
1 4
6
2 6
6
CMPT771 Network Layer 114
Chord Join
Succ. Table
i
Nodes:
i id+2
0 1
1 2
2 4
n1, n2, n0, n6
Items:
f7, f2
Items
7
succ
1
2
6
0
1
7
Succ. Table
i id+2i succ
0 7
0
1 0
0
2 2
2
Succ. Table
6
i
i id+2
0 2
1 3
2 5
Items
succ 1
2
6
6
2
Succ. Table
5
3
4
i id+2i succ
0 3
6
1 4
6
2 6
6
CMPT771 Network Layer 115
Chord Routing
Succ. Table
i
i id+2
0 1
1 2
2 4
Upon receiving a query for item
id, a node:
Checks whether stores the item
locally
If not, forwards the query to
the largest node in its successor
table that does not exceed id
Items
7
succ
1
2
6
0
Succ. Table
1
7
i
i id+2
0 2
1 3
2 5
query(7)
Succ. Table
i id+2i succ
0 7
0
1 0
0
2 2
2
6
Items
succ 1
2
6
6
2
Succ. Table
5
3
4
i id+2i succ
0 3
6
1 4
6
2 6
6
CMPT771 Network Layer 116
Chord Summary
Routing table size?
Log N fingers
Routing time?
Each hop expects to 1/2 the distance to the desired id =>
expect O(log N) hops.
CMPT771 Network Layer 117
CAN (Content Addressable Network)
Hyper-cube space
hash value is viewed as a point
in a D-dimensional cartesian
space
each node responsible for a
D-dimensional “cube” in the
space
The space is covered by all
the nodes
Example:
Initial node n1:(1, 2)
7
6
5
4
3
n1
2
1
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 118
CAN Illustration: 2-D
node n2:(4, 2) joins
7
6
5
4
3
n2
n1
2
1
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 119
CAN Illustration: 2-D
node n3:(3, 5) joins。
7
6
n3
5
4
3
n2
n1
2
1
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 120
CAN Illustration: 2-D
node n4:(5, 5) and
n5:(6,6) join
7
6
n5
n4
n3
5
4
3
n2
n1
2
1
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 121
CAN Illustration: 2-D
nodes: n1:(1, 2); n2:(4,2);
n3:(3, 5); n4:(5,5);n5:(6,6)
Data (key): f1:(2,3); f2:(5,0);
f3:(2,1); f4:(7,5)
7
6
n5
n4
n3
5
f4
4
f1
3
n2
n1
2
f3
1
f2
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 122
CAN Illustration: 2-D
Association
7
6
n5
n4
n3
5
f4
4
f1
3
n2
n1
2
f3
1
f2
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 123
CAN Illustration: 2-D
A node knows its neighbors
Forward query to the
neighbor that is nearest to
key
Example: n1 want to find f4
Average length (d/4)(n1/d),
and each node maintains
O(2d) neighboring info
Multiple paths -> reliable
7
6
n5
n4
n3
5
f4
4
f1
3
n2
n1
2
f3
1
f2
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 124
DHT: Discussion
Pros:
Guaranteed Lookup
O(log N) per node state and search scope
Cons:
No one uses them?
Supporting non-exact match search is hard
Topology-awareness ?
CMPT771 Network Layer 125
Outline
1. Network Layer Service Models
2. Routing Principles
Link state routing
Distance vector routing
Hierarchical routing
3. Multicast Routing
4. Video Multicast
5. Proxy Caching
6. Peer-to-Peer
7. Internet QoS
Scheduling and policing
IntServ/DiffServ
CMPT771 Network Layer 126
Beyond Best Effort
Internet service model
Best-effort (or least-effort)
Guarantee only one thing (sometimes even cannot)
-- delivery a packet to destination
Thus far: “making the best of best effort”
TCP
Multicast
Optimal adaptation
Proxy filtering/caching
Content distribution (replication)
P2P …
But, many things cannot be guaranteed in transport/application
layer if network layer does not guarantee them
Delay, bandwidth – important to media applications
CMPT771 Network Layer 127
Improving QOS in IP Networks
Future: next generation Internet with QoS guarantees
RSVP: signaling for resource reservations
Differentiated Services: differential guarantees
Integrated Services: firm guarantees
simple model
for sharing and
congestion
studies:
CMPT771 Network Layer 128
Principles for QOS Guarantees
Example: 1Mbps IP phone, FTP share 1.5 Mbps link.
bursts of FTP can congest router, cause audio loss
want to give priority to audio over FTP
Principle 1
packet marking needed for router to distinguish
between different classes; and new router policy
to treat packets accordingly
CMPT771 Network Layer 129
Principles for QOS Guarantees (more)
what if applications misbehave (audio sends higher
than declared rate)
policing: force source adherence to bandwidth allocations
marking and policing at network edge:
similar to ATM UNI (User Network Interface)
Principle 2
provide protection (isolation) for one class from others
CMPT771 Network Layer 130
Principles for QOS Guarantees (more)
Allocating fixed (non-sharable) bandwidth to flow:
inefficient use of bandwidth if flows doesn’t use
its allocation
Principle 3
While providing isolation, it is desirable to use
resources as efficiently as possible
CMPT771 Network Layer 131
Principles for QOS Guarantees (more)
Basic fact of life: can not support traffic demands
beyond link capacity
Principle 4
Call Admission: flow declares its needs, network may
block call (e.g., busy signal) if it cannot meet needs
CMPT771 Network Layer 132
Summary of QoS Principles
Let’s next look at mechanisms for achieving this ….
CMPT771 Network Layer 133
Scheduling Polices (1)
scheduling: choose next packet to send on link
FIFO (first in first out) scheduling: send in order of
arrival to queue
discard policy: if packet arrives to full queue: who to discard?
• Tail drop: drop arriving packet
• priority: drop/remove on priority basis
• random: drop/remove randomly
CMPT771 Network Layer 134
Work-Conserving vs. Non-Work-Conserving
Work conserving
Idle only if queue(s) is empty
Conservation law
å
r iqi = C
Utility*mean waiting time
Do we need non-work-conserving ?
Jitter
Buffer
?
CMPT771 Network Layer 135
Scheduling Policies (2)
Priority scheduling: transmit highest priority queued
packet
multiple classes, with different priorities
class may depend on marking or other header info, e.g. IP
source/dest, port numbers, etc..
CMPT771 Network Layer 136
Scheduling Policies (3)
round robin scheduling:
multiple classes
cyclically scan class queues, serving one from each
class (if available)
CMPT771 Network Layer 137
Weighted round robin
Normalize the weights so that they become
integer
WA=1.4 WB=0.2 WC=0.8
Normalize
WA=7
WB=1
WC=4
…
round length = 13
CMPT771 Network Layer 138
Weighted RR –
variable length packet
If different connection have different packet size, then
divide the weight of each connection with connection’s
mean packet size before normalizing
weights {0.5, 0.75, 1.0},
mean packet sizes {50, 500, 1500}
normalize weights: {0.5/50, 0.75/500, 1.0/1500} = { 0.01, 0.0015,
0.000666},
normalize again {60, 9, 4}
CMPT771 Network Layer 139
Scheduling Policies (4)
Generalized Processor Sharing
Round-robin, sounds good
Is it fair ? Only if packets are of same size
Does it guarantee bandwidth (in a small time scale) ?
Fairness, protection.
GPS can provide fairness and protection
Visit each non-empty queue in turn, serve infinitesimal data
(1 bit)
GPS is not implementable; we can serve only packets
CMPT771 Network Layer 140
Weighted Fair Queueing (WFQ)
Deals better with variable size packets and
weights
Also known as packet-by-packet GPS (PGPS)
Find finish time of a packet, had we been
doing GPS; serve packets in order of their
finish times
CMPT771 Network Layer 141
WFQ details
Uses notions of round number and finish number
Suppose, in each round, the server served one
bit from each active connection
Round number is the number of rounds already
completed – a virtual time
can be fractional
depends on the number of active connections
CMPT771 Network Layer 142
WFQ – finish number
If a packet of length p arrives to an empty
queue when the round number is R, it will
complete service when the round number is R
+ p => finish number is R + p
independent of the number of other connections.
CMPT771 Network Layer 143
WFQ – service order
If a packet arrives to a queue, and the
previous packet has/had a finish number of f,
then the packet’s finish number is f+p
Serve packets in order of finish numbers
Finish time of a packet is not the same as the
finish number
CMPT771 Network Layer 144
Active connections
A connection is active if the last packet served from
it, or in its queue, has a finish number greater than
the current round number
finish number of packet of length p
if arriving to active connection = previous finish number + p
if to inactive connection = R + p
CMPT771 Network Layer 145
virtual time
WFQ Example
4
3.5
3
2.5
2
1.5
1
0.5
0
A
1
B
1/3
C
1/2
1
2
3
A
F1=1
B
F1=2
L=2
C
F1=2
L=2
4
5
real time
F2=3.5
L=1
fB
r=1
fC
fA = fB = fC = 1
1/3
0
fA
6
L=2
7
8
t=0: Packets of sizes
1,2,2 arrive at
connections A, B, C.
t=4: Packet of size 2
arrives at connection A
CMPT771 Network Layer 146
Example (contd.)
At time 0, slope of 1/3,
Finish number of A = 1, Finish number of B, C = 2
At time 3, connection A become inactive, slope
becomes 1/2
At time 4,
second packet at A gets finish number 2 + 1.5 = 3.5, Slope
decreases to 1/3
CMPT771 Network Layer 147
Example (contd.)
At time 5.5,
round number becomes 2 and connection B and C become
inactive, Slope becomes 1
At time 7,
round number becomes 3.5 and A becomes inactive.
CMPT771 Network Layer 148
WFQ implementation
On packet arrival:
first updates round number
compute finish number for the packet
insert in priority queue sorted by finish numbers, so that it is served
according to the finish number
if no space, drop the packet with largest finish number
On service completion
After servicing one packet, select the packet with the lowest finish
number
CMPT771 Network Layer 149
Computing the round number
re-computes round number
on each packet arrival
At any re-computation, the
number of connections can
go up at most by one, but can
go down to zero, leading to
change in round number
Solution: use previous count
of connections to compute
round number; if this makes
some connection inactive, recompute; repeat until no
connections become inactive
virtual time
Iterated deletion:
4
3.5
3
2.5
2
1.5
1
0.5
0
1
1/3
1/2
1/3
0
1
2
3
A
F1=1
B
F1=2
4/3L=2
C
F1=2
L=2
4
5
real time
F2=3.5
L=1
6
7
8
L=2
CMPT771 Network Layer 150
WFQ Evaluation
Let Pmax(i) be the largest possible packet sent on flow i, and Pmax
be the largest possible packet in the network
Let G(i, τ, t) to be the service received by flow i, using GPS, in the
interval (τ, t), and g(i) to be the smallest rate allocated to flow i
along it’s path
It can be shown that WFQ can lag GPS by at most Pmax/g(i) that is,
G(i, τ, t) /g(i) - S(i, τ, t) /g(i) <= Pmax/g(i)
G(i, τ, t) /g(i) - S(i, τ, t) /g(i) is defined as Absolute Fairness Bound
(ABS) of a scheduling policy with respect to GPS.
CMPT771 Network Layer 151
WFQ Evaluation (Contd..)
Protection
The rate provided to each flow is not affected by the sending
rate of other flows.
This firewalling property is desirable by public networks.
Incentive to intelligent flow controller
no need to send slower than allocated rate
if sent more then likely packet drop
Requires per connection scheduler state
Computing round numbers with iterated deletion is costly
Bound on end to end delay (?)
CMPT771 Network Layer 152
Policing Mechanisms
Goal: limit traffic to not exceed declared parameters
Three common-used criteria:
(Long term) Average Rate: how many pkts can be sent
per unit time (in the long run)
crucial question: what is the interval length: 100 packets per
sec or 6000 packets per min have same average!
Peak Rate: e.g., 6000 pkts per min. (ppm) avg.; 1500
ppm peak rate
(Max.) Burst Size: max. number of pkts sent
consecutively (with no intervening idle)
CMPT771 Network Layer 153
Policing Mechanisms
Token Bucket: limit input to specified Burst Size
and Average Rate.
bucket can hold b tokens
tokens generated at rate r token/sec unless bucket
full
over interval of length t: number of packets
admitted less than or equal to (r t + b).
CMPT771 Network Layer 154
Policing Mechanisms (more)
token bucket, WFQ combine to provide guaranteed
upper bound on delay, i.e., QoS guarantee!
arriving
traffic
token rate, r
bucket size, b
WFQ
per-flow
rate, R
D = b/R
max
CMPT771 Network Layer 155
IETF Integrated Services
architecture for providing QOS guarantees in IP
networks for individual application sessions
resource reservation: routers maintain state info
(a la VC) of allocated resources, QoS req’s
admit/deny new call setup requests:
Question: can newly arriving flow be admitted
with performance guarantees while not violated
QoS guarantees made to already admitted flows?
CMPT771 Network Layer 156
Intserv: QoS guarantee scenario
Resource reservation
call setup, signaling (RSVP)
traffic, QoS declaration
per-element admission control
request/
reply
QoS-sensitive
scheduling (e.g.,
WFQ)
CMPT771 Network Layer 157
Call Admission
Arriving session must :
declare its QOS requirement
R-spec: defines the QOS being requested
characterize traffic it will send into network
T-spec: defines traffic characteristics
signaling protocol: needed to carry R-spec and Tspec to routers (where reservation is required)
RSVP
CMPT771 Network Layer 158
Signaling in the Internet
connectionless
(stateless)
forwarding by IP
routers
+
best effort
service
=
no network
signaling protocols
in initial IP
design
New requirement: reserve resources along end-to-end path (end
system, routers) for QoS for multimedia applications
RSVP: Resource Reservation Protocol [RFC 2205]
“ … allow users to communicate requirements to network in
robust and efficient way.” i.e., signaling !
earlier Internet Signaling protocol: ST-II [RFC 1819]
CMPT771 Network Layer 159
IETF Differentiated Services
Concerns with Intserv:
Scalability: signaling, maintaining per-flow router
state difficult with large number of flows
Flexible Service Models: Intserv has only two
classes. Also want “qualitative” service classes
“behaves like a wire”
relative service distinction: Platinum, Gold, Silver
Diffserv approach:
simple functions in network core, relatively
complex functions at edge routers (or hosts)
Do’t define define service classes, provide
functional components to build service classes
CMPT771 Network Layer 160
Diffserv Architecture
Edge router:
r
- per-flow traffic management
- marks packets as in-profile
and out-profile
b
marking
scheduling
..
.
Core router:
- per class traffic management
- buffering and scheduling
based on marking at edge
- preference given to in-profile
packets
- Assured Forwarding
CMPT771 Network Layer 161
Edge-router Packet Marking
profile: pre-negotiated rate A, bucket size B
packet marking at edge based on per-flow profile
Rate A
B
User packets
Possible usage of marking:
class-based marking: packets of different classes marked differently
intra-class marking: conforming portion of flow marked differently than
non-conforming one
CMPT771 Network Layer 162
Classification and Conditioning
Packet is marked in the Type of Service (TOS) in
IPv4, and Traffic Class in IPv6
6 bits used for Differentiated Service Code Point
(DSCP) and determine PHB that the packet will
receive
2 bits are currently unused
CMPT771 Network Layer 163
Classification and Conditioning
may be desirable to limit traffic injection rate of
some class:
user declares traffic profile (e.g., rate, burst size)
traffic metered, shaped if non-conforming
CMPT771 Network Layer 164
Forwarding (PHB)
PHB result in a different observable (measurable)
forwarding performance behavior
PHB does not specify what mechanisms to use to
ensure required PHB performance behavior
Examples:
Class A gets x% of outgoing link bandwidth over time
intervals of a specified length
Class A packets leave first before packets from class B
CMPT771 Network Layer 165
Randomization in Router Queue Management
normally, packets dropped only when queue overflows
“Drop-tail” queueing
P6 P5 P4 P3 P2 P1
ISP
router
Internet
FCFS
Scheduler
ISP
router
CMPT771 Network Layer 166
The case against drop-tail queue management
P6 P5 P4 P3 P2 P1
FCFS
Scheduler
large queues in routers are “a bad thing”
End-to-end latency dominated by length of queues at
switches in network
allowing queues to overflow is “a bad thing”
connections transmitting at high rates can starve
connections transmitting at low rates
connections can synchronize their response to congestion
CMPT771 Network Layer 167
Idea: early random packet drop
P6 P5 P4 P3 P2 P1
FCFS
Scheduler
When queue length exceeds threshold, packets
dropped with fixed probability
probabilistic packet drop: flows see same loss rate
problem: bursty traffic (burst arrives when queue is near
full) can be overpenalized
CMPT771 Network Layer 168
Random early detection (RED) packet drop
Average queue length
Max
queue length
Drop probability
Forced drop
Max
threshold
Probabilistic
early drop
Min
threshold
No drop
Time
use exponential average of queue length to determine when to
drop
avoid overly penalizing short-term bursts
React to longer term trends
tie drop prob. to weighted avg. queue length
avoids over-reaction to mild overload conditions
CMPT771 Network Layer 169
Random early detection (RED) packet drop
Average queue length
Max
queue length
Drop probability
Forced drop
Max
threshold
Probabilistic
early drop
Min
threshold
No drop
Time
Drop probability
100%
maxp
min
max
Weighted Average
Queue Length
CMPT771 Network Layer 170
Random early detection (RED) packet drop
large number (5) of parameters: difficult to tune
(at least for http traffic)
gains over drop-tail FCFS not that significant
still not widely deployed …
CMPT771 Network Layer 171
RED: why probabilistic drop?
provide gentle transition from no-drop to all-drop
provide “gentle” early warning
provide same loss rate to all sessions:
with tail-drop, low-sending-rate sessions can be
completely starved
avoid synchronized loss bursts among sources
avoid cycles of large-loss followed by no-transmission
WRED (Cisco)
CMPT771 Network Layer 172