Network Layer Issues - SFU computer science

Download Report

Transcript Network Layer Issues - SFU computer science

CMPT 771 Internet Architecture and
Protocols
Network Layer Issues
and Applications
CMPT771 Network Layer 1
Network Layer Issues and Applications
Contents
1. Network Service Models
2. Routing Principles



3.
4.
5.
6.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 2
Network layer functions
 deliver packets from sending to
receiving hosts
 network layer protocols in every
host, router
three important functions:
 path determination: route taken
by packets from source to dest.
Routing algorithms
 forwarding: move packets from
router’s input to appropriate
router output
 call setup: some network
architectures require router call
setup along path before data
flows
application
transport
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
network
data link
physical
application
transport
network
data link
physical
CMPT771 Network Layer 3
Network service model
Q: What service model for
“channel” transporting
packets from sender to
receiver?
 guaranteed bandwidth?
 preservation of inter-packet
timing (no jitter)?
 loss-free delivery?
 in-order delivery?
 congestion feedback to sender?
The most important
abstraction provided
by network layer:
? ?
?
virtual circuit
or
datagram?
CMPT771 Network Layer 4
Virtual circuits
“source-to-dest path behaves much like telephone circuit”


performance-wise
network actions along source-to-dest path
application
transport 5. Data flow begins
network 4. Call connected
data link 1. Initiate call
physical
6. Receive data application
3. Accept call transport
2. incoming call network
data link
physical
CMPT771 Network Layer 5
Virtual circuits
 call setup, teardown for each call before data can flow
 each packet carries VC identifier (not destination host ID)
 every router on source-dest path maintains “state” for each
passing connection

transport-layer connection only involved two end systems
 link, router resources (bandwidth, buffers) may be allocated to
VC

to get circuit-like perf.
 used to setup, maintain teardown VC
 used in ATM, frame-relay, X.25
 not used in today’s Internet
CMPT771 Network Layer 6
Datagram networks: Internet’s model
 no call setup at network layer
 routers: no state about end-to-end connections

no network-level concept of “connection”
 Forwarded: using destination host address

packets between same source-dest pair may take different
paths
application
transport
network
data link 1. Send data
physical
application
transport
2. Receive data network
data link
physical
CMPT771 Network Layer 7
Datagram or VC network: why?
Asynchronous Transfer Mode
- ATM (VC)
 evolved from telephony
 human conversation:
strict timing, reliability
requirements
 need for guaranteed
service
 “dumb” end systems
 telephones
 complexity inside network

Internet (Datagram)
 data exchange among computers
“elastic” service, no strict
timing req.
 “smart” end systems
(computers)
 can adapt, perform control,
error recovery
 simple inside network,
complexity at “edge”
 heterogeneous link types
 different characteristics
 uniform service difficult

Hard state vs Soft state
CMPT771 Network Layer 8
Outline
1. Network Service Models
2. Routing Principles



3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 9
Routing
Routing protocol
5
Goal: determine a “good” path
(sequence of routers) thru
network from source to dest.
Graph abstraction for routing
algorithms:
 graph nodes are routers
 graph edges are physical
links

link cost: delay, $ cost, or
congestion level
2
A
B
2
1
D
3
C
3
1
5
F
1
E
2
 “good” path:


typically means minimum
cost path
other def’s possible
CMPT771 Network Layer 10
Routing Algorithm classification
Global or decentralized
information?
Global:
 all routers have complete
topology, link cost info
 “link state” algorithms
Decentralized:
 router knows physicallyconnected neighbors, link costs
to neighbors
 iterative process of
computation, exchange of info
with neighbors
 “distance vector” algorithms
Static or dynamic?
Static:
 routes change slowly over
time
Dynamic:
 routes change more quickly
 periodic update
 in response to link cost
changes
CMPT771 Network Layer 11
Link-State Routing Algorithm
Dijkstra’s algorithm
 net topology, link costs known
to all nodes
 accomplished via “link
state broadcast”
 all nodes have same info
 computes least cost paths
from one node (‘source”) to all
other nodes
 gives routing table for
that node
Notation:
 c(i,j): link cost from node i to
j. cost infinite if not direct
neighbors
 D(v): current value of cost of
path from source to dest V
 p(v): predecessor node along
path from source to v, that is
next v
 N: set of nodes whose least
cost path definitively known
CMPT771 Network Layer 12
Distance Vector Routing Algorithm
Key Idea
 Given my distance to a neighboring
node
 Given the distances from the
neighboring nodes to remote nodes
 My distances to remote nodes
iterative:
 continues until no nodes
exchange info.
 self-terminating: no “signal”
to stop
asynchronous:
 nodes need not exchange
info/iterate in lock step!
distributed:
 each node communicates only
with directly-attached
neighbors
CMPT771 Network Layer 13
Distance Vector Routing Algorithm
Distance Table data structure
 each node has its own
 row for each possible destination
 column for each directly-attached
neighbor to node
 example: in node X, for dest. Y via
neighbor Z:
X
D (Y,Z)
via
X
D ()
Y
Z
Y
1
2
Z
7
5
distance from X to
= Y, via Z as next hop
Z
= c(X,Z) + minw{D (Y,w)}
CMPT771 Network Layer 14
Distance Vector Routing: overview
Iterative, asynchronous: each
local iteration caused by:
 message from neighbor: its
least cost path change from
neighbor
Distributed:
 each node notifies neighbors
only when its least cost path
to any destination changes

neighbors then notify their
neighbors if necessary
Each node:
wait for (msg from neighbor)
recompute distance table
if least cost path to any dest
has changed, notify
neighbors
CMPT771 Network Layer 15
Distance Vector: Count-to-Infinity Problem
3
A
1
1
B
1
C
CMPT771 Network Layer 16
Comparison of LS and DV algorithms
Message complexity
 LS: with n nodes, E links, O(nE)
msgs sent each
 DV: exchange between
neighbors only
 convergence time varies
Speed of Convergence
 LS: O(n2) algorithm requires
O(nE) msgs
 may have oscillations
 DV: convergence time varies
 may be routing loops
 count-to-infinity problem
Robustness: what happens if
router malfunctions?
LS:


node can advertise incorrect
link cost
each node computes only its
own table
DV:


DV node can advertise
incorrect path cost
each node’s table used by
others
• error propagate thru
network
CMPT771 Network Layer 17
Routing in the Internet
 The Global Internet consists of Autonomous Systems (AS)
interconnected with each other:



Stub AS: small corporation: one connection to other AS’s
Multihomed AS: large corporation (no transit): multiple connections
to other AS’s
Transit AS: provider, hooking many AS’s together
 Two-level routing:


Intra-AS: administrator responsible for choice of routing
algorithm within network
Inter-AS: unique standard for inter-AS routing: BGP
CMPT771 Network Layer 18
Internet AS Hierarchy
Intra-AS border (exterior gateway) routers
Inter-AS interior (gateway) routers
CMPT771 Network Layer 19
Topology of Tier-1 ISP
CMPT771 Network Layer 20
Redundancy in AS Routes
CMPT771 Network Layer 21
Intra-AS Routing
 Also known as Interior Gateway Protocols (IGP)
 Most common Intra-AS routing protocols:

RIP: Routing Information Protocol

OSPF: Open Shortest Path First

IGRP: Interior Gateway Routing Protocol (Cisco
proprietary)
CMPT771 Network Layer 22
Internet inter-AS routing: BGP
 BGP (Border Gateway Protocol): the de facto standard
 Path Vector protocol:




similar to Distance Vector protocol
each Border Gateway broadcast to neighbors (peers)
entire path (i.e., sequence of AS’s) to destination
BGP routes to networks (ASs), not individual hosts
E.g., Gateway X may send its path to dest. Z:
Path (X,Z) = X,Y1,Y2,Y3,…,Z
CMPT771 Network Layer 23
Why different Intra-/Inter-AS routing ?
C. Huitema, Routing on the Internet. Prentice-Hall,
2000.
Telus
Shaw
Bell
CMPT771 Network Layer 24
Outline
1. Network Service Models
2. Routing Principles



3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 25
Multicast: one sender to many receivers
 Multicast: act of sending datagram to multiple receivers with
single “transmit” operation
 analogy: one teacher to many students
 Question: how to achieve multicast (which layer ?)
Multicast via unicast
 source sends N unicast
datagrams, one
addressed to each of N
receivers
routers
forward unicast
datagrams
multicast receiver (red)
not a multicast receiver (red)
CMPT771 Network Layer 26
Multicast: one sender to many receivers
 Multicast: act of sending datagram to multiple receivers with
single “transmit” operation
 analogy: one teacher to many students
 Question: how to achieve multicast
Multicast via unicast:
Problems
1. Inefficient
2. Addressing
routers
forward unicast
datagrams
multicast receiver (red)
not a multicast receiver (red)
CMPT771 Network Layer 27
Multicast: one sender to many receivers
 Multicast: act of sending datagram to multiple receivers with
single “transmit” operation
 analogy: one teacher to many students
 Question: how to achieve multicast
Network multicast
 Router actively participate in
multicast, making copies of
packets as needed and
forwarding towards multicast
receivers
Multicast
routers (red) duplicate and
forward multicast datagrams
CMPT771 Network Layer 28
Multicast: one sender to many receivers
 Multicast: act of sending datagram to multiple receivers with
single “transmit” operation
 analogy: one teacher to many students
 Question: how to achieve multicast
Application-layer multicast
 end systems involved in
multicast copy and forward
unicast datagrams among
themselves
CMPT771 Network Layer 29
Multicast Routing: Problem Statement
 Goal: find a tree (or trees) connecting routers having
local mcast group members



tree: not all paths between routers used
source-based: different tree from each sender to rcvrs
shared-tree: same tree used by all group members (senders)
Shared tree
Source-based trees
CMPT771 Network Layer 30
Approaches for building mcast trees
Approaches:
 source-based tree: one tree per source


shortest path trees
reverse path forwarding
 group-shared tree: group uses one tree
 minimal spanning (Steiner)
 center-based trees
…we first look at basic approaches, then specific
protocols adopting these approaches
CMPT771 Network Layer 31
Source based Tree:
Shortest Path Tree
 mcast forwarding tree: tree of shortest path
routes from source to all receivers

Dijkstra’s algorithm
S: source
LEGEND
R1
1
2
R4
R2
3
R3
router with attached
group member
5
4
R6
router with no attached
group member
R5
6
R7
i
link used for forwarding,
i indicates order link
added by algorithm
CMPT771 Network Layer 32
Source based Tree:
Flooding
S: source
R1
R4
R2
R5
R3
R6
R7
• Problem: Broadcast storm
CMPT771 Network Layer 33
Source based Tree:
Reverse Path Forwarding
 rely on router’s knowledge of unicast
shortest path from it to sender
 each router has simple forwarding behavior:
if (mcast datagram received on incoming link on
shortest path back to center)
then flood datagram onto all outgoing links
else ignore datagram
CMPT771 Network Layer 34
Reverse Path Forwarding: example
S: source
LEGEND
R1
R4
router with attached
group member
R2
R5
R3
R6
R7
router with no attached
group member
datagram will be
forwarded
datagram will not be
forwarded
• result is a source-specific reverse SPT
– may be a bad choice with asymmetric links
CMPT771 Network Layer 35
Reverse Path Forwarding: pruning
 forwarding tree contains subtrees with no mcast group
members
 no need to forward datagrams down subtree
 “prune” msgs sent upstream by router with no downstream
group members
LEGEND
S: source
R1
router with attached
group member
R4
R2
P
R5
R3
R6
P
R7
P
router with no attached
group member
prune message
links with multicast
forwarding
CMPT771 Network Layer 36
Reverse Path Forwarding:
Multiple trees for multi-sender
S: source
R1
R1
R4
R4
R2
R2
R5
R5
R3
R6
R7
R3
R6
R7
S: source
CMPT771 Network Layer 37
Shared-Tree: General Problem
 Minimum Spanning Tree: minimum cost tree
connecting all routers with attached group
members

Algorithms ?
 Steiner Tree : minimum cost tree connecting a set
of routers, which includes all that with attached
group members
CMPT771 Network Layer 38
Shared-Tree: General Problem
 Minimum Spanning Tree: minimum cost tree
connecting all routers with attached group
members

Prim, Kurskal algorithms
 Steiner Tree : minimum cost tree connecting a set
of routers, which includes all that with attached
group members
CMPT771 Network Layer 39
Spanning Tree vs Steiner Tree
CMPT771 Network Layer 40
An example of a minimum Steiner Tree
5
4
K
F
1
3
AG = 6
AE = 5
AD = 3
AC = 3
AI = 8
Shortest Paths from A
total distribution cost = 16
6
Relay Nodes
5
2
3*
1
D
A
E
C
4
I
1
2
5
J
2
B
Mcast group members
4
H
1
G
AG = 8
AE = 5
AD = 3
AC = 3
AI = 8
Steiner Tree paths from A
total distribution cost=13
CMPT771 Network Layer 41
Shared-Tree: General Problem
 Minimum Spanning Tree: minimum cost tree connecting all
routers with attached group members

Prim, Kurskal algorithms
 Minimum Steiner Tree : minimum cost tree connecting a set
of routers, which includes all that with attached group
members


problem is NP-complete
excellent heuristics exists
CMPT771 Network Layer 42
KMB - A Fast Algorithm for Steiner Trees
(Acta Informatica)
 Output a Stiener Tree Th for G and the Z-vertices.





Step
1: Construct a complete directed distance graph G1=(V1,E1,c1) from
G and Z.
Step 2: Find the minimum spanning tree T1 of G1. (pick any to break
ties)
Step3: Construct a subgraph GS of G by replacing each edge in T1 by its
corresponding shortest path in G. (break ties arbitrarily).
Step 4: Find the minimum spanning tree TS of GS (break ties
arbitrarily).
Step 5: Construct a Steiner tree TH from TS by deleting edges in TS if
necessary, so that all the leaves in TH are Steiner points.
 Worst case time complexity O(|S||V|2).
 Cost no more than 2(1 - 1/l) *optimal cost
where l = number of leaves in the steiner tree.
CMPT771 Network Layer 43
Working of KMB
A
A
4
1
10
H
I
1/2
1/2
G
1
1
1
B
8
1
F
2
C
9
E
2
D
B
D
4
4
4
4
A
1
4
H
C
I
1/2
1/2
1
G
A
4
1
D
B
1
4
B
1
F
2
4
C
E
2
D
C
CMPT771 Network Layer 44
Working of KMB
A
A
1
1
H
1/2
1/2
G
B
1
1
F
2
C
I
I
1
1
E
B
1
1
F
2
2
C
D
E
2
D
Destination Nodes
Intermediate Nodes
CMPT771 Network Layer 45
From Theory to Practice (2)
 Does the heuristic algorithm(s) work ?
 Demands


Distributed implementation
Support reliable transmission
– link failure should not increase delay or reduce resource availability

Return optimal routes taking into consideration
• price to be paid (bandwidth consumed)
• end to end delay. (no. of links traversed)

Minimize network load.
– Avoid loops
– Avoid traffic concentration on a few links or sub-nets

Minimize the state stored in routers

Asymmetric links
 Steiner tree is for undirectional
CMPT771 Network Layer 46
From Theory to Practice (2)
 Performance metrics (costs?)
 Delay
• End to end delay between source and receiver relative to
the shortest unicast path delay.

Cost
• Cost of total bandwidth consumption
• Cost of tree state info

Traffic Concentration
• Maximum number of flows on a unidirectional link.
• How evenly the routes are distributed.
CMPT771 Network Layer 47
Internet Multicast Service Model
128.59.16.12
128.119.40.186
multicast
group
226.17.30.197
128.34.108.63
128.34.108.60
multicast group concept: use of indirection
 hosts addresses IP datagram to multicast group
 routers forward multicast datagrams to hosts that have
“joined” that multicast group
 Many-to-many communications
CMPT771 Network Layer 48
Multicast groups
 class D Internet addresses reserved for multicast:
 host group semantics:
o anyone can “join” (receive) multicast group
o anyone can send to multicast group
o no network-layer identification to hosts of members
 needed: infrastructure to deliver mcast-addressed datagrams to
all hosts that have joined that multicast group
CMPT771 Network Layer 49
Joining a mcast group: two-step process
 local: host informs local mcast router of desire to join group:
IGMP (Internet Group Management Protocol)
 wide area: local router interacts with other routers to receive
mcast datagram flow
 many protocols (e.g., DVMRP, MOSPF, PIM)
IGMP
IGMP
wide-area
multicast
routing
IGMP
CMPT771 Network Layer 50
IGMP: Internet Group Management Protocol
 host: sends IGMP report when application joins mcast group
IP_ADD_MEMBERSHIP socket option
 host need not explicitly “unjoin” group when leaving
 router: sends IGMP query at regular intervals
 host belonging to a mcast group must reply to query

query
report
CMPT771 Network Layer 51
IGMP
IGMP version 1
 router: Host Membership
Query msg broadcast on
LAN to all hosts
 host: Host Membership
Report msg to indicate
group membership


randomized delay before
responding
implicit leave via no reply
to Query
 RFC 1112
IGMP v2: additions include
 Leave Group msg



last host replying to Query
can send explicit Leave Group
msg
router performs groupspecific query to see if any
hosts left in group
RFC 2236
IGMP v3: under development as
Internet draft
CMPT771 Network Layer 52
Tunneling
Q: How to connect “islands” of multicast routers in a
“sea” of unicast routers?
physical topology
logical topology
 mcast datagram encapsulated inside “normal” (non-multicast-
addressed) datagram
 normal IP datagram sent thru “tunnel” via regular IP unicast to
receiving mcast router
 receiving mcast router unencapsulates to get mcast datagram
CMPT771 Network Layer 53
Outline
1. Network Service Models
2. Routing Principles



3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 54
Video Multicast
 Why attractive ?

Bandwidth efficiency

Multi-receiver nature of video programs
 What is the support ?

IP multicast
– network layer

RTP protocol
– transport layer

But, generic interfaces only …
CMPT771 Network Layer 55
Challenges: Best-Effort Service
 IP multicast is best-effort only

No quality-of-service guarantee

What if non-adaptive ?
- Unfair to adaptive traffic  starve TCP
- Lead to congestion  poor user experience
 Solutions

Quality-of-service deployment
- IntServ (RSVP), DiffServ, QoS routing
- Long-term objective, will be addressed later

Adaptive Multicast
CMPT771 Network Layer 56
Challenges: Heterogeneity
 Receivers

Bandwidth
- 56K Modem, 1.5 ADSL,10M/100M Ethernet …

Computation power
- PDA, Laptop, Workstation …
 Sessions

Uneven session populations
CMPT771 Network Layer 57
Not an Atypical Network
CMPT771 Network Layer 58
Challenges: Heterogeneity
 Receivers

Bandwidth
- 56K Modem, 1.5 ADSL,10M/100M Ethernet …

Computation power
- PDA, Laptop, Workstation …
 Sessions

Uneven session populations
 How can a single rate satisfy everyone simultaneous ?
 Solution

Multi-rate Multicast
CMPT771 Network Layer 59
Solution 1: Simulcast
 Replicated streams


Same video program
Different rates
 Each stream for a cluster of receivers


Same access technology
Same bottleneck
1.5 Mbps encoding
28.8 Kbps encoding
CMPT771 Network Layer 60
Solution 1: Simulcast
 Pros


Simplicity
Compatibility
Redundancy
 Cons

Redundancy
Mismatch
Number of Streams
Single-rate Multicast
Simulcast
Multiple-unicast
CMPT771 Network Layer 61
Solution 2: Layered Multicast
 Cumulative layered coding
(Scalable coding)

Base layer
- Most important feature
- Low rate, low quality

Enhancement layers
- Progressively refine the quality
 Receiver-driven
Adaptation


Layer  Multicast group
Congestion ?
- No: Join group
- Yes: Leave group
CMPT771 Network Layer 62
Solution 3: Proxy-based Video Filtering
 Multiple video proxies (agents, gateways,..)
 Each proxy for a confined region


Bandwidth monitoring
Video filtering
- Frame dropping, transcoding …
 Pros
Fine-grained rate control
 Work with no IP multicast
 Cons
 Expensive
 High computation cost

Sender
Receiver
Router
Proxy
CMPT771 Network Layer 63
Active Node (Proxy) vs. Network Node
Application Level
Rate
Controller
Video
Transcoder
RTP Layer
Network Level
Packet
Scheduler
QoS
Monitor
IP Layer
IGMP
RTCP
Upstream
Node
Downstream
Node
UDP Layer
(b)
Packet
Scheduler
IP Layer
IGMP
Network Level
Upstream
Node
(a)
Downstream
Node
CMPT771 Network Layer 64
Taxonomy
Unicast
Video
Transmission
Single-Rate
Simulcasting
Multicast
Cumulative
Multi-Rate
Layering
Noncumulative
Proxy Filtering
CMPT771 Network Layer 65
Example: Receiver-Driven Layered Multicast
Server sends layered encoded stream where each layer is sent
to a separate multicast group
 Receivers can control delivered quality by joining proper
number of multicast groups. i.e. controls no of delivered layers



Goal: to accommodate receiver bw heterogeneity!!
Basic idea:


When no cong. => receiver adds a new layer
When cong occurs => receiver drops top layer
CMPT771 Network Layer 66
RLM: example
Layer #
4
3
2
1
Time
CMPT771 Network Layer 67
Issues for adding layers
 When?
 Sustained performance with little to no loss.
 Hysterics problems
 When available bandwidth is between the rates of two
different layers.
 Probing by new members.
 Join latency
 Takes a while for multicast joins to occur
CMPT771 Network Layer 68
Issues for dropping layers.
 When?
 Usually as a reaction to congestion.
 Co-located receiver must drop the top layer at
about the same time!
 New members probing for appropriate layer will
cause “false” congestion.
 Leave latency

Similar to the problem with join latency.
CMPT771 Network Layer 69
Join Experiments
 Receivers use “join-experiments” to determine if a
layer should be added.

A join timer is associated with each layer
• Initially, timers are fairly short.


If timer goes off and no congestion is experienced, then
next level joined.
If congestion is experienced, then level is dropped and join
timer is exponentially backed off.
 Receiver learns from past experiences
CMPT771 Network Layer 70
RLM Problems (1)
 Scaling issues.
 Too many participants attempting join-experiments leads
to chronic congestion.
 Possible fix?
 Coordinate join experiments with others.
 Shared learning
 Still problematic
• Want to limit shared learning to topologically close
neighbors.
• Complexity of control mechanisms increases.
CMPT771 Network Layer 71
RLM Problems (2)
 Bandwidth adaptation

How to allocate among sessions
- Uniform ? But session population is uneven

How to allocate among layers
- Heterogeneous bandwidth demands from receivers
- Non-linear video utility
- Num of layers
 Objective: Better Fairness

Intra-session fairness
• Fairly treat all the users
• Ideal: each user receives video at a rate commensurate
with its own bandwidth demand
CMPT771 Network Layer 72
Control Bandwidth Scaling
Case Study: RTCP for Multicast
- To limit traffic, each participant reduces his RTCP traffic as the number
of conference participants increases.
CMPT771 Network Layer 73
RTCP Bandwidth Scaling
 RTCP attempts to limit its
traffic to 5% of the
session bandwidth.
Example
 Suppose one sender,
sending video at a rate of 2
Mbps. Then RTCP attempts
to limit its traffic to 100
Kbps.
 RTCP gives 75% of this
rate to the receivers;
remaining 25% to the
sender
 The 75 kbps is equally shared
among receivers:

With R receivers, each
receiver gets to send RTCP
traffic at 75/R kbps.
 Sender gets to send RTCP
traffic at 25 kbps.
 Participant determines RTCP
packet transmission period by
calculating avg RTCP packet
size (across the entire
session) and dividing by
allocated rate.
CMPT771 Network Layer 74
Randomization in Feedback
 Case Study: Reliable Multicast (RM)

how to transfer data “reliably” from source to receivers ?

ACK/NAK ?

Recall
• TCP/UDP vs Multicast
CMPT771 Network Layer 75
Randomized Feedback Suppression
 randomly delay NAK generation


“listen” to NAKs generated by others
if no NAK for lost pkt when delay
expires, transmit NAK
 tradeoffs:



more complexity (backoff timer), more
pkts (data, NAKs) at rcvs
reduced feedback at sender
what’s the optimal delay distribution?
CMPT771 Network Layer 76
Local Recovery
 Allow rcvr to recover lost pkt  orthogonal (complementary)
from “nearby” rcvr
to feedback supression
 “ask your neighbor”: send  who to recover from?
localized NAK (repair
 don’t want repair request
request)
to go to everyone
 multicast: randomize local
 scoping: how to restrict
repair transmission time to
how far request will
avoid too many replies
travel: IP time-to-live
field
CMPT771 Network Layer 77
Local Recovery: example
 R2 detects lost pkt
 waits random amount of
time, multicasts repair
request
 limited scope

not seen by R4
 R1 and R3 have pkt

R3 times out first and
multicasts repair, with
limited scope
 Other possibilities

Feedback aggregation
CMPT771 Network Layer 78
Outline
1. Network Layer Service Models
2. Routing Principles



3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 79
More about Proxy: Caching
Goal: satisfy client request without involving origin server
 user sets browser: Web
accesses via proxy
 browser sends all HTTP
requests to proxy


object in cache: cache
returns object
else cache requests object
from origin server, then
returns object to client
origin
server
client
Proxy
server
 Proxy: both client and server;
typically installed by ISP
(university, company,
residential ISP)
client
origin
server
CMPT771 Network Layer 80
Caching example (1)
Assumptions
 average object size = 100,000
bits
 avg. request rate from
institution’s browser to origin
serves = 15/sec
 delay from institutional router to
any origin server and back to
router = 2 sec
Consequences
utilization on LAN = 15%
 utilization on access link = 100%
 total delay = Internet delay + access
delay + LAN delay
= 2 sec + minutes + milliseconds

origin
servers
public
Internet
1.5 Mbps
access link
institutional
network
10 Mbps LAN
institutional
cache
CMPT771 Network Layer 81
Caching example (2)
Possible solution
 increase bandwidth of access link
to, say, 10 Mbps
origin
servers
public
Internet
Consequences (if 10 Mbps)



=

utilization on LAN = 15%
utilization on access link = 15%
Total delay = Internet delay + access institutional
delay + LAN delay
network
2 sec + msecs + msecs
often a costly upgrade
10 Mbps
access link
10 Mbps LAN
institutional
cache
CMPT771 Network Layer 82
Caching example (3)
origin
servers
Install cache

suppose hit rate is .4
Consequence




=
40% requests will be satisfied
almost immediately
60% requests satisfied by origin
server
utilization of access link reduced to
60%, resulting in negligible delays
(say 10 msec)
total delay = Internet delay +
access delay + LAN delay
.6*2 sec + .4*.01 secs + milliseconds
< 1.3 secs
public
Internet
1.5 Mbps
access link
institutional
network
10 Mbps LAN
institutional
cache
CMPT771 Network Layer 83
Hierarchical cache
origin
server
 How to measure cache ?

Hit Ratio (HR)
 Calculation




HR of Proxy 1 = 90%
HR of Proxy 2 = 95%
Independent
client
Joint HR
= 1-(1-0.9)*(1-0.95)*100%
= 99.5%
Proxy
Server 1
Proxy
Server 2
client
CMPT771 Network Layer 84
Why Web caching ?
 Reduce response time for client request.
 Reduce traffic on an institution’s access link.
 Avoid overloading origin server
 Internet dense with caches enables “poor” content
providers to effectively deliver content

Enhance data availability (Related: Google)
CMPT771 Network Layer 85
More about Web caching
Why caching is effective for Web, even if
cache space is quite limited ?
Access Probability
 Zipf distribution
pj 
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
1
2
3
4
5
6
7
(1 / j )

|S |
8
j 1
(1 / j )
9

,
j  1,2,..., | S |,
10
Obejct ID
CMPT771 Network Layer 86
Video Caching
 Proxy caching
saving objects at proxies close to clients



Temporal locality
Geographical locality
Skewed access rates
• Zipf distribution, too
• Theta = 0.271
Proxy
Backbone Network
Video Server
Proxy
CMPT771 Network Layer 87
Video objects vs. Web objects
 High data rate, yet adaptive
 Long playback duration
► Various interactions:
• random access
• early termination
► Huge volume
• one-hour MPEG-1, about 675 MB
 Semi-static
 Low validation cost (consistency)
CMPT771 Network Layer 88
Revisit: Consistency of Web Objects
client
 Inconsistency
HTTP request msg
 out-of-date objects…
If-modified-since:
 News, stock websites
<date>
 Solution: Conditional GET
HTTP response
server
object
not
modified
HTTP/1.0
304 Not Modified
HTTP request msg
If-modified-since:
<date>
HTTP response
object
modified
HTTP/1.0 200 OK
<data>
CMPT771 Network Layer 89
Solution: Partial Video Caching
 Consistency is not a big issue for video caching
 But data volume is …
Client
Control Channel
(RTSP, RTCP)
Media Proxy
Control Channel
(RTSP,RTCP)
Proxy Manager
Enterprise
Network
Buffer
Data Channel
(RTP)
Assembler/Switcher
Player
Server
Scheduler
Backbone
Network
Cache
Media
Repository
Data Channel
(RTP)
CMPT771 Network Layer 90
Generalizing Proxy Functionalities (1):
Content Distribution Networks
Content replication
 CDN company installs hundreds
of CDN servers throughout
Internet
 in lower-tier ISPs, close to
users
 CDN replicates its customers’
content in CDN servers. When
provider updates content, CDN
updates servers
origin server
in North America
CDN distribution node
CDN server
in S. America CDN server
in Europe
CDN server
in Asia
CMPT771 Network Layer 91
CDN example
HTTP request for
www.foo.com/sports/sports.html
Origin server
1
2
3
DNS query for www.cdn.com
CDNs authoritative
DNS server
HTTP request for
www.cdn.com/www.foo.com/sports/ruth.gif
origin server
 www.foo.com
 distributes HTML

Nearby
CDN server
Replaces:
http://www.foo.com/sports.ruth.gif
with
CDN company
 cdn.com
 distributes gif files
 uses its authoritative DNS
server to route redirect
requests (CNAME)
http://www.cdn.com/www.foo.com/sports/ruth.gif
CMPT771 Network Layer 92
More about CDNs
routing requests
 CDN creates a “map”, indicating distances from leaf
ISPs and CDN nodes
 when query arrives at authoritative DNS server:


server determines ISP from which query originates
uses “map” to determine best CDN server
Caching vs. CDN
 Pull: passive
 Push: active
CMPT771 Network Layer 93
Outline
1. Network Layer Service Models
2. Routing Principles



3.
4.
5.
6.
7.
Link state routing
Distance vector routing
Hierarchical routing
Multicast Routing
Video Multicast
Proxy Caching
Peer-to-Peer
Internet QoS
CMPT771 Network Layer 94
Generalizing Proxy Functionalities (2):
Peer-to-Peer Paradigm
A peer is both client and server
Recall Client/Server paradigm
Client:
application
transport
network
data link
physical
request
 initiates contact with server
(“speaks first”)
 typically requests service from
server,
 Web: client implemented in
browser; e-mail: in mail reader
Server:
 provides requested service to client
reply
application
transport
network
data link
physical
 e.g., Web server sends requested Web
page, mail server delivers e-mail
CMPT771 Network Layer 95
Peer-to-Peer Communications
Example
 Alice runs P2P client
application on her notebook
computer
 Intermittently connects to
Internet; gets new IP
address for each
connection
 Asks for “X.mp3”
 Application displays other
peers that have copy of
X.mp3.
 Alice chooses one of the
peers, Bob.
 File is copied from Bob’s PC
to Alice’s notebook: HTTP
 While Alice downloads,
other users uploading from
Alice.
 Alice’s peer is both a Web
client and a transient Web
server.
All peers are servers = highly
scalable!
CMPT771 Network Layer 96
P2P: centralized directory
For file sharing
centralized
Key issue: File searching directory
server
- which peer has the file ?
original “Napster” design
1) when peer connects, it informs
central server:


IP address
content
2) Alice queries for “X.mp3”
3) Alice requests file from Bob
Bob
1
peers
1
3
1
2
1
Alice
CMPT771 Network Layer 97
P2P: problems with centralized directory
 Single point of failure
 Performance bottleneck
 Copyright infringement
file transfer is
decentralized, but
locating content is highly
decentralized
CMPT771 Network Layer 98
P2P: decentralized directory
 Each peer is either a group
leader or assigned to a
group leader.
 Group leader tracks the
content in all its children.
 Peer queries group leader;
group leader may query
other group leaders.
ordinary peer
group-leader peer
neighoring relationships
in overlay network
CMPT771 Network Layer 99
More about decentralized directory
advantages of approach
 no centralized directory server


location service distributed over peers
more difficult to shut down
disadvantages of approach
 bootstrap node needed
 group leaders can get overloaded
CMPT771 Network Layer 100
P2P: Query flooding
 Gnutella
 no hierarchy
 use bootstrap node to learn
about others
 join message
 Send query to neighbors
 Neighbors forward query
 If queried peer has object, it
sends message back to
querying peer
join
CMPT771 Network Layer 101
P2P: more on query flooding
Pros
 peers have similar
responsibilities: no group
leaders
 highly decentralized
 no peer maintains directory
info
Cons
 excessive query traffic
 query radius: may not have
content when present
 bootstrap node
 maintenance of overlay
network
CMPT771 Network Layer 102
DHT: A New Story…
 In 2000-2001, academic researchers said
“we want to play too!”
 Motivation:





Frustrated by popularity of all these “half-baked” P2P apps
We can do better!
Guaranteed lookup success for files in system
Provable bounds on search time
Provable scalability to millions of node
 Hot Topic in networking ever since

No practical use so far (?)
CMPT771 Network Layer 103
P2P: Content Addressing (Hash Routing)
Hash routing
 Given an object identifier I, calculate its hash value
H=hash(I), and (hopefully) find it (or its location info) in
peer H
 Not a new idea

Load balancing – hash IP address, re-direct to different
servers
application
put(key, data)
node
get (key)
hash table
node
….
data
node
CMPT771 Network Layer 104
Hash Routing
 Two alternatives


Node can cache each (existing) object that hashes within
its range
Pointer-based: level of indirection - node caches pointer to
location(s) of object
1000-1999
 What’s new in P2P?

Dynamic overlay
9000-9500
4500-6999
8000-8999
7000-8500
0-999
• peer join/leave
• number of peers is not fixed

1500-4999
9500-9999
Traditional hash function doesn’t work
• SHA-1
CMPT771 Network Layer 105
Distributed Hash Table (DHT)
Challenges
 For each object, node(s) whose range(s) cover that
object must be reachable via a “short” path
 # neighbors for each node should scale well (e.g., should
not be O(N))
 Fully distributed (no centralized bottleneck/single point
of failure)
 DHT mechanism should gracefully handle nodes
joining/leaving



need to repartition the range space over existing nodes
need to reorganize neighbor set
need bootstrap mechanism to connect new nodes into the
existing DHT infrastructure
CMPT771 Network Layer 106
Case Studies
 Structure overlay (p2p) systems
 Chord
 CAN (Content Addressable Network)
 Key Questions
 Q1: How is hash space divided “evenly” among existing
nodes?
 Q2: How is routing implemented that connects an
arbitrary node to the node responsible for a given
object?
 Q3: How is the hash space repartitioned when nodes
join/leave?
 Let N be the number of nodes in the overlay
 Let H be the size of the range of the hash
function (when applicable)
CMPT771 Network Layer 107
Chord
 Associate to each node and file a unique id in an uni-
dimensional space (a Ring)
E.g., pick from the range [0...2m]
 Usually the hash of the file or IP address

 Properties:
Routing table size is O(log N) , where N is the total number of
nodes
 Guarantees that a file is found in O(log N) hops

from MIT in 2001
CMPT771 Network Layer 108
Consistent Hashing
Key 5
Node 105
K5
N105
K20
Circular ID space
N32
N90
K80
A key is stored at its successor: node with next higher ID
CMPT771 Network Layer 109
Chord Basic Lookup
N120
N10
N105
“N90 has K80”
“Where is key 80?”
N32
K80 N90
N60
CMPT771 Network Layer 110
Chord “Finger Table”
1/4
1/2
1/8
1/16
1/32
1/64
1/128
N80
 Entry i in the finger table of node n is the first node that succeeds or
equals n + 2i
 In other words, the ith finger points 1/2n-i way around the ring
CMPT771 Network Layer 111
Chord Join
 Assume an identifier space [0..8]
 Node n1 joins
Succ. Table
i id+2i succ
0 2
1
1 3
1
2 5
1
0
1
7
6
2
5
3
4
CMPT771 Network Layer 112
Chord Join
 Node n2 joins
Succ. Table
i id+2i succ
0 2
2
1 3
1
2 5
1
0
1
7
6
2
Succ. Table
5
3
4
i id+2i succ
0 3
1
1 4
1
2 6
1
CMPT771 Network Layer 113
Chord Join
Succ. Table
i id+2i succ
0 1
1
1 2
2
2 4
6
 Nodes n0, n6 join
Succ. Table
i id+2i succ
0 2
2
1 3
6
2 5
6
0
1
7
Succ. Table
i id+2i succ
0 7
0
1 0
0
2 2
2
6
2
Succ. Table
5
3
4
i id+2i succ
0 3
6
1 4
6
2 6
6
CMPT771 Network Layer 114
Chord Join
Succ. Table
i
 Nodes:
i id+2
0 1
1 2
2 4
n1, n2, n0, n6
 Items:
f7, f2
Items
7
succ
1
2
6
0
1
7
Succ. Table
i id+2i succ
0 7
0
1 0
0
2 2
2
Succ. Table
6
i
i id+2
0 2
1 3
2 5
Items
succ 1
2
6
6
2
Succ. Table
5
3
4
i id+2i succ
0 3
6
1 4
6
2 6
6
CMPT771 Network Layer 115
Chord Routing
Succ. Table
i
i id+2
0 1
1 2
2 4
 Upon receiving a query for item
id, a node:
 Checks whether stores the item
locally
 If not, forwards the query to
the largest node in its successor
table that does not exceed id
Items
7
succ
1
2
6
0
Succ. Table
1
7
i
i id+2
0 2
1 3
2 5
query(7)
Succ. Table
i id+2i succ
0 7
0
1 0
0
2 2
2
6
Items
succ 1
2
6
6
2
Succ. Table
5
3
4
i id+2i succ
0 3
6
1 4
6
2 6
6
CMPT771 Network Layer 116
Chord Summary
 Routing table size?
 Log N fingers
 Routing time?
 Each hop expects to 1/2 the distance to the desired id =>
expect O(log N) hops.
CMPT771 Network Layer 117
CAN (Content Addressable Network)
Hyper-cube space
 hash value is viewed as a point
in a D-dimensional cartesian
space
 each node responsible for a
D-dimensional “cube” in the
space
 The space is covered by all
the nodes
 Example:
 Initial node n1:(1, 2)
7
6
5
4
3
n1
2
1
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 118
CAN Illustration: 2-D
 node n2:(4, 2) joins
7
6
5
4
3
n2
n1
2
1
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 119
CAN Illustration: 2-D
 node n3:(3, 5) joins。
7
6
n3
5
4
3
n2
n1
2
1
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 120
CAN Illustration: 2-D
 node n4:(5, 5) and
n5:(6,6) join
7
6
n5
n4
n3
5
4
3
n2
n1
2
1
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 121
CAN Illustration: 2-D
 nodes: n1:(1, 2); n2:(4,2);
n3:(3, 5); n4:(5,5);n5:(6,6)
 Data (key): f1:(2,3); f2:(5,0);
f3:(2,1); f4:(7,5)
7
6
n5
n4
n3
5
f4
4
f1
3
n2
n1
2
f3
1
f2
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 122
CAN Illustration: 2-D
 Association
7
6
n5
n4
n3
5
f4
4
f1
3
n2
n1
2
f3
1
f2
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 123
CAN Illustration: 2-D
 A node knows its neighbors
 Forward query to the
neighbor that is nearest to
key
 Example: n1 want to find f4
 Average length (d/4)(n1/d),
and each node maintains
O(2d) neighboring info
 Multiple paths -> reliable
7
6
n5
n4
n3
5
f4
4
f1
3
n2
n1
2
f3
1
f2
0
0
1
2
3
4
5
6
7
CMPT771 Network Layer 124
DHT: Discussion
 Pros:
 Guaranteed Lookup
 O(log N) per node state and search scope
 Cons:
 No one uses them?
 Supporting non-exact match search is hard
 Topology-awareness ?
CMPT771 Network Layer 125
Outline
1. Network Layer Service Models
2. Routing Principles



Link state routing
Distance vector routing
Hierarchical routing
3. Multicast Routing
4. Video Multicast
5. Proxy Caching
6. Peer-to-Peer
7. Internet QoS


Scheduling and policing
IntServ/DiffServ
CMPT771 Network Layer 126
Beyond Best Effort
Internet service model
Best-effort (or least-effort)

Guarantee only one thing (sometimes even cannot)
-- delivery a packet to destination
Thus far: “making the best of best effort”






TCP
Multicast
Optimal adaptation
Proxy filtering/caching
Content distribution (replication)
P2P …
But, many things cannot be guaranteed in transport/application
layer if network layer does not guarantee them

Delay, bandwidth – important to media applications
CMPT771 Network Layer 127
Improving QOS in IP Networks
Future: next generation Internet with QoS guarantees
 RSVP: signaling for resource reservations
 Differentiated Services: differential guarantees
 Integrated Services: firm guarantees
 simple model
for sharing and
congestion
studies:
CMPT771 Network Layer 128
Principles for QOS Guarantees
 Example: 1Mbps IP phone, FTP share 1.5 Mbps link.
 bursts of FTP can congest router, cause audio loss
 want to give priority to audio over FTP
Principle 1
packet marking needed for router to distinguish
between different classes; and new router policy
to treat packets accordingly
CMPT771 Network Layer 129
Principles for QOS Guarantees (more)
 what if applications misbehave (audio sends higher
than declared rate)

policing: force source adherence to bandwidth allocations
 marking and policing at network edge:
 similar to ATM UNI (User Network Interface)
Principle 2
provide protection (isolation) for one class from others
CMPT771 Network Layer 130
Principles for QOS Guarantees (more)
 Allocating fixed (non-sharable) bandwidth to flow:
inefficient use of bandwidth if flows doesn’t use
its allocation
Principle 3
While providing isolation, it is desirable to use
resources as efficiently as possible
CMPT771 Network Layer 131
Principles for QOS Guarantees (more)
 Basic fact of life: can not support traffic demands
beyond link capacity
Principle 4
Call Admission: flow declares its needs, network may
block call (e.g., busy signal) if it cannot meet needs
CMPT771 Network Layer 132
Summary of QoS Principles
Let’s next look at mechanisms for achieving this ….
CMPT771 Network Layer 133
Scheduling Polices (1)
 scheduling: choose next packet to send on link
 FIFO (first in first out) scheduling: send in order of
arrival to queue

discard policy: if packet arrives to full queue: who to discard?
• Tail drop: drop arriving packet
• priority: drop/remove on priority basis
• random: drop/remove randomly
CMPT771 Network Layer 134
Work-Conserving vs. Non-Work-Conserving
 Work conserving
 Idle only if queue(s) is empty
 Conservation law
å
r iqi = C
Utility*mean waiting time
 Do we need non-work-conserving ?
 Jitter
 Buffer
 ?
CMPT771 Network Layer 135
Scheduling Policies (2)
Priority scheduling: transmit highest priority queued
packet
 multiple classes, with different priorities

class may depend on marking or other header info, e.g. IP
source/dest, port numbers, etc..
CMPT771 Network Layer 136
Scheduling Policies (3)
round robin scheduling:
 multiple classes
 cyclically scan class queues, serving one from each
class (if available)
CMPT771 Network Layer 137
Weighted round robin
 Normalize the weights so that they become
integer
WA=1.4 WB=0.2 WC=0.8
Normalize
WA=7
WB=1
WC=4
…
round length = 13
CMPT771 Network Layer 138
Weighted RR –
variable length packet
If different connection have different packet size, then
divide the weight of each connection with connection’s
mean packet size before normalizing




weights {0.5, 0.75, 1.0},
mean packet sizes {50, 500, 1500}
normalize weights: {0.5/50, 0.75/500, 1.0/1500} = { 0.01, 0.0015,
0.000666},
normalize again {60, 9, 4}
CMPT771 Network Layer 139
Scheduling Policies (4)
Generalized Processor Sharing
 Round-robin, sounds good


Is it fair ? Only if packets are of same size
Does it guarantee bandwidth (in a small time scale) ?
 Fairness, protection.
 GPS can provide fairness and protection
 Visit each non-empty queue in turn, serve infinitesimal data
(1 bit)
 GPS is not implementable; we can serve only packets
CMPT771 Network Layer 140
Weighted Fair Queueing (WFQ)
 Deals better with variable size packets and
weights
 Also known as packet-by-packet GPS (PGPS)
 Find finish time of a packet, had we been
doing GPS; serve packets in order of their
finish times
CMPT771 Network Layer 141
WFQ details
 Uses notions of round number and finish number
 Suppose, in each round, the server served one
bit from each active connection
 Round number is the number of rounds already
completed – a virtual time


can be fractional
depends on the number of active connections
CMPT771 Network Layer 142
WFQ – finish number
 If a packet of length p arrives to an empty
queue when the round number is R, it will
complete service when the round number is R
+ p => finish number is R + p

independent of the number of other connections.
CMPT771 Network Layer 143
WFQ – service order
 If a packet arrives to a queue, and the
previous packet has/had a finish number of f,
then the packet’s finish number is f+p

Serve packets in order of finish numbers
 Finish time of a packet is not the same as the
finish number
CMPT771 Network Layer 144
Active connections
 A connection is active if the last packet served from
it, or in its queue, has a finish number greater than
the current round number
 finish number of packet of length p


if arriving to active connection = previous finish number + p
if to inactive connection = R + p
CMPT771 Network Layer 145
virtual time
WFQ Example
4
3.5
3
2.5
2
1.5
1
0.5
0
A
1
B
1/3
C
1/2
1
2
3
A
F1=1
B
F1=2
L=2
C
F1=2
L=2
4
5
real time
F2=3.5
L=1
fB
r=1
fC
fA = fB = fC = 1
1/3
0
fA
6
L=2
7
8
t=0: Packets of sizes
1,2,2 arrive at
connections A, B, C.
t=4: Packet of size 2
arrives at connection A
CMPT771 Network Layer 146
Example (contd.)
 At time 0, slope of 1/3,
 Finish number of A = 1, Finish number of B, C = 2
 At time 3, connection A become inactive, slope
becomes 1/2
 At time 4,

second packet at A gets finish number 2 + 1.5 = 3.5, Slope
decreases to 1/3
CMPT771 Network Layer 147
Example (contd.)
 At time 5.5,
 round number becomes 2 and connection B and C become
inactive, Slope becomes 1
 At time 7,
 round number becomes 3.5 and A becomes inactive.
CMPT771 Network Layer 148
WFQ implementation
 On packet arrival:




first updates round number
compute finish number for the packet
insert in priority queue sorted by finish numbers, so that it is served
according to the finish number
if no space, drop the packet with largest finish number
 On service completion

After servicing one packet, select the packet with the lowest finish
number
CMPT771 Network Layer 149
Computing the round number



re-computes round number
on each packet arrival
At any re-computation, the
number of connections can
go up at most by one, but can
go down to zero, leading to
change in round number
Solution: use previous count
of connections to compute
round number; if this makes
some connection inactive, recompute; repeat until no
connections become inactive
virtual time
 Iterated deletion:
4
3.5
3
2.5
2
1.5
1
0.5
0
1
1/3
1/2
1/3
0
1
2
3
A
F1=1
B
F1=2
4/3L=2
C
F1=2
L=2
4
5
real time
F2=3.5
L=1
6
7
8
L=2
CMPT771 Network Layer 150
WFQ Evaluation



Let Pmax(i) be the largest possible packet sent on flow i, and Pmax
be the largest possible packet in the network
Let G(i, τ, t) to be the service received by flow i, using GPS, in the
interval (τ, t), and g(i) to be the smallest rate allocated to flow i
along it’s path
It can be shown that WFQ can lag GPS by at most Pmax/g(i) that is,
G(i, τ, t) /g(i) - S(i, τ, t) /g(i) <= Pmax/g(i)
G(i, τ, t) /g(i) - S(i, τ, t) /g(i) is defined as Absolute Fairness Bound
(ABS) of a scheduling policy with respect to GPS.
CMPT771 Network Layer 151
WFQ Evaluation (Contd..)
 Protection


The rate provided to each flow is not affected by the sending
rate of other flows.
This firewalling property is desirable by public networks.
 Incentive to intelligent flow controller


no need to send slower than allocated rate
if sent more then likely packet drop
 Requires per connection scheduler state
 Computing round numbers with iterated deletion is costly
 Bound on end to end delay (?)
CMPT771 Network Layer 152
Policing Mechanisms
Goal: limit traffic to not exceed declared parameters
Three common-used criteria:
 (Long term) Average Rate: how many pkts can be sent
per unit time (in the long run)

crucial question: what is the interval length: 100 packets per
sec or 6000 packets per min have same average!
 Peak Rate: e.g., 6000 pkts per min. (ppm) avg.; 1500
ppm peak rate
 (Max.) Burst Size: max. number of pkts sent
consecutively (with no intervening idle)
CMPT771 Network Layer 153
Policing Mechanisms
Token Bucket: limit input to specified Burst Size
and Average Rate.
 bucket can hold b tokens
 tokens generated at rate r token/sec unless bucket
full
 over interval of length t: number of packets
admitted less than or equal to (r t + b).
CMPT771 Network Layer 154
Policing Mechanisms (more)
 token bucket, WFQ combine to provide guaranteed
upper bound on delay, i.e., QoS guarantee!
arriving
traffic
token rate, r
bucket size, b
WFQ
per-flow
rate, R
D = b/R
max
CMPT771 Network Layer 155
IETF Integrated Services
 architecture for providing QOS guarantees in IP
networks for individual application sessions
 resource reservation: routers maintain state info
(a la VC) of allocated resources, QoS req’s
 admit/deny new call setup requests:
Question: can newly arriving flow be admitted
with performance guarantees while not violated
QoS guarantees made to already admitted flows?
CMPT771 Network Layer 156
Intserv: QoS guarantee scenario
 Resource reservation
 call setup, signaling (RSVP)
 traffic, QoS declaration
 per-element admission control
request/
reply

QoS-sensitive
scheduling (e.g.,
WFQ)
CMPT771 Network Layer 157
Call Admission
Arriving session must :
 declare its QOS requirement
R-spec: defines the QOS being requested
 characterize traffic it will send into network
 T-spec: defines traffic characteristics
 signaling protocol: needed to carry R-spec and Tspec to routers (where reservation is required)
 RSVP

CMPT771 Network Layer 158
Signaling in the Internet
connectionless
(stateless)
forwarding by IP
routers
+
best effort
service
=
no network
signaling protocols
in initial IP
design
 New requirement: reserve resources along end-to-end path (end
system, routers) for QoS for multimedia applications
 RSVP: Resource Reservation Protocol [RFC 2205]
 “ … allow users to communicate requirements to network in
robust and efficient way.” i.e., signaling !
 earlier Internet Signaling protocol: ST-II [RFC 1819]
CMPT771 Network Layer 159
IETF Differentiated Services
Concerns with Intserv:
 Scalability: signaling, maintaining per-flow router
state difficult with large number of flows
 Flexible Service Models: Intserv has only two
classes. Also want “qualitative” service classes


“behaves like a wire”
relative service distinction: Platinum, Gold, Silver
Diffserv approach:
 simple functions in network core, relatively
complex functions at edge routers (or hosts)
 Do’t define define service classes, provide
functional components to build service classes
CMPT771 Network Layer 160
Diffserv Architecture
Edge router:
r
- per-flow traffic management
- marks packets as in-profile
and out-profile
b
marking
scheduling
..
.
Core router:
- per class traffic management
- buffering and scheduling
based on marking at edge
- preference given to in-profile
packets
- Assured Forwarding
CMPT771 Network Layer 161
Edge-router Packet Marking
 profile: pre-negotiated rate A, bucket size B
 packet marking at edge based on per-flow profile
Rate A
B
User packets
Possible usage of marking:
 class-based marking: packets of different classes marked differently
 intra-class marking: conforming portion of flow marked differently than
non-conforming one
CMPT771 Network Layer 162
Classification and Conditioning
 Packet is marked in the Type of Service (TOS) in
IPv4, and Traffic Class in IPv6
 6 bits used for Differentiated Service Code Point
(DSCP) and determine PHB that the packet will
receive
 2 bits are currently unused
CMPT771 Network Layer 163
Classification and Conditioning
may be desirable to limit traffic injection rate of
some class:
 user declares traffic profile (e.g., rate, burst size)
 traffic metered, shaped if non-conforming
CMPT771 Network Layer 164
Forwarding (PHB)
 PHB result in a different observable (measurable)
forwarding performance behavior
 PHB does not specify what mechanisms to use to
ensure required PHB performance behavior
 Examples:


Class A gets x% of outgoing link bandwidth over time
intervals of a specified length
Class A packets leave first before packets from class B
CMPT771 Network Layer 165
Randomization in Router Queue Management
 normally, packets dropped only when queue overflows

“Drop-tail” queueing
P6 P5 P4 P3 P2 P1
ISP
router
Internet
FCFS
Scheduler
ISP
router
CMPT771 Network Layer 166
The case against drop-tail queue management
P6 P5 P4 P3 P2 P1
FCFS
Scheduler
 large queues in routers are “a bad thing”
 End-to-end latency dominated by length of queues at
switches in network
 allowing queues to overflow is “a bad thing”
 connections transmitting at high rates can starve
connections transmitting at low rates
 connections can synchronize their response to congestion
CMPT771 Network Layer 167
Idea: early random packet drop
P6 P5 P4 P3 P2 P1
FCFS
Scheduler
 When queue length exceeds threshold, packets
dropped with fixed probability


probabilistic packet drop: flows see same loss rate
problem: bursty traffic (burst arrives when queue is near
full) can be overpenalized
CMPT771 Network Layer 168
Random early detection (RED) packet drop
Average queue length
Max
queue length
Drop probability
Forced drop
Max
threshold
Probabilistic
early drop
Min
threshold
No drop
Time
 use exponential average of queue length to determine when to
drop
 avoid overly penalizing short-term bursts
 React to longer term trends
 tie drop prob. to weighted avg. queue length
 avoids over-reaction to mild overload conditions
CMPT771 Network Layer 169
Random early detection (RED) packet drop
Average queue length
Max
queue length
Drop probability
Forced drop
Max
threshold
Probabilistic
early drop
Min
threshold
No drop
Time
Drop probability
100%
maxp
min
max
Weighted Average
Queue Length
CMPT771 Network Layer 170
Random early detection (RED) packet drop
 large number (5) of parameters: difficult to tune
(at least for http traffic)
 gains over drop-tail FCFS not that significant
 still not widely deployed …
CMPT771 Network Layer 171
RED: why probabilistic drop?
 provide gentle transition from no-drop to all-drop
 provide “gentle” early warning
 provide same loss rate to all sessions:
 with tail-drop, low-sending-rate sessions can be
completely starved
 avoid synchronized loss bursts among sources
 avoid cycles of large-loss followed by no-transmission
 WRED (Cisco)
CMPT771 Network Layer 172