Transcript CHAPTER 10
10. IP QoS Service
The current Internet: IP Protocol
Best-Effort Service – no Quality of
Service (QoS) guarantees are provided.
Connectionless Service – no connection is
established prior to sending the packets.
Each packet carries the full destination
address. Routing is performed using a
shortest path algorithm, independently
for each packet.
ECE6609
Why do we need a New Protocol?
The emerging multimedia applications require QoS
guarantees
Real-time applications require connection oriented
services.
Other routing algorithms may be more appropriate than
the shortest path algorithm in order to increase
network efficiency and provide QoS.
IETF* Proposed Solutions
Integrated Services (IntServ)
Resource Reservation Protocol (RSVP)
– Disadvantage: Scalability (per-flow reservations)
Differentiated Services (DiffServ)
– Disadvantage: No per-flow QoS guarantee
Multiprotocol Label Switching (MPLS)
http://www.ietf.org
*IETF – Internet Engineering Task Force
REQUIREMENTS for IP QoS
A network is characterized as having EDGE and
CORE ROUTERS.
Edge routers accept customer traffic, i.e.,
packets from any source outside the network into
the network.
Core routers provide transit packet forwarding
service between other Core routers and/or Edge
routers.
REQUIREMENTS for IP QoS
Edge routers characterize, police, and mark customer
traffic being admitted to the network.
Edge routers may decline requests signaled by outside
sources (Admission Control).
Core routers differentiate traffic insofar as necessary to
cope with transient congestion within the network itself.
Statistical multiplexing must be utilized wherever
appropriate to maximize utilization of core resources.
Network Architecture
Integrated Services (IntServ)
GOAL: Augment existing Best Effort Internet
with a range of end-to-end services for
real-time streaming in interactive applications.
IntServ developed an architecture requiring per-flow
traffic handling at every hop along an application’s
end-to-end path and explicit a priori signaling, using RSVP
(Resource Reservation Protocol), of each flow’s
requirements.
Integrated Services (IntServ)
IntServ model requires resources such as
bandwidth and buffers to be explicitly reserved
for a given data flow to ensure that the
application receives its requested QoS.
A flow consists of a stream of packets with
the same source and destination addresses and
port numbers.
A flow descriptor is used to describe the traffic
and QoS requirements of a flow.
Per-flow QoS guarantees are provided at the
expense of installing and maintaining flow-specific
state in each router along the flow’s path.
Basic components of the IntServ architecture:
Setup Protocol, Traffic Control (filterspec),
flowspec and Traffic Classes.
Architecture Basic Components
Setup Protocol – enables a host or an application to
request a specific amount of resources from the
network; realized by the
Resource Reservation Protocol (RSVP).
Traffic Control (filterspec) – includes packet
classifier, packet scheduler, and admission control.
flowspec – objects such as token bucket parameters.
Traffic Classes – best-effort, controlled load, and
guaranteed services.
Setup Protocol: RSVP
Every application is presumed to use some form of
signaling to negotiate service with an IntServ capable
network.
IntServ signaling has 2 functions:
Negotiation: The network decides whether it
can support the application’s requested service
(Admission Control)
Configuration: The network configures the
routers along the path to support the negotiated
flow characteristics.
The applications use
RSVP: Resource Reservation Protocol.
Goals for the Design of RSVP
Must support both unicast and multicast
traffic flows (i.e., RSVP sessions).
Must allow parties of a multicast session to
request different levels of QoS.
Must be deployable on top of existing IP
infrastructure.
Basics of RSVP
Performs resource reservations for unicast and multicast
applications.
Requests resources in one direction from a sender to a receiver
(simplex resource reservation)
Requires the receiver to initiate and maintain the resource
reservation.
Maintains soft state at each intermediate router: A resource
reservation at a router is maintained for a limited time only, and
so the receiver must periodically refresh its reservation.
Does not require each router to be RSVP capable.
Non-RSVP capable routers use Best Effort delivery technique.
Provides different reservation styles so that requests may be
merged in several ways according to the applications.
Supports both IPv4 and IPv6.
RSVP: Receiver Initiated Reservation
Similar to “Leaf Join Case” in ATM Multicasting.
Motivation: RSVP is primarily designed to support multiparty
conferencing with heterogeneous receivers.
In this environment the receiver actually knows how much
bandwidth it needs.
If the sender were to make the reservation request, then the
sender must obtain the bandwidth requirement from each receiver.
This may cause an implosion problem for large multicast groups.
Problem: Receiver does not directly know the path taken by data
packets.
Solution:
Use Path messages.
RSVP
The application source transmits a “Path” message
along the routed path to the unicast or multicast
destination.
– The Path message has two purposes:
* to mark the routed path in each router (store
the “path state”) between sender/receiver and
* to collect information about the QoS viability
of each router along that path.
– Upon receiving the Path message, the destination
host(s) can determine what services the network
can support (e.g., guaranteed service or controlled
load) and then generate an RSVP reservation (Resv)
message.
RSVP
Resv messages are sent back towards the sender along
the reverse path.
The Resv message carries reservation requests to the
routers along the path.
The Resv message contains traffic and QoS objects that
are processed by the traffic control component of each
router as it follows the reverse path upstream toward the
sender.
If the router has sufficient capacity, then resources
along the path back towards the receiver are reserved for
that flow. If resources are not available, RSVP error
messages are generated and returned to the receiver.
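The Path/Resv exchange described above can be sketched as follows. This is a simplified illustration, not the RSVP message format: the function names, the "sender" label, and the single-number capacity per router are assumptions made for the example. Reservations already made at downstream hops are simply left in place when a later hop refuses the request.

```python
def send_path(routers_on_path):
    """Path message: each router stores its previous hop (the path state)."""
    path_state = {}
    prev = "sender"
    for router in routers_on_path:
        path_state[router] = prev      # remember where the Path message came from
        prev = router
    return path_state, prev            # prev is the last router before the receiver


def send_resv(path_state, last_hop, capacity, request):
    """Resv message: follow the stored previous hops upstream toward the sender."""
    reserved = []
    hop = last_hop
    while hop != "sender":
        if capacity[hop] < request:
            # Not enough resources: an error would be returned to the receiver.
            return False, reserved
        capacity[hop] -= request       # reserve resources at this router
        reserved.append(hop)
        hop = path_state[hop]          # reverse path, one hop upstream
    return True, reserved
```

With three routers R1, R2, R3 between sender and receiver, the Resv message visits R3, then R2, then R1, which is the reverse of the order in which the Path message was forwarded.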
SOFT STATE in RSVP
RSVP Path and Resv messages are periodically
sent by senders and receivers, respectively, to
refresh the reservations performed.
When a state is not refreshed within a certain
time out, the state is deleted.
The type of state that is maintained by a timer
is called “Soft State” as opposed to hard state
where the establishment and teardown of a
state are explicitly controlled by signaling
messages.
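A minimal sketch of soft state, assuming a single timeout value for all flows. The class and method names are hypothetical; real RSVP routers derive the timeout from the refresh period carried in the messages.

```python
class SoftStateTable:
    """State that disappears unless it is refreshed before its timer expires."""

    def __init__(self, timeout):
        self.timeout = timeout         # lifetime of an unrefreshed entry, seconds
        self.expiry = {}               # flow id -> time at which the state expires

    def refresh(self, flow, now):
        """Called when a Path or Resv refresh for `flow` arrives at time `now`."""
        self.expiry[flow] = now + self.timeout

    def purge(self, now):
        """Delete every state whose timer has run out; return the deleted flows."""
        expired = [f for f, t in self.expiry.items() if t <= now]
        for f in expired:
            del self.expiry[f]
        return expired
```

No explicit teardown message is needed: a flow that stops sending refreshes is removed by the next call to purge after its timeout.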
RESERVATION STYLES in RSVP
Wildcard Filter Reservation
A single reservation shared by all senders. It acts as a shared
pipe whose size is the largest of the resource
requests from all receivers, independent of the number of
senders (e.g., audioconferencing).
Fixed Filter Reservation
A distinct reservation is created for each sender. S_i is
the selected sender and Q_i is the resource request for
sender i. The total reservation on a link for a given
session is the sum of all Q_i’s.
Shared Explicit Reservation
A single reservation shared by a set of explicit senders
where S_i is the selected sender and Q is the flowspec.
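The amount reserved on a link under each style can be sketched as below. The function names are hypothetical, and requests are reduced to a single bandwidth number for illustration.

```python
def wildcard_filter(receiver_requests):
    """One shared reservation: the largest request from any receiver."""
    return max(receiver_requests)


def fixed_filter(request_per_sender):
    """Distinct reservation per sender S_i: the link total is the sum of all Q_i."""
    return sum(request_per_sender.values())


def shared_explicit(senders, q):
    """One reservation Q shared by the listed senders, whatever their number."""
    return q if senders else 0
```

For two senders asking 64 and 128 units, Fixed Filter reserves 192 units on the link, while a Wildcard Filter reservation would be sized by the largest receiver request alone.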
flowspec and filterspec
flowspec is used to set parameters in the
router’s packet scheduler.
flowspec (Flow Specification) consists of traffic
specification (Tspec) (T for traffic) and a
service request specification (Rspec) (R for
reserve).
Tspec describes the sender’s traffic
characteristics, i.e., it specifies the traffic
behavior of the flow in terms of a token bucket.
Flow Specification (flowspec)
Rspec reserves a service class which defines
the requested QoS,
i.e., it specifies the requested QoS in terms
of bandwidth, packet delay or packet loss.
flowspec is carried by RSVP messages into
the network and defines the application’s QoS
requirements as a series of objects, such as
token bucket parameters.
Traffic Control Components (filterspec)
filterspec (Filter Specification) provides the information
required by the packet classifier to identify the packets
that belong to the flow.
Classifier - examines the source and destination
addresses, and port number fields in each packet to
determine what class the packet belongs to.
Scheduler - determines which packet will be served
next.
Admission Control - determines whether a new flow
can be granted the requested QoS without affecting
other flows existing in the network.
Traffic Classes Components
Best-Effort - same as in the traditional
IP networks.
Controlled Load - approximates best-effort service
over an uncongested network.
Guaranteed Service - supports real-time
traffic flows that require a delay bound.
Controlled Load Service
Under CL service, the packets of a given flow will
experience loss and delays comparable to a network
with a light traffic load, assuming the flow complies
with the traffic contract.
No guarantees are provided but both loss probability
and delay are expected to be very low.
The application provides the network with an estimate
of the traffic it will generate.
This estimate is done by specifying the data flow’s
desired traffic parameters (Tspec) to the network
element.
Controlled Load Service
Tspec (Traffic Specification) Model:
It is a refinement of the Token Bucket model.
A source characterizes itself with the following
SENDER-Tspec (traffic characteristics) parameters:
* Token bucket rate r (bytes/sec) and size
b (bytes)
* Peak data rate p
* Minimum policed unit m
* Maximum packet size M
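A minimal sketch of how a policer might check packets against these Tspec parameters. The class name is hypothetical, and the peak rate p is deliberately ignored to keep the example short.

```python
class TokenBucket:
    """Token bucket check for Tspec parameters r, b, m, M (peak rate p ignored)."""

    def __init__(self, r, b, m, M):
        self.r = r              # token rate, bytes/sec
        self.b = b              # bucket size, bytes
        self.m = m              # minimum policed unit, bytes
        self.M = M              # maximum packet size, bytes
        self.tokens = b         # the bucket starts full
        self.last = 0.0         # time of the last update, seconds

    def conforms(self, now, size):
        """Return True if a packet of `size` bytes arriving at `now` is in profile."""
        # Add tokens for the elapsed time, never exceeding the bucket size.
        self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
        self.last = now
        if size > self.M:
            return False                # oversized packets are non-conformant
        cost = max(size, self.m)        # packets smaller than m are counted as m
        if cost > self.tokens:
            return False
        self.tokens -= cost
        return True
```

Over any interval of length T, a conforming flow can send at most r*T + b bytes, which is the bound the network relies on when it allocates resources.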
Controlled Load Service
Admission Control is performed in order to deliver the expected
QoS.
Traffic flows are policed.
Non-conformant packets are either dropped or delivered when
possible using the best-effort service.
Packets larger than the agreed maximum packet size will also be
considered as non-conformant.
Adaptive real-time applications are supposed to use the controlled
load service.
These applications perform well when the network is not heavily
loaded, but suffer rapid degradation in performance as the
network load increases.
Guaranteed Service
GS guarantees the packets will arrive within a
certain delivery time, and that they will not be
discarded due to queue overflow, provided that
the flow’s traffic complies with the traffic
contract.
GS also uses the Tspec model.
The service is requested by a sender
specifying Tspec and the receiver subsequently
requesting a desired service level (Rspec).
Guaranteed Service
Rspec (Reservation Specification) Model:
Works together with the Tspec model to guarantee a desired
service level.
The desired service level is described using the following
parameters (R data rate and S slack term)
in addition to r,b,p,m and M used for CL service:
– Data rate R is measured in the same units as r and must
be equal to or more than r (token rate). R reflects the
theoretical service rate that, at each router, will result in
a desirable delay bound.
– Slack term S is measured in microsec and reflects how far
each router is allowed to deviate from the ideal delay
bound, i.e., the difference between the desired delay and
the delay obtained by using a reservation level R.
REMARK: Larger values for R and smaller values for S represent
stricter delay bounds.
Guaranteed Service
Making use of TSpec and RSpec, a certain
amount of bandwidth and buffer space is
allocated at each node for each flow.
Resources are allocated using worst-case
analysis.
Upper bounds for the end-to-end delay and
the packet loss probability can be evaluated
mathematically.
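As an illustration of such a bound, the sketch below evaluates the end-to-end queueing delay bound in the form given by the Guaranteed Service specification (RFC 2212). The formula is taken from that specification, not from these slides; c_tot and d_tot stand for the accumulated rate-dependent and rate-independent error terms and default to zero here.

```python
def gs_delay_bound(r, b, p, M, R, c_tot=0.0, d_tot=0.0):
    """Worst-case queueing delay (seconds) for Tspec (r, b, p, M) served at rate R.

    Rates are in bytes/sec, sizes in bytes. Requires R >= r.
    """
    if R < r:
        raise ValueError("reserved rate R must be at least the token rate r")
    if p > R:
        # The burst drains at R while the source may still send at its peak rate p.
        burst_term = (b - M) / R * (p - R) / (p - r)
    else:
        # With R >= p no backlog builds up beyond a single packet.
        burst_term = 0.0
    return burst_term + (M + c_tot) / R + d_tot
```

For r = 100000, b = 10000, p = 200000, M = 1000 and R = 150000, the bound is about 36.7 ms; raising R to the peak rate reduces it to 5 ms, which shows why a larger R gives a stricter delay bound.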
SIGNALING and ADMISSION CONTROL
Sources emit regular PATH messages downstream
toward the receiver(s) for reservation
Two message objects relevant to IntServ are
carried in PATH messages: SENDER_Tspec
(describing the traffic) and ADspec (modified at
each hop to reflect the network characteristics
between source and receiver).
ADspec informs the receiver which service classes
(CL, GS or both) are appropriate for the traffic.
Along the way, IntServ capable routers may modify
the relevant ADspec fields to reflect restrictions or
modifications required by the network.
SIGNALING and ADMISSION CONTROL
Receiver(s) respond with Resv messages upstream
toward the sender
Receiver uses the SENDER_Tspec and (possibly
modified) ADspec to determine which parameters to
send back upstream in a flowspec element.
flowspec selects either CL or GS and carries
parameters required by the routers along the upstream
path to determine whether the request can be honored
or not.
One message object relevant to IntServ is carried in
Resv messages: flowspec (describing the receiver’s
desired QoS service to be applied to the sources’
traffic).
IntServ Drawbacks
Scalability – per-flow resource reservation.
Flexibility – IntServ provides a small number
of pre-specified traffic classes: Guaranteed
and Controlled Load Services.
Efficiency – The Guaranteed Service of the
IntServ model is based on the worst case
analysis and thus, is very conservative.
Moreover, bandwidth and delay requirements
are coupled, causing network inefficiency.
Resource Reservation Protocol Drawbacks
Complicated RSVP signaling (unidirectional,
frequent refresh messages).
The current version of RSVP lacks both
adequate security mechanisms to prevent
unauthorized parties from instigating
theft-of-service attacks, and policy control.
Looking for a New Solution…
Because of the difficulty in
implementing and deploying IntServ and
RSVP, the IETF proposed the
Differentiated Services (DiffServ)
architecture
Differentiated Services (DiffServ)
Solves scalability and flexibility problems
Forces as much complexity as possible to the
edge nodes which process lower volumes of
traffic and fewer flows.
Offers service per aggregate traffic, rather than
per flow.
Reservations are made for a set of related flows.
It does not require new applications or extensive
router upgrades.
It does not define specific services or service
classes, as IntServ does.
Differentiated Services
The objective of the DiffServ is
to propose a small, well-defined set
of building blocks from which
a variety of services may
be constructed.
Complexity is moved from
the core of the network to
the edge of the network.
Packet forwarding in the
core network is simple and
per-aggregate rather than
per-flow.
Differentiated Services
A DiffServ Domain is a set of contiguous DS nodes defining the same per
hop behaviors (PHBs) and under the same policy strategy.
A DS domain consists of DS interior, edge, and boundary nodes.
A boundary node interconnects the DS domain to other DS or
non-DS-compliant nodes.
Edge and interior nodes only connect to other interior, edge, or boundary
nodes within the same DS domain.
Differentiated Services
The DS byte carries the
DSCP (DiffServ Code Point),
which specifies the forwarding
treatment (or per-hop
behavior) to be used for
a packet.
The DS byte coincides
with the TOS octet in
IPv4 and the Traffic
Class octet in IPv6.
Edge and Core Nodes
Edge nodes handle a relatively small number of traffic
flows.
Therefore, they can execute per-flow traffic
management.
Edge nodes are responsible for policing and shaping.
They are also responsible for admission control, if any.
Core nodes handle a large number of traffic flows.
They perform per-aggregate rather than per-flow
traffic management.
Basic Approach
• Traffic is divided into a small number of
groups called forwarding classes
• Forwarding class that a packet belongs to
is encoded into a field in the IP packet
header.
• Each forwarding class represents a
predefined forwarding treatment in terms
of drop priority and bandwidth allocation.
Basic Approach (cont.)
Achieves scalability by implementing traffic classification
and conditioning functions at network boundary nodes
Classification involves mapping packets to different
forwarding classes.
Conditioning: checking whether traffic flows meet the
service agreement and dropping/remarking non-conformant
packets.
Interior nodes forward packets based solely on the
forwarding class.
Basic Approach (cont.)
Resource allocation for aggregated traffic rather than
individual flows
Performance assurance to individual flows in a
forwarding class provided through prioritization and
provisioning rather than per-flow reservation
Traffic policing on the edge and class-based forwarding
in the core
Define forwarding behaviors not services
Basic Approach (cont.)
Guarantee by provisioning rather than reservation
Allocate resources to forwarding class and control the
amount of traffic for these classes
Provides only service assurance; no BW or delay
guarantee
Based on SLAs, not dynamic signaling
Focus on a single domain, not end-to-end
Forwarding classes can be defined for a single domain
and between domains service providers can extend or
map their definitions through bilateral agreement
Services and Forwarding Treatment
Two important concepts in DiffServ architecture
Forwarding treatment refers to the externally observable
behavior of a specific algorithm or mechanism that is
implemented in a node, e.g., Expedited Forwarding (using
a priority queue)
Service is defined by the overall performance that a
customer’s traffic receives, e.g., a no-loss service
provided by Expedited Forwarding
Per Hop Behavior (PHB)
Forwarding treatments at a node
Each PHB is represented by a 6-bit value called DSCP
All packets with the same code points are referred to as
a behavior aggregate (BA) and they receive the same
forwarding treatment.
PHB (cont.)
Describe forwarding behavior in either relative or absolute
terms
* Minimal BW for BA: absolute term
* Allocate BW proportionally: relative
Typically implemented by means of buffer management and
packet scheduling.
Per-Hop Behavior
The PHB defines the service a packet receives at each
hop as it is forwarded through the network.
It is realized through internal queue management and
scheduling techniques.
6 bits of the DS byte (the DSCP) are used to specify the PHB.
Therefore, (2^6) = 64 codepoints exist, of which the 32 of the
form xxxxx0 are set aside for standardized PHBs.
The IETF intends to standardize only a few of them.
Packets marked with different DS byte values should
receive different PHB and, accordingly, should
experience different services in the core network.
Services can be differentiated using appropriate
– Scheduling
– Queue Management
Services (cont.)
SLAs may be static or dynamic
Services can be defined in either quantitative or qualitative
terms
Services may have different scopes:
* All traffic from ingress node A and any egress
nodes
* All traffic between ingress node A and egress node
B
IETF Per-Hop Behaviors
The IETF DiffServ Working Group is
finishing work on two PHBs:
– Expedited Forwarding (EF)
– Assured Forwarding (AF)
Expedited Forwarding PHB
The EF PHB was designed to support low loss, low delay, and low
jitter connections.
It appears as a point-to-point virtual leased line (VLL) service
between endpoints with a peak bandwidth.
To minimize jitter and delay, packets must spend little or no
time in router queues.
Therefore, the EF PHB requires that the traffic be conditioned
to conform to the peak rate at the boundary, and the network
of routers be provisioned such that this peak rate is less than
the minimum packet departure rate at each router in the
network.
The EF PHB uses a single DSCP value to indicate that the packet
should be placed in a high-priority queue on the outbound link of
each router hop.
Assured Forwarding PHB
The AF PHB defines four relative classes of service
with each service supporting three levels of drop
precedence.
Twelve distinct DSCP bit combinations define the AF
classes and the drop precedence within each class.
When congestion is encountered at a router, packets
with a higher drop precedence will be discarded ahead
of those with a lower drop precedence.
The four AF classes define no specific bandwidth or
delay constraints other than that AF class 1 is distinct
from AF class 2, and so on.
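The twelve AF codepoints follow a regular bit layout, sketched below. The layout (class in the three high-order bits, drop precedence in the next two, last bit zero) comes from the AF specification (RFC 2597), not from these slides; the function name is hypothetical.

```python
def af_dscp(af_class, drop_precedence):
    """Return the 6-bit DSCP for AF class 1..4 and drop precedence 1..3."""
    if af_class not in (1, 2, 3, 4):
        raise ValueError("AF class must be 1..4")
    if drop_precedence not in (1, 2, 3):
        raise ValueError("drop precedence must be 1..3")
    # Bits: ccc dd 0  (ccc = class, dd = drop precedence)
    return (af_class << 3) | (drop_precedence << 1)


# All twelve combinations, keyed by the usual AFxy names.
AF_CODEPOINTS = {
    f"AF{c}{d}": af_dscp(c, d) for c in (1, 2, 3, 4) for d in (1, 2, 3)
}
```

For example, AF11 maps to 001010 and AF43 to 100110, so a router can read the class and the drop precedence directly from separate bit groups.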
Services
Describes the overall treatment of a customer’s traffic
within a DS domain or end-to-end.
This is what is visible to the customers; PHBs are hidden
inside the network node.
Realizing a service requires many components to work
together:
* Mapping of traffic to specific PHBs,
* Traffic conditioning at the boundary,
* Network provisioning,
* PHB-based forwarding in the core
Services (cont.)
In Diffserv, services are defined in the form of a Service
Level Agreement (SLA) between a customer and its service
provider
One important element of SLA in Diffserv is the Traffic
Conditioning Agreement (TCA).
TCA details the service parameters for traffic profiles and
policing actions.
Services (cont.)
This may include
Traffic profiles, such as token bucket
parameters for each of the classes
Performance metrics: throughput, delay
Actions for non-conformant packets
In addition to TCA, an SLA may also contain other
characteristics and business-related agreements such as
availability, security, monitoring, auditing, billing.
Packet Classifier and Traffic Conditioner
[Figure: block diagram with Packets, Classifier, Meter, Marker, Shaper, Dropper]
Traffic Conditioning Components
[Figure: Packets, Classifier, Meter, Marker, Shaper & Dropper]
– Meter: A meter measures the temporal properties of the stream of
packets selected by the classifier against a traffic profile.
– Marker: A packet is marked by setting its DS field to a particular
codepoint. The packet now belongs to a certain behavior aggregate.
– Shaper: A shaper holds (delays) some or all the packets in a traffic
stream to bring the stream into compliance with the traffic
profile.
– Dropper: A dropper discards some or all the packets in a traffic
stream to bring the stream into compliance with the traffic profile.
Classifier
Divides an incoming packet stream into multiple groups
based on predefined rules
Two basic types of classifiers:
* Behavior Aggregate (BA)
* Multifield (MF)
BA classifier selects packets based solely on the DSCP
(DiffServ Code Point) value in the packet header
BA classifier is used when DSCP has been set (marked)
before the packet reaches the classifier
Classifier (Cont.)
MF classifier uses a combination of one or more fields of
the five-tuple
(src addr, src port, dest addr, dest port, proto ID)
in the packet header for classification
Classification policies may specify a set of rules and
corresponding DSCP values for marking the matched
packets
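A minimal sketch of an MF classifier, assuming exact-match rules only. The Packet type, the rule format, and the function name are hypothetical; real classifiers also support prefixes and port ranges.

```python
from collections import namedtuple

# The five-tuple named on the slide.
Packet = namedtuple("Packet", "src_addr src_port dst_addr dst_port proto")


def mf_classify(packet, rules, default_dscp=0):
    """Return the DSCP of the first rule whose listed fields all match.

    `rules` is an ordered list of (match, dscp) pairs, where `match` maps
    five-tuple field names to required values; unlisted fields match anything.
    """
    for match, dscp in rules:
        if all(getattr(packet, field) == value for field, value in match.items()):
            return dscp
    return default_dscp        # no rule matched: leave the packet as best-effort
```

A rule set such as "UDP to port 5004 gets DSCP 46, anything from 10.0.0.1 gets DSCP 10" is then written as a two-entry list, and the marker sets the returned DSCP in the DS field.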
Traffic Conditioner
Performs traffic policing function to enforce the TCA
(Traffic Conditioning Agreement) between customer
and service providers
Four basic elements:
•Meter
•Marker
•Shaper and
•Dropper
Meter
For each forwarding class, the meter measures the traffic flow
from a customer against its traffic profile
In-profile packets are allowed to enter the network
Out-of-profile packets are further conditioned based on the TCA
Marker
Sets the DS field of a packet to a particular DSCP,
adding marked packet to forwarding class.
May act on unmarked packets or remark previously
marked packets.
Can occur at different locations:
* Can be marked by the application
* Marked by the first-hop routers
Marker (cont.)
Marking is done on non-conforming packets:
* Packets may be marked with a special DSCP to
indicate non-conformance
* These packets would be dropped first in the event
of network congestion
Since packets travel through different domains, packets
that have been marked may be remarked (to a different
DSCP).
Marker (cont.)
When a packet is remarked with a new DSCP and receives
worse forwarding treatment than under the previous
DSCP:
PHB demotion
With better forwarding treatment:
PHB promotion
Shaper
Shapers delay non-conformant packets in order to bring
the stream into compliance.
A stronger form of policing than marking
Shaping may also be needed at a boundary node to a
different domain (to make sure that the traffic is
conformant before entering the next domain)
Usually has finite buffer, so may also drop packets when
buffer is full
Dropper
Discards packets in a traffic stream in order to bring
the stream into compliance with a traffic profile.
Strongest policing entity
Can be implemented as a special case of a shaper by
setting the shaper buffer size to zero.
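The relationship between shaper and dropper can be sketched with one class. This is an illustrative model, assuming a token bucket profile and a buffer counted in packets; the names are hypothetical, and queued packets are released only when the object is next called.

```python
from collections import deque


class Shaper:
    """Token bucket shaper with a finite buffer; buffer_size=0 acts as a dropper."""

    def __init__(self, rate, bucket, buffer_size):
        self.rate = rate                # token rate, bytes/sec
        self.bucket = bucket            # bucket depth, bytes
        self.buffer_size = buffer_size  # maximum number of delayed packets
        self.tokens = bucket
        self.last = 0.0
        self.queue = deque()            # sizes of the delayed packets

    def _advance(self, now):
        """Add tokens for the elapsed time and release delayed packets that fit."""
        self.tokens = min(self.bucket, self.tokens + self.rate * (now - self.last))
        self.last = now
        released = []
        while self.queue and self.queue[0] <= self.tokens:
            size = self.queue.popleft()
            self.tokens -= size
            released.append(size)
        return released

    def arrive(self, now, size):
        """Return (action, released): action is 'sent', 'queued', or 'dropped'."""
        released = self._advance(now)
        if not self.queue and size <= self.tokens:
            self.tokens -= size
            return "sent", released
        if len(self.queue) < self.buffer_size:
            self.queue.append(size)     # delay the packet until tokens are available
            return "queued", released
        return "dropped", released      # buffer full (or no buffer at all)
```

With buffer_size set to zero, the "queued" branch can never be taken, so every non-conformant packet is discarded, which is exactly the dropper behavior.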
Differentiated Services Field
Uses 6 bits in the IP header to encode forwarding treatment
These 6 bits are those out of the IP TOS field (8 bits long)
DiffServ redefines existing IP TOS field to indicate
forwarding behavior
Replacement field, called DS field supersedes existing
definition of TOS
First 6 bits used as DSCP to encode the PHB, remaining 2
bits are currently unused (CU).
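The field layout above amounts to two bit operations, sketched here. The function names are hypothetical; the example value 101110 is the EF codepoint quoted later in these slides.

```python
def split_ds_byte(ds_byte):
    """Split the 8-bit DS field into (DSCP, CU): first 6 bits and last 2 bits."""
    return ds_byte >> 2, ds_byte & 0b11


def make_ds_byte(dscp, cu=0):
    """Build the DS field from a 6-bit DSCP and the 2 currently unused bits."""
    if not 0 <= dscp < 64 or not 0 <= cu < 4:
        raise ValueError("DSCP must fit in 6 bits and CU in 2 bits")
    return (dscp << 2) | cu
```

A packet carrying DSCP 101110 therefore shows the value 0xB8 in the old TOS octet position.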
Differentiated Services Field (cont.)
The codepoint space is divided into three pools:
Pool 1: xxxxx0 – standards action
Pool 2: xxxx11 – experimental and local use
Pool 3: xxxx01 – experimental and local use but may be subject
to standards action (in case pool 1 is exhausted)
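The three bit patterns can be checked mechanically. In this sketch the pools are numbered 1 to 3 in the order listed above, and the function name is hypothetical.

```python
def dscp_pool(dscp):
    """Return the pool (1, 2, or 3) that a 6-bit DSCP belongs to."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must fit in 6 bits")
    if dscp & 0b1 == 0:
        return 1            # xxxxx0
    if dscp & 0b11 == 0b11:
        return 2            # xxxx11
    return 3                # xxxx01


# Size of each pool across all 64 codepoints.
POOL_SIZES = {pool: sum(1 for d in range(64) if dscp_pool(d) == pool)
              for pool in (1, 2, 3)}
```

The patterns are mutually exclusive and together cover all 64 codepoints: 32 in the first pool and 16 in each of the other two.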
Assured Forwarding (AF)
The basic idea came from RIO scheme
In RIO scheme packets are marked as In or Out
During congestion, out packets are dropped first:
in/out bit indicates drop priorities
AF standard extended the basic in or out marking in RIO
into four forwarding classes and within each forwarding
class, three drop precedences
Assured Forwarding (AF) (cont.)
Customers can subscribe to the service built with AF
forwarding class and their packets will be marked with
appropriate AF DSCPs.
Drop priorities within each forwarding class are used to
select which packets to drop during congestion
When backlogged packets from an AF forwarding class
exceed a specified threshold, packets with the highest drop
priority are dropped first, then packets with lower drop
priority
AF Implementation
Can be implemented as BW partition between
classes and drop priorities within a class
BW partition is specified in terms of minimum BW
Can be achieved by WFQ scheduling and assigning
weights according to min BW requirement
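One simple way to derive WFQ weights from minimum bandwidths is to normalize them, as sketched below. The function name and the class labels are hypothetical, and normalizing to a sum of 1 is only one possible convention.

```python
def wfq_weights(min_bw):
    """Map each class's minimum bandwidth to a WFQ weight; weights sum to 1."""
    total = sum(min_bw.values())
    if total <= 0:
        raise ValueError("at least one class needs a positive minimum bandwidth")
    return {cls: bw / total for cls, bw in min_bw.items()}
```

Under WFQ a backlogged class with weight w receives at least w times the link rate, so the minimum bandwidths are met as long as their sum does not exceed the link rate.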
AF Implementation (cont.)
AF standard specifies certain properties
Attempt to minimize short-term fluctuation in congestion:
Some smoothing function should be applied.
Dropping mechanism should be insensitive to the short term traffic
characteristics and discard packets from flows of the same long
term characteristics with equal probability:
Use random function for dropping
Discard rate of a flow within a drop priority should be proportional
to the flow’s percentage of the total amount of traffic passing
through that drop priority level
Can use RED or RIO for dropping
Buffer Management
When a router runs out of buffer
space packets must be dropped.
In DiffServ, dropping decisions
take the DS byte value into
account.
For example if Weighted Random
Early Detection (WRED) is used:
Random Early Detection (RED)
[Figure: RED drop probability P(drop) versus average queue length qlen-avg, with thresholds MIN-thr and MAX-thr and probability levels maxp and 1.0]
RED Algorithm (Cont.)
for each packet arrival
    calculate the average queue size “avg”
    if min-thr <= avg < max-thr
        calculate probability pa
        with probability pa mark the arriving packet
    else if max-thr <= avg
        mark the arriving packet
RED Algorithm (Cont.)
pb = maxp (avg – min_thr) / (max_thr – min_thr)
pa = pb / (1 – count * pb)
count is the number of packets unmarked since the last
packet marking.
pa ensures that the EDGE ROUTER/INGRESS ROUTER
does not wait too long before marking a packet.
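The RED steps and the pb/pa formulas can be put together as in the sketch below. The exponentially weighted moving average and its default weight are assumptions (the slides only say that an average is calculated), and the class name and injectable random source are hypothetical.

```python
import random


class Red:
    """RED marking decision using pb, pa, and count as defined on the slides."""

    def __init__(self, min_thr, max_thr, max_p, weight=0.002, rng=random.random):
        self.min_thr, self.max_thr, self.max_p = min_thr, max_thr, max_p
        self.weight = weight    # assumed EWMA weight for the average queue size
        self.rng = rng          # source of uniform random numbers in [0, 1)
        self.avg = 0.0
        self.count = 0          # packets left unmarked since the last marking

    def on_arrival(self, queue_len):
        """Return True if the arriving packet should be marked."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_thr:
            self.count = 0
            return False
        if self.avg >= self.max_thr:
            self.count = 0
            return True
        pb = self.max_p * (self.avg - self.min_thr) / (self.max_thr - self.min_thr)
        denom = 1 - self.count * pb
        pa = 1.0 if denom <= 0 else pb / denom   # pa grows as count grows
        if self.rng() < pa:
            self.count = 0
            return True
        self.count += 1
        return False
```

Because pa increases with every unmarked packet, markings are spread fairly evenly over time instead of arriving in clusters.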
RED Algorithm (Cont.)
Avoids global synchronization problem by
virtue of its randomness
No bias against bursty traffic
RED-In/Out (RIO)
Uses same mechanism as RED, but is configured with
two sets of parameters,
(in-profile packets and out-profile packets)
Out-packets are dropped more aggressively than in-packets
RED-In/Out (RIO)
P_out = Pmax_out (avg_tot – min_out) / (max_out – min_out)
P_in = Pmax_in (avg_in – min_in) / (max_in – min_in)
(avg_tot is the average queue length of “In” and “Out” packets together)
If avg_tot < min_out, no “Out” packet is dropped,
If avg_tot > max_out, all “Out” packets are dropped
If avg_in < min_in, no “In” packet is dropped,
If avg_in > max_in, all “In” packets are dropped
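The two RIO curves can be evaluated with one helper, as sketched here. The function names and the (min, max, Pmax) tuple layout are hypothetical; "In" packets are judged against the average queue of "In" packets only, "Out" packets against the average of all packets, following the formulas above.

```python
def red_prob(avg, min_thr, max_thr, p_max):
    """Drop probability of one RED curve for average queue length `avg`."""
    if avg < min_thr:
        return 0.0              # below the lower threshold: never drop
    if avg > max_thr:
        return 1.0              # above the upper threshold: always drop
    return p_max * (avg - min_thr) / (max_thr - min_thr)


def rio_drop_prob(is_in, avg_in, avg_total, in_params, out_params):
    """Drop probability for a packet; params are (min, max, Pmax) tuples."""
    if is_in:
        return red_prob(avg_in, *in_params)
    return red_prob(avg_total, *out_params)
```

With in_params = (40, 70, 0.02) and out_params = (10, 30, 0.5), a total average queue of 20 already drops a quarter of the "Out" packets while "In" packets are still untouched.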
RIO (Cont.)
[Figure: RIO drop probabilities: P_in (drop) versus Avg_in with MIN-in, MAX-in, and P_max_in; P_out (drop) versus Avg_tot with MIN-out, MAX-out, and P_max_out; both curves reach 1.0]
RIO (cont.)
Discrimination against out packet is created by carefully
choosing the parameters
(min_in, max_in, Pmax_in) and (min_out, max_out,
Pmax_out)
Drops “out packets” earlier than “in packets”:
done by choosing
min_out < min_in
Drops “out packets” with a higher probability:
Pmax_out > Pmax_in (Congestion Avoidance Phase)
RIO (cont.)
Goes into congestion control phase for “out packets” much
earlier than for “in packets” by choosing
max_out << max_in.
So, RIO drops “out packets” first when it detects some
congestion and drops all “out packets” if congestion persists
Only as a last resort, it may drop “in packets” to control
congestion
If a router is consistently dropping in packets then the
router may be under-provisioned
Expedited Forwarding (EF)
Proposed to characterize a forwarding treatment similar
to that of a simple priority queueing.
The departure rate of the traffic aggregate must equal or
exceed a configurable rate
Should receive this rate independent of load of other
traffic passing through the node
Provides low delay and low loss service
Code point <101110> used for EF PHB
EF Implementation
Several queueing mechanisms can be used to implement EF PHB
Priority queueing with token bucket
1. Priority of EF traffic should be highest in the
system
2. Token bucket is used to limit the total amount of
EF traffic so that other traffic will not starve
WFQ can be used such that the weight assigned to EF
traffic gives it higher relative priority than other traffic
Interoperability with Non-DS-Compliant Node
Non-DS-compliant node is a node that does not implement
some or all of the standardized PHBs.
A special case of a non-DS-compliant node is a legacy
node which implements IPv4 Precedence classification as
defined in RFC1812 and RFC791
Nodes that are non-DS-compliant and not legacy nodes
may exhibit unpredictable forwarding behavior for packets
with non-zero DSCP.
Non-DS-Compliant Node within a Domain
When links connected to a non-DS-compliant node are
lightly loaded, the performance degradation may be
negligible
However, in general, the lack of PHB forwarding at a node will
make it impossible to offer low-delay, low-loss service
Use of a legacy node may be acceptable if the DS domain
restricts itself to the Class Selector codepoints and if the
precedence implementation in the legacy node is compatible
with the services offered along the path
Transit Non-DS-Compliant Domain
A DS domain and a non-DS domain may negotiate how egress
traffic from the DS domain should be marked before entry into the
non-DS domain
When there is no traffic management service available or
no agreement in place, DS domain egress node may remark
the DSCP to zero, under the assumption that non-DS
domain will treat the traffic uniformly as best-effort
traffic
Differentiated Services (DiffServ)
Scalable: Only simple functions in the core, and relatively
complex functions at edge routers (or hosts)
Flexible: Does not define service classes, instead provides
functional components with which service classes can be
built
Simple: Users only specify a qualitative notion of service
[Figure: network diagram showing end hosts, edge routers, and core routers]
DiffServ Drawbacks
The QoS enjoyed by a flow is dependent
on the behavior of the other flows
belonging to the same aggregate.
There are no per-flow guarantees.
IntServ over DiffServ
Since IntServ has scaling issues in the core of the
network, DiffServ was proposed.
IntServ provides guaranteed service per flow whereas
DiffServ only provides assurance for aggregated traffic
Thus, applications would still like to use IntServ up to the
edge of the DiffServ core on the ingress side and from the
edge of the DiffServ core to the end host/router on the
egress side
Hence the need for IntServ over DiffServ
IntServ over DiffServ (cont.)
Requests for Intserv services need to be mapped onto the
underlying capabilities of the Diffserv network:
* Selecting an appropriate PHB for the requested service
* Performing appropriate policing at the edge of the
Diffserv network
* Performing admission control on the Intserv requests
IntServ over DiffServ (cont.)
When a PHB has been selected for a particular Intserv
flow, it may be necessary to communicate the choice to
other network elements, e.g., when marking is not done
at the edge.
Two schemes may be used to achieve this:
* Network Driven Mapping (Default)
* Microflow Separation
IntServ over DiffServ (cont.)
1. Network Driven Mapping
• RSVP-capable routers in the Diffserv network (perhaps at
the edge) may perform the well-known mapping
IntServ over DiffServ (cont.)
2. Microflow Separation
• Boundary nodes at the edge of the Diffserv network police
traffic arriving from outside the Diffserv network
• But this policing is applied to aggregate traffic
MicroFlow Separation
It is therefore possible for a misbehaving microflow to claim
more than its fair share of resources within the
aggregate and degrade the service provided to other
microflows.
This problem can be addressed in several ways, including:
* Provide per-microflow policing at border routers,
but this approach puts a management burden on the
Diffserv region
* Rely on upstream elements to do shaping and
policing
IntServ over DiffServ (cont.)
Two scenarios in this framework:
* Diffserv Network is RSVP-unaware
* Diffserv Network is RSVP-aware
Diffserv Network is RSVP-Unaware
1. The Diffserv network and the customer of this network
have negotiated SLAs, e.g., the amount of bandwidth the Diffserv
network will provide for each SLA.
2. RSVP messages are tunneled through the Diffserv
network without any action being taken.
3. The edge router in the Intserv network will identify the
service level (DSCP) of the flow and will run
admission control to make sure that resources are
available in the Diffserv network at the
corresponding service level.
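Step 3 can be sketched as a simple bookkeeping check. This is a minimal sketch under stated assumptions: the SLA is modeled as a bandwidth budget per DSCP, and the class name `DiffservAdmission`, its methods, and the numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DiffservAdmission:
    """Per-DSCP bandwidth bookkeeping at the Intserv edge router."""
    sla_kbps: dict                                   # DSCP -> bandwidth agreed in the SLA
    reserved_kbps: dict = field(default_factory=dict)  # DSCP -> bandwidth already admitted

    def admit(self, dscp: int, rate_kbps: int) -> bool:
        """Admit a flow only if the SLA for its service level still has room."""
        limit = self.sla_kbps.get(dscp, 0)
        used = self.reserved_kbps.get(dscp, 0)
        if used + rate_kbps > limit:
            return False
        self.reserved_kbps[dscp] = used + rate_kbps
        return True

    def release(self, dscp: int, rate_kbps: int) -> None:
        """Return bandwidth when a flow's reservation is torn down."""
        used = self.reserved_kbps.get(dscp, 0)
        self.reserved_kbps[dscp] = max(0, used - rate_kbps)


# Example: 1000 kb/s negotiated for DSCP 46.
edge = DiffservAdmission(sla_kbps={46: 1000})
first = edge.admit(46, 600)    # fits within the SLA
second = edge.admit(46, 500)   # would exceed the SLA
```

Because the Diffserv network takes no part in RSVP here, the edge router's view of available resources comes only from the negotiated SLA.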
Diffserv Network is RSVP-Aware
1. Border routers and possibly some or all core routers in the Diffserv
network are RSVP-aware.
2. These routers participate in RSVP signaling but schedule
traffic in aggregate (their control plane is RSVP, while their
data plane is Diffserv).
3. The admission control agent is part of the Diffserv network.
Multiprotocol Label Switching (MPLS)
MPLS is a forwarding paradigm.
Choosing the next hop can be thought of as the
composition of two functions:
– Partitioning the entire set of possible
packets into a set of Forwarding
Equivalence Classes (FECs).
– Mapping each FEC to a next hop.
In MPLS,
the assignment of a packet to a particular
FEC is done just once: when the packet
enters the network.
Operation of MPLS
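[Figure: protocol stacks (Physical 1, Link 2, Network 3) in an IP network and in an MPLS network. Each frame carries a layer 2 header, a layer 3 header, and data. At each hop the old layer 2 header is removed and a new layer 2 header is added; inside the MPLS network, forwarding uses a small tag lookup.]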
“Label Substitution” What is it?
One of the many ways of getting from A to B:
• BROADCAST:
Go everywhere, stop when you get to B, never ask for directions.
• HOP-BY-HOP ROUTING:
Continually ask who’s closer to B, go there, repeat … stop when you
get to B. “Going to B? You’d better go to X, it’s on the way”.
• SOURCE ROUTING:
Ask for a list (that you carry with you) of places to go that
eventually lead you to B. “Going to B? Go straight 5 blocks, take
the next left, 6 more blocks and take a right at the lights”.
Label Substitution
Have a friend go to B ahead of you using one of the previous two
techniques.
At every road they reserve a lane just for you.
At every intersection they post a big sign that says, for a given lane,
which way to turn and what new lane to take:
LANE#1 TURN RIGHT USE LANE#2
A Label by Any Other Name ...
There are many examples of label substitution
protocols already in existence.
• ATM - label is called VPI/VCI and travels with the cell.
• Frame Relay - label is called a DLCI and travels with the
frame.
• TDM - label is called a timeslot; it’s implied, like a lane.
• X.25 - label is called an LCN.
• Proprietary tags, etc.
• One day, perhaps, frequency substitution, where the label is a
light frequency?
SO WHAT IS MPLS ?
• Hop-by-hop or source routing
to establish labels
• Uses label native to the media
• Multi level label substitution transport
ROUTE AT EDGE, SWITCH IN CORE
[Figure: IP forwarding at the ingress and egress edges; label switching in the core, with the IP packet carrying labels #L1, #L2, and #L3 on successive hops]
MPLS: HOW DOES IT WORK
[Figure: message exchange between two LSRs over time: UDP Hello in each direction, TCP open, Initialization(s), Label request, then Label mapping (label #L2 for the IP traffic)]
WHY MPLS ?
Leverage existing ATM hardware
Ultra fast forwarding
IP Traffic Engineering
– Constraint-based Routing
Virtual Private Networks
– Controllable tunneling mechanism
Voice/Video on IP
– Delay variation + QoS constraints
Need for MPLS
IP Routing
• Slow
• No path choice towards destination
• No QoS guarantees
• The IP/ATM/SONET/DWDM architecture does not scale to
very large traffic volumes and is not cost-effective
BEST OF BOTH WORLDS
[Figure: spectrum from packet routing (IP), through a hybrid (MPLS + IP), to circuit switching (ATM)]
• MPLS + IP form a middle ground that combines the best of IP and
the best of circuit switching technologies.
• ATM and Frame Relay cannot easily move to the middle, so IP has.
MPLS Terminology
LDP: Label Distribution Protocol
LSP: Label Switched Path
FEC: Forwarding Equivalence Class
LSR: Label Switching Router
LER: Label Edge Router (useful term, not in standards)
Forwarding Equivalence Classes
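[Figure: an LSP from an ingress LER through two LSRs to an egress LER. Packets IP1 and IP2 enter unlabelled, carry the same labels #L1, #L2, and #L3 on successive hops, and leave unlabelled.]
Packets are destined for different address prefixes but can be
mapped to a common path.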
• FEC = “A subset of packets that are all treated the same way by a router”
• The concept of FECs provides for a great deal of flexibility and scalability
• In conventional routing, a packet is assigned to a FEC at each hop (i.e., an L3
look-up); in MPLS this is done only once, at the network ingress
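The once-only classification at the ingress can be sketched as follows. This is a minimal illustration, not an LER implementation: the prefix lengths, FEC names, and label values are hypothetical, and a real ingress may classify on more than the destination address.

```python
import ipaddress

# Hypothetical FEC table at an ingress LER: several prefixes share one FEC.
FEC_BY_PREFIX = {
    ipaddress.ip_network("47.1.0.0/16"): "fec-east",
    ipaddress.ip_network("47.2.0.0/16"): "fec-east",
    ipaddress.ip_network("47.3.0.0/16"): "fec-west",
}
# Hypothetical label bound to each FEC on the outgoing link.
LABEL_BY_FEC = {"fec-east": 17, "fec-west": 18}


def classify(dst: str):
    """Map a destination address to a FEC using the longest matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FEC_BY_PREFIX if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return FEC_BY_PREFIX[best]


def ingress_label(dst: str):
    """Return the label pushed at the ingress, or None if no FEC matches."""
    fec = classify(dst)
    return None if fec is None else LABEL_BY_FEC[fec]
```

Packets to 47.1.1.1 and 47.2.9.9 have different destination prefixes yet receive the same label, so every LSR after the ingress treats them identically without another L3 look-up.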
LABEL SWITCHED PATH (vanilla)
[Figure: a tree of LSRs converging on one destination, with a label on each link (#216, #14, #311, #99, #963, #612, #5, #462)]
- A Vanilla LSP is actually part of a tree from every
source to that destination (unidirectional).
- Vanilla LDP builds that tree using existing IP forwarding
tables to route the control messages.
IP FORWARDING USED BY HOP-BY-HOP CONTROL
[Figure: three routers forward a packet addressed to IP 47.1.1.1 hop by hop toward network 47.1; networks 47.2 and 47.3 attach to other interfaces. Each router holds a forwarding table of the form:]
Dest   Out
47.1   1
47.2   2
47.3   3
MPLS Label Distribution
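[Figure: the ingress LSR sends “Request: 47.1” downstream; “Mapping: 0.40” is returned upstream, and label 0.50 is used on the first hop. Networks 47.1, 47.2, and 47.3 attach to the egress side. The resulting tables are:]

Ingress LSR
Intf In   Dest   Intf Out   Label Out
3         47.1   1          0.50

Transit LSR
Intf In   Label In   Dest   Intf Out   Label Out
3         0.50       47.1   1          0.40

Egress LSR
Intf In   Label In   Dest   Intf Out
3         0.40       47.1   1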
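The request/mapping exchange can be sketched as a recursive call along the path. This is a simplified model, not LDP itself: it assumes the request travels all the way to the egress before any mapping is returned, it uses plain integers (50, 40) in place of the slide's 0.50 and 0.40, and the `Lsr` class and `setup_lsp` function are hypothetical names.

```python
from itertools import count


class Lsr:
    """A label switching router with a private label allocator."""

    def __init__(self, name: str, first_label: int):
        self.name = name
        self.next_hop = None             # next LSR toward the destination
        self._free = count(first_label)  # hypothetical label allocator
        self.label_in = {}               # FEC -> label advertised upstream
        self.label_out = {}              # FEC -> label learned from downstream

    def label_request(self, fec: str) -> int:
        """Forward the request downstream, then answer with our own mapping."""
        if self.next_hop is not None:
            self.label_out[fec] = self.next_hop.label_request(fec)
        self.label_in[fec] = next(self._free)
        return self.label_in[fec]        # this return value is the label mapping


def setup_lsp(ingress: Lsr, fec: str) -> None:
    """The ingress only needs an outgoing label for the FEC."""
    ingress.label_out[fec] = ingress.next_hop.label_request(fec)


ingress, transit, egress = Lsr("ingress", 100), Lsr("transit", 50), Lsr("egress", 40)
ingress.next_hop, transit.next_hop = transit, egress
setup_lsp(ingress, "47.1")
```

After `setup_lsp`, the transit LSR holds incoming label 50 and outgoing label 40 for 47.1, mirroring the middle table on the slide.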
Label Switched Path (LSP)
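[Figure: a packet addressed to IP 47.1.1.1 enters at the ingress LSR, follows the LSP toward network 47.1, and leaves the egress LSR as a plain IP packet. The tables along the path are:]

Ingress LSR
Intf In   Dest   Intf Out   Label Out
3         47.1   1          0.50

Transit LSR
Intf In   Label In   Dest   Intf Out   Label Out
3         0.50       47.1   1          0.40

Egress LSR
Intf In   Label In   Dest   Intf Out
3         0.40       47.1   1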
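Forwarding along this LSP is a chain of table look-ups. The sketch below transcribes the three tables from the slide; it assumes the packet arrives on interface 3 at every node (as the tables show), and the dictionary and function names are hypothetical.

```python
# Tables transcribed from the slide; labels are kept as strings such as "0.50".
INGRESS_TABLE = {"47.1": (1, "0.50")}        # dest prefix -> (out intf, out label)
TRANSIT_TABLE = {(3, "0.50"): (1, "0.40")}   # (in intf, in label) -> (out intf, out label)
EGRESS_TABLE = {(3, "0.40"): (1, None)}      # out label None: label removed, plain IP

IN_INTF = 3  # assumption: each node receives the packet on interface 3


def forward(dest_prefix: str):
    """Trace a packet across ingress, transit, and egress LSRs."""
    trace = []
    intf, label = INGRESS_TABLE[dest_prefix]       # only L3 look-up on the path
    trace.append(("ingress", intf, label))
    intf, label = TRANSIT_TABLE[(IN_INTF, label)]  # exact match, label swapped
    trace.append(("transit", intf, label))
    intf, label = EGRESS_TABLE[(IN_INTF, label)]   # exact match, label removed
    trace.append(("egress", intf, label))
    return trace


path = forward("47.1")
```

Only the ingress consults the destination; the transit and egress nodes key on the incoming interface and label alone.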
EXPLICITLY ROUTED OR ER-LSP
[Figure: LSRs A, B, and C. The label request carries Route={A,B,C}, and the LSP uses labels #216, #14, #972, and #462 on successive links.]
An ER-LSP follows the route that the source chooses.
In other words, the control message to establish the LSP
(label request) is source routed.
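EXPLICITLY ROUTED LSP (ER-LSP)
[Figure: a packet addressed to IP 47.1.1.1 enters at the ingress LSR, whose table now holds a more specific entry for 47.1.1 that uses a different outgoing interface and label. The tables are:]

Ingress LSR
Intf In   Dest     Intf Out   Label Out
3         47.1.1   2          1.33
3         47.1     1          0.50

Transit LSR
Intf In   Label In   Dest   Intf Out   Label Out
3         0.50       47.1   1          0.40

Egress LSR
Intf In   Label In   Dest   Intf Out
3         0.40       47.1   1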
ER LSP - Advantages
• Operator has routing flexibility (policy-based, QoS-based)
• Can use routes other than shortest path
• Can compute routes based on constraints in the same
manner as ATM, using a distributed topology
database (traffic engineering)
MPLS Link Layers
•MPLS is intended to run over multiple link layers
•Specifications for the following link layers currently exist:
• ATM: label contained in VCI/VPI field of ATM header
• Frame Relay: label contained in DLCI field in FR header
• PPP/LAN: uses ‘shim’ header inserted between L2 and L3
headers
Translation between link-layer types must be supported
MPLS is intended to be “multi-protocol” below as well as above
MPLS Encapsulation - ATM
ATM LSR constrained by the cell format imposed by existing ATM standards
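[Figure: ATM header format (5 octets: VPI, VCI, PT, CLP, HEC) with three label placements. Option 1: one label in the VPI and one in the VCI. Option 2: a single combined label across VPI and VCI. Option 3: the ATM VPI identifies a tunnel and the label is in the VCI. The network layer header and packet (e.g., IP), preceded by the generic label encapsulation (PPP/LAN format, entries 1 to n) and followed by the AAL5 trailer, form an AAL5 PDU frame (n x 48 bytes), which ATM SAR splits into cells of ATM header plus 48-byte payload.]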
• Top 1 or 2 labels are contained in the VPI/VCI fields of ATM header
- one in each or single label in combined field, negotiated by LDP
• Further fields in stack are encoded with ‘shim’ header in PPP/LAN format
- must be at least one, with bottom label distinguished with ‘explicit NULL’
• TTL is carried in top label in stack, as a proxy for ATM header (that lacks TTL)
MPLS Encapsulation - PPP & LAN Data Links
[Figure: a frame consisting of the Layer 2 header (e.g., PPP, 802.3), MPLS ‘shim’ headers 1 to n, and the network layer header and packet (e.g., IP)]
Label Stack Entry Format (4 octets): Label | Exp. | S | TTL
Label: Label Value, 20 bits (0-15 reserved)
Exp.: Experimental, 3 bits (was Class of Service)
S: Bottom of Stack, 1 bit (1 = last entry in label stack)
TTL: Time to Live, 8 bits
• Network layer must be inferable from value of bottom label of the stack
• TTL must be set to the value of the IP TTL field when packet is first labelled
• When last label is popped off stack, MPLS TTL to be copied to IP TTL field
• Pushing multiple labels may cause length of frame to exceed layer-2 MTU
- LSR must support “Max. IP Datagram Size for Labelling” parameter
- any unlabelled datagram greater in size than this parameter is to be fragmented
MPLS on PPP links and LANs uses ‘Shim’ Header Inserted
Between Layer 2 and Layer 3 Headers
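The 4-octet label stack entry can be packed and unpacked with a few shifts. This is a minimal sketch that follows the field widths listed above (20-bit label, 3-bit Exp., 1-bit S, 8-bit TTL) in network byte order; the function names and the default TTL of 64 are assumptions for illustration.

```python
import struct


def pack_entry(label: int, exp: int, bottom: bool, ttl: int) -> bytes:
    """Encode one label stack entry as 4 octets in network byte order."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def unpack_entry(data: bytes):
    """Decode 4 octets into (label, exp, bottom_of_stack, ttl)."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF


def pack_stack(labels, exp: int = 0, ttl: int = 64) -> bytes:
    """Encode a label stack, top label first; only the last entry has S = 1."""
    last = len(labels) - 1
    return b"".join(pack_entry(lbl, exp, i == last, ttl) for i, lbl in enumerate(labels))


entry = pack_entry(16, 0, True, 64)
stack = pack_stack([100, 200])
```

Each extra label adds 4 octets to the frame, which is why pushing several labels can exceed the layer-2 MTU.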
MPLS & ATM
Several Models for running MPLS on ATM:
1. Label-Controlled ATM:
• Use ATM hardware for label switching
• Replace ATM Forum SW by IP/MPLS
[Figure: IP Routing and MPLS running on top of ATM hardware]
Label-Controlled ATM
• Label switching is used to forward network-layer packets
• It combines the fast, simple forwarding technique of ATM with
network layer routing and control of the TCP/IP protocol suite
[Figure: a label switching router with ports A, B, C, and D. Network layer routing (e.g., OSPF, BGP4) builds the forwarding table: the switched path topology is formed using network layer routing (i.e., the TCP/IP technique). The forwarding table pairs labels with ports (entries B 17 and C 05). IP packets carrying labels 05 and 17 are forwarded by swapping short, fixed-length labels (i.e., the ATM technique).]
ATM Label Switching is the combination of L3 routing and L2 ATM switching
MPLS Over ATM
[Figure: two MPLS LSRs connected across an ATM network]
Two models: VP and VC
Internet Draft: VCID notification over ATM Link
Ships in the Night
[Figure: two devices, each combining an LSR and an ATM switch, carrying both MPLS and ATM traffic between them]
ATM and MPLS control planes both run on the
same hardware but are isolated from each
other, i.e. they do not interact.
This allows a single device to simultaneously
operate as both an MPLS LSR and an ATM
switch.
Important for migrating MPLS into an ATM
network
Ships in the Night Requirements
Resource Management
–VPI.VCI Space Partitioning
–Traffic management
•Bandwidth Reservation
•Admission Control
•Queuing & Scheduling
•Shaping/Policing
–Processing Capacity
Bandwidth Management
[Figure: three ways to divide port capacity between MPLS and ATM. A. Full Sharing: a single pool shared by MPLS and ATM. B. Protocol Partition: two pools of 50% each, one for MPLS and one for ATM. C. Service Partition: two pools of 50% each, divided by service class (rt-VBR ATM with COS2; nrt-VBR ATM with COS1). Each pool shows its available capacity.]
• Bandwidth Guarantees
• Flexibility
ATM Merge
Multipoint-to-point capability
Motivation
– Stream Merge to achieve scalability in MPLS:
• O(n) VCs with Merge as opposed to O(n^2) for
full mesh
• fewer labels required
– Reduce number of receive VCs on terminals
Alternatives
– Frame-based VC Merge
– Cell-based VP Merge
Stream Merge
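[Figure: Non-VC merging (N in, N out): input cell streams on VCs 1, 2, and 3 are mapped to distinct outgoing VCs 7, 6, and 9, so the output stream 6 7 9 6 7 9 6 7 interleaves cells that remain distinguishable.]
[Figure: VC merging (N in, 1 out): input VCs 1, 2, and 3 are all mapped to outgoing VC 7. Merging cell by cell causes the AAL5 cell interleaving problem; merging frame by frame gives no cell interleaving.]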
VC-Merge: Output Module
[Figure: per-VC reassembly buffers feed a merge stage, followed by the output buffer]
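The role of the reassembly buffers can be sketched as follows. This is a simplified model, not switch firmware: each cell is a tuple `(in_vc, payload, end_of_frame)`, the outgoing VC number 7 is taken from the earlier figure, and the function name `vc_merge` is hypothetical.

```python
from collections import defaultdict


def vc_merge(cells, out_vc: int = 7):
    """Merge several incoming VCs onto one outgoing VC, frame by frame.

    cells: iterable of (in_vc, payload, end_of_frame) in arrival order.
    Returns the output stream as (out_vc, payload, end_of_frame) tuples.
    """
    buffers = defaultdict(list)  # one reassembly buffer per incoming VC
    output = []
    for in_vc, payload, end_of_frame in cells:
        buffers[in_vc].append((out_vc, payload, end_of_frame))
        if end_of_frame:
            # The whole frame has arrived: release its cells contiguously.
            output.extend(buffers.pop(in_vc))
    return output


# Two frames whose cells arrive interleaved on VCs 1 and 2.
arrivals = [(1, "a1", False), (2, "b1", False), (1, "a2", True), (2, "b2", True)]
merged = vc_merge(arrivals)
```

In `merged` the cells of each frame are contiguous (a1, a2, b1, b2), so the receiver can reassemble frames even though every cell carries the same outgoing VC.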
VP-Merge
[Figure: VP merge of incoming VPI=1, VPI=2, and VPI=3 into one outgoing VP, with cells on VCI=1, VCI=2, and VCI=3. Option 1: Dynamic VCI Mapping. Option 2: Root Assigned VCI. No cell interleaving problem, since the VCI is unique.]
–merge multiple VPs into one VP
–use separate VCIs within VPs to distinguish frames
–less efficient use of VPI/VCI space, needs support of SVP
Summary of Motivations for MPLS
• Simplified forwarding based on exact match of fixed length label
- initial drive for MPLS was based on existence of cheap, fast ATM
switches
• Separation of routing and forwarding in IP networks
- facilitates evolution of routing techniques by fixing the forwarding
method
- new routing functionality can be deployed without changing the
forwarding techniques of every router in the Internet
• Facilitates the integration of ATM and IP
- allows carriers to leverage their large investment of ATM
equipment
- eliminates the adjacency problem of VC-mesh over ATM
•Enables the use of explicit routing/source routing in IP networks
- can be easily used for such things as traffic management, QoS
routing
Summary of Motivations for MPLS (cont.)
• Promotes the partitioning of functionality within the network
- move granular processing of packets to edge; restrict core to
packet forwarding
- assists in maintaining scalability of IP protocols in large networks
• Improved routing scalability through stacking of labels
- removes the need for full routing tables from interior routers in
transit domain; only routes to border routers are required
• Applicability to both cell and packet link-layers
- can be deployed on both cell (eg. ATM) and packet (eg. FR,
Ethernet) media
- common management and techniques simplify engineering
Many drivers exist for MPLS above and beyond high speed forwarding
IP and ATM Integration
IP over ATM VCs:
• ATM cloud invisible to Layer 3 routing
• Full mesh of VCs within ATM cloud
• Many adjacencies between edge routers
• Topology change generates many route updates
• Routing algorithm made more complex
IP over MPLS:
• ATM network visible to Layer 3 routing
• Single adjacency possible with edge router
• Hierarchical network design possible
• Reduces route update traffic and the power
needed to process it
MPLS eliminates the “n-squared” problem of IP over ATM VCs
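The “n-squared” growth is easy to quantify. The sketch below assumes n edge routers and counts one bidirectional VC per router pair; the function names are hypothetical.

```python
def full_mesh_vcs(n: int) -> int:
    """Bidirectional VCs needed to connect n edge routers in a full mesh."""
    return n * (n - 1) // 2


def adjacencies_per_router(n: int) -> int:
    """Routing adjacencies each edge router maintains in a full mesh."""
    return n - 1


# Growth of the overlay as edge routers are added.
table = {n: (full_mesh_vcs(n), adjacencies_per_router(n)) for n in (10, 100, 1000)}
```

With 100 edge routers the overlay needs 4950 VCs and 99 adjacencies per router, whereas with IP over MPLS a single adjacency per edge router is possible.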
Traffic Engineering
[Figure: a traffic demand between nodes A, B, C, and D mapped onto the network topology]
Traffic engineering is the process of mapping traffic demand onto a network
Purpose of traffic engineering:
• Maximize utilization of links and nodes throughout the network
• Engineer links to achieve required delay, grade-of-service
• Spread the network traffic across network links, minimize impact of single failure
• Ensure available spare link capacity for re-routing traffic on failure
• Meet policy requirements imposed by the network operator
Traffic engineering key to optimizing cost/performance
Traffic Engineering Alternatives
Current methods of traffic engineering:
• Manipulating routing metrics: difficult to manage
• Using PVCs over an ATM backbone: not scalable
• Over-provisioning bandwidth: not economical
MPLS provides a new method to do traffic engineering (traffic steering)
[Figure: example network in which the ingress node explicitly routes traffic over an uncongested path chosen by traffic engineering (least congestion), avoiding a congested node on the path chosen by the routing protocol (least cost)]
Potential benefits of MPLS for traffic engineering:
• allows explicitly routed paths: operator control
• no “n-squared” problem: scalable
• per-FEC traffic monitoring: granularity of feedback
• backup paths may be configured: redundancy/restoration
MPLS combines benefits of ATM and IP-layer traffic engineering
MPLS Traffic Engineering Methods
• MPLS can use the source routing capability to steer traffic on desired path
• Operator may manually configure these in each LSR along the desired path
- analogous to setting up PVCs in ATM switches
• Ingress LSR may be configured with the path, RSVP used to set up LSP
- some vendors have extended RSVP for MPLS path set-up
• Ingress LSR may be configured with the path, LDP used to set up LSP
- many vendors believe RSVP not suited
• Ingress LSR may be configured with one or more LSRs along the desired path,
hop-by-hop routing may be used to set up the rest of the path
- a.k.a loose source routing, less configuration required
• If desired for control, route discovered by hop-by-hop routing can be frozen
- a.k.a “route pinning”
• In the future, constraint-based routing will offload traffic engineering tasks from
the operator to the network itself
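One common way to picture constraint-based routing is to prune links that cannot satisfy the constraint and then run a shortest-path search on what remains. The sketch below is an illustration under stated assumptions, not the method any particular vendor uses: the only constraint is available bandwidth, and the topology, costs, and the function name `constrained_path` are hypothetical.

```python
import heapq


def constrained_path(links, src, dst, min_bw):
    """Least-cost path from src to dst using only links with enough bandwidth.

    links: {(u, v): (cost, available_bw)} for directed links.
    Returns (path, total_cost), or None if no feasible path exists.
    """
    # Constraint step: drop links that cannot carry the demand.
    graph = {}
    for (u, v), (cost, bw) in links.items():
        if bw >= min_bw:
            graph.setdefault(u, []).append((v, cost))
    # Shortest-path step: Dijkstra over the pruned graph.
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return path, dist
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None


# Hypothetical topology: the cheap path A-B-D has little spare bandwidth.
LINKS = {
    ("A", "B"): (1, 10), ("B", "D"): (1, 10),
    ("A", "C"): (2, 100), ("C", "D"): (2, 100),
}
small = constrained_path(LINKS, "A", "D", min_bw=5)
large = constrained_path(LINKS, "A", "D", min_bw=50)
```

The small demand takes the least-cost path, while the large one is steered onto the longer path with spare capacity; the resulting node list is what an ingress LSR would signal as an explicit route.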
MPLS: Scalability Through Routing Hierarchy
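[Figure: autonomous systems AS1, AS2, and AS3 with border routers BR1-BR4 and interior transit routers TR1-TR4. The ingress router receives a packet and labels it based on the egress router; forwarding in the interior is based on the IGP route; the egress border router pops the label and forwards the packet.]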
• Border routers BR1-4 run an EGP, providing inter-domain routing
• Interior transit routers TR1-4 run an IGP, providing intra-domain routing
• Normal layer 3 forwarding requires interior routers to carry full routing tables
- transit router must be able to identify the correct destination ASBR (BR1-4)
• Carrying full routing tables in all routers limits scalability of interior routing
- slower convergence, larger routing tables, poorer fault isolation
• MPLS enables ingress node to identify egress router, label packet based on interior
route
• Interior LSRs would only require enough information to forward packet to egress
MPLS increases scalability by partitioning exterior routing from interior routing
MPLS: Partitioning Routing and Forwarding
[Figure: routing (based on OSPF, IS-IS, BGP, RIP) populates the forwarding table. Conventional forwarding is based on: classful address prefix? classless address prefix? multicast address? port number? ToS field? MPLS forwarding is based on an exact match on a fixed-length label.]
• Current network has multiple forwarding paradigms
- classful longest prefix match (Class A, B, C boundaries)
- classless longest prefix match (variable boundaries)
- multicast (exact match on source and destination)
- type-of-service (longest prefix match on address + exact match on ToS)
• As routing methods change, new route look-up algorithms are required
- introduction of CIDR
• Next generation routers will be based on hardware for route look-up
- changes will require new hardware with new algorithm
• MPLS has a consistent algorithm for all types of forwarding; partitions routing/fwding
- minimizes impact of the introduction of new forwarding methods
MPLS introduces flexibility through consistent forwarding paradigm
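The difference between the two look-up styles can be shown side by side. This is a minimal sketch with hypothetical prefixes, interface names, and labels; a real router would use a trie or hardware table instead of a linear scan.

```python
import ipaddress

# Hypothetical IP routes: overlapping prefixes of different lengths.
ROUTES = {
    ipaddress.ip_network("47.0.0.0/8"): "if0",
    ipaddress.ip_network("47.1.0.0/16"): "if1",
    ipaddress.ip_network("47.1.1.0/24"): "if2",
}
# Hypothetical label table: incoming label -> (out interface, out label).
LABELS = {17: ("if2", 42)}


def longest_prefix_match(dst: str):
    """Classless IP look-up: the most specific matching prefix wins."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, hop in ROUTES.items():
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return None if best is None else best[1]


def label_lookup(label: int):
    """MPLS look-up: a single exact match on a fixed-length key."""
    return LABELS.get(label)
```

The label look-up does not change when the routing side adopts a new prefix scheme, which is the sense in which MPLS separates routing from forwarding.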
Upper Layer Consistency Across Link Layers
[Figure: MPLS running over Ethernet, PPP (SONET, DS-3, etc.), ATM, and Frame Relay]
• MPLS is “multiprotocol” below (link layer) as well as above (network layer)
• Provides for consistent operations, engineering across multiple technologies
• Allows operators to leverage existing infrastructure
• Co-existence with other protocols is provided for
- eg. “Ships in the Night” operation with ATM, muxing over PPP
MPLS positioned as end-to-end forwarding paradigm
Common Misconceptions
IP QoS is not ready for real, production
networks.
QoS is not useful unless it is deployed end-to-end.
Only ATM networks can support true, end-to-end QoS.