10SLAs&CHs - BNRG - University of California, Berkeley
Berkeley-Helsinki Summer Course
Lecture #10: Service Level
Agreements and Clearinghouses
Randy H. Katz
Computer Science Division
Electrical Engineering and Computer Science Department
University of California
Berkeley, CA 94720-1776
1
Outline
• Applications and Performance
• Service Level Agreements
• Traffic Engineering to Deliver SLAs
• Bandwidth Brokering
• Clearing House
2
Different Applications and
Network Requirements
[Figure: applications plotted by bandwidth requirements (low to high) against latency sensitivity (low to high). Applications shown: streaming video, video conferencing, e-mail with attachments, text e-mail, Internet/intranet, e-commerce, voice, ERP, terminal mode transactions.]
4
Quality of Service
• Application-level QoS
– How well user expectations are qualitatively satisfied
– Clear voice (mean opinion scoring), jitter-free video, etc.
– Implemented at application-level: end-to-end protocols (RTP/RTCP),
application-specific representations and encodings (FEC, interleaving)
• Network-level QoS
– Easier to quantify, measure, and control
– Metrics include available b/w, packet loss rates, etc.
– Elements of a Network QoS Architecture
» QoS Specification (CoS—high vs. best, guarantees)
» Resource management and admission control
» Service verification and traffic policing
» Packet forwarding mechanisms (filters, shapers, schedulers)
» QoS routing
5
Heterogeneous Traffic Behavior
and QoS Requirements
Applications | Traffic Behavior | QoS Requirements
Electronic Mail (SMTP), File Transfer (FTP), Remote Terminal (Telnet) | Small, batch file transfers | Very tolerant of delay; B/w requirement: low; Best effort
HTML Web Browsing | Series of small, bursty file xfer | Tolerant of moderate delay; B/w requirement: varies; Best effort
Client-Server, E-Commerce | Many small 2-way xacts | Sensitive to loss/delay; B/w requirement: low-mod; Must be reliable
IP-based Voice (VoIP), Real Audio | Constant or variable bit rate | Very sensitive to delay/jitter; B/w requirement: low; Requires predictable delay/loss
Streaming Video | Variable bit rate | Very sensitive to delay/jitter; B/w requirement: High, variable; Requires predictable delay/loss
Chen-nee Chuah
6
Technical Strategies for
Achieving Better QoS
[Figure: applications (Internet/intranet, terminal mode transactions, voice over IP, e-mail, video conferencing, streaming video) matched to solutions (cache, queue, packet shaping, largely unsolved).]
7
What is a Virtual Private
Network?
• Alternative to a private network; uses the
open, distributed infrastructure of the
Internet to transmit data between corporate
sites
• Requires support for:
– opaque packet transport
– data security
– Quality of Service Guarantees and/or SLAs
• Provided by a single ISP; methods to span
multiple ISPs not well developed
9
What is a Service Level
Agreement?
• Migration from managing corporate WAN to out-sourcing
connectivity & transport to 3rd-party carrier
• Informal contract between carrier and customer defining
terms of carrier’s responsibility and type and extent of
remuneration if those are not met
– Worst case/average round-trip packet latencies (e.g., 100-300 ms)
– Worst case/average packet loss rates
– Worst case/average bandwidth
– Expected up times between VPN end points (e.g., 99.5%/month)
– Responsiveness to service complaints and outages
• Access availability more important than b/w guarantees
• Extensions: to services beyond transport and to services
among multiple service providers
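As a worked illustration of the up-time figure on this slide, the sketch below converts a monthly availability target into a downtime budget and checks measurements against SLA thresholds. The 30-day month, the `SlaTerms` fields, and the sample values are assumptions made for illustration; they do not come from any real contract.

```python
from dataclasses import dataclass


@dataclass
class SlaTerms:
    """Hypothetical SLA thresholds (field names are illustrative)."""
    max_avg_rtt_ms: float   # bound on average round-trip latency
    max_loss_rate: float    # bound on average packet loss (fraction)
    min_uptime: float       # availability target per month (fraction)


def allowed_downtime_minutes(min_uptime, days_in_month=30):
    """Downtime budget implied by an availability target over one month."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1.0 - min_uptime)


def violations(terms, avg_rtt_ms, loss_rate, downtime_minutes, days_in_month=30):
    """Return the names of the SLA terms that the measurements violate."""
    failed = []
    if avg_rtt_ms > terms.max_avg_rtt_ms:
        failed.append("latency")
    if loss_rate > terms.max_loss_rate:
        failed.append("loss")
    if downtime_minutes > allowed_downtime_minutes(terms.min_uptime, days_in_month):
        failed.append("uptime")
    return failed


terms = SlaTerms(max_avg_rtt_ms=300.0, max_loss_rate=0.01, min_uptime=0.995)
budget = allowed_downtime_minutes(terms.min_uptime)   # about 216 minutes
report = violations(terms, avg_rtt_ms=120.0, loss_rate=0.001, downtime_minutes=300.0)
```

Under these assumptions, 99.5% per month leaves roughly 3.6 hours of permitted downtime, so a month with 300 minutes of outage breaches only the up-time term.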
10
QoS in VPNs
• Obtain differentiated & dependable QoS for
flows belonging to a VPN
• Performance service abstractions:
– PIPE: provides performance guarantees for traffic
between a specific origin and destination pair
– HOSE: provides performance guarantees between an
origin and a set of destinations, and between a node and a
set of origins, i.e. it’s characterized by the “aggregate”
traffic coming from or going into the VPN
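The difference between the two abstractions can be sketched in a few lines: a pipe specification lists one demand per origin-destination pair, while a hose specification keeps only the aggregate traffic each node sends into and receives from the VPN. The function name, the node labels, and the Mb/s values are illustrative assumptions.

```python
from collections import defaultdict


def hose_from_pipes(pipes):
    """Collapse per-pair pipe demands into per-node hose aggregates.

    pipes maps (origin, destination) -> demand in Mb/s.
    Returns (sent, received): aggregate traffic each node sends into
    and receives from the VPN.
    """
    sent = defaultdict(float)
    received = defaultdict(float)
    for (origin, destination), demand in pipes.items():
        sent[origin] += demand
        received[destination] += demand
    return dict(sent), dict(received)


pipes = {("A", "B"): 10.0, ("A", "C"): 5.0, ("B", "C"): 2.0}
sent, received = hose_from_pipes(pipes)
```

The hose form is coarser: the customer only states aggregates per site, and the provider has to cope with any pair-wise split that is consistent with them.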
11
Relationship Between
VPN and SLA
• SLA negotiated between customer & service
provider
– Traffic characteristics and QoS requirements
– In practice, negotiated parameters are coarse grained
• Support for different QoS classes within VPN
– Resources are managed on a VPN-specific basis;
SLAs negotiated for overall VPN rather than each specific
QoS class
» Schedule only at the edges
» Mark packets and schedule within the core
– Resources are managed on an individual QoS basis
12
SLA-VPN Summary
• Different choices for the implementation of
HOSES in VPNs
– Integrated service framework (controlled load,
guaranteed service) with signaling protocol like RSVP
– Differentiated service framework (DS byte of IP header)
– MPLS environment (LSP tree)
• Security
– IPSec is recommended with the VPN (secure tunnels)
– Only limitation is scalability
13
Overview
• Traffic Engineering
– Definitions
– Objectives
– Various approaches
• MPLS-based single-path traffic engineering
• Framework for MPLS-based traffic engineering in
a DiffServ network
15
Traffic Engineering
Definitions: Traffic Trunk
• Traffic Trunks
– Behavior aggregate
» Stream of packets equivalent from a forwarding point of view
– Attributes for traffic engineering
» B/w requirements and traffic characteristics
» QoS requirements
» Routing constraints
» Survivability requirements
[Figure: example network with nodes 0-6 carrying a trunk annotated "10 Mb/s, delay<150ms".]
16
Traffic Engineering Definitions:
Survivability
• Survivability (resilience): achieving a situation
in which capacity is available in some or all
failure conditions to restore (part of) the
affected traffic
• Protection type: 1:1
• Protection resource allocation:
– Dedicated
– Shared
• Restoration mechanism:
– Revertive (primary path can be used)
– Non-revertive (service rolls over when path fails)
17
Traffic Engineering Definitions:
Preemption, Release, Oversubscription
• Preemption
– In failure state, capacity made available by unaffected
trunks to reroute high-priority affected traffic
• Release
– In failure state, capacity allocated to affected trunks can
be released
• Oversubscription
– Capacity allocated on a link is less than the combined
demands of all trunks running over the link (statistical
multiplexing)
18
Traffic Engineering
Objectives
• Goal: efficiently map traffic onto an existing
network in such a way as to optimize
– Utilization of network resources: facilitate the operation
of the network
– Performance of the network: ensure that the network
offers its customers the QoS they purchased
• Requirements:
– Adaptability to changes in the network configuration
– Capability to evolve existing traffic engineering solutions
into new ones with a limited amount of service disruption
– Capability to adhere to administrator-defined policies
19
Traffic Engineering
Resource-oriented Objectives
• Link capacity is allocated
– Utilization is defined as the relative amount of link
capacity that has been allocated
• Load balancing
– Maximizing balance
– Balance is defined as (1 - maximum link utilization)
– Goal: avoiding congestion
• Minimizing capacity usage
– Capacity usage is defined as the sum of all allocated
capacity
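The three definitions on this slide can be computed directly from per-link allocations. The following is a minimal sketch; the link names and the capacity and allocation values are made up for illustration.

```python
def utilization(allocated, capacity):
    """Relative amount of each link's capacity that has been allocated."""
    return {link: allocated.get(link, 0.0) / capacity[link] for link in capacity}


def balance(allocated, capacity):
    """Balance = 1 - maximum link utilization (higher means less congestion risk)."""
    return 1.0 - max(utilization(allocated, capacity).values())


def capacity_usage(allocated):
    """Capacity usage = sum of all allocated capacity."""
    return sum(allocated.values())


capacity = {"a-b": 100.0, "b-c": 100.0, "a-c": 50.0}
allocated = {"a-b": 60.0, "b-c": 20.0, "a-c": 40.0}

util = utilization(allocated, capacity)   # a-c is the most utilized link
bal = balance(allocated, capacity)
usage = capacity_usage(allocated)
```

In this example the a-c link is 80% allocated, so the balance is about 0.2 even though total capacity usage (120 units) is well under the total capacity.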
20
Traffic Engineering
Traffic-oriented Objectives
• Trunks receive a share of the capacity
– Share: the absolute amount of b/w guaranteed to a trunk
in excess of its agreement with the operator
• Maximizing fairness
– Fairness: minimum share relative to a weight measuring
the expected excess bandwidth
– Goal: avoiding arbitrary discrimination of some of the
customers
• Throughput
– Maximizing the sum of all guaranteed bandwidths
– Goal: maximizing revenue
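A small sketch of the two traffic-oriented measures, reading fairness as the minimum over trunks of share divided by weight, and throughput as the sum of guaranteed bandwidths. The trunk names and numbers are illustrative assumptions, and trunks with zero weight are skipped here as a simplification.

```python
def fairness(shares, weights):
    """Minimum share relative to weight, over trunks with positive weight."""
    return min(shares[k] / weights[k] for k in weights if weights[k] > 0)


def throughput(guaranteed):
    """Sum of all guaranteed bandwidths."""
    return sum(guaranteed.values())


shares = {"t1": 4.0, "t2": 3.0}        # bandwidth granted beyond the agreement
weights = {"t1": 2.0, "t2": 1.0}       # expected excess bandwidth per trunk
guaranteed = {"t1": 14.0, "t2": 8.0}   # total guaranteed bandwidth per trunk

fair = fairness(shares, weights)
total = throughput(guaranteed)
```

Here t1 receives the larger share in absolute terms but the smaller share relative to its weight, so it is t1 that determines the fairness value.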
21
Traffic Engineering
Various Approaches
• Manual traffic engineering by a team of experts
• OSPF with optimized weights
• Equal Cost Multi Path (ECMP)
• Optimized Multi-Path (OMP)
• MPLS label switching with constraint-based routing
• MPLS label switching with offline traffic engineering tool
22
MPLS-based Traffic Engineering
Problem Taxonomy
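[Figure: taxonomy tree. TE splits into Multi-Path TE (MPTE) and Single-Path TE (SPTE), each with resource-oriented (RO) and traffic-oriented (TO) variants: MPTE-RO, MPTE-TO, SPTE-RO, SPTE-TO. Problem classes labelled in the figure: linear program (LP), mixed integer LP (MILP), mixed integer nonlinear program (MINLP).]
• Exact algorithm based on MILP reformulation
• Path-fixing heuristics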
23
MPLS-based Traffic Engineering
Network Description
• List of nodes
• List of links:
– Working/protection/total capacity
– Link color
– Utilization
• Failure state description
– Single link failures
– Single node or link failures
24
MPLS-based Traffic Engineering
Trunk Description
• List of source-destination pairs:
– Demand (pipe model)
– Protection type: shared, dedicated, ...
– Protection level: support of partial protection
– Preemption level: support of partial preemption
– (Weighted) share
– List of available paths
25
MPLS-based Traffic Engineering
Routing Description
• Paths are selected from a list of available paths
• Path list construction:
– No survivability:
» k shortest paths
– Survivability:
» overlap definition
» sorted k shortest paths
» paths are added that minimize the overlap with the existing set
[Figure: example network with numbered nodes and links illustrating path selection.]
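The path list construction can be sketched as follows. This is not the tool's actual algorithm: it enumerates loop-free paths by brute force (fine only for small graphs), measures length in hops, and takes "overlap" to mean the number of links shared with paths already chosen. All three choices, and the example topology, are assumptions.

```python
def simple_paths(adj, src, dst, path=None):
    """Yield every loop-free path from src to dst (brute force)."""
    path = [src] if path is None else path
    if src == dst:
        yield list(path)
        return
    for nxt in adj[src]:
        if nxt not in path:
            path.append(nxt)
            yield from simple_paths(adj, nxt, dst, path)
            path.pop()


def links(path):
    """Set of undirected links used by a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def k_shortest(adj, src, dst, k):
    """No survivability: the k shortest paths by hop count."""
    return sorted(simple_paths(adj, src, dst), key=lambda p: (len(p), p))[:k]


def low_overlap_paths(adj, src, dst, k):
    """Survivability: greedily add the path that overlaps least with the set so far."""
    candidates = sorted(simple_paths(adj, src, dst), key=lambda p: (len(p), p))
    chosen, used = [], set()
    while candidates and len(chosen) < k:
        best = min(candidates, key=lambda p: (len(links(p) & used), len(p), p))
        chosen.append(best)
        used |= links(best)
        candidates.remove(best)
    return chosen


# Hypothetical topology: two short routes share link 0-1, one long route is disjoint.
adj = {0: [1, 4], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 6], 4: [0, 5], 5: [4, 6], 6: [5, 3]}
plain = k_shortest(adj, 0, 3, 2)
protected = low_overlap_paths(adj, 0, 3, 2)
```

With this topology the plain list contains two paths that both depend on link 0-1, while the overlap-aware list trades a longer second path for link disjointness.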
26
MPLS-based Single-Path TE
Survivability
• Single node and link failures only
• Fast reroute:
– unique protection path
– no release
• No backup capacity allocation on single-point
of failure links
• Choice between shared/dedicated protection
27
MPLS Support of DiffServ
• DiffServ: Per-Hop Behaviors
– Expedited Forwarding: absolute bandwidth, delay & delay
jitter and packet loss guarantees
– Assured Forwarding: relative bandwidth, delay & delay
jitter and packet loss guarantees
– Best Effort: connectivity guarantee
• MPLS support:
– L-LSP: label inferred, different label per BA
– E-LSP: exp-inferred, different label per OA
28
DiffServ Requirements
• Bandwidth differentiation
– bandwidth & capacity allocation model
– traffic classes
– traffic types & capacity allocation
– setting the excess bitrates
• Delay and delay jitter differentiation
• Loss differentiation
29
Traffic Engineering Model
B/W & Capacity Allocation Model
[Figure: bandwidth guarantee to trunk k, mapped onto capacity allocation on a link.]
• Bandwidth guarantee to trunk k
– From 0 to committed bitrate $d_k$: unconditional guarantee, no oversubscription
– From $d_k$ to peak bitrate $D_k$: conditional guarantee, oversubscription factor $\sigma_k$
– From $D_k$ to $D_k + \Delta_k$ (excess bitrate $\Delta_k$): partial conditional guarantee (fair allocation of remaining capacity, oversubscription)
– Guaranteed bandwidth: $D_k + y_k$, where $y_k$ is the share
• Capacity allocation on link
– Dedicated capacity: $d_k$
– Allocated capacity: $d_k + \sigma_k^{-1}(D_k - d_k + y_k)$
30
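The allocated-capacity expression on this slide, read as d_k + (D_k - d_k + y_k) / s_k with s_k the oversubscription factor, can be evaluated directly. The sketch below assumes that reading; the function name and the sample numbers are illustrative.

```python
def allocated_capacity(d_k, D_k, y_k, s_k):
    """Capacity reserved on a link for trunk k.

    d_k: committed bitrate (backed by dedicated capacity, never oversubscribed)
    D_k: peak bitrate
    y_k: share granted beyond the peak bitrate
    s_k: oversubscription factor applied to everything above d_k
    """
    if s_k <= 0:
        raise ValueError("oversubscription factor must be positive")
    return d_k + (D_k - d_k + y_k) / s_k


# AF-like trunk: 2 Mb/s committed, 10 Mb/s peak, 4 Mb/s share, 4x oversubscription.
af_capacity = allocated_capacity(2.0, 10.0, 4.0, 4.0)

# EF-like trunk: peak equals committed rate, no share, factor 1.
ef_capacity = allocated_capacity(5.0, 5.0, 0.0, 1.0)
```

The AF-like trunk is guaranteed 14 Mb/s but consumes only 5 Mb/s of link capacity, which is the statistical multiplexing gain that oversubscription is meant to capture; the EF-like trunk is allocated exactly its committed rate.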
Traffic Engineering Model
Traffic Classes
class of k | characterisation of k | behaviour required from traffic engineering algorithm
EF | committed bit rate $d_k$; peak bit rate $D_k = d_k$; excess bit rate $\Delta_k = 0$; weight $w_k = 0$; oversubscription factor $\sigma_k = 1$ | guaranteed bandwidth allocation; load balancing; minimising capacity usage
AF 1/2/3/4 | committed bit rate $d_k$; peak bit rate $D_k$; excess bit rate $\Delta_k$; weight $w_k$; oversubscription factor $\sigma_k$ | guaranteed bandwidth allocation; weighted fair bandwidth allocation; maximising throughput
BE | committed bit rate $d_k = 0$; peak bit rate $D_k = 0$; excess bit rate $\Delta_k$; weight $w_k$; oversubscription factor $\sigma_k$ | weighted fair bandwidth allocation; maximising throughput
31
Traffic Engineering Model
Traffic Types & Capacity Allocation
[Figure: link capacity divided into working capacity and protection capacity. Guarantee levels: hard guarantee (no oversubscription), soft guarantee (oversubscription), loose guarantee (oversubscription). Traffic types: nominal traffic (non-excess), burst traffic (non-excess), non pre-emptible excess traffic, pre-emptible excess traffic, unused spare capacity.]
32
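Traffic Engineering Model
Natural Setting of Excess Bitrates
[Figure: trunk k from ingress $s_k$, reached through an access network with access rate $A(s_k)$, across a DiffServ domain to egress $t_k$.]
• Effective access rate: $R(s_k) = \min\{\sum_e x_e,\ A(s_k)\}$, summing over the links $e$ of $G$ at $s_k$
• Excess bitrate of trunk k: weighted fair allocation of unallocated capacity, i.e. weight $w_k$ times what remains of $R(s_k)$ after the EF trunks $K_{EF}(s_k)$ and AF trunks $K_{AF}(s_k)$ at $s_k$ are served, divided by the total weight $\sum_{k' \in K_{AF}(s_k)} w_{k'} + \sum_{k' \in K_{BE}(s_k)} w_{k'}$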
DiffServ Requirements
• Bandwidth differentiation
• Delay and delay jitter differentiation
– Forwarding
– Scheduling
– EF delay
– Non-EF delay
• Loss differentiation
34
Traffic Engineering Model
Forwarding
[Figure: forwarding model between input and output, with buffer acceptance, loss, queueing, and scheduling stages.]
35
Traffic Engineering Model
Scheduling
[Figure: scheduler hierarchy over the EF, AF1-AF4, and BE queues; schedulers labelled SP, WTP, and WFQ; service rates labelled R(EF), R(AF), and R(BE).]
36
DiffServ Requirements
Delay Differentiation: EF Delay
$D_{EF} = D_{queue} + D_{serial} + D_{min}$
• Delay of a single EF-hop
– Markov process, low signaling load
– $D_{queue,SH} = D_{queue,SH}(\rho_{EF}, R, MTU)$
• Delay of a series of EF-hops
– asymmetric EF-load: lightly-heavily loaded links
– $D_{queue,chain} = D_{queue,chain}(\rho_{EF}, R, MTU, H, H_{hl})$
• Busy periods of EF-traffic
• Conclusion
– Delay upper bound expressed as upper bound on EF-load
– Delay jitter < $D_{queue}$
37
DiffServ Requirements
EF Delay Differentiation
[Figure: (1-P) quantiles of the queuing delay of EF packets. Queuing delay [ms] (0 to 20) plotted against EF load [%] (0 to 100), one curve per value of P, with P decreasing across curves.]
38
DiffServ Requirements
Delay Differentiation: Non-EF Delay
• WFQ with service interruptions
– fluid flow assumption
– exponential distributions
• Conclusion:
– service differentiation possible
– proportionality difficult to ensure in all cases
[Figure: low-priority traffic queues for class 1, class 2, and class 3 (b1, b2, b3) served by WFQ; service of low-priority traffic is interrupted during busy periods of high-priority traffic. Labels: R, R(1-Pint), Pint, Tint.]
39
DiffServ Requirements
Non-EF Delay Differentiation
[Figure: average delay ratio plotted against class 1 load and class 5 load.]
40
DiffServ Requirements
Loss Differentiation
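• Loss calculations are made based on:
class | in/out profile | buffer acceptance | service rate
EF | Always in (edge discarding/shaping) | TAIL | R
AF | Green/yellow/red (TrTCM) | RIO | class-c weight × R(1 - EF load)
BE | Always in (no profile) | TAIL | class-c weight × R(1 - EF load)
• Buffer rejection prob of class c with drop precedence d
[Figure: buffer rejection probability 1-a_cd(q) plotted against queue length q. Vertical axis marks 0, r_cd, and 1; horizontal axis marks 0, two class- and precedence-specific thresholds proportional to Q_c, and Q_c.]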
41
Verifying that SLAs are
Satisfied
• Carrier-based reports
– Interpretation of report and its statistics
– Gaps in statistics gathering
– Process for gathering data
– Optimization of the network
» Capital investment to evolve the network
» Recurring transmission costs
» Bandwidth growth caused by rogue users/apps
– Ability to warn users before performance degradation
becomes noticeable
• Active monitoring may be necessary
42
Internet2 Research Project
• Quality of Service Backbone (QBone)
– Experimental deployment of DiffServ capabilities into a WAN
networking testbed to determine what works and doesn’t work
– DiffServ tenets:
» Aggregation into small # of DS behavior aggregates in core
» Bilateral service level agreements (SLAs) between domains
» Max flexibility in local resource management decisions
– Bandwidth Broker (BB) Architecture for cooperatively allocating
bandwidth among network flows
» Premium vs. best effort service
» Focus on inter-domain signaling, with separate schemes for
DiffServ implemented in each participating domain
44
Internet2 Research Project
• Bandwidth Broker Resource Managers
– Based on IETF DiffServ
– Service Level Specification/Negotiation left unaddressed
– Only kind of service currently managed is QBone Premium
Service (QPS)
» Quantitative, absolute b/w assurance within a domain,
intra-domain from edge to edge, or inter-domain
» No loss due to congestion, no latency guarantees,
worst-case jitter bounds (except for IP route
changes)
– Generalization to other kinds of services
» When/where will service be provided?
» How is desired level of service specified?
» How is provided service described? Quantitative vs.
qualitative
45
[Figure: end-to-end data flow from a source domain through transit domains 1 and 2 to a sink domain. Each domain has a bandwidth broker (BB) and edge routers (ER); adjacent domains are linked by SLAs. Labels include RSVP.]
46
Bandwidth Brokers
• Brokers as “Oracles”
– Receive resource allocation request (RAR) from
» An element in the domain that the BB controls
» A peer (adjacent) bandwidth broker
– Admission control: BB responds with confirmation or denial of service
via a Resource Allocation Answer (RAA)
– Input to BB: space-time coordinates of the service, kind of service
(its parameters), characteristics of the input
• SLAs in this context
– Bilateral, concluded between peered domains
– Guarantee that traffic offered by the (peer) customer domain, meeting
certain conditions, is carried by the service provider domain to one or
more egress points with one or more particular service levels
– May be hard or soft, carry tariffs, and certain monetary or legal
consequences if not met
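The request/answer exchange can be sketched as a simple admission check. This is a toy model, not the QBone signaling protocol: the `RAR` and `RAA` fields, the per-egress rate limit taken from an SLS, and the class and method names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RAR:
    """Resource Allocation Request (fields are illustrative)."""
    requester: str
    egress: str
    rate_mbps: float


@dataclass
class RAA:
    """Resource Allocation Answer: confirmation or denial."""
    accepted: bool
    reason: str


@dataclass
class BandwidthBroker:
    sls_limit_mbps: dict                       # egress -> rate allowed by the SLS
    reserved_mbps: dict = field(default_factory=dict)

    def handle(self, rar):
        """Admission control: accept only if the SLS limit is not exceeded."""
        limit = self.sls_limit_mbps.get(rar.egress)
        if limit is None:
            return RAA(False, "no SLS covers this egress")
        current = self.reserved_mbps.get(rar.egress, 0.0)
        if current + rar.rate_mbps > limit:
            return RAA(False, "insufficient capacity under SLS")
        self.reserved_mbps[rar.egress] = current + rar.rate_mbps
        return RAA(True, "reserved")


bb = BandwidthBroker(sls_limit_mbps={"peer-1": 10.0})
first = bb.handle(RAR("host-a", "peer-1", 6.0))
second = bb.handle(RAR("host-b", "peer-1", 6.0))
unknown = bb.handle(RAR("host-c", "peer-9", 1.0))
```

The broker tracks reservations (committed resources), not actual usage, so the second request is denied even if the first flow is currently idle.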
47
Bandwidth Brokers
• SLS in this context
– Contains technical details of the SLA
– Asserts traffic of a given class, meeting specific policing
conditions, entering the domain on a given link, will be treated
according to a particular PHB(s)—per hop behaviors (e.g.,
expedited forwarding)
– If traffic destination is not receiving domain, then pass it to
another domain (on path toward destination according to routing
tables) with similar (compatible and comparable) SLS specifying
an equivalent (set of) PHB(s)
• TCS: Traffic Conditioning Specification
– Specifies classifier rules, corresponding traffic profiles &
metering, marking, discarding, shaping rules applied to traffic
aggregates selected by the classifier
48
Bandwidth Brokers
• Reservations
– Actually committed resources, but not necessarily used
– Tracked by BB, shared with network management system
– Actual resource use tracked by routers, possibly
monitored by bandwidth broker
49
Bandwidth Brokers:
Nodal Architecture
50
Bandwidth Brokers
• Key Protocols
– User/application protocol:
resource allocation requests from
within BB's domain
– Intra-domain protocol:
communicate BB decisions to
routers within its domain as router
configuration parameters for QoS
operation/possibly communicate
with policy enforcement agent
within the router
– Inter-domain protocol: provide
mechanism for peering BBs to ask
for/answer with admission control
decisions for aggregates and
exchange traffic
• Data Interfaces
– Routing Tables: inter-domain info
determines egress router(s) &
downstream DS domains whose resources
must be committed before accepting RARs; may
require intra-domain info to determine
paths and resource allocation information
within the domain
– Data Repository: common info for BB
components:
» SLS info for ingress/egress routers
» Current reservations/resource
allocations
» Router configurations
» Service/DSCP mappings
» Policy info
» Network mgmt info
» Router Monitoring info
» Authorization/authentication DBs
for users & peers
51
Previous Work
• Static resource pre-partitioning
– E.g., PSTN trunking
– Pros: Dedicates resources for end-to-end flows
– Cons: Based on worst-case analysis, leading to inefficient
network utilization => Costly and not adaptive to dynamic
traffic fluctuation
• Int-Serv with RSVP
– On-demand, per-flow, end-to-end reservation and
admission control
– Pros: Provides end-to-end QoS assurance
– Cons: Requires per flow state information in the core
networks => Not scalable!
52
Previous Work (cont’d)
• Diff-Serv Bandwidth Brokers (BBs)
– Admission control only at the edge and BBs negotiate pairwise SLAs with neighboring domains
– Pros: Preserves scalability
– Cons:
» Admission control is based on local information
and the core supports per-hop behaviors =>
Unpredictable end-to-end QoS
» One centralized broker per domain may cause a single
point of congestion/failure in large domains
53
Clearinghouse
[Figure: an IP-based core interconnecting an H.323 gateway, the PSTN, GSM, wireless phones, video conferencing and distance learning, Web surfing, e-mail and TCP connections, and VoIP (e.g. Netmeeting).]
Vision: data, multimedia (video, voice, etc.) and
mobile applications over one IP-network
Question: How to regulate resource allocation within
and across multiple domains in a scalable
manner to achieve end-to-end QoS?
55
Clearinghouse Goals
• Design/build distributed control architecture for
scalable resource provisioning
– Predictive reservations across multiple domains
– Admission control & traffic policing at edge
• Demonstrate architecture’s properties and performance
– Achieve adequate performance w/o edge per-flow state
– Robust against traffic fluctuations and misbehaving flows
• Prototype proposed mechanisms
– Min edge router overhead for scalability/ease of deployment
56
Clearinghouse Architecture
• Clearinghouse distributed architecture: each CH-node serves as a resource manager
• Functionalities
– Monitors network performance on ingress & egress links
– Estimates traffic demand distributions
– Adapts trunk/aggregate reservations within & across domains
based on traffic statistics
– Performs admission control based on estimated traffic matrix
– Coordinates traffic policing at ingress & egress points for
detecting misbehaving flows
57
Multiple-ISP Scenario
[Figure: hosts attached to ISP 1, ISP 2, ISP m, and ISP n, which interconnect through ingress routers (IR) and egress routers (ER).]
• Hybrid of flat and hierarchical structures
– Local hierarchy within large ISPs
» Distributes network state to various CH-nodes and reduces
the amount of state information maintained
– Flat structure for peer-to-peer relationships across
independent ISPs
58
Illustration
[Figure: ISP1 with a host, edge routers, logical domains LD0 and LD1, and CH-nodes CH0 and CH1.]
• A hierarchy of logical domains (LDs)
– e.g., LD0 can be a POP or a group of neighboring POPs
• A CH-node is associated with each LD
– Maintains resource allocations between ingress-egress pairs
– Estimates traffic demand distributions & updates parent CH-nodes
59
Illustration
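[Figure: ISP1 (logical domains LD0 and LD1, CH-nodes CH0 and CH1, hosts attached via edge routers) with its CH1 node in peer-to-peer relationships with the CH1 nodes of ISP n and ISP m.]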
• Parent CH-node
– Adapt trunk reservations across LDs for aggregate traffic
within ISP
• Appears flat at the top level
– Coordinate peer-to-peer trunk reservations across multiple ISPs
60
Key Design Decisions
• Service model: ingress/egress routers as endpoints
– IE-Pipe(s,d) = aggregate traffic that enters an ISP domain at IR-s
and exits at ER-d
• Reservations set up for aggregated flows on intra- and inter-domain links
– Adapt dynamically to track traffic fluctuation
– Core routers stateless; edge maintain aggregate states
• Traffic monitoring, admission control, traffic
policing for individual flows performed at the edge
– Access routers have smaller routing tables; experience lower
aggregation of traffic relative to backbone routers
– Most congestion (packet loss/delay) happens at edges
61
Traffic-Matrix Admission Control
[Figure: host network A requests rate Rnew through ingress router IR-s; the CH answers Accept or Reject; traffic leaves at egress router ER-d toward host network B. POP 1 and POP 2 are shown, with traffic monitors at the edge routers.]
• Mods to edge routers
– Traffic monitors passively measure aggregate rate of existing flows, M(s,d)
– IR-s forwards control messages (Request/Accept/Reject) between CH and host/proxy
– Estimate traffic demand distributions, D(s,:), and report to the CH
• CH
– Leverages knowledge of topology and traffic matrix to make admission decisions
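One way to picture the admission decision is sketched below. It is a simplification, not the Clearinghouse algorithm: it uses measured mean rates M(s,d) instead of estimated demand distributions, assumes a fixed route per ingress-egress pair, and accepts a request only if every link on that route can absorb the new rate. The route table, link names, and capacities are hypothetical.

```python
def link_loads(traffic_matrix, routes):
    """Sum the measured rate of every ingress-egress pipe onto the links it crosses."""
    loads = {}
    for pipe, rate in traffic_matrix.items():
        for link in routes[pipe]:
            loads[link] = loads.get(link, 0.0) + rate
    return loads


def admit(traffic_matrix, routes, capacity, s, d, r_new):
    """Accept a new flow of rate r_new on pipe (s, d) if no link on its route overflows."""
    loads = link_loads(traffic_matrix, routes)
    return all(loads.get(link, 0.0) + r_new <= capacity[link]
               for link in routes[(s, d)])


routes = {("s1", "d1"): ["L1", "L2"], ("s2", "d1"): ["L3", "L2"]}
capacity = {"L1": 10.0, "L2": 12.0, "L3": 10.0}
measured = {("s1", "d1"): 5.0, ("s2", "d1"): 4.0}   # M(s,d) from the traffic monitors

small = admit(measured, routes, capacity, "s1", "d1", 3.0)
large = admit(measured, routes, capacity, "s1", "d1", 4.0)
```

The second request fails on the shared link L2, which carries traffic from both pipes. A purely local check at IR-s would not see that load, which is why the decision needs both the topology and the traffic matrix.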
62
Group Policing for Malicious Flow
Detection
[Figure: host network A sends a Request through IR-s; the CH replies Accept (with Fid) and updates the TBFs; traffic leaves through ER-d toward host network B. POP 1 and POP 2 are shown, with TBF traffic policers at IR-s and ER-d.]
• CH assigns Fid if the flow is admitted
– Let FidIn = x, FidEg = y
• Traffic policer at IR-s aggregates flows based on FidIn for group policing (TBF for group x)
• Traffic policer at ER-d aggregates flows based on FidEg for group policing (TBF for group y)
* Traffic policer at IR or ER only maintains
total allocated bandwidth to the group
(aggregate state) and not per-flow
reservation status
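A token bucket filter that keeps only aggregate state for a group might look like the sketch below. The class name, the byte-based units, and the explicit `now` timestamps are assumptions made to keep the example deterministic; a real policer would read a clock and act on packets in the forwarding path.

```python
class GroupTokenBucket:
    """Token bucket policing all flows that share one Fid.

    Only the total allocated rate of the group is stored; there is no
    per-flow reservation state.
    """

    def __init__(self, burst_bytes):
        self.rate = 0.0               # total allocated bandwidth, bytes/second
        self.burst = burst_bytes      # bucket depth
        self.tokens = burst_bytes     # start with a full bucket
        self.last = 0.0               # time of the previous update

    def add_flow(self, rate):
        """Called when the CH admits a flow into this group."""
        self.rate += rate

    def remove_flow(self, rate):
        """Called when an admitted flow terminates."""
        self.rate = max(0.0, self.rate - rate)

    def conforms(self, now, size):
        """Refill tokens for the elapsed time, then test one packet of `size` bytes."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


policer = GroupTokenBucket(burst_bytes=1000)
policer.add_flow(500.0)      # two admitted flows of 500 bytes/s each
policer.add_flow(500.0)

results = [
    policer.conforms(0.0, 1000),   # drains the initial burst
    policer.conforms(0.5, 600),    # only 500 tokens refilled so far
    policer.conforms(1.0, 600),    # bucket refilled to its 1000-byte depth
]
```

Because the bucket is shared, a misbehaving flow shows up as the group exceeding its aggregate allocation; identifying which member is responsible is a separate step that this sketch does not cover.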
63