ppt - Stanford University Networking Seminar
SYNERGISTIC
NETWORK OPERATIONS
Saqib Raza
University of California, Davis
A SNAPSHOT OF NETWORK OPERATIONS
Forwarding
Inter-domain TE
Scheduling
Intra-domain TE
Firewalls
Maintenance
Traffic
Policing
Power
Accounting
Diagnostics
Management
Forensics
Overlay Routing
EXAMPLE: INTER-OPERATION DYNAMICS
[Figure: overlay nodes A, B, C, D and ISP A]
Initially, traffic between overlay nodes A and D does not traverse ISP-A.
ISP-A alters link weights to direct traffic away from link (x,y).
Sensing reduced delay through ISP-A, the routing overlay starts sending traffic from A to D through ISP-A.
Overlay Routing
Intra-domain TE
THE HIPPOCRATIC OATH
FOR NETWORK OPERATIONS
Do No Harm
Operations should be cognizant of
any disruptive effects to other
operations.
Strive to do Good
Operations should seek to
enhance the efficacy of other
operations.
SUMMARY/OUTLINE
Interface-Split Forwarding for Finer-Grained Traffic Engineering [Performance '07, Eval '07]
Cooperative Peer-to-Peer Repair of 3G Broadcast Losses [Broadnets '08, ICC '08, ICME '07]
Network-level footprints of Online Social Network Applications [IMC '09, IMC '08]
Graceful Network State Migration [Infocom '09]
MeasuRouting: A Framework for Routing Assisted Traffic Monitoring [Infocom '10]
Future Directions
Do No Harm
Maintenance
Intra-domain TE
GRACEFUL NETWORK MIGRATION
minimizing performance disruption during
planned network maintenance …
Joint work with:
Yuanbo Zhu & Chen-Nee Chuah (UC Davis)
MOTIVATION
Network events cause performance disruption
Inadvertent events: e.g. fiber-cuts, router crashes
Premeditated events: e.g. firmware upgrades
Premeditated network tasks can be judiciously scheduled to minimize performance disruption
GRACEFUL STATE MIGRATION (GSM)
GSM represents a class of problems with two essential characteristics:
Network needs to transition from an initial state to a final state
Sequence of atomic network operations (e.g. deactivating/activating a router or link)
SAMPLE APPLICATION
Link Maintenance Scheduling (LMS)
Maintenance activities account for more than 20% of failures in backbone ISPs [Markopoulou '04].
Weekly maintenance windows: multiple links need to be maintained in each window.
Each link needs to be deactivated and then reactivated.
Link failures can disrupt intra-domain TE.
LMS: ILLUSTRATIVE EXAMPLE
[Figure: example topology with nodes a through g, annotated with link weights]
Link Capacity = C
Flow Size = ½ C
Max Link Util = 50%
I need to repair links (a,c) and (c,f)
Careful! Watch out for the Maximum Link Utilization (MLU)
[Figure: topology after each step of the schedule; a link reaches 100% utilization]
Schedule: (a,c) ↓, (a,c) ↑, (c,f) ↓, (c,f) ↑
MLU = 100%
[Figure: topology after each step of the schedule]
Schedule: (a,c) ↓, (c,f) ↓, (c,f) ↑, (a,c) ↑
MLU = 50%
LMS: ILLUSTRATIVE EXAMPLE
Schedule 1: (a,c) ↓, (a,c) ↑, (c,f) ↓, (c,f) ↑; MLU = 100%
Schedule 2: (a,c) ↓, (c,f) ↓, (c,f) ↑, (a,c) ↑; MLU = 50%
The schedule with multiple links simultaneously deactivated causes less disruption
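The comparison between the two schedules can be sketched in a few lines of Python. This is only an illustration: the per-state MLU values below are assumptions chosen to be consistent with the outcomes reported on the slide (100% for Schedule 1, 50% for Schedule 2), and the cost of a schedule is assumed to be the worst MLU over every state it visits. The names `MLU` and `schedule_cost` are hypothetical.

```python
# Assumed MLU for each set of simultaneously deactivated links.
# Only the 50% value for the fully working network is stated on the slides;
# the other entries are assumptions consistent with the reported outcomes.
AC, CF = ("a", "c"), ("c", "f")
MLU = {
    frozenset(): 0.5,          # all links up
    frozenset({AC}): 0.5,      # assumed
    frozenset({CF}): 1.0,      # assumed
    frozenset({AC, CF}): 0.5,  # assumed
}


def schedule_cost(schedule):
    """Return the worst MLU over every network state the schedule visits."""
    down = set()
    worst = MLU[frozenset(down)]
    for link, action in schedule:
        if action == "down":
            down.add(link)
        else:
            down.remove(link)
        worst = max(worst, MLU[frozenset(down)])
    return worst


schedule_1 = [(AC, "down"), (AC, "up"), (CF, "down"), (CF, "up")]
schedule_2 = [(AC, "down"), (CF, "down"), (CF, "up"), (AC, "up")]

print(schedule_cost(schedule_1))  # worst MLU for the one-link-at-a-time schedule
print(schedule_cost(schedule_2))  # worst MLU for the nested schedule
```

Under these assumed values, Schedule 2 never passes through the state in which only (c,f) is down, which is why its cost stays at the baseline.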
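THE GENERAL GSM PROBLEM
[Figure: state sequence $s_0, s_1, \ldots, s_n$]
$(s_0, s_n) = (s_{initial}, s_{final})$
$(s_i, s_{i+1}) \in A$
$n \le B$
$\min C(s_0, s_1, \ldots, s_{n-1}, s_n)$
Specify $(s_{initial}, s_{final})$, $A$, $B$, and $C$ to define a concrete GSM problem, e.g., LMS
[Figure: allowed transitions $A$ for LMS over per-link states n (not repaired), d (deactivated), r (repaired)]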
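A minimal sketch of the general formulation, assuming a brute-force search is acceptable for tiny instances. The function `solve_gsm` and its parameters are hypothetical names. The toy instance assumes each link moves n → d → r (taken from "deactivated and then reactivated"), and uses a made-up cost function, the largest number of links deactivated at once, purely for illustration; it is not the MLU cost used in the talk.

```python
def solve_gsm(initial, final, neighbors, cost, bound):
    """Exhaustively search state sequences with at most `bound` transitions.

    neighbors(state) lists the states reachable in one allowed transition (A);
    cost(path) scores a complete sequence of states (C).
    """
    best_cost, best_path = float("inf"), None
    stack = [[initial]]
    while stack:
        path = stack.pop()
        if path[-1] == final:
            c = cost(path)
            if c < best_cost:
                best_cost, best_path = c, path
            continue
        if len(path) - 1 >= bound:  # transition budget B exhausted
            continue
        for nxt in neighbors(path[-1]):
            stack.append(path + [nxt])
    return best_cost, best_path


# Toy LMS instance: one status letter per link (assumed order n -> d -> r).
NEXT_STATUS = {"n": "d", "d": "r"}


def lms_neighbors(state):
    """Advance exactly one link by one step."""
    out = []
    for k, status in enumerate(state):
        if status in NEXT_STATUS:
            out.append(state[:k] + (NEXT_STATUS[status],) + state[k + 1:])
    return out


def max_links_down(path):
    """Hypothetical cost: most links deactivated at the same time."""
    return max(state.count("d") for state in path)


best_cost, best_path = solve_gsm(("n", "n"), ("r", "r"),
                                 lms_neighbors, max_links_down, bound=4)
print(best_cost, best_path)
```

Exhaustive search is only viable for a handful of links, which is the motivation for the heuristics discussed later in the deck.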
A GENERAL GSM SOLUTION FRAMEWORK
$c_{2k}(s_x, s_z) = \min_y \left( c_k(s_x, s_y) + c_k(s_y, s_z) \right)$
• The minimum cost of going from $s_x$ to $s_z$ in $2k$ steps is the minimum, over all intermediate states $s_y$, of the cost of going from $s_x$ to $s_y$ in $k$ steps plus the cost of going from $s_y$ to $s_z$ in $k$ steps.
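The recurrence is a min-plus (tropical) matrix squaring, which can be sketched as follows. This is an illustrative sketch with an additive cost, not the authors' implementation; the three-state cost matrix and the function names are hypothetical. Zero-cost self-loops are an added assumption so that "k steps" reads as "at most k steps".

```python
INF = float("inf")


def min_plus_square(c):
    """Given k-step costs c[x][z], return the 2k-step costs."""
    n = len(c)
    return [
        [min(c[x][y] + c[y][z] for y in range(n)) for z in range(n)]
        for x in range(n)
    ]


def cost_after_doublings(one_step, doublings):
    """Apply the recurrence repeatedly: 1 step -> 2 -> 4 -> ... steps."""
    c = one_step
    for _ in range(doublings):
        c = min_plus_square(c)
    return c


# Hypothetical one-step costs between three states; INF marks a transition
# that is not allowed, and the diagonal holds zero-cost self-loops.
one_step = [
    [0, 1, 5],
    [INF, 0, 1],
    [INF, INF, 0],
]

two_step = cost_after_doublings(one_step, 1)
print(two_step[0][2])  # cheapest way from state 0 to state 2 in two steps
```

Each doubling costs one pass over all (x, y, z) triples, so reaching a budget of B steps takes about log2(B) squarings rather than B single-step extensions.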
COMPUTATIONAL COMPLEXITY
GSM is a combinatorial optimization problem
[Figure: LMS state graph for three links, with states labeled 000 through 222]
Solution space of LMS has $(2n)!/2^n$ solutions
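The size of the solution space can be checked by enumeration for small n. The sketch below assumes that every link is deactivated exactly once and reactivated exactly once, with the deactivation first; the function name is hypothetical.

```python
from itertools import permutations
from math import factorial


def count_lms_schedules(n):
    """Count orderings of 2n operations where each link goes down before up."""
    ops = [(link, action) for link in range(n) for action in ("down", "up")]
    count = 0
    for order in permutations(ops):
        position = {op: i for i, op in enumerate(order)}
        if all(position[(l, "down")] < position[(l, "up")] for l in range(n)):
            count += 1
    return count


for n in (1, 2, 3):
    # Enumerated count next to the closed form (2n)! / 2^n.
    print(n, count_lms_schedules(n), factorial(2 * n) // 2 ** n)
```

The closed form follows because exactly half of all orderings place a given link's deactivation before its reactivation, independently for each of the n links.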
ANT COLONY OPTIMIZATION
Swarm intelligence metaheuristic
Near optimal solutions for the Traveling Salesman Problem
PERFORMANCE EVALUATION
Single-Failure Heuristic works well generally
What about the worst case?
> 20 node/80 link topology
> 100 experiments per data point
> Report Cost Reduction (MLU) over Single-Failure Heuristic
GSM: APPLICATIONS
• Link Weight Assignment Scheduling
• Network Evolution & Upgrade
• MPLS Reroute Sequencing
Link Weight
Reassignment
Scheduling
OUTLINE
Graceful Network State Migration [Infocom '09]
MeasuRouting: A Framework for Routing Assisted Traffic Monitoring [Infocom '10]
Future Directions
Strive to do Good
Measurements
Intra-domain TE
MEASUROUTING
a framework for routing assisted
network measurements…
Joint work with:
Guanyao Huang & Chen-Nee Chuah (UC Davis)
Srini Seetharaman & Jatinder Singh (DT Labs)
THE MONITOR PLACEMENT PROBLEM
[Figure: monitor placement over traffic labeled "important" and "very important", with "Oops!" and "?" callouts]
An evolving universe
1. Measurement objectives change
2. New traffic gets introduced
3. Traffic placement changes
Measurements
Intra-domain TE
PROBLEM STATEMENT
• Configure intra-domain routing to route important traffic subpopulations across paths where they could best be monitored, while avoiding disruption to default traffic engineering.
TE POLICY VIOLATION
Congestion
COMPLIANT REROUTING
Monitor
TE policy is defined for aggregated flows
Sub-populations of aggregated flows, indistinguishable from a TE perspective, can be distinguishable from a measurement perspective
OTHER ENABLING FACTORS
Aggregate TE Objectives
• Aggregate traffic placement may be altered without violating TE objectives: e.g., links with utilization below maximum utilization have free capacity
TE-Measurement Tradeoff
• TE objectives may be violated to maximize global network utility.
TE Flowset (macro-flowset)
1. Aggregated TE flows, e.g. OD pair traffic
2. Traffic placement given: $\Gamma$ on each link $(i,j) \in E$
Measurement Flowsets (micro-flowsets)
1. TE flowset decomposes into k measurement flowsets
2. A measurement flowset has:
a) Size
b) Importance
3. Decision variable: flowset routing on each link $(i,j) \in E$
MEASUROUTING OBJECTIVE
Maximize score across all measurement flowsets across all links
Points gained for sampling flowset y on link (i,j) depend on:
• Link Sampling Rate
• Flowset Routing
• Flowset Size
• Flowset Importance
Constraints
1. Network Flow Conservation
2. Ensure that TE performance remains within some value of the default TE performance
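One way to picture the objective is the small scoring function below. It is a sketch under a stated assumption: the points for flowset y on link (i,j) are taken to be the product of the four labeled quantities (link sampling rate, fraction of the flowset routed on the link, flowset size, flowset importance). The exact functional form is not recoverable from the slide, and all names and numbers here are hypothetical.

```python
def measurement_score(sampling_rate, routing, size, importance):
    """Sum the assumed per-link points over all flowsets and links.

    sampling_rate: {link: rate}
    routing:       {flowset: {link: fraction of the flowset on that link}}
    size:          {flowset: traffic volume}
    importance:    {flowset: importance weight}
    """
    score = 0.0
    for y, per_link in routing.items():
        for link, fraction in per_link.items():
            # Assumed product form; links without a monitor sample nothing.
            rate = sampling_rate.get(link, 0.0)
            score += rate * fraction * size[y] * importance[y]
    return score


# Hypothetical example: only link (a,b) is monitored.
sampling_rate = {("a", "b"): 0.5}
size = {"y1": 10.0}
importance = {"y1": 2.0}

default_routing = {"y1": {("a", "c"): 1.0}}  # bypasses the monitor
rerouted = {"y1": {("a", "b"): 1.0}}         # steered across the monitor

print(measurement_score(sampling_rate, default_routing, size, importance))
print(measurement_score(sampling_rate, rerouted, size, importance))
```

In the full problem the routing fractions are the decision variables, chosen subject to flow conservation and the bound on TE performance degradation.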
THE LOOPING PROBLEM
A measurement flowset can only traverse links in a Directed Acyclic Graph (DAG)
RSR: use DAG for the associated OD pair
NRL: add additional links to the original DAG
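The loop-freedom requirement can be illustrated with a standard acyclicity check. The sketch below uses Kahn's algorithm and a greedy rule that accepts a candidate link only if the graph stays acyclic. The greedy rule is an assumption made for illustration and is not claimed to be how RSR or NRL select links; the node names and function names are hypothetical.

```python
from collections import defaultdict, deque


def is_dag(links):
    """Kahn's algorithm: True if the directed links contain no cycle."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v in links:
        successors[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == len(nodes)


def extend_dag(dag_links, candidates):
    """Greedily add candidate links that keep the graph acyclic."""
    links = list(dag_links)
    for link in candidates:
        if link not in links and is_dag(links + [link]):
            links.append(link)
    return links


# Hypothetical OD pair s -> t with an original DAG via node a.
original = [("s", "a"), ("a", "t")]
candidates = [("s", "b"), ("b", "t"), ("t", "s")]
print(extend_dag(original, candidates))  # ("t", "s") would close a loop
```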
SYNTHETIC EXPERIMENTS
Select the number of Measurement Flowsets per OD pair (K)
Divide all flows between an OD pair into the K measurement flowsets
Assign size and importance of the measurement flowsets
Choose the permissible TE violation parameter
Report improvement in Measurement Score over default routing
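The first three steps of this setup might look like the sketch below. It assumes a uniformly random split of the OD demand into K sizes and reads the "Pareto (=2)" annotation used later in the deck as a Pareto shape parameter of 2; both are assumptions, and the function name and parameters are hypothetical.

```python
import random


def make_measurement_flowsets(od_demand, k, shape=2.0, seed=0):
    """Split one OD pair's demand into k flowsets with random importance."""
    rng = random.Random(seed)
    # Assumed: sizes come from a random split that sums to the OD demand.
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [
        {
            "size": od_demand * w / total,
            # Assumed: importance drawn from a Pareto distribution.
            "importance": rng.paretovariate(shape),
        }
        for w in weights
    ]


flowsets = make_measurement_flowsets(od_demand=100.0, k=10)
for fs in flowsets:
    print(round(fs["size"], 2), round(fs["importance"], 2))
```

A heavy-tailed importance distribution means a few flowsets carry most of the measurement value, which is the situation in which steering them toward monitors pays off.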
NETWORK SIZE
AS1221: 44 nodes
AS1239: 52 nodes
K : 10
Importance : Pareto (=2)
Performance sensitive to number of multiple paths
DEGREES OF FREEDOM
AS1221
44 nodes
: 0.1
Importance : Pareto (=2)
Diminishing marginal returns of increasing k
A REAL APPLICATION
Trace capture infrastructure selectively deployed
Increase representation of interesting traffic in traces
Trace Capture for Deep Packet Inspection (DPI)
Abilene: 9 nodes
Q(i)
P(i)
ln(1-|P(i)-Q(i)|)
REAL WORLD MEASUROUTING
Underlying Routing Substrates
• Configurable Routing: MPLS, OpenFlow
• IP Routing: Equal Cost Multipath
Applications
• Heterogeneous Sampling Algorithms
• Distributed Firewalls
OUTLINE
Graceful Network State Migration [Infocom '09]
MeasuRouting: A Framework for Routing Assisted Traffic Monitoring [Infocom '10]
Future Directions
OPTIMAL STATES OF BEING
Graceful Network
State Migration
• Data Center Job Scheduling
• Data Center Load Distribution
DATA CENTER JOB SCHEDULING
Power Management
Scheduling
Power conserved by switching off data center components, dynamic voltage scaling, etc.
Jobs scheduled on different servers to optimize performance (MapReduce, Dryad).
Jointly optimize job scheduling and power management decisions.
DATA CENTER LOAD DISTRIBUTION
Power Management
Inter-domain TE
Data center operation costs vary geographically due to energy market price fluctuations [Qureshi '09].
It makes sense to operate data centers in diverse energy markets.
Data center load cannot be instantaneously shifted from one location to another.
Chalk out an optimal state trajectory of BGP route advertisements.
A CALCULUS FOR SYNERGISTIC
OPERATIONS
[Figure labels: Revenue Contribution, Network-wide Security, Global Utility; Common Resource Pool: CPU Cycles, Bandwidth, Power]
Each marginal unit of a resource ought to be allocated to the operation that derives the highest marginal utility from consuming it.
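The marginal-utility rule can be sketched as a greedy allocation: hand out the shared resource one unit at a time, each time to the operation whose utility would increase the most. The utility functions below are made-up examples with diminishing returns, and the operation names and the function `allocate` are hypothetical.

```python
def allocate(units, utilities):
    """Give each unit to the operation with the highest marginal utility."""
    allocation = {name: 0 for name in utilities}
    for _ in range(units):
        best = max(
            utilities,
            key=lambda name: utilities[name](allocation[name] + 1)
            - utilities[name](allocation[name]),
        )
        allocation[best] += 1
    return allocation


# Hypothetical utilities: each operation saturates after a few units.
utilities = {
    "traffic_engineering": lambda x: 5 * min(x, 2),  # 5 per unit, up to 2 units
    "measurement": lambda x: 3 * min(x, 3),          # 3 per unit, up to 3 units
}

print(allocate(4, utilities))
```

With concave utilities like these, the greedy rule reaches the allocation that maximizes total utility; with non-concave utilities it is only a heuristic.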
Questions
wwwcsif.cs.ucdavis.edu/~raza
www.ece.ucdavis.edu/rubinet
MEASUREMENT UTILITY DIVERSITY
AS1221
44 nodes
k=10; M=3000
Importance: Pareto (=2)
Performance improves with variance in importance
LMS IN A SMALL NETWORK (ABILENE)
MEASUROUTING PATH INFLATION