Consistent Network Updates and Dynamic Update Scheduling
SDN: Paradigm Shift in Networking
• Direct, centralized updates of forwarding rules in switches by the controller
• Many benefits
– Traffic engineering [B4, SWAN]
– Flow scheduling [Hedera, DevoFlow]
– Access control [Ethane, vCRIB]
– Device power management [ElasticTree]
– …
SDN Makes it Possible to Program the Network
• Network virtualization
• User mobility and VM migration
• Server load balancing
• Dynamic access control
• Using multiple wireless access points
• Energy-efficient networking
• Adaptive traffic monitoring
• Denial-of-Service attack detection
• <Your app here!>
Network Control Loop
• Read state from the OpenFlow switches
• Compute policy at the controller
• Write policy (forwarding rules) back to the switches (a minimal loop sketch follows)
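As a rough illustration of this loop, the sketch below shows a read/compute/write cycle in Python. The controller methods (`read_counters`, `install_rules`) and the `compute_policy` callback are hypothetical stand-ins, not an actual OpenFlow or Frenetic API.

```python
import time

def control_loop(controller, compute_policy, interval_sec=5):
    """Repeatedly read network state, compute a policy, and write it back.

    `controller` is assumed to expose read_counters() and install_rules();
    `compute_policy` is the application logic running on top of it.
    """
    while True:
        state = controller.read_counters()   # read state from OpenFlow switches
        policy = compute_policy(state)       # compute policy at the controller
        controller.install_rules(policy)     # write policy (forwarding rules) back
        time.sleep(interval_sec)             # pace the control loop
```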
Frenetic Language Abstractions
• Module composition
• Query language
• Consistent updates
(all layered on top of OpenFlow switches)
Combining Many Networking Tasks
• Monolithic application: Monitor + Route + FW + LB in one program on the controller platform
• Hard to program, test, debug, reuse, port, …
Modular Controller Applications
• A module for each task: Monitor, Route, FW, LB, running on the controller platform
• Easier to program, test, and debug
Beyond Multi-Tenancy
• Each module controls a different portion of the traffic (Slice 1, Slice 2, ..., Slice n)
• Relatively easy to partition rule space, link bandwidth, and network events across modules
Modules Affect the Same Traffic
• Each module (Monitor, Route, FW, LB) partially specifies the handling of the same traffic
• How to combine them into a complete application?
Parallel Composition
• Monitor on source: srcip = 5.6.7.8 → count
• Route on destination: dstip = 1.2.3.4 → fwd(1); dstip = 3.4.5.6 → fwd(2)
• Composed with "+", the controller platform compiles the cross-product (see the sketch below):
  srcip = 5.6.7.8, dstip = 1.2.3.4 → fwd(1), count
  srcip = 5.6.7.8, dstip = 3.4.5.6 → fwd(2), count
  srcip = 5.6.7.8 → count
  dstip = 1.2.3.4 → fwd(1)
  dstip = 3.4.5.6 → fwd(2)
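A minimal sketch of what the platform does for parallel composition, assuming each module is just a list of (match, actions) pairs: the combined table is the pairwise intersection of matches with the union of actions. This illustrates the idea, not the actual Frenetic compiler, and it ignores rule priorities.

```python
def intersect(match_a, match_b):
    """Intersect two matches given as field->value dicts.

    Returns None if they conflict on some field (empty intersection)."""
    merged = dict(match_a)
    for field, value in match_b.items():
        if field in merged and merged[field] != value:
            return None
        merged[field] = value
    return merged

def parallel_compose(rules_a, rules_b):
    """Cross-product of two rule tables: packets matching both modules get the
    union of their actions; packets matching only one keep that module's actions."""
    combined = []
    for match_a, acts_a in rules_a:
        for match_b, acts_b in rules_b:
            both = intersect(match_a, match_b)
            if both is not None:
                combined.append((both, acts_a + acts_b))
    combined += rules_a + rules_b   # traffic seen by only one module
    return combined

monitor = [({"srcip": "5.6.7.8"}, ["count"])]
route = [({"dstip": "1.2.3.4"}, ["fwd(1)"]),
         ({"dstip": "3.4.5.6"}, ["fwd(2)"])]
for match, actions in parallel_compose(monitor, route):
    print(match, "->", actions)
```

Running it on the monitor and route tables from the slide reproduces the five compiled rules shown above.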
Example: Server Load Balancer
• Spread client traffic over server replicas
– Public IP address for the service
– Split traffic based on client IP
– Rewrite the server IP address
• Then, route to the replica
[Figure: clients → load balancer (public IP 1.2.3.4) → server replicas 10.0.0.1 and 10.0.0.2]
Sequential Composition
• Load balancer: srcip = 0*, dstip = 1.2.3.4 → dstip = 10.0.0.1; srcip = 1*, dstip = 1.2.3.4 → dstip = 10.0.0.2
• Routing: dstip = 10.0.0.1 → fwd(1); dstip = 10.0.0.2 → fwd(2)
• Composed with ">>", the controller platform compiles (see the sketch below):
  srcip = 0*, dstip = 1.2.3.4 → dstip = 10.0.0.1, fwd(1)
  srcip = 1*, dstip = 1.2.3.4 → dstip = 10.0.0.2, fwd(2)
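In the same illustrative style, a sketch of sequential composition (">>"): the second module is matched against packets as rewritten by the first, and the actions are concatenated. Again a hypothetical stand-in for the real compiler, with header rewrites modelled as "field=value" action strings.

```python
def sequential_compose(rules_a, rules_b):
    """Compose rule tables so rules_b applies to packets as rewritten by rules_a."""
    combined = []
    for match_a, acts_a in rules_a:
        # Apply rules_a's header rewrites to see what rules_b would match.
        rewritten = dict(match_a)
        for act in acts_a:
            if "=" in act:
                field, value = act.split("=")
                rewritten[field] = value
        for match_b, acts_b in rules_b:
            if all(rewritten.get(f) == v for f, v in match_b.items()):
                combined.append((match_a, acts_a + acts_b))
    return combined

load_balancer = [({"srcip": "0*", "dstip": "1.2.3.4"}, ["dstip=10.0.0.1"]),
                 ({"srcip": "1*", "dstip": "1.2.3.4"}, ["dstip=10.0.0.2"])]
routing = [({"dstip": "10.0.0.1"}, ["fwd(1)"]),
           ({"dstip": "10.0.0.2"}, ["fwd(2)"])]
for match, actions in sequential_compose(load_balancer, routing):
    print(match, "->", actions)
```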
Dividing the Traffic Over Modules
• Predicates
– Specify which traffic traverses which modules
– Based on input port and packet-header fields
Example:
  dstport != 80 → Monitor + Routing
  dstport = 80 → Load Balancer >> Routing
Topology Abstraction
• Present an abstract topology
– Information hiding: limit what a module sees
– Protection: limit what a module does
– Abstraction: present a familiar interface
[Figure: abstract view presented to a module vs. the real network]
High-Level Architecture
[Figure: modules M1, M2, M3 composed by a main program running on the controller platform]
Reading State: Query Language
• Applications read state
– Traffic counters in switches
– Packets sent to the controller
• Minimize controller overhead
– Filter using high-level patterns
– Limit the amount of data
• Controller platform
– Installs rules
– Reads counters
– Handles packets
• Example: learning host location
  Select(packets)
  GroupBy([srcmac])
  SplitWhen([inport])
  Limit(1)
• Example: traffic monitoring (see the sketch below)
  Select(bytes)
  Where(inport:2)
  GroupBy([dstmac])
  Every(60)
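For concreteness, the sketch below shows one way such queries could be captured as plain Python objects for a runtime to compile into switch rules and counter reads. The `Query` class and its fields are hypothetical, Frenetic-inspired stand-ins rather than the actual query API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Query:
    """Declarative query description (hypothetical, Frenetic-inspired)."""
    select: str                                     # "packets" or "bytes"
    where: dict = field(default_factory=dict)       # header-field filter
    group_by: list = field(default_factory=list)    # aggregation keys
    split_when: list = field(default_factory=list)  # re-report when these change
    limit: Optional[int] = None                     # max reports per group
    every: Optional[int] = None                     # polling interval (seconds)

# Learning host location: the first packet from each new source MAC,
# reported again if that MAC shows up on a different input port.
learn_hosts = Query(select="packets", group_by=["srcmac"],
                    split_when=["inport"], limit=1)

# Traffic monitoring: byte counts per destination MAC on port 2, every 60 s.
traffic_monitor = Query(select="bytes", where={"inport": 2},
                        group_by=["dstmac"], every=60)
```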
Frenetic Language-Based Abstractions
• Module composition
• Query language
• Consistent updates
(all layered on top of OpenFlow switches)
Network Update is Challenging
• Requirement 1: consistent
– No loops, no security holes, etc.
• Requirement 2: minimize performance impact
– No or minimal congestion
• Requirement 3: fast
– Determines the agility of the control loop
Why Consistent Network Update?
[Figure: current state and target state of flows F1–F4 across switches A–E]
• Does the order in which we update the rules in the switches matter?
• If yes, what is the proper order of updates?
Why Consistent Update?
• Transient policy violations and security holes
• Does the order in which we update the rules in the switches matter? If yes, what is the proper order of updates?
Writing Policy: Path for New Flow
• Rules along a path installed out of order?
– Packets reach a switch before the rules do
Must think about all possible packet and event orderings.
Writing Policy: Avoiding Disruption
Invariants
• No forwarding loops
• No black holes
• Access control
• Traffic waypointing
Writing Policies: Consistent Updates
• Transition from policy P1 to P2
– Security: new access control lists
– Routing: new shortest paths
• Transient policy violations
– Packets in flight during policy change
– Loops, blackholes, unauthorized traffic
• Consistent update semantics
– Packets experience either P1 or P2
– … but never a mixture of the two
Writing Policy: Update Semantics
• Per-packet consistency
– Every packet is processed by policy P1 or policy P2
– E.g., access control, no loops or blackholes
• Per-flow consistency
– Sets of related packets are processed by policy P1 or policy P2
– E.g., server load balancer, in-order delivery, …
Writing Policy: Policy Update
• Simple abstraction
– Update the entire configuration at once
• Cheap verification
– If P1 and P2 satisfy an invariant
– Then the invariant always holds
• Run-time system handles the rest
– Constructing a schedule of low-level updates
– Using only OpenFlow commands!
Writing Policy: Two-Phase Update
• Version numbers
– Stamp packet with a version number (e.g., VLAN tag)
• Unobservable updates
– Add rules for P2 in the interior
– … matching on version # P2
• One-touch updates
– Add rules to stamp packets with version # P2 at the edge
• Remove old rules
– Wait for some time, then remove all version # P1 rules (see the sketch below)
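A minimal sketch of the two-phase procedure, assuming a hypothetical switch interface with `install(rule)` and `remove_by_version(version)` and rules that can be tagged with a version stamp (e.g., a VLAN tag); a real runtime would also wait for each phase to be acknowledged rather than just sleeping.

```python
import time

def two_phase_update(interior_switches, edge_switches, new_policy,
                     old_version, new_version, drain_time_sec=10):
    """Install policy P2 with version stamps so packets see P1 or P2, never a mix.

    `new_policy` maps each switch to its list of new rules; rules support
    tagged(version), and switches support install() and remove_by_version().
    All of these are assumed interfaces for this sketch.
    """
    # Unobservable updates: add P2 rules in the interior, matching only
    # packets stamped with the new version (no such packets exist yet).
    for sw in interior_switches:
        for rule in new_policy[sw]:
            sw.install(rule.tagged(new_version))

    # One-touch updates: flip the edge to stamp incoming packets with the
    # new version; from now on each packet is handled entirely by P2.
    for sw in edge_switches:
        for rule in new_policy[sw]:
            sw.install(rule.tagged(new_version))

    # Remove old rules: wait for in-flight P1 packets to drain, then
    # delete everything stamped with the old version.
    time.sleep(drain_time_sec)
    for sw in interior_switches + edge_switches:
        sw.remove_by_version(old_version)
```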
Writing Policy: Optimizations
• Avoid two-phase update
– Naïve version touches every switch
– Doubles rule space requirements
• Limit scope
– Portion of the traffic
– Portion of the topology
• Simple policy changes
– Strictly adds paths
– Strictly removes paths
Dynamic Scheduling of Network Updates
Beyond Consistent Network Update
• Performance Impact of Network Updates
• Speed of Network Updates
[Figure: current state and target state, with flows F1 (5), F2 (5), F3 (10), and F4 (5) across switches A–E]
What is a Consistent Network Update?
[Figure: current, transient, and target states; moving flows asynchronously can transiently overload a link]
• Asynchronous updates can cause congestion
• Need to carefully order update operations
Existing Solutions are Slow
• Existing solutions are static [ConsistentUpdate’12, SWAN’13, zUpdate’13]
– Pre-compute an order for update operations
[Figure: static plan A fixes the order of the moves (F3, F2, F4, F1) between the current and target states]
• Downside: static plans do not adapt to runtime conditions
– Slow in the face of highly variable operation completion times
Operation Completion Times are Highly Variable
• Measurements on commodity switches show completion times varying by more than 10x
• Contributing factors
– Control-plane load
– Number of rules
– Priority of rules
– Type of operation (insert vs. modify)
Static Schedules can be Slow
[Figure: update timelines for static plan A (F3, F2, F4, F1) and static plan B under two different runtime conditions; each plan finishes quickly in one case and slowly in the other]
• No static schedule is a clear winner under all conditions!
Dynamic Schedules are Adaptive and Fast
[Figure: a dynamic plan reorders the remaining moves (F1–F4) as operations actually complete, matching or beating static plans A and B in each case]
• Adapts to actual conditions!
Challenges of Dynamic Update Scheduling
• Exponential number of orderings
• Cannot completely avoid planning
[Figure: example with flows F1–F5: greedily moving F2 first leads to a deadlock, whereas moving F1 first, then F3, then F2 yields a consistent update plan]
Dionysus Pipeline
• Inputs: current state, target state, consistency property
• Dependency graph generator: encodes valid orderings
• Update scheduler: determines a fast order
• Output: update operations issued to the network
Dependency Graph Generation
[Figure: example current and target states, with flows F1–F5 across switches A–E]
Dependency graph elements (see the sketch below):
• Nodes
– Operation node
– Resource node (link capacity, switch table size)
– Path node
• Edges: dependencies between nodes
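A rough sketch of how these elements might be represented in code. The node kinds and the meaning of edges (capacity an operation needs from, or returns to, a resource node) follow the description above; the concrete classes and the tiny example at the end are illustrative, not the Dionysus implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    kind: str            # "operation", "resource", or "path"
    free: float = 0.0    # currently available capacity (resource nodes)

@dataclass
class Edge:
    src: Node
    dst: Node
    amount: float        # capacity required or released along this dependency

@dataclass
class DependencyGraph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, name, kind, free=0.0):
        node = Node(name, kind, free)
        self.nodes.append(node)
        return node

    def add_edge(self, src, dst, amount):
        self.edges.append(Edge(src, dst, amount))

# Illustrative fragment: a move that needs 5 units of capacity on link A-E
# and frees 5 units on link D-E once it completes.
g = DependencyGraph()
a_e = g.add_node("A-E", "resource", free=5)
d_e = g.add_node("D-E", "resource", free=0)
move = g.add_node("Move F", "operation")
g.add_edge(a_e, move, 5)   # resource -> operation: capacity the move consumes
g.add_edge(move, d_e, 5)   # operation -> resource: capacity freed when done
```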
Dependency Graph Generation (example)
[Figure: the dependency graph is built step by step: operation nodes Move F1, Move F2, Move F3; resource nodes for link capacities A–E: 5 and D–E: 0; edges labeled with the capacity (5) that each move consumes or frees]
Dependency Graph Generation
• Supported scenarios
– Tunnel-based forwarding: WANs
– WCMP forwarding: data center networks
• Supported consistency properties
– Loop freedom
– Blackhole freedom
– Packet coherence
– Congestion freedom
• Check paper for details
Dionysus Scheduling
• Scheduling as a resource allocation problem
[Figure: walkthrough on the dependency graph: free link capacity is allocated to ready operations, completed operations release capacity that enables later moves, and allocating capacity to the wrong operation can lead to a deadlock; in the example the moves complete one after another until the update is done]
Dionysus Scheduling
• NP-complete under link capacity and switch table size constraints
• Approach
– DAG: always feasible; use critical-path scheduling
– General case: convert to a virtual DAG
– Rate-limit flows to resolve deadlocks
Critical-Path Scheduling
• Calculate the critical-path length (CPL) for each node (see the sketch below)
– Extension: assign larger weights to operation nodes if we know in advance that the switch is slow
• Resources are allocated to operation nodes with larger CPLs first
[Figure: example dependency graph annotated with CPLs: Move F1 (CPL=3), resource node A-B:5 (CPL=2), Move F3 (CPL=2), Move F2 (CPL=1), resource node C-D:0 (CPL=1), Move F4 (CPL=1)]
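A sketch of the CPL computation, assuming the dependency graph (or its virtual-DAG form) is given as adjacency lists and that every node has weight 1 unless overridden; the slide's extension of up-weighting operations on slow switches maps onto the optional `weight` argument. The small example graph is shaped like, but not identical to, the one in the figure.

```python
from functools import lru_cache

def critical_path_lengths(successors, weight=None):
    """Critical-path length (CPL) of every node in a DAG.

    `successors` maps node -> list of downstream nodes; `weight` optionally
    maps node -> weight (default 1). CPL(node) = weight(node) plus the
    largest CPL among its successors (or just its weight at a sink).
    """
    weight = weight or {}

    @lru_cache(maxsize=None)
    def cpl(node):
        children = successors.get(node, [])
        tail = max((cpl(c) for c in children), default=0)
        return weight.get(node, 1) + tail

    return {node: cpl(node) for node in successors}

# Toy DAG: finishing Move F1 frees capacity on link A-B, which Move F3 needs.
succ = {
    "Move F1": ["A-B"],
    "A-B": ["Move F3"],
    "Move F3": [],
    "Move F2": [],
}
cpls = critical_path_lengths(succ)
schedule_order = sorted(cpls, key=cpls.get, reverse=True)
print(cpls)            # Move F1 has the largest CPL
print(schedule_order)  # allocate resources to larger-CPL operations first
```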
Handling Cycles
• Convert to a virtual DAG
– Treat each strongly connected component (SCC) as a virtual node
• Run critical-path scheduling on the virtual DAG (see the sketch below)
– Weight wi of an SCC: the number of operation nodes it contains
[Figure: the SCC containing two of the move operations collapses into a virtual node with weight = 2 and CPL=3, while Move F2 forms a separate virtual node with weight = 1 and CPL=1]
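A sketch of the cycle-handling step, assuming the dependency graph is available as a networkx DiGraph: collapse each strongly connected component into a virtual node, weight it by its number of operation nodes, and run critical-path scheduling on the resulting DAG. The toy graph and the `is_operation` predicate below are illustrative.

```python
import networkx as nx

def virtual_dag(dep_graph, is_operation):
    """Collapse each SCC of `dep_graph` into one virtual node.

    The resulting graph is a DAG; each virtual node gets a 'weight'
    attribute equal to the number of operation nodes it contains.
    """
    dag = nx.condensation(dep_graph)        # one virtual node per SCC
    for _, data in dag.nodes(data=True):
        data["weight"] = sum(1 for n in data["members"] if is_operation(n))
    return dag

# Toy graph in the spirit of the figure: Move F1 and Move F3 form a cycle
# through the link resource nodes, while Move F2 sits outside the cycle.
g = nx.DiGraph()
g.add_edge("Move F1", "D-E")
g.add_edge("D-E", "Move F3")
g.add_edge("Move F3", "A-E")
g.add_edge("A-E", "Move F1")
g.add_edge("A-E", "Move F2")

dag = virtual_dag(g, is_operation=lambda n: n.startswith("Move"))
for vnode, data in dag.nodes(data=True):
    print(sorted(data["members"]), "weight =", data["weight"])
# The SCC {Move F1, Move F3, A-E, D-E} becomes one virtual node (weight 2),
# Move F2 another (weight 1); critical-path scheduling then runs on this DAG.
```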
Evaluation: Traffic Engineering
[Figure: 50th percentile update time (seconds) for WAN TE and data center TE, comparing OneShot (not congestion-free), Dionysus (congestion-free), and SWAN (congestion-free)]
• Improves 50th percentile update speed by 80% compared to static scheduling (SWAN), and comes close to OneShot
Evaluation: Failure Recovery
[Figure: 99th percentile update time (seconds) and 99th percentile link oversubscription (GB) for OneShot, Dionysus, and SWAN]
• Improves 99th percentile update speed by 80% compared to static scheduling (SWAN)
• Reduces 99th percentile link oversubscription by 40% compared to static scheduling (SWAN)
Conclusion
• Dionysus provides fast, consistent network updates through dynamic scheduling
– Dependency graph: compactly encodes valid orderings
– Scheduler: dynamically schedules update operations based on runtime conditions
Dionysus enables more agile SDN control loops.