VeriCon-ndb
Verifying & Testing SDN Data & Control Planes:
Header Space Analysis
SDN Stack
State layers hold a representation of the network's configuration; code layers implement the logic that maintains the mapping between two state layers.
(Figure: the SDN stack as alternating code and state layers. State layers: Policy, Logical View, Physical View, Device State, Hardware. Code layers: Apps, Network Hypervisor, Network OS, Firmware.)
Troubleshooting Workflow
Tools for Finding
The Code Layer
• A: Actual Behavior ~ Policy? (OFRewind)
• B: Device State ~ Policy? (Anteater, Header Space Analysis, VeriFlow)
• C: Physical View ~ Device State? (SOFT)
• D: Device State ~ Hardware? (Automatic Test Packet Generation, Network Debugger)
• E: Logical View ~ Physical View? (corresponding checking)
Tools for Localizing
within a Code Layer
• Within the controller (control plane)
– OFRewind, Retrospective Causal Inference, Synoptic, NICE, VeriCon, …
• Within the switch firmware (data plane)
– Network Debugger, Automatic Test Packet Generation, SOFT, …
• Systematic troubleshooting: from design (policy specification & verification), configuration analysis, runtime dynamic analysis, troubleshooting, …
• Verification vs. testing vs. debugging vs. troubleshooting, …
• Summarize existing tools and reveal missing tools
Some Existing Approaches to
SDN Verification & Testing
• Specification (policy language & semantics): NetKAT, NetCore; CTL, NoD, Klee Assert
• Testing: NICE, ATPG
• Verification (e.g., reachability)
– Data plane: Anteater, VeriFlow, HSA, Atomic Predicates, first-order logic + transitive closure, local checks
– Control plane: VeriCon, Batfish
• Synthesis (e.g., forwarding): one big switch, VLAN, NetCore, NetKAT, FlowLog
SDN: Verification & Testing
• Data plane verification:
– Given data plane FIBs f, check that every packet p satisfies the designed network spec/property (see the sketch after this list)
– If not, generate one counterexample, or all counterexamples
– Or provide optimizations, or help add functionality
• Data plane testing:
– Detour via symbolic testing
• Control plane verification:
– Given configuration c, for every packet & network environment, verify that the control plane generates correct data plane states meeting the desired network specs/properties
• Control plane testing
• Synthesis:
– Design a control plane CP that, by construction, for every packet p and network state s, satisfies Φ (policy specs), e.g., reachability, no loops, waypoint traversal
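As a toy illustration of the data plane verification task above, the following sketch (all assumptions: exact-match, destination-based FIBs, single-path forwarding, and an invented two-switch topology; real tools such as HSA, VeriFlow, and Anteater reason symbolically over header spaces) checks an all-pairs reachability property over per-switch FIBs and prints a counterexample packet when delivery fails:

# Toy data-plane reachability check over destination-based FIBs.
fib = {                      # switch -> {dst address: next hop}
    's1': {'10.0.0.2': 's2'},
    's2': {'10.0.0.2': 'h2'},
}
hosts = {'h2': '10.0.0.2'}   # host -> address it should receive

def delivers(ingress, dst, max_hops=8):
    """Follow FIB entries from `ingress`; True iff `dst` is reached."""
    node = ingress
    for _ in range(max_hops):        # hop bound also catches forwarding loops
        if node in hosts:
            return hosts[node] == dst
        nxt = fib.get(node, {}).get(dst)
        if nxt is None:              # black hole: no matching rule
            return False
        node = nxt
    return False                     # loop detected

for sw in fib:                       # the property: every destination is reachable
    for dst in hosts.values():
        if not delivers(sw, dst):
            print(f'counterexample: packet to {dst} injected at {sw} is lost')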
Verification:
Values and Obstacles
• Hardware (chips): bugs are burned into silicon; dealing with bugs means costly recalls; the obstacle to eradication is design complexity; the value proposition is cutting time to market.
• Software (devices such as PCs and phones): bugs are exploitable but admit workarounds; dealing with bugs means online updates; the obstacles are code churn, legacy code, and false positives; the value proposition is safety/OS-critical systems and the quality of the code base.
• Networks (services): bugs are latent until exposed; dealing with bugs means live-site incidents; the obstacles are topology and configuration churn; the value proposition is meeting SLAs, utilizing bandwidth, and enabling richer policies.
VeriCon: Towards Verifying
Controller Programs in SDNs
Goal: guarantee network invariants to ensure correctness, safety, etc.
• The network should always satisfy certain invariants
• It is difficult to write an SDN application that always guarantees such invariants
Limitations of Existing Approaches
1. Establish the existence, but not the absence, of bugs
– NICE (finite-state model checking): unexplored topologies may cause bugs to be missed
– HSA (checks network snapshots): snapshots may not capture the situations in which bugs exist
2. Runtime overhead
– VeriFlow & NetPlumber (check in real time): bugs are only identified when the app is actually running
VeriCon
Verifies network-wide invariants for any event sequence and all admissible topologies.
Input: an SDN application written in Core SDN, plus topology constraints & invariants in first-order logic.
VeriCon checks the resulting verification conditions with the Z3 theorem prover and either guarantees that the invariants are satisfied or produces a concrete counterexample.
Example: Stateful Firewall
(Figure: a single switch; trusted hosts attach via port 1, untrusted hosts via port 2.)
• Always forward from trusted to untrusted hosts
• Only forward from untrusted to trusted hosts if a trusted host previously sent a packet to the untrusted host
Core SDN (CSDN) Language
• Define and initialize relations
– Topology: link(S, O, H); link(S1, I1, I2, S2)
– Forwarding table: S.ft(Src → Dst, I → O)
– Sent packets: S.sent(Src → Dst, I → O)
• Write event handlers: pktIn(S, Pkt, I)
– Update a relation
– Install a rule (insert into ft)
– Forward a packet (insert into sent)
– If-then-else
CSDN: Built-In Relations
Describing Network States
First-Order Formulas & Invariants
Three types of invariants:
• topo: defines the admissible topologies
• safety: must hold initially & be preserved across event executions
• trans: must hold after each event execution
Example: no black hole (one plausible formalization below)
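One plausible first-order rendering of the no-black-hole property, in the notation of the CSDN relations above (an assumption; the slide's own formula is an image that did not survive transcription): ∀ S : SW, P : PK, I : PRT. ∃ O : PRT. S.ft(P.src → P.dst, I → O), i.e., every switch has a matching forwarding entry for every packet it may receive, so no packet is silently dropped.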
Example: Learning Switch
Controller Code
Learning Switch Controller Code:
Some Invariants
Stateful Firewall in CSDN
rel tr(SW, HO) = {}

pktIn(s, pkt, prt(1)) →
  s.forward(pkt, prt(1), prt(2))
  tr.insert(s, pkt.dst)
  s.install(pkt.src → pkt.dst, prt(1), prt(2))

pktIn(s, pkt, prt(2)) →
  if tr(s, pkt.src) then
    s.forward(pkt, prt(2), prt(1))
    s.install(pkt.src → pkt.dst, prt(2), prt(1))
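To make the handlers above concrete, here is a minimal Python re-rendering of the same controller logic (a sketch under assumptions: the class and field names are invented, and CSDN's relations become plain Python sets):

# Simulation of the stateful-firewall controller above.
# Port 1 faces trusted hosts, port 2 untrusted hosts.
class FirewallController:
    def __init__(self):
        self.tr = set()    # (switch, host): untrusted hosts a trusted host contacted
        self.ft = set()    # installed rules: (switch, src, dst, in_port, out_port)
        self.sent = []     # forwarded packets, for inspecting invariants

    def pkt_in(self, sw, src, dst, in_port):
        if in_port == 1:                      # trusted -> untrusted: always forward
            self.sent.append((sw, src, dst, 1, 2))
            self.tr.add((sw, dst))            # tr.insert(s, pkt.dst)
            self.ft.add((sw, src, dst, 1, 2))
        elif in_port == 2 and (sw, src) in self.tr:  # untrusted -> trusted: only if contacted
            self.sent.append((sw, src, dst, 2, 1))
            self.ft.add((sw, src, dst, 2, 1))
        # otherwise the packet is dropped

ctl = FirewallController()
ctl.pkt_in('s1', 'T', 'U', 1)    # trusted host T contacts untrusted host U
ctl.pkt_in('s1', 'U', 'T', 2)    # now U may answer
assert ('s1', 'U', 'T', 2, 1) in ctl.ft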
Invariants
• Topology: define the admissible topologies; assumed to hold initially
• Safety: define the required consistency of network-wide states; checked initially & after each event
• Transition: define the effect of executing event handlers
Stateful Firewall Invariants
• Topology: at least one switch with two ports, prt(1) & prt(2); trusted hosts T attach via prt(1), untrusted hosts U via prt(2):
link(S, prt(2), U) ∧ link(S, prt(1), T)
• Safety: a packet P sent from an untrusted host U (P.src = U) to a trusted host T (P.dst = T) must be preceded by some packet P′ sent from the trusted side to P's source:

I1 ≜ ∀ U, T : HO, S : SW, P : PK.
  link(S, prt(2), U) ∧ link(S, prt(1), T) ∧ P.src = U ∧ P.dst = T ∧
  S.sent(P, prt(2), prt(1)) ⇒
  ∃ P′ : PK. P′.dst = P.src ∧ S.sent(P′, prt(1), prt(2))
Counterexample
(Figure: switch SW:0 with ports prt(0)–prt(3) and host HO:0; the flow table already contains the entry Src = HO:0, Dst = HO:0, In = *, Out = *.)
I1 is not inductive: not all executions starting from an arbitrary state satisfy the invariant. In particular, an arbitrary starting state may already contain a flow-table entry that forwards traffic from the untrusted port even though no trusted host ever sent a packet.
Additional Firewall Invariants
• Flow table entries only contain forwarding rules from trusted hosts:

I2 ≜ S.ft(Src → Dst, prt(2), prt(1)) ⇒
  ∃ P′ : PK. P′.dst = Src ∧ S.sent(P′, prt(1), prt(2))

• The controller relation tr records the correct hosts:

I3 ≜ tr(S, H) ⇒
  ∃ P : PK. P.dst = H ∧ S.sent(P, prt(1), prt(2))

• I1 ∧ I2 ∧ I3 is inductive
Non-buggy Verification Examples
Program                      LOC   Topo Inv.   Safety + Trans Inv.   Time (sec)
Firewall                       8   1           3+0                   0.12
Stateless Firewall             4   1           2+0                   0.06
Firewall + Host Migration      9   0           3+0                   0.16
Learning Switch                8   1           4+2                   0.16
Learning Switch + Auth        15   2           5+3                   0.21
Resonance (simplified)        93   6           5+2                   0.21
Stratos (simplified)          29  12           3+0                   0.09
Buggy Verification Examples
Benchmark                                        Counterexample (Hosts + Switches)
Auth: rules for unauth host not removed          3+2
Firewall: forgot part of consistency inv         5+3
Firewall: no check if host is trusted            6+4
Firewall: no inv defining trusted host           6+4
Learning: packets not forwarded                  1+1
Resonance: no inv for host to have one state     11+4
StatelessFW: rule allowing all port-2 traffic    4+2
CSDN: Abstract Syntax
From Formulas to Theorems & Models
• Includes a separate utility for inferring inductive invariants using iterated weakest preconditions (Dijkstra's algorithm)
• Automatic theorem proving (verification) by Z3 (sketch below)
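As an illustration of the kind of verification condition Z3 discharges, here is a minimal sketch in Z3's Python API (an assumption-laden rendering, not VeriCon's actual encoding: one switch, sent and tr as uninterpreted relations, checking that the prt(1) handler of the firewall preserves an I3-style invariant):

from z3 import (DeclareSort, Function, BoolSort, Consts, ForAll,
                Exists, Implies, And, Or, Not, Solver, unsat)

Host = DeclareSort('Host')

# Pre-state relations
sent  = Function('sent',  Host, Host, BoolSort())  # sent(a, b): packet a->b went prt(1)->prt(2)
tr    = Function('tr',    Host, BoolSort())        # tr(h): h recorded as contacted
# Post-state relations after pktIn(s, pkt, prt(1)) with pkt = src -> dst
sent1 = Function('sent1', Host, Host, BoolSort())
tr1   = Function('tr1',   Host, BoolSort())

src, dst = Consts('src dst', Host)
a, b, h, w = Consts('a b h w', Host)

s = Solver()
# Transition: the handler records the packet and inserts dst into tr
s.add(ForAll([a, b], sent1(a, b) == Or(sent(a, b), And(a == src, b == dst))))
s.add(ForAll([h], tr1(h) == Or(tr(h), h == dst)))

# I3-style invariant: tr(h) holds only if some packet was sent to h
inv_pre  = ForAll([h], Implies(tr(h),  Exists([w], sent(w, h))))
inv_post = ForAll([h], Implies(tr1(h), Exists([w], sent1(w, h))))

# Check inv_pre AND transition => inv_post by asserting its negation:
# 'unsat' means the invariant is preserved by the handler.
s.add(inv_pre, Not(inv_post))
print('invariant preserved' if s.check() == unsat else 'counterexample found')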
Further Work Needed
• Events are assumed to execute atomically
– Enforceable using barriers, at a performance cost
– Out-of-order rule installs remain to be considered
• Rule timeouts
– The app handles timeout events to update its ft relation and check invariants
– Requires reasoning about event ordering
Summary of VeriCon
• Verifies network-wide invariants for any event sequence and all admissible topologies
• Guarantees invariants are satisfied, or provides a concrete counterexample
• An application with 93 LOC and 13 invariants is verified in 0.21 s
http://agember.com/go/vericon
NDB: Debugging SDNs
• Bugs can be anywhere in the SDN stack
– Hardware, control plane logic, race conditions
• Switch state might change rapidly
• Bugs might show up rarely
How can we exploit the SDN architecture to systematically track down the root cause of bugs?
Bug Story: Incomplete Handover
(Figure: host A attaches to switch X; host B moves between WiFi APs Y and Z.)
ndb: Network Debugger
Goal
– Capture and reconstruct the sequence of events leading to the errant behavior
Allow users to define a network breakpoint
– A (header, switch) filter to identify the errant behavior
Produce a packet backtrace
– The path taken by the packet
– The state of the flow table at each switch
Debugging Software Programs

Function A():
    i = …; j = …;
    u = B(i, j)

Function B(x, y):
    k = …;
    v = C(x, k)

Function C(x, y):
    …
    w = abort()

Breakpoint: "line 25, w = abort()"

Backtrace:
    File "A", line 10, Function A()
    File "B", line 43, Function B()
    File "C", line 21, Function C()
Debugging Networks

(Topology as before: host A, switch X, WiFi APs Y and Z, host B.)

Breakpoint: "ICMP packets A->B, arriving at X, but not Z"

Backtrace:
Switch X: {
    inport: p0,
    outports: [p1],
    mods: [...],
    matched flow: 23 [...],
    matched table version: 3
}
Switch Y: {
    inport: p1,
    outports: [p3],
    mods: ...,
    ...
}
Using ndb to Debug
Common Issues
Reachability
– Symptom: A is not able to talk to B
– Breakpoint: "Packet A->B, not reaching B"
Isolation
– Symptom: A is talking to B, but it shouldn't
– Breakpoint: "Packet A->B, reaching B"
Race conditions
– Symptom: flow entries not arriving in time
– Breakpoint: "Packet-in at switch S, port P"
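Conceptually, a network breakpoint is just a predicate over (header fields, switch) evaluated against postcards. A minimal sketch (the field names are invented for illustration, not ndb's actual schema):

# A network breakpoint as a (header-fields, switch) predicate.
bp = {'switch': 'X', 'ip_src': 'A', 'ip_dst': 'B', 'proto': 'icmp'}

def matches(headers, dpid, bp):
    """True if a postcard from switch `dpid` with `headers` hits breakpoint `bp`."""
    if bp.get('switch') is not None and bp['switch'] != dpid:
        return False
    return all(headers.get(k) == v for k, v in bp.items() if k != 'switch')

assert matches({'ip_src': 'A', 'ip_dst': 'B', 'proto': 'icmp'}, 'X', bp)
assert not matches({'ip_src': 'A', 'ip_dst': 'B', 'proto': 'icmp'}, 'Z', bp)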
How Does ndb Work?
ndb interposes a Flow Table State Recorder, a proxy on the control channel between the control plane and the switches, and augments flow entries so that matching packets also generate postcards sent to a central Postcard Collector.
Breakpoint example: Switch = …, IP src = …, IP dst = …, TCP port = 22
Flow-mods from the control plane pass through the Flow Table State Recorder on their way to the switches; the recorder logs each resulting flow-table state with a version number as <Flow Table State, Version>. Each postcard carries <Datapath ID, Packet ID, Version>, so the Postcard Collector can match every packet against the exact flow-table state that forwarded it.
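A minimal sketch of how the collector could assemble a packet backtrace from these records (record layouts and names are assumptions for illustration; the real ndb correlates postcards with flow-table snapshots by datapath ID and version):

# Join each postcard <Datapath ID, Packet ID, Version> with the recorded
# <Flow Table State, Version> snapshot to produce a per-packet backtrace.
postcards = [                 # arrival order approximates path order
    ('X', 'pkt42', 3),
    ('Y', 'pkt42', 5),
]
flow_tables = {               # (datapath, version) -> flow-table snapshot
    ('X', 3): ['flow 23: icmp A->B -> out p1'],
    ('Y', 5): ['flow 7: icmp A->B -> out p3'],
}

def backtrace(pkt_id):
    """Sequence of (switch, flow-table snapshot) the packet traversed."""
    return [(dpid, flow_tables.get((dpid, ver), []))
            for dpid, pid, ver in postcards if pid == pkt_id]

for sw, table in backtrace('pkt42'):
    print(sw, table)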
Who Benefits
Network developers
– Programmers debugging control programs
Network operators
– Find policy errors
– Send error report to switch vendor
– Send error report to control program vendor
Performance and Scalability
Control channel
– Negligible overhead: no postcards, only extra flow-mods
Postcards in the datapath
– A single collector server suffices for the entire Stanford backbone
– Selective postcard generation reduces overhead
– Postcard collection can be parallelized
Summary
• ndb: network breakpoint + packet backtrace
• Systematically tracks down the root cause of bugs
• Practical and deployable today