

FlowRoute: Inferring Forwarding Table
Updates Using Passive Flow-level
Measurements
Amogh Dhamdhere (CAIDA/UCSD)
[email protected]
with
Lee Breslau, Nick Duffield, Cheng Ee, Alexandre Gerber,
Carsten Lund and Shubho Sen (AT&T Labs-Research)
Motivation
• Routing protocol performance during routing
events can affect end-to-end performance
• Transient loops and packet losses may occur
during routing reconvergence
• Network operators need to monitor routing
protocol performance
• Do routers respond as expected?
– Update their forwarding tables in a timely manner?
– Update their forwarding tables to the expected state?
7/20/2015
IMC 2010, Melbourne Australia
Monitoring Routing Events
• Control plane monitors (e.g., OSPFmon, BGPmon)
– Monitor the control plane
– Cannot measure when a router implemented a change in its
forwarding table
• Active probing
– Can only monitor paths that are probed
– Spatial and temporal resolution limited by placement of
probes and probing frequency
FlowRoute
• A data-plane monitoring tool to work in
conjunction with control plane monitors
• Infer forwarding table updates using flow-level
measurements
• Works offline, for after-the-fact forensics and
analysis
• No additional overhead on routers
– Uses flow-level measurements (e.g., Netflow) that are
already collected
Basic Method
[Diagram: router R forwards flow f1 to next hop N1 at time T1, and flow f2 to next hop N2 at time T2]
• Single-packet flows f1 and f2 towards D
• f1 seen at N1: R is previous hop at time T1
• N1 is R’s next hop towards D at T1
• f2 seen at N2: R is previous hop at time T2
• N2 is R’s next hop towards D at T2
R’s next hop towards D changed in [T1, T2]
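The inference on this slide can be sketched in a few lines of Python. This is only an illustration: the `Observation` type and the `next_hop_change` function are hypothetical names, not part of FlowRoute, and timestamps are assumed to be comparable numbers on a common clock.

```python
from typing import NamedTuple, Optional, Tuple


class Observation(NamedTuple):
    """A single-packet flow seen at `next_hop` with `router` as previous hop."""
    router: str    # R, the router whose forwarding state is inferred
    next_hop: str  # N1 or N2, the router where the flow was observed
    dest: str      # D
    time: float    # T1 or T2


def next_hop_change(a: Observation, b: Observation) -> Optional[Tuple[float, float]]:
    """Return the window [T1, T2] in which R changed its next hop, or None."""
    if (a.router, a.dest) != (b.router, b.dest):
        return None  # different router or destination: not comparable
    first, second = sorted((a, b), key=lambda o: o.time)
    if first.next_hop == second.next_hop:
        return None  # same next hop at both times: no change inferred
    return (first.time, second.time)


obs1 = Observation("R", "N1", "D", 10.0)
obs2 = Observation("R", "N2", "D", 12.5)
print(next_hop_change(obs1, obs2))  # (10.0, 12.5)
```

The returned pair is a window, not an instant: the change could have happened anywhere between the two observations.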
Routing Flow Records
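[Diagram: a flow passes from previous-hop router Rp into router R on incoming interface i and leaves on outgoing interface o towards next-hop router Rn; δ is the link propagation delay]
R sees flow towards destination D from tf to tl
Netflow: (R, i, o, tf, tl, D)
• Map outgoing interface o to next hop router Rn; duplicate first packet timestamp
– (R, tf, tf, D, Rn)
• Map incoming interface i to previous hop router Rp; subtract link propagation delays
– (Rp, tf-δ, tl-δ, D, R)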
One flow record at R produces two routing flow
records, giving the routing state of R and Rp
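A minimal sketch of this conversion, assuming the interface-to-neighbor mapping and the per-link propagation delays are available as lookup tables. The `neighbor` and `delay` dictionaries and all type names are hypothetical, and the field order follows the tuples on this slide (router, start, end, destination, next hop).

```python
from typing import Dict, List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    router: str     # R
    in_if: str      # i
    out_if: str     # o
    t_first: float  # tf
    t_last: float   # tl
    dest: str       # D


class RFR(NamedTuple):
    router: str
    t_start: float
    t_end: float
    dest: str
    next_hop: str


Port = Tuple[str, str]  # (router, interface)


def to_rfrs(rec: FlowRecord,
            neighbor: Dict[Port, str],
            delay: Dict[Port, float]) -> List[RFR]:
    """Turn one flow record at R into routing state for Rp and for R."""
    rp = neighbor[(rec.router, rec.in_if)]   # previous hop router
    rn = neighbor[(rec.router, rec.out_if)]  # next hop router
    d = delay[(rec.router, rec.in_if)]       # delay on the Rp -> R link
    return [
        # Rp was forwarding towards R; shift times back by the link delay.
        RFR(rp, rec.t_first - d, rec.t_last - d, rec.dest, rec.router),
        # R was forwarding towards Rn; first packet timestamp is duplicated.
        RFR(rec.router, rec.t_first, rec.t_first, rec.dest, rn),
    ]


rec = FlowRecord("R", "i", "o", 100.0, 104.0, "D")
neighbor = {("R", "i"): "Rp", ("R", "o"): "Rn"}
delay = {("R", "i"): 0.5}
for rfr in to_rfrs(rec, neighbor, delay):
    print(rfr)
```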
Inferring Forwarding Table Updates
• Collect Netflow records
from all routers
• Convert to Routing
Flow Records (RFRs)
for offline processing
(R, T1, T2, N1, D)
(R, T3, T4, N2, D)
T2 < T3
[Timeline: N1 is the next hop during [T1, T2]; N2 is the next hop during [T3, T4]]
R changed next hop towards D in the time window [T2, T3] → "range" of forwarding table update
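The comparison of two RFRs for the same router and destination can be sketched as below. The `RFR` type and `compare` function are hypothetical; the field order follows the tuples on this slide (router, start, end, next hop, destination). Records that overlap in time are only flagged, not treated as an update, and the case T2 = T3 is treated as an overlap, which is an assumption since the slides do not cover it.

```python
from typing import NamedTuple, Optional, Tuple


class RFR(NamedTuple):
    router: str
    t_start: float
    t_end: float
    next_hop: str
    dest: str


def compare(a: RFR, b: RFR) -> Tuple[str, Optional[Tuple[float, float]]]:
    """Classify two RFRs of one (router, destination) pair."""
    if (a.router, a.dest) != (b.router, b.dest):
        raise ValueError("RFRs must share router and destination")
    first, second = sorted((a, b), key=lambda r: r.t_start)
    if first.next_hop == second.next_hop:
        return ("same", None)
    if first.t_end < second.t_start:
        # T2 < T3: the update happened somewhere in [T2, T3].
        return ("change", (first.t_end, second.t_start))
    return ("overlap", None)  # T2 >= T3 (equality handled here by assumption)


a = RFR("R", 1.0, 2.0, "N1", "D")
b = RFR("R", 3.0, 4.0, "N2", "D")
c = RFR("R", 1.5, 4.0, "N2", "D")
print(compare(a, b))  # ('change', (2.0, 3.0))
print(compare(a, c))  # ('overlap', None)
```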
Inferring Forwarding Table Updates
• Collect Netflow records from all routers
• Convert to Routing Flow Records (RFRs) for offline processing
(R, T1, T2, N1, D)
(R, T3, T4, N2, D)
T2 > T3
[Timeline: N1 is the next hop during [T1, T2]; N2 is the next hop during [T3, T4]; the two intervals overlap]
Routing flow records overlap → could be due to Equal Cost Multi-Path (ECMP)
ECMP
[Diagram: router R reaches destination D via either N1 or N2; flow f1 is forwarded to N1 during [T1, T2] and flow f2 to N2 during [T3, T4]]
• Router R can forward flows destined to D to either N1 or N2
• RFRs generated at N1 and N2 can overlap → inconsistency
• Non-overlapping RFRs can appear as a routing change for every flow
Filtering ECMP
• Observation: In 99% of next hop changes due to
ECMP, a router routes fewer than 20 flows
towards one next hop, before routing a flow
towards an equal-cost next hop
• Filtering heuristic: Declare a routing change only if
>20 flows were routed to the old next hop before a
flow is routed to the new next hop
• Conservative: May miss routing changes that occur before
20 flows are forwarded to the old next hop
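A rough sketch of the filtering heuristic, assuming the input is a time-ordered list of (time, next hop) observations for one router and one destination. The function name and the input layout are hypothetical; the threshold of 20 is the one stated on this slide.

```python
from typing import Iterable, List, Tuple

Change = Tuple[float, float, str, str]  # (range start, range end, old hop, new hop)


def filtered_changes(flows: Iterable[Tuple[float, str]],
                     threshold: int = 20) -> List[Change]:
    """Report a next-hop change only after a long enough run on the old hop."""
    changes: List[Change] = []
    prev_hop = None
    prev_time = None
    run = 0  # consecutive flows routed to the current next hop
    for t, hop in flows:
        if prev_hop is not None and hop != prev_hop:
            if run > threshold:
                # More than `threshold` flows went to the old next hop first.
                changes.append((prev_time, t, prev_hop, hop))
            run = 0  # the run on the new next hop starts with this flow
        run += 1
        prev_hop, prev_time = hop, t
    return changes


# 25 flows to N1, then 3 flows to N2: reported as a routing change.
stable = [(float(t), "N1") for t in range(25)] + [(25.0 + t, "N2") for t in range(3)]
# Flows alternating between two equal-cost next hops: filtered out.
ecmp = [(float(t), "N1" if t % 2 == 0 else "N2") for t in range(50)]

print(filtered_changes(stable))  # [(24.0, 25.0, 'N1', 'N2')]
print(filtered_changes(ecmp))    # []
```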
Sampling
• Both packet and flow sampling occur in high-speed
networks
• Sampling does not affect correctness of inferred
ranges
• Sampling affects the width of ranges; more aggressive
sampling → lower temporal resolution
• More discussion in the paper
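A toy example of why sampling widens a range without making it wrong. It assumes deterministic 1-in-4 sampling of single-packet flows purely to keep the output reproducible; real packet and flow sampling is typically random, and the helper name is hypothetical.

```python
from typing import List, Optional, Tuple

Flow = Tuple[float, str]  # (time, next hop)


def update_range(flows: List[Flow]) -> Optional[Tuple[float, float]]:
    """Window between the last flow to the old next hop and the first to the new one."""
    for (t_prev, hop_prev), (t_next, hop_next) in zip(flows, flows[1:]):
        if hop_prev != hop_next:
            return (t_prev, t_next)
    return None


# One flow per second; the next hop changes between t = 9 and t = 10.
flows = [(float(t), "N1" if t < 10 else "N2") for t in range(20)]

full = update_range(flows)          # every flow observed
sampled = update_range(flows[::4])  # only 1 flow in 4 observed

print(full)     # (9.0, 10.0)
print(sampled)  # (8.0, 12.0)
```

The sampled range is wider but still contains the unsampled one, so the inference stays correct while the temporal resolution drops.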
Timely Forwarding Table Updates
[Figure: forwarding table update ranges plotted against an OSPF event "cluster"]
All ranges overlap with the OSPF event cluster
Delayed Forwarding Table Updates
[Figure: some forwarding table updates are consistent with OSPF events; others are delayed w.r.t. OSPF events]
Such behavior is not detectable using a control plane monitor alone!
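One way to compare an inferred range with an OSPF event cluster is sketched below. The rule is an assumption made for illustration (a range that overlaps the cluster is consistent, a range that starts only after the cluster ends is delayed); the slides do not state the exact criterion used, and the function name is hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end)


def classify(update_range: Interval, cluster: Interval) -> str:
    """Label a forwarding table update range relative to an OSPF event cluster."""
    r_start, r_end = update_range
    c_start, c_end = cluster
    if r_start <= c_end and c_start <= r_end:
        return "consistent"  # the range overlaps the cluster
    if r_start > c_end:
        return "delayed"     # the whole range lies after the cluster
    return "unrelated"       # the range ended before the cluster began


cluster = (100.0, 102.0)
print(classify((101.0, 101.5), cluster))  # consistent
print(classify((110.0, 111.0), cluster))  # delayed
```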
Delayed Forwarding Table Updates
• Used FlowRoute on a 2-month dataset
• 2666 OSPF event clusters
• 97010 time ranges consistent with OSPF event
clusters
• 117 ranges that showed delayed forwarding table
updates
• Two routers showed delayed updates 14 times in
the 2-month dataset
– Subsequently retired from the network
Loops
• Delayed forwarding table updates can cause
transient loops
– Example in the paper of how this can happen
• 392 instances of 1-hop loops during 2-month
dataset
• Mostly short-lived (sub-second)
• A few loops lasted tens of seconds
– Long-lived loops were due to delayed updates by one or
more routers
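A possible check for 1-hop loops over a set of RFRs, under the assumption that a 1-hop loop means two routers each use the other as next hop towards the same destination during overlapping time intervals. This definition, the `RFR` type and the function name are illustrative and not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class RFR(NamedTuple):
    router: str
    t_start: float
    t_end: float
    next_hop: str
    dest: str


Loop = Tuple[str, str, str, float, float]  # (router A, router B, dest, start, end)


def one_hop_loops(rfrs: List[RFR]) -> List[Loop]:
    """Find pairs of routers that point at each other for the same destination."""
    loops: List[Loop] = []
    for i, a in enumerate(rfrs):
        for b in rfrs[i + 1:]:
            mutual = a.next_hop == b.router and b.next_hop == a.router
            if a.dest == b.dest and mutual:
                start = max(a.t_start, b.t_start)
                end = min(a.t_end, b.t_end)
                if start <= end:  # the two intervals overlap in time
                    loops.append((a.router, b.router, a.dest, start, end))
    return loops


records = [
    RFR("A", 0.0, 5.0, "B", "D"),
    RFR("B", 3.0, 8.0, "A", "D"),   # A -> B and B -> A overlap during [3, 5]
    RFR("B", 9.0, 10.0, "C", "D"),  # B later forwards to C: no loop
]
print(one_hop_loops(records))  # [('A', 'B', 'D', 3.0, 5.0)]
```

The pairwise scan is quadratic in the number of records, which is acceptable for a sketch but would need indexing by destination for a large dataset.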
Summary
• FlowRoute: A data plane monitor to work in
conjunction with control plane monitors for
forensics and analysis of forwarding table updates
• Used to study forwarding table updates in a tier-1
ISP network
• Found cases of delayed forwarding table updates
due to buggy routers
• Also found transient loops during routing
convergence and spikes in link utilization
Thanks!
[email protected]
www.caida.org/~amogh
Practical Issues
• What should be the destination? Can be either
destination IP address, prefix, or MPLS tunnel
endpoint
– Need to observe sufficient flow volume
– We choose MPLS tunnel endpoint
• Sampling
– Both packet and flow sampling occur in high-speed
networks
– Sampling does not affect correctness of inferred ranges
– Affects the width of the ranges; more aggressive sampling → lower
temporal resolution
Existing Approaches
• Control plane monitors (e.g., OSPFmon, BGPmon)
– Monitor the control plane, cannot measure when a router
implemented a change in its forwarding table
• Collect and process router logs
– Large volume of data, transporting and processing is hard
– Limited by polling frequency, e.g., 5 minutes with SNMP
• Active probing
– Spatial and temporal resolution limited by placement of
probes and probing frequency
Delayed Forwarding Table Updates
• Used FlowRoute on a 2-month dataset -- 2666
OSPF event clusters
• 97010 time ranges consistent with OSPF event
clusters
• 58 clusters, 117 ranges that showed delayed
forwarding table updates
• Two routers showed delayed updates 14 times in
the 2-month dataset
– Subsequently retired from the network