PPT - Shivkumar Kalyanaraman


A Case Study in Understanding OSPFv2 and BGP4 Interactions Using Efficient Experiment Design
David Bauer†, Murat Yuksel‡, Christopher Carothers† and Shivkumar Kalyanaraman‡
†Department of Computer Science
‡Department of Electrical, Computer and Systems Engineering
Rensselaer Polytechnic Institute
Problem Statement
Design Complexity
– Parameter Space: fixed inputs, protocol timers, decision algorithm
Computational Complexity
– Models: BGP4, OSPFv2, TCP-Reno, IPv4
ROSS.Net built and utilized to address both parts of the problem
– Goal: “good results fast” leading to an understanding of the system under test (make sense of the results)

Response Surface
Understand protocol interactions through UPDATE messages generated by and between protocols
OSPFv2
– OO: OSPF-caused OSPF updates
– BO: BGP-caused OSPF updates (interaction)
BGP4
– BB: BGP-caused BGP updates
– OB: OSPF-caused BGP updates (interaction)
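As a minimal sketch of the classification above, the following Python tags each update by the protocol that caused it and the protocol that emitted it, then counts the cross-protocol (OB + BO) updates. The `Update` record, its field names, and the sample trace are assumptions made for illustration; they are not taken from ROSS.Net.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """Hypothetical record for one routing UPDATE message."""
    cause: str     # protocol whose event triggered the update: "OSPF" or "BGP"
    protocol: str  # protocol that emitted the update: "OSPF" or "BGP"


def category(update: Update) -> str:
    """Return OO, OB, BO or BB: first letter is the cause, second the emitter."""
    return update.cause[0] + update.protocol[0]


def interaction_response(updates) -> int:
    """Count cross-protocol updates (OB + BO)."""
    counts = Counter(category(u) for u in updates)
    return counts["OB"] + counts["BO"]


# Illustrative trace only; the values are made up.
trace = [
    Update("OSPF", "OSPF"),
    Update("OSPF", "BGP"),
    Update("BGP", "BGP"),
    Update("BGP", "OSPF"),
    Update("OSPF", "BGP"),
]
print(Counter(category(u) for u in trace))
print(interaction_response(trace))
```

Summing only OB and BO reflects the interaction-focused response; other responses (for example, total updates) would sum different categories.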
Why Are Feature Interactions Harmful?
Network protocol weaknesses are not fully understood until implemented / simulated at large scale
– Are decisions made to efficiently route data within a domain adversely affecting our ability to efficiently route data across domains?
– Hot-potato routing: a small amount of unstable information affects a large portion of traffic
– Cold-potato routing
[Figure: AS 0, AS 1 and AS 2. Local Policy: optimize routing within AS (OSPFv2). Global Policy: optimize routing between ASes (BGP4).]
Large-scale Simulation
Topology from Rocketfuel data
Network Hierarchy:
– Level 0 routers: 9.92 Gb/sec and 1 ms delay
– Level 1 routers: 2.48 Gb/sec and 2 ms delay
– Level 2 routers: 620 Mb/sec and 3 ms delay
– Level 3 routers: 155 Mb/sec and 50 ms delay
– Level 4 routers: 45 Mb/sec and 50 ms delay
– Level 5 routers and below: 1.55 Mb/sec and 50 ms delay
[Figure: AS-level topology of the simulated networks, labeled as follows]
– EBONE: AS 1755, iBGP: 16,384
– EXODUS: AS 3967, iBGP: 50,176
– ABOVENET: AS 6461, iBGP: 2,500
– LEVEL 3: AS 3356, iBGP: 7,921
– Tiscali: AS 3257, iBGP: 441
– OSPFv2 labels (Routers, Links), in slide order: (438, 1,192), (688, 2,166), (2,064, 8,669), (843, 2,667), (618, 839)
– eBGP labels, in slide order: 53, 210, 199
– Unattached numeric labels: 12, 12, 26, 12, 9, 11
Experiment Design and Analysis

Three classes of protocol parameters:
– OSPF timers, BGP timers, BGP decision algorithm

RRS (Recursive Random Search) was allowed 200 trials to optimize (minimize) the response surface
– Heuristic search algorithm

Applied multiple linear regression analysis to the results
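The workflow on this slide (search within a fixed trial budget, then regress the response on the parameters) can be sketched in Python as follows. This is a simplified stand-in, not the authors' method: it uses plain uniform random sampling rather than RRS itself, a made-up linear `toy_response` in place of a simulation run, and hypothetical parameter names and ranges. The regression is ordinary least squares solved through the normal equations, using only the standard library.

```python
import random

# Hypothetical parameter ranges in seconds; not taken from the slides.
PARAM_RANGES = {
    "bgp_keepalive": (10.0, 90.0),
    "bgp_mrai": (5.0, 60.0),
    "ospf_hello": (1.0, 20.0),
}


def toy_response(p):
    """Stand-in for one simulation run; coefficients are arbitrary."""
    return 500.0 - 4.0 * p["bgp_keepalive"] - 0.5 * p["bgp_mrai"] + 0.2 * p["ospf_hello"]


def random_search(trials, rng):
    """Sample `trials` random configurations; keep every (params, response) pair."""
    runs = []
    for _ in range(trials):
        p = {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
        runs.append((p, toy_response(p)))
    return runs


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(runs):
    """Least-squares fit of response = b0 + sum(b_k * x_k) via the normal equations."""
    names = list(PARAM_RANGES)
    rows = [[1.0] + [p[k] for k in names] for p, _ in runs]
    ys = [y for _, y in runs]
    cols = len(names) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(cols)] for i in range(cols)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(cols)]
    return dict(zip(["intercept"] + names, solve(xtx, xty)))


rng = random.Random(0)
runs = random_search(200, rng)  # same trial budget as on the slide
best_params, best_response = min(runs, key=lambda run: run[1])
coeffs = fit_linear(runs)
print(best_params, best_response)
print(coeffs)
```

Comparing the magnitude of each fitted coefficient, scaled by its parameter's range, gives a rough ranking of which parameters move the response most, which is the kind of screening the regression step is used for here.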
Response Plane
Intra-domain routing decisions can affect inter-domain behavior, and vice versa.
All updates belong to one of four categories:
– OSPF-caused OSPF (OO) update
– OSPF-caused BGP (OB) update – interaction
– BGP-caused OSPF (BO) update – interaction
– BGP-caused BGP (BB) update
[Figure: OB Update example. Labels: Destination; link failure or cost increase (e.g. maintenance)]
Response Plane (continued)
[Figure: BO Update example. Labels: Destination; eBGP connectivity becomes available]
These interactions cause route changes to thousands of IP prefixes, i.e. huge traffic shifts!
High Level Characterization
Optimized with respect to the OB+BO response surface.
BGP timers play the major role, i.e. ~15% improvement in the optimal response when BGP timers are included in the search space.
– BGP KeepAlive timer seems to be the dominant parameter, in contrast to the expectation of MRAI!
OSPF timers have little effect, i.e. at most 5%.
– Low time-scale OSPF updates do not affect BGP.
Design 1: Management Perspectives
Varied response surfaces, each equivalent to a particular management approach.
Importance of parameters differs for each metric.
For minimal total updates:
– Local perspectives are 20-25% worse than the global.
For minimal total interactions:
– 15-25% worse can happen with other metrics
– OB updates are more important than BO updates (i.e. ~50% vs. ~0.1% of total updates)
Figure callouts:
– Minimize total BO+OB: 15-25% better than other metrics
– Important to optimize OSPF
– OB: ~50% of total updates
– BO: ~0.1% of total updates
– Global perspective 20-25% better than local perspectives
Design 2: Hot- vs. Cold-Potato Routing
Q: Can we use this approach to provide guidance for network routing policies?
– Performed full factorial of RRS searches, turning Hot-, Cold-potato routing ON/OFF
– Provide quantitative results from which qualitative statements can be made
Figure callouts:
– No major impact regardless of search performed
– Majority of UPDATEs were generated by LOCAL-Pref and AS Path length
– Verified AT&T and Sprint measurements
– MED was << 1% of UPDATEs
– Hot Potato was 0.8%
Larger question: Which steps in the BGP decision making algorithm are most important?
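The full factorial over the two routing switches in this design can be enumerated as in the sketch below. The factor names are hypothetical labels chosen for illustration; each resulting configuration would be handed to one RRS search, which is not shown.

```python
from itertools import product

# Hypothetical factor names; each routing policy is switched OFF or ON.
FACTORS = {
    "hot_potato": (False, True),
    "cold_potato": (False, True),
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [
        dict(zip(names, levels))
        for levels in product(*(factors[name] for name in names))
    ]


design = full_factorial(FACTORS)
for config in design:
    # One search would be run per configuration (not shown here).
    print(config)
```

With two ON/OFF factors the design has 2 × 2 = 4 configurations, so a full factorial stays cheap at this level even though it would be prohibitive over the whole timer parameter space.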
Design 3: Network Robustness
Q: Can we use this approach to provide network admins with guidance for network configurations?
– Link status varied with uniform random probability over simulation runtime
– Link weights varied with uniform random probability over simulation runtime
– Response: BO + OB, Global Perspective, and Default network settings
– Search consistently provides better results
– By maximizing link failure detection times, UPDATEs are most effectively minimized
Figure callouts:
– Response tied to link stability
– BGP parameters had greatest impact
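One possible reading of "link status varied with uniform random probability over simulation runtime" is sketched below: each link starts up and toggles its status at times drawn uniformly from the run. The function name, the fixed number of events per link, and the link labels are all assumptions for illustration; the slides do not say how the perturbation was actually scheduled.

```python
import random


def link_events(links, runtime, events_per_link, rng):
    """Return time-sorted (time, link, new_status) events.

    Each link starts "up" and toggles at times drawn uniformly from
    [0, runtime]. This schedule is an assumed interpretation only.
    """
    events = []
    for link in links:
        times = sorted(rng.uniform(0.0, runtime) for _ in range(events_per_link))
        status = "up"
        for t in times:
            status = "down" if status == "up" else "up"
            events.append((t, link, status))
    return sorted(events)


# Hypothetical link labels and a 100-second run.
events = link_events(["r1-r2", "r2-r3"], 100.0, 4, random.Random(1))
for event in events:
    print(event)
```

Link weight variation could be scheduled the same way by drawing a new weight instead of toggling a status.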
Conclusions
– Number of experiments was reduced by many orders of magnitude in comparison to Full Factorial
– Experiment design and statistical analysis enabled rapid elimination of insignificant parameters
– Several qualitative statements and system characterizations could be obtained with few experiments
– Provided validation of network measurement community results, and called into question the importance of premises
– Search algorithms do not always find the desired behavior
– Allowed me to complete my thesis and graduate!