GPF_2_2007_peer_cariden

Download Report

Transcript GPF_2_2007_peer_cariden

Peering Planning Cooperation without
Revealing Confidential Information
Aaron Hughes
[email protected]
GPF 2.0 March 25-30, 2007
Miami, Cozumel & Belize
www.cariden.com
GPF 2.0 2007
(c) cariden technologies
Failover Matrices–Cariden
1
The Issue
• Multi-Homed Neighbor, 2 or more links > 50%
• Example
– 1000Mbps connections to Peer X in 3 locations
– SJC-to-Peer = 600Mbps, NYC = 100, WDC = 600
– SJC-to-Peer link fails
• Are we in trouble?
... or ...
?
SEA
BOS
CHI
SEA
?
BOS
CHI
NYC
SJC
WDC
KCY
HST
SJC
ATL
WDC
KCY
HST
LAX
ATL
LAX
Peer X
GPF 2.0 2007
NYC
MIA
Failover Matrices–Cariden
Peer X
MIA
2
Capacity Planning Utopia
• Uniform capacity links
• Diverse connections
(unlikely double failures at Layer 3)
• Upgrade at 50%
(planning objective is to be resilient to single
failures)
GPF 2.0 2007
Failover Matrices–Cariden
3
Capacity Planning Reality
• Range of capacities
• Multiple Layer 3 failures
• Upgrade impediments (money, cable plant, ...)
GPF 2.0 2007
Failover Matrices–Cariden
4
IGP Different from BGP
• Data is more accessible
• Failure behavior is predictable
• Established process for within AS planning
– Gather Data
• Topology (OSPF, IS-IS, ...)
• Traffic matrix
[1]
• Estimate growth
– Simulate for failures
– Perform traffic engineering (optional)[2]
– Upgrade as necessary
• Commercial and free tools
GPF 2.0 2007
Failover Matrices–Cariden
5
The Trouble with BGP
• Data is larger and harder to access
• BGP decision process complicated
• Planning practices not well established
• Failure behavior often depends on someone
else’s network!
subject of
– e.g., incoming traffic from a peer
GPF 2.0 2007
Failover Matrices–Cariden
this talk
6
BGP Path Decision Algorithm[1]
1. Reachable next hop
2. Highest Weight
3. Highest Local Preference
4. Locally originated routes
5. Shortest AS-path length
6. IGP > EGP > Incomplete
7. Lowest MED
Respect MEDs
8. EBGP > IBGP
Shortest Exit Routing
9. Lowest IGP cost to next hop
10. Shortest route reflection cluster list
11. Lowest BGP router ID
12. Lowest peer remote address
[1] Junos algorithm shown here. Cisco IOS uses a slightly different algorithm.
GPF 2.0 2007
Failover Matrices–Cariden
7
Common Routing Policies
• Shortest Exit
– Often used for sending to peers
– Get packet out of network as soon as possible
– Local Prefs used to determine which neighbor,
IGP costs used to determine which exit
• Respect MEDs
– Often used for customers who buy transit
– Deliver packets closest to destination
– Neighbor forwards IGP costs as MEDs
(multi-exit discriminators)
GPF 2.0 2007
Failover Matrices–Cariden
8
Blind Spots
• Cannot predict behavior when routing depends
on other network (see 3 cases below).
Relationship
to Remote AS
Routing To
Remote AS
Shortest Exit in
Peer known network
Respect MEDs
Customer from unknown
Transit Shortest Exit in
Provider known network
GPF 2.0 2007
Failover Matrices–Cariden
Routing From
Remote AS
Shortest Exit in
unknown network
Shortest Exit in
unknown network
Respect our MEDs
9
Failover Matrices
• Solution to peering planning blind spots
• Procedure
– Gather data
• Topology, Traffic, Routing Configurations
– Simulate knowable effects
• Generate Failover Matrices
– Share Failover Matrices for unknowables
• e.g., peer gives failover matrix for traffic it delivers, we
provide peer failover matrix for traffic we deliver
• Both sides benefit from cooperating
• AS-Internal information is kept confidential
GPF 2.0 2007
Failover Matrices–Cariden
10
Failover Matrix Example
Traffic:
%Traffic:
no failure fail_SJC
ar1.sjc:Gig3/2
600
Node:Interface
ar1.nyc:ge-2/1
100
48%
(388)
ar2.wdc:ge-2/2
600
52%
(912)
%Traffic: %Traffic:
fail_nyc
fail_wdc
10% (610)
1% (606)
70%
(670)
95%
(670)
-
Note: 388Mbps=100Mbps+(0.48*600Mbps), 912=600+(0.52*600), ...
GPF 2.0 2007
Failover Matrices–Cariden
11
No Scale
Failover Example (from real network)
No Scale
Peer Circuit 1: Traffic levels at five minute intervals
No Scale
Peer Circuit 2: Traffic levels at five minute intervals
No Scale
Peer Circuit 3: Traffic levels at five minute intervals
Peer Circuit 4: Traffic levels at five minute intervals
• Circuit 2 fails.
Traffic shifts to circuit 4.
GPF 2.0 2007
Failover Matrices–Cariden
12
No Scale
Failover Example (from real network)
No Scale
Peer Circuit 1: Traffic levels at five minute intervals
No Scale
Peer Circuit 2: Traffic levels at five minute intervals
No Scale
Peer Circuit 3: Traffic levels at five minute intervals
Peer Circuit 4: Traffic levels at five minute intervals
• Circuit 1 fails. Some traffic shifts to 2 & 4
• Some “leaks” to other AS’s
GPF 2.0 2007
Failover Matrices–Cariden
13
Questions
• How do I calculate a failover matrix?
• How do I use a failover matrix from a peer?
• What if my peer does not cooperate?
• What if a substantial amount of traffic “leaks”
to another AS?
GPF 2.0 2007
Failover Matrices–Cariden
14
Calculating Failover Matrices
• Accurate and Detailed[1,2]
– Per prefix routing and traffic statistics
– Full BGP simulation
• Simple and Good Enough[3]
– Traffic matrix based on ingress-egress pairs
• e.g., Peer1.LAX-AR1.CHI (measure and/or estimate)
instead of 192.12.3.0/24-208.43.0.0/16
– Limited simulation model
• Shortest Path, Respect MEDs
• “Our” AS plus immediate neighbors
[1] “Modeling the routing of an Autonomous System with C-BGP,” B. Quoitin and S. Uhlig, IEEE Network, Vol
19(6), November 2005.
[2] “Network-wide BGP route prediction for traffic engineering,” N. Feamster and J. Rexford, in Proc. Workshop
on Scalability and Traffic Control in IP Networks, SPIE ITCOM Conference, August 2002.
[3] Cariden MATE, available at http://www.cariden.com.
GPF 2.0 2007
Failover Matrices–Cariden
15
Using Failover Matrix from Peers
• Peer calculates failover matrix
• Peer exports failover matrix
using IP addresses of peering
links
• We import failover matrix
• We include in a representative
model of peer network
• Use Failover Matrix in
simulation
GPF 2.0 2007
Failover Matrices–Cariden
16
Estimate if Peer not Cooperate
bbs1
sel2
sea1
bhx1
lba1
min1
sfo1
nrt4
man1
det1
cph1
nqt1
roc1
bos1
ewr1
nrt1
chi1
osa1
sjc2
jfk1
ham1
lon3
sac1
pao2
kcy1
cle1
nyc2
wsh1
ams2
fra2
snv2
tpe3
kul1
lax1
cdg2
den2
dal1
phi1
 Group own
sources based on
exit location
(4 groups here)
ams1
wdc2
sin1
lin1
bru1
hkg1
atl1
phx1
sna1
hou1
tpa1
mia1
bbs1
sel2
pty1
mex1
bhx1
gru1
syd1
sea1
lba1
eze1
min1
sfo1
nrt4
 Quantify shift (to 3
groups) after failure
Assume similar for
other side
man1
det1
cph1
nqt1
roc1
bos1
ewr1
nrt1
chi1
osa1
sjc2
jfk1
ham1
lon3
sac1
pao2
kcy1
cle1
nyc2
wsh1
ams2
fra2
snv2
tpe3
kul1
lax1
cdg2
den2
dal1
phi1
ams1
wdc2
sin1
bru1
hkg1
lin1
atl1
phx1
sna1
hou1
tpa1
mia1
pty1
mex1
gru1
syd1
eze1
• Valid if topology and traffic distributions are similar
GPF 2.0 2007
Failover Matrices–Cariden
17
Leaks to Other AS’s
x
A
INTERNET
B
• Simple option
– Leaks between peers relatively small
• Ignore
– Shifts between transit providers can be large
• Equal AS-path length to most destinations:
• Assume complete shift (easy to model)
• Accurate option
– Extend model to more than one AS away
– Add columns in traffic matrix to designate extra
traffic in case of other network failures
GPF 2.0 2007
Failover Matrices–Cariden
18
Work in Progress
• Evaluating goodness of models
– Compare actual failures to models
• Evaluating goodness of failover estimates
– Work with both sides of a peering arrangement,
compare failover estimates to simulations
– Compare estimated failover matrices to actual
failures
• Streamlining sharing of information
• Looking for more participants
Contact me to participate in the above
GPF 2.0 2007
Failover Matrices–Cariden
19
Summary
• Peering/transit links are some of the most
expensive and difficult to provision links
• We can improve capacity planning on such
links by modeling the network
• BGP modeling can be much more complex
than IGP modeling
– Some required information is not even available
• Failover Matrices provide a simple way to
share information without giving away details
• Failover Matrices can be estimated using one’s
own network details
GPF 2.0 2007
Failover Matrices–Cariden
20
Acknowledgments
• Jon Aufderheide (Global Crossing)
• Clarence Filsfils (Cisco)
GPF 2.0 2007
Failover Matrices–Cariden
21