What is Routing Instability?

CS 6390 Advanced Computer Networks
Dr. Ravi Prakash
End-to-End Routing Behavior in the Internet
Internet Routing Instability
Presented by
Carlos Flores
Gaurav Jain
May 31st, 2000
1
Topics of Presentation
I. Introduction
II. Routing Behavior in the Internet
III. Routing Instability
IV. Conclusions
2
Introduction
• Purpose of Studies
- Analyze routing behavior in the Internet (pathological
conditions, routing stability, and routing symmetry) using
end-to-end measurements.
- Analyze BGP routing messages to examine
Internet routing instability.
3
Routing Behavior
• Main questions
* What pathologies and failures occur in routing?
* Stable or unstable routes?
* Symmetric or Asymmetric routes?
• Terms
AS’s - Autonomous Systems. Sets of routers and hosts
unified by a single administrative authority.
BGP - Border Gateway Protocol. Protocol used to exchange
routing information among different AS’s.
Flapping - Frequent change of routes between AS’s.
4
Methodology
Number of Internet sites: 37
Tools: - Traceroute
- NPD (Network Probe Daemon)
- npd_control program.
Time: D1 dataset - collected Nov - Dec ‘94.
D2 dataset - collected Nov - Dec ‘95.
Size: D1 - 6,991 Measurements.
D2 - 37,097 Measurements.
5
Routing Pathologies
1) Routing loops:
A) Forwarding Loops: Packets forwarded by a router
return to the router.
B) Information Loops: Router acts on connectivity
info. derived by information it itself propagated earlier.
C) Traceroute Loops: measurement reports the same
sequence of routers multiple times.
Results:
D1 - 10 traceroute loops (0.13%)
D2 - 50 traceroute loops (0.16%)
Loop durations: 1) < 3 hours
2) > half a day
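
To illustrate the traceroute-loop definition above, here is a minimal Python sketch that flags a trace in which the same sequence of routers repeats back to back. The function name `find_loop`, the `min_repeats` threshold, and the router names are hypothetical; the slides do not describe the study's own detection procedure.

```python
def find_loop(hops, min_repeats=3):
    """Return the shortest router sequence that repeats back to back
    at least `min_repeats` times in `hops`, or None if there is none."""
    n = len(hops)
    for length in range(1, n // min_repeats + 1):       # candidate cycle length
        for start in range(n - length * min_repeats + 1):
            cycle = hops[start:start + length]
            repeats = 1
            pos = start + length
            # Count how many times the candidate cycle follows itself.
            while hops[pos:pos + length] == cycle:
                repeats += 1
                pos += length
            if repeats >= min_repeats:
                return cycle
    return None


# Hypothetical trace: after routers a and b, the path bounces between c and d.
trace = ["a", "b", "c", "d", "c", "d", "c", "d"]
print(find_loop(trace))
```

A threshold of three repeats is only an assumed guard against a route change in mid-measurement being mistaken for a loop.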
6
Routing Pathologies
2) Erroneous Routing:
D1 - 1 packet routed to Israel instead of London!
Correct routing cannot be safely assumed.
3) Connectivity Altered Midstream
Results:
Routes lost or altered: D1 - 10 traces
D2 - 155 traces
Conclusion:
Recovery time bimodal: 1) <= 1 second
2) Order of 1 minute.
7
Routing Pathologies
4) Fluttering: Rapidly oscillating routing.
D2 - Very little fluttering observed.
Problems:
- Unstable Network paths
- Occur in one direction (asymmetry)
- Roundtrip time difficult to estimate.
Advantages:
Balance network load.
5) Infrastructure failure. “host unreachable” deep inside the
network.
Results:
D1 - 99.8% Availability
D2 - 99.5% Availability
8
Routing Pathologies
6) Unreachable due to too many hops.
* Hop count not always proportional to geographic
distance:
A) End-to-end route 1500 Km: 3 hops.
B) End-to-end route 3 Km: 11 hops.
* Operational diameter of the Internet has grown beyond the
default value of 30 hops.
* Larger initial TTL value needed.
9
Routing Pathologies
7) Temporary Outages.
Sequence of consecutive traceroute packets lost.
      No losses   1–5 losses   >5 losses
D1    55%         44%          0.96%
D2    43%         55%          2.2%
10
Routing Pathologies
8) Time of day patterns.
Temporary outages
D2 - Minimum: 0.4%. Outages between 01:00 - 02:00 hrs.
Maximum: 8.0%. Outages between 15:00 - 16:00 hrs.
Infrastructure failure
Minimum: 1.2%. 09:00 - 10:00 hrs.
Maximum: 9.3%. 15:00 - 16:00 hrs.
11
Routing Pathologies
Summary
Pathology               Probability      Trend   Notes
Persistent loops        0.13 – 0.16%             Some lasted hours.
Erroneous routing       0.004 - 0.004%           No instances in D2
Mid-stream change       0.16% | 0.44%    Worse   Rapidly varying routes
Infrastructure failure  0.21% | 0.48%    Worse   No dominant link
Outage >= 30 secs       0.96% | 2.2%     Worse   Duration exponentially distributed
Total Pathologies       1.5% | 3.3%      Worse
12
Routing Symmetry
•Goal: Assess the degree to which routes are symmetric or
asymmetric.
•Effects of network asymmetries:
•Complicate network measurements, troubleshooting,
accounting and routers’ anticipatory flow state.
•Sources:
- Link asymmetric costs (bandwidth, payment scheme).
- Configuration errors, inconsistencies.
- “hot potato”, “cold potato” routing.
13
Routing Symmetry
Analysis
D2: 49% of measurements showed an asymmetric path
visiting at least one different city.
Size of asymmetries:
- Majority of asymmetries confined to a single hop
(only one city or AS different).
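
As a rough illustration of the city-level comparison above, the following Python sketch lists the cities that appear in only one direction of a two-way path. The function names and city names are hypothetical, and the set-based comparison is an assumption: it ignores the order in which cities are visited.

```python
def differing_cities(forward, reverse):
    """Cities visited in only one direction of a two-way path."""
    return set(forward) ^ set(reverse)


def is_asymmetric(forward, reverse):
    """True if the two directions visit at least one different city."""
    return bool(differing_cities(forward, reverse))


# Hypothetical city-granularity routes for the two directions of one path.
forward = ["Dallas", "Chicago", "New York"]
reverse = ["New York", "Washington", "Dallas"]
print(sorted(differing_cities(forward, reverse)))
print(is_asymmetric(forward, reverse))
```

The same comparison works at AS granularity by passing AS numbers instead of city names.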
14
End-to-End Routing Stability
Objective
Do routes change often or are routes stable over time?
* Views of routing stability:
A) Prevalence – likelihood of observing the same route
in the future.
B) Persistence – how long a route will remain the
same.
* Levels of route granularity:
- Internet granularity (host granularity)
- City granularity
- AS granularity
15
End-to-End Routing Stability
* Routing Prevalence
- Host granularity: For half of virtual paths measured,
same route observed 82% or more of the time.
Internet paths strongly dominated by a single route.
-City granularity: 97%
-AS granularity: 100%
* Internet paths very strongly dominated by same set of cities
and same AS’s, but significant site-to-site variation.
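
The prevalence figures above can be pictured with a simple frequency estimate: the share of observations of a virtual path that show its most common route. This is a minimal sketch, assuming the measurements sample the path at independent times; the function name and routes are hypothetical, and the study's own estimator may differ.

```python
from collections import Counter


def prevalence(observed_routes):
    """Return (dominant_route, fraction of observations showing it)."""
    counts = Counter(tuple(route) for route in observed_routes)
    route, hits = counts.most_common(1)[0]
    return route, hits / len(observed_routes)


# Hypothetical observations of one virtual path at host granularity.
observations = [["a", "b", "c"]] * 4 + [["a", "x", "c"]]
print(prevalence(observations))
```

Mapping each hop to its city or AS before calling the function gives the coarser granularities.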
16
End-to-End Routing Stability
* Routing Persistence
How long is a route likely to endure before changing?
Rapid Route Alternation: No high-frequency routing
oscillation for measurements of less than 1 hour.
Medium Scale Route Alternation: Observation of
virtual paths spaced 1 hour apart not likely to suffer a
route change.
Large scale Route Alternation: 90% chance of
observing a route with a duration of at least a week.
17
End-to-End Routing Stability
* Summary of routing persistence:
-Route changes occur over a wide range of time scales
(seconds to days)
- 2/3 of Internet paths have stable routes lasting from days
to weeks.
Time Scale        %     Notes
10’s of minutes   9%    Mainly route changes inside the network
Hours             4%    Usually intra-network changes
6+ hours          19%   Intra-network changes
Days              68%   50% less than a week, 50% more than a week
18
Internet Routing Instability
•Analysis based on data collected from BGP routing messages
(interdomain routing).
•What is Routing Instability?
•Rapid change of network reachability and topology
information.
•Origins:
•Router configuration errors.
•Physical and data link problems.
•Software bugs.
19
Internet Routing Instability
•Effects:
•Increased packet loss.
•Delayed network convergence.
•Resource overhead (memory, CPU) within Internet
Infrastructure.
•Terminology:
•Prefixes: Blocks of destination IP addresses.
•ASPATH: List of AS numbers in a particular route.
20
Internet Routing Instability
• Forms of routing updates:
1. Announcements.
2. Withdrawals.
• Types of interdomain routing updates:
• Forwarding instability.
• Routing policy fluctuation.
• Redundant pathological updates.
• Instability: Forwarding instability + Routing policy
fluctuation.
21
Internet Routing Instability
Methodology
* Time of study: 9 months in 1996.
* Data: Logged BGP routing messages at 5 major U.S. Network
exchange points.
* Purpose:
- Analyze the BGP data in an attempt to characterize and
understand the origins and operational impact of routing
instability.
22
Internet Routing Instability
Analysis of pathological routing information
* Update categories:
A = Announcement W = Withdrawal
- WADiff: route withdrawn and replaced with an
alternative route.
- AADiff: route implicitly withdrawn and replaced
by a preferred alternative path.
- WADup: route explicitly withdrawn and then
reannounced as reachable.
- AADup: route implicitly withdrawn and replaced
with a duplicate of original.
- WWDup: repeated transmission of BGP
withdrawals for a prefix currently unreachable.
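
The five categories above can be sketched as a small classifier over the update stream for one prefix from one peer. This is an illustrative sketch only: the function name, the tuple encoding of updates, and the use of an AS-number tuple as the "route" are assumptions, not the tooling used in the study.

```python
def classify_updates(updates):
    """Label each update for one (peer, prefix) pair.

    `updates` is a list of ("A", route) or ("W", None) tuples in arrival
    order. Returns a list of labels; None marks an update that falls in
    none of the five categories (for example the first announcement).
    """
    labels = []
    last_kind = None     # kind of the previous update: "A", "W", or None
    last_route = None    # most recently announced route, kept across withdrawals
    for kind, route in updates:
        label = None
        if kind == "W":
            if last_kind == "W":
                label = "WWDup"          # repeated withdrawal
        elif last_route is not None:
            prefix = "WA" if last_kind == "W" else "AA"
            suffix = "Dup" if route == last_route else "Diff"
            label = prefix + suffix
        if kind == "A":
            last_route = route
        last_kind = kind
        labels.append(label)
    return labels


# Hypothetical update stream; routes are written as tuples of AS numbers.
stream = [("A", (1, 2, 3)), ("W", None), ("W", None), ("A", (1, 2, 3)),
          ("A", (1, 2, 3)), ("A", (1, 4, 3)), ("W", None), ("A", (1, 2, 3))]
print(classify_updates(stream))
```

An announcement that directly follows another announcement is treated as an implicit withdrawal of the earlier route, which is how the AA categories are distinguished from the WA categories.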
23
Internet Routing Instability
Analysis of pathological routing information
* Update categories:
Instability (5%): AADiff, WADiff, WADup
Pathological behavior (95%): WWDup, AADup
24
Internet Routing Instability
Results
1) BGP updates dominated by WWDup.
2) AADup and WADup consistently dominate the remaining
categories.
3) Only a small portion of BGP updates contribute to
AADiff and WADiff.
25
Internet Routing Instability
Results
* All pathological routing incidents caused by small service
providers.
* Some WWDups caused by a vendor’s router
implementation decision.
* Instability: AADiff + WADiff + WADups.
* Trends:
Peaks of updates in the afternoons.
Little instability on weekends.
* Routing instability closely related to bandwidth usage and
packet loss.
26
Internet Routing Instability
Results
* Plot of time of day vs. no. of updates --> bell shaped curve
(peak afternoon).
* Weekends --> less instability
* Rigorous frequency analysis of instability - peaks
at 24 hrs. and 7 days.
* Within a day, periodicity observed at 30 s. and 60 s.
* NO SINGLE ROUTE DOMINATES INSTABILITY.
* NO SINGLE AS DOMINATES INSTABILITY.
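
One way to picture the frequency analysis mentioned above is to measure how much of a time series of update counts repeats at a given period. The sketch below evaluates a single Fourier component in plain Python on synthetic data. It is a stand-in under stated assumptions: the slides do not detail the study's actual spectral method, and the function name and the synthetic hourly counts are hypothetical.

```python
import cmath
import math


def power_at_period(series, period):
    """Squared magnitude of the Fourier component of `series` whose
    period is `period` samples, after removing the mean."""
    mean = sum(series) / len(series)
    total = sum((x - mean) * cmath.exp(-2j * math.pi * t / period)
                for t, x in enumerate(series))
    return abs(total) ** 2


# Synthetic hourly update counts for four weeks:
# a daily cycle plus a dip on the last two days of each week.
counts = [100 + 40 * math.sin(2 * math.pi * h / 24)
          - (30 if (h // 24) % 7 >= 5 else 0)
          for h in range(24 * 28)]

# Compare the daily and weekly periods against an arbitrary 17-hour period.
for period in (24, 168, 17):
    print(period, round(power_at_period(counts, period)))
```

On this synthetic series the 24-hour and 168-hour components are far stronger than the 17-hour one, which is the kind of signature the slide describes for daily and weekly cycles.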
27
Internet Routing Instability
Possible origins of routing pathologies
* Stateless BGP implementations.
* Each withdrawal induces some short lived pathological
network oscillation.
* Oscillations due to misconfigured CSUs.
* Unjittered timer used to coalesce multiple routing updates.
* Unjittered timers in periodic message model.
* Improper configuration of the interaction between interior
gateway protocols and BGP.
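
The effect of timer jitter on synchronized updates can be illustrated with a small simulation. This is a sketch under assumed parameters: the function name, the 30-second interval, the 25% jitter range, and the one-second bucketing are all hypothetical choices, not values taken from the study.

```python
import random
from collections import Counter


def peak_simultaneous_updates(n_routers, interval, jitter, duration, seed=0):
    """Largest number of routers whose timers fire within the same
    one-second bucket, ignoring the synchronized start at t = 0."""
    rng = random.Random(seed)
    buckets = Counter()
    for _ in range(n_routers):
        t = 0.0
        while True:
            # Each wait is the base interval scaled by a random factor
            # in (1 - jitter, 1]; jitter = 0 gives a fixed-period timer.
            t += interval * (1.0 - jitter * rng.random())
            if t >= duration:
                break
            buckets[int(t)] += 1
    return max(buckets.values())


# 50 routers that all start their timers at the same moment.
print(peak_simultaneous_updates(50, 30, 0.0, 3600))   # fixed-period timers
print(peak_simultaneous_updates(50, 30, 0.25, 3600))  # jittered timers
```

With fixed-period timers every router fires in the same second of every cycle, so updates arrive in bursts at a regular period; with jitter the firing times drift apart.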
28
Internet Routing Instability
Results...
* 99% of routing information is pathological (redundant)
and may not reflect real network topological changes.
* Although redundant updates are quickly discarded by
routers, they consume router resources and high rates of
them (300 updates per second) can crash a router.
* Forwarding instability highly present:
* 3-10% of routes have 1 or more WADiff per day.
* 5-20% of routes have 1 or more AADiff per day.
* 10-50% of routes have 1 or more WADup per day.
29
Conclusions
No “typical” Internet site or path.
Likelihood of encountering a major routing
pathology more than doubled from 1994 to 1995.
Internet paths heavily dominated by a prevalent
route, but route persistence varies widely
(seconds to days).
2/3 of Internet paths have routes persisting
for days or weeks.
30
Conclusions
Internet routing instability still poorly
understood.
By 1995, half of virtual paths differed by >= 1 city
between the two directions of a two-way path.
How can we make it better?
31