The Sprint IP Monitoring Project
and
Traffic Dynamics at a Backbone POP
Supratik Bhattacharyya
Sprint ATL
http://www.sprintlabs.com
The IP Group at Sprintlabs
Charter:
Investigate IP technologies for robust, efficient, QoS-enabled networks
Anticipate and evaluate new services and applications
Major Projects:
Monitoring Sprint’s IP Backbone
Service Platform
Talk Overview
The IPMon Project
Routing and Traffic Dynamics
IP Backbone: POP-to-POP view
[Diagram: POPs interconnected by OC-48, OC-12, and OC-3 links]
POP: Point of Presence, typically a metropolitan area
Motivation: Need for Monitoring
The current network is over-provisioned, over-engineered, best-effort…
Diagnosis: detect and report problems at the IP level
Management: configuration problems, traffic engineering, resource provisioning, network dimensioning
Value-added service: feedback to customers (performance, traffic characteristics)
Detect attacks and anomalies
Existing Measurement Efforts
Passive measurements
SNMP-based tools
NetFlow (Cisco proprietary)
OC3MON, OC12MON
Active measurements
ping, traceroute, NIMI, MINC, Surveyor
Skitter, Keynote, Matrix
Integrated Approach
AT&T Netscope
• Network topology and routes
• Traffic at flow level granularity
• Delay and loss statistics
Our approach
Passive monitoring
Capture the header (44 bytes) of every packet: full TCP/IP headers, no HTTP information (a decoding sketch follows this slide)
Use GPS time-stamping, which allows accurate correlation of packets on different links
Day-long traces
Simultaneously monitor multiple links and sites
Collect routing information along with packet traces
Traces archived for future use
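As an illustration of what those 44 bytes contain, here is a minimal sketch of decoding them as IPv4 + TCP headers. The actual IPMON/DAG trace record format is not described in this talk, so the raw-bytes layout assumed here (no capture framing) is illustrative only.

```python
import struct
import socket

def parse_headers(buf: bytes):
    """Decode IPv4 + TCP headers from a 44-byte capture snippet."""
    # 20-byte base IPv4 header
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, cksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", buf[:20])
    ihl = (ver_ihl & 0x0F) * 4          # IP header length in bytes
    # First 12 bytes of the TCP header: ports, sequence and ack numbers
    sport, dport, seq, ack = struct.unpack("!HHII", buf[ihl:ihl + 12])
    return {"src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
            "proto": proto, "len": total_len, "sport": sport, "dport": dport}
```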
Applications
Data from a commercial Tier-1 IP backbone
Applications of data:
traffic modeling
traffic engineering
provisioning
pricing, SLAs
hardware design in collaboration with vendors
denial-of-service detection
Measurement Facilities
IPMON System
Collects packet traces by passively tapping the fiber with optical splitters
Supports OC-3 to OC-48 data rates
Data Repository
Large tape library to archive data
Analysis Platform
Initially a 17-node computing cluster
SAN under deployment
IPMON Architecture
[Diagram: the OC-3/12/48 link is tapped with a SONET optical splitter; a GPS-clocked DAG card in a Linux PC with multiple PCI buses timestamps packets into a main-memory buffer, which drains to a disk array]
Monitoring links at a POP
[Diagram: within a POP, access routers connect customers and peering points to backbone routers, which carry the backbone links to other POPs]
Current Status of IPMONs
Currently operational in one major west-coast POP on OC-3 links
Under way in two major east-coast POPs for OC-3 and OC-12 (we hope by July 2001)
OC-48 in preparation for one east-coast POP and one west-coast POP (summer 2001)
Future: Sprint dial-up network, more POPs, European network
Practical Constraints
Monitoring an operational network is difficult:
Complex procedure for deploying equipment
POPs evolve too fast
Too costly to be ubiquitous
Technology limitations (PCs, disks, etc.)
Only off-line analysis is possible
Are 44 bytes enough?
Ongoing Projects
Routing and Traffic Dynamics
Delay measurement across a router
TCP flow analysis
Denial of service
Bandwidth provisioning and pricing
Routing and Traffic Dynamics Project
Part 1: What are the traffic demands between pairs of POPs?
How stable is this demand?
Part 2: What are the paths taken by those demands?
Are link utilization levels similar throughout the backbone?
Part 3: Is there a better way to spread the traffic across paths?
At what level of traffic granularity should traffic be split up?
Motivation
Understand traffic demands between POP pairs
POP-to-POP Traffic Matrix
For every ingress POP:
Identify total traffic to each egress POP
Further analyze this traffic
[Matrix: POP-to-POP traffic, rows and columns indexed by City A, City B, City C]
Measure traffic over different timescales
Divide traffic per destination prefix, protocol, etc. (a matrix-building sketch follows this slide)
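A minimal sketch of how one row of such a matrix could be accumulated from monitored packets. The inputs are hypothetical: packet records as (dst_ip, nbytes) pairs and an `egress_pop_of` function standing in for the prefix-to-POP map built in the Mapping Problem slides below.

```python
from collections import defaultdict

def traffic_matrix(packets, ingress_pop, egress_pop_of):
    """Accumulate bytes sent from ingress_pop to each egress POP."""
    matrix = defaultdict(int)
    for dst_ip, nbytes in packets:
        # Attribute each packet's bytes to its (ingress, egress) POP pair
        matrix[(ingress_pop, egress_pop_of(dst_ip))] += nbytes
    return matrix
```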
Applications
Intra-domain routing
Analyzing routing anomalies
Verifying BGP peering
Capacity planning and dimensioning
POP architecture
Generating POP-to-POP traffic matrices
The Mapping Problem
What is the egress POP for a packet entering a given ingress POP?
Mapping BGP destinations to POPs
[Flowchart: start from the BGP table of (Dst, Next-Hop) entries; extract the unique Next-Hops; resolve each Next-Hop to the last Sprint hop via recursive BGP lookup, yielding a (Next-Hop, POP) map; combine with the BGP table to map each BGP Dst to its POP]
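A sketch of the pipeline above, under illustrative assumptions: `bgp_table` maps destination prefixes to next-hop addresses, `pop_of_hop` knows the POP of every Sprint router address, and `lookup(addr)` performs one BGP routing lookup toward an address; all three names are hypothetical.

```python
def build_prefix_to_pop(bgp_table, pop_of_hop, lookup):
    """Map each BGP destination prefix to its egress POP."""
    hop_to_pop = {}
    for hop in set(bgp_table.values()):        # unique next-hops
        addr = hop
        # Recursive lookup: follow next-hops until we reach a known
        # Sprint router -- the last Sprint hop (assumed to terminate).
        while addr not in pop_of_hop:
            addr = lookup(addr)
        hop_to_pop[hop] = pop_of_hop[addr]
    # Combine: (Dst, Next-Hop) + (Next-Hop, POP) -> (Dst, POP)
    return {prefix: hop_to_pop[nh] for prefix, nh in bgp_table.items()}
```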
Data Processing
Step 1: Use BGP tables to generate a [prefix, egress POP] map
Step 2: Run IP lookup software on the packet trace using the above map (see the sketch after this slide)
Output: a single trace file per egress POP, e.g. all packets headed to POP k from the monitored POP
Step 3: Use our traffic analysis tool for statistics evaluation
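A sketch of the Step 2 lookup, assuming packets carry a `dst` field. A production tool would use a radix trie for longest-prefix match; a linear scan over prefixes sorted by length shows the idea.

```python
import ipaddress

def split_by_egress(packets, prefix_to_pop):
    """Assign each packet to a per-egress-POP bucket via longest-prefix match."""
    nets = sorted(((ipaddress.ip_network(p), pop) for p, pop in prefix_to_pop.items()),
                  key=lambda item: item[0].prefixlen, reverse=True)
    buckets = {}
    for pkt in packets:
        dst = ipaddress.ip_address(pkt["dst"])
        for net, pop in nets:                  # first hit = longest prefix
            if dst in net:
                buckets.setdefault(pop, []).append(pkt)
                break
    return buckets
```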
Monitored links at a single POP
[Diagram: monitored links at the POP include two peering links (Peer 1, Peer 2), core links, and access links to web-hosting customers and an ISP]
Data
5 traces collected on Aug 9, 2000

Access Link Type | Trace Length (hours)
Webhost 1        | 19
Webhost 2        | 13
Peer 1           | 24
Peer 2           | 15
ISP              | 8
Traffic Fanout: POP level granularity
Fanout: web host links
Time-of-Day for POP level granularity
Day-Night Variation: Webhost #1
Nighttime reduction is 20-50%, depending on the access link
Summary
Wide disparity in “traffic demands” among egress POPs
POPs can be roughly categorized as small, medium, and large, and they maintain their rank during the day
Traffic is heterogeneous in space yet stable in time
Traffic varies by (access link, egress POP) pair
Hard to characterize time-of-day behaviour
20-50% reduction at night
Routing and Traffic Dynamics Project
Part 1: What are the traffic demands between pairs of POPs?
How stable is this demand?
Part 2: What are the paths taken by those demands?
Are link utilization levels similar throughout the backbone?
Part 3: Is there a better way to spread the traffic across paths?
At what level of traffic granularity should traffic be split up?
IS-IS Routing Practices
Is backbone traffic balanced?
What we’ve seen so far
Wide disparity in traffic demands between
(ingress, egress) POP pairs
+
Wide disparity in link utilization levels, plus
many underutilized routes
+
Routing policies concentrate traffic on a few paths
Question: Can we divert some traffic to the
lightly loaded paths?
Routing and Traffic Dynamics Project
Part 1: What are the traffic demands between pairs of POPs?
How stable is this demand?
Part 2: What are the paths taken by those demands?
Are link utilization levels similar throughout the backbone?
Part 3: Is there a better way to spread the traffic across paths?
At what level of traffic granularity should traffic be split up?
Creating traffic aggregates
To address issues of splitting traffic over multiple paths, we need to define “streams” within the traffic
How should packets be aggregated into streams?
Coarse granularity: POP-to-POP
Very fine granularity: the 5-tuple
Initial criterion: destination address prefix (a grouping sketch follows this slide)
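A minimal sketch of the prefix-based grouping, with the mask length as a parameter so the same code yields /8, /16, or /24 streams; the packet fields (`dst`, `len`) are assumed.

```python
import ipaddress
from collections import defaultdict

def streams_by_prefix(packets, prefixlen=8):
    """Group packets into streams by destination prefix; return bytes per stream."""
    streams = defaultdict(int)
    for pkt in packets:
        # Mask the destination address down to the chosen prefix length
        net = ipaddress.ip_network(f"{pkt['dst']}/{prefixlen}", strict=False)
        streams[net] += pkt["len"]
    return streams   # large totals are the "elephants", small ones the "mice"
```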
Elephants and Mice among /8 streams
Traffic grouped by egress POP
Stream: all packets in a group with the same /8 destination address prefix
Ingress: webhost link
Stability of prefix-based aggregates
Observations about prefix-based streams
Recursive: a /8 elephant contains a few /16 elephants and many mice, and likewise at the /24 level
The phenomenon is less pronounced at the /24 level
Question: Are elephants stable?
Definition (a sketch computing this metric follows this slide):
R_i(n) = the rank of flow i at time slot n
D_{i,n,k} = | R_i(n) - R_i(n+k) |
Each time slot corresponds to 30 minutes
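A sketch of the metric as defined above, assuming `slots` is a list of per-slot {stream: bytes} dictionaries, one per 30-minute slot:

```python
def ranks(slot_bytes):
    """Rank streams by volume within one slot; rank 1 = largest."""
    ordered = sorted(slot_bytes, key=slot_bytes.get, reverse=True)
    return {stream: r + 1 for r, stream in enumerate(ordered)}

def rank_change(slots, i, n, k):
    """D_{i,n,k} = |R_i(n) - R_i(n+k)| for stream i across slots n and n+k."""
    return abs(ranks(slots[n])[i] - ranks(slots[n + k])[i])
```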
Frequency of Rank Changes
Conclusion: for load balancing, route elephants along different paths
Conclusions
Monitoring and measurement is key to better network design
IPMon: a passive monitoring system for packet-level information
We have used our data to build components of traffic matrices for traffic engineering
Backbone traffic can be better load-balanced: destination prefix is a possible (simple) criterion
Ongoing Work
Intra-domain Routing:
Choosing IS-IS link weights
Load balancing in the backbone
Flow Characterization
Building Traffic Matrices
POP modeling