Transcript Slide

The Sprint IP Monitoring Project
and
Traffic Dynamics at a Backbone POP
Supratik Bhattacharyya
Sprint ATL
http://www.sprintlabs.com
The IP Group at Sprintlabs
Charter:
 Investigate IP technologies for robust, efficient, QoS-enabled networks
 Anticipate and evaluate new services and applications
Major Projects:
 Monitoring Sprint’s IP Backbone
 Service Platform
Talk Overview
 The IPMon Project
 Routing and Traffic Dynamics
IP Backbone: POP-to-POP view
[Diagram: POPs interconnected by OC-48, OC-12, and OC-3 links]
POP: Point of Presence, typically a metropolitan area
Motivation: Need for Monitoring
The current network is over-provisioned, over-engineered, best-effort…
 Diagnosis: detect and report problems at the IP level
 Management: configuration problems, traffic engineering, resource provisioning, network dimensioning
 Value-added service: feedback to customers (performance, traffic characteristics)
 Detect attacks and anomalies
Existing Measurement Efforts
Passive measurements
 SNMP-based tools
 Netflow (Cisco proprietary)
 OC3MON, OC12MON
Active measurements
 ping, traceroute, NIMI, MINC, Surveyor
 Skitter, Keynote, Matrix
Integrated approach
 AT&T Netscope
• Network topology and routes
• Traffic at flow-level granularity
• Delay and loss statistics
Our approach
Passive monitoring
 Capture header (44 bytes) from every packet
• full TCP/IP headers, no HTTP information
 Use GPS time stamping, which allows accurate correlation of packets on different links
 Day-long traces
 Simultaneously monitor multiple links and sites
 Collect routing information along with packet traces
 Traces archived for future use
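The 44-byte records above can be decoded offline. A minimal sketch, assuming each record is simply an options-free IPv4 header followed by a TCP header; real DAG records include extra capture framing and GPS timestamps, so this layout is hypothetical:

```python
import struct

def parse_header(record: bytes):
    """Parse a 44-byte capture record assumed to hold a plain
    IPv4 header (20 bytes, no options) followed by a TCP header.
    Illustrative only -- not the actual DAG record format."""
    # IPv4 header: version/IHL, TOS, total length, ID, flags/frag,
    # TTL, protocol, checksum, source address, destination address
    (ver_ihl, tos, total_len, ident, frag, ttl, proto, cksum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", record[:20])
    fields = {
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
        "proto": proto,
        "bytes": total_len,
    }
    if proto == 6:  # TCP: ports are the first 4 bytes of the TCP header
        sport, dport = struct.unpack("!HH", record[20:24])
        fields["sport"], fields["dport"] = sport, dport
    return fields

# Hand-built sample: 40-byte IPv4+TCP header, padded to 44 bytes
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([192, 0, 2, 7]))
sample += struct.pack("!HH", 80, 34567) + bytes(20)
print(parse_header(sample))
```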
Applications
 Data from a commercial Tier-1 IP backbone
 Applications of the data:
• traffic modeling
• traffic engineering
• provisioning
• pricing, SLAs
• hardware design in collaboration with vendors
• denial-of-service
Measurement Facilities
 IPMON System
• Collects packet traces by passively tapping the fiber using optical splitters
• Supports OC-3 to OC-48 data rates
 Data Repository
• Large tape library to archive data
 Analysis Platform
• Initially a 17-node computing cluster
• SAN under deployment
IPMON Architecture
[Diagram: an optical splitter taps the SONET OC-3/12/48 link and feeds a DAG card with a GPS clock input, housed in a Linux PC with multiple PCI buses; packets pass through a main-memory buffer to a disk array]
Monitoring links at a POP
[Diagram: within a POP, access routers connect customers and peering points to backbone routers, which connect to backbone links]
Current Status of IPMONs
 Currently operational in one major west coast POP on OC-3 links
 Under way in two major east coast POPs for OC-3 and OC-12 (we hope by July 2001)
 OC-48 in preparation for one east coast POP and one west coast POP (summer 2001)
 Future: Sprint dial-up network, more POPs, European network
Practical Constraints
Difficult to monitor an operational network:
 Complex procedure for deploying equipment
 POPs evolve too fast
 Too costly to be ubiquitous
Technology limitations (PCs, disks, etc.):
 Only off-line analysis is possible
 Are 44 bytes enough?
Ongoing Projects
 Routing and Traffic Dynamics
 Delay measurement across a router
 TCP flow analysis
 Denial of service
 Bandwidth provisioning and pricing
Routing and Traffic Dynamics Project
 Part 1: what are the traffic demands between pairs of POPs?
• How stable is this demand?
 Part 2: what are the paths taken by those demands?
• Are link utilization levels similar throughout the backbone?
 Part 3: is there a better way to spread the traffic across paths?
• At what level of granularity should traffic be split up?
Motivation
Understand traffic demands between POP pairs
POP-to-POP Traffic Matrix
For every ingress POP:
 Identify total traffic to each egress POP
 Further analyze this traffic
[Matrix diagram: ingress rows and egress columns labeled City A, City B, City C]
 Measure traffic over different timescales
 Divide traffic per destination prefix, protocol, etc.
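Given a [prefix, egress POP] map, accumulating one row of the POP-to-POP matrix from a monitored ingress link might look like the following sketch; the record shapes and names are illustrative, not the project's actual tools:

```python
from collections import defaultdict

def traffic_matrix(packets, ingress_pop, prefix_to_pop):
    """Accumulate bytes per (ingress POP, egress POP) cell.
    `packets` yields (dst_prefix, byte_count) pairs, assuming each
    packet has already been mapped to its destination prefix;
    `prefix_to_pop` is the [prefix -> egress POP] map."""
    matrix = defaultdict(int)
    for dst_prefix, nbytes in packets:
        egress = prefix_to_pop.get(dst_prefix, "unknown")
        matrix[(ingress_pop, egress)] += nbytes
    return dict(matrix)

# Made-up example: three packets from ingress POP "CityA"
prefix_map = {"4.0.0.0/8": "CityB", "12.0.0.0/8": "CityC"}
trace = [("4.0.0.0/8", 1500), ("12.0.0.0/8", 40), ("4.0.0.0/8", 576)]
print(traffic_matrix(trace, "CityA", prefix_map))
# {('CityA', 'CityB'): 2076, ('CityA', 'CityC'): 40}
```

Re-running the same accumulation over successive time windows yields the per-timescale view described above.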
Applications
 Intra-domain routing
 Analyzing routing anomalies
 Verify BGP peering
 Capacity planning and dimensioning
 POP architecture
Generating POP-to-POP traffic matrices
The Mapping Problem: what is the egress POP for a packet entering a given ingress POP?
Mapping BGP destinations to POPs:
[Flow diagram: from the BGP table of (Dst, Next-Hop) entries, extract the unique Next-Hops; a recursive BGP lookup finds the last Sprint hop for each, giving (Next-Hop, Last Sprint Hop); a (Next-Hop, POP) map then maps each Dst to a POP, yielding (BGP Dst, POP) pairs]
Data Processing
 Step 1: Use BGP tables to generate a [prefix, egress POP] map
 Step 2: Run IP lookup software on the packet trace using the above map
• Output: a single trace file per egress POP, e.g. all packets headed to POP k from the monitored POP
 Step 3: Use our traffic analysis tool for statistics evaluation
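Step 2's lookup amounts to a longest-prefix match of each packet's destination against the [prefix, egress POP] map. A sketch using Python's `ipaddress` module; the linear scan stands in for the production lookup software, which would use a trie:

```python
import ipaddress

def build_lookup(prefix_pop_map):
    """Pre-sort [prefix -> egress POP] entries by descending prefix
    length so the first containing network is the longest match."""
    nets = [(ipaddress.ip_network(p), pop) for p, pop in prefix_pop_map.items()]
    nets.sort(key=lambda e: e[0].prefixlen, reverse=True)

    def lookup(dst):
        addr = ipaddress.ip_address(dst)
        for net, pop in nets:
            if addr in net:
                return pop
        return None  # destination not covered by the BGP-derived map

    return lookup

# Made-up map: a /16 carved out of a /8 goes to a different POP
lookup = build_lookup({"10.0.0.0/8": "POP-A", "10.1.0.0/16": "POP-B"})
print(lookup("10.1.2.3"))   # POP-B (longest match wins)
print(lookup("10.9.9.9"))   # POP-A
```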
Monitored links at a single POP
[Diagram: core and access routers within a POP; monitored access links connect to Peer 1, Peer 2, web-hosting customers, and an ISP]
Data
5 traces collected on Aug 9, 2000

Access Link Type    Trace Length (hours)
Webhost 1           19
Webhost 2           13
Peer 1              24
Peer 2              15
ISP                 8
Traffic Fanout: POP-level granularity
Fanout: web host links
Time-of-Day at POP-level granularity
Day-Night Variation: Webhost #1
Reduction at night is between 20% and 50%, depending on the access link
Summary
 Wide disparity in “traffic demands” among egress POPs
 POPs can be roughly categorized as small, medium, and large, and they maintain their rank during the day
 Traffic is heterogeneous in space yet stable in time
 Traffic varies by (access link, egress POP) pair
 Hard to characterize time-of-day behaviour
• 20-50% reduction at night
Routing and Traffic Dynamics Project
 Part 1: what are the traffic demands between pairs of POPs?
• How stable is this demand?
 Part 2: what are the paths taken by those demands?
• Are link utilization levels similar throughout the backbone?
 Part 3: is there a better way to spread the traffic across paths?
• At what level of granularity should traffic be split up?
IS-IS Routing Practices
Is backbone traffic balanced?
What we’ve seen so far:
Wide disparity in traffic demands between (ingress, egress) POP pairs
+
Wide disparity in link utilization levels, plus many underutilized routes
+
Routing policies concentrate traffic on a few paths
Question: can we divert some traffic to the lightly loaded paths?
Routing and Traffic Dynamics Project
 Part 1: what are the traffic demands between pairs of POPs?
• How stable is this demand?
 Part 2: what are the paths taken by those demands?
• Are link utilization levels similar throughout the backbone?
 Part 3: is there a better way to spread the traffic across paths?
• At what level of granularity should traffic be split up?
Creating traffic aggregates
 To address the issue of splitting traffic over multiple paths, we need to define “streams” within the traffic
 How should packets be aggregated into streams?
• Coarse granularity: POP-to-POP
• Very fine granularity: use the 5-tuple
 Initial criterion: destination address prefix
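Aggregating a trace into destination-prefix streams, here at the coarse /8 level, can be sketched as follows; the record shapes are illustrative:

```python
from collections import Counter

def slash8_streams(packets):
    """Aggregate packets into streams keyed by the /8 destination
    prefix (i.e. the first octet), summing bytes per stream.
    `packets` yields (dst_ip, byte_count) pairs."""
    streams = Counter()
    for dst, nbytes in packets:
        first_octet = dst.split(".")[0]
        streams[first_octet + ".0.0.0/8"] += nbytes
    return streams

# Made-up trace: two packets share a /8, one does not
trace = [("4.2.2.2", 1500), ("4.7.7.7", 1500), ("8.8.8.8", 40)]
streams = slash8_streams(trace)
# Ranking streams by volume separates elephants from mice
for prefix, nbytes in streams.most_common():
    print(prefix, nbytes)
```

The same grouping at /16 or /24 just uses more octets of the destination as the key.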
Elephants and Mice among /8 streams
 Traffic grouped by egress POP
 Stream: all packets in a group with the same /8 destination address prefix
 Ingress: webhost link
Stability of prefix-based aggregates
Observations about prefix-based streams:
 Recursive: a /8 elephant contains a few /16 elephants and many mice; likewise at the /24 level
 The phenomenon is less pronounced at the /24 level
Question: are the elephants stable?
 Definition: Ri(n) = the rank of flow i at time slot n
 Di,n,k = | Ri(n) - Ri(n+k) |
• each time slot corresponds to 30 minutes
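The rank-change metric Di,n,k can be computed directly from per-slot stream volumes; a small sketch with made-up numbers:

```python
def rank_changes(volumes, k=1):
    """volumes[n][i] = bytes of stream i in 30-minute slot n.
    Returns D[i] = [|R_i(n) - R_i(n+k)| for each slot n], where
    R_i(n) is the rank of stream i when slot n is sorted by volume
    (rank 0 = biggest stream, the elephant)."""
    def ranks(slot):
        order = sorted(range(len(slot)), key=lambda i: slot[i], reverse=True)
        r = [0] * len(slot)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    R = [ranks(slot) for slot in volumes]
    return [[abs(R[n][i] - R[n + k][i]) for n in range(len(R) - k)]
            for i in range(len(volumes[0]))]

# Two slots, three streams: streams 0 and 1 swap ranks, stream 2 stays last
D = rank_changes([[100, 50, 10], [40, 90, 5]])
print(D)  # [[1], [1], [0]]
```

Stable elephants show up as streams whose D values stay near zero across slots.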
Frequency of Rank Changes
Conclusion: for load balancing, route elephants along different paths
Conclusions
 Monitoring and measurement are key to better network design
 IPMon: a passive monitoring system for packet-level information
 We have used our data to build components of traffic matrices for traffic engineering
 Backbone traffic can be better load-balanced: destination prefix is a possible (simple) criterion
Ongoing Work
 Intra-domain Routing:
• Choosing IS-IS link weights
• Load balancing in the backbone
 Flow Characterization
 Building Traffic Matrices
 POP modeling