Of maps and costs: Aggregating large-scale broadband
measurements for the Application Layer Traffic
Optimization (ALTO) protocol
IIT RTC Conference
October 15 - 17, 2013
David Goergen (1)
Vijay K. Gurbani (2)
Radu State (1)
(1) SnT – Interdisciplinary Centre for Security, Reliability and Trust
(2) Bell Laboratories, Alcatel-Lucent
OUTLINE
• Premise
• ALTO: background
• FCC dataset
• Processing
• Evaluation and discoveries
29.03.2016
IIT RTC conference
Premise
• It is essential to study trends and derive network analytics
• Two extremes exist
– Complete and highly detailed raw data
• Users get lost in the details
• Large amount of data
– Highly aggregated and summarized reports
• Human-readable format
– e.g. charts, presentations, reports
• Often cannot be investigated further
• There is a need for an intermediate representation
– The ALTO protocol seems a good choice.
ALTO Introduction
• ALTO solves the general rendezvous problem: Given a choice of
resources, which one is the best candidate?
• Recurring pattern in many domains:
- Peer-to-peer (BitTorrent)
- Which peers are close to me? Which peers have high upload bandwidth?
- Content delivery networks (CDN)
- Rendezvous me with the nearest surrogate
- Network routing and distance calculation
- Shortest path computation
- Data centers and cloud computing
- Where is my nearest data center? Which server is lightly loaded? Which data
center has the lowest network utilization?
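In every one of these domains the rendezvous question reduces to ranking the candidates by a cost that an ALTO server can supply. A minimal sketch, assuming the client already holds one numerical cost per candidate (lower is better); the peer names and cost values are hypothetical.

```python
from typing import Dict, List


def rank_candidates(costs: Dict[str, float]) -> List[str]:
    """Order candidate resources from best (lowest cost) to worst.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    return sorted(costs, key=lambda name: (costs[name], name))


# Hypothetical costs from one client to three candidate peers (lower is better).
peer_costs = {"peer-a": 12.0, "peer-b": 3.5, "peer-c": 7.0}
ranking = rank_candidates(peer_costs)
print(ranking)     # ['peer-b', 'peer-c', 'peer-a']
print(ranking[0])  # peer-b
```

The client only sees a cost per candidate, not the topology behind it, which is the point of the abstraction.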
ALTO Introduction
• History
- Circa 2008: Comcast and BitTorrent
- P2P traffic dominates the Internet
- Internet Service Providers (ISPs) wanted a well-behaved network
- ISPs wanted to reduce transit costs
- BitTorrent traffic exhibits greedy behaviour, optimizing local maxima at the expense of other time-sensitive traffic
- May 2008: IETF Workshop on P2P Infrastructure held at MIT to arrive at mitigating solutions
- Outcome: 2 working groups
- LEDBAT: Low Extra Delay Background Transport
- ALTO: Application Layer Traffic Optimization
ALTO Introduction
• ALTO is:
- An Application Layer Traffic Optimization protocol
- An IETF working group
- A (soon-to-be) IETF standard RFC
- A RESTful API that provides network (topology) maps and cost maps to clients
- A RESTful API that provides building blocks to construct:
- Ranking service
- Endpoint cost service
- Endpoint property service
- Map filtering service
- What is an endpoint?
- An IP address, a MAC address, an aggregation of IP addresses, ...
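Both the endpoint property service and the network map rest on one operation: finding the partition an endpoint address belongs to. A minimal sketch, assuming endpoints are IP addresses, each PID is a list of prefixes, and an address belongs to the PID with the longest matching prefix; the PID names and prefixes are hypothetical (taken from documentation address ranges).

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical network map: each PID aggregates one or more IP prefixes.
NETWORK_MAP: Dict[str, List[str]] = {
    "PID1": ["192.0.2.0/24", "198.51.100.0/25"],
    "PID2": ["198.51.100.128/25"],
    "PID3": ["0.0.0.0/0"],  # catch-all partition for every other IPv4 address
}


def pid_for(address: str, network_map: Dict[str, List[str]] = NETWORK_MAP) -> Optional[str]:
    """Return the PID whose prefix matches the address most specifically, or None."""
    ip = ipaddress.ip_address(address)
    best_pid, best_len = None, -1
    for pid, prefixes in network_map.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            # Membership is always False across IP versions, so IPv6 input matches nothing here.
            if ip in net and net.prefixlen > best_len:
                best_pid, best_len = pid, net.prefixlen
    return best_pid


print(pid_for("192.0.2.17"))      # PID1 (the /24 beats the /0 catch-all)
print(pid_for("198.51.100.200"))  # PID2
print(pid_for("203.0.113.5"))     # PID3
print(pid_for("2001:db8::1"))     # None (no IPv6 prefixes in this map)
```

The linear scan keeps the sketch short; a server handling many prefixes would more likely use a trie.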
ALTO Introduction: ALTO Architecture
[Figure: ALTO architecture. Components: ALTO client, ALTO server, ALTO service discovery. Information sources for the ALTO server: ISP provisioning policies, routing protocols, dynamic network information, external interfaces, third parties, content providers, ... Legend: standardized protocol vs. not subject to standardization.]
ALTO Introduction
• Two main abstractions:
- Network map
- Cost map
• The network is specified in terms of Partition/Provider IDs (PIDs): aggregations of endpoints identified by a provider-defined network location identifier.
• Costs are normalized and have two attributes:
- Type: what does the cost represent? Air-miles, hop count, ...
- Mode: how to interpret the cost
- Numerical (supports mathematical operations)
- Ordinal (position-based preferences)
• These abstractions help!
- IT, meet NOC. NOC, meet IT!
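The two cost modes differ in what a client may do with the values. A minimal sketch of deriving ordinal preferences from one row of numerical costs; the dense-ranking rule for ties and the cost values are assumptions chosen for illustration.

```python
from typing import Dict


def to_ordinal(row: Dict[str, float]) -> Dict[str, int]:
    """Turn one row of numerical costs into ordinal ranks (1 = most preferred).

    Equal costs share a rank, and ranks have no gaps ("dense" ranking).
    """
    distinct = sorted(set(row.values()))
    rank_of = {cost: position + 1 for position, cost in enumerate(distinct)}
    return {pid: rank_of[cost] for pid, cost in row.items()}


# Hypothetical numerical costs from one source PID to four destination PIDs.
numerical = {"PID1": 1.0, "PID2": 5.0, "PID3": 10.0, "PID4": 5.0}
print(to_ordinal(numerical))  # {'PID1': 1, 'PID2': 2, 'PID3': 3, 'PID4': 2}
```

Numerical costs can be added or averaged; the ranks only support comparison (PID1 is preferred to PID2), which is why the mode has to accompany the cost.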
ALTO Introduction: Maps (Network and cost)
[Figure: network map of three interconnected datacenters (Datacenter 1, Datacenter 2, Datacenter 3).]
Problem: Complexity and network structure exposed.
Graphics sources:
http://pubs.vmware.com/vi301/intro/images/Introduction_chapter.3.2.1.jpg
ALTO Introduction: Maps (Network and cost)
[Figure: the same three datacenters, each enclosed in a partition (PID 1, PID 2, PID 3).]
Hides complexity behind “partition IDs”
ALTO Introduction: Maps (Network and cost)
[Figure: cost map over the three partitions (PID 1, PID 2, PID 3) of Datacenters 1 to 3; the links between partitions are annotated with the cost values 20, 1, 10, 30, 22 and 5.]
Network cost of linking the partitions
ALTO Introduction: Example ALTO maps
[Figure: example cost map and network map.]
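The example maps on this slide did not survive extraction. As an illustration only, the sketch below writes a small network map and cost map in the JSON layout later standardized in RFC 7285 (`network-map`, `cost-map`, `vtag`, `dependent-vtags`, `cost-mode`, `cost-metric`). The PIDs, prefixes, resource IDs, tags and cost values are invented, not those of the original figure, and the `consistent` helper is a hypothetical check, not part of any ALTO library.

```python
import json

# Network map: each PID lists the IPv4 prefixes it aggregates (all values invented).
network_map = {
    "meta": {"vtag": {"resource-id": "example-network-map", "tag": "v1"}},
    "network-map": {
        "PID1": {"ipv4": ["192.0.2.0/24"]},
        "PID2": {"ipv4": ["198.51.100.0/24"]},
        "PID3": {"ipv4": ["203.0.113.0/24"]},
    },
}

# Cost map: numerical routing costs between the PIDs of that network map.
cost_map = {
    "meta": {
        "dependent-vtags": [{"resource-id": "example-network-map", "tag": "v1"}],
        "cost-type": {"cost-mode": "numerical", "cost-metric": "routingcost"},
    },
    "cost-map": {
        "PID1": {"PID1": 1, "PID2": 5, "PID3": 10},
        "PID2": {"PID1": 5, "PID2": 1, "PID3": 15},
        "PID3": {"PID1": 20, "PID2": 15, "PID3": 1},
    },
}


def consistent(net: dict, cost: dict) -> bool:
    """True if the cost map declares this network map's vtag and uses only its PIDs."""
    pids = set(net["network-map"])
    rows = cost["cost-map"]
    return (
        net["meta"]["vtag"] in cost["meta"]["dependent-vtags"]
        and set(rows) <= pids
        and all(set(columns) <= pids for columns in rows.values())
    )


print(json.dumps(cost_map["cost-map"]["PID1"]))  # {"PID1": 1, "PID2": 5, "PID3": 10}
print(consistent(network_map, cost_map))         # True
```

The version tag ties a cost map to the exact network map it was computed against, so a client can tell when its cached copies are out of step.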
FCC Dataset specification
• One country
• Time Period: 01.01.2012 to 31.12.2012
• 7,782 anonymised volunteers spread
across the country
• Each unit triggers measurements against a defined set of common websites every hour
– e.g. Google, YouTube, CNN, …
• 75-78 million records per month
• 6-7 GB of data per month
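A quick sanity check on these volumes helps size the processing that follows. The ratios below are derived only from the figures on this slide, assuming 1 GB = 10^9 bytes, a 30-day month and equal contribution from every unit (real participation varies).

```python
# Back-of-envelope ratios derived only from the figures on this slide.
# Assumptions: 1 GB = 10**9 bytes, a 30-day month, and equal contribution per unit.
UNITS = 7_782
RECORDS_PER_MONTH = (75e6, 78e6)  # low, high
BYTES_PER_MONTH = (6e9, 7e9)      # low, high
HOURS_PER_MONTH = 30 * 24

records_per_unit_hour = [r / UNITS / HOURS_PER_MONTH for r in RECORDS_PER_MONTH]
bytes_per_record = (
    BYTES_PER_MONTH[0] / RECORDS_PER_MONTH[1],  # smallest possible ratio
    BYTES_PER_MONTH[1] / RECORDS_PER_MONTH[0],  # largest possible ratio
)

print([round(x, 1) for x in records_per_unit_hour])  # [13.4, 13.9]
print([round(x) for x in bytes_per_record])          # [77, 93]
```

On average that is roughly 13 to 14 records per unit per hour, each under 100 bytes, so the volume comes from the number of rows rather than from record size.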
FCC Dataset specification
• Consists of several files organized per
month
– Linked together through unit_id field
• For our first evaluation we use the curr_dns file
– Extract the distinct unit_ids that remain consistent over a certain period
– Use these to create a topology map for the ALTO protocol
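Keeping only units that are present throughout the period is a set intersection over the monthly files. A minimal in-memory sketch, assuming each curr_dns record has been reduced to a (unit_id, DNS resolver IP) pair; the month keys, unit IDs and resolver addresses are hypothetical, and the real files (several GB per month) were processed on Hadoop instead.

```python
from typing import Dict, Iterable, Set, Tuple

# One (unit_id, dns_resolver_ip) pair per curr_dns record (hypothetical layout).
Record = Tuple[int, str]


def units_present_every_month(months: Dict[str, Iterable[Record]]) -> Set[int]:
    """Return the distinct unit_ids that appear in every monthly file."""
    per_month = [{unit for unit, _ in records} for records in months.values()]
    return set.intersection(*per_month) if per_month else set()


sample = {
    "2012-01": [(1, "203.0.113.53"), (2, "198.51.100.53"), (3, "192.0.2.53")],
    "2012-02": [(1, "203.0.113.53"), (3, "192.0.2.53")],
    "2012-03": [(1, "203.0.113.53"), (2, "198.51.100.53"), (3, "192.0.2.53")],
}
print(sorted(units_present_every_month(sample)))  # [1, 3]
```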
FCC Dataset specification
Processing
• Find a stable set of unit_ids
– The DNS resolver appears in every file
– The location is fixed
• The location is resolved using a geo-IP database
• Each unit_id is assumed to be close to its DNS resolver's location
Hadoop cluster specs
• Hadoop 2.0.0-cdh4.3.0
• 4 nodes
– Hexa-core 2.4 GHz Xeon
• 120 GB RAM
• HDFS 27.54 TB
• 2 × 1 Gbit/s Ethernet, bonded
Hadoop job process
Outcome
• Output contains
– unit_id
– DNS Resolver IP
– Occurrence
– Geo. location
• Post process
– Filter out all non-stable unit_ids
• Occurrence < 12 months
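The job's output and post-processing step can be mimicked in a few lines. This is a minimal sketch that simulates one MapReduce pass in plain Python, not the authors' Hadoop code: it assumes each curr_dns record reduces to a (unit_id, resolver IP) pair and that Occurrence counts the distinct months in which a pair appears. The geo-IP lookup on the resolver is left out, and the sample data is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[int, str]  # (unit_id, DNS resolver IP)
MONTHS_REQUIRED = 12


def map_phase(month: str, records: Iterable[Pair]) -> Iterator[Tuple[Pair, str]]:
    """Mapper: emit ((unit_id, resolver_ip), month) for each record of one monthly file."""
    for unit_id, resolver_ip in records:
        yield (unit_id, resolver_ip), month


def reduce_phase(key: Pair, months: Iterable[str]) -> Tuple[Pair, int]:
    """Reducer: occurrence = number of distinct months in which the pair was seen."""
    return key, len(set(months))


def stable_pairs(files: Dict[str, Iterable[Pair]]) -> List[Pair]:
    """Run map, an in-memory shuffle, and reduce, then keep pairs seen in all 12 months."""
    grouped: Dict[Pair, List[str]] = defaultdict(list)  # stands in for shuffle/sort
    for month, records in files.items():
        for key, value in map_phase(month, records):
            grouped[key].append(value)
    reduced = [reduce_phase(key, months) for key, months in grouped.items()]
    # Post-processing: filter out pairs whose occurrence is below 12 months.
    return sorted(key for key, occurrence in reduced if occurrence >= MONTHS_REQUIRED)


# Hypothetical year: unit 1 keeps one resolver; unit 2 switches resolver mid-year.
files = {}
for m in range(1, 13):
    resolver_2 = "198.51.100.53" if m <= 6 else "192.0.2.53"
    files[f"2012-{m:02d}"] = [(1, "203.0.113.53"), (1, "203.0.113.53"), (2, resolver_2)]

print(stable_pairs(files))  # [(1, '203.0.113.53')]
```

Counting distinct months rather than raw records keeps hourly repetitions within a month from inflating the occurrence.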
Interesting Observations
• Some unit_ids are located outside the US
– Presumably the user has manually configured the DNS resolver
• OpenDNS and Google DNS resolvers were ignored
• Large convergence to a single point (Potwin, KS)
– Potwin is the geographical center of the US
– ISPs generally locate their primary or secondary DNS name servers
– We continue to investigate how to minimize the impact
• Some unit_ids change ISP and/or location
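Convergence such as the Potwin cluster can be flagged automatically before it distorts a network map. A minimal sketch, assuming each unit has already inherited a latitude/longitude from its resolver's geo-IP lookup; the 20% share threshold, the two-decimal rounding and the coordinates are hypothetical choices, not values from the study.

```python
from collections import Counter
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]


def suspicious_locations(units: Dict[int, LatLon], max_share: float = 0.2) -> List[Tuple[LatLon, int]]:
    """Return (location, unit count) for locations holding more than max_share of all units.

    Coordinates are rounded to two decimal places before counting, so near-identical
    geo-IP answers collapse into one point.
    """
    counts = Counter((round(lat, 2), round(lon, 2)) for lat, lon in units.values())
    total = len(units)
    return [(loc, n) for loc, n in counts.most_common() if n / total > max_share]


# Hypothetical unit locations (inherited from their resolvers); three share one point.
unit_locations = {
    1: (38.0, -97.0),
    2: (38.0, -97.0),
    3: (38.001, -97.002),
    4: (40.71, -74.01),
    5: (34.05, -118.24),
}
print(suspicious_locations(unit_locations))  # [((38.0, -97.0), 3)]
```

Flagged points can then be excluded or handled separately rather than treated as a real concentration of users.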
Stable unit_id
Next steps
• Attempt to create a network map
– Rough PID groupings obtained by grouping unit IDs that belong to the same ISP
– More formal PID groupings for further study (e.g., group by bandwidth speed irrespective of ISP, lowest jitter, …)
• Attempt to create a cost map
– Different cost maps for different applications (e.g., use UDP latency or jitter as a cost metric for VoIP applications)
• Cross-reference with other datasets (e.g., US Census dataset)
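Both PID groupings amount to choosing a grouping key. A minimal sketch, assuming each unit carries hypothetical `isp`, `down_mbps` and `jitter_ms` fields (the real dataset fields may differ) and a hypothetical 25 Mbit/s speed threshold. The per-PID median jitter is only a starting attribute: an ALTO cost map is PID-to-PID, so a pairwise cost would still have to be derived from such values.

```python
from collections import defaultdict
from statistics import median
from typing import Callable, Dict, List

# Hypothetical per-unit attributes; the real dataset fields may differ.
units = [
    {"unit_id": 1, "isp": "ISP-A", "down_mbps": 18.0, "jitter_ms": 4.0},
    {"unit_id": 2, "isp": "ISP-A", "down_mbps": 45.0, "jitter_ms": 2.0},
    {"unit_id": 3, "isp": "ISP-B", "down_mbps": 6.0, "jitter_ms": 9.0},
    {"unit_id": 4, "isp": "ISP-B", "down_mbps": 50.0, "jitter_ms": 3.0},
]


def group_into_pids(rows: List[dict], key: Callable[[dict], str]) -> Dict[str, List[int]]:
    """Rough network map: one PID per distinct key value, listing its unit_ids."""
    pids: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        pids[f"PID-{key(row)}"].append(row["unit_id"])
    return dict(pids)


by_isp = group_into_pids(units, key=lambda r: r["isp"])
by_speed = group_into_pids(units, key=lambda r: "fast" if r["down_mbps"] >= 25 else "slow")

# A per-PID attribute a VoIP-oriented cost map could start from (median jitter).
jitter = {r["unit_id"]: r["jitter_ms"] for r in units}
pid_jitter = {pid: median(jitter[u] for u in members) for pid, members in by_isp.items()}

print(by_isp)      # {'PID-ISP-A': [1, 2], 'PID-ISP-B': [3, 4]}
print(by_speed)    # {'PID-slow': [1, 3], 'PID-fast': [2, 4]}
print(pid_jitter)  # {'PID-ISP-A': 3.0, 'PID-ISP-B': 6.0}
```

Passing the key as a function keeps the grouping logic identical whether PIDs follow ISPs, bandwidth tiers or any other attribute.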
Next steps
• Using stable unit IDs as landmarks in a
virtual coordinate system.
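The slide does not say how the landmarks would be used. One simple scheme, shown only as an illustration, describes every host by its vector of RTTs to the landmarks (a Lipschitz-style embedding) and estimates host-to-host distance from those vectors. The landmark names, the RTT values and the choice of the max-difference estimator are all assumptions.

```python
from typing import Dict, List

# Hypothetical stable unit IDs acting as landmarks, in a fixed order.
LANDMARKS = ["unit-101", "unit-202", "unit-303"]


def landmark_vector(rtts_ms: Dict[str, float], landmarks: List[str] = LANDMARKS) -> List[float]:
    """A host's coordinates: its measured RTTs (ms) to each landmark, in landmark order."""
    return [rtts_ms[landmark] for landmark in landmarks]


def estimated_distance(a: List[float], b: List[float]) -> float:
    """Largest per-landmark RTT difference.

    If RTTs obeyed the triangle inequality, |rtt(a,L) - rtt(b,L)| <= rtt(a,b) for
    every landmark L, so this value is a lower bound on the RTT between a and b.
    """
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical RTT measurements from three hosts to the landmarks.
x = landmark_vector({"unit-101": 10.0, "unit-202": 40.0, "unit-303": 25.0})
y = landmark_vector({"unit-101": 12.0, "unit-202": 31.0, "unit-303": 30.0})
z = landmark_vector({"unit-101": 60.0, "unit-202": 20.0, "unit-303": 45.0})

print(estimated_distance(x, y))  # 9.0
print(estimated_distance(x, z))  # 50.0
```

Stable units suit this role because a landmark set has to stay fixed for the coordinates of different hosts to remain comparable.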
THANK YOU FOR YOUR ATTENTION
QUESTIONS?