Internet Topology Data - Computer Science and Engineering

Comparative Analysis of
Internet Topology Data Sets
Jay Thom
Outline
• Introduction
– Why is this important?
• Background
– History
– Internet Topology Measurement
• Related Works
– Data Sources
– Papers
• Project Goal
– Data Collection and Preparation
– Analysis
• Conclusion
Why Is This Important?
• My motivation:
– Internet Topology Measurement
• Our project
• Challenges with data collection
• Dealing with large amounts of data
• What can be learned?
• Project goals
Some History…
Circuit Switching
1877 – First commercial circuit-switched networks appear
Some History…
Packet Switching
• First packet-switched network (ARPANET) sends its first message from UCLA
to the Stanford Research Institute (SRI), 1969
• The attempt to send the word “LOGIN” crashed the system partway through,
so the message received was “LO”
History of the Internet
• 1970 – First network protocol introduced: Network Control Protocol (NCP)
• 1970 – First network applications begin to appear
• 1972 – First “hot” application introduced: email
• 1982 – TCP/IP protocol introduced
• 1986 – NSFNET interconnects computer centers at several universities
• 1990 – First commercial connections to the network appear
• 1991 – World Wide Web goes live, widespread access to the network begins
• 2016 - http://www.internetlivestats.com/
Why is all of this important?
The Network of Networks
• The Internet is made up of 55,483 autonomous systems (ASes)
• Each system is managed independently
• Cooperation within the network is voluntary
• ASes seek to improve their own performance, maintain competitive relationships
• Topological details of each AS are undisclosed and proprietary
How do we know what the Internet actually looks like?
What tools do we have to monitor it?
Measurement Tools – Ping
Measurement Tools - Traceroute
Measurement Platforms
• CAIDA – Archipelago (Ark)
• Measurement Lab (M-Lab)
• University of Washington Information Plane (iPlane)
• RIPE NCC Atlas
• University of Southern California ISI ANT Census
• PlanetLab
Ark Statistics
• CAIDA – Center for Applied Internet Data Analysis
• University of California San Diego
• Measurement and data curation (archives)
• 165 monitors in 57 countries
• Growing by 766 million traces per month
• Growing by 316 GB per month stored
• 41 billion traces performed (total)
• Data stored in a binary format (.warts file)
• Began running in 1998
Ark Raspberry Pi Monitor
Ark Monitor Locations
RIPE Atlas
• RIPE NCC – Réseaux IP Européens Network
Coordination Centre
• Regional Internet Registry (RIR) for Europe, Middle
East, and Central Asia
• Based in Amsterdam, Netherlands
• 13,554 probes (small network devices)
• 208 anchors (rack-mounted servers)
• Traces are performed regularly from probes to anchors
• Anchors can also perform user-defined traces to any
IP address
• Traces are stored at anchors, and can be downloaded
in .json format
• Established in 1992
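Since the traces can be downloaded in .json format, reading hop addresses out of a result file is mostly dictionary access. The sketch below is a minimal illustration only: the sample record is made up, and the field names (prb_id, dst_addr, result, hop, from, x) are assumed to match the layout of RIPE Atlas traceroute results and should be checked against the current documentation.

```python
import json

# Made-up sample shaped like a traceroute result; the field names are an assumption.
SAMPLE = """
[{"prb_id": 1001, "dst_addr": "192.0.2.10",
  "result": [
    {"hop": 1, "result": [{"from": "198.51.100.1", "rtt": 1.2},
                          {"from": "198.51.100.1", "rtt": 1.1}]},
    {"hop": 2, "result": [{"x": "*"}, {"x": "*"}]},
    {"hop": 3, "result": [{"from": "192.0.2.10", "rtt": 9.8}]}
  ]}]
"""

def hop_addresses(measurement):
    """Return the first responding address at each hop, or None for a silent hop."""
    path = []
    for hop in measurement.get("result", []):
        replies = [r["from"] for r in hop.get("result", []) if "from" in r]
        path.append(replies[0] if replies else None)
    return path

traces = json.loads(SAMPLE)
# Key each path by (probe id, destination) so traces from many probes can be merged.
paths = {(t["prb_id"], t["dst_addr"]): hop_addresses(t) for t in traces}
```

Keeping silent hops as None preserves hop positions, which matters later when paths from different platforms are compared.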
RIPE Atlas Hardware
Rack mounted anchor
Small probe (connected anywhere)
RIPE Atlas Anchors
RIPE Atlas Probes
PlanetLab
• Nodes hosted by corporate/academic institutions
• 1090 nodes at 507 sites
• Affiliated users are granted a “slice” to run experiments on
most* available nodes
• UNR has 2 PlanetLab nodes in operation
• Used by M-Lab and iPlane as vantage points for
measurements
• Slowly dying: only about 176 nodes currently in operation
* Some nodes are reserved for exclusive use by M-Lab
PlanetLab
M-Lab
• Utilizes PlanetLab nodes to generate traces
• Supports a number of tools and archives all data, making it accessible to the
public. Tools include:
• Glasnost – detects prioritization or censorship of network traffic
• Network Diagnostic Tool (NDT) – measures TCP performance
• Neubot – for studying broadband performance
• NPAD – diagnoses issues in a network path to improve performance
• OONI – detects censorship, surveillance, and traffic manipulation
• Paris Traceroute – network topology mapping
• SideStream – TCP state information
• Mlab-collectd – monitors M-Lab slices on PlanetLab
A collaborative effort involving Google, New America, Princeton University, and others.
iPlane
• Utilizes PlanetLab nodes to perform traceroutes to infer router level topology.
• Clusters interfaces into PoPs: iPlane clusters interfaces that are in the same
Point of Presence (PoP). For this, every interface in the router atlas is probed
using UDP and ICMP packets. Interfaces that respond with the same source
address or have similar return TTLs to all the vantage points are clustered
together.
• Measures link attributes: loss rate, bandwidth capacity of all inter-cluster
links.
• Performs route prediction: iPlane composes segments of observed Internet
paths to predict the end-to-end path between any pair of end-hosts, and uses
this prediction to estimate end-to-end performance for overlay services.
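The PoP clustering rule above can be sketched in a few lines. This is an illustration of the rule as stated on the slide, not iPlane's implementation: the probe data, the TTL tolerance of 1, and the choice to merge clusters transitively (union-find) are all assumptions.

```python
from itertools import combinations

# Hypothetical probe results:
# interface -> (source address of the reply, return TTL observed at each vantage point)
PROBES = {
    "10.0.0.1": ("10.0.0.1", (52, 47, 60)),
    "10.0.0.5": ("10.0.0.1", (40, 41, 42)),  # replies from the same source as 10.0.0.1
    "10.0.1.9": ("10.0.1.9", (52, 48, 60)),  # TTLs within tolerance of 10.0.0.1
    "10.9.9.9": ("10.9.9.9", (30, 35, 33)),  # matches nothing
}

def cluster_interfaces(probes, ttl_tolerance=1):
    """Group interfaces that share a reply source or have similar TTLs at every vantage point."""
    parent = {iface: iface for iface in probes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(probes, 2):
        src_a, ttl_a = probes[a]
        src_b, ttl_b = probes[b]
        same_source = src_a == src_b
        similar_ttls = all(abs(x - y) <= ttl_tolerance for x, y in zip(ttl_a, ttl_b))
        if same_source or similar_ttls:
            parent[find(a)] = find(b)

    clusters = {}
    for iface in probes:
        clusters.setdefault(find(iface), set()).add(iface)
    return sorted(sorted(c) for c in clusters.values())

pops = cluster_interfaces(PROBES)
```

With this sample the first three interfaces end up in one cluster and 10.9.9.9 stays alone. A real system would need a principled tolerance, since TTLs that happen to be close do not prove co-location.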
ANT Census
• Started in 2003
• Scans IPv4 space every 42 days
• 4.3 billion addresses
• Utilizes ICMP ping, looks for response
• Shows ownership of all 256 /8 subnets
• Brighter – more replies
• About 6% of addresses respond
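The numbers on this slide can be cross-checked with a little arithmetic. The sketch below assumes, for simplicity, that every one of the 2^32 addresses is probed exactly once per census at a uniform rate; a real census may skip reserved ranges, so treat the rate as an upper-end estimate.

```python
TOTAL_IPV4 = 2 ** 32                  # 4,294,967,296 addresses ("4.3 billion")
SLASH8_BLOCKS = 2 ** 8                # 256 /8 blocks
ADDRS_PER_SLASH8 = TOTAL_IPV4 // SLASH8_BLOCKS   # 16,777,216 addresses per /8

CENSUS_DAYS = 42
SECONDS_PER_CENSUS = CENSUS_DAYS * 24 * 3600     # 3,628,800 seconds

# Average probing rate needed to touch every address once per census.
probes_per_second = TOTAL_IPV4 / SECONDS_PER_CENSUS

# "About 6% of addresses respond"
responders = 0.06 * TOTAL_IPV4

print(f"{probes_per_second:.0f} probes/s, about {responders / 1e6:.0f} million responders")
```

Under these assumptions a census needs roughly 1,200 probes per second on average, and 6% of the space corresponds to roughly a quarter of a billion responding addresses.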
Problems
Each platform stores its data in a different format on a different file system:
• RIPE – uncompressed .json format, data stored at each anchor
• Ark – compressed .warts files (binary), nested file system, password protected
• iPlane – compressed binary files, custom format (C++), password protected
• M-Lab – Google Cloud storage, compressed files within files
• ANT Census – released once every two months
Extract all files and convert to a common format
1 month = 1TB of data
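One way to picture the "common format" step is a single record type that every platform-specific parser emits, serialized as one JSON object per line. The record fields and the whitespace-separated text dump below are hypothetical choices made for illustration; they are not the project's actual schema or the output of any real decoding tool.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional

@dataclass
class Trace:
    """Hypothetical common record: one traceroute path, whatever platform produced it."""
    platform: str
    source: str
    destination: str
    hops: List[Optional[str]]   # None marks a hop that did not reply

def from_text_dump(platform: str, line: str) -> Trace:
    """Parse an assumed dump line: 'src dst hop1 hop2 ...' where '*' means no reply."""
    src, dst, *hops = line.split()
    return Trace(platform, src, dst, [None if h == "*" else h for h in hops])

def to_json_line(trace: Trace) -> str:
    """Serialize one record as a single JSON line (easy to append, split, and stream)."""
    return json.dumps(asdict(trace), sort_keys=True)

record = from_text_dump("ark", "203.0.113.7 192.0.2.10 198.51.100.1 * 192.0.2.10")
line = to_json_line(record)
```

A line-oriented format is a reasonable fit for data volumes on the order of a terabyte per month, because files can be processed as streams without loading them whole.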
Related Work
Pietro Marchetta et al., “Topology discovery at the router level: a new hybrid
tool targeting ISP Networks”, 2011
• An attempt to find a new way to collect measurements without traceroute
and alias resolution.
• Introduces a new tool, Merlin, in which a central server controls distributed
vantage points to probe a targeted AS.
• Uses MRINFO, a ping-like tool for monitoring active multicast groups, which
relies on the IGMP messages ASK_NEIGHBORS and NEIGHBORS_REPLY.
• Deployed recursively to all routers in an AS and gives a listing of each
router’s multicast neighbors.
• Performs worse than traceroute for inter-AS measurements, but much better
for mapping the core of an AS.
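The recursive deployment described above amounts to a graph traversal: query a router, record its multicast neighbors, then query each neighbor not yet seen. The sketch below simulates this with a made-up adjacency table; query_neighbors is a hypothetical stand-in for sending ASK_NEIGHBORS and parsing NEIGHBORS_REPLY, and none of this is Merlin's actual code.

```python
from collections import deque

# Made-up multicast adjacency inside one AS; replaces real network queries.
NEIGHBORS = {
    "r1": ["r2", "r3"],
    "r2": ["r1", "r4"],
    "r3": ["r1"],
    "r4": ["r2"],
}

def query_neighbors(router):
    """Hypothetical placeholder for an mrinfo-style query to one router."""
    return NEIGHBORS.get(router, [])

def discover(seed):
    """Breadth-first discovery of routers and links, starting from one seed router."""
    seen = {seed}
    links = set()
    queue = deque([seed])
    while queue:
        router = queue.popleft()
        for neighbor in query_neighbors(router):
            links.add(frozenset((router, neighbor)))  # undirected link, stored once
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen, links

routers, links = discover("r1")
```

Because each reply lists a router's own interfaces and neighbors directly, this style of discovery does not need a separate alias resolution step, which is the advantage the paper claims inside an AS.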
Related Work
Benoit Donnet and Timur Friedman, “Internet Topology Discovery: A Survey”, 2007
• Discusses the four levels of Internet topology: IP interface level, Router level,
PoP level, and AS level
• Seeks to build a formal graph from network measurement data
• From this perspective, studies characteristics of the network in terms of average
degree, degree distribution, clustering coefficient, and betweenness centrality
• Would like to build a visualization of the network based on these factors
• No actual data collection.
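For a concrete sense of these graph metrics, the sketch below computes average degree, degree distribution, and mean local clustering for a tiny made-up edge list using only the standard library. Betweenness centrality is left out to keep the example short; a graph library would normally be used for it.

```python
from collections import Counter
from itertools import combinations

# Made-up undirected topology: a triangle (a, b, c) with one extra node d hanging off c.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

def build_adjacency(edges):
    """Turn an edge list into a node -> set-of-neighbors mapping."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    closed = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * closed / (k * (k - 1))

adj = build_adjacency(EDGES)
degrees = {node: len(nbrs) for node, nbrs in adj.items()}
average_degree = sum(degrees.values()) / len(degrees)
degree_distribution = Counter(degrees.values())   # degree -> number of nodes
mean_clustering = sum(local_clustering(adj, n) for n in adj) / len(adj)
```

Here the average degree is 2.0, and the mean local clustering is 7/12 because a and b sit in a closed triangle, c has one of three neighbor pairs connected, and d has a single neighbor.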
Related Work
Hakan Kardes, Mehmet Gunes, Talha Oz, “Cheleby: A subnet-level Internet
Topology Mapping System”, 2012
• Divide available PlanetLab nodes into 7 teams based on geographic location
• Each node is assigned a block of IP addresses from other sources, plus the
first address from each /24 subnet (to reach all subnets)
• Each monitor probes 4 destination blocks at a time, and each block is only
probed by 1 monitor
• Each process is independent, and eventually all blocks are probed by all
monitors
• The amassed data set is analyzed to discover features in the network topology
(e.g. number of nodes, edges, alias resolution statistics)
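Two of the ideas above lend themselves to a short sketch: picking one target per /24, and rotating destination blocks so that no block has two monitors at once while every monitor eventually probes every block. The code is an illustration under stated assumptions, not Cheleby's scheduler: it takes "first address" to mean the .1 host, gives each monitor one block per round instead of four, and requires at least as many blocks as monitors.

```python
import ipaddress

def first_addresses(prefix):
    """One target per /24 inside the prefix (assumed here to be the .1 address)."""
    network = ipaddress.ip_network(prefix)
    return [str(subnet.network_address + 1) for subnet in network.subnets(new_prefix=24)]

def rotation_schedule(monitors, blocks):
    """Round r assigns monitor i the block at index (i + r) mod len(blocks).

    Needs len(monitors) <= len(blocks) so that no block is shared within a round.
    """
    n = len(blocks)
    return [
        {monitor: blocks[(i + r) % n] for i, monitor in enumerate(monitors)}
        for r in range(n)
    ]

targets = first_addresses("198.18.0.0/22")
schedule = rotation_schedule(["m1", "m2", "m3"], ["b1", "b2", "b3", "b4"])
```

After as many rounds as there are blocks, each monitor has probed every block exactly once, which mirrors the property described on the slide.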
Related Work
Kimberly Claffy et al., “Internet Mapping: from Art to Science”, 2009
• Seeks to improve on the previous tool (Skitter)
• This was the paper written to introduce the Ark project by CAIDA
• Intends to amass the largest data set
• Plans to build a repository for measurement data from their study as well as others
• Will make data available to the research community
Related Work
Bradley Huffaker et al., “Internet Topology Data Comparison”, 2012
• Studies topology at IP, Router, and AS level
• Notes that different topology studies are producing conflicting results
• Discusses metrics:
– Average node degree
– Graph size
– Number of edges
– Node degree distribution
– Clustering
– Mean local clustering
• Discusses possible inaccuracies in previous works because of a lack of IP
alias resolution leading to an over-estimation of the number of routers
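The over-estimation is easy to see with a toy example: without alias resolution every observed interface address is counted as its own router, while with it, addresses known to be aliases collapse into one node. The interface list and alias pairs below are made up, and the union-find grouping is just one simple way to merge aliases.

```python
# Made-up observations: five interface addresses, three of which belong to one router.
INTERFACES = ["10.0.0.1", "10.0.0.2", "10.0.1.1", "10.0.2.1", "10.0.3.1"]
ALIAS_PAIRS = [("10.0.0.1", "10.0.1.1"), ("10.0.1.1", "10.0.2.1")]

def count_routers(interfaces, alias_pairs):
    """Count routers after merging interfaces that were identified as aliases."""
    parent = {iface: iface for iface in interfaces}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in alias_pairs:
        parent[find(a)] = find(b)
    return len({find(iface) for iface in interfaces})

naive_router_count = len(INTERFACES)                          # every interface counted separately
resolved_router_count = count_routers(INTERFACES, ALIAS_PAIRS)
```

In this sample the naive count is 5 routers and the resolved count is 3, so the unresolved graph overstates the router count and also distorts degree-based metrics.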
Related Work
Vaibhav Bajpai and Jürgen Schönwälder, “A Survey on Internet
Performance Measurement Platforms and Related Standardization Efforts”,
2015
• Survey of network performance tools
• Focus on performance metrics rather than topology (bandwidth, reachability,
censorship, throttling, etc.)
• Consider some of the same platforms, but from another perspective
Related Work
John Heidemann et al., “Census and Survey of the Visible Internet”, 2008
• Many hosts are hidden (firewalls, private IP space), but there is much to be
learned from the visible address space
• Census: walk the entire address space and look for responsive hosts
• Survey: frequently sample a fraction of that space
Some results:
• 3.6% of the allocated space is actually occupied by visible hosts
• ¼ of responsive /24 subnet blocks are less than 5% filled
• 9% of responsive /24 subnet blocks are more than ½ filled
• 16% (34 million IPs) are responsive and stable
• From this, an estimated 60 million Internet-accessible computers exist
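The "percent filled" figures come from grouping responsive addresses by /24 block and dividing by the 256 addresses in each block. The sketch below shows that calculation on made-up addresses; the thresholds mirror the slide (under 5% and over one half), but the data and resulting shares are purely illustrative.

```python
import ipaddress
from collections import Counter

def block_fill(responsive_addresses):
    """Map each responsive /24 block to the fraction of its 256 addresses that replied."""
    counts = Counter(
        ipaddress.ip_network(f"{addr}/24", strict=False) for addr in responsive_addresses
    )
    return {str(block): count / 256 for block, count in counts.items()}

# Made-up census replies: a sparse block (3 hosts) and a dense block (200 hosts).
responsive = [f"192.0.2.{i}" for i in range(1, 4)]
responsive += [f"198.51.100.{i}" for i in range(200)]

fill = block_fill(responsive)
sparse_share = sum(1 for f in fill.values() if f < 0.05) / len(fill)
dense_share = sum(1 for f in fill.values() if f > 0.5) / len(fill)
```

Note that only blocks with at least one reply appear in the result, which matches the slide's wording of "responsive /24 subnet blocks".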
Conclusion
• History
• Why is this important?
• Measurement Platforms
• Problems
• Related Work
• Project Goals