
Optimization-Based Approaches
to Understanding and Modeling
Internet Topology
David Alderson
California Institute of Technology
INFORMS Telecom Conference 2004
March 8, 2004
Acknowledgments
• This talk represents joint work with John Doyle
(Caltech), Lun Li (Caltech), and Walter Willinger
(AT&T—Research)
• Many others have contributed to this story
– Reiko Tanaka (Caltech)
– Steven Low (Caltech)
– Ramesh Govindan (USC)
– Neil Spring (U. Washington)
– Stanislav Shalunov (Abilene)
– Heather Sherman (CENIC)
– John Dundas (Caltech)
Today’s Agenda
• Review recent work of empiricists and theoreticians to
understand the router-level topology of the Internet
• Understand the causes and implications of “heavy tails”
in the complex structure of the Internet
• Illustrate how recently popular “scale-free” models of
Internet topology are not just wrong, but wildly so
• Describe the importance of optimization in the
development of explanatory models of Internet topology
• Present the HOT framework as an alternative means to
understanding the “robust, yet fragile” structure of the
Internet and other complex engineering systems
• Highlight some open research problems and areas where
contributions can be made by the OR/MS community
First Question:
Why should we care about modeling
the topology of the Internet?
Understanding the topology of the current (and
future) Internet is important for many reasons
• Design and evaluation of networking protocols
– Topology affects performance, not correctness
• Understanding large-scale network behavior
– Closed-loop feedback: topology design vs. protocols
• Is the current design a result of the dominant routing protocols?
• Or are the presently used routing protocols the result of some
prevailing network design principles?
– Ability to study what-if scenarios
• Operating policy shifts
• Economic changes in the ISP market
• Implications for tomorrow’s networks
– Provisioning requirements, Traffic engineering, Operating
and management policies
Performance Evaluation via Simulation
[Diagram: a network simulator (e.g. ns2) combines network topology info (connectivity, bandwidths, delays), protocol info, and traffic demand info to produce application-specific performance measures. Traditional topology generators provide only connectivity information!]
Performance Evaluation via Simulation
[Diagram: the same workflow, but the topology input is now an annotated network graph (connectivity, bandwidths, delays), which feeds the network simulator (e.g. ns2) along with protocol info and traffic demand info to produce performance measures.]
The Internet as an Infrastructure
As the Internet has grown in capability and importance,
we have become increasingly dependent on it.
• Directly: communication (email, instant messenger,
VOIP), information and entertainment, e-commerce
• Indirectly: business, education, government have
(permanently) replaced physical/manual methods
with electronic processes, many of which rely on the
Internet.
The central importance, open architecture, and
evolving technology landscape make the Internet an
attractive target for asymmetric attack.
The Internet as a Case Study
The Internet is a great starting point for the study of
other highly engineered complex network systems:
• To the user, it creates the illusion of a simple, robust,
homogeneous resource enabling endless varieties and
types of technologies, physical infrastructures, virtual
networks, and applications (heterogeneous).
• Its complexity is starting to approach that of simple biological systems.
• Our understanding of the underlying technology
together with the ability to perform detailed
measurements means that most conjectures about its
large-scale properties can be unambiguously
resolved, though often not without substantial effort.
Question Two:
Why is research on Internet topology
interesting/difficult?
A Challenging Problem
Since the decommissioning of the NSFNet in 1995, it has
been difficult to obtain comprehensive knowledge
about the topology of the Internet
• The network has grown dramatically (number of hosts,
amount of traffic, number of ISPs, etc.)
• There have been economic incentives for ISPs to
maintain secrecy about their topologies
• Direct inspection usually not allowed
Qwest US Fiber Map – June 2001
[Map: Qwest US fiber network, with links to Japan, Hawaii, and Europe. Source: www.qwest.com]
[Map: Sprint backbone]
Measuring Topology
• Marketing documents not very helpful
• The task of “discovering” the network has been left to
experimentalists
– Must develop sophisticated methods to infer this topology
from appropriate network measurements.
– Many possible measurements that can be made.
Router-Level Topology
[Diagram: routers and hosts connected by links]
• Nodes are machines (routers or hosts) running the IP protocol
• Measurements taken from traceroute experiments that infer topology from traffic sent over the network
• Subject to sampling errors and bias
• Requires careful interpretation
AS Topology
[Diagram: four autonomous systems (AS 1 through AS 4) connected by peering links]
• Nodes are entire networks (ASes)
• Links = peering relationships between ASes
• Relationships inferred from Border Gateway Protocol (BGP) information
• Really a measure of business relationships, not network structure
Measuring Topology
• Marketing documents not very helpful
• The task of “discovering” the network has been left to
experimentalists
– Must develop sophisticated methods to infer this topology
from appropriate network measurements.
– Many possible measurements that can be made.
– Each type of measurement has its own strengths, weaknesses,
and idiosyncrasies, and results in a distinct view of the
network topology.
• Hard to know what “matters”…
“Standard” Approach
1. Choose a sequence of well-understood metrics or
observed features of interest, such as
– hierarchy
– node-degree distributions
– clustering coefficients
2. Develop a method that matches these metrics
Pros:
– Always possible to obtain a good “fit” on a chosen metric.
Cons:
– Hard to choose the “right” metric. What is “right” is apt to vary,
depending on the intended use of the topology.
– A method that does a good job of matching the chosen metric often
does not fit other metrics well.
– No predictive power.
We call this approach descriptive (evocative) modeling.
An Alternative Approach
1. Identify the causal forces at work in the design and evolution of real topologies.
2. Develop methods that generate and evolve topologies in a manner consistent with these forces.
Pros:
– Ability to generate (and design!) topologies at different levels of hierarchy
– More realistic topology
– Greater predictive power
– Possibly reveal some relationship with routing protocols
Cons:
– Difficult to identify the causal forces
– Requires careful development and diligent validation
We call this approach explanatory modeling.
Our Approach: Focus on the ISP
• Capture and represent realistic drivers of Internet
deployment and operation at the level of the single ISP
• Many important networking issues are relevant at the level
of the ISP (e.g. configuration, management, pricing,
provisioning)
• Common topologies represented in terms of ISPs
– Router-level graphs → connectivity within the ISP
– AS graphs → connectivity between ISPs
• First-Order Objective: the ability to generate a “realistic, but
fictitious” ISP topology at different levels of hierarchy
ISP Driving Forces
• Economic Factors:
– Cost of procuring, installing, and maintaining the necessary
facilities and equipment
– Limited budget for capital expenditures
– Need to balance expenditures with revenue streams
– Need to leverage investment in existing infrastructure
– Location of customers
• Technological Factors:
– Hardware constraints (e.g. router speeds, limited number of interfaces or line cards per router)
– Layer 2 technologies (SONET, ATM, WDM)
– Existing legacy infrastructure
– Location and availability of dark fiber
Mathematical Framework
1. Use combinatorial optimization to represent the
problem and its constraints
– Objectives (min cost, max profitability, satisfy demand)
– Constraints (equipment costs/capacities, legacy
infrastructure)
– Parameters (pricing, provisioning, facility location)
2. Study and explore how this framework allows for a range of ISP behavior
– Effects of objectives and constraints are the most important
3. Part of a more general framework
– Highly Optimized Tolerance (HOT), Carlson and Doyle, 1999
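As an illustration of step 1, a minimal fixed-charge network design formulation in this spirit might read as follows; the notation (build decisions $y_k$, flows $x_k$, budget $B$) is an assumed toy model, not the specific formulation of the talk:

$$\begin{aligned} \min_{x,\,y}\;\; & \sum_k \left( f_k y_k + c_k x_k \right) && \text{(build + operating cost)}\\ \text{s.t.}\;\; & \sum_{k \in \delta^+(v)} x_k - \sum_{k \in \delta^-(v)} x_k = d_v \quad \forall v && \text{(satisfy demand)}\\ & x_k \le u_k\, y_k \quad \forall k && \text{(capacity only if built)}\\ & \sum_k f_k y_k \le B, \quad y_k \in \{0,1\}, \;\; x_k \ge 0 && \text{(capital budget)} \end{aligned}$$

Here link $k$ is installed ($y_k = 1$) at fixed cost $f_k$ with capacity $u_k$ and carries flow $x_k$ at unit cost $c_k$; $d_v$ is the net demand at node $v$, and fixing some $y_k = 1$ captures legacy infrastructure.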
Question Three:
How is research on Internet topology different
from what OR/MS researchers are used to doing
on other complex engineering networks?
A Different Type of Problem
• Researchers in network optimization are used to
problems in which the objectives and constraints are
well-defined
• Here, we are trying to uncover the “most significant”
drivers of topology evolution so that we can create
“fictitious, yet realistic” network counterparts, and
also so that we can design and build improved
networks
• We will need to iterate between modeling,
measurement, and analysis to get it right
• In this sense, this is a bit more like biology
• Recent progress has given hope that a comprehensive
theory for the Internet may be possible
Heterogeneity of the Internet
• Full of “high variability”
– Link bandwidth: Kbps – Gbps
– File sizes: a few bytes – Mega/Gigabytes
– Flows: a few packets – 100,000+ packets
– In/out-degree (Web graph): 1 – 100,000+
– Delay: milliseconds – seconds and beyond
• Diversity in the technologies that comprise the
physical and link layers
• Diversity in the applications and services that are
supported
• This heterogeneity has evolved organically from an
architecture that was designed to be robust to
changes (failures or innovation) and is permanent
The Internet hourglass
[Diagram: the protocol stack drawn as an hourglass. Top (applications): Web, FTP, Mail, News, Video, Audio, ping, Kazaa. Middle (transport protocols): TCP, SCTP, UDP, ICMP, all over IP. Bottom (link technologies): Ethernet, 802.11, power lines, ATM, optical, satellite, Bluetooth.]
The Internet hourglass
[Diagram: the same hourglass, with the waist narrowed to TCP over IP.]
The Internet hourglass
[Diagram: the same hourglass, highlighting IP as the waist: "Everything on IP" above, "IP on everything" below.]
A Theory for the Internet?
[Diagram: protocol stack with layers Applications, TCP/AQM, IP, Link]
• Vertical decomposition of the protocol stack allows for the treatment of layers in isolation (a separation theorem)
• Assume that layers not considered perform in a near-optimal manner
• Use an engineering design-based perspective in two ways:
– Analysis: explain the complex structure that is observed
– Synthesis: suggest changes or improvements to the current design
A Theory for the Internet?
[Diagram: the same stack, focusing on the Applications layer]
How to design an application that "performs well" in meeting user demands subject to the resources/constraints made available by TCP/IP?
A Theory for the Internet?
[Diagram: the same stack, focusing on the network layers]
How to design a network that "performs well" and satisfies traffic demands subject to the physical resources/constraints?
A Theory for the Internet?
[Diagram: the same stack, with a question mark at the TCP/AQM layer]
If TCP/AQM is the answer, what is the question?
$$\max_{x_s \ge 0} \; \sum_s U_s(x_s) \quad \text{subject to} \quad Rx \le c$$
Primal/dual model of TCP/AQM congestion control…
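A minimal sketch of these primal/dual dynamics, assuming log utilities $U_s(x_s) = \log x_s$ (proportional fairness) and an illustrative two-link, three-source topology; the step size and iteration count are assumptions, not from the talk:

```python
import numpy as np

R = np.array([[1, 1, 0],   # routing matrix: R[k, s] = 1 if source s
              [1, 0, 1]])  # crosses link k (2 links, 3 sources)
c = np.array([1.0, 2.0])   # link capacities
p = np.ones(2)             # link prices (dual variables)
gamma = 0.01               # dual step size

for _ in range(20000):
    q = R.T @ p                        # end-to-end price seen by each source
    x = 1.0 / np.maximum(q, 1e-6)      # U_s = log  =>  x_s(q) = 1/q_s
    y = R @ x                          # aggregate load on each link
    p = np.maximum(p + gamma * (y - c), 0.0)  # projected dual gradient step

print("rates:", x.round(3))            # sources sharing both links back off
print("loads:", (R @ x).round(3), "capacities:", c)
```

Sources play the primal role (choosing rates from path prices) and links the dual role (adjusting prices to match load to capacity), mirroring the TCP/AQM interpretation above.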
A Theory for the Internet?
Applications
TCP/
AQM
IP
?
Link
If the current topology of
the Internet is the answer,
what is the question?
Next Question:
What’s been done to try and understand the
large-scale structure of the Internet?
How should we think about the Internet’s
router-level topology?
Trends in Topology Modeling
Observation: Long-range links are expensive (router-level).
Modeling approach: Random graph models (Waxman, 1988) generate connectivity-only topologies.
Observation: Real networks are not random, but have obvious hierarchy (router-level).
Modeling approach: Structural models (GT-ITM, Calvert/Zegura, 1996) generate connectivity-only topologies with inherent hierarchy.
Observation: Router-level and AS graphs exhibit heavy-tailed distributions (power laws) in characteristics such as node degree.
Modeling approach: Degree-based models (including popular "scale-free" models) generate connectivity-only topologies with inherent power laws in the node degree distribution.
Power Laws and Internet Topology
[Plot: node degree (number of connections) vs. rank on log-log axes (Source: Faloutsos et al., 1999). A few nodes have lots of connections; most nodes have few connections.]
Observed scaling in node degree and other statistics:
– Autonomous System (AS) graph
– Router-level graph
How to account for high variability in node degree?
Power Laws in Topology Modeling
• Recent emphasis has been on whether or not a
given topology model/generator can reproduce the
same types of macroscopic statistics, especially
power law-type degree distributions
• Lots of degree-based models have been proposed
– All of them are based on random graphs, usually with
some form of preferential attachment
– All of them are connectivity-only models and tend to
ignore engineering-specific system details
• Examples: BRITE, INET, Barabasi-Albert, GLP,
PLRG, CMU-generator
Models of Internet Topology
• These topology models are merely descriptive
– Measure some feature of interest (connectivity)
– Develop a model that replicates that feature
– Make claims about the similarity between the real system
and the model
– A type of “curve fitting”?
• Unfortunately, by focusing exclusively on the node degree distribution, these models get the story wrong
• We seek something that is explanatory
– Consistent with the drivers of topology design and
deployment
– Consistent with the engineering-related details
– Can be verified through the measurement of appropriate
system-specific details
Our Perspective
• Must consider the explicit design of the Internet
– Protocol layers on top of a physical infrastructure
– Physical infrastructure constrained by technological
and economic limitations
– Emphasis on network performance
– Critical role of feedback at all levels
• Consider the ability to match large scale statistics
(e.g. power laws) as secondary evidence of
having accounted for key factors affecting design
Trends in Topology Modeling
Observation: Long-range links are expensive (router-level).
Modeling approach: Random graph models (Waxman, 1988) generate connectivity-only topologies.
Observation: Real networks are not random, but have obvious hierarchy (router-level).
Modeling approach: Structural models (GT-ITM, Calvert/Zegura, 1996) generate connectivity-only topologies with inherent hierarchy.
Observation: Router-level and AS graphs exhibit heavy-tailed distributions (power laws) in characteristics such as node degree.
Modeling approach: Degree-based models (including popular "scale-free" models) generate connectivity-only topologies with inherent power laws in node degree.
Observation: Physical networks have hard technological (and economic) constraints.
Modeling approach: Optimization-driven models generate annotated topologies consistent with the design tradeoffs of network engineers.
HOT
Highly / Heavily / Heuristically … Optimized / Organized … Tolerance / Tradeoffs
• Based on ideas of Carlson and Doyle (1999)
• Complex structure (including power laws) of highly engineered technological (and biological) systems is viewed as the natural by-product of tradeoffs between system-specific objectives and constraints
• Non-generic, highly engineered configurations are extremely unlikely to occur by chance
• These tradeoffs result in "robust, yet fragile" system behavior
Heuristic Network Design
What factors dominate network design?
• Economic constraints
– User demands
– Link costs
– Equipment costs
• Technology constraints
– Router capacity
– Link capacity
Internet End-User Bandwidths
[Plot: connection speed (Mbps) vs. rank (number of users) on log-log axes, spanning 1e-2 to 1e4 Mbps and 1 to 1e8 users. A few users have very high speed connections: high performance computing on POS/Ethernet at 1-10 Gbps, academic and corporate users on Ethernet at 10-100 Mbps. Most users have low speed connections: residential and small business users on broadband cable/DSL at ~500 Kbps or dial-up at ~56 Kbps.]
How to build a network that satisfies these end-user demands?
Economic Constraints
• Network operators have a limited budget to
construct and maintain their networks
• Links are tremendously expensive
• Tremendous drive to operate network so that
traffic shares the same links
– Enabling technology: multiplexing
– Resulting feature: traffic aggregation at edges
– Diversity of technologies at network edge (Ethernet,
DSL, broadband cable, wireless) is evidence of the
drive to provide connectivity and aggregation using
many media types
Heuristically Optimal Network
[Diagram: a mesh-like core of fast, low-degree routers (Core); high-degree nodes at the edges (Edges); hosts at the periphery (Hosts).]
Heuristically Optimal Network
Claim: economic considerations alone suggest
a structure having
– Mesh-like core of high-speed, low degree routers
– High degree, low-speed nodes at the edge
• Is this consistent with technology capability?
• Is this consistent with real network design?
Cisco 12000 Series Routers
• Modular in design, creating flexibility in configuration.
• Router capacity is constrained by the number and speed of line cards inserted in each slot.

Chassis   Rack size   Slots   Switching Capacity
12416     Full        16      320 Gbps
12410     1/2         10      200 Gbps
12406     1/4         6       120 Gbps
12404     1/8         4       80 Gbps

Source: www.cisco.com
Cisco 12000 Series Routers
Technology constrains the number and capacity of line cards
that can be installed, creating a feasible region.
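To illustrate the tradeoff that creates this region, here is a minimal sketch: a chassis with a fixed slot count and total switching capacity can trade port count against port speed. The line-card menu below is hypothetical, not Cisco's actual product list.

```python
SLOTS, CAPACITY_GBPS = 16, 320  # e.g. the 12416 chassis (table above)

# (ports per card, Gbps per port) -- hypothetical line-card options
cards = [(1, 10.0), (4, 2.5), (16, 0.622), (64, 0.155)]

for ports, speed in cards:
    degree = SLOTS * ports                         # ports if every slot holds this card
    per_port = min(speed, CAPACITY_GBPS / degree)  # switching capacity may cap port rate
    print(f"degree {degree:4d}: {per_port:6.3f} Gbps/port, "
          f"{degree * per_port:5.1f} Gbps total")
```

Higher degree comes only at lower per-port speed: on log-log axes the feasible (degree, bandwidth) pairs trace out the frontier shown in the pricing figure below.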
Cisco 12000 Series Routers
[Plot: bandwidth vs. degree on log-log axes (bandwidth from 155 Mb to 160 Gb, degree from 1 to 256), showing the technically feasible region for 12000-series configurations and the direction of technological advance. Configuration prices range from $128,500 to $2,762,500 ($128,500; $212,400; $381,500; $560,500; $602,500; $932,400; $1,667,500; $2,762,500).]
Pricing info: State of Washington Master Contract, June 2002
(http://techmall.dis.wa.gov/master_contracts/intranet/routers_switches.asp)
Technologically Feasible Region
[Plot: bandwidth (Mbps, from 0.01 to 1,000,000) vs. degree (1 to 10,000) on log-log axes. Core backbone and high-end gateway routers occupy the high-bandwidth, low-degree region (cisco 12416, 12410, 12406, 12404, then 7500, 7200, 3600/3700, 2600, down to a linksys 4-port router as older/cheaper technology). Edge devices using shared media (LAN, DSL, cable, wireless, dial-up) occupy the high-degree, low-bandwidth region (uBR7246 CMTS for cable, cisco 6260 DSLAM for DSL, cisco AS5850 for dialup).]
Abilene Backbone Physical Connectivity (as of December 16, 2003)
[Map: the Abilene backbone. Link speeds: OC-3 (155 Mb/s), OC-12 (622 Mb/s), GE (1 Gb/s), OC-48 (2.5 Gb/s), OC-192/10GE (10 Gb/s). Core nodes in Seattle, Sunnyvale, Los Angeles, Denver, Kansas City, Houston, Indianapolis, Chicago, Atlanta, Washington D.C., and New York interconnect dozens of regional GigaPoPs and universities (e.g. Pacific Northwest GigaPoP, Front Range GigaPoP, Intermountain GigaPoP, Texas GigaPoP, Indiana GigaPoP, Mid-Atlantic Crossroads, NYSERNet, SOX, WiscREN, Merit, U. Hawaii) and peer networks (e.g. ESnet, GEANT, TransPAC/APAN, WIDE, SINet, CENIC, SURFNet, MANLAN, SFGP/AMPATH).]
CENIC Backbone (as of January 2004)
The Corporation for Education Network Initiatives in California (CENIC) runs the educational backbone for the State of California.
[Map: the CENIC backbone. Link speeds: OC-3 (155 Mb/s), OC-12 (622 Mb/s), GE (1 Gb/s), OC-48 (2.5 Gb/s), 10GE (10 Gb/s). PoPs (COR, OAK, SAC, FRG, SVL, FRE, SOL, BAK, SLO, LAX, TUS, SDG) are built from Cisco 750X, 12008, and 12410 routers, with connections to Abilene at Sunnyvale and Los Angeles.]
Backbone topologies for both Abilene and CENIC are built as a mesh of high-speed, low-degree routers. As one moves from the core out toward the edge, connectivity gets higher and speeds get lower.
CENIC Backbone for Southern California
[Map: the Southern California portion of the CENIC backbone, with links north to Sunnyvale, Fremont, Soledad, and Sacramento. Nodes at SLO, BAK, LAX, TUS, and SDG connect universities (Caltech, UCLA, UC Santa Barbara, UC Irvine, UC Riverside, UC San Diego, SDSC, USD), peers (Abilene, Los Nettos, LAAP, UCSSN Las Vegas, CUDI, ESnet), county offices of education (Los Angeles, Orange, Riverside, San Diego, San Bernardino), school districts (LA USD, Chaffey Joint USD, Monrovia), corporate sites (Johnson & Johnson), and dozens of community colleges across the region.]
Heuristically Optimal Network
• Mesh-like core of high-speed, low degree routers
• High degree, low-speed nodes at the edge
• Claim: consistent with drivers of topology design
– Economic considerations (traffic aggregation)
– End user demands
• Claim: consistent with technology constraints
• Claim: consistent with real observed networks
Question: How could anyone imagine anything else?
Two opposite views of complexity
Physics:
• Pattern formation by reaction/diffusion
• Edge-of-chaos
• Order for free
• Self-organized criticality
• Phase transitions
• Scale-free networks
• Equilibrium, linear
• Nonlinear, heavy tails as exotica
Engineering and math:
• Constraints
• Tradeoffs
• Structure
• Organization
• Optimality
• Robustness/fragility
• Verification
• Far from equilibrium
• Nonlinear, heavy tails as tool
Principal difference: random vs. designed
Models of Internet Topology
• To physicists, scaling relationships are suggestive of critical phenomena and phase transitions
• The starting assumption is that of randomness, and one
looks for emergent behaviors
Random Networks
Two methods for generating random networks having
power law distributions in node degree
• Preferential attachment (“scale-free” networks)
– Inspired by statistical physics
– Barabasi et al.; 1999
• Power Law Random Graph (PLRG)
– Inspired by graph theory
– Aiello, Chung, and Lu; 2000
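As a concrete illustration of the first method, here is a minimal preferential-attachment sketch; the parameters (m = 2 links per new node, the seed clique, the network size) are illustrative assumptions, not from the talk.

```python
# A minimal sketch of preferential attachment (Barabasi-Albert style):
# each new node attaches m links to existing nodes chosen with
# probability proportional to their current degree.
import random

def preferential_attachment(n, m=2, seed=0):
    random.seed(seed)
    edges = [(0, 1), (1, 2), (0, 2)]  # seed clique: early nodes need degree > 0
    # Sampling uniformly from this list is proportional to degree,
    # since each node appears once per incident edge.
    endpoints = [v for e in edges for v in e]
    for new in range(3, n):
        targets = set()
        while len(targets) < m:
            targets.add(random.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend([new, t])
    return edges

degree = {}
for u, v in preferential_attachment(1000):
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print("max degree:", max(degree.values()))  # heavy upper tail emerges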
Summary of “Scale-Free” Story
• Fact: “Scale-free” networks have roughly power law
degree distributions
• Claim:
– If the Internet has power law degree distribution
– Then it must be “scale-free” (oops)
– Therefore, it has the properties of a “scale-free” network
• Characteristic features of “scale free” networks
– High degree central “hubs”
– Network connectivity is robust to loss of random nodes,
but fragile to attack on central hubs
– Highly likely to result from various random constructions
One of the most-read papers ever on the Internet!
"Scientists spot Achilles heel of the Internet"
• "The reason this is so is because there are a couple
of very big nodes and all messages are going
through them. But if someone maliciously takes
down the biggest nodes you can harm the system
in incredible ways. You can very easily destroy the
function of the Internet."
• These scientists compared the structure of the
Internet to the airline network of the United States.
Key Points
• The scale-free story is based critically on the implied
relationship between power laws and a network
structure that has highly connected “central hubs”
• Not all networks with power law degree distributions have the properties of "scale-free" networks. (The Internet is just one example!)
• Building a model to replicate power law data is no
more than curve fitting (descriptive, not explanatory)
• The ubiquity of heavy-tailed (power-law)
relationships in highly variable phenomena is to be
expected for statistical reasons alone and requires no
“special” or “exotic” explanation
End Result
The "scale-free" claims about the Internet are not merely wrong, they suggest properties that are the opposite of the real thing.
Fundamental difference: random vs. designed
[Figure: two Internet-like topologies (nodes = routers, edges = links; 25 interior routers, 818 end systems) with identical node degree distributions.]
“scale-rich” vs. “scale-free”
[Plot: rank vs. degree on log-log axes. Two networks with identical power-law degree distributions: one has a low-degree, mesh-like core; the other a high-degree, hub-like core.]
How to characterize / compare these two networks?
Network Performance
Given realistic technology constraints on routers, how well is the network able to carry traffic?
Step 1: Constrain the network to be feasible. [Plot: bandwidth (Mbps) vs. degree, showing the abstracted technologically feasible region.]
Step 2: Compute traffic demand, taking $x_{ij} = \rho\, B_i B_j$ for end hosts with bandwidths $B_i$ and $B_j$.
Step 3: Compute the max flow:
$$\max \sum_{i,j} x_{ij} \;=\; \max_{\rho} \sum_{i,j} \rho\, B_i B_j \quad \text{s.t.} \quad \sum_{i,j:\, k \in r_{ij}} x_{ij} \le C_k \;\; \forall k$$
where $r_{ij}$ is the route between $i$ and $j$ and $C_k$ is the capacity of link $k$.
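Because the demand is a single scalar multiple of the gravity model, the max flow reduces to scaling ρ until the first link saturates. Here is a minimal sketch on a hypothetical three-host topology; the hosts, routes, and capacities are illustrative assumptions.

```python
B = {"a": 10.0, "b": 10.0, "c": 1.0}   # end-host bandwidths (Mbps)
routes = {("a", "b"): ["L1"],           # links along each host pair's path
          ("a", "c"): ["L1", "L2"],
          ("b", "c"): ["L2"]}
C = {"L1": 500.0, "L2": 100.0}          # link capacities (Mbps)

load = {k: 0.0 for k in C}              # per-link load at rho = 1
for (i, j), path in routes.items():
    for k in path:
        load[k] += B[i] * B[j]

rho = min(C[k] / load[k] for k in C if load[k] > 0)  # first link to saturate
best = rho * sum(B[i] * B[j] for (i, j) in routes)
print(f"rho = {rho:.3f}, max total throughput = {best:.1f} Mbps")
```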
Network Likelihood
How likely is a particular graph (having given
node degree distribution) to be constructed?
• Notion of likelihood depends on defining an
appropriate probability space for random graphs.
• Many methods (all based on probabilistic
preferential attachment) for randomly generating
graphs having power law degree distributions:
– Power Law Random Graph (PLRG) [Aiello et al.]
– Random rewiring (Markov chains)
In both cases, $\text{LogLikelihood (LLH)} \;\propto\; \sum_{(i,j)\ \text{connected}} d_i\, d_j$
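This statistic is cheap to compute; a minimal sketch, assuming the graph is given as an edge list with degrees derived from it:

```python
# Likelihood statistic above: sum of degree products over connected
# pairs. Hubs linked to each other drive the sum up, so hub-like cores
# score as "more likely" under degree-preserving random rewiring.
from collections import Counter

def llh(edges):
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree[u] * degree[v] for u, v in edges)

chain = [(0, 1), (1, 2), (2, 3)]   # low-degree, mesh-like toy graph
star = [(0, 1), (0, 2), (0, 3)]    # hub-like toy graph, same edge count
print(llh(chain), llh(star))       # the star scores higher
```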
Why such striking differences with the same node degree distribution?
[Figure: networks with identical degree distributions compared on performance (Fast/Slow) vs. likelihood (Low/High) axes. Side panels plot bandwidth (Mbps) vs. degree for each network: one has a fast core and a high-degree (but slower) edge; the other has a slow core and a slower edge. The fast-core network is high performance but low likelihood; the slow-core network is low performance but highly likely.]
HOT “scale-rich”
• Core: mesh-like, low degree
• Edge: high degree
• Robust to random failure
• Robust to “attack”
“Scale-free”
• Core: hub-like, high degree
• Edge: low degree
• Robust to random failure
• Fragile to “attack”
+ objectives and constraints
HOT “scale-rich”:
• High performance
• Low link costs
• Unlikely, rare, designed
• Destroyed by rewiring
• Similar to real Internet
“Scale-free”:
• Low performance
• High link costs
• Highly likely, generic
• Preserved by rewiring
• Opposite of real Internet
[Diagram: networks placed in the likelihood/performance plane. HOT networks sit at low likelihood and high performance; random hierarchical scale-free (HSF) networks are the most likely and have low performance.]
Key Points
1. High performance networks are extremely
unlikely to be found by a random process.
2. Models that focus on “highly likely”
constructions will result in graphs that are
poorly performing and are not representative
of highly engineered networks.
Recap
• We do not claim that our “heuristically optimal topology” is
an accurate representation of the real Internet, simply that it
captures some basic features observed in real networks.
• But it goes a long way to dispelling much of the confusion
about heavy-tailed node degree distributions (i.e. “scale-free”
models are fundamentally inconsistent with engineering
design despite their ability to match macro-statistics).
• “Scale-free” models may be good representations of other
systems, simply not the router-level of the Internet.
• This highlights the importance of “random vs. designed”.
• It is remarkable how even simple models based on
fundamental technological and economic tradeoffs can go a
long way to explaining large-scale network features.
• These models are only a starting point for a more detailed investigation of Internet topology.
Final Question:
What still needs to be done to understand
the large-scale structure of the Internet?
How can researchers in OR/MS help to
solve this problem?
(to be done)
• Understand the relationship between
optimization drivers and topology
– Example: Papadimitriou’s HOT
• Description of more detailed optimization
models that account for real economic and
technological considerations (?)
• Tie-in to WDM network optimization
currently in vogue
• (Need help thinking this through)
Today’s Agenda
• Review recent work of empiricists and theoreticians to
understand the router-level topology of the Internet
• Understand the causes and implications of “heavy tails”
in the complex structure of the Internet
• Illustrate how recently popular “scale-free” models of
Internet topology are not just wrong, but wildly so
• Describe the importance of optimization in the
development of explanatory models of Internet topology
• Present the HOT framework as an alternative means to
understanding the “robust, yet fragile” structure of the
Internet and other complex engineering systems
• Highlight some open research problems and areas where
contributions can be made by the OR/MS community
Thank You
[email protected]
Appendix
Recent Measurement Experiments
• R. Govindan and H. Tangmunarunkit. Heuristics for Internet Map Discovery, Proceedings of IEEE INFOCOM (2000) [often known as the Mercator Project]
• L. Gao. On inferring autonomous system relationships in the Internet, Proc. IEEE Global Internet Symposium, November 2000.
• Route Views, University of Oregon Route Views Project. Available at http://www.antc.uoregon.edu/route-views/.
• A. Broido and k. Claffy. Internet Topology: Connectivity of IP Graphs, Proceedings of SPIE ITCom WWW Conf. (2001) [often known as the Skitter Project]
• N. Spring, R. Mahajan, and D. Wetherall. Measuring ISP Topologies with Rocketfuel, Proc. ACM SIGCOMM (2002)
• L. Subramanian, S. Agarwal, J. Rexford, and R. Katz. Characterizing the Internet Hierarchy from Multiple Vantage Points, Proc. IEEE INFOCOM (2002)
• H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger. Towards Capturing Representative AS-Level Internet Topologies, Proc. of ACM SIGMETRICS (2002)
Models based on preferential attachment:
• INET: uses both curve fitting and preferential attachment. It first calculates the frequency-degree and rank-degree distributions, then assigns degrees to each node according to these distributions, and finally matches these degrees according to the linear preferential model.
• BRITE: incorporates preferential attachment together with observations of skewed node placement and locality in network connections on the Internet.
• BA: preferential attachment.
• AB: preferential attachment plus a third rewiring operation, consisting of choosing links randomly and re-wiring each end of them according to the same linear preference rule as used in the BA generator.
• GLP (Generalized Linear Preference): changes the preferential probability from d_i / sum(d_i) to (d_i - beta) / sum(d_i - beta), where beta is a tunable parameter in $(-\infty, 1)$. With some probability, add links preferentially; with some other probability, add nodes preferentially.
Other models:
• PLRG: given N nodes with an expected degree distribution, assign links among nodes with probability proportional to the product of the expected degrees of the two endpoints (see the sketch below).
• CMU power-law generator: works in two ways. 1. Assign a power-law degree distribution to nodes and then place links in the adjacency matrix such that every node obtains its assigned degree. 2. Recursive way: define a probability distribution function that randomly selects a pair of nodes and use it to produce a network graph with real-valued edge weights.
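Here is a minimal sketch of the PLRG-style expected-degree construction just described; the power-law weight sequence and the min(1, ·) clipping are illustrative assumptions.

```python
# Connect each pair (i, j) independently with probability proportional
# to the product of their expected degrees w_i * w_j.
import random

def expected_degree_graph(w, seed=0):
    random.seed(seed)
    total = sum(w)
    edges = []
    for i in range(len(w)):
        for j in range(i + 1, len(w)):
            if random.random() < min(1.0, w[i] * w[j] / total):
                edges.append((i, j))
    return edges

n, gamma = 500, 2.1
# expected degrees w_k ~ (k+1)^(-1/(gamma-1)), scaled so the max is ~sqrt(n)
w = [(n ** 0.5) * (k + 1) ** (-1.0 / (gamma - 1)) for k in range(n)]
edges = expected_degree_graph(w)
print(len(edges), "edges; max expected degree:", round(w[0], 1))
```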