A Framework for Highly-Available Cascaded Real

Internet-Scale Research at Universities
Panel Session
SAHARA Retreat, Jan 2002
Prof. Randy H. Katz,
Bhaskaran Raman,
Z. Morley Mao,
Yan Chen
Problem Statement
[Figure: service clusters forming an overlay network in the Internet between a Source and a Destination. Peering: exchange performance info. Service cluster: compute cluster capable of running services.]
• Overlay network for service composition
• Want to study recovery algorithms
• Lots of client sessions
• Methodology for evaluating the design?
– Simulation?
• Slow; does not scale with #nodes and #client sessions
• Does not bring out processing bottlenecks
– Real testbed?
• Cannot be large; setup and management problems
• Non-repeatable; not good for a controlled design study
Our approach so far…
• Emulation platform
– Real implementation of the software, but emulation of network parameters
– Inspired by NIST Net
– Developed our own user-level implementation
• Gave us better control
– Runs on the Millennium cluster of workstations
– Central bottleneck: 20,000 pkts/sec
[Figure: App and Lib on the nodes (Node 1 to Node 4), with all traffic passing through a central Emulator that holds per-link rules (Rule for 12, Rule for 13, Rule for 34, Rule for 43).]
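The emulator figure suggests that all inter-node traffic is routed through one user-level process that applies a rule per directed node pair. Below is a minimal Python sketch of that idea, assuming each rule holds only a fixed one-way delay and a loss probability, and reading "Rule for 12" as the rule for traffic from Node 1 to Node 2. The names `LinkRule`, `Emulator`, `submit`, and `deliver_due` are hypothetical, not the authors' implementation, and the sketch uses simulated time in milliseconds rather than real sockets.

```python
import heapq
import random
from dataclasses import dataclass, field


@dataclass
class LinkRule:
    """Hypothetical per-link rule: fixed one-way delay (ms) and loss probability."""
    delay_ms: float
    loss_prob: float = 0.0


@dataclass(order=True)
class _Scheduled:
    # Ordering uses only (deliver_at, seq); seq breaks ties in FIFO order.
    deliver_at: float
    seq: int
    src: int = field(compare=False)
    dst: int = field(compare=False)
    payload: bytes = field(compare=False)


class Emulator:
    """Central emulator: every packet passes through here and is dropped or
    delayed according to the rule for its (src, dst) pair."""

    def __init__(self, rules, seed=0):
        self.rules = rules              # {(src, dst): LinkRule}
        self.queue = []                 # min-heap ordered by delivery time
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.seq = 0

    def submit(self, now_ms, src, dst, payload):
        """Accept a packet at time now_ms; return False if the link drops it."""
        rule = self.rules[(src, dst)]
        if self.rng.random() < rule.loss_prob:
            return False
        self.seq += 1
        heapq.heappush(
            self.queue,
            _Scheduled(now_ms + rule.delay_ms, self.seq, src, dst, payload),
        )
        return True

    def deliver_due(self, now_ms):
        """Pop every packet whose emulated delay has elapsed by now_ms."""
        out = []
        while self.queue and self.queue[0].deliver_at <= now_ms:
            p = heapq.heappop(self.queue)
            out.append((p.src, p.dst, p.payload))
        return out
```

A single priority queue keyed on delivery time keeps the sketch simple; it also hints at why a central emulator caps throughput, since every packet in the experiment goes through one queue in one process.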
Parameters modeled
• Overlay topology:
– Generate a 6,510-node physical network using GT-ITM
– Choose a subset of nodes for the overlay network
• Latency modeling:
– Base latency set by the edge weight
– Variation modeled on the observation that RTT spikes are isolated
• Outage periods:
– Trace-driven
– Collected UDP-based measurements across 12 host pairs
– Sites: Berkeley, Stanford, UNSW (Australia), UIUC, TU-Berlin (Germany), CMU
– The CDF of the measured outage periods is used to model outage periods
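A minimal sketch of how the latency and outage models above could be sampled. It assumes (hypothetically) that a spike multiplies the base latency by a fixed factor and occurs independently for each sample, which makes back-to-back spikes rare, and that outage durations are drawn from the measured CDF by inverse-transform sampling. `sample_latency_ms`, `OutageModel`, `spike_prob`, and `spike_factor` are illustrative names and values, not parameters from the slides.

```python
import bisect
import random


def sample_latency_ms(base_ms, rng, spike_prob=0.01, spike_factor=5.0):
    """Base latency (from the GT-ITM edge weight) plus an occasional spike.
    Each sample spikes independently with probability spike_prob, so two
    consecutive spikes occur only with probability spike_prob ** 2."""
    if rng.random() < spike_prob:
        return base_ms * spike_factor
    return base_ms


class OutageModel:
    """Draws outage durations from an empirical CDF (inverse-transform sampling)."""

    def __init__(self, durations_s):
        if not durations_s:
            raise ValueError("need at least one measured outage duration")
        self.sorted = sorted(durations_s)  # measured outage lengths
        n = len(self.sorted)
        # Empirical CDF value at each sorted sample; the last entry is exactly 1.0.
        self.cdf = [(i + 1) / n for i in range(n)]

    def sample(self, rng):
        u = rng.random()  # uniform in [0, 1)
        # Smallest measured duration whose CDF value is >= u.
        idx = bisect.bisect_left(self.cdf, u)
        return self.sorted[idx]
```

Sampling from the empirical CDF reproduces the shape of the measured outage distribution without fitting a parametric model, at the cost of never generating a duration that was not observed.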
My experience in Internet measurement
• Goal
– Collect associations between clients and their local DNS servers
– To evaluate DNS-based server selection
• Built a measurement infrastructure
• Three components
– An embedded 1x1-pixel transparent GIF image
• <img src="http://xxx.rd.example.com/tr.gif" height="1" width="1">
– A specialized authoritative DNS server
• Allows hostnames to be wildcarded
– An HTTP redirector
• Always responds with “302 Moved Temporarily”
• Redirects to a URL with the client's IP address embedded in the hostname
My experience in Internet measurement
[Figure: message flow among the Client (10.0.0.1), the Redirector for xxx.rd.example.com, the Content server for the image, the Local DNS server, and the Name server for *.cs.example.com]
1. HTTP GET request for the image
2. HTTP redirect to IP10-0-0-1.cs.example.com
4. Request to resolve IP10-0-0-1.cs.example.com
5. Reply: IP address of content server
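The redirector and the wildcard name server only need to agree on how a client address is packed into a hostname. The sketch below assumes the `IP<a>-<b>-<c>-<d>.cs.example.com` pattern shown in the figure; the function names are hypothetical. Because the query for that name reaches the authoritative server from the client's local DNS server, pairing the decoded client IP with the query's source address yields one client/LDNS association (assuming the resolver contacts the authoritative server directly rather than through a forwarder).

```python
import ipaddress


def encode_client_host(client_ip, zone="cs.example.com"):
    """Redirector side: embed the client's IPv4 address in a hostname,
    e.g. 10.0.0.1 -> IP10-0-0-1.cs.example.com."""
    ip = ipaddress.IPv4Address(client_ip)  # raises if the address is invalid
    return "IP" + str(ip).replace(".", "-") + "." + zone


def decode_client_host(hostname, zone="cs.example.com"):
    """Name-server side: recover the client IP from a wildcard query name,
    or return None if the name does not follow the pattern."""
    suffix = "." + zone
    if not hostname.lower().endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    if not label.upper().startswith("IP"):
        return None
    try:
        return str(ipaddress.IPv4Address(label[2:].replace("-", ".")))
    except ipaddress.AddressValueError:
        return None


def record_association(log, qname, resolver_ip):
    """Log (client IP, local DNS server IP) when qname encodes a client."""
    client = decode_client_host(qname)
    if client is not None:
        log.append((client, resolver_ip))
    return client
```

Encoding the address in the hostname, rather than in the URL path, is what makes it visible to the DNS server, which never sees the HTTP request itself.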
My lessons
• Common myths about Internet measurements
– Measurements done from university sites are representative of the Internet
– The following are good proximity metrics:
• AS hop count
• Router hop count
– I can just quote measurement results from previous papers
• Without carefully considering their applicability
• A scalable measurement methodology eases adoption
Content Distribution Network (CDN)

Dynamic clustering for efficient Web content replication

• Use a greedy algorithm for replica placement to reduce the response latency of end users
• Trace-driven simulation to find the optimal granularity of replication
• Network topology:
– Pure-random and transit-stub models from GT-ITM
– A real AS-level topology from 7 widely dispersed BGP peers
• Real-world traces:
Web Site   Period      Duration   Total Requests   Requests/day
MSNBC      8-10/1999   10-11am    10,284,735       1,469,248 (1 hr)
NASA       7/1995      All day    3,461,612        56,748
WorldCup   5-7/1998    All day    1,352,804,107    15,372,774
• Cluster MSNBC Web clients by BGP prefix
– BGP tables from a BBNPlanet router on 01/24/2001
– 10K clusters remain; choose the top 10%, which cover >70% of requests
• Cluster NASA Web clients by domain name
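A sketch of the two steps on this slide, assuming clients are clustered by longest-prefix match against a list of BGP prefixes, and that replica placement uses the common greedy heuristic: add one replica at a time, each time choosing the candidate site that most reduces total request-weighted latency to the nearest replica. The slide does not specify the exact greedy variant, so treat this as an illustration; `cluster_by_prefix`, `top_clusters`, `greedy_placement`, and their input formats are hypothetical.

```python
import ipaddress
from collections import Counter


def cluster_by_prefix(client_ips, prefixes):
    """Assign each client IP (one entry per request) to its longest matching
    BGP prefix and return request counts per prefix cluster. Requests from
    addresses with no matching prefix are dropped."""
    nets = sorted((ipaddress.ip_network(p) for p in prefixes),
                  key=lambda net: net.prefixlen, reverse=True)
    clusters = Counter()
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        for net in nets:  # most specific prefix first
            if addr in net:
                clusters[str(net)] += 1
                break
    return clusters


def top_clusters(clusters, fraction=0.10):
    """Keep the busiest fraction of clusters (at least one)."""
    n = max(1, int(len(clusters) * fraction))
    return clusters.most_common(n)


def greedy_placement(dist, demand, k):
    """dist[site][cluster] is latency; demand[cluster] is request count.
    Add up to k replicas one at a time, each time picking the site that
    minimizes total demand-weighted latency to the nearest chosen replica."""
    chosen = []
    nearest = {c: float("inf") for c in demand}  # latency to closest replica so far

    def cost_if_added(site):
        return sum(w * min(nearest[c], dist[site][c]) for c, w in demand.items())

    for _ in range(min(k, len(dist))):
        candidates = [s for s in dist if s not in chosen]
        best = min(candidates, key=cost_if_added)
        chosen.append(best)
        for c in demand:
            nearest[c] = min(nearest[c], dist[best][c])
    return chosen
```

Greedy placement is a heuristic: it does not guarantee the optimal set of k sites, but each step needs only one pass over the remaining candidates and the client clusters.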
Wide-area Network Distance Estimation
• Problem formulation:
Given N end hosts that belong to different administrative domains, how
to select a subset of them to be probes and build an overlay distance
estimation service without knowing the underlying topology?
• Solution: Internet Iso-bar
– Cluster hosts that perceive similar performance to the Internet, and select a monitor for each cluster for active and continuous probing
– Clustering with congestion/path outage correlation
– Evaluate the prediction accuracy and stability
• Evaluation Methodology (I)
– NLANR AMP data set
• 119 sites in the US (106 after filtering out sites that were off most of the time)
• Traceroute between every pair of hosts every minute
• Clustering uses daily geometric mean of round-trip time (RTT)
• Raw data: 6/24/00 – 12/3/01
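The slide does not detail Iso-bar's clustering algorithm, so the following is only a simplified stand-in. It computes each day's geometric mean RTT, treats each host's sequence of daily means as a time series, and groups hosts with a greedy correlation threshold, using the first host of each cluster as its monitor. The function names, the 0.8 threshold, and the use of Pearson correlation on RTT series (rather than, say, correlation of congestion or outage events) are all assumptions.

```python
import math


def daily_geometric_mean(rtts_ms):
    """Geometric mean of one day's RTT samples (all must be > 0)."""
    return math.exp(sum(math.log(r) for r in rtts_ms) / len(rtts_ms))


def pearson(a, b):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def correlation_clusters(series, threshold=0.8):
    """series maps host -> list of daily geometric-mean RTTs.
    Each host joins the first cluster whose monitor it correlates with at
    >= threshold; otherwise it starts a new cluster and becomes its monitor."""
    clusters = []  # list of (monitor, members)
    for host, s in series.items():
        for monitor, members in clusters:
            if pearson(series[monitor], s) >= threshold:
                members.append(host)
                break
        else:
            clusters.append((host, [host]))
    return clusters
```

The geometric mean dampens the influence of a few very large RTT samples compared with the arithmetic mean, which matters for daily summaries of traceroute data that contain occasional spikes.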
Evaluation Methodology (II)
• Keynote Website Perspective benchmarking
– Measure Web site performance from more than 100 agents
– Heterogeneous core network: various ISPs
– Heterogeneous access network:
• 56K dial-up, DSL, and high-bandwidth business connections
– Agent locations
• America (including Canada and Mexico): 67 agents in 29 cities from 15 ISPs
• Europe: 25 agents in 12 cities from 16 ISPs
• Asia: 8 agents in 6 cities from 8 ISPs
• Australia: 3 agents in 3 cities from 3 ISPs
– 40 most popular Web servers for benchmarking
• Side problem: how can we reduce the number of agents and/or servers while still representing the majority of end-user performance for a reasonably long period?
Discussion: Difficulties of Internet measurement
• Results vary greatly depending on your measurement methodology
– The number and identity of sites you measure
• Commercial vs. educational sites
– Your measurement location
• Well-connected site vs. dialup site
• Backbone vs. access network, server vs. client
– Time when the measurement is taken
• Time of day, day of year
• Transient effects
– e.g., network congestion, flash crowds
– Frequency of measurements (for correlation studies)
– Intrusiveness of the measurement
• Does the measurement affect what you are measuring?
Discussion: Issues with Emulation
• Emulation platform: modeling correlations in network behavior
– What happens in one part of the Internet may have nonzero correlation with behavior of another part
• Scale of topology
– We have O(100) machines in the department
– O(1500) machines on campus
– Is this believable?