Scalability and Accuracy in a Large-Scale Network Emulator
Amin Vahdat, Ken Yocum, Kevin Walsh, Priya
Mahadevan, Dejan Kostić, Jeff Chase,
and David Becker
Duke University
Proceedings of 5th Symposium on Operating Systems Design and
Implementation (OSDI 2002)
2005/02/23
Introduction
● Evaluate Internet-scale distributed systems
  ● E.g. peer-to-peer, overlay, wide-area replication
● Realistic scenarios: real world
  ● Difficult to deploy and administer
  ● Results not reproducible or not necessarily representative of future behaviour
● Simulations: e.g. NS
  ● More control
  ● May miss important system interactions
● Emulation
  ● Run unmodified code on target platforms
  ● More control: can subject system traffic to constraints (bandwidth, latency, loss rate, topology, …)
  ● Thus far limited to small and static systems
  ● → ModelNet
Goal of ModelNet
● Environment should support:
  ● Unmodified applications
  ● Reproducible results
  ● Experimentation under broad range of network topologies and dynamically changing network characteristics
  ● Large-scale experiments with large number of nodes and high traffic
ModelNet Architecture
● Scalable Internet emulation environment
  ● Based on dummynet, extended to improve accuracy and include multi-hop and multi-core emulation
● Edge nodes running user-specified OS and applications
  ● Each instance is a virtual edge node (VN) with unique IP in emulated topology
  ● Route traffic through core routers
● Core nodes emulate behaviour of configured target network
  ● Captures effects of congestion and cross-traffic
  ● Uses emulated links or pipes (see the sketch below)
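A pipe here is a dummynet-style emulated link. The following is a minimal sketch of the per-pipe model in Python, illustrative only; the class and field names are assumptions, not the FreeBSD kernel implementation:

    from dataclasses import dataclass
    import random

    @dataclass
    class Packet:
        size_bytes: int
        deadline: float = 0.0   # absolute time at which the packet exits its pipe

    class Pipe:
        """One emulated link: a drop-tail queue plus bandwidth, delay, loss."""
        def __init__(self, bandwidth_bps, delay_s, loss_rate, queue_limit):
            self.bandwidth_bps = bandwidth_bps
            self.delay_s = delay_s
            self.loss_rate = loss_rate
            self.queue_limit = queue_limit
            self.queue = []
            self.last_tx_end = 0.0

        def enqueue(self, pkt, now):
            if len(self.queue) >= self.queue_limit or random.random() < self.loss_rate:
                return False                       # queue overflow or random loss
            tx_start = max(now, self.last_tx_end)  # wait for earlier packets to drain
            self.last_tx_end = tx_start + pkt.size_bytes * 8 / self.bandwidth_bps
            pkt.deadline = self.last_tx_end + self.delay_s   # plus propagation delay
            self.queue.append(pkt)
            return True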
ModelNet Phases
CREATE
● Generate network topology → GML graph(*)
● Can use Internet traces, BGP dumps, synthetic topology generators
● User can annotate graph to specify packet loss rates, failure distribution, etc. (see the sketch below)
(*) GML – graph modeling language
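As an illustration of what CREATE produces, a small annotated topology could be written out like this. This is a sketch assuming the Python networkx library; ModelNet's actual tool chain and attribute names are not specified on the slide:

    import random
    import networkx as nx   # assumed helper library; not part of ModelNet

    # Stand-in for an Internet-like synthetic topology generator.
    g = nx.random_internet_as_graph(100)

    # Annotate each link with the attributes DISTILL later turns into pipes.
    for u, v in g.edges():
        g.edges[u, v]["bandwidth_kbps"] = 10_000
        g.edges[u, v]["latency_ms"] = random.randint(1, 40)
        g.edges[u, v]["loss_rate"] = 0.001           # user-specified packet loss

    nx.write_gml(g, "target_topology.gml")           # GML graph consumed by DISTILL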
ModelNet Phases
DISTILL
● Transform GML graph to pipe topology to model target network
● Simplify network
  ● Trade accuracy for reduced emulation cost
ModelNet Phases
ASSIGN
● Map distilled topology to core nodes, load balancing
● Ideal assignment is an NP-complete problem
  ● Mapping pipes to cores depends on routing, link properties and traffic load
● Use simple greedy k-clusters assignment (see the sketch below)
  ● Randomly pick one node in the topology for each core node, then cores greedily select from connected nodes in round-robin
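A minimal sketch of that greedy round-robin idea in Python; the real assigner also weighs routing, link properties and expected traffic load, all omitted here:

    import random
    from collections import deque

    def greedy_k_clusters(graph, k):
        """Partition topology nodes among k core nodes.

        graph: dict mapping node -> list of neighbouring nodes.
        Returns a dict mapping node -> core index.
        """
        seeds = random.sample(list(graph), k)        # one random start node per core
        owner = {seed: core for core, seed in enumerate(seeds)}
        frontiers = [deque([seed]) for seed in seeds]
        remaining = len(graph) - k
        while remaining:
            progressed = False
            for core in range(k):                    # round-robin over cores
                while frontiers[core]:
                    node = frontiers[core][0]
                    free = next((n for n in graph[node] if n not in owner), None)
                    if free is None:
                        frontiers[core].popleft()    # node fully claimed; try next
                        continue
                    owner[free] = core               # greedily claim a connected node
                    frontiers[core].append(free)
                    remaining -= 1
                    progressed = True
                    break                            # one claim per core per round
            if not progressed:                       # disconnected leftovers
                for node in graph:
                    owner.setdefault(node, 0)
                remaining = 0
        return owner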
ModelNet Phases
BIND
● Assign VNs to edge nodes
  ● Can have multiple VNs per physical edge node
  ● Bind each physical node to a single core
  ● Install sets of pipes in distilled topology and routing tables with shortest path between VN pairs
  ● Configure edge nodes with IP addresses for each VN
ModelNet Phases
RUN
● Execute target applications on edge nodes
The Core
● Principal tasks (in steady state)
  ● Receive packets from network interface
  ● Move packets
    ● Pipe to pipe
    ● Pipe to final destination
● Moving packets is strictly higher priority than receiving packets
  ● Preferentially emulate packets already in core
  ● → core CPU saturation results in packets dropped at the physical level rather than in the emulation
The Core
Traffic routing
● Emulate links as pipes
● Pre-computed shortest path for all VN pairs
  ● → requires O(n²) space
● Route is an ordered list of pipes
● Move packets through pipes by reference (packet descriptor); see the sketch below
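A sketch of that precomputation in Python: unweighted shortest paths via BFS, one directed pipe per hop. The real distiller works on the pipe topology with its own link metrics:

    from collections import deque

    def build_routes(graph):
        """All-pairs shortest paths, stored as ordered pipe lists.

        graph: dict mapping node -> list of neighbours.
        routes[(src, dst)] = [(a, b), (b, c), ...], one directed pipe per
        hop. Keeping every pair is the O(n²) space cost noted above.
        """
        routes = {}
        for src in graph:
            parent = {src: None}
            q = deque([src])
            while q:                                  # BFS from src
                u = q.popleft()
                for v in graph[u]:
                    if v not in parent:
                        parent[v] = u
                        q.append(v)
            for dst, par in parent.items():
                if par is None:
                    continue                          # skip src itself
                hops, node = [], dst
                while parent[node] is not None:       # walk back up to src
                    hops.append((parent[node], node))
                    node = parent[node]
                routes[(src, dst)] = hops[::-1]
        return routes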
The Core
Packet scheduling
● Heap of pipes sorted by earliest deadline (exit time for first packet in queue)
● Scheduler executes once per clock tick (10 kHz), runs at kernel’s highest priority
  ● Finds pipes with deadlines no later than the current time
  ● Moves packets to next destination (tail of next pipe, or VN)
  ● Calculates new deadlines and reinserts pipes into heap (see the sketch below)
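One scheduler tick might be sketched like this, in Python with heapq and the Pipe class sketched earlier. The real implementation is a FreeBSD kernel module driven by a 10 kHz timer; it tracks heap membership directly rather than using the lazy-deletion shortcut shown here:

    import heapq

    def run_tick(pipe_heap, now, next_pipe, deliver_to_vn):
        """Service every pipe whose head-of-queue deadline has passed.

        pipe_heap entries are (deadline, id(pipe), pipe) tuples; next_pipe(pkt)
        returns the following pipe on the packet's route, or None at the end.
        """
        while pipe_heap and pipe_heap[0][0] <= now:
            deadline, _, pipe = heapq.heappop(pipe_heap)
            if not pipe.queue or pipe.queue[0].deadline != deadline:
                continue                              # stale heap entry; skip it
            pkt = pipe.queue.pop(0)
            nxt = next_pipe(pkt)
            if nxt is None:
                deliver_to_vn(pkt)                    # route done: hand off to the VN
            elif nxt.enqueue(pkt, now):               # may drop (loss / full queue)
                heapq.heappush(pipe_heap, (pkt.deadline, id(nxt), nxt))
            if pipe.queue:                            # pipe still busy: re-arm deadline
                heapq.heappush(pipe_heap, (pipe.queue[0].deadline, id(pipe), pipe))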
The Core
Multi-core configuration
● Next pipe may be on a different core node
● Transfer packet descriptor to the next node
● Packet contents buffered at entry core node and forwarded to destination upon delivery of the packet descriptor (see the sketch below)
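A toy illustration of that descriptor/payload split in Python. The message type and helper functions are assumptions; the real core nodes exchange this state over the cluster LAN:

    from dataclasses import dataclass

    @dataclass
    class Descriptor:
        """The per-packet state that hops between core nodes during emulation."""
        pkt_id: int
        size_bytes: int          # enough for bandwidth/deadline computations
        entry_core: str          # where the full payload stays buffered

    payload_buffer = {}          # on the entry core: pkt_id -> raw packet contents

    def on_ingress(pkt_id, payload, entry_core, emulate):
        payload_buffer[pkt_id] = payload             # keep contents locally
        emulate(Descriptor(pkt_id, len(payload), entry_core))

    def on_emulation_complete(desc, fetch_payload, send_to_dest_vn):
        # Emulation delivered the descriptor; now move the real bytes once.
        send_to_dest_vn(fetch_payload(desc.entry_core, desc.pkt_id))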
Scalability Issues
● Bandwidth limitation
  ● Traffic through ModelNet core is limited to the cluster’s physical internal bandwidth
● Memory requirement
  ● ModelNet must buffer up to the full bandwidth-delay product of the target network (worked example below)
● Routing protocol
  ● Assumes perfect routing protocol: shortest path between all pairs of hosts
  ● Instantaneous discovery of new shortest path upon node or link failure
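A worked example of the memory bound, with illustrative numbers rather than figures from the paper:

    # Worst-case buffering: every emulated link full of in-flight bytes.
    # Say the target network has 250 links of 10 Mb/s with 40 ms delay each:
    links, bandwidth_bps, delay_s = 250, 10e6, 0.040
    bdp_bytes = links * bandwidth_bps * delay_s / 8
    print(f"{bdp_bytes / 1e6:.1f} MB of packet buffers")   # -> 12.5 MB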
Setup for Experiments
● Core routers:
  ● 1.4 GHz Pentium-IIIs w/ 1 GB memory
  ● FreeBSD-4.5-STABLE
  ● Connected via 1 Gb/s switch
● Edge nodes:
  ● 1 GHz Pentium-IIIs w/ 256 MB memory
  ● Linux 2.4.17
  ● Connected via 100 Mb/s Ethernet
Baseline Accuracy
● Accurately emulate target packet characteristics on a hop-by-hop basis
  ● Use kernel logging to track performance and accuracy
  ● Run ModelNet scheduler at highest kernel priority
● Results:
  ● Each hop accurately emulated to granularity of hardware timer (100 μs)
  ● Maintains accuracy up to 100% CPU utilization
● Future improvement:
  ● In subsequent hops, use packet debt handling to correct for emulation errors
Capacity
● Quantify as function of load and # of hops
● Single core
● 1–5 edge nodes
  ● Each with up to 24 netperf senders (24 VNs) and 24 receivers
  ● 1 Gb/s Ethernet connection
● For 1 hop:
  ● 1 Gb/s link
  ● At 120 flows, CPU is 50% utilized
  ● Network link is the bottleneck
Additional Cores
● Deliver higher throughput
● Increasing probability of a packet’s path crossing node boundary
  ● → cross-core traffic
  ● Introduces communication overhead
● Ability to scale depends on
  ● Application communication characteristics
  ● Partitioning of topology (minimize cross-core traffic)
VN Multiplexing
● Mapping of VNs to physical edge nodes
● Enables larger-scale emulations
● Affects emulation accuracy and scalability
  ● Context switch overhead
  ● Scheduling behaviour
  ● Resource contention at edge nodes
Tradeoff: Accuracy vs. Scalability
● Impractical to model every packet and link for large portion of Internet
● Create controlled Internet-like execution context for applications
● Reduce overhead by making approximations that minimally impact application behaviour
● Ideally, automate tradeoff to satisfy resource conditions and report degree of inaccuracy to user
Distillation
Hop-by-hop emulation
● Distilled topology isomorphic to target network
● Accurate, but highest per-packet cost
End-to-end emulation
● Collapse each path to a single pipe → full mesh (see the sketch below)
● Lowest overhead
● Can capture raw network latency, bandwidth and loss rate
● Cannot emulate link contention among competing flows
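The end-to-end collapse can be sketched as folding per-hop parameters into one pipe. These combination rules are a common approximation; the slide does not give the distiller's exact rules, so treat the formulas as assumptions:

    def collapse_path(hops):
        """Fold per-hop (bandwidth_bps, delay_s, loss_rate) into one pipe.

        Latencies add, the bottleneck link caps bandwidth, and survival
        probabilities multiply across independent lossy hops.
        """
        bandwidth = min(bw for bw, _, _ in hops)
        delay = sum(d for _, d, _ in hops)
        survive = 1.0
        for _, _, loss in hops:
            survive *= 1.0 - loss
        return bandwidth, delay, 1.0 - survive

    # e.g. three hops collapse to one pipe at the bottleneck rate:
    print(collapse_path([(100e6, 0.005, 0.001),
                         (10e6, 0.020, 0.001),
                         (100e6, 0.005, 0.001)]))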
Distillation
Walk-in
● Breadth-first traversal to find successive frontier sets (first frontier set is the set of all VNs)
● Preserve first walk-in links, replace interior by full mesh
● Each packet traverses at most (2 × walk-in) + 1 pipes
● Cannot model contention in interior
Walk-out
● Models under-provisioned core
● Extends walk-in algorithm to preserve inner core (sketched below)
  ● Find “topological center” by generating successive frontiers until one of size one or zero is found
  ● Collapse paths between walk-in and walk-out
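A sketch of the frontier construction behind walk-in and walk-out, in Python with an adjacency dict as before:

    def frontier_sets(graph, vns):
        """Yield successive breadth-first frontiers, starting from all VNs."""
        seen = set(vns)
        frontier = set(vns)
        while frontier:
            yield frontier
            nxt = {n for u in frontier for n in graph[u]} - seen
            seen |= nxt
            frontier = nxt

    def topological_center(graph, vns):
        """Walk-out: expand frontiers until one of size one or zero appears."""
        last = set()
        for f in frontier_sets(graph, vns):
            last = f
            if len(f) <= 1:
                break
        return last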
Distillation
Ring topology
● 20 routers
  ● Interconnected at 20 Mb/s
● 20 VNs connected to each router by 2 Mb/s links
● VNs partitioned into generator and receiver sets
  ● Each generator sends to a random receiver
● Hop-by-hop: 419 pipes
● End-to-end: 79,800 pipes
● Last-mile only: 400 edge links and 190 interior links
Changing Network Characteristics
● Evaluation of adaptive Internet systems
● User can:
  ● Directly incorporate generators for competing traffic
    ● Accurate for emulation of “background” cross traffic
    ● Consumes resources at edge nodes and bandwidth at core
  ● Modify pipe parameters during emulation to inject cross traffic dynamically (see the sketch below)
    ● Low overhead, scales independently of traffic rate
    ● Does not capture all details of Internet packet dynamics (e.g. slow start, bursty traffic)
    ● Not responsive to congestion → emulation error grows with link utilization level
● Fault injection
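Against the Pipe sketch from earlier, dynamically injecting cross traffic might look like this. An assumption-laden sketch: the real system rewrites dummynet pipe parameters inside the kernel:

    def apply_cross_traffic(pipe, base_bandwidth_bps, cross_bps):
        """Model steady background flows by shrinking the capacity the pipe
        offers to foreground traffic. Cheap and independent of the cross-
        traffic rate but, as noted above, not responsive to congestion."""
        pipe.bandwidth_bps = max(base_bandwidth_bps - cross_bps, 1.0)

    # e.g. a 10 Mb/s link carrying 4 Mb/s of emulated cross traffic:
    # apply_cross_traffic(pipe, 10e6, 4e6)   # foreground now sees 6 Mb/s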
Case Studies
● Network of Gnutella clients
  ● 10,000 nodes (100 VNs on each of 100 edge nodes)
● Support for emulation of ad hoc wireless environments
  ● Implemented but not presented in this paper
● CFS(1)
  ● Able to reproduce results from the CFS implementation running on the RON(2) testbed (published by another group)
● Replicated web services
  ● Replay of a trace to IBM’s main website
  ● Able to show that one additional replica improves latency; a third replica is only marginally beneficial
  ● Ability to emulate contention on interior links crucial for obtaining these results
● Adaptive overlays
  ● ACDC: overlay that adapts to changing network conditions
  ● Similar experimental results obtained by ModelNet and ns-2
(1) CFS – Cooperative File System
(2) RON – Resilient Overlay Network (MIT)
Related Work
● Many other efforts on emulation
  ● Mostly focus on specific, static and small-scale systems
● Netbed (Emulab)
  ● Similar to ModelNet, except that ModelNet focuses on scalable emulation of large-scale networks
  ● Will integrate ModelNet efforts into Netbed
● Competing research by WASP(1) project
  ● Emulates network characteristics at the end host
  ● Requires emulation software on all edge nodes
  ● Cannot capture congestion of multiple flows on a single pipe
(1) WASP – Wide Area Server Performance
(J.-Y. Pan, H. Bhanoo, E. Nahum, M. Rosu, C. Faloutsos, and S. Seshan)
Summary
● ModelNet designed to support
  ● Unmodified applications
  ● Reproducible results
  ● Broad range of network topologies and dynamically changing characteristics
  ● Large-scale experiments
● Provided means of balancing accuracy and cost
● Presented case studies to show generality of approach