PlanetLab Design - Computer Science & Engineering


PlanetLab:
Catalyzing Network Innovation
Larry Peterson
Princeton University
October 2, 2007
Timothy Roscoe
Intel Research at Berkeley
Challenges
• Security
– known vulnerabilities lurking in the Internet
 DDoS, worms, malware
– addressing security comes at a significant cost
 federal government spent $5.4B in 2004
 estimated $50-100B spent worldwide on security in 2004
• Reliability
– e-Commerce increasingly depends on fragile Internet
 much less reliable than the phone network (three vs. five 9’s)
 risks in using the Internet for mission-critical operations
 barrier to ubiquitous VoIP
– an issue of ease-of-use for everyday users
Challenges (cont)
• Scale & Diversity
– the whole world is becoming networked
 sensors, consumer electronic devices, embedded processors
– assumptions about edge devices (hosts) no longer hold
 connectivity, power, capacity, mobility, …
• Performance
– scientists have significant bandwidth requirements
 each e-science community covets its own wavelength(s)
– purpose-built solutions are not cost-effective
 being on the “commodity path” makes an effort sustainable
Two Paths
• Incremental
– apply point-solutions to the current architecture
• Clean-Slate
– replace the Internet with a new network architecture
• We can’t be sure the first path will fail, but…
– point-solutions result in increased complexity
 making the network harder to manage
 making the network more vulnerable to attacks
 making the network more hostile to new applications
– architectural limits may lead to a dead-end
Architectural Limits
• Minimize trust assumptions
– the Internet originally viewed network traffic as fundamentally
cooperative, but should view it as adversarial
• Enable competition
– the Internet was originally developed independent of any
commercial considerations, but today the network architecture
must take competition and economic incentives into account
• Allow for edge diversity
– the Internet originally assumed host computers were connected to
the edges of the network, but host-centric assumptions are not
appropriate in a world with an increasing number of sensors and
mobile devices
Limits (cont)
• Design for network transparency
– the Internet originally did not expose information about its internal
configuration, but there is value to both users and network
administrators in making the network more transparent
• Enable new network services
– the Internet originally provided only a best-effort packet delivery
service, but there is value in making processing capability and
storage capacity available in the middle of the network
• Integrate with optical transport
– the Internet originally drew a sharp line between the network and the
underlying transport facility, but allowing bandwidth aggregation and
traffic engineering to be first-class abstractions has the potential to
improve efficiency and performance
Barriers to Second Path
• Internet has become ossified
– no competitive advantage to architectural change
– no obvious deployment path
• Inadequate validation of potential solutions
– simulation models too simplistic
– little or no real-world experimental evaluation
• Testbed dilemma
– production testbeds: real users but incremental change
– research testbeds: radical change but no real users
Recommendation
It is time for the research community, federal
government, and commercial sector to jointly
pursue the second path. This involves
experimentally validating new network
architecture(s), and doing so in a sustainable way
that fosters widespread deployment.
Approaches
• Revisiting definition & placement of function
– naming, addressing, and location
– routing, forwarding, and addressing
– management, control, and data planes
– end hosts, routers, and operators
• Designing with new constraints in mind
– selfish and adversarial participants
– mobile hosts and disconnected operation
– large number of small, low-power devices
– ease of network management
Deployment Story
• Old model
– global up-take of new technology
– does not work due to ossification
• New model
– incremental deployment via user opt-in
– lowering the barrier-to-entry makes deployment plausible
• Process by which we define the new architecture
– purists: settle on a single common architecture
 virtualization is a means
– pluralists: multiplicity of continually evolving elements
 virtualization is an end
• What architecture do we deploy?
– research happens…
Validation Gap
[Diagram: path from Analysis (models) through Simulation / Emulation (results) to Deployment; the missing step is Experiment At Scale, With Real Users (code, measurements)]
What is PlanetLab?
• An open, shared testbed for developing, deploying, and accessing planetary-scale services
• What would you do if you had Akamai’s infrastructure?
PlanetLab
Motivation
• New class of applications emerging that spread over a
sizable fraction of the web
• Architectural components starting to emerge
• The next Internet will be created as an overlay on the
current one
• It will be defined by services, not transport
• There is NO vehicle to try out the next n great ideas
in this area
Guidelines (1)
• A thousand viewpoints on “the cloud” is what
matters
– not the thousand servers
– not the routers, per se
– not the pipes
Guidelines (2)
• and you must have the vantage points of the
crossroads
– co-location centers, peering points, etc.
Guidelines (3)
• Each service needs an overlay
covering many points
– logically isolated
• Many concurrent services and applications
– must be able to slice nodes => VM per service
– service has a slice across large subset
• Must be able to run each service / app over long period to
build meaningful workload
– traffic capture/generator must be part of facility
• Consensus on “a node” more important than “which node”
Guidelines (4)
• Test-lab as a whole must be up a lot
– global remote administration and management
– redundancy within
• Each service will require own management capability
• Testlab nodes cannot “bring down” their site
– not on forwarding path
• Relationship to firewalls and proxies is key
Guidelines (5)
• Storage has to be a part of it
– edge nodes have significant capacity
• Needs a basic well-managed capability
Initial core team:
• Intel Research: David Culler, Timothy Roscoe, Brent Chun, Mic Bowman
• Princeton: Larry Peterson, Mike Wawrzoniak
• University of Washington: Tom Anderson, Steven Gribble
PlanetLab
• 1000+ machines spanning 500 sites and 40 countries
• Supports distributed virtualization
– each of 600+ network services runs in its own slice
Requirements
1) It must provide a global platform that supports
both short-term experiments and long-running
services.
– services must be isolated from each other
– multiple services must run concurrently
– must support real client workloads
Requirements
2) It must be available now, even though no one
knows for sure what “it” is.
– deploy what we have today, and evolve over time
– make the system as familiar as possible (e.g., Linux)
– accommodate third-party management services
Requirements
3) We must convince sites to host nodes running
code written by unknown researchers from other
organizations.
– protect the Internet from PlanetLab traffic
– must get the trust relationships right
Requirements
4) Sustaining growth depends on support for site
autonomy and decentralized control.
– sites have final say over the nodes they host
– must minimize (eliminate) centralized control
Requirements
5) It must scale to support many users with minimal
resources available.
– expect under-provisioned state to be the norm
– shortage of logical resources too (e.g., IP addresses)
Design Challenges
• Minimize centralized control without violating
trust assumptions.
• Balance the need for isolation with the reality
of scarce resources.
• Maintain a stable and usable system while
continuously evolving it.
Key Architectural Ideas
• Distributed virtualization
– slice = set of virtual machines
• Unbundled management
– infrastructure services run in their own slice
• Chain of responsibility
– account for behavior of third-party software
– manage trust relationships
Implementation Research Issues
• Slice-ability: distributed virtualization
• Isolation and resource control
• Security and integrity: exposed machines
• Management of a very large, widely dispersed system
• Instrumentation and measurement
• Building blocks and primitives
Slice-ability
• Each service runs in a slice of PlanetLab
– distributed set of resources (network of virtual machines)
– allows services to run continuously
• VM monitor on each node enforces slices
– limits fraction of node resources consumed
– limits portion of name spaces consumed
• Issue: global resource discovery
– how do applications specify their requirements?
– how do we map these requirements onto a set of nodes?
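The resource-discovery issue above can be made concrete with a small sketch: given per-node state, pick a set of nodes meeting a slice's requirements. Everything here is illustrative; the node attributes, thresholds, and greedy least-loaded policy are assumptions, not PlanetLab's actual discovery interface.

```python
def select_nodes(nodes, min_free_mem_mb, min_bw_mbps, count):
    """Greedily pick up to `count` nodes meeting the slice's minimum
    requirements, preferring the least-loaded candidates."""
    candidates = [
        n for n in nodes
        if n["free_mem_mb"] >= min_free_mem_mb and n["bw_mbps"] >= min_bw_mbps
    ]
    candidates.sort(key=lambda n: n["load"])   # least loaded first
    return [n["name"] for n in candidates[:count]]

# Hypothetical snapshot of node state (fresh state lives at the node itself).
nodes = [
    {"name": "planetlab1.cs.princeton.edu", "free_mem_mb": 512, "bw_mbps": 10, "load": 0.3},
    {"name": "planetlab2.cs.princeton.edu", "free_mem_mb": 128, "bw_mbps": 10, "load": 0.1},
    {"name": "planet1.berkeley.example.net", "free_mem_mb": 1024, "bw_mbps": 5, "load": 0.7},
]
print(select_nodes(nodes, min_free_mem_mb=256, min_bw_mbps=8, count=2))
```

In a real deployment the candidate list would come from a resource-discovery service running in its own slice, and the policy would be pluggable.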
Slices
[Diagram: a slice as a network of virtual machines spanning PlanetLab nodes]
User Opt-in
[Diagram: a client opts in to an overlay service on the path to a server, e.g. http://coblitz.org/www.princeton.edu/podcast.mp4]
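The opt-in pattern in the URL above can be sketched in a few lines: a client routes a request through an overlay service (here, a CoBlitz-style content distribution slice) simply by prefixing the origin URL with the service's hostname. The prefixing convention is taken from the example URL; treat it as illustrative.

```python
def opt_in(url, service_host="coblitz.org"):
    """Rewrite an origin-server URL so the request flows via the overlay."""
    scheme, rest = url.split("://", 1)    # split off "http"
    return f"{scheme}://{service_host}/{rest}"

print(opt_in("http://www.princeton.edu/podcast.mp4"))
# http://coblitz.org/www.princeton.edu/podcast.mp4
```

Lowering the barrier to entry this far, one URL edit, is what makes incremental, per-user deployment plausible.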
Per-Node View
[Diagram: a Virtual Machine Monitor (VMM) hosting VM1 … VMn, alongside a Node Manager and a Local Admin VM]
Global View
[Diagram: PLC coordinating slices of VMs across many nodes]
Exploit Layer 2 Circuits
Deployed in NLR & Internet2 (aka VINI)
Circuits (cont)
Supports arbitrary virtual topologies
Circuits (cont)
Exposes (can inject) network failures
Circuits (cont)
[Diagram: virtual routers peer via BGP with neighboring networks to participate in Internet routing]
Distributed Control of Resources
• At least two interested parties
– service producers (researchers)
 decide how their services are deployed over available nodes
– service consumers (users)
 decide what services run on their nodes
• At least two contributing factors
– fair slice allocation policy
 both local and global components (see above)
– knowledge about node state
 freshest at the node itself
Unbundled Management
• Partition management into orthogonal services
– resource discovery
– monitoring node health
– topology management
– managing user accounts and credentials
– software distribution
• Issues
– management services run in their own slice
– allow competing alternatives
– engineer for innovation (define minimal interfaces)
Application-Centric Interfaces
• Inherent problems
– stable platform versus research into platforms
– writing applications for temporary testbeds
– integrating testbeds with desktop machines
• Approach
– adopt popular API (Linux) and evolve implementation
– eventually separate isolation and application interfaces
– provide generic “shim” library for desktops
Virtual Machines
• Security
– prevent unauthorized access to state
• Familiar API
– forcing users to accept a new API is death
• Isolation
– contain resource consumption
• Performance
– don’t want to be apologetic
Virtualization
[Diagram: per-node stack of VMs (a node owner VM, a node manager VM, and service VMs VM1 … VMn running auditing, monitoring, brokerage, and provisioning services) atop the VMM]
Virtual Machine Monitor (VMM):
• Linux kernel (Fedora Core)
• + Vservers (namespace isolation)
• + Schedulers (performance isolation)
• + VNET (network virtualization)
Resource Allocation
• Decouple slice creation and resource allocation
– given a “fair share” (1/Nth) by default when created
– acquire/release additional resources over time
 including resource guarantees
• Protect against thrashing and over-use
– link bandwidth
 upper bound on sustained rate (protects campus bandwidth)
– memory
 kill largest user of physical memory when swap is 85% full
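The memory policy above can be sketched directly: when swap utilization crosses a threshold, reset the slice using the most physical memory. Slice accounting is simulated here with a plain dict; the real monitor (pl_mom's swapmon.py, described later) reads kernel statistics.

```python
SWAP_THRESHOLD = 0.85   # from the policy above: act when swap is 85% full

def slice_to_reap(swap_used_fraction, mem_by_slice):
    """Return the slice to kill, or None if swap pressure is acceptable."""
    if swap_used_fraction < SWAP_THRESHOLD:
        return None
    # Reap the largest consumer of physical memory.
    return max(mem_by_slice, key=mem_by_slice.get)

usage = {"princeton_codeen": 300, "mit_dht": 650, "ucb_bamboo": 120}  # MB, illustrative
print(slice_to_reap(0.90, usage))   # mit_dht
print(slice_to_reap(0.40, usage))   # None
```

The design choice is best-effort plus overload protection: no admission control, just a backstop that keeps one runaway slice from taking the node down.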
Confluence of Technologies
• Cluster-based management
• Overlay and P2P networks
• Virtual machines and sandboxing
• Service composition frameworks
• Internet measurement
• Packet processors
• Colo services
• Web services
 The time is now.
Usage Stats
• Users: 2500+
• Slices: 600+
• Long-running services: ~20
– content distribution, scalable large file transfer,
– multicast, pub-sub, routing overlays, anycast,…
• Bytes-per-day: 4 TB
– 1Gbps peak rates not uncommon
• Unique IP-addrs-per-day: 1M
Validation Gap
[Diagram: path from Analysis (models) through Simulation / Emulation (results) to Deployment; the missing step is Experiment At Scale, With Real Users (code, measurements)]
Deployment Gap
[Diagram: maturity vs. time: Analysis (MatLab) → Controlled Experiment (EmuLab) → Deployment Study (PlanetLab) → Pilot Demonstration (PL Gold) → Commercial Adoption, crossing implementation reality, user & network reality, and economic reality]
Emerging applications
• Content distribution
• Peer-to-peer networks
• Global storage
• Mobility services
• Etc., etc.
A vibrant research community is embarking on a new direction, and no one can try out their ideas.
Trust Relationships
[Diagram: an N×N trust problem: hundreds of sites (Princeton, Berkeley, Washington, MIT, Brown, CMU, NYU, ETH, Harvard, HP Labs, Intel, NEC Labs, Purdue, UCSD, SICS, Cambridge, Cornell, …) and hundreds of slices (princeton_codeen, nyu_d, cornell_beehive, att_mcash, cmu_esm, harvard_ice, hplabs_donutlab, idsl_psepr, irb_phi, paris6_landmarks, mit_dht, mcgill_card, huji_ender, arizona_stork, ucb_bamboo, ucsd_share, umd_scriptroute, …), mediated by a trusted intermediary (PLC)]
Principals
• Node Owners
– host one or more nodes (retain ultimate control)
– select an MA and approve of one or more SAs
• Service Providers (Developers)
– implement and deploy network services
– responsible for the service’s behavior
• Management Authority (MA)
– installs and maintains software on nodes
– creates VMs and monitors their behavior
• Slice Authority (SA)
– registers service providers
– creates slices and binds them to responsible providers
Trust Relationships
[Diagram: trust edges among Owner, Provider, MA, and SA]
(1) Owner trusts MA to map network activity to the responsible slice
(2) Owner trusts SA to map a slice to its responsible providers
(3) Provider trusts SA to create VMs on its behalf
(4) Provider trusts MA to provide working VMs & not falsely accuse it
(5) SA trusts provider to deploy responsible services
(6) MA trusts owner to keep nodes physically secure
Architectural Elements
[Diagram: the MA maintains a node database; the SA maintains a slice database; each node, hosted by its owner, runs the NM + VMM, a slice creation service (SCS), an owner VM, and slice VMs created on behalf of service providers]
Slice Creation
[Diagram: a PI calls SliceCreate( ) and SliceUsersAdd( ) at PLC (the SA); a user/agent calls GetTicket( ); the ticket is redeemed with plc.scs on each node, which calls CreateVM(slice) at the node manager (NM) atop the VMM]
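The ticket-based flow above can be sketched as follows. The class and method names are hypothetical stand-ins for the XML-RPC calls in the real system; the point is the structure: the slice authority signs a ticket off-node, and the node manager verifies it locally before creating the VM, so slice creation can be delegated without trusting the courier.

```python
import hashlib
import hmac

class SliceAuthority:
    """Stand-in for PLC's slice authority: issues signed tickets."""
    def __init__(self, key):
        self.key = key
    def get_ticket(self, slice_name):
        sig = hmac.new(self.key, slice_name.encode(), hashlib.sha256).hexdigest()
        return {"slice": slice_name, "sig": sig}

class NodeManager:
    """Stand-in for the per-node manager: redeems tickets, creates VMs."""
    def __init__(self, key):
        self.key = key
        self.vms = {}
    def redeem(self, ticket):
        expected = hmac.new(self.key, ticket["slice"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, ticket["sig"]):
            raise ValueError("bad ticket")
        self.vms[ticket["slice"]] = "created"   # CreateVM(slice)
        return True

key = b"shared-secret"   # illustrative; the real system uses public-key credentials
sa, nm = SliceAuthority(key), NodeManager(key)
ticket = sa.get_ticket("princeton_codeen")
nm.redeem(ticket)
print(nm.vms)   # {'princeton_codeen': 'created'}
```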
Brokerage Service
[Diagram: PLC (the SA) binds a slice to a resource pool via Bind(slice, pool); a user calls BuyResources( ) at a broker, and the broker contacts the relevant nodes to allocate VMs from the pool]
PlanetLab: Two Perspectives
• Useful research platform
• Prototype of a new network architecture
What are people doing in/on/with/around PlanetLab?
1. Network measurement
2. Application-level multicast
3. Distributed Hash Tables
4. Storage
5. Resource Allocation
6. Distributed Query Processing
7. Content Distribution Networks
8. Management and Monitoring
9. Overlay Networks
10. Virtualisation and Isolation
11. Router Design
12. Testbed Federation
13. …
Lessons Learned
• Trust relationships
– owners, operators, developers
• Virtualization
– scalability is critical
– control plane and node OS are orthogonal
– least privilege in support of management functionality
• Decentralized control
– owner autonomy
– delegation
• Resource allocation
– decouple slice creation and resource allocation
– best effort + overload protection
• Evolve based on experience
– support users quickly
Conclusions
• Innovation can come from anywhere
• Much of the Internet’s success can be traced to
its support for innovation “at the edges”
• There is currently a high barrier-to-entry for
innovating “throughout the net”
• One answer is a network substrate that supports
“on demand, customizable networks”
– enables research
– supports continual innovation and evolution
PlanetLab Software
Overview
Mark Huang
[email protected]
Node Software
• Boot
– Boot CD
– Boot Manager
• Virtualization
– Linux kernel
– VServer
– VNET
• Node Management
– Node Manager
– NodeUpdate
– PlanetLabConf
• Slice Management
– Slice Creation Service
– Proper
• Monitoring
– PlanetFlow
– pl_mom
PLC Software
• Database server
– pl_db
• PLCAPI server
– plc_api
• Web server
– Website PHP
– Scripts
• Boot server
– PlanetLabConf scripts
• PlanetFlow archive
• Mail, Support (RT), DNS, Monitor, Build, CVS, QA
Boot Manager
• Boot Manager
– bootmanager/source/
 Main BootManager class, authentication, utility functions,
configuration, etc.
– bootmanager/source/steps/
 Individual “steps” of the install/boot process
– bootmanager/support-files/
 Bootstrap tarball generation
 Legacy support for old Boot CDs
Virtualization
• Linux kernel
– Fedora Core 8 kernel

VServer patch
• VServer
– util-vserver/

Userspace VServer management utilities and libraries
• VNET
– Linux kernel module
– Intercepts bind(), other socket calls
– Intercepts and marks all IP packets
– Implements TUN/TAP, proxy socket extensions
Node Management
• Node Manager (pl_nm)
– sidewinder/
 Thin XML-RPC shim around VServer (or other VMM) syscalls, and other knobs
– util-python/
 Miscellaneous Python utility functions
– util-vserver/python/
 Python bindings for VServer syscalls
• NodeUpdate
– NodeUpdate/
 Wrapper around yum for keeping node RPMs up-to-date
• PlanetLabConf
– PlanetLabConf/
 Pull-based configuration file distribution service
 Most files dynamically generated on a per-node or per-node-group basis
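The pull model behind PlanetLabConf can be sketched as a client that fetches its (possibly per-node-generated) files and installs only those whose contents changed. The transport, file names, and digest bookkeeping here are illustrative, not the actual PlanetLabConf protocol.

```python
import hashlib

def sync_files(fetch, files, installed):
    """Pull each named file; return the names whose contents changed."""
    updated = []
    for name in files:
        content = fetch(name)                        # e.g., HTTPS GET from the boot server
        digest = hashlib.sha256(content).hexdigest()
        if installed.get(name) != digest:
            installed[name] = digest                 # would write the file, then record digest
            updated.append(name)
    return updated

store = {}
fake_server = lambda name: b"config for " + name.encode()   # hypothetical server
print(sync_files(fake_server, ["ntp.conf", "resolv.conf"], store))  # both new
print(sync_files(fake_server, ["ntp.conf", "resolv.conf"], store))  # no changes
```

Pulling (rather than pushing) keeps the server stateless about node liveness, which matters when nodes are frequently down or behind firewalls.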
Slice Management
• Slice Creation Service (pl_conf)
– sidewinder/
 Runs in a slice
 Periodically downloads slices.xml from boot server
 Local XML-RPC API for delegated slice creation, query
• Proper
– proper/
 Simple local interface for executing privileged operations
 Bind mount(), privileged port bind(), root read()
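The least-privilege pattern behind Proper can be sketched as an unprivileged slice asking a privileged local daemon to perform a whitelisted operation on its behalf. The operation names and the in-process call here are illustrative; the real Proper exposes a small set of operations (privileged bind, bind mounts, root reads) over a local interface.

```python
# Hypothetical whitelist of privileged operations a slice may request.
ALLOWED = {"bind_port", "mount_dir", "read_file"}

def privileged_request(op, slice_name, allowed=ALLOWED):
    """Execute op for slice_name only if it is on the whitelist."""
    if op not in allowed:
        raise PermissionError(f"{op} not permitted for {slice_name}")
    # A real daemon would perform the operation with its own privileges here.
    return f"{op} performed for {slice_name}"

print(privileged_request("bind_port", "umd_scriptroute"))
```

The design choice is to grant specific operations rather than root: the slice never holds the privilege, it only holds the right to ask.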
Administration and Monitoring
• PlanetFlow (pl_netflow)
– netflow/

MySQL schema and initialization/maintenance scripts
– netflow/html/

PHP frontend
– netflow/pfgrep/

Console frontend
– ulogd/

Packet header collection, aggregation, and insertion
• PlanetLab Monitor (pl_mom)
– pl_mom/swapmon.py

Swap space monitor and slice reaper
– pl_mom/bwmon.py

Average daily bandwidth monitor
Database and API
• Database
– pl_db/

PostgreSQL schema generated from XML
• PLCAPI
– plc_api/specification/

XML specification of API functions
– plc_api/PLC/

mod_python implementation
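Since the PLCAPI is XML-RPC, a call can be sketched with Python's standard xmlrpc.client. The method name (GetNodes) and the auth struct follow PLCAPI conventions, but treat the exact fields, filters, and return values as assumptions to check against the XML specification; this sketch only marshals a request and sends nothing over the network.

```python
import xmlrpc.client

def build_call(method, auth, *args):
    """Marshal a PLCAPI-style XML-RPC request body (not transmitted here)."""
    return xmlrpc.client.dumps((auth,) + args, methodname=method)

# Hypothetical credentials; real calls go over HTTPS to the PLCAPI server.
auth = {"AuthMethod": "password",
        "Username": "user@example.edu",
        "AuthString": "secret"}
body = build_call("GetNodes", auth, {}, ["hostname", "boot_state"])
print("GetNodes" in body)   # True
```

Against a live server one would replace build_call with `xmlrpc.client.ServerProxy(url)` and invoke `server.GetNodes(auth, ...)` directly.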
Web Server
• PHP, Static, Generated
– plc_www/includes/new_plc_api.php

Auto-generated PHP binding to PLCAPI
– plc_www/db/

Secure portion of website
– plc_www/generated/

Generated include files
– plc/scripts/

Miscellaneous scripts
Boot Server
• Secure Software Distribution
– Authenticated, encrypted with SSL
– /var/www/html/boot/

Default location for Boot Manager
– /var/www/html/install-rpms/

Default /etc/yum.conf location for RPM updates
– /var/www/html/PlanetLabConf/


Server-side component
Mostly PHP