PlanetLab
Operating System support*
*a work in progress
What is it?
A distributed set of machines that must be shared
efficiently, where "efficient" can mean many
different things.
Goals
Distributed Virtualization
A PlanetLab account, together with its associated resources,
should span multiple nodes (a SLICE).
Unbundled management
Infrastructure services (running a platform as opposed to
running an application) operate over a SLICE, allowing a
variety of services to provide the same functionality.
Design
4 main areas..
• VM Abstraction - Linux vserver
• Resource Allocation + Isolation - SCOUT
• Network virtualization
• Distributed Monitoring
“Node Virtualization”
• Full virtualization like VMware - performance cost; a lot of
memory consumed by each VM image
• Paravirtualization like Xen - more efficient, a promising
solution (but still has memory constraints)
• Virtualization at the system-call level, like Linux vservers and
UML - supports a large number of slices with reasonable isolation
OS for each VM ?
• Linux vservers - Linux inside Linux
• Each vserver is a directory in a chroot jail.
• Each virtual server:
– shares binaries,
– has its own packages,
– has its own services,
– has a weaker form of root that provides a local superuser,
– has its own users, i.e. its own GID/UID namespace,
– is confined to using some IP addresses only, and
– is confined to some area(s) of the file system.
Communication among ‘vservers’
• Not local sockets or IPC
• but via IP
– Simplifies resource management and isolation
– Interaction is independent of their locations
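For instance, a minimal sketch of one slice contacting a service in another vserver, assuming the peer listens on the node's IP (the address, port, and function name are illustrative):

/* Sketch of slice-to-slice communication over IP rather than local
 * IPC.  The same code works whether the peer vserver is on this node
 * or on a remote one, which is why interaction is location-independent. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int talk_to_peer(const char *node_ip, uint16_t port)  /* illustrative API */
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return -1; }

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    inet_pton(AF_INET, node_ip, &peer.sin_addr);

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) != 0) {
        perror("connect");              /* peer slice not listening */
        close(fd);
        return -1;
    }
    const char msg[] = "hello from another slice\n";
    write(fd, msg, sizeof msg - 1);
    close(fd);
    return 0;
}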
Reduced resource usage
• Physical memory
– Copy-on-write memory segments shared across
unrelated vservers
• Unification (Disk space)
– Share files across contexts
– Hard-linked, immutable, unlinkable files (sketched below)
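A minimal sketch of unification, assuming the standard ext2/ext3 attribute ioctls; the vserver patch additionally defines an "immutable-but-unlinkable" flag that is not in mainline kernels, so the flag handling below is illustrative:

/* Sketch of unification: hard-link one on-disk copy of a file into a
 * vserver's tree and mark it immutable so the context cannot modify
 * it in place.  FS_IMMUTABLE_FL alone also blocks unlinking; the
 * vserver patch's extra flag (not shown) is what allows the link to
 * be removed without touching other contexts' copies. */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int unify(const char *master, const char *in_vserver)
{
    if (link(master, in_vserver) != 0) {     /* share one on-disk copy */
        perror("link");
        return -1;
    }
    int fd = open(in_vserver, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    int flags;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
        flags |= FS_IMMUTABLE_FL;            /* block in-place writes */
        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) != 0)
            perror("FS_IOC_SETFLAGS");       /* needs CAP_LINUX_IMMUTABLE */
    }
    close(fd);
    return 0;
}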
Required modifications for vserver
• Notion of context
– Isolates a group of processes,
– Each vserver is a separate context,
– A context id is added to all inodes,
– Context-specific capabilities were added,
– Context limits can be specified,
– Easy accounting for each context.
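A hypothetical sketch of the context switch, modeled loosely on the vserver patch's new_s_context() call; the syscall number and signature below are assumptions, not the real interface:

/* Hypothetical sketch of entering a vserver security context.  The
 * syscall number and signature are assumptions modeled on the
 * (non-mainline) vserver kernel patch, for illustration only. */
#include <sys/syscall.h>
#include <unistd.h>

#define __NR_new_s_context 273    /* assumed syscall number */

/* Enter context `ctx`; a process can move into a new context but
 * never back out, which is what isolates the group of processes. */
long enter_context(int ctx)
{
    return syscall(__NR_new_s_context, ctx, (void *)0, 0);
}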
vserver implementation
• Initialize vserver
– Create a mirror of reference root file system
– Create two identical login accounts
• Switching from the default shell (a modified shell)
– Switch to the slice's vserver security context
– Chroot to the vserver's root file system
– Relinquish a subset of true superuser privileges
– Redirect into the other account in that vserver
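A sketch of that switch sequence, reusing the hypothetical enter_context() from the earlier sketch; paths and UIDs are illustrative, and a real vserver login also trims the capability set in more detail:

/* Sketch of the switch sequence above.  enter_context() is the
 * hypothetical wrapper defined earlier; everything else uses
 * standard Linux calls. */
#define _GNU_SOURCE               /* for setresuid() */
#include <unistd.h>

long enter_context(int ctx);      /* hypothetical, defined earlier */

int enter_vserver(int ctx, const char *vroot, uid_t uid)
{
    if (enter_context(ctx) < 0)          /* slice's security context */
        return -1;
    if (chroot(vroot) != 0 || chdir("/") != 0)
        return -1;                       /* confine to vserver's fs  */
    if (setresuid(uid, uid, uid) != 0)   /* relinquish superuser;    */
        return -1;                       /* become the local account */
    execl("/bin/sh", "sh", "-l", (char *)0);
    return -1;                           /* only reached if exec failed */
}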
“Isolation & Resource Allocation”
• KeyKOS - strict resource accounting
• Processor Capacity Reserves
• Nemesis
• Scout - scheduling along data paths (SILK)
Overall structuring
• Central infrastructure services (PlanetLab Central)
– Central database of principals, slices, resource allocations and
policies
– Creation, deletion of slices through exported interface
• Node manager
– Obtains resource information from central server
– Bind resources to local VM that belongs to a slice
• Rcap -> acquire( Rspecs )
• Bind( slice_id, Rcap )
** Every resource access goes through the node manager as a
system call and is validated using the Rcap (sketched below)
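A hypothetical rendering of that interface; the type names and fields below are assumptions for illustration, since only acquire() and bind() are named above:

/* Hypothetical rendering of the node manager interface. */
#include <stdint.h>

typedef struct {            /* resource specification (Rspec) */
    uint64_t cpu_share;     /* proportional CPU share         */
    uint64_t mem_pages;     /* physical memory pages          */
    uint64_t out_kbps;      /* outgoing bandwidth cap         */
} rspec_t;

typedef uint64_t rcap_t;    /* unforgeable resource capability (Rcap) */

/* Obtain a capability for a set of resources on this node. */
rcap_t acquire(const rspec_t *rspec);

/* Bind previously acquired resources to the slice's local VM. */
int bind_rcap(const char *slice_id, rcap_t rcap);

A slice's resource usage on a node is then wholly described by the Rcaps bound to it, which is what makes per-slice validation and accounting possible.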
Implementation
• Non renewable resources
– Disk space, memory pages, file descriptors
– Appropriate system calls are wrapped to check per-slice resource
limits and increment usage (see the sketch below)
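A sketch of such a wrapper for one non-renewable resource (file descriptors); the slice accounting structure and lookup are assumed for illustration:

/* Sketch of wrapping a system call to enforce a per-slice limit on a
 * non-renewable resource.  The accounting structure and the
 * per-slice lookup are assumptions. */
struct slice_acct {
    long fds_used;
    long fds_limit;
};

struct slice_acct *current_slice(void);   /* assumed per-slice lookup */
long real_sys_open(const char *path, int flags, int mode);

long wrapped_sys_open(const char *path, int flags, int mode)
{
    struct slice_acct *s = current_slice();
    if (s->fds_used >= s->fds_limit)
        return -1;                      /* would exceed slice limit  */
    long fd = real_sys_open(path, flags, mode);
    if (fd >= 0)
        s->fds_used++;                  /* charge usage to the slice */
    return fd;
}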
• Renewable resources
– Fairness and guarantees
• Hierarchical token bucket queuing discipline
– Cap per-vserver total outgoing bandwidth
• SILK for CPU scheduling
– Proportional share scheduling using resource containers
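To make the bandwidth cap concrete, a minimal token-bucket check of the kind the HTB discipline builds on; the rates and caller-supplied clock are illustrative:

/* Minimal token-bucket check: a vserver may send only if enough
 * tokens have accumulated since the last send. */
#include <stdbool.h>
#include <stdint.h>

struct tbucket {
    double tokens;       /* current tokens, in bytes        */
    double rate;         /* fill rate, bytes per second     */
    double burst;        /* bucket depth, in bytes          */
    double last;         /* time of last update, seconds    */
};

bool tb_allow(struct tbucket *tb, double now, uint32_t pkt_bytes)
{
    tb->tokens += (now - tb->last) * tb->rate;  /* refill since last check */
    if (tb->tokens > tb->burst)
        tb->tokens = tb->burst;
    tb->last = now;
    if (tb->tokens < pkt_bytes)
        return false;                           /* over the cap: queue or drop */
    tb->tokens -= pkt_bytes;
    return true;
}

HTB extends this with a class hierarchy, which is what lets unused bandwidth be shared among vservers while still enforcing each per-vserver cap.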
“Network virtualization”
• Filters on network send and receive - like Exokernel and
Nemesis.
• Sharing and partitioning a single network address space by using a safe version of raw sockets.
• Alternative approach (similar to Xen) - assign a different IP
address to each VM, each using the entire port space and
managing its own routing table. The problem is the
unavailability of enough IPv4 addresses, on the order of
1000 per node.
Safe raw sockets
• The Scout module manages all TCP and UDP ports and ICMP IDs to
ensure that there are no collisions between safe raw sockets and
TCP/UDP/ICMP sockets
• For each IP address, all ports are either free or "owned" by a slice.
• Two slices may split ownership of a port by binding it to different IP
addresses.
• Only two IP addresses per node as of now: the external IP + the
loopback address
• A SLICE can reserve a port like any other resource (exclusive)
• A SLICE can open 3 sockets on a port
– Error socket, consumer socket, sniffer socket
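A sketch of a slice opening a safe raw socket on a port it owns. On stock Linux, bind() on a raw socket neither reserves a port nor avoids the root requirement; here it is the interposed Scout module that gives bind() that meaning, so treat this as illustrative:

/* Sketch of claiming a port through a safe raw socket. */
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

int open_safe_raw(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
    if (fd < 0) { perror("socket"); return -1; }

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);        /* claim ownership of the port */
    addr.sin_addr.s_addr = INADDR_ANY;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        perror("bind");                 /* port owned by another slice */
        return -1;
    }
    return fd;
}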
Monitoring
• An HTTP sensor server collects data from the sensor interface
on each node.
• Clients can query the sensor database (sketched below).
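A sketch of such a query, assuming the sensor server speaks plain HTTP on port 33080; the port and sensor path are assumptions for illustration:

/* Sketch of a client querying a node's sensor server over HTTP. */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int query_sensor(const char *node, const char *path)
{
    struct addrinfo hints = {0}, *res;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(node, "33080", &hints, &res) != 0)
        return -1;

    int fd = socket(res->ai_family, res->ai_socktype, 0);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        freeaddrinfo(res);
        return -1;
    }
    freeaddrinfo(res);

    char req[256];
    snprintf(req, sizeof req,
             "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, node);
    write(fd, req, strlen(req));

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, stdout);      /* print the sensor reply */
    close(fd);
    return 0;
}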
Scalability
• Limited by disk space
• Of course, also limited by kernel resources
– Need to recompile the kernel to increase these limits
• Thank you..