Extensible Resource Management for Networked Virtual Computing

Orca internals 101
Jeff Chase
Orcafest 5/28/09
Summary of Earlier Talks
• Factor actors/roles along the right boundaries.
  – stakeholders, innovation, tussle
• Open contracts with delegation
  – resource leases
• Orca as control plane for GENI:
  – Aggregates are first-class (“authority”)
  – Slice controllers are first-class (“SM”)
  – Clearinghouse brokers enable policies under GSC direction
For more…
• For more on all that, see the slides tacked onto the end of the presentation.
Orca Internals 101
• Leasing core: “shirako”
• Plugins and control flow
• Actor concurrency model
• Lease state machines
• Resource representations
Actors: The Big Picture
[Figure: the three actors and their message flow. The Authority delegates resources to the Broker; the Slice Controller sends a request to the Broker and receives a ticket; the Slice Controller redeems the ticket at the Authority and receives a lease.]
Actors: The Big Picture
[Same figure, annotated:]
• Integrate experiment control tools here (e.g., Gush and DieselNet tools, via XMLRPC to a generic slice controller).
• Integrate substrate here with authority-side handler plugins.
• The request/ticket and redeem/lease exchanges are inter-actor RPC calls made automatically: you should not have to mess with them.
Terminology
• Slice controller == slice manager == service manager == guest controller
  – Blur the distinction between the “actor” and the controller module that it runs.
• Broker == clearinghouse (or a service within a clearinghouse)
  – == “agent” (in SHARP, and in some code)
• Authority == aggregate manager
  – Controls some portion of substrate for a site or domain under a Management Authority.
Slice Controllers
• Separate application/environment demand management from resource arbitration.
[Figure: the Slice Controller sits between the Experiment Manager and the Site Authority.]
• The slice controller monitors the guest and obtains/renews leases to meet demand.
• Aggregates/authorities monitor resources, arbitrate access, and perform placement of guest requests onto resources.
• Experiment control tools (e.g., Gush and DieselNet tools) drive a generic slice controller, e.g., with XMLRPC.
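A slice controller's core job, as described above, is a monitor/renew loop: size a request to current demand, get a ticket from a broker, and redeem it at a site authority. The sketch below is a minimal illustration in Java; the class and method names (SimpleSliceController, requestTicket, redeem) are hypothetical stand-ins, not the actual Shirako API.

  // Minimal sketch of a slice controller's demand loop (hypothetical API, not the Shirako classes).
  public class SimpleSliceController {
      private final Broker broker;        // issues tickets
      private final Authority authority;  // redeems tickets into leases

      public SimpleSliceController(Broker broker, Authority authority) {
          this.broker = broker;
          this.authority = authority;
      }

      // Called on a timer tick: size the request to current demand,
      // get a ticket from the broker, then redeem it at the site authority.
      public void tick(int unitsNeeded, long termMillis) {
          Ticket ticket = broker.requestTicket(unitsNeeded, termMillis);
          Lease lease = authority.redeem(ticket);
          // ... join the new units into the guest application, renew before expiration ...
      }

      // Hypothetical actor interfaces for illustration only.
      interface Broker { Ticket requestTicket(int units, long term); }
      interface Authority { Lease redeem(Ticket t); }
      static class Ticket {}
      static class Lease {}
  }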
ProtoGENI?
Possibility: export the ProtoGENI XMLRPC interface from a generic slice controller. The interface is not that complicated; from a poster on protogeni.net (we should be able to support it):
• GetCredential() from slice authority
• CreateSlice() goes to slice authority
• Register() slice with clearinghouse
• ListComponents() goes to CH and returns list of AMs and CMs
• DiscoverResources() to AM or CM returns rspecs
• RequestTicket goes straight to an AM
• RedeemTicket also goes to the AM
• StartSliver to the AM after redeem: “bring sliver to a running state”
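To make the list above concrete, here is a rough sketch of driving such an interface from Java with the Apache XML-RPC client. Only the method names come from the slide; the endpoint URL and the argument shapes are placeholders, not the actual ProtoGENI signatures.

  import java.net.URL;
  import org.apache.xmlrpc.client.XmlRpcClient;
  import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

  // Sketch of a slice controller exporting/relaying ProtoGENI-style XMLRPC calls.
  public class ProtoGeniClientSketch {
      public static void main(String[] args) throws Exception {
          XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
          config.setServerURL(new URL("https://slice-authority.example.org/xmlrpc")); // placeholder endpoint
          XmlRpcClient client = new XmlRpcClient();
          client.setConfig(config);

          Object credential = client.execute("GetCredential", new Object[]{});
          Object slice = client.execute("CreateSlice", new Object[]{credential, "mySlice"});
          client.execute("Register", new Object[]{credential, slice});           // register with clearinghouse
          Object components = client.execute("ListComponents", new Object[]{credential});
          Object rspecs = client.execute("DiscoverResources", new Object[]{credential});
          Object ticket = client.execute("RequestTicket", new Object[]{credential, rspecs});
          Object sliver = client.execute("RedeemTicket", new Object[]{ticket});
          client.execute("StartSliver", new Object[]{sliver});                   // bring sliver to a running state
          System.out.println("components: " + components);
      }
  }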
Brokers and Ticketing
• Sites delegate control of resources to a broker
  – Intermediary/middleman
• Factor allocation policy out of the site
  – Broker arbitrates resources under its control
  – Sites retain placement policy
• “Federation”
  – Site autonomy
  – Coordinated provisioning
[Same actor diagram: delegate from the Authority to the Broker; request/ticket with the Broker; redeem/lease with the Authority.]
SHARP [SOSP 2003] w/ Vahdat, Schwab
Actor structure: symmetry
• Actors are RPC clients and servers
• Recoverable: commit state changes
  – Pluggable DB layer (e.g., mysql)
• Common structures/classes in all actors
  – A set of slices, each with a set of leases (ReservationSet) in various states.
  – Different “*Reservation*” classes with different state machine transitions.
  – Generic resource encapsulator (ResourceSet and IConcreteSet)
  – Common kernel: Shirako leasing core
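As a rough picture of that common per-actor state, the sketch below mirrors the names on this slide (ReservationSet, ResourceSet, IConcreteSet); the fields and methods are illustrative guesses, not the real Shirako classes.

  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.Map;
  import java.util.Set;

  // Illustrative sketch of the common actor state described above.
  class ActorStateSketch {
      // Each actor holds a set of slices...
      final Map<String, Slice> slices = new HashMap<>();

      static class Slice {
          final String name;
          final ReservationSet reservations = new ReservationSet(); // ...each with a set of leases
          Slice(String name) { this.name = name; }
      }

      static class ReservationSet {
          final Set<Reservation> reservations = new HashSet<>();
          void add(Reservation r) { reservations.add(r); }
      }

      // A reservation wraps a generic resource encapsulator plus an FSM state.
      static class Reservation {
          ResourceSet resources;
          String state; // e.g., Nascent, Ticketed, Active, Closed
      }

      static class ResourceSet {
          int units;
          IConcreteSet concrete; // substrate-specific encapsulator (e.g., a set of nodes)
      }

      interface IConcreteSet { /* substrate-specific operations */ }
  }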
Actors and containers
• Actors run within containers
  – JVM, e.g., Tomcat
  – Per-container actor registry and keystore
  – Per-container mysql binding
• Actor management interface
  – Useful for GMOC and portals
  – Not yet remoteable
• Portal attaches to container
  – Tabs for different actor “roles”
  – Dynamically loadable controllers and views (“Automat”)
“Automat” Portal
Shirako Kernel
Snippet from the “developer setup guide” on the web. The paths changed in the RENCI code base: prefix with core/trunk.
Shirako Kernel Events
The actor kernel (“core”) maintains state and processes events:
– Local initiate: start request from local actor
  • E.g., from portal command, or a policy
– Incoming request from remote actor
– Management API
  • E.g., from portal or GMOC
– Timer tick
– Other notifications come through tick or protocol APIs
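A minimal dispatcher over those event sources might look like the sketch below; the enum values mirror the slide, while the dispatch structure itself is illustrative, not the actual Shirako kernel code.

  // Illustrative event dispatcher for the kernel event sources listed above.
  class KernelEventSketch {
      enum EventKind { LOCAL_INITIATE, REMOTE_REQUEST, MANAGEMENT_API, TIMER_TICK }

      void dispatch(EventKind kind, Object payload) {
          switch (kind) {
              case LOCAL_INITIATE:  handleLocalStart(payload);    break; // e.g., portal command or policy
              case REMOTE_REQUEST:  handleRemoteRequest(payload); break; // inter-actor RPC arriving
              case MANAGEMENT_API:  handleManagement(payload);    break; // portal or GMOC
              case TIMER_TICK:      handleTick();                 break; // also probes pending async tasks
          }
      }

      void handleLocalStart(Object p) {}
      void handleRemoteRequest(Object p) {}
      void handleManagement(Object p) {}
      void handleTick() {}
  }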
Pluggable Resources and Policies
[Figure: Policy Modules (“Controllers”) and Resource Handlers and Drivers plug into the Leasing Core. Labels: instantiate guests, monitoring, state storage/recovery, negotiate contract terms, event handling, lease groups, configure resources.]
Kernel control flow
• All ops come through KernelWrapper and Kernel (.java)
  – Wrapper: validate request and access
• Most operations pertain to a single lease state machine (FSM)
• But many access global state, e.g., allocating resources from a shared substrate.
  – Kernel: execute op with a global “core” lock
  – Nonblocking core, at least in principle
Kernel control flow
• Acquire core lock
• Invoke the *Reservation* class to transition the lease FSM
• Release core lock
• Commit new state to the actor DB
• Execute async tasks, e.g., “service*” methods to invoke plugins and handlers
• Ticks probe for completions of pending async tasks.
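The sketch below walks through that per-operation sequence in Java: lock, FSM transition, commit, then unlocked async tasks. The class and method names (Reservation, ActorDatabase, serviceProbe) are illustrative, not Shirako's.

  import java.util.concurrent.locks.ReentrantLock;

  // Sketch of the kernel control flow described above.
  class KernelControlFlowSketch {
      private final ReentrantLock coreLock = new ReentrantLock();
      private final ActorDatabase db = new ActorDatabase();

      void processOperation(Reservation r, Object op) {
          coreLock.lock();                    // 1. acquire the global core lock
          try {
              r.transition(op);               // 2. transition the lease FSM
          } finally {
              coreLock.unlock();              // 3. release the core lock
          }
          db.commit(r);                       // 4. commit new state to the actor DB
          r.serviceProbe();                   // 5. run async "service*" tasks (plugins, handlers)
          // 6. later timer ticks probe for completion of the pending async tasks
      }

      static class Reservation {
          void transition(Object op) { /* state machine update, under the core lock */ }
          void serviceProbe() { /* kick off handler upcalls outside the lock */ }
      }
      static class ActorDatabase { void commit(Reservation r) { /* e.g., mysql */ } }
  }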
Lease State Machine
[Figure: lease state machine timeline across the Service Manager, Broker, and Site Authority.]
• The Service Manager forms a resource request and sends “request ticket” to the Broker; broker policy selects resource types and sites, and sizes unit quantities (Nascent → Ticketed).
• The Service Manager redeems the ticket at the Site Authority (“request lease”); site policy assigns concrete resources to match the ticket, and resources are initialized when the lease begins (e.g., install nodes) (Joining/Priming → Active).
• Resources join the guest application; the guest uses the resources for the original lease term.
• The guest may continue to extend the lease by mutual agreement (“extend ticket” / “extend lease”); a reservation may change size (“flex”) on extend (Extending → Active).
• Teardown/reclaim resources after the lease expires, or on a guest-initiated close (close handshake; Closed).
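A much-simplified rendering of that state machine is sketched below. The state names come from the slide; the transition table is illustrative and omits broker-side and error states.

  // Simplified sketch of the lease state machine above (not the Shirako FSM).
  class LeaseStateMachineSketch {
      enum State { NASCENT, TICKETED, JOINING, PRIMING, ACTIVE, EXTENDING, CLOSED }

      State state = State.NASCENT;

      void onTicket()      { check(state == State.NASCENT);   state = State.TICKETED;  } // broker grants ticket
      void onRedeem()      { check(state == State.TICKETED);  state = State.PRIMING;   } // authority assigns resources
      void onLeaseUpdate() { check(state == State.PRIMING);   state = State.ACTIVE;    } // resources join the guest
      void onExtend()      { check(state == State.ACTIVE);    state = State.EXTENDING; } // flex/renew by agreement
      void onExtendDone()  { check(state == State.EXTENDING); state = State.ACTIVE;    }
      void onClose()       { state = State.CLOSED; }                                     // expiry or guest-initiated

      private void check(boolean ok) {
          if (!ok) throw new IllegalStateException("invalid transition from " + state);
      }
  }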
Handlers
• Invocation upcalls through ResourceSet/ConcreteSet on relevant lease transitions.
  – Authority: setup/teardown
  – Slice controller: join/leave
  – Unlocked “async task” upcalls
• Relevant property sets are available to these handler calls
  – For resource type, configuration, local/unit
  – Properties ignored by the core
• ConcreteSet associated with ShirakoPlugin
  – e.g., COD manages “nodes” with IP addresses, …
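The handler surface described above can be pictured as two small interfaces, one per side, each receiving the relevant property sets. The interface and parameter names below are illustrative, not the actual Shirako plugin interfaces.

  import java.util.Properties;

  // Sketch of the handler upcalls: authority-side setup/teardown,
  // slice-controller-side join/leave, each given the relevant property sets.
  interface AuthorityHandlerSketch {
      void setup(Properties resource, Properties config, Properties unit);    // instantiate a sliver
      void teardown(Properties resource, Properties config, Properties unit); // reclaim it
  }

  interface GuestHandlerSketch {
      void join(Properties resource, Properties config, Properties unit);  // sliver joins the guest
      void leave(Properties resource, Properties config, Properties unit); // sliver leaves the guest
  }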
Drivers
• Note that a handler runs within the actor.
• So how do we run setup/teardown code on the component itself?
• How do we run join/leave code on the sliver?
• Option 1: the handler invokes management interfaces, e.g., XMLRPC, SNMP, ssh.
• Option 2: invoke a custom driver in a NodeAgent with secure SOAP.
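For “Option 1”, a handler can simply shell out to a management interface. The sketch below reaches into a node over ssh; the host, key path, and command are placeholders, and a real handler would take the target from its unit properties (e.g., host.ip).

  import java.io.IOException;

  // Minimal sketch of a handler running setup code on a node over ssh (Option 1).
  class SshHandlerSketch {
      static int runOnNode(String hostIp, String command) throws IOException, InterruptedException {
          Process p = new ProcessBuilder(
                  "ssh", "-i", "/path/to/key", "root@" + hostIp, command) // placeholder key path
                  .inheritIO()
                  .start();
          return p.waitFor(); // non-zero exit signals a failed setup/join action
      }

      public static void main(String[] args) throws Exception {
          // e.g., called from a setup handler after the authority assigns a node
          runOnNode("10.0.0.5", "/opt/guest/install.sh"); // placeholder host and script
      }
  }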
Example: VM instantiation
[Figure: VM instantiation flow through handlers and drivers.]
TL1 Driver Framework
• General TL1 (command line) framework
  – Substrate component command/response
• What to “expect”?
  – XML file
Orca: Actors and Protocols
[Figure: protocol exchanges among the Guest (service manager), Broker, and Authority cores; the Broker holds an inventory of resource pools and a calendar.]
1. Guest formulates requests.
2. Guest sends ticket / extendTicket to the Broker.
3. Broker allocates from its inventory of resource pools (ticket allocate / extend).
4. Broker returns updateTicket to the Guest.
5. Guest redeems tickets.
6. Guest sends redeem / extendLease to the Authority.
7. Authority assigns resources (redeem / extend).
8. Authority returns updateLease to the Guest.
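The numbered exchanges above imply a small inter-actor protocol surface. The sketch below names the methods after the message labels on the slide (ticket, extendTicket, updateTicket, redeem, extendLease, updateLease); the signatures are illustrative placeholders.

  // Sketch of the inter-actor protocol surface implied by the numbered steps above.
  interface BrokerProtocolSketch {
      void ticket(Object reservation);        // [2] guest requests a ticket
      void extendTicket(Object reservation);  // [2] guest renews a ticket
  }

  interface AuthorityProtocolSketch {
      void redeem(Object reservation);        // [6] guest redeems a ticket for a lease
      void extendLease(Object reservation);   // [6] guest renews a lease
  }

  interface GuestCallbackSketch {
      void updateTicket(Object reservation);  // [4] broker's reply
      void updateLease(Object reservation);   // [8] authority's reply
  }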
Policy Plugin Points
[Figure: policy plugin points across the three actors.]
• Broker: plug-in broker policies for resource selection and provisioning (broker service interface).
• Service Manager (guest application): application resource request policy; join/leave handler for the service; leasing API; lease event interface.
• Site Authority (host site / resource pool): assignment policy; setup/teardown handlers for resources; leasing service interface; lease status notify.
• Negotiation between policy plugins over allocation and configuration; properties are used to guide negotiation.
Property Lists
[Figure: property lists annotating the same diagram as the previous slide.]
• Broker policy examples: FCFS, priority, economic.
• Request properties: elastic, deferrable.
• Resource properties: machine.memory, machine.clockspeed.
• Configuration properties: image.id, public.key.
• Unit properties: host.ip, host.key.
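As a concrete illustration, the snippet below builds the four property lists using the example keys from the slide. Only the key names are taken from the slide; the values and the comments on how each list is used are simplifying assumptions.

  import java.util.Properties;

  // Illustration of the property lists named above.
  class PropertyListSketch {
      public static void main(String[] args) {
          Properties request = new Properties();      // shapes the request to the broker
          request.setProperty("elastic", "true");
          request.setProperty("deferrable", "false");

          Properties resource = new Properties();     // describes the resource type
          resource.setProperty("machine.memory", "2048");
          resource.setProperty("machine.clockspeed", "2400");

          Properties config = new Properties();       // guides setup of each unit
          config.setProperty("image.id", "debian-5"); // placeholder value
          config.setProperty("public.key", "ssh-rsa AAAA...");

          Properties unit = new Properties();         // describes an instantiated unit
          unit.setProperty("host.ip", "10.0.0.5");    // placeholder value
          unit.setProperty("host.key", "...");

          System.out.println(request + " " + resource + " " + config + " " + unit);
      }
  }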
Messaging Model
• Proxies maintained in actor registry
  – Asynchronous RPC
  – Upcall to kernel for incoming ops
  – Downcall from lease FSM for outgoing ops
• Local, SOAP w/ WS-Security, etc.
  – WSDL protocols, but incomplete
• Integration (e.g., XMLRPC)
  – Experiment manager calls into slice controller module
  – Substrate ops through authority-side handler
The end, for now
• Presentation trails off…
• What follows are other slides from previous presentations dealing more with the concepts and rationale of Orca, and its use for GENI.
NSF GENI Initiative
[Figure: Experiments (guests occupying slices) run over a sliverable GENI substrate (contributing domains/aggregates), serving as observatory, wind tunnel, embedding, and petri dish.]
Dreaming of GENI
[Figure: component registration with the NSF GENI clearinghouse (Aaron Falk, GPO BBN). Elements include an aggregate Management Authority, the clearinghouse component registry (http://groups.geni.net/), a usage policy engine, and substrate components such as an optical switch with fiber ID, switch port, channel, and band.]
1. The CM (component manager) self-generates a GID: public and private keys.
2. The CM sends the GID to the MA; out-of-band methods are used to validate that the MA is willing to vouch for the component. The CM delegates to the MA the ability to create slices.
3. The MA (because it has sufficient credentials) registers name, GID, URIs, and some descriptive info with the clearinghouse component registry.
4. The MA delegates rights to NSF GENI so that NSF GENI users can create slices.
Notes:
• Identity and authorization are decoupled in this architecture. GIDs are used for identification only; credentials are used for authorization. I.e., the GID says only who the component is and nothing about what it can do or who can access it.
• Assumes the aggregate MA already has credentials permitting access to the component registry.
Slivers and Slices
Aaron Falk, GPO BBN
GENI as a Programmable Substrate
• Diverse and evolving collection of substrate components.
  – Different owners, capabilities, and interfaces
• A programmable substrate is an essential platform for R&D in network architecture at higher layers.
  – Secure and accountable routing plane
  – Authenticated traffic control (e.g., free of DOS and spam)
  – Mobile social networking w/ “volunteer” resources
  – Utility networking
  – Deep introspection and adaptivity
  – Virtual tunnels and bandwidth-provisioned paths
Some Observations
• The Classic Internet is “just an overlay”.
  – GENI is underlay architecture (“underware”).
• Incorporate edge resources: “cloud computing” + sliverable network
• Multiple domains (MAD): not a “Grid”, but something like dynamic peering contracts
  – Decouple services from substrate; manage the substrate; let the services manage themselves.
• Requires predictable (or at least “discoverable”) allocations for reproducibility
  – QoS at the bottom or not at all?
Breakable Experimental Network (BEN)
• BEN is an experimental fiber facility
• Supports experimentation at metro scale
  – Distributed applications researchers
  – Networking researchers
• Enabling disruptive technologies
  – Not a production network
• Shared by researchers at the three Triangle Universities
  – Coarse-grained time sharing is the primary mode of usage
  – Assumes some experiments must be granted exclusive access to the infrastructure
Open Resource Control Architecture (Orca)
[Figure: cloud apps, services, and other guests run over middleware and the Resource Control Plane, which multiplexes VMs onto hardware nodes.]
Resource Control Plane:
• Contract model for resource peering/sharing/management
• Programmatic interfaces and protocols
• Automated lease-based allocation and assignment
• Share substrate among dynamic “guest” environments
• http://www.cs.duke.edu/nicl/
The GENI Control Plane
• Programmable substrate elements
• Dynamic end-to-end sliver allocation + control
  – Delegation of authority etc.
  – Instrumentation (feedback)
  – Defining the capabilities of slivers
  – “network virtual resource”
• Foundation for discovery
  – Of resources, paths, topology
• Resource representation and exchange
[Figure: resource vectors in a (CPU shares, bandwidth shares) space, e.g., ra=(8,4), rb=(4,8), rc=(4,4).]
Define: Control Plane
GGF+GLIF: "Infrastructure and distributed intelligence that
controls the establishment and maintenance of connections
in the network, including protocols and mechanisms to
disseminate this information; and algorithms for automatic
delivery and on-demand provisioning of an optimal path
between end points.”
s/connections/slices/
s/optimal path/embedded slices
provisioning += and programmed instantiation
Duke Systems
37
Key Questions
• Who are the entities (actors)?
• What are their roles and powers?
• Whom do they represent?
• Who says what to whom?
• What innovation is possible within each entity, or across entities?
The control plane defines “the set of entities that interact to establish, maintain, and release resources and provide… [connection, slice] control functions”.
Design Tensions
• Governance vs. freedom
• Coordination vs. autonomy
• Diversity vs. coherence
• Assurance vs. robustness
• Predictability vs. efficiency
• Quick vs. right
• Inclusion vs. entanglement
• Etc., etc. …
Design Tensions
• What is standardized vs. what is open to innovation?
• How can GENI be open to innovation in components/management/control?
  – We want it to last a long time.
  – Innovation is what GENI is for.
• Standardization vs. innovation
  – Lingua franca vs. Tower of Babel
Who Are the Actors?
• Principle #1: Entities (actors) in the architecture represent the primary stakeholders.
  1. Resource owners/providers (site or domain)
  2. Slice owners/controllers (guests)
  3. The facility itself, or resource scheduling services acting on its behalf.
• Others (e.g., institutions) are primarily endorsing entities in the trust chains.
[Figure: guests (network service, cloud service, etc.) and infrastructure providers’ resources plug into the control plane through brokering intermediaries (clearinghouses). Plug guests, resources, and management policies into the “cloud”.]
Contracts
• Principle #2: provide pathways for contracts among actors.
  – Accountability [SHARP, SOSP 2003]
• Be open with respect to what promises an actor is permitted to make.
  – Open innovation for contract languages and tools
  – Yes, need at least one LCD
    • Rspec > HTML 1.0
    • Lingua franca vs. Tower of Babel
• Resource contracts are easier than service/interface contracts.
Rules for Resource Contracts
• Don’t make promises you can’t keep… but don’t hide power. [Lampson]
• There are no guarantees, ever.
  – Have a backup plan for what happens if “assurances” are not kept.
• Provide sufficient power to represent what promises the actor is explicitly NOT making.
  – E.g., temporary donation of resources
  – Best effort, probabilistic overbooking, etc.
• Incorporate time: start/expiration time
  – Resource contracts are leases (or tickets).
Leases
• Foundational abstraction: resource leases
• Contract between provider (site) and guest
  – Bind a set of resource units from a site to a guest
  – Specified term (time interval)
  – Automatic extends (“meter feeding”)
  – Various attributes
[Figure: the guest sends a request; the provider site grants a lease, e.g.:]
<lease>
  <issuer> Site’s public key </issuer>
  <signed_part>
    <holder> Guest’s public key </holder>
    <rset> resource description </rset>
    <start_time> … </start_time>
    <end_time> … </end_time>
    <sn> unique ID at Site </sn>
  </signed_part>
  <signature> Site’s signature </signature>
</lease>
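One way a guest might check such a signed lease is sketched below: verify the site's signature over the signed part and check the term. The field names mirror the XML elements above; the signature algorithm and byte encoding are assumptions for illustration.

  import java.security.PublicKey;
  import java.security.Signature;

  // Sketch of validating the signed lease shown above.
  class LeaseCheckSketch {
      static class Lease {
          PublicKey issuer;        // <issuer>: site's public key
          byte[] signedPart;       // serialized <signed_part>: holder, rset, term, sn
          byte[] signature;        // <signature>: site's signature over signedPart
          long startTime, endTime; // <start_time>, <end_time>
      }

      static boolean isValid(Lease lease, long now) throws Exception {
          Signature verifier = Signature.getInstance("SHA256withRSA"); // assumed algorithm
          verifier.initVerify(lease.issuer);
          verifier.update(lease.signedPart);
          boolean signatureOk = verifier.verify(lease.signature);
          boolean inTerm = now >= lease.startTime && now < lease.endTime;
          return signatureOk && inTerm;
      }
  }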
Network Description Language?
<ndl:Interface rdf:about="#tdm3.amsterdam1.netherlight.net:501/3">
<ndl:name>tdm3.amsterdam1.netherlight.net:501/3</ndl:name>
<ndl:connectedTo
rdf:resource="http://networks.internet2.edu/manlan/manlan.rdf#manlan:if1"/>
<ndl:capacity
rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.244E+9</ndl:capacity>
</ndl:Interface>
<ndl:Interface rdf:about="http://networks.internet2.edu/manlan/manlan.rdf#manlan:if1">
<rdfs:seeAlso rdf:resource="http://networks.internet2.edu/manlan/manlan.rdf"/>
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#”
xmlns:ndl="http://www.science.uva.nl/research/sne/ndl#”
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<!-- Description of Netherlight -->
<ndl:Location rdf:about="#Amsterdam1.netherlight.net">
<ndl:name>Netherlight Optical Exchange</ndl:name>
<geo:lat>52.3561</geo:lat>
<geo:long>4.9527</geo:long>
</ndl:Location>
<!-- TDM3.amsterdam1.netherlight.net -->
<ndl:Device rdf:about="#tdm3.amsterdam1.netherlight.net">
<ndl:name>tdm3.amsterdam1.netherlight.net</ndl:name>
<ndl:locatedAt rdf:resource="#Amsterdam1.netherlight.net"/>
<ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/1"/>
<ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/2"/>
<ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/3"/>
<ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/4"/>
<ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:502/1"/>
<ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:502/2"/>
<ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:502/3"/>
>
Duke Systems
50
Delegation
• Principle #3: Contracts enable delegation of powers.
  – Delegation is voluntary and provisional.
• It is a building block for creating useful concentrations of power.
  – Creates a potential for governance
  – Calendar scheduling, reservation
  – Double-edged sword?
• The facility can Just Say No.
Aggregation
• Principle #4: aggregate the resources for a site or domain.
  – Primary interface is the domain/site authority
• Abstraction/innovation boundary
  – Keep components simple
  – Placement/configuration flexibility for the owner
  – Mask unscheduled outages by substitution
  – Leverage investment in technologies for site/domain management
BEN fiberplant
• Combination of NCNI fiber and campus fiber
• Possible fiber topologies:
[Figure: candidate BEN fiber topologies.]
Infinera DTN
• PIC-based solution
• 100 Gbps DLM (digital line module)
  – Circuits provisioned at 2.5G granularity
• Automatic optical layer signal management (gain control etc.)
• GMPLS-based control plane
• Optical express
  – All-optical node bypass
Experimentation on BEN
• Extend Orca to enable slivering of network elements:
  – Fiber switches
  – DWDM equipment
  – Routers
• Adapt mechanisms to enable flexible description of network slices
  – NDL
• Demonstrate end-to-end slicing on BEN
  – Create realistic slices containing compute, storage, and network resources
  – Run sample experiments on them
BEN Usage
• Experimental equipment is connected to the BEN fiber plant at BEN points of presence.
• MEMS fiber switches switch experimental equipment in and out
  – Based on the experiment schedule
• By the nature of the facility, experiments running on it may be disruptive to the network.
• BEN points of presence are located at the RENCI engagement sites and the RENCI anchor site.
BEN Redux
• Reconfigurable optical plane
  – We will be seeking out opportunities to expand the available fiber topology
• Researcher equipment access at all layers
  – From dark fiber up
• Coarse-grained, scheduled access
• Researcher-controlled
• No single-vendor lock-in
• Equipment with exposable APIs
• Connectivity with substantial non-production resources
Elements of Orca Research Agenda
• Automate management inside the cloud.
  – Programmable guest setup and provisioning
• Architect a guest-neutral platform.
  – Plug in new guests through protocols; don’t hard-wire them into the platform.
• Design flexible security into an open control plane.
• Enforce fair and efficient sharing for elastic guests.
• Incorporate diverse networked resources and virtual networks.
• Mine instrumentation data to pinpoint problems and select repair actions.
• Economic models and sustainability.
Leasing Virtual Infrastructure
The hardware infrastructure consists of pools of typed “raw” resources distributed across sites:
  - e.g., CPU shares, memory, etc.: “slivers”
  - storage server shares [Jin04]
  - measured, metered, independent units
  - varying degrees of performance isolation
Policy agents control negotiation/arbitration:
  - programmatic, service-oriented leasing interfaces
  - lease contracts (the signed <lease> example shown earlier)
[Figure: the same resource-vector and request/grant diagrams shown on earlier slides.]
Summary
• Factor actors/roles along the right boundaries.
  – stakeholders, innovation, tussle
• Open contracts with delegation
• Specific recommendations for GENI:
  – Aggregates are first-class entities
  – Component interface: permit innovation
  – Clearinghouse: enable policies under GSC direction
Modularize Innovation
• Control plane design should enable local innovation within each entity.
• Can GENI be a platform for innovation of platforms? Management services?
  – How to carry forward the principle that PlanetLab calls “unbundled management”?
• E.g., how to evolve standards for information exchange and contracts.
  – Lingua franca or Tower of Babel?
Slices: Questions
• What “helper” tools/interfaces must we have, and what do they require from the control plane?
• Will GENI enable research on new management services and the control plane?
  – If software is the “secret weapon”, what parts of the platform are programmable/replaceable?
• Co-allocation/scheduling of an end-to-end slice?
  – What does “predictable and repeatable” mean?
  – What assurances are components permitted to offer?
• What level of control/stability do we assume over the substrate?
Focus questions
• Specify/design the “core services”:
  – Important enough and hard enough to argue about
  – Must be part of facilities planning
  – Directly motivated by usage scenarios
  – Deliver maximum bang for ease of use
  – User-centric, network-centric
• Enable a flowering of extensions/plugins
  – Find/integrate technology pieces of value
• What requirements do these services place on other WGs?