
Duke Systems
Intro to Clouds
Jeff Chase
Dept. of Computer Science
Duke University
Part 1
VIRTUAL MACHINES
The story so far: OS platforms
• OS platforms let us run programs in contexts.
• Contexts are protected/isolated to varying degrees.
• The OS platform TCB offers APIs to create and
manipulate protected contexts.
– It enforces isolation of contexts for running programs.
– It governs access to hardware resources.
• Classical example:
– Unix context: process
– Unix TCB: kernel
– Unix kernel API: syscalls (a minimal sketch follows below)
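For concreteness, here is a minimal sketch (not from the slides; standard POSIX/Linux calls only) of a process using the Unix kernel API: both the write() wrapper and the raw syscall() form cross from the process context into the kernel TCB.

#define _GNU_SOURCE
/* Minimal sketch: a user process entering the Unix kernel via syscalls. */
#include <unistd.h>      /* write() wrapper, syscall() */
#include <sys/syscall.h> /* SYS_write for the raw form */

int main(void)
{
    const char msg[] = "hello from user mode\n";

    /* libc wrapper: traps into the kernel's write handler */
    write(STDOUT_FILENO, msg, sizeof msg - 1);

    /* equivalent raw syscall: same trap, invoked by number */
    syscall(SYS_write, STDOUT_FILENO, msg, sizeof msg - 1);

    return 0;
}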
The story so far: layered platforms
• We can layer “new” platforms on “old” ones.
– The outer layer hides the inner layer,
– covering the inner APIs and abstractions, and
– replacing them with the model of the new platform.
• Example: Android over Linux
[Diagram: Android platform layer (AMS, JVM + libraries) stacked over the Linux kernel.]
Native virtual machines (VMs)
• Slide a hypervisor underneath the kernel.
– New OS/TCB layer: virtual machine monitor (VMM).
• Kernel and processes run in a virtual machine (VM).
– The VM “looks the same” to the OS as a physical machine.
– The VM is a sandboxed/isolated context for an entire OS.
• A VMM can run multiple VMs on a shared computer.
[Diagram: a hypervisor/VMM on a shared host runs multiple guest (tenant) VM contexts — VM1, VM2, VM3 — each containing its own OS kernel and processes.]
What is a “program” for a VM?
VMM/hypervisor is a new layer of OS platform, with a
new kind of protected context. What is a program?
What kind of program do we launch into a VM context? It is called a virtual appliance or VM image: a complete OS system image, with file tree and apps. A VM is called an instance of the image.
[Figure: apps over a guest kernel inside a VM context, running above the hypervisor/VMM.]
[Graphics are from rPath Inc. and VMware Inc.]
Thank you, VMware
Motivation: support multiple OSes
When virtual is better than real
everyone plays nicely together
[image from virtualbox.org]
The story so far: protected CPU mode
Any kind of machine exception transfers control to a registered
(trusted) kernel handler running in a protected CPU mode.
[Figure: timeline of a CPU core alternating between user mode and kernel mode. Syscall traps, faults, and clock interrupts (u-start … trap/fault … u-return) transfer control into the kernel “top half” and the kernel “bottom half” (interrupt handlers); an interrupt return resumes user mode.]
The kernel handler manipulates the CPU register context to return to a selected user context.
A closer look
[Figure: each process has a user stack and a separate kernel stack. Syscall traps, faults, and clock interrupts dispatch through a handler table registered at boot; handlers run on the kernel stack and return to user mode via an interrupt return.]
IA/x86 Protection Rings (CPL)
• Modern CPUs have multiple protected modes: the CPU Privilege Level (CPL).
• History: IA/x86 rings (CPL)
– Built-in security levels (Rings 0, 1, 2, 3)
– Ring 0 – “kernel mode” (most privileged)
– Ring 3 – “user mode”
• Unix uses only two modes:
– user – untrusted execution
– kernel – trusted execution
[Figure: concentric rings 0–3, with privilege increasing toward Ring 0.]
[Fischbach]
Protection Rings
• New Intel VT and AMD SVM
CPUs introduce new protected
modes for VMM hypervisors.
• We can think of it as a new
inner ring: one ring to bind
them all.
• Warning: this is an
oversimplification: the actual
architecture is more complex
for backward compatibility.
[Figure: the hypervisor as a new innermost ring beneath kernel and user; each guest’s kernel and user rings sit inside the hypervisor ring.]
Protection Rings
• Computer scientists have
drawn these rings since the
1960s.
• They represent layering: the
outer ring “hides” the interface
of the lower ring.
• The machine defines the events
(exceptions) that transition to
higher privilege (inner ring).
• Inner rings register handlers to
intercept selected events.
• But the picture is misleading….
[Figure: concentric rings 0–3, privilege increasing toward the center.]
[Fischbach]
Protection Rings
• We might just as soon draw it
“inside out”.
• Now the ring represents power:
what the code at that ring can
access or modify.
• Bigger rings have more power.
• Inclusion: bigger rings can see
or do anything that the smaller
rings can do.
• And they can manipulate the
state of the rings they contain.
• But still misleading: there are
multiple ‘instances’ of the
weaker rings.
[Figure: the rings drawn inside-out — user inside guest inside hypervisor.]
Maybe a better picture…
There are multiple ‘instances’ of
the weaker rings.
And powers are nested: an outer
ring limits the “sandbox” or scope
of the rings it contains.
Post-note
• The remaining slides in the section are just
more slides to reinforce these concepts.
• We didn’t see them in class.
• There is more detail in the reading…
Kernel Mode
CPU mode (a field in some status register) indicates whether a machine CPU (core) is running in a user program or in the protected kernel.
Some instructions or register accesses are legal only when the CPU (core) is executing in kernel mode.
CPU mode transitions to kernel mode only on machine exception events (trap, fault, interrupt), which transfer control to a handler registered by the kernel with the machine at boot time.
So only the kernel program chooses what code ever runs in kernel mode (or so we hope and intend).
A kernel handler can read the user register values at the time of the event, and modify them arbitrarily before (optionally) returning to user mode.
[Figure: a CPU core with general registers R0…Rn, a program counter (PC), and a U/K mode bit.]
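To illustrate “some instructions are legal only in kernel mode,” here is a small sketch (my own example, assuming x86/Linux, not from the slides): a user program that executes the privileged hlt instruction is stopped by a protection fault, which the kernel turns into a fatal signal.

/* Sketch (x86/Linux assumed): executing a privileged instruction in user
 * mode raises a protection fault; the kernel's handler kills the process
 * with a signal instead of letting user code halt the CPU. */
int main(void)
{
    __asm__ volatile("hlt");   /* privileged: legal only in kernel mode */
    return 0;                  /* never reached; the process dies on a signal */
}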
Exceptions: trap, fault, interrupt
• Trap (system call): synchronous (caused by an instruction) and intentional — happens every time the instruction executes. Examples: open, close, read, write, fork, exec, exit, wait, kill, etc.
• Fault: synchronous and unintentional — depends on contributing factors. Examples: invalid or protected address or opcode, page fault, overflow, etc.
• “Software interrupt”: asynchronous (caused by some other event) and intentional — software requests an interrupt to be delivered at a later time.
• Interrupt: asynchronous and unintentional — caused by an external event: I/O op completed, clock tick, power fail, etc.
Kernel Stacks and Trap/Fault Handling
Processes execute user code on a user stack in the user virtual memory in the process virtual address space.
Each process has a second kernel stack in kernel space (VM accessible only to the kernel).
System calls and faults run in kernel mode on the process kernel stack.
Kernel code running in P’s process context (i.e., on its kstack) has access to P’s virtual memory.
[Figure: per-process user stacks and data in user space; per-process kernel stacks and the syscall dispatch table in kernel space.]
The syscall handler makes an indirect call through the system call dispatch table to the handler registered for the specific system call.
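The dispatch-table idea can be sketched in C (a simplified sketch, not the actual code of any particular kernel): the trap handler uses the syscall number left in a register as an index into a table of function pointers.

/* Simplified sketch of a syscall dispatch table (not any real kernel's code). */
typedef long (*syscall_fn)(long a0, long a1, long a2);

/* handlers registered for each syscall number (bodies elided) */
static long sys_read(long fd, long buf, long len)  { return 0; }
static long sys_write(long fd, long buf, long len) { return 0; }

static const syscall_fn dispatch_table[] = {
    [0] = sys_read,    /* numbers are illustrative, not a real ABI */
    [1] = sys_write,
};

#define NSYSCALLS (sizeof dispatch_table / sizeof dispatch_table[0])

/* called from the trap entry path, on the process's kernel stack */
long handle_syscall(long num, long a0, long a1, long a2)
{
    if (num < 0 || (unsigned long)num >= NSYSCALLS || !dispatch_table[num])
        return -1;                           /* e.g., -ENOSYS in a real kernel */
    return dispatch_table[num](a0, a1, a2);  /* indirect call through the table */
}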
More on VMs
Recent CPUs support additional protected mode(s) for hypervisors. When the
hypervisor initializes, it selects some set of event types to intercept, and registers
handlers for them.
Selected machine events occurring in user mode or kernel mode transfer control
to a hypervisor handler. For example, a guest OS kernel accessing device
registers may cause the physical machine to invoke the hypervisor to intervene.
In addition, the VM architecture has another level of indirection in the MMU page
tables: the hypervisor can specify and restrict what parts of physical memory are
visible to each guest VM.
A guest VM kernel can map to or address a physical memory frame or command
device DMA I/O to/from a physical frame if and only if the hypervisor permits it.
If any guest VM tries to do anything weird, then the hypervisor regains control and
can see or do anything to any part of the physical or virtual machine state before
(optionally) restarting the guest VM.
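The point that the hypervisor decides what physical memory a guest can see can be made concrete with Linux’s KVM interface (which drives VT-x/SVM). This is a hedged sketch, not from the slides; it assumes a VM file descriptor vmfd already obtained via KVM_CREATE_VM, and omits error handling and the rest of VM setup.

/* Sketch: the VMM grants a guest VM a restricted slice of "physical" memory.
 * The guest can only address memory the VMM explicitly maps in this way. */
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <stdint.h>

int map_guest_memory(int vmfd, size_t size)
{
    /* allocate host memory to back the guest's physical address space */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    struct kvm_userspace_memory_region region = {
        .slot            = 0,
        .guest_phys_addr = 0x0,                       /* where the guest sees it */
        .memory_size     = size,
        .userspace_addr  = (uint64_t)(uintptr_t)mem,  /* where the host keeps it */
    };
    /* The second-level (EPT/NPT) translations managed by the hypervisor keep
     * the guest from touching anything outside registered regions. */
    return ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);
}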
If you are interested…
2.1 The Intel VT-x Extension
In order to improve virtualization performance and simplify VMM implementation, Intel has developed VT-x [37], a virtualization extension to the
x86 ISA. AMD also provides a similar extension with a different hardware interface called SVM [3].
The simplest method of adapting hardware to support virtualization is to introduce a mechanism for trapping each instruction that accesses
privileged state so that emulation can be performed by a VMM. VT-x embraces a more sophisticated approach, inspired by IBM’s interpretive
execution architecture [31], where as many instructions as possible, including most that access privileged state, are executed directly in hardware
without any intervention from the VMM. This is possible because hardware maintains a “shadow copy” of privileged state. The motivation for this
approach is to increase performance, as traps can be a significant source of overhead.
VT-x adopts a design where the CPU is split into two operating modes: VMX root and VMX non-root mode. VMX root mode is generally used to
run the VMM and does not change CPU behavior, except to enable access to new instructions for managing VT-x. VMX non-root mode, on the
other hand, restricts CPU behavior and is intended for running virtualized guest OSes.
Transitions between VMX modes are managed by hardware. When the VMM executes the VMLAUNCH or VMRESUME instruction, hardware
performs a VM entry, placing the CPU in VMX non-root mode and executing the guest. Then, when action is required from the VMM, hardware
performs a VM exit, placing the CPU back in VMX root mode and jumping to a VMM entry point. Hardware automatically saves and restores most
architectural state during both types of transitions. This is accomplished by using buffers in a memory resident data structure called the VM
control structure (VMCS).
In addition to storing architectural state, the VMCS contains a myriad of configuration parameters that allow the VMM to control execution and
specify which type of events should generate VM exits. This gives the VMM considerable flexibility in determining which hardware is exposed to
the guest. For example, a VMM could configure the VMCS so that the HLT instruction causes a VM exit or it could allow the guest to halt the
CPU. However, some hardware interfaces, such as the interrupt descriptor table (IDT) and privilege modes, are exposed implicitly in VMX non-root mode and never generate VM exits when accessed. Moreover, a guest can manually request a VM exit by using the VMCALL instruction.
Virtual memory is perhaps the most difficult hardware feature for a VMM to expose safely. A straw man solution would be to configure the VMCS
so that the guest has access to the page table root register, %CR3. However, this would place complete trust in the guest because it would be
possible for it to configure the page table to access any physical memory address, including memory that belongs to the VMM. Fortunately, VT-x
includes a dedicated hardware mechanism, called the extended page table (EPT), that can enforce memory isolation on guests with direct
access to virtual memory. It works by applying a second, underlying, layer of address translation that can only be configured by the VMM. AMD’s
SVM includes a similar mechanism to the EPT, referred to as a nested page table (NPT).
From Dune: Safe User-level Access to Privileged CPU Features, Belay et al. (Stanford), OSDI, October 2012.
VT in a Nutshell
• New VM mode bit
– Orthogonal to kernel/user mode or rings (CPL)
• If the VM bit is off
– Machine looks just like it always did
• If the VM bit is on
– Machine is running a guest VM (“VMX non-root operation”)
– Various events cause gated entry into the hypervisor (a “virtualization intercept”)
– Hypervisor can control which events cause intercepts
– Hypervisor can examine/manipulate guest VM state (see the KVM-based sketch below)
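A hedged sketch of the intercept loop, again using Linux’s KVM interface rather than raw VMX instructions (not from the slides): the VMM resumes the guest (a VM entry), and each intercepted event comes back as a VM exit with a reason code.

/* Sketch of a VMM's run loop over KVM. Error handling, vCPU/register setup,
 * and guest memory setup are omitted (see the memory-region sketch above). */
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <stdio.h>

void run_guest(int kvm_fd, int vcpufd)
{
    /* the kernel shares VM-exit details through this mapped structure */
    int sz = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, NULL);
    struct kvm_run *run = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpufd, 0);

    for (;;) {
        ioctl(vcpufd, KVM_RUN, NULL);   /* VM entry: guest runs until an exit */

        switch (run->exit_reason) {     /* virtualization intercept */
        case KVM_EXIT_HLT:              /* guest executed HLT */
            return;
        case KVM_EXIT_IO:               /* guest touched an I/O port */
            /* emulate the device access, then resume the guest */
            break;
        default:
            printf("unhandled exit %d\n", run->exit_reason);
            return;
        }
    }
}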
There is another motivation for VMs and hypervisors.
Application services and computational jobs need access to
computing power “on tap”. Virtualization allows the owner of
a server to “slice and dice” server resources and allocate the
virtual slices out to customers as VMs. The customers can
install and manage their own software their own way in their
own VMs. That is cloud hosting.
Part 2
SERVICES
Services
[Figure: clients invoke services across the network via RPC and HTTP GET.]
End-to-end application delivery
Where is your application?
Where is your data?
Where is your OS?
Cloud and Software-as-a-Service (SaaS)
Rapid evolution, no user upgrade, no user data management.
Agile/elastic deployment on virtual infrastructure.
Networking
[Figure: nodes A and B connected by a channel (connection). Each endpoint has a port and a binding; operations: advertise (bind), listen, connect (bind), close, write/send, read/receive.]
Some IPC mechanisms allow communication across a network.
E.g.: sockets using Internet communication protocols (TCP/IP).
Each endpoint on a node (host) has a port number.
Each node has one or more interfaces, each on at most one network.
Each interface may be reachable on its network by one or more names.
E.g. an IP address and an (optional) DNS name.
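A minimal sketch (standard POSIX sockets, not from the slides) of the endpoint operations named above: a client binds a connection to a server’s IP address and port, then sends and receives bytes over the channel. The address and port here are placeholders.

/* Sketch: a TCP client using the socket operations named above
 * (connect, send/write, receive/read, close). Error handling omitted. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);        /* create an endpoint */

    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port   = htons(80) };     /* port number */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);           /* interface IP */

    connect(s, (struct sockaddr *)&addr, sizeof addr);  /* bind the channel */

    const char req[] = "GET / HTTP/1.0\r\n\r\n";
    write(s, req, sizeof req - 1);                  /* send */

    char buf[1024];
    read(s, buf, sizeof buf);                       /* receive */

    close(s);
    return 0;
}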
SaaS platform elements
[Figure: browser on the client; container and “classical OS” on the server. Image from wiki.eeng.dcu.ie.]
Motivation: “Success disaster”
[Graphic from Amazon: Mike Culver, Web Scale Computing]
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable
computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned
and released with minimal management effort or service
provider interaction.”
- US National Institute of Standards and Technology
http://www.csrc.nist.gov/groups/SNS/cloud-computing/
Part 3
VIRTUAL CLOUD HOSTING
Cloud > server-based computing
Client
Server(s)
• Client/server model (1980s - )
• Now called Software-as-a-Service (SaaS)
Host/guest model
[Figure: a client accesses a service (the guest) hosted on a cloud provider (the host).]
• Service is hosted by a third party.
– flexible programming model
– cloud APIs for service to allocate/link resources
– on-demand: pay as you grow
IaaS: infrastructure services
[Figure: client → service → platform → OS → VMM → physical hardware.]
Hosting performance and isolation are determined by the virtualization layer: virtual machines (VMware, KVM, etc.).
Deployment of private clouds is growing rapidly with open IaaS cloud software.
PaaS: platform services
[Figure: client → service → platform → OS → VMM (optional) → physical hardware.]
PaaS cloud services define the high-level programming models, e.g., for clusters or specific application classes.
Hadoop, grids, batch job services, etc. can also be viewed as the PaaS category.
Note: these can be deployed over IaaS.
“Elastic Cloud”
• Varying workload + fixed system → varying performance.
• Varying workload + varying system → fixed performance.
• Varying workload + varying system + resource control → target performance: elastic provisioning.
Managing Energy and Server Resources in Hosting Centers, SOSP, October 2001.
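To illustrate the last case above, here is a hedged sketch (my own, not the cited paper’s algorithm) of a resource controller that varies the system to hold a performance target under varying workload; measure_latency_ms() and set_server_count() are hypothetical hooks.

/* Sketch of an elastic provisioning loop: vary the system (server count)
 * to hold a target performance under a varying workload. Illustrative only. */
#include <unistd.h>

extern double measure_latency_ms(void);   /* hypothetical: observed performance */
extern void   set_server_count(int n);    /* hypothetical: resize the virtual slice */

void elastic_controller(double target_ms, int min_servers, int max_servers)
{
    int servers = min_servers;
    for (;;) {
        double latency = measure_latency_ms();
        if (latency > target_ms && servers < max_servers)
            set_server_count(++servers);   /* falling behind: grow the slice */
        else if (latency < 0.5 * target_ms && servers > min_servers)
            set_server_count(--servers);   /* ample headroom: shrink and save */
        sleep(30);                         /* re-evaluate periodically */
    }
}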
EC2
The canonical public cloud
[Figure: virtual appliance image.]
OpenStack, the Cloud Operating System
Management Layer That Adds Automation & Control
[Anthony Young @ Rackspace]
IaaS Cloud APIs (OpenStack, EC2)
• Query of availability zones (i.e. clusters in Eucalyptus)
• SSH public key management (add, list, delete)
• VM management (start, list, stop, reboot, get console output)
• Security group management
• Volume and snapshot management (attach, list, detach, create,
bundle, delete)
• Image management (bundle, upload, register, list, deregister)
• IP address management (allocate, associate, list, release)
Adding storage
Competing Cloud Models: PaaS vs. IaaS
• Cloud Platform as a Service (PaaS). The capability provided to the consumer is
to deploy onto the cloud infrastructure consumer-created or acquired applications
created using programming languages and tools supported by the provider. The
consumer does not manage or control the underlying cloud infrastructure
including network, servers, operating systems, or storage, but has control over the
deployed applications and possibly application hosting environment
configurations.
• Cloud Infrastructure as a Service (IaaS). The capability provided to the consumer
is to provision processing, storage, networks, and other fundamental computing
resources where the consumer is able to deploy and run arbitrary software, which
can include operating systems and applications. The consumer does not manage
or control the underlying cloud infrastructure but has control over operating
systems, storage, deployed applications, and possibly limited control of select
networking components (e.g., host firewalls).
Amazon Elastic Compute Cloud (EC2)
Eucalyptus
OpenNebula
Post-note
• The remaining slides weren’t discussed.
• Some give more info on the various forms of cloud computing
following the NIST model. Just understand IaaS and PaaS
hosting models.
• The “Adaptation” slides deal with resource management: what
assurances does the holder of virtual infrastructure have about
how much resource it will receive, and how good its
performance will (therefore) be? We’ll discuss this more later.
• The last slide refers to an advanced cloud project at Duke and
RENCI.org, partially funded by NSF Global Environment for
Network Innovations (geni.net).
Managing images
• “Let a thousand flowers bloom.”
• Curated image collections are needed!
• “Virtual appliance marketplace”
Infrastructure as a Service (IaaS)
“Consumers of IaaS have access to virtual
computers, network-accessible storage,
network infrastructure components, and other
fundamental computing resources…and are
billed according to the amount or duration of
the resources consumed.”
Cloud Models
• Cloud Software as a Service (SaaS)
– Use provider’s applications over a network
• Cloud Platform as a Service (PaaS)
– Deploy customer-created applications to a cloud
• Cloud Infrastructure as a Service (IaaS)
– Rent processing, storage, network capacity, and
other fundamental computing resources
NIST Cloud Definition Framework
• Deployment Models: Public Cloud, Private Cloud, Community Cloud, Hybrid Clouds
• Service Models: Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)
• Essential Characteristics: On-Demand Self-Service, Broad Network Access, Rapid Elasticity, Resource Pooling, Measured Service
• Common Characteristics: Massive Scale, Resilient Computing, Homogeneity, Geographic Distribution, Virtualization, Service Orientation, Low Cost Software, Advanced Security
Adaptations: Describing IaaS Services
[Figure: a computer’s resources — CPU, memory, disk, bandwidth — described as vectors of shares; points a, b, c plotted on CPU-share vs. memory-share axes, with ra = (8, 4), rb = (4, 8), rc = (4, 4).]
Adaptations: service classes
• Must adaptations promise performance isolation?
• There is a wide range of possible service
classes…to the extent that we can reason about
them.
Continuum of service classes
[Figure: a continuum of service classes — available surplus, weak effort, best effort, proportional share, elastic reservation, hard reservation — with annotations “reflects load factor or overbooking degree” and “reflects priority.”]
Constructing “slices”
• I like to use TinkerToys as a metaphor for
creating a slice in the GENI federated cloud.
• The parts are virtual infrastructure resources:
compute, networking, storage, etc.
• Parts come in many types, shapes, sizes.
• Parts interconnect in various ways.
• We combine them to create useful
built-to-order assemblies.
• Some parts are programmable.
• Where do the parts come from?