
CPS110:
Wrapping up memory
(virtual machines)
Landon Cox
March 18, 2009
Traditional OS structure
App
App
App
Operating System
Host Machine
App
OS abstractions
Threads
Instructions
CPU
Last month of class
Applications
Virtual
Memory
OS
Virtual addrs
Physical mem
Hardware
What are the interfaces and the resources?
What is being virtualized?
“Kernel library”
System calls
I/O devices
Coarser abstraction: virtual machine
 We’ve already seen a kind of virtual machine
 OS gives processes virtual memory
 Each process runs on a virtualized CPU
 Virtual machine
 An execution environment
 May or may not correspond to physical reality
Virtual machine options
 How to implement a virtual machine?
1. Interpreted virtual machines
 Translate every VM instruction
 Kind of like on-the-fly compilation
 VM instruction → HW instruction(s)
2. Direct execution
 Execute instructions directly
 Emulate the hard ones
Interpreted virtual machines
 Implement the machine in software
 Must translate emulated to physical
 Java: byte codes → x86, PPC, ARM, etc.
 Software fetches/executes instructions
Program
(foo.class)
Byte code
Interpreter
(java)
x86
What does this picture look like?
Dynamic virtual memory translator
Java virtual machine
 What is the interface?
 Java byte-code instructions
 What is the abstraction?
 Stack-machine architecture
 What are the resources?
 CPU, physical memory, disk, network
 The Java programming language
 High-level language compiled into byte code
 Library of services (kind of like a kernel)
 Like C++/STL, C#
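A minimal sketch of what "software fetches/executes instructions" means for a stack machine. The opcodes below are made up for illustration and are not real JVM byte codes; the function name interpret is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical byte codes for a toy stack machine (not real JVM opcodes).
enum Op : std::uint8_t { PUSH, ADD, MUL, HALT };

// Fetch/decode/execute loop: every VM instruction is carried out by host
// code, so the virtual "CPU" exists only in software.
int interpret(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;  // operand stack (the machine has no registers)
    std::size_t pc = 0;      // virtual program counter

    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        int v = stack.back();
        stack.pop_back();
        return v;
    };

    while (pc < code.size()) {
        switch (code[pc++]) {  // fetch and decode
        case PUSH:
            if (pc >= code.size()) throw std::runtime_error("truncated PUSH");
            stack.push_back(code[pc++]);  // operand byte follows the opcode
            break;
        case ADD: { int b = pop(); int a = pop(); stack.push_back(a + b); break; }
        case MUL: { int b = pop(); int a = pop(); stack.push_back(a * b); break; }
        case HALT:
            return pop();  // result is whatever is on top of the stack
        default:
            throw std::runtime_error("bad opcode");
        }
    }
    throw std::runtime_error("program ended without HALT");
}
```

The program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT computes (2 + 3) * 4 without naming a single register, which is the stack-machine abstraction the byte-code interface exposes.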
Direct execution
 What is the interface?
 Hardware ISA (e.g. x86 instructions)
 What is the abstraction?
 Physical machine (e.g. x86 processor)
 What are the resources?
 CPU, physical memory, disk, network
Program
(XP kernel)
x86
Monitor
(VMware)
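Direct execution can be pictured as "run the easy instructions as-is, let the monitor emulate the hard ones." The sketch below uses a toy instruction set and hypothetical names (VirtualCpu, run_guest); in a real monitor the unprivileged path would be the hardware itself, not a loop.

```cpp
#include <cassert>
#include <vector>

// Toy instruction set (hypothetical; stands in for a real ISA such as x86).
enum class Instr { AddOne, DisableInterrupts, EnableInterrupts };

struct VirtualCpu {
    int reg = 0;                     // guest-visible register
    bool interrupts_enabled = true;  // virtual interrupt flag kept by the monitor
    int traps = 0;                   // times the monitor had to step in
};

bool is_privileged(Instr i) { return i != Instr::AddOne; }

// Monitor's trap handler: apply the privileged instruction to the guest's
// virtual state instead of the real machine state.
void emulate(VirtualCpu& cpu, Instr i) {
    ++cpu.traps;
    cpu.interrupts_enabled = (i == Instr::EnableInterrupts);
}

void run_guest(VirtualCpu& cpu, const std::vector<Instr>& code) {
    for (Instr i : code) {
        if (is_privileged(i)) {
            emulate(cpu, i);  // "emulate the hard ones"
        } else {
            ++cpu.reg;        // stands in for direct execution on the host CPU
        }
    }
}
```

The guest believes it turned interrupts off, but only the virtual flag changed; the host's real interrupt state is never touched.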
Different techniques
 Emulation
 Bochs, QEMU
 Full virtualization
 VMware
 Paravirtualization
 Xen
 Dynamic recompilation
 Virtual PC
Virtual machines are hot
VMware IPO: $19.1 billion
Xen sale: $500 million
Views of the CPU
 How is a process’s view of the CPU different than the OS’s?
 Kernel mode
 Access to physical memory
 Manipulation of page tables
 Other “privileged instructions”
 Turn off interrupts
 Traps
 Keep these in mind when thinking about virtual machines
Virtual machine structure
Guest
App
Guest
App
Guest
App
Guest OS
Guest OS
Guest OS
Virtual Machine Monitor (Hypervisor)
Host Machine
Why are hypervisors useful?
 Code reuse
 Can run old operating systems + apps on new hardware
 Original purpose of VMs by IBM in the 60s
 Encapsulation
 Can put entire state of an “application” in one thing
 Move it, restore it, copy it, etc
 Isolation, security
 All interactions with hardware are mediated
 Hypervisor can keep one VM from affecting another
 Hypervisor cannot be corrupted by guest operating systems
Encapsulation
 Say I want to suspend/restore an application
 I decide to write the process mem + PCB to disk
 I reboot my kernel and restart the process
 Will this work?
 No, application state is spread out in many places
 Application might involve multiple processes
 Applications have state in the kernel (lost on reboot)
 (e.g. open files, locks, process ids, driver states, etc.)
Encapsulation
 Virtual machines capture all of this state
 Can suspend/restore an application
 On same machine between boots
 On different machines
 Very useful in server farms
 We’ll talk more about this with Xen
Security
 Can user processes corrupt the kernel?
 Can overwrite logs
 Overwrite kernel file
 Can boot a new kernel
 Exploit a bug in the system call interface
 Ok, so I’ll use a hypervisor. Is my data any less vulnerable?
 All the state in the guest is still vulnerable (file systems, etc)
 So what’s the point?
 Hypervisors can observe the guest OS
 Security services in hypervisor are safe, makes detection easier
Security
 Hypervisors are buggy too, so why trust them more than kernels?
 Narrower interface to malicious code (no system calls)
 No way for kernel to call into hypervisor
 Smaller, (hopefully) less complex codebase
 Should be fewer bugs
 Anything wrong with this argument?
 Hypervisors are still complex
 May be able to take over hypervisor via non-syscall interfaces
 E.g. what if hypervisor is running IP-accessible services?
 Paravirtualization (in Xen) may compromise this
VMware architecture
Host World
VMM World
Target
App
Host
App
VM App
Host OS
VM Driver
Host Machine
Target
App
Target OS
Virtual Machine
Monitor
SimOS (proto-VMware) arch.
Target
App
Target
App
Target OS
Host
App
SimOS
Host OS
Host Machine
Host
App
SimOS memory
SimOS
SimOS VMemory
SimOS code, data
Target OS
TargOS code, data
Target App
TargApp code, data
Target App
Virtual MMU
SimDisk
Host OS
Host Machine
SimDisk File
Mem File
SimOS page fault
SimOS
SimOS VMemory
Target OS
Target App
Target App
SimOS Fault handler
TargOS Fault handler
Unmapped addr
What if I want to suspend and migrate the target OS?
Virtual MMU
SimDisk
Host OS
Host Machine
SimDisk File
Mem File
Full vs interpreted
 Why would I use VMware instead of Java?
 Support for legacy applications
 Do not force users to use a particular language
 Do not force users to use a particular OS
 Why would I use Java instead of VMware?
 Lighter weight
 Nice properties of type-safe language
 Can prove safety at compile time
Full vs interpreted
 What about protection?
 What does Java use for protection? VMware?
 Java relies on language features (cannot express unsafe computation)
 VMware relies on the hardware to enforce protection (like an OS)
 What are the trade-offs? Which protection model is better?
 Java gives you stronger (i.e. provable) safety guarantees
 Hardware protection doesn’t constrain programming expressiveness
 What about sharing (kind of the opposite of protection)?
 Sharing among components in Java is easy (call a function, compiler makes sure it is safe)
 Sharing between address spaces is more work, has higher overhead (use sockets, have to context switch, flush TLB, etc.)
 Singularity (could try both)
Virtual machine challenges
 Privilege modes
 Memory management
 Protection
 Performance
 Many more for every architecture…
Course administration
 Multi-process test cases
 Autograder will test your pager with > 1 process
 But don’t submit any to the autograder
 How to write multi-process test cases
 Use vm_yield
 Can quickly open processes in different windows
 Can use sleep (unsigned int seconds)
 Could use fork
Course administration
 Various Project 2 policies
 Free physical pages and disk blocks
 Easiest thing is to choose lowest page or block
 priority_queue<int, vector<int>, greater<int> > disk_free;
 Assigning disk blocks
 Allocate eagerly in vm_extend
 Fail vm_extend if there are no unassigned disk blocks left
 Do not want to overcommit disk space
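One way these policies might fit together, built around the disk_free declaration above. DiskBlockPool and try_extend are hypothetical names for illustration, not part of the project interface.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper for a pager: tracks unassigned disk blocks.
class DiskBlockPool {
public:
    explicit DiskBlockPool(int num_blocks) {
        for (int b = 0; b < num_blocks; ++b) disk_free.push(b);
    }

    // Returns the lowest-numbered free block, or -1 if none is left.
    int allocate() {
        if (disk_free.empty()) return -1;
        int block = disk_free.top();
        disk_free.pop();
        return block;
    }

    // Returns a block to the pool (e.g. when its owning process exits).
    void release(int block) { disk_free.push(block); }

private:
    // Min-heap: greater<int> makes top() the smallest block number.
    std::priority_queue<int, std::vector<int>, std::greater<int> > disk_free;
};

// Eager policy: a page is only handed out if a disk block can back it.
// Returns false (the extend fails) rather than overcommitting disk space.
bool try_extend(DiskBlockPool& pool, std::vector<int>& page_to_block) {
    int block = pool.allocate();
    if (block < 0) return false;
    page_to_block.push_back(block);
    return true;
}
```

Because the block is reserved at extend time, a later page-out can never discover that the disk is full.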
Course administration
 Extra office hours
 Ryan: Monday, 23rd, 12p-2p
 Landon: Tuesday, 24th, 10a-12p
 Matt: Tuesday, 24th, 4:15p-8:15p
 Niko: Wednesday, 25th, 10a-12p
 Other questions?
Sharing machines among users
 PlanetLab (752 nodes at 361 sites)
 Platform for distributed applications
 Research testbed
 Why is this more useful than a cluster?
 See real Internet problems
 Latency, failures, etc
 Service fault tolerance
Sharing machines among users
Consolidate under-utilized servers
to reduce CapEx and OpEx
Avoid downtime with relocation
Dynamically re-balance workload
to guarantee application SLAs
Enforce security policy
What about the enterprise?
Sharing machines among users
 When?
 PlanetLab (testbeds, distributed services)
 Data centers (three-tier web applications)
 Scientific computing (protein folding, etc)
 What should the interface be?
Shared infrastructure interfaces
 Unmodified OS
 Each app gets a login username
 Access resources through system calls
 Users can see other users’ files, processes
 Drawbacks of this approach?
 Administration, configuration headaches (e.g. which libraries are installed?)
 No performance isolation (one process can dominate CPU, buffer cache, bandwidth)
Shared infrastructure interfaces
 Unmodified OS
 Retrofit resource accounting into OS (V-Servers)
 Access resources through system calls
 Virtualize some resources (e.g. each app has own process table, file system)
 Drawbacks of this approach?
 How do you know that you’ve virtualized everything you need to?
 Especially hard for software resources
 (e.g. what about entries in the file descriptor table?)
 (e.g. who gets charged on a page fault?)
Shared infrastructure interfaces
 Unmodified OS
 Retrofit resource accounting into OS (V-Servers)
 Virtual machines (Xen, VMware)
 Virtualize hardware interface
 Each app gets to choose its own OS
 (e.g. apps have their own virt. CPU, physical memory, disk)
 Drawbacks of this approach?
 Heavy-weight (debatable)
 Redundant state (e.g. kernel, libraries, executables)
 Might not scale well (if you believe it is heavy-weight)
Amazon EC2
Anyone know what EC2 uses?
Xen
Xen challenges
 Kind of the opposite approach of V-Servers
 V-Servers: start with OS, virtualize
 Xen: start with VM, “para-virtualize”
 Goals
 Performance isolation
 Support many operating systems
 Reduce performance overhead of virtualization
Para-virtualization
 Full virtualization
 Fool OS into thinking it has access to hardware
 Para-virtualization
 Expose real and virtual resources to OS
 Why do we need para-virtualization?
 Mostly because x86 made full virtualization hard
 This is much less so now
 Unlikely to be an issue in the near future
Why para-virtualize?
 Limitations of x86
 Privileged instructions fail silently
 VMM must execute these instructions
 Cannot rely on traps to VMM
 How does VMware deal with this?
 At run-time rewrite guest kernel binary
 Insert traps into the VMM, when necessary
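A toy sketch of the rewriting idea: scan the guest's code and replace each instruction that would fail silently with an explicit trap into the VMM. The instruction set and the name rewrite are assumptions for illustration; real binary translation must decode variable-length x86 instructions and is far more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction set (hypothetical). SilentPriv models a privileged
// instruction that fails silently outside kernel mode instead of trapping.
enum class Op { Add, Load, SilentPriv, TrapToVmm };

// Rewrites the guest code in place so that every silently-failing
// instruction becomes an explicit trap. Returns how many were patched.
std::size_t rewrite(std::vector<Op>& code) {
    std::size_t patched = 0;
    for (Op& op : code) {
        if (op == Op::SilentPriv) {
            op = Op::TrapToVmm;  // VMM regains control here at run time
            ++patched;
        }
    }
    return patched;
}
```

After rewriting, the VMM no longer depends on the hardware to raise a trap; the trap is part of the guest's own instruction stream.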
Why para-virtualize?
 Limitations of x86
 Timing issues
 May want to expose “real time” to OS
 TCP time outs, RTT estimates
 Support for performance optimizations
 Superpages
 Page coloring
VMware architecture
Host World
VMM World
Target
App
Host
App
VM App
Host OS
VM Driver
Host Machine
Target
App
Target OS
Virtual Machine
Monitor
SimOS architecture
Target
App
Target
App
Target OS
Host
App
SimOS
Host OS
Host Machine
Host
App
Xen architecture
Guest
App
Guest
App
Guest OS
Guest OS
Host
App
Xen
Domain 0
Host Machine
X86_32 address space
When is each set of virtual addresses valid?
4GB
Xen (S): all address spaces
Kernel (S): all of a VM’s address spaces
3GB
User (U): each guest app
0GB
When does the hypervisor need to flush the TLB?
When a new guest VM or guest app needs to be run.
Xen physical memory
 Allocated by hypervisor when VM is created
 Why can’t we allow guests to update PTBR?
 Might map virtual addrs to physical addrs they don’t own
 VMware and Xen used to handle this differently
 VMware maintained “shadow page tables”
 Xen used “hypercalls”
 (Xen and VMware support both mechanisms now)
VMware guest page tables
Virtual → Machine
Update PTE
Guest OS
How does VMM grab control when PTE is updated?
Marks PTE pages read-only, generates page fault.
Shadow page table
VMM
Hardware
MMU
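A minimal sketch of the shadow page table idea, assuming page-granularity maps and hypothetical names (ShadowMmu, on_guest_pte_write). The guest writes virtual → "physical" entries; the VMM catches the write through the read-only fault and installs the virtual → machine entry that the hardware MMU actually uses.

```cpp
#include <cassert>
#include <unordered_map>

using Page = unsigned;

struct ShadowMmu {
    std::unordered_map<Page, Page> guest_pt;         // virtual -> guest "physical"
    std::unordered_map<Page, Page> phys_to_machine;  // owned by the VMM
    std::unordered_map<Page, Page> shadow_pt;        // virtual -> machine (seen by hardware)

    // Called from the VMM's fault handler when the guest writes a PTE in a
    // page-table page that was marked read-only. Returns false if the guest
    // names a "physical" page it was never given.
    bool on_guest_pte_write(Page vpage, Page guest_ppage) {
        auto it = phys_to_machine.find(guest_ppage);
        if (it == phys_to_machine.end()) return false;
        guest_pt[vpage] = guest_ppage;  // keep the guest's own view consistent
        shadow_pt[vpage] = it->second;  // what the real MMU will translate to
        return true;
    }

    // Translation as the hardware would see it; -1 means "page fault".
    long translate(Page vpage) const {
        auto it = shadow_pt.find(vpage);
        return it == shadow_pt.end() ? -1 : static_cast<long>(it->second);
    }
};
```

The guest never sees machine page numbers, so it can run unmodified while the VMM stays in control of every mapping.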
Xen physical memory
 Guest OSes allocate and manage own PTs
 “Hypercall” to change PT base
 Like a system call between guest OS and Xen
 Xen must validate PT updates before use
 What are the validation rules?
1. Guest may only map phys. pages it owns
2. PT pages may only be mapped RO
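The two validation rules can be sketched as a single check that the hypervisor would run before applying an update. Domain, PteUpdate, and validate are hypothetical names; Xen's real bookkeeping is more detailed than two sets.

```cpp
#include <cassert>
#include <set>

// A requested page-table entry: which frame to map, and whether writable.
struct PteUpdate {
    unsigned frame;
    bool writable;
};

// Per-guest bookkeeping kept by the hypervisor (simplified assumption).
struct Domain {
    std::set<unsigned> owned_frames;       // physical pages given to this guest
    std::set<unsigned> page_table_frames;  // frames currently used as page tables
};

bool validate(const Domain& d, const PteUpdate& u) {
    // Rule 1: guest may only map physical pages it owns.
    if (d.owned_frames.count(u.frame) == 0) return false;
    // Rule 2: page-table pages may only be mapped read-only, so the guest
    // cannot bypass validation by writing PTEs directly.
    if (u.writable && d.page_table_frames.count(u.frame) != 0) return false;
    return true;
}
```

Rule 2 is what makes rule 1 enforceable: if a guest could write its page tables directly, it could install any mapping without the check ever running.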
Xen guest page tables
Virtual → Machine
Update PTE hypercall (like a syscall)
Guest OS
1) Validation check
2) Perform update
VMM
Hardware
MMU
Para-virtualized CPU
 Hypervisor runs at higher privilege than guest OS
 Why is having only two levels a problem?
 Guest OSes must be protected from guest applications
 Hypervisor must be protected from guest OS
 What do we do if we only have two privilege levels?
 OS shares lower privilege level with guest applications
 Run guest apps and guest OS in different address spaces
 Why would this be slow?
 VMM must flush the TLB on system calls, page faults
X86_32 address space
4GB
Xen (S, ring 0)
Kernel (S, ring 1)
3GB
User (U, ring 3)
0GB
What does this assume?
Guest OS doesn’t need ring 2 (unlike, e.g., OS/2).
Para-virtualized CPU
 Hypervisor runs at higher privilege than guest OS
 Ring 0 → hypervisor, ring 1 → guest OS, ring 3 → guest apps
 Handling exceptions
 Guest registers handlers with Xen (must modify guest)
 System calls
 Guests register "fast" handler with Xen
 Xen validates handler, inserts in CPU’s handler table
 No need to go to ring 0 to execute
 What handler cannot be executed directly and why?
 Page fault handler must read register CR2 (only allowed in ring 0)
 CR2 is where the fault-generating address is stored
Para-virtualization
 Pros
 Better performance (maybe)
 Scales better than full virtualization
 Cons
 OS needs (minor) changes
 How well does it actually scale? Unclear at this point
 Impure abstractions
Is it important to provide good abstractions?
 I say yes
 Bad interfaces lead to code complexity, maintainability issues