Transcript Chapter 8
System VMs
This material is based on the book, Virtual Machines: Versatile Platforms for
Systems and Processes, Copyright 2005 by Elsevier Inc. All rights reserved.
It has been used and/or modified with permission from Elsevier Inc.
System VMs
Support multiple guest OSes on single hardware
platform; all running the same ISA
[Figure: three guests (Linux, Windows, OS/2), each running its own applications on its own OS and its own virtual Intel x86, all sharing one Intel x86 hardware platform]
System VMs
2
System VM Outline
Applications
Virtualizing Processors
Virtualizing Memory
Virtualizing I/O
Formal Virtualizability – ISA features
Case Studies – IBM VM, x86/VMware, Intel VT-x
Applications
Simultaneous support for multiple OSes/Apps
• Easy way to implement timesharing
Simultaneous support for different OSes/Apps
• E.g. Windows and Unix
Error containment
• If a VM crashes, the other VMs can continue to work
• Assumes VMM is correct (smaller/simpler)
Operating System debugging
• Can proceed while system is being used for normal work
Applications, contd.
Operating System Migration
• Can proceed while “old” OS continues to be used
[Figure: migration timeline. While a new release is being tested by system programmers, production users stay on the old release. Once the new release is installed, converted production users move to it, unconverted and permanently unconverted production users stay on the old release, and system programmers begin testing a newer release.]
Applications, contd.
Retrofitting new features
• Have VMM transform new device into a virtual device
Support for multiple networked machines on one physical machine
• Allows debug of network software
Enables complex debugging and performance monitoring tools
• By putting them in the VMM (not the guest OS)
Education
History
Early-60s IBM M44/44X
• VM for modified IBM 7044
• “close enough to a virtual machine to show that ‘close enough’ did not count”
Mid-60s IBM CP-40
• Time-sharing system that protects users via virtual machines, aka “pseudo machines”
• Used modified IBM 360/40
• Implemented via assoc. memory and microcode
• VMs used real memory; VMM managed virtual memory
CMS
• Cambridge (conversational) monitor system
• Single user OS developed for VMs (like DOS)
History
Mid/late-60s IBM 360/67 -- CP-67
• First 360 with VM.
• CMS an essential part
Late 60s/early 70s
• VMs blossomed as a research topic
Early 70s several VM implementations
• Honeywell
• DEC
• RCA
• Several university projects
System VMs
Virtual Machine Monitor (VMM) manages real
hardware resources
All Guest systems must be given logical
hardware resources
All resources are virtualized
• By partitioning real resources
• By sharing real resources
Guest state must be managed
• By using indirection
• By copying
[Figure: Linux, Windows, and OS/2 applications run on Linux, Windows, and OS/2 guests, which run on the Virtual Machine Monitor (VMM) on an x86 PC]
System VMs: Processor Mgmt/Protection
VMM runs in system mode
• VMM manages/protects processor through conventional mechanisms
Guest OSes run in user mode
Guest OSes do not have direct control over
hardware resources
All attempts to interact w/ hardware resources are
intercepted by VMM
VMM manages shadow copies of Guest
System state (incl. control registers)
VMM schedules and runs Guest Systems
VM Timesharing
VMM Timeshares resources among guests
•
Similar to OS timesharing applications
[Figure: timeline of a switch between VMs]
First VM active
1. Timer interrupt occurs
VMM active
2. VMM saves architected state of running VM
3. VMM determines next VM to be activated
4. VMM restores architected state for next VM
5. VMM sets timer interval and enables interrupts
6. VMM sets PC to timer interrupt handler of OS in next VM
Next VM active
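The timesharing sequence on this slide can be sketched as a toy Python model. Everything here is illustrative rather than real VMM code: `VirtualMachine`, `Processor`, and `VMM.on_timer_interrupt` are hypothetical names, architected state is reduced to a dict, and round-robin selection is an assumed policy (the slide does not name one).

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    name: str
    timer_handler_pc: int                       # guest OS timer handler (made-up address)
    saved_state: dict = field(default_factory=dict)


@dataclass
class Processor:
    state: dict = field(default_factory=dict)   # stands in for architected registers
    pc: int = 0
    timer_interval: int = 0
    interrupts_enabled: bool = False


class VMM:
    def __init__(self, vms, quantum=10):
        self.vms = vms
        self.quantum = quantum
        self.current = 0

    def on_timer_interrupt(self, cpu):
        running = self.vms[self.current]
        running.saved_state = dict(cpu.state)               # save state of running VM
        self.current = (self.current + 1) % len(self.vms)   # pick next VM (round-robin, assumed)
        nxt = self.vms[self.current]
        cpu.state = dict(nxt.saved_state)                   # restore state for next VM
        cpu.timer_interval = self.quantum                   # set timer interval
        cpu.interrupts_enabled = True                       # enable interrupts
        cpu.pc = nxt.timer_handler_pc                       # enter next guest's timer handler
        return nxt
```

Each call models one pass through the "VMM active" phase; the guest OS then does its own timesharing of applications inside its slice.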
Native and Hosted VMs
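[Figure: four system organizations, each shown as layers above the hardware with non-privileged modes above privileged mode.
Traditional uniprocessor system: applications over OS over hardware.
Native VM system: virtual machines over VMM over hardware.
User-mode hosted VM system: virtual machines over VMM over host OS over hardware.
Dual-mode hosted VM system: virtual machines over a VMM that is split between non-privileged and privileged modes, alongside the host OS, over hardware.]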
Virtualizing State
Indirection
• Hold guest state in VMM memory
• Change pointer on guest switch
• Example: registers
[Figure: VMM memory holds register values for VM 1, VM 2, and VM 3; the processor's register block pointer selects one of them]
Virtualizing State
Copying
• Hold guest state in VMM memory
• Copy state on guest switch
[Figure: VMM memory holds register values for VM 1, VM 2, and VM 3; the values for the active VM are copied into the processor registers]
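The two ways of managing guest state can be contrasted in a short sketch. The class names and the dict-based "register blocks" are assumptions made for illustration; real hardware would use a register-block pointer or load/store sequences.

```python
class IndirectionVMM:
    """Guest register blocks stay in VMM memory; a switch only moves a pointer."""

    def __init__(self, register_blocks):
        self.register_blocks = register_blocks    # one dict of registers per VM
        self.block_pointer = 0                    # models the register block pointer

    def switch_to(self, vm_index):
        self.block_pointer = vm_index             # no register values are copied

    def read(self, reg):
        return self.register_blocks[self.block_pointer][reg]


class CopyingVMM:
    """Guest register values are copied between VMM memory and the processor."""

    def __init__(self, register_blocks):
        self.register_blocks = register_blocks
        self.current = 0
        self.processor_registers = dict(register_blocks[0])

    def switch_to(self, vm_index):
        # Copy the outgoing guest's registers out, then the incoming guest's in.
        self.register_blocks[self.current] = dict(self.processor_registers)
        self.processor_registers = dict(self.register_blocks[vm_index])
        self.current = vm_index

    def read(self, reg):
        return self.processor_registers[reg]
```

Indirection makes the switch itself cheap, while copying works on any processor at the cost of moving the whole state on every switch.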
Processor Management/Protection
Traps and interrupts (& sys calls)
• Transfer to VMM
• VMM determines appropriate Guest OS
• VMM transfers to Guest OS
Guest OS “return” to user app.
• Transfer to VMM
• VMM bounces return back to Guest app.
Read/Write of protected control registers
• Trap to VMM
• VMM reads/modifies guest copy
• May modify shadow copy
• Returns to Guest
[Figure: an application's system call/trap goes to the vector location in the VMM, which forwards it to the virtual vector location in the Guest OS. A privileged operation in the Guest OS traps to the VMM, which checks privileges, performs the operation, and returns to the guest's next instruction.]
OS VMs: Key Issue – ISA Virtualizability
What if privileged instruction no-ops in user mode? (rather than trapping)
• Then… VMM can’t intercept when Guest OS attempts the privileged instruction
What if user can access memory with real address?
• Then… a guest OS may see that the real memory it really has is different from the memory it thinks it has
What if user can read system control registers?
• Then… guest OS may not read the same state value that it thinks it wrote
Virtualizability (Popek, Goldberg, 74)
Classic work in formalizing OS VM concepts
Defines basic VM properties
Defines properties of instruction sets
Proves that VMM can be constructed if
instruction set properties hold
Extends to recursive VMs
Reduces to hybrid VMs
VM Properties
Virtual Machine: efficient, isolated
duplicate of the real machine
Virtual Machine Monitor: software that
implements VMs
Essential VMM characteristics
1) Provides an environment essentially identical to the real machine
   • Except timing and availability of resources
2) Programs show only minor decreases in speed
   • Mostly native instruction execution
3) Has complete control of system resources
Privileged Instructions, Definition:
Trap if executed in user mode; not in
supervisor mode
Privileged instructions are required to trap
• No-op in user mode is not enough
Control Sensitive instructions:
1. All instructions that change the amount of
(memory) resources (or the mapping)
   • base/limit register in simplified paper version
   • page table in general
2. All instructions that change the processor mode
Instructions that provide control of resources
Examples:
• Load TLB (if TLB is architected)
• Load control register
• Return to user mode
Behavior Sensitive instructions:
1. All instructions whose results depend on the mapping of physical memory
2. All instructions whose behavior depends on the mode
Instructions whose behavior depends on configuration of specific resources (and who owns them)
Examples:
• Load physical address
• POPF (Intel x86): Interrupt-enable flag remains unaffected in user mode
Instruction Types -- Summary
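[Figure: the instruction set is divided into non-privileged and privileged instructions, and into innocuous and sensitive instructions; sensitive instructions are behavior-sensitive, control-sensitive, or both]
Innocuous Instructions: Those that are not control or behavior sensitive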
VMM components
[Figure: an instruction trap enters the Dispatcher, which calls either the Allocator or one of the privileged-instruction Interpreter Routines 1 to n]
• Allocator handles privileged instructions that desire to change machine resources, e.g. Load Relocation Bounds Register
• Interpreter routines handle privileged instructions that do not change machine resources, but access privileged resources, e.g. IN, OUT, Write TLB
VMM components
Dispatcher
• Target of vectored traps – entry point for VMM
• Decides which of other components to call
Allocator
• Decides which system resources should be provided and manages shared resources among VMs
Interpreters
• Emulate the effects of privileged instructions
VMM runs in supervisor mode; all other software in user mode
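A minimal sketch of the three components, assuming a toy machine: the opcode names (`LOAD_BOUNDS`, `IN`, `OUT`), the `ToyVMM` class, and its first-fit region policy are all hypothetical and chosen only to mirror the dispatcher/allocator/interpreter split.

```python
class ToyVMM:
    """Dispatcher, allocator, and interpreter routines in one toy class."""

    def __init__(self, total_memory):
        self.total_memory = total_memory
        self.free_memory = total_memory
        self.bounds = {}        # vm name -> (base, size) relocation-bounds values
        self.io_log = []        # I/O performed on behalf of guests
        # Interpreter routines, keyed by the trapped opcode.
        self.interpreters = {"IN": self._emulate_in, "OUT": self._emulate_out}

    def dispatch(self, vm, opcode, operand=None):
        """Target of every trap: decide which component handles it."""
        if opcode == "LOAD_BOUNDS":            # wants to change machine resources
            return self.allocate(vm, operand)
        if opcode in self.interpreters:        # only accesses privileged resources
            return self.interpreters[opcode](vm, operand)
        raise ValueError(f"unhandled trap: {opcode}")

    def allocate(self, vm, size):
        """Allocator: hand out a private memory region, or refuse."""
        if vm in self.bounds:
            raise ValueError("toy model supports one region per VM")
        if size > self.free_memory:
            raise MemoryError("not enough real memory left")
        base = self.total_memory - self.free_memory
        self.free_memory -= size
        self.bounds[vm] = (base, size)
        return base, size

    def _emulate_out(self, vm, operand):
        port, value = operand
        self.io_log.append((vm, port, value))

    def _emulate_in(self, vm, port):
        # Return the last value this VM wrote to the port, or 0 if none.
        for logged_vm, logged_port, value in reversed(self.io_log):
            if logged_vm == vm and logged_port == port:
                return value
        return 0
```

The guest never sees the real bounds it was given; it only observes that its privileged instruction appeared to execute.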
Privileged Instruction Handling
LPSW: Load Program Status Word
Includes Mode Bit and PC (among other things)
[Figure: Guest OS code in the VM (user mode) executes a privileged instruction (LPSW), which traps to the Dispatcher in VMM code (privileged mode); after the LPSW routine runs, the guest continues at the next instruction (target of LPSW)]
LPSW Routine:
1. Change mode to privileged
2. Check privilege level in VM
3. Emulate instruction
4. Compute target
5. Restore mode to user
6. Jump to target
Virtual Machine “requirements”
1. All innocuous instructions are executed by the hardware directly
2. The allocator must be invoked when any program attempts to affect system resources
3. Any program executes exactly as on real hardware except
   • For timing
   • Availability of system resources
A VMM satisfies all three requirements
Precise versions of informal definitions given earlier
Virtual Machines: Main Theorem
A virtual machine monitor can be constructed if the set
of sensitive instructions is a subset of the set of
privileged instructions
Proof shows
Equivalence by interpreting privileged instructions and
executing remaining instructions natively
Resource control by having all instructions that change
resources trap to the VMM
Efficiency by executing all non-privileged instructions
directly on hardware
A key aspect of the theorem is that it is easy to check
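Because the theorem's condition is a subset test, checking it is mechanical once the instruction classes are known. The sketch below shows that check; the function names and the toy instruction sets are assumptions, with POPF and SGDT used as sensitive but unprivileged examples because the x86 case study in this chapter describes them that way.

```python
def is_virtualizable(sensitive, privileged):
    """Popek-Goldberg condition: every sensitive instruction is privileged."""
    return set(sensitive) <= set(privileged)


def offending_instructions(sensitive, privileged):
    """Sensitive instructions that would not trap in user mode."""
    return sorted(set(sensitive) - set(privileged))


# Hypothetical ISA in which every sensitive instruction traps in user mode.
toy_privileged = {"LOAD_CONTROL_REG", "WRITE_TLB", "HALT"}
toy_sensitive = {"LOAD_CONTROL_REG", "WRITE_TLB"}

# x86-like fragment: POPF and SGDT are sensitive but do not trap.
x86_like_privileged = {"LOAD_CONTROL_REG"}
x86_like_sensitive = {"LOAD_CONTROL_REG", "POPF", "SGDT"}
```

For the x86-like fragment, `offending_instructions` lists exactly the instructions a VMM would have to find by scanning or translation instead of relying on traps.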
Recursive Virtualization
[Figure: the VMM runs in privileged mode on the hardware and hosts virtual machines in non-privileged modes; one of those virtual machines runs a 2nd level VMM, which in turn hosts its own virtual machines]
Recursive Virtualization
Running a VMM as a VM on a VM on a VM….
Theorem: A conventional third generation computer is
recursively virtualizable if it is (a) virtualizable, and
(b) a VMM without any timing dependences can be
constructed for it
Proof – A VMM is a program and from the VM theorem
will be “identically performing” except for timing
dependences and resource constraints.
Timing is excluded in the theorem;
Resource constraints only limit the depth of
recursion.
Hybrid Virtualization
Some ISAs are more virtualizable than others
• User sensitive instructions
  Executed in user mode and can change memory resources or processor mode, or whose behavior depends on real memory locations
• Supervisor sensitive instructions
  Executed in supervisor mode and can change memory resources or processor mode, or whose behavior depends on real memory locations
Hybrid Virtualization: Theorem
A hybrid virtual machine monitor can be constructed if
the set of user sensitive instructions is a subset of the
set of privileged instructions
Nonprivileged supervisor sensitive instructions are OK
• Example: PDP-10 JRST 1 – return to user mode (does not trap if already in user mode)
When the VMM executes the VM supervisor, it must use some form of emulation to locate supervisor sensitive instructions
• Low efficiency, but only in VM supervisor, not user code
If a user sensitive instruction is not privileged, then the VMM must emulate all the user code
• Fails efficiency condition
• But “binary translation” can be done more efficiently than interpreting
Case Study: Virtualizing the x86 ISA
x86 Evolved through many extensions
Instruction set is not (strictly) virtualizable
• Nor is it hybrid virtualizable
X86 Processor Control
Uses “baroque” late 1970s style protection rings
• Four rings, 0-3:
  0 OS Kernel
  1 High priority drivers and OS services
  2 Low priority device drivers
  3 User
• Unix was a reaction to this style
Transfer to lower ring (higher privilege) must go through “gate”
Memory Mapping
Segments map to 4 GB linear memory space
4 GB space maps to fixed-size pages
Segment descriptor info
• Valid, Base, Limit,
• Type (code or data)
• R/W rights,
• Descriptor Privilege Level (DPL)
• Etc.
Memory Mapping
[Figure: descriptor table registers locate the segment descriptor tables; descriptors are loaded into the segment registers (one code segment register, one stack segment register, and several data segment registers), each holding base, limit, rights (R/W), Desc. Priv. Level (DPL), and Req. Priv. Level (RPL). The code and data segments map into the linear memory space, which a 2-level page table maps to real pages.]
Addressing
Addressing is via Segment Registers
Segment Registers
• CS code segment
• SS stack segment
• DS, ES, FS, GS data segments
• All memory accesses are via a segment register
Segment descriptors are entered into segment registers
• And given an RPL, Requestor Privilege Level
• In some cases privilege is lowest of RPL, DPL, e.g. when pointers are passed
X86 Processor/Memory Protection
CPL – current privilege level, normally determined by DPL of current code segment
• CPL == processor mode
To access data, CPL ≤ DPL
To call procedure, must enter through gate if CPL(callee) < CPL(caller)
<< this is a very abbreviated description >>
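The two checks on this slide can be written down directly. This is a simplified sketch, not a full model of x86 protection: the function names are hypothetical, lower numbers mean more privilege, and the data check uses the weaker (numerically larger) of CPL and RPL, which is how the RPL limits privilege when pointers are passed.

```python
def can_access_data(cpl, rpl, dpl):
    """Data-segment check: the weaker of CPL and RPL must be at least as privileged as DPL."""
    return max(cpl, rpl) <= dpl


def needs_gate(caller_cpl, callee_cpl):
    """A call into a more privileged (numerically lower) ring must go through a gate."""
    return callee_cpl < caller_cpl
```

For example, ring 0 code acting on a selector whose RPL is 3 is treated as ring 3 for the purpose of the data check, so it cannot reach a DPL 0 segment through that selector.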
X86 Instruction Set Virtualizability
Ordinarily:
• Levels 0,1,2 == supervisor
• Level 3 == user
To virtualize, everything runs at level 3
IN, INS, OUT, OUTS – I/O instructions
• Perform check CPL ≤ IOPL (I/O Privilege Level)
• Not privileged (by Goldberg’s defn)
• Control sensitive (I/O is resource), action sensitive to CPL
• Could be user sensitive
POPF, PUSHF
• Pop/push EFLAGS register from/to stack
• EFLAGS contains IOPL (among other things)
• And this flag indicates IO privilege level of current task
X86 ISA Virtualizability, contd.
SGDT, SIDT, SLDT, SMSW, STR
• Copy descriptor pointer register, or system state information
• Typical manual entry:
  “The SGDT and SIDT instructions are only useful in operating-system software; however, they can be used in application programs without causing an exception to be generated”
• I.e. behavior sensitive, not privileged
VERR/VERW
• Verify if addressed segment is readable or writeable by CPL
• Seem like perfectly reasonable instructions,
• BUT behavior sensitive and not privileged
X86 ISA Virtualizability, contd.
LAR/LSL
• LAR -- load access rights and DPL
• LSL – load segment limit
• May no-op, in effect, if CPL isn't good enough.
• I.e. performs CPL/RPL check before it does inst.
• Behavior sensitive and not privileged
MOVs, PUSH/POP to/from segment registers
• Copy RPL from segment register
• Behavior sensitive and not privileged
Pre-Scanning is probably a necessity
Hybrid Virtualization: Patching
Scan Guest OS, find problem instructions, replace with jump
to VMM
[Figure: the scanner and patcher turns the original program into a patched program; each discovered critical instruction is replaced by a code patch that performs a control transfer (e.g. trap) to the VMM]
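Scanning and patching can be sketched on a toy program represented as a list of instruction strings. The `CRITICAL` set uses sensitive but unprivileged x86 instructions named in this chapter; `TRAP_TO_VMM` is a made-up pseudo-instruction standing in for whatever control transfer a real VMM would plant, and real patching would operate on binary code rather than text.

```python
CRITICAL = {"POPF", "SGDT", "VERR"}   # sensitive but unprivileged examples


def patch(program):
    """Replace each critical instruction with a transfer to the VMM.

    Returns the patched program and a table mapping each patched address
    back to the original instruction, so the VMM can emulate it.
    """
    patched, original_at = [], {}
    for addr, instr in enumerate(program):
        opcode = instr.split()[0]
        if opcode in CRITICAL:
            original_at[addr] = instr
            patched.append(f"TRAP_TO_VMM {addr}")
        else:
            patched.append(instr)       # innocuous: runs natively, unchanged
    return patched, original_at
```

Keeping the original instruction in a side table is one possible design; it lets the VMM emulate exactly what the guest intended when the planted transfer fires.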
Hybrid Virtualization: Code Caching
Scan Guest OS, “translate” into code cache, find problem
instructions, replace with jump to VMM
[Figure: blocks 1-3 of the patched program are translated, via a translation table, into blocks in a code cache, where the code section is emulated. Critical instructions transfer control (e.g. by trap) to specialized emulation routines in the VMM; two critical instructions can be combined into a single block.]
Virtualizing Memory: Review
[Figure: the OS memory region holds one page table (PT) per process (process 1 ... process n), each with user and supervisor entries; a context switch changes the PT pointer, and the page tables map to OS-managed real pages]
Virtualizing Memory
Real memory partitioning?
• Could be fixed partition per guest => inefficient
• Typically flexible partitioning via VMM management
Guest manages its virtual page tables
Guest page table addresses are write protected
VMM manages shadow page tables that reflect actual mapping to physical pages
• Note Real / Physical page distinction
VMM can change shadow page table by writing page table pointer
• i.e. virtual machine state change via indirection
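A shadow page table is the composition of two mappings: the guest's virtual-to-real table and the VMM's real-to-physical assignment. The sketch below models both as dicts; `build_shadow_table` and `ShadowMMU` are hypothetical names, and a missing entry stands in for a page fault.

```python
def build_shadow_table(guest_page_table, real_to_physical):
    """Compose guest (virtual -> real) with VMM (real -> physical) mappings."""
    return {
        vpage: real_to_physical[rpage]
        for vpage, rpage in guest_page_table.items()
        if rpage in real_to_physical      # real page currently backed by a physical page
    }


class ShadowMMU:
    def __init__(self):
        self.shadow_tables = {}           # guest name -> shadow page table
        self.shadow_pt_pointer = None     # which shadow table the hardware walks

    def install(self, guest, guest_page_table, real_to_physical):
        self.shadow_tables[guest] = build_shadow_table(guest_page_table, real_to_physical)

    def switch_guest(self, guest):
        self.shadow_pt_pointer = guest    # state change via indirection

    def translate(self, vpage):
        # None models a page fault that the VMM must resolve or reflect.
        return self.shadow_tables[self.shadow_pt_pointer].get(vpage)
```

Switching guests only rewrites the pointer, which is the indirection technique from the "Virtualizing State" slides applied to memory mapping.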
Virtualizing Memory – Example
[Figure: the VMM memory region holds shadow page tables for each guest (Guest 1 ... Guest n), one per process and mode (process 1 user mode PT, process 1 super mode PT, ..., process n user mode PT, process n super mode PT). The PT pointer selects among them on a system call, a context switch, or a guest OS switch, and the shadow tables map to VMM-managed physical pages. Each guest OS memory region holds that guest's own per-process page tables and guest PT pointer, which map to guest-OS-managed "real" pages.]
Virtualizing Memory – Operations
Guest application performs system call
• Trap to VMM
• VMM changes shadow mapping to reflect guest privilege change
Guest OS performs context switch
• Writes PT pointer
• Trap to VMM
• VMM writes guest PT pointer
• VMM modifies shadow PT pointer
[Figure: the Guest OS executes "write PT pointer", which traps to the VMM; the VMM checks privileges, writes the guest PT ptr, writes the shadow PT ptr, and returns to the guest's next instruction]
Virtualizing Memory -- TLBs
TLB plays role of page table
Page table is just a software structure of which the
VMM has no special knowledge
Assume TLB entry:
• PId, Protection bits, usage bits, real page frame
Virtualize TLBs
• VMM keeps track of Guest’s copies
• VMM manages real copy
• Real TLB holds subset of pages mapped in Guest copy
Virtualize PIds
• VMM manages real PIds
• Keep track of mapping from guest PIds to real PIds
At any given time all TLB entries with same PId are associated with same guest
Virtualizing TLBs
TLB Read/Write are privileged instructions
• Behavior and control sensitive
Guest OS write TLB
• Intercepted by VMM
• VMM updates guest’s virtual copy
• VMM may modify real TLB
Guest OS read TLB
• Intercepted by VMM
• VMM reads guest’s virtual copy and returns contents to guest
• May have to merge in usage data from real version in TLB
Virtualizing TLBs
TLB miss
• Traps to VMM
• VMM checks to see if virtual TLB maps page
  If so, VMM handles it and silently returns
  Else, VMM reflects fault to guest OS
TLB management
• Can’t switch TLBs via indirection as with page tables
• PId management can give similar control, however
Guest system call/returns (privilege changes)
• Flush old mode PId entries
  New mode TLB entries re-loaded on demand
• Or write new TLB entries with privileges
  Use two real PIds per virtual PId
  One with virtual system mode privileges
  Other with virtual user mode privileges
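The TLB-miss decision can be sketched with dict-based tables. All names here are hypothetical: each guest's virtual TLB maps (guest PId, virtual page) to a real page, `pid_map` records the guest-PId to real-PId mapping, and the real TLB is keyed by (real PId, virtual page).

```python
def handle_tlb_miss(guest, guest_pid, vpage,
                    virtual_tlbs, pid_map, real_to_physical, real_tlb):
    """Return "handled" if the VMM filled the real TLB, else "reflect"."""
    real_page = virtual_tlbs[guest].get((guest_pid, vpage))
    if real_page is None:
        return "reflect"                      # guest OS never mapped it: guest sees the miss
    real_pid = pid_map[(guest, guest_pid)]    # guest PId -> real PId
    physical_page = real_to_physical[guest][real_page]
    real_tlb[(real_pid, vpage)] = physical_page
    return "handled"                          # silent: guest is unaware of the miss
```

The sketch leaves out protection and usage bits, which a real VMM would also have to carry across and merge back when the guest reads its TLB.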
Virtualizing I/O
Hardest part of virtualization
• Many device types
• Many devices of each type
• Each with its own driver
New devices may be added during lifetime of system
In older, “classic” systems, less of a problem
• Entire system developed by one company
• Far fewer devices to worry about
• Channels (IO Processors) isolated key IO software
I/O Architecture
I/O instructions
• Special privileged opcodes
• Similar to loads/stores
• Address and data read/written on I/O bus
Memory mapped I/O
• Load/stores to special (protected) memory addresses
• Addresses/data decoded by hardware and translated to I/O addresses/data
Addresses indicate I/O devices/registers
Data can be status, commands, or real data
I/O Architecture (contd)
DMA (block) transfers may require several I/O
operations
• Starting address(es)
• Block length
• Command (read, write, interrupt on completion)
• Requires exclusive device access
Interrupts
• From I/O devices to force processor transfers to I/O software routines
[Figure: a device controller on the data bus and address bus, with address decode logic and Data Buff, Status, Start Address, and Block Size registers]
I/O Management: review
OS manages I/O resource
• Allocates space on storage devices, etc.
• Serializes requests for shared devices
User software performs system calls with general I/O requests
OS converts I/O calls to driver calls
Driver contains device-specific software
• Exact commands, controller registers, etc.
Driver generates device (and bus) specific I/O operations
[Figure: Application -> system calls -> Operating System (VM mgr and I/O Drivers, reached by driver calls) -> phy. mem. and I/O operations -> Hardware]
Device Types
Dedicated
• Monitor, mouse, keyboard
• Device can’t be virtualized; must be shared (under user control)
• VMM still controls due to privileged mode
Partitioned
• Disk
• Make multiple, smaller virtualized versions
Shared
• Network adapter
• VMM manages virtual state information
• Translate virtual requests to physical requests
Spooled
• Printer
• Shared but at coarse granularity
Spooled Devices
Two level spool table
• First write to VM spool area
• When ready, VMM copies to VMM spool area
• Then invokes device
• When device finished, both VM and VMM spool tables receive “complete”
Optimizations are possible
• E.g. VMM uses VM spool buffer

Virtual Machine 1 Spool Table (spool area at 10000)
Program   Status      Location   Real loc.   Size
A         Printed     1000       11000       400
B         Completed   2000       12000       200
C         Running     3000       13000       200
D         Completed   4000       14000       500

Virtual Machine 2 Spool Table (spool area at 20000)
Program   Status      Location   Real loc.   Size
P         Running     1000       21000       400
Q         Completed   2000       22000       800

VMM Spool Table (spool area at 30000)
VM   Program   Status     Real loc.   Size
1    A         Printed    30000       400
2    Q         Printing   31000       800
1    B         Waiting    31800       200
1    D         Waiting    30400       500
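The status transitions in the spool tables can be modeled in a few lines. This is a sketch under simplifying assumptions: `SpoolSystem` and its method names are hypothetical, locations and sizes are omitted, and the VMM prints waiting entries in arrival order (the slide does not specify an order).

```python
class SpoolSystem:
    def __init__(self):
        self.vm_tables = {}     # vm -> {program: status}
        self.vmm_table = []     # rows of [vm, program, status], in arrival order

    def start(self, vm, program):
        """Program begins writing to its VM's spool area."""
        self.vm_tables.setdefault(vm, {})[program] = "Running"

    def complete(self, vm, program):
        """Output is ready; the VMM copies it to its own spool area."""
        self.vm_tables[vm][program] = "Completed"
        self.vmm_table.append([vm, program, "Waiting"])

    def start_printing(self):
        """Invoke the device for the oldest waiting entry, if the device is idle."""
        if any(row[2] == "Printing" for row in self.vmm_table):
            return None
        for row in self.vmm_table:
            if row[2] == "Waiting":
                row[2] = "Printing"
                return (row[0], row[1])
        return None

    def device_finished(self):
        """Mark the job complete in both the VMM table and the VM's table."""
        for row in self.vmm_table:
            if row[2] == "Printing":
                row[2] = "Printed"
                self.vm_tables[row[0]][row[1]] = "Printed"
                return (row[0], row[1])
        return None
```

After A has printed and Q is on the device, the model reproduces the statuses shown in the tables: A is "Printed" in both tables, while Q is "Printing" in the VMM table but still "Completed" from its VM's point of view.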
Non-existent Devices
Implement virtual version only
Example: network adapter
• Allows VMs on same platform to communicate
I/O Interception Points
Attempts to interact with virtual devices are
intercepted by VMM which translates to real
devices
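Possible interception points:
• At system call interface
• At driver call interface
• At I/O device interface
[Figure: Application -> system calls -> Operating System (VM mgr and I/O Drivers, reached by driver calls) -> phy. mem. and I/O operations -> Hardware, with the three interception points marked at the corresponding interfaces]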
At system call interface
System call traps to VMM
VMM interprets system call to produce driver calls
VMM contains shadow drivers
• (Implement VMM with driver interface compatible with some existing OS?)
Guest OS contains virtual I/O code and drivers
• Must still be executed, for correct guest state updates
Problems
• VMM must interpret all I/O system calls for all guest OSes
• VMM must have access to drivers for all real devices
• I/O initiated by guest OS may not always pass through call interface
At driver call interface
Guest OS contains driver stubs
Guest OS driver calls can operate on
generic virtual devices
• To simplify conversion
VMM contains shadow drivers
• These drivers correspond to real devices
• Generic I/O operations passed to VMM and converted to shadow driver calls
Problem
• VMM must have access to real drivers
• Need generic drivers for each guest OS
• Guest OSes must have well defined, modular driver call interface
[Figure: Guest Application -> system calls -> Guest OS -> driver calls -> Generic I/O Drivers -> generic I/O operations -> VMM (interpret) -> I/O drivers -> I/O operations -> Hardware]
At I/O device interface
Guest OSes contain real drivers
Low level I/O operations trap to VMM
VMM must check/translate I/O
operation
If legal, VMM performs I/O operation
on behalf of guest
VMM passes control back to guest
Problems
• VMM must know some device specifics (even if it doesn’t contain full drivers)
• VMM must manage serialization for shared devices
• VMM must check correctness of I/O operations
[Figure: Guest Application -> system calls -> Guest OS -> driver calls -> I/O Drivers -> I/O operations -> VMM (check/translate) -> I/O operations -> Hardware]
Virtualization with IOPs (IBM Style)
IO instruction points to Channel program
• Similar to driver
• Micro-code like
• Very simple control flow
• “Packages” sequences of related operations
VMM can translate channel program as a whole
• Mostly consists of address re-mapping
• And dealing with non-contiguous pages
Reduces/eliminates problems with I/O sequences that require exclusive access to a device
Case Study: IBM 360/370/390
CP-67 on 360/67 in 1960s
• First production VM implementation
• Provided means for supporting timesharing via multiple guest versions of CMS – single user OS
• Used basic virtualization concepts described by Goldberg
VM/370 (1972) led to widespread use of VMs
Virtual Machine Assist (1974)
• Enhancements to support VMs
Extended Control Program Support (1978)
• Further enhances VM support
Handshaking
• Lets Guest OS in on the secret
Interpretive Execution Facility (IEF)
Reasons for VM Slowdown
VM initialization
• Setting up virtual state
Privileged Instruction overhead
• Trap to VMM
• Interpretation by VMM
• Return from VMM to guest
System Calls (SVC) by guest in user mode
• Requires trap/reflection back to Guest OS
Interrupts
• Reflect through VMM before getting to Guest OS
Virtual Memory Management
• Shadow page faults when page is already mapped
Duplicated effort between VMM and Guest OS
• Memory management done by both
Virtual Machine Assists
Ways of making application on VM run faster
• Have no performance effect if run in native mode
Instruction Emulation
Shadow Table Management
Virtual interval timer
IBM 370 Virtual Machine Assist
Add Control Register 6 (CR6)
• Bit 0 VM Assist On/Off
• Bit 1 Virtual user/supervisor state
• Bit 4 SVC handling On/Off
• Bit 5 Shadow table fixup On/Off
• Bit 7 Virtual interval timer assist
• Bits 8-28 address of VM pointer list
CR6 Set by VMM when Guest is dispatched
Instruction Emulation
Certain privileged instructions emulated directly in
microcode
• Avoids trap/interpretation by VMM
• Guest must be in Virtual Supervisor mode (held in CR6)
• Examples: Load PSW, Load Real Address, Reset Reference Bit, Store Control
Supervisor Calls also emulated
• If SVC handling is enabled via CR6
• Avoids trap/reflection through VMM
Shadow Table Management
When page fault occurs:
• If Guest OS has page mapped
  and page is already present in real memory
  but not mapped by guest’s shadow table then
  VM assist updates shadow table automatically
• Else, reflects fault to VMM
Uses VM pointer list to find guest tables
Performance Improvement
Reduction in Supervisor State Time
• 70-90%
Reduction in Elapsed Time
• 40-65%
Reduction in Priv. Insts. Interpreted by VMM
• 75-95%
Extended Control Program Support
Emulates additional Privileged Instructions
• e.g. Purge TLB, Test Channel
Partially handles other Privileged Instructions (with help from VMM)
Non-architected instructions for use by VMM
• Examples:
  Decode channel words
  Dispatch a virtual machine
  Locate virtual I/O control blocks
  (many others)
Virtual Timer Assist
• Maintains a virtual interval timer for guest VM
• Real interval timer is a hardware resource
Interpretive Execution Facility
Provides a way to execute most of the VMM
functions in hardware
Function of VMM separated between hardware and
software
• Cleaner separation compared to earlier VM assists
Advantages of interpretive execution
• Better performance
• Better predictability of performance
• Applicable for all types of guest operating systems
Key instruction: SIE (Start Interpretive Execution)
• Used by VMM to give control to hardware
• Architectural state of VM in table accessible to hardware
• Privileged instructions interpreted in hardware
• Occasionally need to get back to the software part of the VMM
Entry and Exit from IE mode
[Figure: VMM software executes SIE to enter Interpretive Execution mode. The hardware leaves Interpretive Execution mode either by an exit for interception, which returns to emulation code in the VMM software, or by an exit for host interrupt, which goes to the host interrupt handler.]
Inter VM Communication
Other VMM extensions focus on inter-machine
communication by emulating many distributed
system features
• e.g. virtual LANs
• VMs by their nature are isolated – but inter-user communication is also desirable
IBM Handshaking (Para Virtualization)
Allow Guest OS to discover that it is running on VMM
• Guest “probes” for VMM when it is booted
• Then informs VMM that it expects VMM support
Reduces duplicated effort
• OS can mark all page frames fixed, disable demand paging, bypass channel address translation
Pseudo page fault handling
• Under operator control, VMM notifies Guest OS when VMM is handling a page fault by the Guest VM
• Guest OS marks faulting task as “page wait”
• Guest OS dispatches another task (i.e. whole Guest VM does not have to wait)
VMware: an x86 System Virtual Machine
Applying Conventional VMs to PCs – Problems:
• Installing the VMM on bare hardware, then booting Guests onto VMM.
• Need to support many device types, many more drivers
VMware solves both problems
Uses Host OS/Guest OS model
• “Hosted VM”
• Uses Host OS for some VMM functions, including I/O
VMware: Three Main components
Begin with already-loaded Host OS
VMDriver (Pseudo-Driver)
• Host OS-specific
• Installed as a driver, but can take over the machine
• Acts as conduit between System and User VMMs
VMMonitor (System-level VMM)
• Slipped under installed OS via Pseudo-Driver
VMApp (User-level VMM)
• Appears as ordinary application to installed OS
• Can make normal I/O calls (and use installed drivers)
[Figure: in user mode, Host Apps and VMApp run on the Host OS, beside the Virtual Machine (applications on a guest OS, e.g. Linux, Windows, on virtual hardware: x86 motherboard, display, adapters, etc.). In privileged mode, the Host OS with VMDriver sits beside VMMonitor, both on the real hardware.]
VMM Communication
VMM control passes back and forth between
user and system-level VMM portions
User VMM performs system call to pseudo-driver; then waits for response
System VMM maintains control, then sends
response message back to User VMM
Resource Management
Host OS schedules processor resource
• User-level VMM is just another application
Host OS manages memory
• VM memory is allocated as address space of User-level VMM
• User-level VMM “mallocs”; whole VM uses it
VMware I/O
Guest OS contains generic
drivers
Generic drivers operate on virtual
devices managed by user mode
portion of VMM
User mode portion of VMM makes
normal system calls
System calls cause Host OS to
use real drivers and devices
[Figure: Guest Application -> system calls -> Guest OS (VM mgr, generic I/O drivers) -> phy. mem. and I/O operations -> SW virtual devices in the VMM (user mode) -> system calls -> Host OS (VM mgr, I/O Drivers) -> phy. mem. and I/O operations -> Hardware]
I/O Sequence
Guest application makes system call
Intercepted by System-level VMM, reflected to
Guest OS
Guest OS performs I/O operations specified in
generic drivers
System-level VMM captures I/O operations, and
interprets them
Passes operation back up to User-level VMM
User-level VMM performs I/O call to Host OS
Example: Network Virtualization
Virtual and Physical Network Interface Card
(NIC) the same
Message Send
• X86 OUT or OUTS plus port# (in range of IDs for NIC)
• Each port has a state bit: trap on I/O request
VMM saves permission “map” for all ports per guest VM
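A per-guest port permission map can be sketched as a set of allowed port numbers consulted whenever a guest's OUT traps. `PortPermissionMap`, its methods, and the list standing in for the physical NIC are all hypothetical; port 0xf0 is borrowed from the example on the next slide.

```python
class PortPermissionMap:
    def __init__(self):
        self.allowed = {}       # guest -> set of port numbers it may use

    def grant(self, guest, ports):
        self.allowed.setdefault(guest, set()).update(ports)

    def on_out_trap(self, guest, port, value, physical_nic):
        """Called by the VMM when a guest's OUT/OUTS traps."""
        if port not in self.allowed.get(guest, set()):
            raise PermissionError(f"{guest} may not access port {port:#x}")
        physical_nic.append((port, value))    # stand-in for the real device request
```

A guest without permission for a port never reaches the device; the VMM rejects the request at the trap.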
Example: Network Virtualization
Sequence below
Guest OUT traps to VMM
VMM checks guest permissions before making request to
physical NIC
[Figure: send sequence, from user mode down to privileged mode]
1. User on VM 1: user sends message to external machine, e.g. using send()
2. OS on VM 1: OS converts into I/O instructions for virtual NIC, e.g. OUTS 0xf0,...
3. VMM: VMM sends packet on virtual bridge to device driver of physical NIC, e.g. OUTS 0x280,...
4. Device Driver: NIC device driver launches packet on network using wire signals (to network)
Virtual Network
Virtual and Physical NIC different
Special case: virtual network
[Figure: send and receive sequence between two VMs on the same platform]
1. Sending user (user mode): user sends message to local virtual machine, e.g. using send()
2. Sending OS: OS converts into I/O instructions, e.g. OUTS 0xf0,...
3. VMM (privileged mode): VMM sends packet on virtual bridge to device driver of physical NIC, e.g. OUTS 0x280,...
4. NIC device driver converts send message to a receive message for receiving VM. No wire signals are generated.
5. VMM raises interrupt in receiver’s OS
6. OS on VM 2: interrupt handler in OS generates I/O instructions to receive packet
7. User on VM 2: receiver gets packet
Case Study: Intel VT-x (Vanderpool)
x86 Virtualization Extensions recently announced
by Intel
New VMX mode
• Two privilege levels: root and non-root
Root level
• Similar to conventional x86
• Plus new VMX instructions
• VMM runs in root level
Non-root level
• Limited control of resources
• Including when in ring 0
• Guest OS plus apps runs in non-root level
VT-x Operation
Transition from normal mode to VMX root mode via vmxon
instruction
VMM in root level, sets up the environment for each VM and
initiates the virtual machine via vmlaunch instruction
Attempts to modify resource cause return to root level
Explicit vmcall causes return to root mode
vmresume instruction causes return to guest in non-root mode
vmxoff instruction causes exit from VMX mode
[Figure: timeline. vmxon takes the processor from regular mode into root mode (VMM). vmlaunch starts VM1 and later VM2 in non-root mode; each VM exit returns to root mode, and vmresume re-enters VM1 or VM2. vmxoff returns the processor to regular mode.]
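The mode transitions on this slide form a small state machine, sketched below. `VmxCpu` is a hypothetical model, not an interface to real hardware: it only tracks which mode the processor is in and rejects instructions issued in the wrong mode, treating `vmlaunch` as the first entry into a VM and `vmresume` as any later entry.

```python
class VmxCpu:
    def __init__(self):
        self.mode = "regular"       # "regular", "root", or "non-root"
        self.running_vm = None
        self.launched = set()

    def _require(self, mode, op):
        if self.mode != mode:
            raise RuntimeError(f"{op} is not valid in {self.mode} mode")

    def vmxon(self):
        self._require("regular", "vmxon")
        self.mode = "root"

    def vmlaunch(self, vm):
        self._require("root", "vmlaunch")
        if vm in self.launched:
            raise RuntimeError("already launched; use vmresume")
        self.launched.add(vm)
        self.mode, self.running_vm = "non-root", vm

    def vmresume(self, vm):
        self._require("root", "vmresume")
        if vm not in self.launched:
            raise RuntimeError("not launched yet; use vmlaunch")
        self.mode, self.running_vm = "non-root", vm

    def vm_exit(self):
        """Models any exit: an attempt to modify a resource, or an explicit vmcall."""
        self._require("non-root", "VM exit")
        self.mode, self.running_vm = "root", None

    def vmxoff(self):
        self._require("root", "vmxoff")
        self.mode = "regular"
```

Replaying the timeline from the figure (vmxon, vmlaunch VM1, exit, vmlaunch VM2, exit, vmresume, ..., vmxoff) ends with the model back in regular mode.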
VT-x Capabilities
Root mode eliminates need to run all guest code in user
mode
• VMM runs in root mode
• For code regions with no critical instructions, HW is as efficient as normal machine
VT-x HW maps state-holding data elements directly to native structures during VM execution.
• VMCS (virtual machine control structure) encapsulates VM state
• HW implementation can take over loading and unloading state
• No need for VMM to perform load/stores of state info.
Eliminates the need for paravirtualization
• Allows standard versions of OSes to be used as guests
• The vmcall instruction can be used to pass hints and data to the VMM if desired
VMCS
Can be implemented by HW or SW in root mode
• VMM is implementation-dependent
Aligned on 4KB boundary
Pointed to by VMPTR
• Load VMPTR with vmptrld instruction
• Read VMCS with vmread; write VMCS with vmwrite
VMCS contents:
State Area
  Guest State: Register State, Interruptibility State
  Host State: Register State
Control Area
  VM Execution Controls: Pin-based Execution Controls, Processor-based Execution Controls, Bitmap Fields, etc.
  VM Exit Controls: Control Bitmap, MSR Controls
  VM Entry Controls: Control Bitmap, MSR Controls, Controls for Event Injection
VM Exit Information
  Basic Information: VM-Exit Information, Vectoring Event Information
  Other Exit Information: Due to Event Delivery, Due to Instruction Execution
Critical Instructions
Programmable VM exit conditions given in VMCS
• E.g., which instructions should cause exit to VMM
Example: Read Time Stamp Counter (RDTSC)
• Contained in 64-bit MSR -- IA32_TIME_STAMP_COUNTER
• Works in any mode if TSD bit in control register 4 is off
• Otherwise works only in Ring 0; in other rings it traps (general-protection exception)
RDTSC
[Flowchart: rdtsc instruction encountered]
1. Machine in VMX mode?
   No: perform normal operation.
2. RDTSC exiting bit is set in VMCS?
   Yes: save exit information, exit VM, return control to VMM.
3. TSD bit of CR4 is set in VM?
   Yes: Ring 0 operation?
      No: protection exception. Save exit information, exit VM, return control to VMM.
4. Use TSC Offsetting bit is set in VMCS?
   Yes: add TSC offset to time-stamp counter value. Return sum.
   No: return timestamp counter value.
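The flowchart can be followed step by step in a small function. This is a sketch of the slide's decision sequence only: the function name, its keyword parameters, and the tuple return values are assumptions, and "normal operation" outside VMX mode is modeled as the CR4.TSD and ring check described on the previous slide.

```python
MASK64 = (1 << 64) - 1      # the time-stamp counter is a 64-bit value


def rdtsc(tsc, *, vmx_non_root, ring=0, cr4_tsd=False,
          rdtsc_exiting=False, use_tsc_offsetting=False, tsc_offset=0):
    """Return ("value", n), ("exception", None), or ("vm_exit", reason)."""
    restricted = cr4_tsd and ring != 0          # TSD set and not in ring 0
    if not vmx_non_root:
        # Normal operation: value unless TSD forbids it in this ring.
        return ("exception", None) if restricted else ("value", tsc)
    if rdtsc_exiting:
        return ("vm_exit", "rdtsc")             # VMM asked to see every RDTSC
    if restricted:
        return ("vm_exit", "protection exception")
    if use_tsc_offsetting:
        return ("value", (tsc + tsc_offset) & MASK64)
    return ("value", tsc)
```

With offsetting enabled, a guest that was paused can be shown a counter that omits the time it was not running, without any exit to the VMM.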