Disco: Running Commodity Operating
Systems on Scalable Multiprocessors
Edouard Bugnion et al.
Madhura S Rama
Agenda
Goal and Objective
The Problem
Virtual Machine Monitor
Disco – VMM
Experimental Results
Related Work
Conclusion
Goal and Objective
Extend modern OSes to run efficiently on shared-memory multiprocessors without large changes to the OS.
Use Disco, a virtual machine monitor that can run multiple copies of a commodity OS (IRIX) on a multiprocessor.
Problem
Scalable shared-memory multiprocessors are now widely available in the market.
System software for these machines has trailed behind.
Extensive OS modifications (partitioning the system, building a single system image, fault containment, ccNUMA memory management) are necessary to support scalable machines – resource intensive.
High cost and reliability issues.
Disco
A prototype designed to run on FLASH (developed at Stanford), an experimental ccNUMA machine.
Disco combines commodity OSes not designed for large-scale shared-memory multiprocessors to form a high-performance system software base.
It is a software layer inserted between the hardware and the OS.
Virtualizes the hardware so that multiple OSes can run concurrently.
ccNUMA Architecture
Provides a single memory image – all memory logically belongs to one shared address space.
Because memory is physically distributed, access time is not uniform – Non-Uniform Memory Access (NUMA).
Caches must be kept coherent across nodes – cache-coherent NUMA (ccNUMA).
Virtualization
Pure:
Present abstracted hardware.
Compile code to the abstracted hardware; compilation is not required if the hardware is abstracted properly – binary compatibility is sufficient.
Interpret code to run on the real hardware.
Efficient:
Requires two privilege levels.
User-mode programs run directly on the hardware.
Privileged instructions are intercepted and emulated by the VMM (sketched below).
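A minimal sketch of the "efficient" style in C (all names are hypothetical, not taken from Disco's source): user code runs natively, while a trapped privileged instruction is emulated against the vCPU's shadow copy of the privileged register rather than the real one.

#include <stdint.h>
#include <stdio.h>

/* Shadow copy of privileged state the monitor keeps per virtual CPU. */
struct vcpu_priv {
    uint64_t status;            /* virtual status register */
};

enum priv_op { READ_STATUS, WRITE_STATUS };

/* Invoked when a privileged instruction traps into the monitor:
   emulate it against the shadow registers, never the real ones. */
static uint64_t emulate_priv(struct vcpu_priv *v, enum priv_op op, uint64_t arg)
{
    switch (op) {
    case READ_STATUS:  return v->status;
    case WRITE_STATUS: v->status = arg; return 0;
    }
    return 0;
}

int main(void)
{
    struct vcpu_priv v = { 0 };
    emulate_priv(&v, WRITE_STATUS, 0x1);   /* the guest's trapped write */
    printf("virtual status = 0x%llx\n",
           (unsigned long long)emulate_priv(&v, READ_STATUS, 0));
    return 0;
}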
Virtual Machine Monitor
A software layer between the hardware and the OS.
Virtualizes all hardware resources.
Allows multiple OSes to coexist on the same machine.
VMs communicate using standard distributed protocols.
A small piece of code, built with minimal implementation effort.
Architecture of Disco
Advantages
By running multiple copies of an OS, the VMM handles the challenges of ccNUMA machines:
Scalability – only the monitor and the distributed protocols need to scale to the size of the machine.
Fault containment – a system software failure is contained within its VM; the simplicity of monitors makes this easier.
Contd..
NUMA memory management issues – the VMM hides the entire problem from the OS through careful page placement, dynamic page migration, and page replication.
A single ccNUMA multiprocessor can run multiple OS versions concurrently – older versions provide a stable platform while newer versions are staged in.
Challenges of VMM
Overheads:
Execution of privileged instructions must be emulated by the VMM.
I/O devices are virtualized – requests must be intercepted and remapped by the VMM.
The code and data of each OS are replicated in the memory of each virtual machine.
The file system buffer cache is replicated in each OS.
Contd…
Resource management – the VMM may make poor resource management decisions because it lacks the information the OS has.
Communication and sharing – in a naive implementation, file sharing is not possible between different VMs of the same user; each VM acts as an independent machine on a network.
Disco Implementation
Runs multiple independent virtual machines concurrently on the same hardware.
Processors – Disco emulates all instructions, the MMU, and the trap architecture, allowing an unmodified OS to run on a VM.
Physical memory – provides an abstraction of main memory as a contiguous physical address space starting at address 0.
I/O devices – all I/O devices are virtualized; Disco intercepts all device communication to emulate or translate the operation.
Disco Implementation
Disco's small code size allows a high degree of tuning – its code is replicated in all memories so instruction fetches stay local.
Machine-wide data structures are partitioned so that the parts accessed by a single processor reside in memory local to that processor.
Virtual CPUs
Disco emulates the execution of a virtual CPU using direct execution on the real CPU – user applications run at the speed of the hardware.
Each virtual CPU has a data structure similar to a process table entry, containing saved registers and other state information (see the sketch below).
Disco maintains the privileged registers and TLB contents needed to emulate privileged instructions.
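A sketch of what such per-vCPU state might look like in C; the field names and the save/restore scheme are illustrative, not Disco's actual layout. Switching a vCPU onto a real CPU is then just a register save/restore, like an ordinary process context switch.

#include <stdint.h>
#include <string.h>

#define NGPRS     32
#define GUEST_TLB 64

/* One guest-visible TLB entry (virtual page + ASID -> frame + flags). */
struct tlb_entry { uint64_t hi, lo; };

/* Per-virtual-CPU state, analogous to a process table entry. */
struct vcpu {
    uint64_t gpr[NGPRS];              /* saved general-purpose registers */
    uint64_t pc;                      /* program counter at last trap    */
    uint64_t status, cause, entryhi;  /* shadow privileged registers     */
    struct tlb_entry tlb[GUEST_TLB];  /* guest TLB contents              */
    int kernel_mode;                  /* guest believes it is in kernel  */
};

static void vcpu_switch(struct vcpu *from, struct vcpu *to,
                        uint64_t hw_regs[NGPRS])
{
    memcpy(from->gpr, hw_regs, sizeof from->gpr);  /* save outgoing vCPU */
    memcpy(hw_regs, to->gpr, sizeof to->gpr);      /* load incoming vCPU */
}

int main(void)
{
    static struct vcpu a, b;          /* static: zero-initialized */
    uint64_t hw[NGPRS] = { 0 };
    vcpu_switch(&a, &b, hw);
    return 0;
}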
Virtual Physical Memory
Maintains a mapping from each VM's physical addresses to the (40-bit) machine addresses.
When the OS tries to insert a virtual-to-physical mapping into the TLB, Disco emulates the instruction and substitutes the machine address for that physical address; subsequent accesses incur no overhead.
Each VM has a pmap containing one entry per physical page (see the sketch below).
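A sketch of the extra translation level in C, under the assumption of a simple array-backed pmap (names are hypothetical):

#include <stdint.h>
#include <stdio.h>

#define VM_PAGES 256

/* pmap: one entry per guest physical page, giving the backing machine
   page (machine addresses are 40-bit on FLASH). */
static uint64_t pmap[VM_PAGES];

/* The guest OS traps while inserting a virtual->physical mapping; the
   monitor rewrites it as virtual->machine before touching the real TLB. */
static uint64_t tlb_insert(uint64_t vpn, uint64_t ppn)
{
    uint64_t mpn = pmap[ppn];       /* physical -> machine translation */
    /* ...program the hardware TLB with (vpn -> mpn)... */
    (void)vpn;
    return mpn;
}

int main(void)
{
    pmap[5] = 4242;   /* page 5 of this VM lives in machine page 4242 */
    printf("machine page = %llu\n", (unsigned long long)tlb_insert(0x10, 5));
    return 0;
}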
Contd..
Kernel-mode references on MIPS processors access memory and I/O directly, bypassing the TLB – the OS code and data must be relinked into a mapped address space.
MIPS tags each TLB entry with an address space identifier (ASID).
ASIDs are not virtualized, so the TLB must be flushed on VM context switches rather than only on MMU context switches.
The resulting increase in TLB misses motivates a second-level software TLB (sketched below).
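A minimal sketch of a second-level software TLB in C, assuming a direct-mapped cache of recent virtual-to-machine translations (sizes and names are illustrative). It is consulted on a hardware TLB miss before falling back to the far more expensive emulation of the guest's own refill handler.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define L2_SIZE 1024u                      /* power of two */

struct l2_entry { uint64_t vpn, mpn; uint32_t asid; bool valid; };
static struct l2_entry l2tlb[L2_SIZE];

static bool l2_lookup(uint64_t vpn, uint32_t asid, uint64_t *mpn)
{
    struct l2_entry *e = &l2tlb[vpn & (L2_SIZE - 1)];
    if (e->valid && e->vpn == vpn && e->asid == asid) {
        *mpn = e->mpn;
        return true;                       /* hit: refill the hardware TLB */
    }
    return false;                          /* miss: emulate guest handler */
}

static void l2_insert(uint64_t vpn, uint32_t asid, uint64_t mpn)
{
    l2tlb[vpn & (L2_SIZE - 1)] =
        (struct l2_entry){ .vpn = vpn, .mpn = mpn, .asid = asid, .valid = true };
}

int main(void)
{
    uint64_t mpn;
    l2_insert(7, 1, 99);
    if (l2_lookup(7, 1, &mpn))
        printf("hit: machine page %llu\n", (unsigned long long)mpn);
    return 0;
}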
NUMAness
Cache misses should be satisfied from local memory whenever possible to avoid remote-access latency.
Disco implements dynamic page migration and replication.
Read-shared pages are replicated; write-shared pages are not.
The migration and replication policy is driven by cache-miss counting (a policy sketch follows below).
Memmap – contains an entry for each real machine memory page; used during TLB shootdowns.
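A sketch of such a miss-counting policy in C; the thresholds, node count, and names are invented for illustration, not Disco's actual parameters. A hot remote page is migrated when one node dominates the misses, replicated when it is read-shared, and left alone when write-shared.

#include <stdint.h>
#include <stdio.h>

#define NODES 4
#define HOT   64    /* total misses before a page counts as "hot" */

enum action { LEAVE, MIGRATE, REPLICATE };

struct page_stats {
    uint32_t miss[NODES];  /* per-node cache-miss counters */
    int write_shared;      /* write-shared pages cannot be replicated */
};

static enum action numa_policy(const struct page_stats *p, int home)
{
    uint32_t total = 0, best = 0;
    int hottest = home;
    for (int n = 0; n < NODES; n++) {
        total += p->miss[n];
        if (p->miss[n] > best) { best = p->miss[n]; hottest = n; }
    }
    if (total < HOT || hottest == home) return LEAVE;
    if (best > total * 3 / 4)           return MIGRATE;   /* one node dominates */
    if (!p->write_shared)               return REPLICATE; /* read-shared */
    return LEAVE;                                         /* write-shared */
}

int main(void)
{
    struct page_stats p = { .miss = { 2, 90, 3, 1 }, .write_shared = 0 };
    printf("action = %d\n", numa_policy(&p, 0));  /* node 1 dominates -> MIGRATE */
    return 0;
}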
Transparent Page Replication
Virtual I/O Devices
The monitor intercepts all device accesses.
A device used by only a single VM does not require full virtualization – the monitor only needs to assure exclusive access.
Interposing on all DMA requests lets Disco share disk and memory resources among virtual machines and allows VMs to communicate with each other (see the DMA sketch below).
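A sketch of the DMA interposition point in C, reusing the hypothetical pmap from earlier: the guest's physical DMA address is rewritten to a machine address before the device is started, which is also where sharing opportunities can be detected.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)
#define VM_PAGES   256

static uint64_t pmap[VM_PAGES];   /* guest physical page -> machine page */

/* Every DMA request is intercepted; the guest's physical address is
   translated to a machine address before the device sees it. */
static uint64_t dma_translate(uint64_t guest_paddr)
{
    uint64_t mpn = pmap[guest_paddr >> PAGE_SHIFT];
    return (mpn << PAGE_SHIFT) | (guest_paddr & PAGE_MASK);
}

int main(void)
{
    pmap[1] = 7;
    printf("machine addr = 0x%llx\n",
           (unsigned long long)dma_translate(0x1abc));  /* -> 0x7abc */
    return 0;
}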
Copy-on-write Disks
Disk reads can be serviced by the monitor; if the request size is a multiple of the machine page size, the monitor only has to remap machine pages into the VM's physical memory address space.
Remapped pages are read-only; an attempt to modify one generates a copy-on-write fault (handled as in the sketch below).
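A sketch of the copy-on-write fault path in C, assuming a simple reference-counted machine page (the structure and names are invented): the faulting VM gets a private copy while the other VMs keep sharing the original.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE 4096

struct mpage {
    uint8_t data[PAGE];
    int refs;             /* how many VMs map this machine page */
};

/* Write fault on a read-only shared page: give the faulting VM a
   private copy and drop its reference to the shared page. */
static struct mpage *cow_fault(struct mpage *shared)
{
    struct mpage *copy = malloc(sizeof *copy);
    if (!copy) abort();
    memcpy(copy->data, shared->data, PAGE);
    copy->refs = 1;
    shared->refs--;       /* other VMs keep sharing the original */
    return copy;          /* remap the VM's page to this private copy */
}

int main(void)
{
    struct mpage *shared = calloc(1, sizeof *shared);
    if (!shared) abort();
    shared->refs = 2;                     /* two VMs share one disk page */
    struct mpage *priv = cow_fault(shared);
    printf("shared refs = %d\n", shared->refs);
    free(priv);
    free(shared);
    return 0;
}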
Virtual Network Interface
OS Changes
Minor changes to relink the kernel code and data segment into mapped memory (unique to the MIPS architecture).
Disco uses IRIX's original device drivers.
Added code to the HAL to pass hints about physical memory use to the monitor:
Requesting a zeroed page, reclaiming unused memory (a hint sketch follows below).
Changed the mbuf freelist data structure.
Calls to bcopy use a remap function in the HAL.
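A sketch of what such HAL hints could look like in C; the hint interface here is entirely hypothetical (in the real system the hints reach Disco through trapped HAL operations), but it illustrates why they help: the monitor can zero a node-local page itself, and can reclaim the machine memory behind pages the guest no longer uses.

#include <stdint.h>

enum hint { HINT_ZERO_PAGE, HINT_PAGE_UNUSED };

/* Stand-in for a trap into the monitor; in the real system this would
   be an operation that Disco intercepts. */
static void monitor_hint(enum hint h, uint64_t ppn)
{
    (void)h; (void)ppn;
}

/* HAL routine: instead of zeroing the page itself, the kernel tells the
   monitor, which can zero a page of node-local memory. */
static void hal_request_zeroed_page(uint64_t ppn)
{
    monitor_hint(HINT_ZERO_PAGE, ppn);
}

/* Marking a page unused lets the monitor reclaim the backing machine
   memory rather than keeping a private copy per VM. */
static void hal_page_freed(uint64_t ppn)
{
    monitor_hint(HINT_PAGE_UNUSED, ppn);
}

int main(void)
{
    hal_request_zeroed_page(42);
    hal_page_freed(42);
    return 0;
}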
Experimental Results
Disco targets the FLASH machine; since FLASH was unavailable, the SimOS simulator was used to develop and evaluate Disco.
SimOS slowdowns prevented the examination of long-running workloads.
Using short workloads, issues such as CPU and memory overhead, scalability, and NUMA memory management were studied.
Execution Overhead
Experiments ran on a uniprocessor, once with IRIX directly on the hardware and once with Disco running IRIX in a single virtual machine.
Overhead ranges from 3% to 16%, mainly due to additional TLB misses.
Memory Overhead
Ran a single workload of eight instances of pmake under six different system configurations.
Effective sharing of kernel text and the buffer cache limits the memory overhead of multiple VMs.
Scalability
Ran the pmake workload under six configurations.
IRIX suffers from high synchronization overheads; running a single VM adds high overhead.
Partitioning the workload across eight VMs reduced execution time to 60% of IRIX's.
NUMA
The performance of a UMA machine sets the lower bound for the execution time on a NUMA machine.
Disco achieves a significant performance improvement by enhancing memory locality.
Related Work
System software for scalable shared
memory machines
Virtual Machine monitors
Other system software structuring
techniques
ccNUMA memory management
Conclusion
System software for scalable shared-memory multiprocessors can be developed without massive implementation effort.
Experimental results show that the overhead of virtualization is modest in both processing time and memory footprint.
Disco provides a simple solution to the scalability, reliability, and NUMA management issues.