W10-OPT Virtualization


Software Architecture
in Practice
Docker in a few slides
Moving the boundary
Traditional VMMs
CS@AU
Docker
Henrik Bærbak Christensen
Images and Containers
• Core concepts in Docker
– Image
The encapsulation of a VM
• I.e. the physical file that contains the VM
• Similar to a Java Jar file
– Container
The executing instance of an image
• Similar to an executing Java system, running the main() from the Jar
file
– Docker Engine: the VMM program on Linux
• That handles images and executes containers
Containers
• Container
– Application running on a slice/view of a (shared) OS
• Linux LXC technology extended
– namespaces (isolation) provide an isolated share of OS
resources
– cgroups (configuration) provide resource management of OS
resources (RAM, CPU, …)
Images
• Onion file system: Copy-on-Write
– Every operation basically creates a new file layer
• Changing ‘hans.txt’ in layer N creates a (modified) copy of ‘hans.txt’
in layer N+1
• Base images = ‘prebaked file system’
– All layers up to N form an image
• E.g. henrikbaerbak/cloudarch:e16.1
– Ubuntu 16.04 LTS server base image
– Java, Ant, Ivy, Git, …
are all layered on top
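The copy-on-write layering can be sketched as a small model in C. This is only an illustration under stated assumptions: the types `image`, `layer`, `file_entry` and the functions `layer_read` and `layer_write` are hypothetical names for this sketch, not Docker's actual storage driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LAYERS 8
#define MAX_FILES  8

/* One entry in a layer: a file name and its content (hypothetical model). */
struct file_entry { const char *name; const char *content; };

/* A layer holds only the files that were added or changed in it. */
struct layer { struct file_entry files[MAX_FILES]; int nfiles; };

/* An image is the stack of layers 0..nlayers-1; the top layer wins. */
struct image { struct layer layers[MAX_LAYERS]; int nlayers; };

/* Read: search from the top layer down and return the first match. */
const char *layer_read(const struct image *img, const char *name)
{
    for (int l = img->nlayers - 1; l >= 0; l--) {
        const struct layer *ly = &img->layers[l];
        for (int f = 0; f < ly->nfiles; f++)
            if (strcmp(ly->files[f].name, name) == 0)
                return ly->files[f].content;
    }
    return NULL; /* not present in any layer */
}

/* Write: never touch lower layers; the modified copy goes into a new
   top layer. Returns the index of the layer that was created. */
int layer_write(struct image *img, const char *name, const char *content)
{
    assert(img->nlayers < MAX_LAYERS);
    struct layer *top = &img->layers[img->nlayers];
    top->nfiles = 0;
    top->files[top->nfiles++] = (struct file_entry){ name, content };
    return img->nlayers++;
}
```

Writing ‘hans.txt’ twice leaves the first version untouched in layer 0 and puts the modified copy in layer 1; a read then finds the layer 1 copy first.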
Building Images
• How do you build a traditional server?
– Unbox the machine, power up, install Linux, install application
suite and libraries, execute server software
• Lifecycle - classic
– container = instantiate(image1) (‘Power up’)
• docker run …
– modify container (‘Install your app’)
• Install software, change files, add stuff, …
– commit container → image2 (‘Freeze’ the machine)
• docker commit
Building Images
• Lifecycle – infrastructure-as-code
– You automate the install script: Dockerfile
• Example: henrikbaerbak/cloudarch:e16.1
DevOps
• DevOps is about speed and agility in going from Dev to
Ops
• Dockerfiles are one piece of this puzzle: installing the
software on a server is coded in a programming
language
Docker Hub
• How to share ‘nice’ images with colleagues?
• The GitHub / Maven Repo movement
– ‘push’ your commits to a cloud-based storage service
• Mvnrepository / github / bitbucket …
– ‘pull/clone’ from there
• Docker Hub
– Register as a user (free)
– Push your image to Docker Hub
– Done!
One private repository per user. Use that for your SkyCave images!
Networking
• Distributed systems rely on networking!
• By default, network is an isolated resource in Docker!
– Ten Apache web servers, all listening on port 80, on the same
machine!
• Two core technologies
– Port forwarding
• For ‘exposing’ container services to the outside
– Docker network drivers
• For ‘binding’ container services together securely
Port Forwarding
• docker run -p 7777:6745 …
– Bind port 6745 in the container to the host’s external port 7777
• Makes the Docker service act as if it were deployed on the host
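The host:container order of the `-p` argument can be illustrated with a tiny lookup table in C. This is a hedged model only; `port_rule` and `forward_port` are hypothetical names and say nothing about how Docker implements forwarding internally.

```c
#include <assert.h>

/* One "-p host:container" rule (hypothetical model, not Docker's code). */
struct port_rule { int host_port; int container_port; };

/* Return the container port that traffic to host_port is forwarded to,
   or -1 when no rule publishes that host port. */
int forward_port(const struct port_rule *rules, int nrules, int host_port)
{
    assert(nrules >= 0);
    for (int i = 0; i < nrules; i++)
        if (rules[i].host_port == host_port)
            return rules[i].container_port;
    return -1; /* no rule: the service stays isolated in the container */
}
```

With the single rule {7777, 6745}, a request to host port 7777 reaches container port 6745, while host port 6745 itself is not published.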
Network drivers
• Any machine has several network interfaces
– Linux: ‘lo’ = Local Loopback, ‘ens32’ = Ethernet, …
• Docker will create new networks and attach containers to
them
– By default they are not shared among containers
• docker run --network=container:daemon (image) cmd
– This container will now reuse the network of container named
‘daemon’, i.e. they can communicate!
• Other options are
– --network=host: reuse host’s network
– --network=my-network: use named network
Software Architecture
in Practice
Building VMMs
Disclaimer
• Modern processors are much more virtualization
enabled, so some of these technologies are in less use
today…
– Follow up on hardware status is pending…
Definition
• [Rosenblum & Garfinkel 2005]
– A CPU architecture is virtualizable if it supports the basic VMM
technique of direct execution—executing the virtual machine on
the real machine, while letting the VMM retain ultimate control of
the CPU.
• That is, in a ‘same ISA’ setting, the instructions of a guest OS
should be executed directly by the hardware
– Why? Performance of course!
– E.g. the Java VM imposes a performance penalty
Challenges
• The problem is that not all instructions are possible to
execute directly!
• To see the problem we have to dig a bit into modern CPU
architectures
– I stopped around Z80 and Motorola 68000 :-)
– 80286:
• First with protected mode…
• Supports multitasking, process protection, memory mgt., …
A few CPU terms
• Supervisor mode/Privileged mode:
– “An execution mode on some processors which enables
execution of all instructions, including privileged instructions. It
may also give access to a different address space, to memory
management hardware and to other peripherals. This is the mode
in which the operating system usually runs.” (Wikipedia)
Supervisor/User mode
• x86 architecture
– Ring 0: Kernel mode
• Can do anything
• OS and device drivers
– Ring 3: User mode
• Limited Instruction Set
• Apps can fail at any time without impact on rest of system!
• User applications
• User apps must make a system call (call the OS) to interact with
hardware, e.g. through device drivers...
Performance
• Switching from “user mode” to “kernel mode” is, in most
existing systems, very expensive. It has been measured,
on the basic request getpid, to cost 1000-1500 cycles on
most machines.
Trapping
• What happens if code in user mode executes a privileged
instruction?
– In computing and operating systems, a trap is a type of
synchronous interrupt typically caused by an exceptional
condition (e.g. division by zero or invalid memory access) in a
user process. A trap usually results in a switch to kernel mode,
wherein the operating system performs some action before
returning control to the originating process.
Virtualization Requirements
• If VMware is an app running in Windows, and it runs the
Linux OS “inside”, it follows that
– Privileged instructions have to run in user mode!
• VMM: Virtual Machine Monitor
– VMM runs in privileged mode
– All ‘inside’ runs in user mode
VMM and privileged instructions
• OK, so…
– Linux runs in user mode inside a VMM
– Now the Linux kernel disables interrupts (CLI)
• CLI is a privileged instruction and is not allowed in user
mode
• So – what can we do about that?
– Emulation
– Trap-and-Emulate
– Binary Translation
Emulation Example: CPUState
static struct {
    uint32 GPR[16];
    uint32 LR;
    uint32 PC;
    int    IE;
    int    IRQ;
} CPUState;

void CPU_CLI(void)
{
    CPUState.IE = 0;
}

void CPU_STI(void)
{
    CPUState.IE = 1;
}

CLI = Clear Interrupt Flag
STI = Set Interrupt Flag
IE = Interrupt Enabled
• Goal for CPU virtualization techniques
– Process normal instructions as fast as possible
– Forward privileged instructions to emulation routines
Instruction Interpretation
• Emulate Fetch/Decode/Execute pipeline in
software
• Positives
– Easy to implement
– Minimal complexity
• Negatives
– Slow!
Example: Virtualizing the Interrupt Flag
w/ Instruction Interpreter
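void CPU_Run(void)
{
    while (1) {
        /* Fetch the next guest instruction and advance the virtual PC */
        inst = Fetch(CPUState.PC);
        CPUState.PC += 4;
        /* Decode and execute; rd, rn, rm are register fields of inst */
        switch (inst) {
        case ADD:
            CPUState.GPR[rd] = CPUState.GPR[rn] + CPUState.GPR[rm];
            break;
        …
        case CLI:
            CPU_CLI();
            break;
        case STI:
            CPU_STI();
            break;
        }
        /* Deliver a pending interrupt only when interrupts are enabled */
        if (CPUState.IRQ && CPUState.IE) {
            CPUState.IE = 0;
            CPU_Vector(EXC_INT);
        }
    }
}

void CPU_CLI(void)
{
    CPUState.IE = 0;
}

void CPU_STI(void)
{
    CPUState.IE = 1;
}

void CPU_Vector(int exc)
{
    CPUState.LR = CPUState.PC;   /* save return address */
    CPUState.PC = disTab[exc];   /* jump to the handler for exc */
}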
Trap and Emulate
[Figure: Guest OS + Applications run unprivileged on top of the privileged Virtual Machine Monitor. Labels on the boundary: Page Fault, Undef Instr, vIRQ. Components inside the VMM: MMU Emulation, CPU Emulation, I/O Emulation.]
The protocol…
• The Linux kernel code contains CLI
– As the guest runs in user mode, the CPU traps and transfers control to the VMM
– VMM sets an internal flag recording that interrupts are disabled
• Interrupts are now not delivered to the guest OS until it executes STI
• which again traps, and the VMM clears the internal flag!
• From that time on, interrupts are again delivered.
• Trap-and-Emulate:
– All user mode instructions execute full speed
– Privileged instructions are trapped and emulated…
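The protocol can be sketched as a small C model of the VMM side. This is a sketch under stated assumptions: `vcpu`, `guest_step` and `vmm_raise_irq` are hypothetical names, the trap is simulated by a plain function call, and the flag is stored as "interrupts enabled" (like `CPUState.IE` earlier) rather than as an "interrupts disabled" flag.

```c
#include <assert.h>

/* Hypothetical guest instruction kinds for this sketch. */
enum instr { INSTR_ADD, INSTR_CLI, INSTR_STI };

/* Virtual CPU state kept by the VMM for one guest. */
struct vcpu {
    int ie;          /* virtual interrupt-enable flag          */
    int pending_irq; /* an interrupt arrived while ie was 0    */
    int delivered;   /* interrupts handed to the guest so far  */
    int traps;       /* number of times the VMM was entered    */
};

/* Deliver a pending interrupt only if the guest has interrupts enabled. */
static void vmm_try_deliver(struct vcpu *v)
{
    if (v->pending_irq && v->ie) {
        v->pending_irq = 0;
        v->delivered++;
    }
}

/* A hardware interrupt arrives for the guest. */
void vmm_raise_irq(struct vcpu *v)
{
    v->pending_irq = 1;
    vmm_try_deliver(v);
}

/* Run one guest instruction: unprivileged ones run "directly",
   privileged ones trap into the VMM and are emulated there. */
void guest_step(struct vcpu *v, enum instr in)
{
    switch (in) {
    case INSTR_ADD:   /* direct execution, no VMM involvement */
        break;
    case INSTR_CLI:   /* trap: emulate by clearing the virtual flag */
        v->traps++;
        v->ie = 0;
        break;
    case INSTR_STI:   /* trap: set the flag, then deliver what was held back */
        v->traps++;
        v->ie = 1;
        vmm_try_deliver(v);
        break;
    default:
        assert(!"unknown instruction");
    }
}
```

An interrupt raised between CLI and STI stays pending and is delivered only when the guest executes STI; the ADD never enters the VMM.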
Issues with Trap and Emulate
• Not all architectures support it
• Trap costs may be high
– Cf. performance measurements earlier
Challenges…
• The x86 is not virtualizable :-(
– i.e. there are privileged instructions that do not trap!
• Ex.
– POPF = pop CPU flags from stack
• The flags contain the interrupt flag bit, so the guest can try to enable/disable
interrupts without the VMM being notified! (In user mode the change to the
interrupt flag is silently dropped instead of trapping.)
• Requires advanced techniques beyond trap-and-emulate to cope…
– Para Virtualization
– Binary Translation
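The POPF problem can be made concrete with a simplified C model of the flags register. This is an illustrative sketch, not real x86 semantics in full: `cpu`, `exec_cli` and `exec_popf` are hypothetical names, and only two flag bits and two rings are modelled.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_CF 0x001u  /* carry flag, bit 0 */
#define FLAG_IF 0x200u  /* interrupt flag, bit 9 of x86 EFLAGS */

struct cpu {
    int      ring;   /* 0 = kernel mode, 3 = user mode */
    uint32_t flags;  /* simplified flags register */
    int      traps;  /* how often control reached the VMM */
};

/* CLI is privileged: in user mode it traps, so the VMM gets to see it. */
void exec_cli(struct cpu *c)
{
    if (c->ring != 0) { c->traps++; return; }
    c->flags &= ~FLAG_IF;
}

/* POPF in this simplified model: in user mode the IF bit of the popped
   value is silently ignored and no trap is raised. */
void exec_popf(struct cpu *c, uint32_t popped)
{
    assert(c->ring == 0 || c->ring == 3);
    if (c->ring != 0)
        popped = (popped & ~FLAG_IF) | (c->flags & FLAG_IF);
    c->flags = popped;
}
```

A guest kernel demoted to ring 3 that pops flags with IF cleared sees the other flags change, but IF keeps its old value and the trap counter does not move, so a trap-and-emulate VMM never learns what the guest intended.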
Para Virtualization
• Para virtualization = replacing non-virtualizable portions
of the original instruction set with easily virtualizable and
more efficient equivalents.
– :-( OS code has to be ported!
– :-) User apps run unmodified.
• Ex. Disco: Change MIPS interrupt instruction to
read/write of special memory location in the VMM
– :-) More efficient than if handled by trapping
– :-( IRIX had to be ported…
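The idea of replacing a privileged instruction with a read/write of a special memory location can be sketched in C as follows. The layout and the names `shared_page`, `guest_disable_irq`, `guest_enable_irq` and `vmm_on_irq` are assumptions made for this illustration and do not reproduce Disco's actual interface.

```c
#include <assert.h>

/* Memory shared between the ported guest kernel and the VMM
   (hypothetical layout for this sketch). */
struct shared_page { volatile int irq_enabled; };

/* VMM bookkeeping for one guest. */
struct vmm { struct shared_page *page; int delivered; };

/* Ported guest kernel: a plain store replaces the privileged
   instruction, so no trap into the VMM is needed. */
void guest_disable_irq(struct shared_page *p) { p->irq_enabled = 0; }
void guest_enable_irq(struct shared_page *p)  { p->irq_enabled = 1; }

/* VMM side: consult the shared flag when an interrupt arrives.
   Returns 1 if delivered, 0 if it must be held back. */
int vmm_on_irq(struct vmm *m)
{
    assert(m->page != 0);
    if (!m->page->irq_enabled)
        return 0;
    m->delivered++;
    return 1;
}
```

The cost moves from a trap per enable/disable to an ordinary memory access, which is why the guest OS source has to be changed while user applications are unaffected.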
Binary Translation
• Basic idea:
– Read code blocks before the CPU gets them
– For each instruction classify
• “Ident”: copy directly to translation cache (TC)
• “Inline”: replace instruction by inline equivalent instructions and copy
these to TC
• “Callouts”: replace instruction by call to emulation code in VMM
– Let CPU execute contents of translation cache instead of original
block
• TC: keep the translated block for future execution without
any translation
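The classify-then-cache loop can be sketched in C. This is a strongly simplified model under stated assumptions: instructions are abstract opcodes, the "translated code" is just the class assigned to each instruction, and `classify`, `tc_lookup`, `tcache` and the opcode names are hypothetical. CLI and STI are treated as inline here; a translator could equally handle them as callouts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guest opcodes for this sketch. */
enum gop { G_MOV, G_AND, G_CLI, G_STI, G_MOV_CR3, G_RET };

/* How each guest instruction is handled by the translator. */
enum tclass { T_IDENT, T_INLINE, T_CALLOUT };

enum tclass classify(enum gop op)
{
    switch (op) {
    case G_CLI: case G_STI:     return T_INLINE;  /* store to virtual IE flag */
    case G_MOV_CR3: case G_RET: return T_CALLOUT; /* call into VMM emulation  */
    default:                    return T_IDENT;   /* copied unchanged         */
    }
}

#define TC_SLOTS  4
#define BLOCK_MAX 16

/* One translated block, looked up by the guest address it came from. */
struct tc_entry {
    int valid;
    unsigned vpc;
    enum tclass code[BLOCK_MAX];
    size_t len;
};

struct tcache { struct tc_entry slot[TC_SLOTS]; int translations; };

/* Return the translated block for vpc, translating only on a cache miss. */
const struct tc_entry *tc_lookup(struct tcache *tc, unsigned vpc,
                                 const enum gop *block, size_t len)
{
    assert(len <= BLOCK_MAX);
    struct tc_entry *e = &tc->slot[vpc % TC_SLOTS];
    if (e->valid && e->vpc == vpc)
        return e;                   /* hit: reuse the earlier translation */
    for (size_t i = 0; i < len; i++)
        e->code[i] = classify(block[i]);
    e->valid = 1;
    e->vpc = vpc;
    e->len = len;
    tc->translations++;
    return e;
}
```

Looking up the same guest address twice translates the block once; the second lookup is served from the cache, which is where binary translation recovers its cost.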
Basic Blocks
Guest Code
vPC:   mov  ebx, eax
       cli
       and  ebx, ~0xfff
       mov  ebx, cr3
       sti
       ret
[Figure labels: Straight-line code, Control flow, Basic Block]
Binary Translation
Guest Code
vPC:   mov  ebx, eax
       cli
       and  ebx, ~0xfff
       mov  ebx, cr3
       sti
       ret
Translation Cache
start: mov  ebx, eax
       call HANDLE_CLI
       and  ebx, ~0xfff
       mov  [CO_ARG], ebx
       call HANDLE_CR3
       call HANDLE_STI
       jmp  HANDLE_RET
Binary Translation
Guest Code
vPC:   mov  ebx, eax
       cli
       and  ebx, ~0xfff
       mov  ebx, cr3
       sti
       ret
Translation Cache
start: mov  ebx, eax
       mov  [CPU_IE], 0
       and  ebx, ~0xfff
       mov  [CO_ARG], ebx
       call HANDLE_CR3
       mov  [CPU_IE], 1
       test [CPU_IRQ], 1
       jne  [target not recoverable from the slide]
       call HANDLE_INTS
       jmp  HANDLE_RET
CPU future…
Hypervisor mode
• Modern/Future CPU architectures
– Make a CPU with a Ring -1
– VMM runs in Ring -1, and can run the OS in Ring 0.
– Recent CPUs from Intel and AMD offer x86 virtualization
instructions for a hypervisor to control Ring 0 hardware access.
[…] a guest operating system can run Ring 0 operations natively
without affecting other guests or the host OS.