Transcript Document

The 27 Year Old Microkernel
Sebastien Marineau-Mes & Colin Burgess
Agenda
• Background and timeline
• Hybrid Software Model
• Anatomy of the microkernel
• How things work – system calls, process manager
• Q&A
All content copyright QNX Software Systems
A History of Software Innovation
1980: First commercially available microkernel OS
1982: First RTOS to support a hard disk on a PC
1985: First memory-protected RTOS
1990: First POSIX-certified RTOS
1992: First RTOS to offer built-in fault-tolerant networking
1994: US patent for scalable microkernel windowing system
1997: First RTOS to support symmetric multiprocessing (SMP)
2002: First RTOS vendor to deliver Eclipse-based IDE
2005: First to offer “bound” multiprocessing
2007: Introduces hybrid software model and opens source code
[Timeline axis: 1980 | QNX2 | 1985 | 1990 | QNX4 | 1995 | 2000 | QNX6 | 2005]
Hybrid Software Model
• Developer Enablement
  > Published source code – runtime components
  > Transparent development
    - QNX development teams working in the open
    - Live check-ins for features and bugfixes
• Community Enablement
  > Foundry 27 developer portal
  > Initial projects: OS, Tools, BSPs, Bazaar
• Business Enablement
  > Free access to development tools for non-commercial users and partners
  > Free access to source for development
    - Standard business model & pricing for commercial projects
  > Ability to create and distribute derivative works
  > Flexible contribution model
Microkernel Architecture
[Diagram: Process Manager, File System, Networking, Windowing, Multi-media and Application processes plugged into a Message Bus, with the microkernel (µK) underneath. CPU families shown: Arm, Mips, SH4, PowerPC, Xscale, X86.]
Microkernel + Process Manager are the only trusted components.
Applications and Drivers
Are processes which plug into a message bus
• Reside in their own memory-protected address space
• Have a well defined message interface
• Cannot corrupt other software components
• Can be started, stopped and upgraded on the fly
Separation of Duties – Process Manager vs. MicroKernel
Microkernel         Process Manager
-----------         ---------------
Messages            Pathname
Threads             Process
Synchronization     Virtual Memory
Scheduling          procfs
Signals             Debug
Timers              Resources
Channels            Loader
Connections         Named Sems
Interrupts          imagefs
(Both columns together make up procnto.)
Microkernel Services
• Simple pre-emptable operations
• Provides basic system services
  > Implements much of the POSIX thread and realtime standard
  > Interrupt and exception redirection
  > IPC primitives
• Most of the microkernel is hardware independent
  > CPU-dependent layer for low-level CPU interfaces
  > CPU-specific optimized routines
• Only piece of code that runs with full system privilege
• Microkernel does not run “on its own”
  > Only reacts to external events: system calls, interrupts, exceptions
[Sidebar: Messages, Threads, Synchronization, Scheduling, Signals, Timers, Channels, Connections, Interrupts]
Process Manager Services
• Implements long, complex operations and services
  > Ex: Process creation and memory management
• Is a multi-threaded process that is scheduled at normal priority
  > Competes for CPU with all other threads in the system
• Message driven server
  > More on this later
[Sidebar: Pathname, Process, Virtual Memory, procfs, Debug, Resources, Loader, Named Sem, imagefs]
Procnto Source Layout
/services/system/
    ker
    memmgr
    pathmgr     Pathname management
    procmgr     Process lifecycle management
    proc        Support functions
    CPU-specific subdirectories shown: arm, mips, ppc, sh, x86, …
Looking for source code? Go to www.foundry27.com -> Projects -> Core Operating System -> Source Guide
Kernel call operations sequence
[Diagram: stages of a kernel call, from kernel entry to kernel exit. Labels shown: Entry (interrupts off); Unlocked (interrupts on, pre-emptable); Kernel operation, which may include message pass; Locked (no pre-emption); Exit (interrupts off).]
Anatomy of a Kernel Call
• A user-mode thread makes a call to a system call stub located in libc
  > Ex: MsgSendvnc()
• The system call stub executes a TRAP instruction (or whatever instruction is appropriate for the particular hardware).

    MsgSendvnc:
        lw      $8,16($29)
        addiu   $2,$0,12
        syscall
        jr      $31
        nop

• The processor changes privilege state, interrupts are disabled
• Execution resumes at the appropriate vector
  > services/system/ker/<cpu>/kernel.s
  > One of the few pieces of code that is all assembly – see __ker_entry, __ker_sysenter

    /*
     * r4k_syscall_handler()
     *  Streamlined path for our most common operation--kernel calls
     */
    FRAME(r4k_syscall_handler,sp,0,ra)
        .set    noat
        /*
         * Coming from user mode. Save user registers, and get
         * a fresh kernel stack. Move GP to our own short data
         * area.
         */
        LD_ACTIVE_AND_KERSTACK(k0,k1)
        addiu   k0,k0,REG_OFF
        SAVE_REGS(1)
Kernel Entry
• System call entry
• Load kernel stack
• Save thread context (register set)
• Acquire the kernel lock
  • On uniprocessor, atomically set the INKERNEL_NOW bit
  • On SMP systems, spinlock on INKERNEL_NOW
• Enable interrupts
• Transfer to kernel call implementation
  • services/system/ker_call_table.c

    /*
     * r4k_syscall_handler()
     *  Streamlined path for our most common operation--kernel calls
     */
    FRAME(r4k_syscall_handler,sp,0,ra)
        /*
         * Coming from user mode. Save user registers, and get
         * a fresh kernel stack. Move GP to our own short data
         * area.
         */
        LD_ACTIVE_AND_KERSTACK(k0,k1)
        SAVE_REGS(1)
        ACQUIRE_KERNEL(INKERNEL_NOW,zero,1)
        /*
         * Interrupts are now OK again
         */
        STI
        /*
         * Kernel call number should still be intact in v0.
         * Save the kernel call number.
         */
        sw      v0,SYSCALL(s0)
    #if defined(VARIANT_instr)
        la      t1,_trace_call_table
    #else
        la      t1,ker_call_table
    #endif
        /*
         * Index the call table and run the C code
         */
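The entry steps on this slide (take the in-kernel flag, then index a call table and run the C handler) can be imitated in portable C. The following is only a user-space sketch under stated assumptions: `demo_kernel_entry`, `demo_call_table`, `INKERNEL_NOW_BIT` and the two demo handlers are hypothetical names invented for illustration, and the real entry path is the assembly shown above, not this code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the QNX source. */
#define INKERNEL_NOW_BIT 0x1u

typedef int (*kercall_fn)(void *args);

static atomic_uint inkernel_flags;      /* zero-initialized: nobody in the kernel */

static int demo_call_nop(void *args) { (void)args; return 0; }

static int demo_call_add(void *args) {
    int *pair = args;                   /* adds pair[1] into pair[0] */
    pair[0] += pair[1];
    return 0;
}

/* The call number is simply an index into this table. */
static const kercall_fn demo_call_table[] = { demo_call_nop, demo_call_add };
#define DEMO_CALL_COUNT (sizeof demo_call_table / sizeof demo_call_table[0])

/* Spin until this caller is the one that set the in-kernel bit. */
static void acquire_kernel(void) {
    while (atomic_fetch_or_explicit(&inkernel_flags, INKERNEL_NOW_BIT,
                                    memory_order_acquire) & INKERNEL_NOW_BIT) {
        /* another CPU holds the bit; keep spinning */
    }
}

static void release_kernel(void) {
    atomic_fetch_and_explicit(&inkernel_flags, ~INKERNEL_NOW_BIT,
                              memory_order_release);
}

/* Validate the call number, take the flag, dispatch, release. */
int demo_kernel_entry(unsigned call_number, void *args) {
    int status;

    if (call_number >= DEMO_CALL_COUNT)
        return ENOSYS;                  /* reject numbers outside the table */
    acquire_kernel();
    status = demo_call_table[call_number](args);
    release_kernel();
    return status;
}
```

Calling `demo_kernel_entry(1, pair)` with `int pair[2] = {2, 3}` runs the second table entry. On a uniprocessor the spin loop never iterates, which matches the slide's distinction between atomically setting the bit and spinning on it under SMP.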
Kernel Function Implementation
• Entry from kercall table
• Validate parameters
• Verify pointers referenced by kernel are valid
  • RD_PROBE/WR_PROBE functions
  • RD_VERIFY_*/WR_VERIFY_* functions
  • If addresses are not accessible, a fault will be generated and the kernel call will return with EFAULT
• Up-front work all done, ready to do the real work

    int kdecl
    ker_timer_create(THREAD *act, struct kera… *kap) {
        VALID_CLOCKID(kap);
        if(kap->event) {
            RD_VERIFY_PTR(act, kap->event, si…
            RD_PROBE_INT(act, kap->event, siz…
        }
        prp = act->process;
        …

It’s very important to get the validation right, as a fault (due to an invalid or
malicious parameter passed in to the call) could be catastrophic.
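To make the "validate before touching" rule concrete, here is a minimal sketch of a range check that rejects a caller-supplied pointer with EFAULT before any read happens. It is not how the RD_VERIFY_*/RD_PROBE_* macros work (the slide says those rely on a fault being generated); it swaps in an explicit bounds check, and `demo_aspace`, `demo_verify_range` and `demo_read_int` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of the range a caller is allowed to touch. */
struct demo_aspace {
    uintptr_t base;   /* first valid address */
    size_t    size;   /* number of valid bytes starting at base */
};

/* Returns 0 if [addr, addr + len) lies entirely inside the range, else EFAULT. */
int demo_verify_range(const struct demo_aspace *as, uintptr_t addr, size_t len) {
    uintptr_t offset;

    if (addr < as->base)
        return EFAULT;
    offset = addr - as->base;
    if (offset > as->size)
        return EFAULT;
    if (len > as->size - offset)        /* written this way to avoid overflow */
        return EFAULT;
    return 0;
}

/* Copy one int from a caller-supplied address, but only after validation. */
int demo_read_int(const struct demo_aspace *as, uintptr_t user_ptr, int *out) {
    int status = demo_verify_range(as, user_ptr, sizeof *out);

    if (status != 0)
        return status;                  /* reject before touching memory */
    memcpy(out, (const void *)user_ptr, sizeof *out);
    return 0;
}
```

The subtraction form `len > size - offset` matters: the more obvious `addr + len > base + size` can wrap around for a malicious length and let a bad pointer through.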
Microkernel Pre-emption
• Kernel call preemption is important
  > Interrupt activity may READY a higher priority thread to run while a kernel call is in progress
  > We want to immediately schedule this higher priority THREAD (minimize scheduling latency)
• QNX does this in a novel way – preemptable kernel
  > Defer changing global kernel state
  > Implementation of kernel ops is 2 stages: do the work followed by a “commit”
  > On preemption, the active thread’s IP is backed up to re-execute the SYSENTER instruction
• Allows us to only have one kernel stack – not one per thread
• Any memory references/calculations done before locking kernel must be restartable
[Diagram: stages of a kernel call, from kernel entry to kernel exit. Labels shown: Entry (interrupts off); Unlocked (interrupts on, pre-emptable); Kernel operation, which may include message pass; Locked (no pre-emption); Exit (interrupts off).]
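The two-stage idea (do restartable work first, commit global state last) can be sketched in a few lines. This is a toy model, not kernel code: preemption is simulated with a flag, and `demo_kernel`, `demo_op_add` and `demo_call_until_done` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_result { DEMO_DONE, DEMO_RESTART };

struct demo_kernel {
    int  counter;          /* stands in for global kernel state */
    bool preempt_pending;  /* set by a (simulated) interrupt */
};

/* Stage 1 computes into locals only; stage 2 commits in one step. */
enum demo_result demo_op_add(struct demo_kernel *k, int delta) {
    int new_value = k->counter + delta;   /* stage 1: no global writes yet */

    if (k->preempt_pending) {
        /* A higher-priority thread would run here. Nothing has been
         * committed, so the call can simply be executed again later. */
        k->preempt_pending = false;
        return DEMO_RESTART;
    }

    /* Stage 2: the "locked" part, which must not be interrupted. */
    k->counter = new_value;
    return DEMO_DONE;
}

/* Models backing up the instruction pointer: re-issue the call until it commits. */
int demo_call_until_done(struct demo_kernel *k, int delta) {
    int entries = 0;

    do {
        entries++;
    } while (demo_op_add(k, delta) == DEMO_RESTART);
    return entries;
}
```

Because stage 1 leaves `counter` untouched, a preempted call leaves no partial state behind, which is why no per-thread kernel stack needs to be preserved across the preemption.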
Lock Kernel
Most of the preparatory work is done before locking the kernel, if possible.

    int kdecl ker_sched_get(THREAD *act, struct kerargs_sched_get *kap) {
        PROCESS *prp;
        THREAD  *thp;

        /* Argument verification */
        // Verify the target process exists.
        if((prp = (kap->pid ? lookup_pid(kap->pid) : act->process)) == NULL)
            return ESRCH;
        // Verify the target thread exists.
        if((thp = (kap->tid ? vector_lookup(&prp->threads, kap->tid-1) : act)) == NULL)
            return ESRCH;
        // Verify we have the right to examine the target process
        if(!kerisusr(act, prp))
            return ENOERROR;

        /* User pointer verification */
        if(kap->param) {
            WR_VERIFY_PTR(act, kap->param, sizeof(*kap->param));
            WR_PROBE_OPT(act, kap->param, sizeof(*kap->param) / sizeof(int));
            kap->param->sched_curpriority = thp->priority;
            kap->param->sched_priority = thp->real_priority;
        }

        /* Lock kernel to change status */
        lock_kernel();
        SETKSTATUS(act,thp->policy);
        return ENOERROR;
    }
Exit Kernel
• Exit kernel to run user-space thread
  > Note that the currently scheduled thread may have changed
    - Ex: entered kernel due to HW interrupt, interrupt readied a high-prio thread (that high-prio thread becomes RUNNING)
    - Ex: a blocking kernel call causes the current thread to be blocked, another to be made RUNNING
  > __ker_exit implements this
• Adjust the address space if needed
  > memmgr.aspace()
• Do special return processing
  > Deliver signals, pulses etc.
  > This may cause a reschedule, which could cause another loop through __ker_exit
• Restore the context of the (newly) active thread
• Call SYSEXIT
What about “Non Kernel” System Calls?
• In many cases, traditional UNIX system calls are not implemented by the microkernel on QNX.
  > They are implemented in the process manager or in external servers that extend procnto
• In general, many of the lengthier core POSIX operations are done by the process manager
Process Manager
• First process in system
  > Created by kernel (init_objects)
• Provides core services to other processes
• Multi-threaded process
  > First <ncpus> threads are IDLE threads
  > Additional threads are threadpool worker threads
• Message driven server
• Actually a collection of (almost) independent servers
  > 4 message handlers
  > 11(!) resource managers
    - These resource managers are actually mini filesystems.
Process Manager
Message Handlers        Resource Managers (pathmgr/*)
----------------        -----------------------------
proc/rsrcdbmgr_*        /dev/mem
proc/sysmgr_*           /dev/null
procmgr/*               /dev/text
memmgr/*                /dev/tty
                        /dev/zero
                        /dev/shmem
                        /dev/tymem
                        /dev/sem
                        /proc/boot
                        /proc
                        /
[Diagram label: SYSMGR_COID]
Process Manager
Normal process… but
• It has certain privileges
  > Executes at higher processor privilege level
    - This varies depending on processor architecture
  > Executes in kernel address space
    - Not quite true
    - Because proc’s address space and user address spaces don’t overlap, it may adopt a user’s address space. This makes for faster message passes between proc and user applications.
  > Has permission to use __Ring0() kernel call
Process Manager
• __Ring0 Kernel Call
  > Used by proc when it needs to execute code in the kernel context
    - Mostly used when manipulation of kernel structures is required
    - Provides atomicity of kernel state modifications to ensure consistency
  > Occasionally used when processor privilege is required
    - Ex: manipulate privileged CPU registers
  > Arguments are a function pointer and a data pointer
    - Remember process manager shares address space with kernel
  > _NTO_PF_RING0 flag needed to use this kernel call
  > Only process manager has this flag set
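The shape of such a call (a function pointer plus a data pointer, gated by a per-process flag) can be illustrated with a small user-space sketch. Everything here is hypothetical: `demo_ring0`, `DEMO_PF_RING0` and the EPERM/EINVAL return values are assumptions made for illustration and say nothing about the real __Ring0() signature or error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_PF_RING0 0x1u          /* hypothetical stand-in for _NTO_PF_RING0 */

struct demo_process {
    unsigned flags;
};

typedef void (*demo_ring0_fn)(void *data);

/* Run fn(data) on behalf of the caller, but only if the caller holds the flag. */
int demo_ring0(const struct demo_process *caller, demo_ring0_fn fn, void *data) {
    if (!(caller->flags & DEMO_PF_RING0))
        return EPERM;               /* assumed error code for the sketch */
    if (fn == NULL)
        return EINVAL;
    fn(data);                       /* a real kernel would run this in kernel context */
    return 0;
}

/* Example callback: modifies state that only privileged code may touch. */
static void demo_bump(void *data) {
    *(int *)data += 1;
}
```

Passing a raw function pointer only works because, as the slide notes, the caller shares an address space with the kernel; a process in a separate address space could not hand over a callable pointer this way.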
Process Manager Example
• The process manager implements many services which would actually be “kernel calls” in traditional UNIX
• Example – mmap()
  > mmap() is the API through which all mappings are set up by user processes
  > Malloc uses mmap() to allocate heap memory, also known as “anonymous” memory, since it is not a mapping of a named object.
  > Not implemented as a kernel call, but rather a message that is sent to the process manager
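For readers who have not used anonymous mappings directly, this is roughly what an allocator's request looks like from the caller's side. It is a sketch using the standard `mmap()`/`munmap()` calls with `MAP_ANON`; the wrapper names `demo_anon_alloc` and `demo_anon_free` are hypothetical, and the code says nothing about how any particular malloc is implemented.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* exposes MAP_ANON on glibc; harmless elsewhere */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Ask for len bytes of zero-filled memory not backed by any named object. */
void *demo_anon_alloc(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);   /* fd is -1: no file */

    return p == MAP_FAILED ? NULL : p;
}

/* Give the mapping back; returns 0 on success, -1 on failure. */
int demo_anon_free(void *p, size_t len) {
    return munmap(p, len);
}
```

The calling code is the same whether `mmap()` traps straight into a monolithic kernel or, as described on this slide, is turned into a message to the process manager; only the library stub underneath differs.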
mmap()
    void *_mmap(void *addr, size_t len, int prot,
                int flags, int fd, off64_t off, unsigned align,
                unsigned preload, void **base, size_t *size) {
        mem_map_t msg;

        /* Type of operation */
        msg.i.type = _MEM_MAP;

        /* Parameters */
        msg.i.zero = 0;
        msg.i.addr = (uintptr_t)addr;
        msg.i.len = len;
        msg.i.prot = prot;
        msg.i.flags = flags;
        msg.i.fd = fd;
        msg.i.offset = off;
        msg.i.align = align;
        msg.i.preload = preload;
        msg.i.reserved1 = 0;

        /* Send message to procnto requesting operation be done */
        if(MsgSendnc(MEMMGR_COID, &msg.i, sizeof msg.i,
                     &msg.o, sizeof msg.o) == -1) {
            return MAP_FAILED;
        }
        …
memmgr_handler()
• The _MEM_MAP message type is picked up and passed to the memmgr message handler

    switch(msg->type) {
    …
    case _MEM_MAP:
        proc_wlock_adp(prp);
        status = memmgr_map(ctp, prp, &msg->map);
        break;
    case _MEM_CTRL:
        proc_wlock_adp(prp);
        status = memmgr_ctrl(prp, &msg->ctrl);
        break;
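The request/dispatch/reply pattern behind this handler can be modeled end to end in plain C. The sketch below replaces real message passing with a direct function call and uses a trivial bump allocator in place of the memory manager; `demo_map_msg`, `demo_handler`, `demo_mmap_stub` and the `DEMO_*` constants are hypothetical and do not reflect the actual mem_map_t layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message types and layouts, loosely modeled on the slides. */
enum { DEMO_MEM_MAP = 1, DEMO_MEM_CTRL = 2 };

struct demo_map_in  { uint16_t type; uint64_t addr; uint64_t len; };
struct demo_map_out { uint64_t addr; };

/* Request and reply share one buffer, as with msg.i and msg.o. */
union demo_map_msg {
    struct demo_map_in  i;
    struct demo_map_out o;
};

static uint64_t demo_next_addr = 0x10000;   /* toy allocator cursor */

/* Server side: hand out page-rounded ranges from a bump allocator. */
static int demo_do_map(union demo_map_msg *msg) {
    uint64_t len = msg->i.len;              /* read inputs before writing the reply */
    uint64_t addr = demo_next_addr;

    if (len == 0)
        return EINVAL;
    demo_next_addr += (len + 0xFFF) & ~(uint64_t)0xFFF;
    msg->o.addr = addr;                     /* the reply overwrites the request */
    return 0;
}

/* Server side: dispatch on the message type. */
int demo_handler(union demo_map_msg *msg) {
    switch (msg->i.type) {
    case DEMO_MEM_MAP:
        return demo_do_map(msg);
    case DEMO_MEM_CTRL:                     /* not modeled in this sketch */
    default:
        return ENOSYS;
    }
}

/* Client side: fill in the request, "send" it, return the replied address. */
void *demo_mmap_stub(size_t len) {
    union demo_map_msg msg;

    msg.i.type = DEMO_MEM_MAP;
    msg.i.addr = 0;
    msg.i.len  = len;
    if (demo_handler(&msg) != 0)            /* stands in for a message send */
        return NULL;
    return (void *)(uintptr_t)msg.o.addr;
}
```

Note that the server copies `len` out of the request before writing the reply, since both views occupy the same storage.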
Process Manager
[Diagram: message flow for an allocation]
User Process:     malloc() -> mmap() -> MsgSendv() sends _MEM_MAP
Process Manager:  MsgReceivev() -> memmgr_map() -> vmm_mmap() -> map_create() -> pa_alloc() -> pte_manipulate() -> MsgReplyv()
User Process:     return msg.o.addr;
Other Process Manager Services
• Creating processes!
  > spawn() sends a _PROC_SPAWN message to create a new process
  > The exec() ‘system call’ is actually a spawn message with a SPAWN_EXEC flag set!
  > The fork() ‘system call’ is a _PROC_FORK message
• Procfs debug filesystem
  > Similar to UNIX procfs
  > Used by debugger/pidin/ps
Ongoing kernel development
• The kernel team is currently working on our next release
  > Codename “trinity 2”
• Features include:
  > Memory management enhancements such as variable page support (~15% improvement in system performance)
  > POSIX PSE52 certification
  > PPC 9xx processor support
  > ARMv6 support
  > Cross-endian QNET capabilities
• Trinity 2 is currently feature complete – bugfixing/release process underway
  > Builds available on foundry27:
    http://community.qnx.com/sf/wiki/do/viewPage/projects.core_os/wiki/Trinity2
Roadmap – QNX Source Postings
Source Bundle         Release Date   Description
-------------         ------------   -----------
Networking            Nov 2007       Next Generation Networking stack, protocols, drivers (io-pkt)
Block Filesystems     March 2008     Block Filesystems and Utilities
Flash Filesystems     March 2008     Flash (NOR/NAND) Filesystems and Utilities
Network Filesystems   March 2008     Block/Flash/Network Filesystems and Utilities
Devices and Drivers   June 2008      Serial, Audio, USB, PCI frameworks and drivers
System Services       June 2008      Additional system service managers
Window Systems        Sept 2008      High level Photon server and services
Graphics System       Sept 2008      Lower level graphics libraries and drivers
Multimedia            Nov 2008       Full multimedia stack
Brainteasers
• Need something to chew on?
  > Try to figure out the questions below and post the answer to the OS_Tech forum on the OS project
  > Prize for the first to answer each question
  > QNX employees not eligible :-)
1. What does STI expand to in the MIPS kernel?
2. NEED_PREEMPT(act) checks queued_event_priority. What sets queued_event_priority?
3. In the memmgr message handler, what is the purpose of “proc_wlock_adp(prp);”?
Want to learn more?
• Check out the projects on www.foundry27.com
• Download the QNX Momentics suite on www.qnx.com
• Download the microkernel source from the QNX operating system project
• Read the tech articles and wiki pages (linked off the project)
• Participate in the forums on the QNX operating system project
Questions?