Transcript lesson20

Why can’t we do ‘raw’ I/O?
How the x86 stops user-programs
from directly controlling devices,
and how we devise a ‘workaround’
x86 Privilege Levels
• For multiple users doing multiple tasks in a
manner that affords each some ‘protection’
against interference by others, any modern
CPU will implement two or more separate
levels of ‘privilege’ for its operations -- an
‘unrestricted privileges’ arena for the code
in its Master Control Program (its ‘kernel’),
and a ‘restricted privileges’ realm for code
in users’ application programs
Four Privilege Rings
Ring 3
Least-trusted level
Ring 2
Ring 1
Ring 0
Most-trusted level
Suggested purposes
Ring0: operating system kernel
Ring1: operating system services
Ring2: custom extensions
Ring3: ordinary user applications
Unix/Linux and Windows
Ring0: operating system
Ring1: unused
Ring2: unused
Ring3: application programs
IOPL
• The Intel x86 processor includes a way to
either allow or prohibit accesses to system
peripheral devices by code that executes
in the various ‘privilege rings’, by utilizing a
2-bit field within the x86 FLAGS register
which controls whether or not ‘in’ and ‘out’
are allowed to execute – the field is known
as the I/O Privilege Level field, and Linux
normally sets its value to be zero
The x86 API registers
General-purpose: RAX RBX RCX RDX RSI RDI RBP RSP R8 R9 R10 R11 R12 R13 R14 R15
Segment: CS DS ES FS GS SS
Instruction pointer and flags: RIP RFLAGS
(Intel Core-2 Quad processor)
The FLAGS register
Bit:    15   14   13-12  11   10   9    8    7    6    5    4    3    2    1    0
Field:  0    NT   IOPL   OF   DF   IF   TF   SF   ZF   0    AF   0    PF   1    CF
Status-flags: CF, PF, AF, ZF, SF, OF
Control-flags: TF, IF, DF, IOPL, NT
Legend:
ZF = Zero Flag
SF = Sign Flag
IOPL = I/O Privilege Level
CF = Carry Flag
NT = Nested Task
PF = Parity Flag
TF = Trap Flag
OF = Overflow Flag
IF = Interrupt Flag
AF = Auxiliary Flag
DF = Direction Flag
‘seeflags.cpp’
• This demo-program allows us to view the
settings of bits in the RFLAGS register –
and the IOPL-field in particular (bits 13,12)
• When IOPL == 0, only ring0 code will be
able to execute ‘in’ and ‘out’ instructions
• When IOPL == 3, then code executing in
any of the rings will be able to execute I/O
• So – let’s change IOPL to 3 – but how?
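Checking the IOPL-field comes down to a shift and a mask on bits 13 and 12. The following is a minimal sketch in C, not the code of ‘seeflags.cpp’: the names `iopl_of` and `read_rflags` are made up for illustration, and the inline assembly assumes GCC or Clang on x86-64.

```c
#include <stdint.h>

#define IOPL_SHIFT 12                      /* IOPL occupies bits 13 and 12 */
#define IOPL_MASK  (3ULL << IOPL_SHIFT)

/* Extract the 2-bit I/O Privilege Level from an RFLAGS image. */
static unsigned iopl_of(uint64_t rflags)
{
    return (unsigned)((rflags & IOPL_MASK) >> IOPL_SHIFT);
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Copy the live RFLAGS register into a variable (assumes GCC-style asm).
   The 'lea' pair steps over the 128-byte red zone before pushing. */
static inline uint64_t read_rflags(void)
{
    uint64_t flags;
    __asm__ __volatile__(
        "lea -128(%%rsp), %%rsp\n\t"
        "pushfq\n\t"
        "popq %0\n\t"
        "lea 128(%%rsp), %%rsp"
        : "=r"(flags) : : "memory");
    return flags;
}
#endif
```

Since Linux normally sets IOPL to zero, `iopl_of(read_rflags())` would be expected to give 0 in an ordinary application program.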
‘pushfq’/‘popfq’
• An idea suggested by the ‘inline’ assembly
language in our ‘seeflags.cpp’ demo would
be to just ‘pop’ a suitably designed value
from the stack into the RFLAGS register
• But the CPU will not allow that while it is
executing ring3 code – there ‘popfq’ leaves
the IOPL-field unchanged, since altering it
would compromise the system’s intended ‘protection’
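The failed attempt can be tried directly. This is only a sketch under stated assumptions: the function name `iopl_after_popfq_attempt` is hypothetical, and the inline assembly assumes GCC or Clang on x86-64 (other builds just get -1 back).

```c
#include <stdint.h>

/* Ask 'popfq' for IOPL = 3, then report the IOPL that RFLAGS holds afterward.
   Returns -1 when not built for x86-64 with GCC-style inline assembly. */
static int iopl_after_popfq_attempt(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t flags;
    __asm__ __volatile__(
        "lea -128(%%rsp), %%rsp\n\t"   /* step over the red zone         */
        "pushfq\n\t"
        "popq %0\n\t"                  /* flags = current RFLAGS         */
        "orq $0x3000, %0\n\t"          /* request IOPL = 3               */
        "pushq %0\n\t"
        "popfq\n\t"                    /* try to load the new value      */
        "pushfq\n\t"
        "popq %0\n\t"                  /* read back what RFLAGS now holds */
        "lea 128(%%rsp), %%rsp"
        : "=r"(flags) : : "memory", "cc");
    return (int)((flags >> 12) & 3);
#else
    return -1;
#endif
}
```

In an ordinary ring3 program whose IOPL starts at 0, the function should still report 0: no fault is raised, the requested IOPL bits are simply not loaded.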
Must do it from ring0!
• Our classroom’s Linux systems will allow
us to install our own code-module, as an
‘add-on’ to the running kernel, and such
code could therefore be executed without
any restrictions (i.e., at ring0)
• This idea motivates us to explore briefly
the programming ideas needed for writing
our own LKM (Linux Kernel Module)
A module’s organization
my_info: the module’s ‘payload’ function
module_init, module_exit: the module’s two required administrative functions
Our ‘newproc.cpp’ utility
• For the type of LKM that creates a pseudo-file
in the ‘/proc’ directory, there is a ‘skeleton’
of C-language code we can start from, and
then add our own specific functionality to
that skeleton-code
• You can quickly create this ‘skeleton’ file
by using our ‘newproc.cpp’ utility-program
Software interrupts
• One way for a user-program, which normally
executes in ring3, to switch to ring0 (if it’s
allowed) is by using a ‘software interrupt’
• This is how the 32-bit version of Linux did
its various system-calls, with ‘int $0x80’
• We can craft an LKM whose ‘payload’ is
an interrupt service routine that would be
able to change the IOPL from 0 to 3
Systems programming
• To accomplish this design-idea, we’ll need
an understanding of our CPU’s interrupt
mechanism, including some special data-structures located in kernel memory and
some special CPU registers which allow
the CPU to locate those data-structures
Descriptor Tables
GDTR locates the GDT: Global Descriptor Table (Segment Descriptors)
IDTR locates the IDT: Interrupt Descriptor Table (256 Gate Descriptors)
GDTR and IDTR are the special processor registers used by the CPU for locating
its Descriptor Tables within the system’s memory
IDT Descriptor-format
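(each row is 32-bits wide; one gate descriptor occupies four rows, 16 bytes)
Dword 3: reserved (=0)
Dword 2: offset 63..32
Dword 1: offset 31..16 (bits 31..16) | P (bit 15) | DPL (bits 14..13) | 0 (bit 12) | gate type (bits 11..8) | 00000 (bits 7..3) | IST (bits 2..0)
Dword 0: segment selector (bits 31..16) | offset 15..0 (bits 15..0)
LEGEND: segment-selector (for the handler’s code-segment)
offset within code-segment to handler’s entry-point
gate-type (0xE = Interrupt Gate, 0xF = Trap Gate)
IST = Interrupt Stack Table (0..7)
DPL = Descriptor Privilege Level (0..3)
P = Present (1 = yes, 0 = no)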
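The descriptor-format can be written down as a C structure. What follows is a sketch rather than the code of ‘iokludge.c’: the names `gate64`, `make_gate` and `gate_offset` are hypothetical, as are the handler address and selector used in the usage note. A gate meant to be reached by a software interrupt from ring3 needs its DPL set to 3, since otherwise the CPU raises a fault instead of entering the handler.

```c
#include <stdint.h>

/* One 16-byte gate descriptor of the 64-bit IDT, lowest dword first. */
struct gate64 {
    uint16_t offset_lo;    /* handler offset 15..0                      */
    uint16_t selector;     /* selector for the handler's code-segment   */
    uint8_t  ist;          /* bits 2..0 = IST index, bits 7..3 = 0      */
    uint8_t  type_attr;    /* P (bit 7), DPL (6..5), 0 (4), type (3..0) */
    uint16_t offset_mid;   /* handler offset 31..16                     */
    uint32_t offset_hi;    /* handler offset 63..32                     */
    uint32_t reserved;     /* must be zero                              */
};

/* Build a 'present' gate for the given handler address. */
static struct gate64 make_gate(uint64_t handler, uint16_t selector,
                               unsigned type, unsigned dpl, unsigned ist)
{
    struct gate64 g;
    g.offset_lo  = (uint16_t)(handler & 0xFFFF);
    g.selector   = selector;
    g.ist        = (uint8_t)(ist & 7);
    g.type_attr  = (uint8_t)(0x80 | ((dpl & 3) << 5) | (type & 0xF));
    g.offset_mid = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_hi  = (uint32_t)(handler >> 32);
    g.reserved   = 0;
    return g;
}

/* Reassemble the 64-bit handler offset from its three pieces. */
static uint64_t gate_offset(const struct gate64 *g)
{
    return ((uint64_t)g->offset_hi << 32) |
           ((uint64_t)g->offset_mid << 16) | g->offset_lo;
}
```

For example, `make_gate(handler, 0x10, 0xE, 3, 0)` describes an Interrupt Gate with DPL 3 and no IST stack; its attribute byte comes out as 0xEE.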
IDTR register-format
IDTR (80-bits): | Base-Address of the IDT segment (64-bits) | segment limit (16-bits) |
Special processor instructions are used to ‘load’ this 10-byte register
from a memory-image (‘LIDT’), or to ‘store’ this register’s value (‘SIDT’)
The ‘LIDT’ instruction can only be executed by code running in Ring0,
but the ‘SIDT’ can be executed by code running at any privilege level.
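The 10-byte memory-image that ‘LIDT’ and ‘SIDT’ operate on puts the 16-bit limit first and the 64-bit base-address after it. A minimal sketch in C, assuming the GCC/Clang `packed` attribute; the names `idtr_image`, `make_idtr` and `gate_count` are hypothetical:

```c
#include <stdint.h>

/* Memory-image of the IDTR register: limit in bytes 0..1, base in bytes 2..9. */
struct idtr_image {
    uint16_t limit;   /* size of the table in bytes, minus 1 */
    uint64_t base;    /* address of the table                */
} __attribute__((packed));

/* Prepare an image describing a table of 'gates' 16-byte descriptors. */
static struct idtr_image make_idtr(uint64_t base, unsigned gates)
{
    struct idtr_image r;
    r.limit = (uint16_t)(gates * 16 - 1);
    r.base  = base;
    return r;
}

/* Number of 16-byte gate descriptors that an image's limit covers. */
static unsigned gate_count(const struct idtr_image *r)
{
    return ((unsigned)r->limit + 1) / 16;
}
```

A full table of 256 gates gives a limit of 0x0FFF, so such a table fits exactly in one 4096-byte page.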
Stack layout after an interrupt
Ring0 stack (RSP0), each entry 64-bits:
32(%rsp)  SS
24(%rsp)  RSP
16(%rsp)  RFLAGS
 8(%rsp)  CS
 0(%rsp)  RIP
Our interrupt-9 handler
Our ‘iokludge.c’ kernel module uses this ‘inline’ assembly language to
generate the machine-code for handling an interrupt-9, which merely
sets the IOPL-field (in the saved image of the RFLAGS register) to 3,
and then resumes execution of the interrupted application program.
//-------------------- INTERRUPT SERVICE ROUTINE ----------------
void isr_entry( void );
asm("   .text                           ");
asm("   .type   isr_entry, @function    ");
asm("isr_entry:                         ");
asm("   orq     $0x3000, 16(%rsp)       ");
asm("   iretq                           ");
//----------------------------------------------------------------
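The effect of the handler’s ‘orq’ instruction can be modeled in ordinary C by treating the five saved quadwords as a structure. This is an illustrative model only; the names `int_frame` and `grant_iopl3` are hypothetical and do not appear in ‘iokludge.c’.

```c
#include <stddef.h>
#include <stdint.h>

/* The five quadwords the CPU pushes on the Ring0 stack, lowest address first. */
struct int_frame {
    uint64_t rip;      /*  0(%rsp) */
    uint64_t cs;       /*  8(%rsp) */
    uint64_t rflags;   /* 16(%rsp) */
    uint64_t rsp;      /* 24(%rsp) */
    uint64_t ss;       /* 32(%rsp) */
};

/* C model of 'orq $0x3000, 16(%rsp)': set IOPL to 3 in the saved RFLAGS. */
static void grant_iopl3(struct int_frame *frame)
{
    frame->rflags |= 0x3000;
}
```

Because ‘iretq’ reloads RFLAGS from this saved image, the interrupted program resumes with IOPL equal to 3 while its other flags are unchanged.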
Core-2 Quad system
[Figure: an Intel Core-2 Quad processor (CPU 0, CPU 1, CPU 2, CPU 3) sharing a system bus with system memory and five I/O devices]
‘smp_call_function()’
• This Linux kernel ‘helper’ routine allows a CPU
to request all other CPUs to execute a specified
subroutine of type: void function( void *info );
• In our current Linux kernel (vers. 2.6.26.6) this
helper-routine takes four arguments:
  – The address of the subroutine’s entry-point
  – The address of data the subroutine needs
  – A flag that indicates whether or not to ‘retry’
  – A flag that indicates whether or not to ‘wait’
• (Note: Newer kernels omit the ‘retry’ argument)
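The calling pattern can be tried out in user space with a stand-in. The sketch below is not kernel code: `model_smp_call_function` is a hypothetical substitute that makes its calls one after another on a single CPU, whereas the kernel’s helper routine has the other processors run the subroutine themselves.

```c
typedef void (*smp_fn)(void *info);   /* the subroutine type: void function( void *info ) */

/* Hypothetical stand-in for smp_call_function(): invokes 'fn' once for
   every CPU number other than 'self', and returns how many calls it made. */
static int model_smp_call_function(smp_fn fn, void *info, int ncpus, int self)
{
    int calls = 0;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu == self)
            continue;              /* the requesting CPU is skipped */
        fn(info);
        calls++;
    }
    return calls;
}

/* Example subroutine: counts its invocations through the 'info' pointer. */
static void count_visit(void *info)
{
    ++*(int *)info;
}
```

On a four-CPU system the subroutine is run three times, so code that must take effect on every processor also has to invoke the subroutine on the requesting CPU itself.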
Working with LKM’s
• Create an LKM skeleton using ‘newproc’
• Compile a new LKM using ‘mmake’
• Install an LKM’s compiled ‘kernel object’
using the Linux ‘/sbin/insmod’ command
• Remove an LKM from the running kernel
using the Linux ‘/sbin/rmmod’ command
‘iokludge.c’
module_init:
1) Allocate a kernel memory page, to be used as a new Interrupt Descriptor Table
2) Save original contents of system register IDTR, so it can be restored later
3) Prepare a memory-image for the new value of register IDTR, referring to kpage
4) Setup pointers ‘oldidt’ and ‘newidt’ and copy the original IDT to our new page
5) Setup a Gate-Descriptor, to be installed as Gate 9 in our new IDT array
6) Activate the new Interrupt Descriptor Table on all the processors in our system
7) Return 0, to indicate a successful module-installation
module_exit:
1) Restore the original value to register IDTR in each of our system’s processors
2) Free the page of kernel memory that was previously allocated for use as an IDT
‘tryiopl3.cpp’
• This demo-program is a modification of
our earlier ‘seeflags.cpp’ example – but
here we included the software interrupt
instruction ‘int $9’ which, if ‘iokludge.ko’
has been installed, will allow us to check
that indeed the RFLAGS register’s IOPL
has been changed from 0 to 3 – thereby
permitting ‘in’ and ‘out’ to be executed!
Homework exercise
• Modify the ‘82573pci.cpp’ program that we
weren’t able to execute, even with ‘sudo’,
at our previous class meeting, replacing its
call to Linux’s ‘iopl()’ library-function by the
‘inline’ assembly language statement for
software interrupt 9, i.e. asm(" int $9 ");
• Then try again to compile and execute our
‘82573pci.cpp’ demo-program, only this time
with our ‘iokludge.ko’ LKM installed 