Transcript UNIX

SMP, 64-bit Unix and Kernel Compilation
Guntis Barzdins
Girts Folkmanis
How to run MPI
[guntisb@zars mpi]$ ls -l
total 392
-rw-rw-r-- 1 guntisb guntisb    122 Apr 28 07:08 Makefile
-rw-rw-r-- 1 guntisb guntisb     13 May 17 14:33 mfile
-rw-rw-r-- 1 guntisb guntisb    344 May 12 09:28 mpi.jdl
-rw-rw-r-- 1 guntisb guntisb   2508 Apr 28 07:08 mpi.sh
-rwxrwxr-x 1 guntisb guntisb 331899 May 17 14:48 passtonext
-rw-rw-r-- 1 guntisb guntisb   3408 Apr 28 07:08 passtonext.c
-rw-rw-r-- 1 guntisb guntisb   2132 May 17 14:48 passtonext.o
[guntisb@zars mpi]$ more mfile
localhost:4
[guntisb@zars mpi]$
[guntisb@zars mpi]$ make
mpicc passtonext.c -o passtonext -lmpich -lm
[guntisb@zars mpi]$ mpirun -np 2 -machinefile mfile passtonext
guntisb@localhost's password:
Nodename=zars.latnet.lv Rank=0 Size=2
INFO: zars.latnet.lv (0 of 2) sent 73 value to 1 of 2
INFO: zars.latnet.lv (0 of 2) received 74 value from 1 of 2
Nodename=zars.latnet.lv Rank=1 Size=2
INFO: zars.latnet.lv (1 of 2) received 73 value from 0 of 2
INFO: zars.latnet.lv (1 of 2) sent 73+1=74 value to 0 of 2
[guntisb@zars mpi]$
Thanks to Jānis Tragheims!!
Homework variant 3b
• If homework 3a has been completed, then in part 3b you may write and test your own MPI program that computes with at least 4 processes.
• The MPI program does not have to be run on the Grid!
Message-passing performance comparison
[Figure: Gigabit Ethernet MPI performance. Throughput in Mbps (0 to 600) versus message size in bytes (1 to 1,000,000). Curves: raw TCP, MP_Lite, MPICH, LAM/MPI, MPI/Pro, PVM.]
[Figure: SMP MPI performance. Throughput in Mbps (0 to 4000) versus message size in bytes (1 to 1,000,000). Curve labels: LAM/MPI 1 us, MP_Lite 3 us, MPICH 25 us, PVM 33 us.]
AMD Opteron 800 HPC Processing Node
HPC Strengths
• Flat SMP-like memory model:
    • All four processors reside within the same 2^48 memory map
• Expandable to 8P NUMA
• Glue-less coherent multiprocessing:
    • Low latency and high bandwidth, ~1600 MT/s (6.4 GB/s)
• 32 GB of memory on a high-bandwidth external memory bus (>5.3 GB/s)
• Native high-bandwidth memory-mapped I/O (>25 Gbit/s)
Sufficiently Uniform Memory Organization (SUMO)
• Advantages
    • Software view of memory is SMP
        • Latency difference between local and remote memory is a function of the number of processors in the node
        • 1P and 2P look like an SMP machine
        • 3P and 4P are NUMA-like but can still be viewed as a ccUMA or asymmetric SMP node
        • >4P can be viewed as ccUMA and, depending on the cache hit rate, may or may not require a NUMA-aware OS
    • Physical address space is flat and can be viewed as fully coherent or not (MOESI states)
    • DRAM can be contiguous or interleaved
    • Additional processor nodes bring a true increase in memory bandwidth
    • Designed for lower overall system chip count (glue-less interface)
• Disadvantages
    • 3P and 4P nodes work better if the OS is "aware" of the memory map
    • >4P may require a NUMA (non-uniform memory architecture) aware OS if the cache hit rate is low
    • A 2.6.9 kernel is needed to take full advantage of the NUMA architecture of Opterons
Future NUMA Systems
Scaling beyond 8 Processors
• Scaling beyond 8P is enabled
• External coherent HyperTransport switch
    • Snoop filter
    • Data caching
• Up to 16 processors within the same 2^40 SMP memory space
[Diagram: 4P nodes attached to switches SW0 to SW3, which are joined by a coherent interconnect fabric.]
High Density HPC Cluster
SprayCool Technology from ISR
• 16 cards
• 16 GFLOPS/card
• 256 GFLOPS peak throughput
• 64 GB of memory per card
• 1 terabyte of system memory
• 2240 cubic inches (16" x 10" x 14")
    • 114 MFLOPS/cubic inch
    • ~0.46 GB of memory per cubic inch
• ~6 kW
    • ~3 watts/cubic inch
AMD Opteron Beowulf 4P SMP Processing Node
One 4P SMP node
• 16 GFLOPS
• 32 GB DRAM
• 10 GB/s memory bandwidth
[Block diagram: four AMD Opteron processors, each with 8 GB of DRAM on 200-333 MHz 9-byte registered DDR, connected by 16x16 HyperTransport @ 1600 MT/s. Attached devices: AMD-8131 PCI-X tunnels (PCI-X), and an AMD-8111 I/O hub with legacy PCI, graphics (VGA), FLASH, LPC, SIO, USB 1.0, AC97, UDMA133 and a 10/100 PHY (MII) for the 100BaseT management LAN.]
Single Instruction, Multiple Data (SIMD)
Extending 32-bit instruction set
• Intel and AMD schemes are very similar
• 48-bit virtual address space
• 64-bit general purpose registers (GPRs)
• Support for 64-bit addressing and integer math
• Eight extra GPRs added
• Eight extra XMM registers added
• Difference: EM64T supports SSE3 instructions, Opteron has 3DNow!
Status of 64-bit Linux
• 64-bit Linux is a stable operating platform
• The Opteron CPU and associated platforms are sufficiently reliable
• The Opteron CPU gives slightly better performance than the Xeon for significantly less power draw
• Using 64-bit compilation and optimization can lead to significant performance gains on both AMD and Intel
Increased Memory for 32-bit Applications
32-bit server, 4 GB DRAM
• OS & apps share a small 32-bit VM space
• 32-bit OS & applications all share 4 GB DRAM
• Leads to small dataset sizes & lots of paging
64-bit server, 12 GB DRAM
• Each app has exclusive use of its 32-bit VM space
• 64-bit OS can allocate each application large dedicated portions of the 12 GB DRAM
• OS uses VM space way above 32 bits
• Leads to larger dataset sizes & reduced paging
[Diagram: on the 32-bit server, the 32-bit apps and the 32-bit OS share a single 0 to 4 GB virtual memory space (split at 2 GB) backed by 4 GB of shared DRAM. On the 64-bit server, each 32-bit app has its own unshared 0 to 4 GB virtual memory space and the 64-bit OS has its own space reaching up to 256 TB, backed by 12 GB of DRAM.]
Compilers
• Opteron 250
    • Legacy executable, i386: 1290 VUPS
    • Gcc 3.4.2 optimized: 2440 VUPS
    • Pathscale compiler: 2677 VUPS
• Xeon 3.6
    • Legacy executable, i386: 1386 VUPS
    • Gcc 3.4.2 optimized: 2309 VUPS
    • Intel 8.1 compiler: 2910 VUPS
    • Intel 8.1 compiler with profile feedback: 4332 VUPS
• Intel Fortran and C 8.1 use SSE3 instructions to optimize, which makes the generated code incompatible with Opterons.
• For comparison, Pentium III 1.0 GHz = 568 VUPS.
64-bit Compilers
Computing Strategy: x86-64
• Legacy: 32-bit OS
    • Both AMD Athlon 64 and AMD Opteron processors run any 32-bit legacy OS
    • Compatible with all legacy drivers, OS & BIOS
    • No application recompile required, no emulation layer
• 64-bit OS
    • Desired applications can be written/ported to leverage the full 64-bit capabilities of x86-64
    • Migrate only where warranted, and at the user's pace
    • 32-bit applications run under the 64-bit OS
• BIOS is standard x86 32-bit code.
• Transfer to 64-bit operation occurs under OS load/startup control
• 64-bit mode does not use segmentation: flat addressing
Compatibility Thunking Layer
[Diagram: USER space holds an IA32 application and an AMD64 application, each in a 64-bit process; the IA32 application sits on top of a thunking layer. KERNEL space holds the AMD64 operating system and AMD64 device drivers.]
64-bit OS & Application Interaction
32-bit Compatibility Mode
• 64-bit OS runs existing 32-bit apps with leading-edge performance
• No recompile required, 32-bit code is executed directly by the CPU
• 64-bit OS provides 32-bit libraries and a "thunking" translation layer for 32-bit system calls.
64-bit Mode
• Migrate only where warranted, and at the user's pace, to fully exploit AMD64
• 64-bit OS requires all kernel-level programs & drivers to be ported.
• Any program that is linked or plugged in to a 64-bit program (ABI-level) must be ported to 64 bits.
[Diagram: USER space holds a 32-bit thread running a 32-bit application (4GB expanded address space) and a 64-bit thread running a 64-bit application (512GB (or 8TB) address space); the 32-bit side passes through a translation layer. KERNEL space holds the 64-bit operating system and 64-bit device drivers.]
Problems 64/32
• On 64-bit Linux you cannot run binary-only programs compiled for IA32, or applications that have not yet been ported to AMD64 (e.g. OpenOffice.org), unless 32-bit versions of the libraries are also installed: 32-bit applications cannot be mixed with 64-bit libraries.
• Multiarch
    • Architectures like sparc64 or powerpc64, which provide lib for default 32-bit libraries and lib64 for extra 64-bit libraries, default to executing 32-bit applications.
    • amd64 defaults to 64-bit binaries because of the performance benefits it offers in 64-bit mode.
    • Thus, not wanting to rewrite virtually every binary-arch package's creation rules to install libs in lib64 rather than lib, and wanting a solution for all multiarch-capable platforms, various people are working on so-called multiarch support.
Sun Solaris 64bit
The Solaris Operating System supports both 32-bit and 64-bit hardware.
Customers with 32-bit hardware can run the Solaris Operating System and
take advantage of the many features in the Solaris Operating System that are
not explicitly related to 64-bits (e.g., dynamic reconfiguration, scalability
enhancements, performance improvements). Customers can run a 32-bit
application on 64- or 32-bit hardware with the Solaris Operating System
without any change to the application.
Note that Solaris for x86, prior to Solaris 10, supports only a 32 bit kernel.
MacOS X xcode
At the heart of Xcode 2.0 is Apple’s version of gcc 3.5, the next generation of the
industry-standard gcc compiler. The new compiler helps you get more performance
from your existing code by using a number of advanced optimization techniques. Autovectorization, a technique borrowed from the world of supercomputing, helps you to
unlock the power of the Velocity Engine in every PowerPC G4 and G5 system without
writing vectorized code.
With the new 64-bit support in Mac OS X Tiger, Xcode gives you the ability to create
applications such as computation and rendering engines that use 64-bit memory
addressing. This is ideal for data-intensive applications, which can run faster by
accessing data in memory, rather than via disk access. Xcode gives you the tools for
building and debugging 64-bit applications for PowerPC G5 and Mac OS X Tiger, as
well as letting you create Fat Binaries that contain both 32-bit and 64-bit executables.
Kernel Overview
Kernel source
• Download source from www.kernel.org or mirrors.
• Unpack:
    cd /usr/src
    tar xzvf linux-<Version>.tar.gz
    ln -s linux-<Version> linux
• Source root: /usr/src/linux
• Version numbering, e.g. 2.6.20-pre4:
    Version = 2, Patchlevel = 6, Sublevel = 20, Extralevel = -pre4
Kernel source tree
.
|-- Documentation
|-- arch
|   |-- alpha
|   |-- arm
|   |-- i386
|   |   |-- boot
|   |   |   |-- compressed
|   |   |   `-- tools
|   |   |-- kernel
|   |   |-- lib
|   |   |-- math-emu
|   |   `-- mm
|   |-- ia64
|   |-- m68k
|   |-- ppc
|   |-- ppc64
|   |-- sparc
|   |-- sparc64
|   |-- um
|   `-- x86_64
|-- drivers
|   |-- acpi
|   |-- atm
|   |-- block
|   |-- bluetooth
|   |-- cdrom
|   |-- char
|   |-- hotplug
|   |-- ide
|   |-- ieee1394
|   |-- input
|   |-- isdn
|   |-- macintosh
|   |-- net
|   |-- parport
|   |-- pci
|   |-- pcmcia
|   |-- pnp
|   |-- scsi
|   |-- sgi
|   |-- sound
|   |-- usb
|   `-- video
|-- fs
|   |-- autofs
|   |-- ext2
|   |-- ext3
|   |-- fat
|   |-- isofs
|   |-- minix
|   |-- msdos
|   |-- ntfs
|   |-- reiserfs
|   |-- smbfs
|   |-- udf
|   `-- vfat
|-- include
|   |-- asm -> asm-um
|   |-- asm-alpha
|   |-- asm-arm
|   |-- asm-i386
|   |-- asm-ia64
|   |-- asm-m68k
|   |-- asm-ppc
|   |-- asm-ppc64
|   |-- asm-sparc
|   |-- asm-sparc64
|   |-- asm-um
|   |-- asm-x86_64
|   |-- linux
|   |-- math-emu
|   |-- net
|   |-- pcmcia
|   |-- scsi
|   `-- video
|-- init
|-- ipc
|-- kernel
|-- lib
|-- mm
|-- net
|   |-- 802
|   |-- appletalk
|   |-- atm
|   |-- bluetooth
|   |-- core
|   |-- ethernet
|   |-- ipv4
|   |-- ipv6
|   |-- ipx
|   |-- irda
|   |-- packet
|   |-- unix
|   `-- x25
`-- scripts
Linux Source Tree Layout
/usr/src/linux
    Documentation
    arch: alpha, arm, i386, ia64, m68k, mips, mips64, ppc, s390, sh, sparc, sparc64
    drivers: acorn, atm, block, cdrom, char, dio, fc4, i2c, i2o, ide, ieee1394, isdn, macintosh, misc, net, …
    fs: adfs, affs, autofs, autofs4, bfs, coda, cramfs, devfs, devpts, efs, ext2, fat, hfs, hpfs, …
    include: asm-alpha, asm-arm, asm-generic, asm-i386, asm-ia64, asm-m68k, asm-mips, asm-mips64, linux, math-emu, net, pcmcia, scsi, video, …
    init
    ipc
    kernel
    lib
    mm
    net: 802, appletalk, atm, ax25, bridge, core, decnet, econet, ethernet, ipv4, ipv6, ipx, irda, khttpd, lapb, …
    scripts
Sizes (linux-2.4.0-test2)
directory         size    entries  files  loc
/usr/src/linux/   90M     19       7645   2.6M
Documentation     4.5M    97       380    na
arch              16.5M   12       1685   466K
drivers           54M     31       2256   1.5M
fs                5.6M    70       489    150K
include           14.2M   19       2262   285K
init              28K     2        2      1K
ipc               120K    6        6      4.5K
kernel            332K    25       25     12K
lib               80K     8        8      2K
mm                356K    19       19     12K
net               5.8M    33       453    162K
scripts           400K    26       42     12K
linux/arch
• Subdirectories for each current port.
• Each contains kernel, lib, mm, boot and other directories whose contents override code stubs in architecture-independent code.
• lib contains highly optimized common utility routines such as memcpy, checksums, etc.
• arch as of 2.4:
    • alpha, arm, i386, ia64, m68k, mips, mips64, ppc, s390, sh, sparc, sparc64.
linux/drivers
• Largest amount of code in the kernel tree (~1.5M).
• Device, bus, platform and general directories.
• drivers/char – n_tty.c is the default line discipline.
• drivers/block – elevator.c, genhd.c, linear.c, ll_rw_blk.c, raidN.c.
• drivers/net – specific drivers and general routines Space.c and net_init.c.
• drivers/scsi – scsi_*.c files are generic; sd.c (disk), sr.c (CD-ROM), st.c (tape), sg.c (generic).
• General – cdrom, ide, isdn, parport, pcmcia, pnp, sound, telephony, video.
• Buses – fc4, i2c, nubus, pci, sbus, tc, usb.
• Platforms – acorn, macintosh, s390, sgi.
linux/fs
• Contains:
    • virtual filesystem (VFS) framework.
    • subdirectories for actual filesystems.
• VFS-related files:
    • exec.c, binfmt_*.c – files for mapping new process images.
    • devices.c, blk_dev.c – device registration, block device support.
    • super.c, filesystems.c.
    • inode.c, dcache.c, namei.c, buffer.c, file_table.c.
    • open.c, read_write.c, select.c, pipe.c, fifo.c.
    • fcntl.c, ioctl.c, locks.c, dquot.c, stat.c.
linux/include
• include/asm-*:
    • Architecture-dependent include subdirectories.
• include/linux:
    • Header info needed both by the kernel and user apps.
    • Usually linked to /usr/include/linux.
    • Kernel-only portions guarded by #ifdefs:
        #ifdef __KERNEL__
        /* kernel stuff */
        #endif
• Other directories:
    • math-emu, net, pcmcia, scsi, video.
linux/init
• Just two files: version.c, main.c.
• version.c – contains the version banner that prints at boot.
• main.c – architecture-independent boot code.
• start_kernel is the primary entry point.
linux/ipc
• System V IPC facilities.
• If disabled at compile-time, util.c exports stubs that simply return -ENOSYS.
• One file for each facility:
    • sem.c – semaphores.
    • shm.c – shared memory.
    • msg.c – message queues.
linux/kernel
• The core kernel code.
• sched.c – "the main kernel file":
    • scheduler, wait queues, timers, alarms, task queues.
• Process control:
    • fork.c, exec.c, signal.c, exit.c etc.
• Kernel module support:
    • kmod.c, ksyms.c, module.c.
• Other operations:
    • time.c, resource.c, dma.c, softirq.c, itimer.c.
    • printk.c, info.c, panic.c, sysctl.c, sys.c.
linux/lib
• Kernel code cannot call standard C library routines.
• Files:
    • brlock.c – "Big Reader" spinlocks.
    • cmdline.c – kernel command line parsing routines.
    • errno.c – global definition of errno.
    • inflate.c – "gunzip" part of gzip.c used during boot.
    • string.c – portable string code.
        • Usually replaced by optimized, architecture-dependent routines.
    • vsprintf.c – libc replacement.
linux/mm
• Paging and swapping:
    • swap.c, swapfile.c (paging devices), swap_state.c (cache).
    • vmscan.c – paging policies, kswapd.
    • page_io.c – low-level page transfer.
• Allocation and deallocation:
    • slab.c – slab allocator.
    • page_alloc.c – page-based allocator.
    • vmalloc.c – kernel virtual-memory allocator.
• Memory mapping:
    • memory.c – paging, fault-handling, page table code.
    • filemap.c – file mapping.
    • mmap.c, mremap.c, mlock.c, mprotect.c.
linux/scripts
• Scripts for:
    • Menu-based kernel configuration.
    • Kernel patching.
    • Generating kernel documentation.
Linux Kernel Configuration
• Download the source code and extract it under /usr/src
• Customize kernel configuration
    make config
    make menuconfig
    make xconfig
• Two menu items: Networking options and Network device support
• Device support
    • Three options: y (built in), m (module), n (not included)
Kernel Config
• Get sources from kernel.org
• Unpack in your home directory
    gzip -cd linux-2.4.XX.tar.gz | tar xvf -
• make menuconfig
• make dep
• make bzImage
Building and Installing Kernel
• Compiling the kernel.
    make dep
    [ make clean ]
    make bzImage
    make modules
• Installing the kernel.
    make modules_install
    [ make install ]
    • Kernel is stored in /usr/src/linux-2.4/arch/i386/boot/bzImage and copied to /boot:
        cp arch/i386/boot/bzImage /boot/
    • Edit your lilo.conf and run /sbin/lilo
    • Reboot.
LILO vs. GRUB
• LILO
    • Run LILO to modify the mini-bootloader in the MBR
    • Cannot read the file system itself
• GRUB
    • Multistage loader
    • Can read the file system itself
• Parameter passing (runlevel, init) to the kernel
    • Effectively a hack: modifies the address and name, inside the kernel, of the process to start
Linux Loader (LILO)
# sample /etc/lilo.conf
boot = /dev/hda
delay = 40
password=SOME_PASSWORD_HERE
default=vmlinuz-stable
vga = normal
root = /dev/hda1
image = vmlinuz-2.5.99
    label = net test kernel
    restricted
image = vmlinuz-stable
    label = stable kernel
    restricted
other = /dev/hda3
    label = Windows 2000 Professional
    restricted
    table = /dev/hda
GRUB
# /etc/grub.conf generated by anaconda
timeout=10
splashimage=(hd0,1)/grub/splash.xpm.gz
password --md5 $1$ÕpîÁÜdþï$J08sMAcfyWW.C3soZpHkh.
title Red Hat Linux (2.4.18-3custom)
root (hd0,1)
kernel /vmlinuz-2.4.18-3custom ro root=/dev/hda5
initrd /initrd-2.4.18-3.img
title Red Hat Linux (2.4.18-3) Emergency kernel (no afs)
root (hd0,1)
kernel /vmlinuz-2.4.18-3 ro root=/dev/hda5
initrd /initrd-2.4.18-3.img
title Windows 2000 Professional
rootnoverify (hd0,0)
chainloader +1
/boot
unix boot # pwd; ls -lRp
/boot
.:
total 1582
lrwxrwxrwx 1 root root 1 Sep 23 14:11 boot -> ./
drwxr-xr-x 2 root root 1024 Sep 23 15:34 grub/
-rw-r--r-- 1 root root 458622 Sep 23 14:58 initrd-2.4.26-gentoo-r9
-rw-r--r-- 1 root root 1137878 Sep 23 14:50 kernel-2.4.26-gentoo-r9
drwx------ 2 root root 12288 Sep 23 13:49 lost+found/
./grub:
total 846
-rw-r--r-- 1 root root 30 Sep 23 15:34 device.map
-rw-r--r-- 1 root root 11264 Sep 23 15:34 e2fs_stage1_5
-rw-r--r-- 1 root root 10256 Sep 23 15:34 fat_stage1_5
-rw-r--r-- 1 root root 9216 Sep 23 15:34 ffs_stage1_5
-rw-r--r-- 1 root root 245 Sep 23 15:34 grub.conf
-rw-r--r-- 1 root root 1495 Sep 23 15:32 grub.conf.sample
-rw-r--r-- 1 root root 11456 Sep 23 15:34 jfs_stage1_5
lrwxrwxrwx 1 root root 9 Sep 23 15:32 menu.lst -> grub.conf
-rw-r--r-- 1 root root 9600 Sep 23 15:34 minix_stage1_5
-rwxr-xr-x 1 root root 196836 Sep 23 15:32 nbgrub
-rwxr-xr-x 1 root root 197860 Sep 23 15:32 pxegrub
-rw-r--r-- 1 root root 12864 Sep 23 15:34 reiserfs_stage1_5
-rw-r--r-- 1 root root 33856 Sep 23 15:32 splash.xpm.gz
-rw-r--r-- 1 root root 512 Sep 23 15:34 stage1
-rw-r--r-- 1 root root 135148 Sep 23 15:34 stage2
-rwxr-xr-x 1 root root 196900 Sep 23 15:32 stage2.netboot
-rw-r--r-- 1 root root 8896 Sep 23 15:34 vstafs_stage1_5
-rw-r--r-- 1 root root 12840 Sep 23 15:34 xfs_stage1_5
./lost+found:
total 0
unix boot #
unix grub # more grub.conf
default 0
# How many seconds to wait before the default listing is booted.
timeout 5
title=gentoo
root (hd0,0)
kernel /kernel-2.4.26-gentoo-r9 root=/dev/ram0 init=/linuxrc ramdisk=8192 real_root=/dev/hda2
initrd /initrd-2.4.26-gentoo-r9
title GNU/Linux (2.4.25)
root (hd0,4)
kernel (hd0,4)/boot/kernel-2.4.25_pre6-gss root=/dev/ram0 init=/linuxrc real_root=/dev/hda5 vga=0x317 splash=verbose
initrd (hd0,4)/boot/initrd-2.4.25_pre6-gss
unix grub #
Build kernel for other platform
• unix linux # make menuconfig ARCH=x86_64
• /usr/src/linux/arch
    alpha cris ia64 mips parisc ppc64 s390x sh64 sparc64
    arm i386 m68k mips64 ppc s390 sh sparc x86_64
• Cross-compile
    make HOSTCC="gcc -m32" ARCH="x86_64" bzImage
    • Get a gcc wrapper in order to cross-compile on an i386 host:
      http://www.jukie.net/~bart/debian/amd64/scripts/gcc.bart
Linux modules
• Driver modules in /lib/modules
[root@dafinn net]# pwd; ls
/lib/modules/2.4.22-1.2166.nptlsmp/kernel/drivers/net
3c509.o     b44.o    eepro100.o  netconsole.o   pppox.o        tg3.o
3c59x.o     bonding  epic100.o   ns83820.o      ppp_synctty.o  tlan.o
8139cp.o    de4x5.o  ethertap.o  pcmcia         r8169.o        tulip
8139too.o   dl2k.o   fealnx.o    pcnet32.o      sis900.o       tun.o
82596.o     dmfe.o   irda        ppp_async.o    sk98lin        typhoon.o
8390.o      dummy.o  mii.o       ppp_deflate.o  slhc.o         via-rhine.o
acenic.o    e100     natsemi.o   ppp_generic.o  smc9194.o      wireless
amd8111e.o  e1000    ne2k-pci.o  pppoe.o        starfire.o
• Recompiling the kernel
    • Make the kernel smaller
    • Add a new device
    • Modify a system parameter
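Hello World !
/* hello.c: minimal loadable kernel module (2.4-era style) */
#define MODULE
#include <linux/module.h>
#include <linux/kernel.h>   /* printk() */

/* Called by insmod when the module is loaded. */
int init_module(void) {
    printk("<1>Hello, world\n");   /* "<1>" is the KERN_ALERT priority */
    return 0;                      /* 0 means success */
}

/* Called by rmmod just before the module is removed. */
void cleanup_module(void) {
    printk("<1>Goodbye !\n");
}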
Loaded and Unloaded
root# gcc -c hello.c
root# insmod ./hello.o
Hello, world
root# rmmod hello
Goodbye !
root#
• MUST be root to load/unload a module
• Check /var/log/messages if nothing is shown
• Can use module_init(my_init) & module_exit(my_cleanup) in Linux 2.2 and later
BSD
Source Code Control
• The entire source code for FreeBSD is stored in a CVS repository
• The logs and individual changes for each file can be traced back to 1994.
• The source tree can be checked out at any state, or corresponding to any release
• CDs are available taking the history back a further 20 years
Building World
• The entire operating system, including all libraries and utilities, can be built with a single command: "make world"
• The source code for the system is placed in /usr/src during installation.
• Much easier to secure a system if a bug is found in a key library like OpenSSL.
• More information in build(7) and the Handbook.
Building Releases
• You can even build a complete release of FreeBSD, including FTP install directories, floppy images, and ISO images for CD-ROMs, with one command.
• "make release" is used by many large companies to produce special versions of FreeBSD with special patches or additional software installed by default.
• It is also the well-documented way in which the release engineering team makes all official releases of FreeBSD.
Release Engineering
• "make release" makes it much easier to deploy thousands of systems pre-configured for a specific environment.
• The release engineering team for FreeBSD publishes schedules, identifies QA issues that must be resolved before release, and publishes documents to help other people build FreeBSD-based products.
• See release(7) and www.freebsd.org/releng
Running a FreeBSD binary
• Code like
    fd = open("/etc/passwd", O_RDONLY);
• Becomes
    syscall(5, ...)
• Kernel knows it's a FreeBSD binary, uses freebsd_syscalls[ ] array
    freebsd_syscalls[5] = freebsd_open(…);
• File is opened
Running a Linux binary (on FreeBSD)
• Code like
    fd = open("/etc/passwd", O_RDONLY);
• Becomes
    syscall(5, ...)
• Kernel knows it's a Linux binary, uses linux_syscalls[ ] array
    linux_syscalls[5] = linux_open(…);
• File is opened
• All Linux file operations are redirected to /compat/linux first
Linux modules
• List installed modules
    lsmod
• Module dependencies
• The meaning of (autoclean)
• Load the module manually:
    insmod [-k] 3c509
    modprobe smc-ultra
• Generate the dependency
    depmod -a
• Remove module: rmmod
• A device driver can be dynamically loaded or compiled into the kernel.
Function pointers
LKM Utilities
• insmod
    Insert an LKM into the kernel.
• rmmod
    Remove an LKM from the kernel.
• depmod
    Determine interdependencies between LKMs.
• kerneld
    Kerneld daemon program.
• ksyms
    Display symbols that are exported by the kernel for use by new LKMs.
• lsmod
    List currently loaded LKMs.
• modinfo
    Display contents of .modinfo section in an LKM object file.
• modprobe
    Insert or remove an LKM or set of LKMs intelligently. For example, if you must load A before loading B, modprobe will automatically load A when you tell it to load B.
Two ways of loading a module
• Manual way: using the 'insmod' command.
• Automatic way: the kernel discovers the need to load a module (for example, the user mounts a file system) => requests the 'kerneld' daemon to load the needed module => 'kerneld' loads the module using insmod.
The 'insmod' mechanism for loading a module
• insmod reads the module into its virtual memory
• It fixes up unresolved references to kernel routines and resources using the symbols exported by the kernel
• It requests enough space from the kernel to hold the new module
• The kernel allocates a new module data structure and enough kernel memory to hold the new module, and puts it at the end of the kernel modules list
The 'insmod' mechanism for loading a module (continued)
• insmod copies the module into the allocated space and relocates it so that it will run from the kernel address that it has been allocated
• The new module exports symbols to the kernel, and insmod builds a table of these exported symbols
• If the new module depends on another module, that module holds a reference to the new module
• The kernel calls the module's initialization routine and carries on installing the module
Unloading a module
• Two ways of unloading a module
    • Manual way: uses the 'rmmod' command
    • Automatic way: when the idle timer expires, kerneld calls the service routines for all unused loaded modules
• The mechanism of unloading
    • If the module can be unloaded, its cleanup routine is called to free up the kernel resources that it has allocated
    • The module data structure is unlinked from the list of kernel modules
    • All of the kernel memory that the module needed is deallocated.
User Space Driver
• Advantages
    • Full C library can be linked in
    • Can be run under a conventional debugger
    • A problem in the driver is unlikely to hang the entire system
    • User memory is swappable (more memory available to use)
    • A well-designed driver can still allow concurrent access to a device
• Disadvantages
    • Interrupts are not available in user space
    • Direct access to memory only by mmapping /dev/mem (privileged user only)
    • Access to I/O ports only after calling ioperm or iopl (not supported on all platforms); access through /dev/port can be too slow (privileged user only)
    • Response time is slower (context switches, driver may be swapped to disk)
    • The most important devices can't be handled in user space (block, network)