Windows OS Internals
Windows Internals
David Solomon
David Solomon Expert Seminars
www.solsem.com
Mark Russinovich
Winternals
www.winternals.com, www.sysinternals.com
About the Speaker: David Solomon
1982-1992: VMS operating systems development at
Digital
1992-present: Researching, writing, and teaching
Windows operating system internals
Frequent speaker at technical conferences
(Microsoft TechEd, IT Forum, PDCs, …)
Microsoft Most Valuable Professional (1993, 2005)
Books
Windows Internals, 4th edition
PDF version ships with Server 2003 Resource Kit
Inside Windows 2000, 3rd edition
Inside Windows NT, 2nd edition
Windows NT for OpenVMS Professionals
Live Classes
2-5 day classes on Windows Internals,
Advanced Troubleshooting
Video Training
12 hour interactive internals tutorial
Licensed by MS for internal use
About the Speaker: Mark Russinovich
Co-author of Inside Windows 2000, 3rd
Edition and Windows Internals, 4th edition
with David Solomon
Senior Contributing Editor to Windows IT
Pro Magazine
Co-authors Windows Power Tools column
Author of tools on www.sysinternals.com
Microsoft Most Valuable Professional
(MVP)
Co-founder and chief software architect
of Winternals Software
(www.winternals.com)
Ph.D. in Computer Engineering
Acknowledgements
Special thanks to:
Dave Cutler for initially granting David access to
the source code in 1993 and reviewing the book
and presentations
Rob Short & Jim Allchin for continuing to be our
“executive sponsors”
Also thanks to many others in the Windows
team (past & present) for their support and
assistance:
Landy Wang, Neil Clift, Jim Allchin, Mark Lucovsky, Brian
Andrews, Richard Ward, Steve Wood, Tom Miller, Gary
Kimura, Darryl Havens, Lou Perazzoli
Purpose of Tutorial
Give Windows developers a foundational
understanding of the system’s kernel
architecture
Design better for performance & scalability
Debug problems more effectively
Understand system performance issues
We’re covering a small, but important set of core
topics:
The “plumbing in the boiler room”
System Architecture
[Diagram: layered system architecture; components listed from top to bottom]
User mode
System Processes: Service Control Mgr., LSASS, WinLogon, Session Manager
Services: SvcHost.Exe, WinMgt.Exe, SpoolSv.Exe, Services.Exe
Applications: Task Manager, Explorer, User Application
Environment Subsystems: POSIX, OS/2, Windows
Subsystem DLLs
NTDLL.DLL
Kernel mode
System Threads
System Service Dispatcher (kernel mode callable interfaces)
I/O Mgr (Device & File Sys. Drivers), File System Cache, Object Mgr., Plug and Play Mgr., Power Mgr., Security Reference Monitor, Virtual Memory, Processes & Threads, Configuration Mgr (registry), Local Procedure Call
Windows USER, GDI; Graphics Drivers
Kernel
Hardware Abstraction Layer (HAL)
hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc., etc.)
Tools Used To Dig In
Many tools available to dig into Windows
OS internals without requiring source code
Helps to see internals behavior “in action”
Many of these tools are used in labs in the
video and the book
Several sources of tools
Support Tools (on Windows OS CD-ROM in
\support\tools)
Resource Kit Tools
Sysinternals tools (www.sysinternals.com)
Windows Debugging Tools
Live Kernel Debugging
Useful for investigating internal system
state not available from other tools
Previously, required 2 computers
(host and target)
Target would be halted while host debugger
in use
XP & later supports live local kernel
debugging
Technically requires system to be booted
/DEBUG to work correctly
But, not all commands work
LiveKD
LiveKd makes more commands work on a
live system
Works on NT4, Windows 2000, Windows XP,
Server 2003, and Vista
Was originally shipped on Inside Windows 2000
book CD-ROM – now is free on Sysinternals
Tricks standard Microsoft kernel debuggers
into thinking they are looking at a crash dump
Does not guarantee consistent view of
system memory
Thus can loop or fail with access violation
Just quit and restart
Outline
1. System Architecture
2. Processes and Thread Internals
3. Memory Management Internals
4. Security Internals
System Architecture
Process Execution Environment
Kernel Architecture
Interrupt Handling
Object Manager
System Threads
Process-based code
Summary
Processes And Threads
What is a process?
Represents an instance of a running program
You create a process to run a program
Starting an application creates a process
Process defined by
Address space
Resources (e.g., open handles)
Security profile (token)
System call
Primary argument to CreateProcess is image file name (or command line)
[Diagram: threads inside a per-process address space, above the system-wide address space]
Processes And Threads
What is a thread?
An execution context within a process
Unit of scheduling (threads run, processes don’t run)
All threads in a process share the same per-process address space
Services provided so that threads can synchronize access to shared resources (critical sections, mutexes, events, semaphores)
All threads in the system are scheduled as peers to all others, without regard to their “parent” process
System call:
Primary argument to CreateThread is a function entry point address
Linux:
No threads per se
Tasks can act like Windows threads by sharing handle table, PID and address space
[Diagram: threads inside a per-process address space, above the system-wide address space]
Processes And Threads
Every process starts with one thread
First thread executes the program’s “main” function
Can create other threads in the same process
Can create additional processes
Why divide an application into multiple threads?
Perceived user responsiveness, parallel/background execution
Examples: Word background print – can continue to edit during print
Take advantage of multiple processors
On an MP system with n CPUs, n threads can literally run at the
same time
Question: Given a single threaded application, will adding a second
processor make it run faster?
Does add complexity
Synchronization
Scaling well is a different question…
Number of runnable threads versus number of CPUs
Having too many runnable threads causes excess context switching
32-bit x86 Address Space
32 bits = 4 GB
Default: 2 GB user process space, 2 GB system space
3 GB user space option: 3 GB user process space, 1 GB system space
64-bit Address Spaces
64-bits = 17,179,869,184 GB
x64 today supports 48 bits virtual = 262,144 GB
IA-64 today supports 50 bits virtual = 1,048,576 GB
x64: 8192 GB (8 TB) user process space, 6657 GB system space
Itanium: 7152 GB (7 TB) user process space, 6144 GB system space
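The address space totals quoted on this slide follow directly from powers of two. A minimal sketch that reproduces them (the helper name `address_space_gb` is made up for illustration):

```python
GB = 2 ** 30  # bytes per gigabyte (binary)

def address_space_gb(bits):
    """Size in GB of an address space with the given number of address bits."""
    return 2 ** bits // GB

# 32-bit x86, full 64-bit, x64 (48 bits implemented), IA-64 (50 bits implemented)
for bits in (32, 64, 48, 50):
    print(f"{bits} bits = {address_space_gb(bits):,} GB")
```

The user/system split within each total is a design choice of the operating system and cannot be derived from the bit count alone.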
Memory Protection Model
No user process can touch another user process address
space (without first opening a handle to the process,
which means passing through NT security)
Separate process page tables prevent this
“Current” page table changed on context switch from a thread in one
process to a thread in another process
No user process can touch kernel memory
Page protection in process page tables prevents this
OS pages only accessible from “kernel mode”
x86: Ring 0, Itanium: Privilege Level 0
Threads change from user to kernel mode and back (via a secure
interface) to execute kernel code
Does not affect scheduling (not a context switch)
Process Explorer (Sysinternals)
“Super Task Manager”
Shows full image path, command line, environment
variables, parent process, thread details, security access
token, open handles, loaded DLLs & mapped files
Windows Kernel Evolution
Basic kernel architecture has remained
stable while system has evolved
Windows 2000: major changes in I/O
subsystem (plug & play, power management,
WDM), but rest similar to NT4
Windows XP & Server 2003: modest upgrades
as compared to the changes from NT4 to
Windows 2000
Internal version numbers confirm this:
Windows 2000 was 5.0
Windows XP is 5.1
Windows Server 2003 is 5.2
Windows Vista is 6.0
Kernel Architecture
Is Windows NT/2000/XP/2003 a microkernel-based OS?
No – not using the academic definition (OS components and
drivers run in their own private address spaces, layered on a
primitive microkernel)
All kernel components live in a common shared address space
Therefore no protection between OS and drivers
But it does have some attributes of a microkernel OS
OS personalities running in user space as separate processes
Kernel-mode components don't reach into one another’s
data structures
Use formal interfaces to pass parameters and access and/or modify data
structures
Therefore the term “modified microkernel”
Why not pure microkernel?
Performance – separate address spaces would mean context
switching to call basic OS services
Linux has the same monolithic kernel architecture
So do most Unixes, VMS, …
Example
Invoking a Win32 Kernel API
User mode (U):
Windows application: call WriteFile(…)
WriteFile in Kernel32.Dll (Win32-specific): call NtWriteFile, return to caller
NtWriteFile in NtDll.Dll (used by all subsystems): Int 2E or SYSENTER or SYSCALL (software interrupt), return to caller
Kernel mode (K):
KiSystemService in NtosKrnl.Exe: call NtWriteFile, dismiss interrupt
NtWriteFile in NtosKrnl.Exe: do the operation, return to caller
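The call chain on this slide can be modeled as a table lookup by service number. The sketch below is a toy model only: the Python function names, the service number 7, and the table layout are invented for illustration and do not come from a real Windows build.

```python
# Toy model of the WriteFile call chain; all names and numbers are illustrative.

def nt_write_file_kernel(handle, data):
    """Stands in for NtWriteFile in NtosKrnl.Exe: does the operation."""
    return len(data)  # pretend every byte was written

# Kernel-side service table: service number -> kernel routine.
SERVICE_TABLE = {7: nt_write_file_kernel}

def ki_system_service(service_number, *args):
    """Stands in for KiSystemService: find the routine and call it."""
    routine = SERVICE_TABLE[service_number]
    return routine(*args)

def nt_write_file_stub(handle, data):
    """Stands in for NtWriteFile in NtDll.Dll: enter kernel mode with a service number."""
    return ki_system_service(7, handle, data)

def write_file(handle, text):
    """Stands in for WriteFile in Kernel32.Dll: rearrange arguments, call the native API."""
    return nt_write_file_stub(handle, text.encode("ascii"))

print(write_file(4, "hello"))
```

Only the outermost layer is specific to the Windows subsystem; in this model any other subsystem DLL could call `nt_write_file_stub` directly.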
API Differences
Windows DLLs versus NtDll.Dll
Windows “kernel” APIs exported by Kernel32.Dll are different from the
“native API” in NtDll.Dll
Different entry point names
Arguments are different (but similar)
Routines in Kernel32.Dll rearrange (“marshal”) the arguments and call
routines in NtDll.Dll
NtDll.Dll uses change mode mechanism (INT 2E, SYSCALL) to invoke
services in NtosKrnl.Exe in kernel mode
NtDll.Dll versus NtosKrnl.Exe
1400 exported symbols (285 start with “Nt”)
Entry point names, arguments, etc., are the same between NtDll.Dll and
NtosKrnl.Exe
I.e., a user-mode routine in the native API can also be called from kernel
mode
The DDK describes many “Zw” routines such as ZwReadFile, callable
from kernel mode – this is the same location in memory as NtReadFile
from user mode
Kernel mode code could also call NtReadFile directly
Symmetric Multiprocessing (SMP)
No master processor
All the processors share just one memory space
Interrupts can be serviced on any processor
Any processor can cause another processor to reschedule what it’s
running
[Diagram: CPUs with L2 cache, connected to shared memory and I/O]
Windows Server 2003 supports NUMA (non uniform
memory architecture) systems
New MP Configurations
Hyperthreading support
CPU fools OS into thinking there are multiple CPUs
Example: dual Xeon with hyperthreading can support 4 logical processors
XP & Windows Server 2003 are hyperthreading aware
Logical processors don’t count against physical CPU limits
E.g. XP Home will use 2 logical processors; XP Pro will use 4
Scheduling algorithms take into account logical vs physical processors
Dual Core
Processor licensing is per-socket
NUMA (non uniform memory architecture)
Groups of physical processors (called “nodes”) that have “local
memory”
Still an SMP system (e.g. any processor can access all of memory)
But node-local memory is faster
Scheduling algorithms take this into account
Kernel Synchronization
Kernel synchronization primitives
Spinlocks
Queued Spinlocks
Pushlocks
Executive Resources
Fast Mutexes, Guarded Mutexes
Kernel Dispatcher Mutexes & Semaphores
Scalability improvements
Elimination of locks
Locks held for shorter durations
Scheduling database now per-CPU
Increased System Memory Limits
Key system memory limits raised in XP and 2003
Windows 2000 limit of 200 GB of mapped file
data eliminated
Previously limited size of files that could be backed up
Variable system PTEs can now describe 1.3 GB
(960 MB contiguous)
Windows 2000 limit was 660 MB (220 MB contiguous)
Max device driver size was 220 MB, now 960 MB
Registry limit of 376MB removed
Was a limit on number of terminal server users
No longer in paged pool – now a memory-mapped file
No registry quota any more
SYSTEM hive limited to 200 MB or ¼ of RAM,
whichever is lower (max was 12 MB)
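The SYSTEM hive rule above is a simple minimum of two quantities. A small sketch of that rule as stated on the slide (the function name is hypothetical, and sizes are in MB):

```python
def system_hive_limit_mb(ram_mb):
    """SYSTEM hive limit as stated above: 200 MB or 1/4 of RAM, whichever is lower."""
    return min(200, ram_mb / 4)

for ram_mb in (256, 512, 1024, 4096):
    print(ram_mb, "MB RAM ->", system_hive_limit_mb(ram_mb), "MB hive limit")
```

With this rule the RAM-based term only matters on machines with less than 800 MB of memory.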
Increased Limits in 64-bit Windows
                     IA64      x64       x86
User Address Space   7152 GB   8192 GB   2-3 GB
Page file limit      16 TB     16 TB     4095 MB (PAE: 16 TB)
Max page file space  256 TB    256 TB    ~64 GB
System PTE Space     128 GB    128 GB    1.2 GB
System Cache         1 TB      1 TB      960 MB
Paged pool           128 GB    128 GB    470-650 MB
Non-paged pool       128 GB    128 GB    256 MB
Many Packages…
1. Windows XP Home Edition
Licensed for 1 CPU die, 4GB RAM
2. Windows 2000 & XP Professional
Desktop version (but also is a fully functional server system)
Licensed for 2 CPU dies, 4GB RAM (128GB for 64-bit edition on x64)
3. Windows Server 2003, Web Server
Reduced functionality Standard Server (no domain controller)
Licensed for 2 CPU dies, 2GB RAM
4. Windows Server 2003, Standard Edition (formerly Windows 2000 Server)
Adds server and networking features (active directory-based domains, host-based
mirroring and RAID 5, NetWare gateway, DHCP server, WINS, DNS, …)
Licensed for 4 CPU dies, 4GB RAM (128GB on x64)
5. Windows Server 2003, Enterprise Edition
(formerly Windows 2000 Advanced Server )
3GB per-process address space option, Clusters (8 nodes)
32-bit: 8 CPU dies, 32GB RAM; 64-bit: 64GB
6. Windows 2000 Datacenter Server & Windows 2003 Server, Datacenter Edition
32-bit: 32 processors, 64GB RAM; 64-bit: 64 processors & 1024GB RAM
NOTE: this is not an exhaustive list
XP: Tablet PC edition, Media Center Edition, Starter Edition, N Edition
Server: Small Business Server, Storage Server, …
...One OS Kernel
Windows XP & 2003 for x64 (5.2) and all Windows 2000
versions have identical core operating system
executables
NTOSKRNL.EXE, HAL.DLL, xxxDRIVER.SYS, etc.
XP & Server 2003 have different kernel versions (5.1 vs 5.2)
Registry indicates system type (set at install time)
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control
\ProductOptions
ProductType: WinNT=Workstation, ServerNT=Server not a domain
controller, LanManNT=Server that is a Domain Controller
ProductSuite: indicates type of Server (Advanced, Datacenter, or for
Windows NT 4.0: Enterprise Edition, Terminal Server, …)
Code in the operating system tests these values and
behaves slightly differently in a few places
Licensing limits (number of processors, number of inbound network
connections, etc.)
Boot-time calculations (mostly in the memory manager)
Default length of time slice
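The ProductType values listed above map one-to-one onto system types. A minimal sketch of that mapping (the dictionary and function names are made up; reading the value from the registry is deliberately left out so the sketch runs on any platform):

```python
# Mapping taken from the ProductType values listed on this slide.
PRODUCT_TYPES = {
    "WinNT": "Workstation",
    "ServerNT": "Server, not a domain controller",
    "LanManNT": "Server that is a domain controller",
}

def describe_product_type(value):
    """Translate a ProductType registry string into a readable system type."""
    return PRODUCT_TYPES.get(value, "Unknown")

for value in ("WinNT", "ServerNT", "LanManNT"):
    print(value, "=", describe_product_type(value))
```

This mirrors, in a very simplified way, how code in the operating system can branch on the installed product type.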
NTOSKRNL.EXE
Core operating system image
Contains Executive and Kernel
Four retail variations:
NTOSKRNL.EXE - Uniprocessor
NTKRNLMP.EXE - Multiprocessor
32-bit Windows PAE versions (for DEP & >4GB RAM):
NTKRNLPA.EXE - Uniprocessor w/extended addressing support
NTKRPAMP.EXE - Multiprocessor w/extended addressing support
Vista: no uniprocessor kernel
Debug Version
“Checked Build”
Special debug version of system called “Checked Build”
Provided with MSDN
Primarily for driver testing, but can be useful for catching timing bugs in multithreaded
applications
Built from same source files as “free build” (a.k.a., “retail build”)
“DBG” compile-time symbol defined which enables:
Error tests for “can’t happen” conditions in kernel mode (ASSERTs)
Validity checks on arguments passed from one kernel mode routine to another
#ifdef DBG
if (something that should never happen has happened)
KeBugCheckEx(…)
#endif
Multiprocessor kernel (of course, runs on UP systems)
Can capture kernel debugger output with Dbgview from Sysinternals.com
See Knowledge base article 314743 (HOWTO: Enable Verbose Debug
Tracing in Various Drivers and Subsystems)
System Architecture
[Diagram: system architecture overview, repeated from the earlier System Architecture slide]
Original copyright by Microsoft Corporation. Used by permission.
Executive
Upper layer of the operating system
Provides “generic operating system” functions (“services”)
Process Manager
Object Manager
Cache Manager
LPC (local procedure call) Facility
Configuration Manager
Memory Manager
Security Reference Monitor
I/O Manager
Power Manager
Plug-and-Play Manager
Almost completely portable C code
Runs in kernel (“privileged”, ring 0) mode
Most interfaces to executive services not documented
Kernel
Lower layers of the operating system
Implements processor-dependent functions (x86 versus Itanium,
etc.)
Also implements many processor-independent functions that are
closely associated with processor-dependent functions
Main services
Thread waiting, scheduling, and context switching
Exception and interrupt dispatching
Operating system synchronization primitives
(different for MP versus UP)
A few of these are exposed to user mode
Not a classic “microkernel”
shares address space with rest of kernel-mode components
HAL – Hardware Abstraction Layer
Responsible for a small part of “hardware
abstraction”
Components on the motherboard not handled by drivers
System timers, Cache coherency, and flushing
SMP support, Hardware interrupt priorities
Subroutine library for the kernel and device drivers
Isolates OS & drivers from platform-specific details
Presents uniform model of I/O hardware interface to
drivers
Reduced role in Windows 2000
Bus support moved to bus drivers
Majority of HALs are vendor-independent
Digging Into NTOSKRNL.EXE
Exported symbols
Functions and global variables Microsoft wants visible outside the
image (e.g., used by device drivers)
About 1500 symbols exported, of which about 400 are
documented in the DDK
Ways to list:
Dependency Walker (File->Save As)
Visual C++ “link /dump /exports ntoskrnl.exe”
Global symbols
Over 9000 global symbols in XP/2003 (Windows NT 4.0 was
4700)
Many variables contain values related to performance and memory
policies
Ways to list:
Visual C++: “dumpbin /symbols /all ntoskrnl.exe” (names only)
Kernel debugger: “x nt!*”
Module name of NTOSKRNL is “NT”
Naming Convention For Internal
NTOSKRNL Routines
Two- or three-letter component code in beginning of function name
Executive
Ex - General executive routine
Exp - Executive private (not exported)
Cc - Cache manager
Mm - Memory management
Rtl - Run-Time Library
FsRtl - File System Run-Time Lib
Ob - Object management
Io - I/O subsystem
Se - Security
Ps - Process structure
Lsa - Security Authentication
Zw - File access, etc.
Kernel
Ke - Kernel
Ki - Kernel internal (not available outside the kernel)
HAL
Hal - Hardware Abstraction Layer
READ_, WRITE_ - I/O port and register access
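Because the prefixes listed on this slide are regular, a routine name can be mapped back to its component mechanically. The sketch below is one assumed way to do that: the longest-prefix rule and the check that the prefix is followed by an uppercase letter are heuristics of this example, not something the slides specify.

```python
# Prefix table copied from the naming convention on this slide.
PREFIXES = {
    "Ex": "General executive routine",
    "Exp": "Executive private (not exported)",
    "Cc": "Cache manager",
    "Mm": "Memory management",
    "Rtl": "Run-Time Library",
    "FsRtl": "File System Run-Time Lib",
    "Ob": "Object management",
    "Io": "I/O subsystem",
    "Se": "Security",
    "Ps": "Process structure",
    "Lsa": "Security Authentication",
    "Zw": "File access, etc.",
    "Ke": "Kernel",
    "Ki": "Kernel internal",
    "Hal": "Hardware Abstraction Layer",
}

def component_of(routine):
    """Return the component for a routine name, or None if no prefix matches."""
    # Try longer prefixes first so that "Exp" is preferred over "Ex".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        rest = routine[len(prefix):]
        if routine.startswith(prefix) and rest[:1].isupper():
            return PREFIXES[prefix]
    return None

for name in ("KeBugCheckEx", "ExQueueWorkItem", "PsCreateSystemThread", "KiSystemService"):
    print(name, "->", component_of(name))
```

Names without a listed prefix, such as NtWriteFile, simply return None here.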
Interrupt Dispatching
[Diagram: user or kernel mode code is interrupted and control transfers to kernel mode]
Note, no thread or process context switch!
Interrupt dispatch routine
Disable interrupts
Record machine state (trap frame) to allow resume
Mask equal- and lower-IRQL interrupts
Find and call appropriate ISR
Dismiss interrupt
Restore machine state (including mode and enabled interrupts)
Interrupt service routine
Tell the device to stop interrupting
Interrogate device state, start next operation on device, etc.
Request a DPC
Return to caller
Interrupt Precedence Via IRQLs
IRQL = Interrupt Request Level
The “precedence” of the interrupt with respect to other interrupts
Different interrupt sources have different IRQLs
Not the same as IRQ
IRQL is also a state of the processor
Servicing an interrupt raises processor IRQL to that interrupt’s IRQL
This masks subsequent interrupts at equal and lower IRQLs
User mode is limited to IRQL 0
No waits or page faults at IRQL >= DISPATCH_LEVEL
IRQL levels (highest to lowest):
31 - High
30 - Power fail
29 - Interprocessor Interrupt
28 - Clock
Device n … Device 1
2 - Dispatch/DPC
1 - APC
0 - Passive
Levels from Device 1 up to High: hardware interrupts
Dispatch/DPC and APC: deferrable software interrupts
Passive: normal thread execution
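The masking rule on this slide (servicing an interrupt raises the processor IRQL, and interrupts at equal or lower IRQL stay pending) can be sketched with a small simulation. The `Processor` class, the interrupt names, and the device levels 10 and 5 are invented for this example; only the clock level 28 comes from the list above.

```python
class Processor:
    """Toy model of one CPU's current IRQL and its pending interrupts."""

    def __init__(self):
        self.irql = 0        # passive level
        self.pending = []    # masked interrupts waiting for IRQL to drop
        self.log = []        # order in which interrupts were serviced

    def interrupt(self, irql, name, isr=None):
        if irql <= self.irql:
            # Masked: equal or lower than the current IRQL, so hold it.
            self.pending.append((irql, name, isr))
            return
        previous = self.irql
        self.irql = irql     # servicing raises IRQL to the interrupt's level
        self.log.append(name)
        if isr:
            isr(self)
        self.lower(previous)

    def lower(self, irql):
        """Drop IRQL and deliver any pending interrupts that are now unmasked."""
        self.irql = irql
        while True:
            ready = [p for p in self.pending if p[0] > self.irql]
            if not ready:
                break
            top = max(ready, key=lambda p: p[0])
            self.pending.remove(top)
            self.interrupt(*top)

def disk_isr(cpu):
    cpu.interrupt(28, "clock")     # higher IRQL: preempts the disk ISR
    cpu.interrupt(5, "keyboard")   # lower IRQL: stays pending until the ISR ends

cpu = Processor()
cpu.interrupt(10, "disk", disk_isr)
print(cpu.log)
```

The clock interrupt is serviced in the middle of the disk ISR, while the keyboard interrupt waits until the processor drops back below its level.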
Deferred Procedure Calls (DPCs)
Used to defer processing from higher (device) interrupt level to a lower
(dispatch) level
Driver (usually ISR) queues request
One queue per CPU; DPCs are normally queued to the current processor,
but can be targeted to other CPUs
Executes specified procedure at dispatch IRQL (or “dispatch level”, also
“DPC level”) when all higher-IRQL work (interrupts) completed
Used heavily for driver “after interrupt” functions
Also used for quantum end and timer expiration
[Diagram: queue head pointing to a linked list of DPC objects]
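The deferral described on this slide can be sketched as a per-CPU queue that is drained at dispatch level once the device interrupt has finished. The `Cpu` class, the routine names, and the device IRQL of 10 are assumptions made for this example.

```python
from collections import deque

DISPATCH_LEVEL = 2  # dispatch/DPC level

class Cpu:
    """Toy model of one CPU with its own DPC queue."""

    def __init__(self):
        self.irql = 0
        self.dpc_queue = deque()   # one queue per CPU
        self.log = []

    def queue_dpc(self, routine, context):
        self.dpc_queue.append((routine, context))

    def device_interrupt(self, device_irql, isr):
        previous = self.irql
        self.irql = device_irql    # run the ISR at device IRQL
        isr(self)
        self.irql = previous
        if self.irql < DISPATCH_LEVEL:
            self.run_dpcs()        # higher-IRQL work is done; drain the queue

    def run_dpcs(self):
        previous = self.irql
        self.irql = DISPATCH_LEVEL
        while self.dpc_queue:
            routine, context = self.dpc_queue.popleft()
            routine(self, context)
        self.irql = previous

def finish_io(cpu, context):
    cpu.log.append(("dpc", cpu.irql, context))

def disk_isr(cpu):
    cpu.log.append(("isr", cpu.irql))
    cpu.queue_dpc(finish_io, "request-1")   # defer the lengthy work

cpu = Cpu()
cpu.device_interrupt(10, disk_isr)
print(cpu.log)
```

The ISR does only the minimum at device IRQL; the bulk of the work runs later at IRQL 2, where device interrupts are no longer masked.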
IRQLs on 64-bit Systems
x64
15 - High/Profile
14 - Interprocessor Interrupt/Power
13 - Clock
12 - Synch (Srv 2003)
Device n … Device 1 (lowest device level: 3)
2 - Dispatch/DPC
1 - APC
0 - Passive/Low
IA64
15 - High/Profile/Power
14 - Interprocessor Interrupt
13 - Clock
12 - Synch (MP only)
Device n … Device 1 (lowest device level: 4)
3 - Correctable Machine Check
2 - Dispatch/DPC & Synch (UP only)
1 - APC
0 - Passive/Low
Object Manager
Executive component for managing
system-defined “objects”
Objects are data structures with optional names
“Objects” managed here include Windows Kernel
objects, but not Windows User or GDI objects
Object manager implements user-mode handles and
the process handle table
Object manager is not used for all OS data
structures
Generally, only those types that need to be shared,
named, or exported to user mode
Some data structures are called “objects” but are not
managed by the object manager (e.g., “DPC objects”)
Object Manager
In part, a heap manager…
Allocates memory for data structure from system-wide,
kernel space heaps (pageable or nonpageable)
…With a few extra functions
Assigns name to data structure (optional)
Allows lookup by name
Objects can be protected by ACL-based security
Provides uniform naming, sharing, and protection
scheme
Simplifies C2 security certification by centralizing all object
protection in one place
Maintains counts of handles and references (stored
pointers in kernel space) to each object
Object cannot be freed back to the heap until all handles and
references are gone
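The retention rule above (an object is freed only when both its handle count and its reference count reach zero) can be sketched as follows. The class and method names are hypothetical, and the model keeps the two counts fully independent, which is a simplification.

```python
class KernelObject:
    """Toy object header with separate handle and pointer (reference) counts."""

    def __init__(self, name=None):
        self.name = name           # names are optional
        self.handle_count = 0
        self.pointer_count = 0
        self.freed = False

    def _release_if_unused(self):
        # Free only when no handles and no stored pointers remain.
        if self.handle_count == 0 and self.pointer_count == 0:
            self.freed = True

    def open_handle(self):
        self.handle_count += 1

    def close_handle(self):
        self.handle_count -= 1
        self._release_if_unused()

    def reference(self):
        self.pointer_count += 1

    def dereference(self):
        self.pointer_count -= 1
        self._release_if_unused()

event = KernelObject("MyEvent")
event.open_handle()      # a user-mode handle
event.reference()        # a kernel-mode pointer kept by some component
event.close_handle()
still_alive = not event.freed
event.dereference()
print(still_alive, event.freed)
```

Closing the last handle is not enough to free the object while a kernel-mode component still holds a reference to it.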
Handles And Security
Process handle table
Is unique for each process
But is in system address space, hence cannot be
modified from user mode
Hence, is trusted
Security checks are made when handle table
entry is created
i.e. at CreateXxx time
Handle table entry indicates the “validated” access
rights to the object
Read, Write, Delete, Terminate, etc.
No need to revalidate on each request
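The check-once idea on this slide can be sketched as a handle table that consults the ACL only when the handle is created and afterwards compares requests against the granted access stored in the entry. The access bits, the ACL representation, and the class below are invented for illustration.

```python
READ, WRITE, DELETE = 0x1, 0x2, 0x4   # illustrative access bits

class HandleTable:
    """Toy per-process handle table: handle -> (object, granted access)."""

    def __init__(self):
        self.entries = {}
        self.next_handle = 4

    def open(self, obj, acl, user, desired):
        """Security check happens here, when the table entry is created."""
        allowed = acl.get(user, 0)
        if desired & ~allowed:
            raise PermissionError("access denied")
        handle = self.next_handle
        self.next_handle += 4
        self.entries[handle] = (obj, desired)
        return handle

    def use(self, handle, needed):
        """Later requests only compare against the validated access rights."""
        obj, granted = self.entries[handle]
        if needed & ~granted:
            raise PermissionError("handle was not opened for this access")
        return obj

table = HandleTable()
acl = {"alice": READ | WRITE, "bob": READ}
handle = table.open("file object", acl, "bob", READ)
print(table.use(handle, READ))
try:
    table.use(handle, WRITE)
except PermissionError as error:
    print("denied:", error)
```

Because the table lives outside the reach of user-mode code in the real system, the stored access mask can be trusted without consulting the ACL again.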
Examining Handles: MS Tools
Two tools:
XP & 2003: openfiles /query command
Resource Kit “oh” (Open Handles) tool
Both of these require a special NT “global flag”
registry bit to be set
Requires reboot to take effect
See HKEY_LOCAL_MACHINE\System\CurrentControlSet
\Control\Session Manager\GlobalFlag
Can view this bitmask with the GFLAGS tool
Uses 8 bytes extra for each open handle
Examining Open Handles:
Sysinternals Tools
Process Explorer (GUI version) or Handle
(character cell version) from
www.sysinternals.com
Uses a device driver to walk handle table, so doesn’t
need Global Flag set
Viewing Open Handles
Handle View
By default, shows named objects
Click on Options->Show Unnamed Objects
Uses:
Solve file locked errors
Can search to determine what process is holding a file or
directory open
Can even close an open file (be careful!)
Understand resources used by an application
Detect handle leaks using refresh difference
highlighting
View the state of synchronization objects (mutexes,
semaphores, events)
Viewing Handles With Kernel
Debugger
If looking at a dump, use !handle in Kernel
Debugger (see help for options)
lkd> !handle 0 f 9e8 file
Searching for Process with Cid == 9e8
Searching for handles of type file
PROCESS 82ce72d0 SessionId: 0 Cid: 09e8 Peb: 7ffdf000 ParentCid: 06e
DirBase: 06602000 ObjectTable: e1c879c8 HandleCount: 430.
Image: POWERPNT.EXE
…
0280: Object: 82c5e230 GrantedAccess: 00120089
Object: 82c5e230 Type: (82fdde70) File
ObjectHeader: 82c5e218
HandleCount: 1 PointerCount: 1
Directory Object: 00000000 Name:
\slides\ntint\new\4-systemarchitecture.ppt {HarddiskVolume1}
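The GrantedAccess value in the output above is a bitmask. A short sketch that decodes it, using the standard Windows access-right constants for files (the helper name `decode_access` is made up):

```python
# Standard access-right bits that apply to file objects.
ACCESS_BITS = {
    0x00000001: "FILE_READ_DATA",
    0x00000008: "FILE_READ_EA",
    0x00000080: "FILE_READ_ATTRIBUTES",
    0x00020000: "READ_CONTROL",
    0x00100000: "SYNCHRONIZE",
}

def decode_access(mask):
    """Return the names of the known bits set in mask, plus any leftover bits."""
    names = [name for bit, name in ACCESS_BITS.items() if mask & bit]
    leftover = mask & ~sum(ACCESS_BITS)
    return names, leftover

names, leftover = decode_access(0x00120089)
print(names, hex(leftover))
```

The five bits together make up FILE_GENERIC_READ, which is consistent with a document that was opened for reading.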
Object Manager Namespace
System and session-wide internal
namespace
View with Winobj from
www.sysinternals.com
Object Manager Namespace
Namespace
Hierarchical directory structure (based on file system model)
System-wide (not per-process)
With Terminal Services, Windows objects are per-session by default
Vista: console no longer is session 0
Can override this with “global\” prefix on object names
Volatile (not preserved across boots)
Namespace can be extended by secondary object managers
(e.g., file system)
Hook mechanism to call external parse routine (method)
Supports case sensitive or case blind
Supports symbolic links (used to implement drive letters, etc.)
Lookup done on object creation or access by name
Not on access by handle
Not all objects managed by the object manager are named
E.g., file objects are not named
Un-named objects are not visible in WinObj
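Name lookup with symbolic links can be sketched with a small model. This is a deliberate simplification: it uses a flat dictionary instead of real directory objects, always compares case-blind, and the `Namespace` class and the example paths are illustrative assumptions.

```python
class Namespace:
    """Toy object namespace supporting named objects and symbolic links."""

    def __init__(self):
        self.objects = {}   # full path -> object
        self.links = {}     # link path -> target path

    def insert(self, path, obj):
        self.objects[path.lower()] = obj

    def link(self, path, target):
        self.links[path.lower()] = target.lower()

    def lookup(self, path):
        """Resolve symbolic links, then return the named object (or None)."""
        path = path.lower()
        for _ in range(8):                      # bound the number of link hops
            for link, target in self.links.items():
                if path == link or path.startswith(link + "\\"):
                    path = target + path[len(link):]
                    break                       # link substituted; scan again
            else:
                return self.objects.get(path)   # no link matched
        raise RuntimeError("too many symbolic links")

ns = Namespace()
ns.insert(r"\Device\HarddiskVolume1", "volume device object")
ns.link(r"\??\C:", r"\Device\HarddiskVolume1")   # a drive letter as a symbolic link
print(ns.lookup(r"\??\c:"))
```

In the real system the remainder of a path below a device object would be handed to a secondary object manager such as the file system, which this sketch leaves out.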
System Threads
Functions in OS and some drivers that need to run as real
threads
E.g., need to run concurrently with other system activity, wait on
timers, perform background “housekeeping” work
Always run in kernel mode
Not non-preemptible (unless they raise IRQL to 2 or above)
For details, see DDK documentation on PsCreateSystemThread
What process do they appear in?
“System” process (Windows NT 4.0: PID 2,
Windows 2000: PID 8, Windows XP: PID 4)
In Windows 2000 and later, windowing system threads (from
Win32k.sys) appear in “csrss.exe”
(Windows subsystem process)
Examples Of System Threads
Memory Manager
Modified Page Writer for mapped files
Modified Page Writer for paging files
Balance Set Manager
Swapper (kernel stack, working sets)
Zero page thread (thread 0, priority 0)
Security Reference Monitor
Command Server Thread
Network
Redirector and Server Worker Threads
Threads created by drivers for their exclusive use
Examples: Floppy driver, parallel port driver
Pool of Executive Worker Threads
Used by drivers, file systems, …
Accessed via ExQueueWorkItem
Identifying System Threads
If System threads are consuming CPU time,
need to find out what code is running, since it
could be any one of a variety of components
Pieces of OS (Ntoskrnl.exe)
File server worker threads (Srv.sys)
Other drivers
To really understand what’s going on, must find
which driver a thread “belongs to”
Identifying System Threads
Process Explorer:
Double click on System
process
Go to Threads tab and
sort by CPU
To view call stack, must use
kernel debugger
Note: several threads run
between clock ticks (or at
high IRQL) and thus don’t
appear to run
Watch context switch count
Process-Based Code
OS components that run in separate executables
(.exes), in their own processes
Started by system
Not tied to a user logon
Three types
Environment subsystems (already described)
System startup processes
Note: “system startup processes” is not an official Microsoft
defined name
Windows Services
Let’s examine the system process “tree”
Use Tlist /T or Process Explorer
Process-Based NT Code
System Startup Processes
First two processes aren’t real processes
Not running a user mode .EXE
No user-mode address space
Different utilities report them with different names
Data structures for these processes (and their initial threads) are
“pre-created” in NtosKrnl.Exe and loaded along with the code
(Idle)
Process id 0
Part of the loaded system image
Home for idle thread(s) (not a real process nor real threads)
Called “System Process” in many displays
(System)
Process id 2 (8 in Windows 2000; 4 in XP)
Part of the loaded system image
Home for kernel-defined threads (not a real process)
Thread 0 (routine name Phase1Initialization) launches the first
“real” process, running smss.exe...
...and then becomes the zero page thread
Process-Based NT Code
System Startup Processes
smss.exe - Session Manager
The first “created” process
Takes parameters from \HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager
Launches required subsystems (csrss) and then winlogon
csrss.exe - Windows subsystem
winlogon.exe - Logon process: Launches services.exe & lsass.exe; presents first login prompt
When someone logs in, launches apps in \Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Userinit
services.exe - Service Controller; also, home for many NT-supplied services
Starts processes for services not part of services.exe (driven by \Registry\Machine\System\CurrentControlSet\Services)
lsass.exe - Local Security Authentication Server
userinit.exe - Started after logon; starts Explorer.exe (see \Software\Microsoft\Windows NT\CurrentVersion\WinLogon\Shell) and exits (hence Explorer appears to be an orphan)
explorer.exe - Explorer and its children are the creators of all interactive apps
Logon Process
Winlogon sends username/password to Lsass
Either on local system for local logon, or to Netlogon service on a domain
controller for a network logon
Windows XP enhancement: Winlogon doesn’t wait for Workstation
service to start if
Account doesn't depend on a roaming profile
Domain policy that affects logon hasn't changed since last logon
Creates a process to run
HKLM\Software\Microsoft\Windows NT
\CurrentVersion\WinLogon\Userinit
By default: Userinit.exe
Runs logon script, restores drive-letter mappings, starts shell
Userinit creates a process to run
HKLM\Software\Microsoft\Windows NT
\CurrentVersion\WinLogon\Shell
By default: Explorer.exe
There are other places in the Registry that control
programs that start at logon
Processes Started at Logon
Autoruns (Sysinternals)
Displays order of processes configured to start at log on time
Msconfig (in \Windows\pchealth\helpctr\binaries)
Also can use new XP built-in tool called “System Configuration Utility”
To run, click on Start->Help, then “Use Tools…”, then System Configuration Utility
Only shows what’s defined to start vs Autoruns which shows all places things CAN be defined to start
Windows Services
An overloaded generic term
A process created and managed by the Service
Control Manager (Services.exe)
E.g. Solitaire can be configured as a service, but is
killed shortly after starting
Similar in concept to Unix daemon processes
Typically configured to start at boot time (if started
while logged on, survive logoff)
Typically do not interact with the desktop
Note: Prior to Windows 2000 this is one way to
start a process on a remote machine (now you
can do it with WMI)
Life Of A Service
Install time
Setup application tells Service Controller about the service (CreateService)
System boot/initialization
SCM reads registry, starts services as directed
Management/maintenance
Control panel can start and stop services and change startup parameters
[Diagram: Setup Application, CreateService, Registry, Service Controller/Manager (Services.Exe), Service Processes, Control Panel]
Viewing Service Processes
Process Explorer can highlight
Service Processes
Click on Options->Highlight Services
Svchost Mechanism
Windows 2000 introduced generic Svchost.exe
Groups services into fewer processes
Improves system startup time
Conserves system virtual memory
Not user-configurable as to which services go in which processes
3rd parties cannot add services to Svchost.exe processes
Windows XP/2003 have more Svchost processes due to
two new less privileged accounts for built-in services
LOCAL SERVICE, NETWORK SERVICE
Fewer rights than SYSTEM account
Reduces possibility of damage if system compromised
On XP/2003, four Svchost processes (at least):
SYSTEM, SYSTEM (2nd instance – for RPC), LOCAL SERVICE,
NETWORK SERVICE
68
Mapping Services To Service
Processes
Tlist /S (Debugging
Tools) or Tasklist /svc
(XP/2003) list internal
name of services inside
service processes
Process Explorer shows
more: external display
name and description
69
System Architecture
Process Execution Environment
Kernel Architecture
Interrupt Handling
Object Manager
System Threads
Process-based code
Summary
70
Four Contexts For Executing Code
Full process and thread context
User applications
Windows Services
Environment subsystem processes
System startup processes
Have thread context but no “real” process
Threads in “System” process
Routines called by other threads/processes
Subsystem DLLs
Executive system services (NtReadFile, etc.)
GDI32 and User32 APIs implemented in Win32K.Sys (and graphics
drivers)
No process or thread context (“arbitrary thread context”)
Interrupt dispatching
Device drivers
71
System Architecture
[Diagram: system architecture, top to bottom]
User Mode
System Processes: Service Control Mgr., LSASS, WinLogon, Session Manager
Services: SvcHost.Exe, WinMgt.Exe, SpoolSv.Exe, Services.Exe
Applications: Task Manager, Explorer, User Application
Environment Subsystems: POSIX, OS/2, Windows
Subsystem DLLs
NTDLL.DLL
Kernel Mode
System Threads
System Service Dispatcher (kernel mode callable interfaces)
Local Procedure Call, Configuration Mgr (registry), Processes & Threads, Virtual Memory, Security Reference Monitor, Power Mgr., Object Mgr., File System Cache, Plug and Play Mgr., I/O Mgr
Device & File Sys. Drivers
Windows USER, GDI
Graphics Drivers
Kernel
Hardware Abstraction Layer (HAL)
hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc., etc.)
Original copyright by Microsoft Corporation. Used by permission.
72
Outline
1. System Architecture
2. Processes and Thread Internals
3. Memory Management Internals
4. Security Internals
73
Processes And Threads
Data Structures
Priority Spectrum
Scheduling Decisions
Priority Adjustments
Multiprocessor Considerations
74
Processes And Threads
Each process has its own…
Virtual address space (including program global storage, heap storage,
threads’ stacks)
Processes cannot corrupt each other’s address space by mistake
Working set (physical memory “owned” by the process)
Access token (includes security identifiers)
Handle table for Windows kernel objects
These are common to all threads in the process, but separate and protected
between processes
Each thread has its own…
User-mode stack (automatic storage, call frames, etc.)
Kernel-mode stack
Scheduling state (Wait, Ready, Running, etc.) and priority
Current access mode (user mode or kernel mode)
Saved CPU state if it isn’t running
Access token (optional – overrides process’s if present)
75
Process And Thread Identifiers
Every process and every thread has an identifier
Generically: “client ID” (debugger shows as “CID”)
A.k.a., “process ID” and “thread ID”, respectively
Process IDs and thread IDs are in the same “number space”
These identify the requesting process or thread to its subsystem
“server” process, in API calls that need the server’s help
Visible in PerfMon, Task Manager (for processes),
Process Viewer (for processes), kernel debugger, etc.
IDs are unique among all existing processes
and threads
But might be reused as soon as a process or thread
is deleted
76
Jobs
Processes
Job
Kernel object to manage groups
of processes
Set limits on a process or group of processes
Quotas and restrictions:
Quotas: total CPU time, # active processes, per-process CPU
time, memory usage
Run-time restrictions: priority of all the processes in job;
processors threads in job can run on
Security restrictions: limits what processes can do
Not acquire administrative privileges
Not accessing windows outside the job, no reading/writing the
clipboard
Scheduling class: number from 0-9 (5 is default) - affects length
of thread timeslice (or quantum - t.b.d.)
E.g. can be used to achieve “class scheduling” (partition CPU)
77
Jobs
How do processes become part of a job?
Job object has to be created
Then processes are explicitly added
Processes created by processes in a job are automatically part of the job
Unless restricted, processes can “break away” from a job
Only Datacenter Server has a built-in tool to take
advantage of jobs
“Process Control Manager” – allows creating definitions for jobs
and associating processes with them
Uses of jobs in OS:
Add/Remove Programs (“ARP Job”)
WMI provider
RUNAS service (SecLogon) uses jobs to terminate processes at
log out
SU from NT4 ResKit didn’t do this
78
Demo: WMI Job
Jobs are used by WMI
Example: run Psinfo (Sysinternals) and pause output
79
Processes And Threads
Internal Data Structures
[Diagram: Process Object with its Access Token, its VADs (Virtual Address Space Descriptors), its Handle Table pointing to objects, and its Threads; a thread can have its own Access Token]
See kernel debugger commands:
dt (see next slide)
!process
!thread
!token
!handle
!object
80
Dumping Structures With
Kernel Debugger
!process and !thread show subset of information
in a process & thread block
“dt” (“Display Type”) command can format all the
fields
Syntax: “dt StructureName address –r”
dt nt!_* - displays all OS structures known to dt
Process/thread-related structures
nt!_EPROCESS
nt!_ETHREAD
81
Process Block Layout
lkd> dt nt!_EPROCESS
   +0x000 Pcb                : _KPROCESS
   +0x06c ProcessLock        : _EX_PUSH_LOCK
   +0x070 CreateTime         : _LARGE_INTEGER
   +0x078 ExitTime           : _LARGE_INTEGER
   +0x080 RundownProtect     : _EX_RUNDOWN_REF
   +0x084 UniqueProcessId    : Ptr32 Void
   +0x088 ActiveProcessLinks : _LIST_ENTRY
   +0x090 QuotaUsage         : [3] Uint4B
   +0x09c QuotaPeak          : [3] Uint4B
   +0x0a8 CommitCharge       : Uint4B
   +0x0ac PeakVirtualSize    : Uint4B
   +0x0b0 VirtualSize        : Uint4B
.
.
NOTE: Add “-r” to recurse through substructures
82
Thread Block (!strct ethread)
lkd> dt nt!_ETHREAD
   +0x000 Tcb              : _KTHREAD
   +0x1c0 CreateTime       : _LARGE_INTEGER
   +0x1c0 NestedFaultCount : Pos 0, 2 Bits
   +0x1c0 ApcNeeded        : Pos 2, 1 Bit
   +0x1c8 ExitTime         : _LARGE_INTEGER
   +0x1c8 LpcReplyChain    : _LIST_ENTRY
   +0x1c8 KeyedWaitChain   : _LIST_ENTRY
   +0x1d0 ExitStatus       : Int4B
   +0x1d0 OfsChain         : Ptr32 Void
   +0x1d4 PostBlockList    : _LIST_ENTRY
   +0x1dc TerminationPort  : Ptr32 _TERMINATION_PORT
   +0x1dc ReaperLink       : Ptr32 _ETHREAD
NOTE: Add “-r” to recurse through substructures
83
Processes And Threads
Data Structures
Priority Spectrum
Scheduling Decisions
Priority Adjustments
Multiprocessor Considerations
84
Scheduling Priorities
Realtime levels 16-31
31: Realtime Time Critical
24: Realtime
16: Realtime Idle
Dynamic levels 1-15
15: top of the dynamic range
13: High
10: Above Normal
8: Normal
6: Below Normal
4: Idle
1: Dynamic Idle
0: System Idle
85
Thread Scheduling
Priority driven, preemptive
UP: highest priority thread always runs
MP: One of the highest priority runnable threads will be running
somewhere
Event-driven; no guaranteed execution period before preemption
No attempt to share processor(s) “fairly” among processes,
only among threads
Time-sliced, round-robin within a priority level
O(1) (no scan of all threads)
The Linux 2.4 scheduler is O(N) (2.6 is O(1))
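The "no scan of all threads" point can be illustrated with a small sketch. It assumes one FIFO queue per priority level plus a bitmask recording which levels are non-empty, so the dispatcher finds the highest ready priority without walking every thread. The names `ReadyQueues`, `enqueue`, and `pick_next` are illustrative only and are not taken from the Windows kernel.

```python
from collections import deque

NUM_LEVELS = 32  # priority levels 0-31


class ReadyQueues:
    """One FIFO ready queue per priority level plus a non-empty bitmask."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_LEVELS)]
        self.summary = 0  # bit p is set while queues[p] is non-empty

    def enqueue(self, thread, priority, at_head=False):
        # at_head=True models a preempted thread returning to the head
        # of its queue; everything else joins at the tail.
        if at_head:
            self.queues[priority].appendleft(thread)
        else:
            self.queues[priority].append(thread)
        self.summary |= 1 << priority

    def pick_next(self):
        """Return (thread, priority) for the best ready thread, or None."""
        if self.summary == 0:
            return None
        priority = self.summary.bit_length() - 1  # highest set bit
        queue = self.queues[priority]
        thread = queue.popleft()
        if not queue:
            self.summary &= ~(1 << priority)
        return thread, priority
```

The cost of `pick_next` depends on the number of priority levels, not on the number of threads, which is the sense in which the selection is constant time.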
86
Thread Scheduling
The “code that does scheduling” is not a thread
i.e. there is no always-instantiated routine called “the
scheduler”
Scheduling routines are called whenever events
occur that change the state of a thread
interval timer interrupts (for quantum end)
interval timer interrupts (for timed wait completion)
other hardware interrupts (for I/O wait completion)
one thread changes the state of a waitable object upon
which other thread(s) are waiting
a thread waits on one or more dispatcher objects
a thread priority is changed
87
Scheduling Scenarios: Preemption
Preemption is strictly event-driven
does not wait for the next clock tick
no guaranteed execution period before preemption
threads in kernel mode may be preempted (unless they raise IRQL to >=
2)
[Figure: Running and Ready columns at priority levels 18-13; a thread arrives from the Wait state]
A preempted thread goes back to the head of its ready queue
also, if in real-time priority range, its quantum is reset
88
Scheduling Scenarios
Ready After Wait Resolution
If newly-ready thread is not of higher priority than the
running thread…
…it is put at the tail of the ready queue for its current
priority
If in real-time priority range, its quantum is reset
[Figure: Running and Ready columns at priority levels 18-13; a thread arrives from the Wait state]
89
Scheduling Scenarios
Voluntary Switch
When the running thread gives up the CPU…
…Schedule the thread at the head of the next non-empty
“ready” queue
[Figure: Running and Ready columns at priority levels 18-13; the running thread goes to the Waiting state]
90
Scheduling Scenarios
Quantum End
When the running thread exhausts its CPU quantum, it goes to the
end of its ready queue
Applies to all threads (even if in kernel mode if IRQL<2)
Quantums can be disabled for a thread by a kernel function
Default quantum on Professional is 2 clock ticks, 12 on Server
standard clock tick is 10 msec; might be 15 msec on some MP Pentium systems
If no other ready threads at that priority, same thread continues running
(just gets new quantum)
If running at boosted priority, priority decays at quantum end (described
later)
[Figure: Running and Ready columns at priority levels 18-13]
91
Quantum Stretching
Resulting quantum:
“Maximum” = 6 ticks
(middle) = 4 ticks
“None” = 2 ticks
[Figure: Running and Ready columns at priority level 8]
Quantum stretching does not happen on
NT Server
Quantum on NT Server is 12 ticks
92
Quantum Selection
As of Windows 2000, can choose short quantums
on Server (e.g. for terminal servers)
Windows 2000:
Windows XP:
93
Controlling Quantum
If a process is a member
of a job, quantum can be
adjusted by setting the
“Scheduling Class”
Only applies if process is
>Idle priority class
Only applies if system
running with fixed
quantums (the default
on Servers)
Values are 0-9
5 is default
Scheduling class    Quantum units
0                   6
1                   12
2                   18
3                   24
4                   30
5                   36
6                   42
7                   48
8                   54
9                   60
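The values in the scheduling class table follow a simple linear pattern, 6 quantum units per step. The sketch below encodes that pattern; the formula is an observation from the table values, not a documented rule, and `quantum_units` is a hypothetical helper name.

```python
def quantum_units(scheduling_class):
    """Quantum units for a job scheduling class, matching the table values."""
    if not 0 <= scheduling_class <= 9:
        raise ValueError("scheduling class must be in the range 0-9")
    return 6 * (scheduling_class + 1)


# Rebuild the whole table from the formula.
table = {cls: quantum_units(cls) for cls in range(10)}
```

With the default scheduling class of 5 this gives 36 quantum units.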
94
Thread Scheduling States
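Thread states: Init (0), Ready (1), Running (2), Standby (3), Terminate (4), Waiting (5), Transition (6)
Transitions shown in the diagram:
Standby -> Ready: preempt
Running -> Ready: preemption, quantum end
Running -> Waiting: voluntary switch
Waiting -> Ready: wait resolved
Waiting -> Transition: wait resolved after kernel stack made pageable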
Ready = thread eligible to be scheduled to run
Standby = thread is selected to run on CPU
95
Processes And Threads
Data Structures
Priority Spectrum
Scheduling Decisions
Priority Adjustments
Multiprocessor Considerations
96
Priority Adjustments
Priority boosts are applied to threads in
“dynamic” classes (1-15)
No automatic adjustments in “real-time” class (16 or
above)
Can disable with SetThreadPriorityBoost or
SetProcessPriorityBoost
Five types:
I/O completion
Wait completion on events or semaphores
When threads in the foreground process complete a
wait
When GUI threads wake up for windows input
For CPU starvation avoidance
97
Priority Boosting
After an I/O: specified by device driver
IoCompleteRequest( Irp, PriorityBoost )
After a wait on executive event or
semaphore
KeSetEvent( Event, PriorityBoost…)
Boost value of 1 is used for these objects
Server 2003: setting thread loses boost
(lock convoy issue)
Common boost values (see NTDDK.H):
1: disk, CD-ROM, parallel, video
2: serial, network, named pipe, mailslot
6: keyboard or mouse
8: sound
After any wait on a dispatcher object by a thread in the foreground
process:
Boost value of 2
Goal: improve responsiveness of interactive apps
GUI threads that wake up to process windowing input (e.g. windows
messages) get a boost of 2
This is added to the current, not base priority
Goal: improve responsiveness of interactive apps
98
Priority Boost And Decay
Behavior of these boosts:
Boost is applied to thread’s base priority
Will not take you above priority 15
After a boost, you get one quantum
Then decays 1 level, runs another quantum
Then decays another level, etc. until back to base priority
[Figure: priority vs. time, showing base priority, boost upon wait complete, priority decay at quantum end, preemption before quantum end, and round-robin at base priority]
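The boost-and-decay behavior described on this slide can be traced with a short simulation. This is a minimal sketch under the slide's stated rules (boost applied to base priority, capped at 15, one level of decay per quantum, no boosts in the real-time range). The function name `priority_after_boost` is hypothetical, and preemption is not modeled.

```python
def priority_after_boost(base, boost, quanta):
    """Priority held during each of the next `quanta` quantums after a boost."""
    if base >= 16:
        return [base] * quanta  # real-time range: no automatic adjustments
    current = min(base + boost, 15)  # a boost never goes above 15
    timeline = []
    for _ in range(quanta):
        timeline.append(current)
        if current > base:
            current -= 1  # decay one level at each quantum end
    return timeline


# Example: a base-priority-8 thread boosted by 6 (keyboard or mouse).
example = priority_after_boost(8, 6, 8)
```

The example thread runs one quantum at each level on the way down and then stays at its base priority.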
99
CPU Starvation Avoidance
Balance Set Manager system thread looks for
“CPU starved” threads
Wakes up once per second and examines Ready
queues
Looks for threads that have been Ready for 300 clock
ticks
Such threads get a big boost to 15 and
quantum is doubled
[Figure: threads at priority 12 (Wait), 7 (Run), and 4 (Ready)]
At quantum end, returns to previous priority (no
gradual decay) and normal quantum
To minimize overhead:
Scans up to 16 Ready threads per priority level
each pass
Boosts up to 10 Ready threads per pass
Like all priority boosts, does not apply in the
real-time range (priority 16 and above)
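One pass of the starvation scan can be sketched as follows, using the limits quoted on this slide (300 ticks, 16 threads scanned per level, 10 boosts per pass). The data layout, the scan order across priority levels, and the name `starvation_pass` are assumptions made for illustration; the sketch also leaves the thread in its original list rather than moving it between queues.

```python
STARVED_TICKS = 300       # Ready for this long counts as CPU starved
MAX_SCAN_PER_LEVEL = 16   # Ready threads examined per priority level
MAX_BOOSTS_PER_PASS = 10  # boosts handed out per pass


def starvation_pass(ready_queues, now):
    """Boost starved threads; return the names of the threads boosted.

    ready_queues maps priority -> list of dicts with "name" and "ready_since".
    """
    boosted = []
    for priority in range(1, 16):  # dynamic range only, never 16 and above
        for thread in ready_queues.get(priority, [])[:MAX_SCAN_PER_LEVEL]:
            if len(boosted) == MAX_BOOSTS_PER_PASS:
                return boosted
            if now - thread["ready_since"] >= STARVED_TICKS:
                thread["saved_priority"] = priority  # restored at quantum end
                thread["priority"] = 15
                thread["quantum_multiplier"] = 2     # quantum is doubled
                boosted.append(thread["name"])
    return boosted
```

Because the previous priority is saved rather than decayed, the thread drops straight back to it at quantum end, matching the "no gradual decay" note.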
100
Processes And Threads
Data Structures
Priority Spectrum
Scheduling Decisions
Priority Adjustments
Multiprocessor Considerations
101
Multiprocessor Scheduling
Fully distributed (no “master processor”)
Any processor can interrupt another processor to
schedule a thread
Scheduling database:
Pre-Windows Server 2003: single system-wide list of
ready queues
Windows Server 2003: per-CPU ready queues
Threads can run on any CPU, unless specified
otherwise
Tries to keep threads on same CPU (“soft affinity”)
Setting of which CPUs a thread will run on is called
“hard affinity”
102
Hard Processor Affinity
Threads can run on any CPU, unless affinity specified otherwise
Affinity specified by a bit mask
Each bit corresponds to a CPU number
Can alter with SetThreadAffinityMask or SetProcessAffinityMask or in
the job object
Thread affinity mask must be subset of process affinity mask, which in turn
must be a subset of the active processor mask
“Hard Affinity” can lead to threads’ getting less CPU time than they
normally would
More applicable to large MP systems running dedicated server apps
Note: OS may in some cases need to run your thread on CPUs other than
your hard affinity setting
E.g. flushing DPCs, setting system time
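The subset rule for affinity masks is plain bit arithmetic. The following sketch checks it with integers used as bit masks, where bit n stands for CPU n; the helper names are hypothetical and no Windows API is called.

```python
def is_subset(mask, of_mask):
    """True when every CPU bit set in `mask` is also set in `of_mask`."""
    return (mask & ~of_mask) == 0


def valid_affinity(thread_mask, process_mask, active_mask):
    """Thread mask within process mask, process mask within active CPUs."""
    return (thread_mask != 0
            and is_subset(thread_mask, process_mask)
            and is_subset(process_mask, active_mask))


def cpus_in(mask):
    """List the CPU numbers selected by an affinity mask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]
```

For example, on a 4-CPU system (active mask 0b1111) a process mask of 0b0110 allows thread masks 0b0010, 0b0100, and 0b0110 only.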
103
Hard Processor Affinity
On MP systems, the
process affinity mask
can be examined and
changed via Task
Manager
Can also set an image
affinity mask
Imagecfg tool in
Windows 2000 Server
Resource Kit
Supplement 1
Can also set
“uniprocessor only”:
sets affinity mask to
one processor (rotates
round robin at each
process creation)
104
Soft Processor Affinity
Every thread has an “ideal processor”
System selects ideal processor for first thread in
process (round robin across CPUs)
Next thread gets next CPU relative to the process
seed
Can override with:
SetThreadIdealProcessor (
HANDLE hThread,            // handle to thread
DWORD dwIdealProcessor);   // processor number
Hard affinity changes update ideal processor settings
Used in selecting where a thread runs next (see next
slides)
105
Choosing A CPU For A
Ready Thread (Windows 2000)
When a thread becomes ready to run (e.g. its wait completes, or it is just
beginning execution), need to choose a processor for it to run on
First, it sees if any processors are idle that are in the thread’s hard affinity
mask:
If its “ideal processor” is idle, it runs there
Else if the previous processor it ran on is idle, it runs there
Else if the current processor is idle, it runs there
Else it picks the highest numbered idle processor in the thread’s affinity mask
If no processors are idle:
If the ideal processor is in the thread’s affinity mask, it selects that
Else if the last processor is in the thread’s affinity mask, it selects that
Else it picks the highest numbered processor in the thread’s affinity mask
Finally, it compares the priority of the new thread with the priority of the thread
running on the processor it selected (if any) to determine whether or not to
perform a preemption
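The Windows 2000 selection order above can be written out as a small function. This is a sketch of the rules as listed on the slide, using Python sets of CPU numbers in place of bit masks; `choose_processor` and `should_preempt` are hypothetical names, and the preemption test assumes "higher priority wins", as the scheduling scenario slides describe.

```python
def choose_processor(affinity, idle, ideal, previous, current):
    """Pick a CPU for a newly ready thread. Returns (cpu, cpu_was_idle)."""
    idle_candidates = idle & affinity
    if idle_candidates:
        # Preference order among idle CPUs: ideal, previous, current.
        for preferred in (ideal, previous, current):
            if preferred in idle_candidates:
                return preferred, True
        return max(idle_candidates), True  # highest numbered idle CPU
    # No idle CPU is usable: fall back to ideal, then last, then highest.
    for preferred in (ideal, previous):
        if preferred in affinity:
            return preferred, False
    return max(affinity), False


def should_preempt(new_priority, running_priority):
    """Preempt only when the new thread has strictly higher priority."""
    return new_priority > running_priority
```

When the chosen CPU was idle there is nothing to preempt; otherwise `should_preempt` decides between preemption and queuing the thread as ready.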
106
Selecting A Thread To Run On
A CPU (Windows 2000)
System needs to choose a thread to run on a specific CPU:
At quantum end
When a thread enters a wait state
When a thread removes its current processor from its hard affinity mask
When a thread exits
Win2000: With dispatcher lock held, starting with the first thread in the highest
priority non-empty ready queue, it scans the queue for the first thread that has
the current processor in its hard affinity mask and:
Ran last on the current processor, or
Has its ideal processor equal to the current processor, or
Has been in its Ready queue for more than 2 quantums, or
Has a priority >=24
If it cannot find such a candidate, it selects the highest priority thread that can
run on the current CPU (whose hard affinity includes the current CPU)
Note: this may mean going to a lower priority ready queue to find a candidate
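The thread-selection rules above can be sketched the same way. The code assumes ready queues stored as a dict of priority -> list of thread records in queue order, with times and the quantum measured in the same unit; the record fields and the name `select_thread` are illustrative, and locking is not modeled.

```python
def select_thread(ready_queues, cpu, now, quantum):
    """Pick the next thread for `cpu`, or None if nothing can run there."""
    levels = [p for p in sorted(ready_queues, reverse=True) if ready_queues[p]]
    if not levels:
        return None
    top = levels[0]
    # First choice: a thread in the highest non-empty queue that is
    # allowed on this CPU and meets one of the four listed conditions.
    for thread in ready_queues[top]:
        if cpu not in thread["affinity"]:
            continue
        if (thread["last_cpu"] == cpu
                or thread["ideal_cpu"] == cpu
                or now - thread["ready_since"] > 2 * quantum
                or top >= 24):
            return thread
    # Otherwise: the highest priority thread whose hard affinity includes
    # this CPU, which may come from a lower priority queue.
    for priority in levels:
        for thread in ready_queues[priority]:
            if cpu in thread["affinity"]:
                return thread
    return None
```

The second loop is what the closing note on the slide refers to: a lower priority thread can be chosen when no higher priority thread is allowed on the current CPU.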
107
Server 2003 Enhancements
Idle processor selection further refined to:
If a NUMA system: if there are idle CPUs in the node
containing the thread’s ideal processor, reduce to
that set
If a hyperthreaded system: if one of the idle processors
is a physical processor with all logical processors idle,
reduce to that set
Then try to eliminate idle CPUs that are sleeping
If thread ran last on a member of the set, pick
that CPU
Else pick lowest numbered CPU in remaining set
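The Server 2003 refinement reads as a series of set reductions, each applied only if it leaves at least one candidate. The sketch below follows that reading; the parameter names are hypothetical, `ideal_node` and `fully_idle_cores` are passed as None on systems without NUMA or hyperthreading, and treating "try to eliminate" as "skip the step if it would empty the set" is an assumption.

```python
def pick_idle_cpu(idle, last_cpu, ideal_node=None, fully_idle_cores=None,
                  sleeping=frozenset()):
    """Choose among idle CPUs (sets of CPU numbers) for a ready thread."""
    candidates = set(idle)
    if not candidates:
        raise ValueError("no idle processors to choose from")
    # NUMA node of the ideal processor, then fully idle physical processors.
    for preferred in (ideal_node, fully_idle_cores):
        if preferred is not None and candidates & preferred:
            candidates &= preferred
    # Prefer CPUs that are not sleeping, when any remain.
    if candidates - sleeping:
        candidates -= sleeping
    if last_cpu in candidates:
        return last_cpu
    return min(candidates)  # lowest numbered CPU in the remaining set
```

Note the contrast with the Windows 2000 rule, which picked the highest numbered idle processor.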
108
Server 2003 Enhancements
Threads always go into the ready queue of their ideal
processor
Instead of locking the dispatcher database to look for a
candidate to run, per-CPU ready queue is checked first
(locks PRCB spinlock)
If a thread has been selected to run on the CPU, does the context
swap
Else begins scan of other CPU’s ready queues looking for a thread
to run
This scan is done OUTSIDE the dispatcher lock
Dispatcher lock still acquired to wait or unwait a thread
and/or change state of a dispatcher object
Bottom line: dispatcher lock is now held for a MUCH
shorter time
109
Outline
1. System Architecture
2. Processes and Thread Internals
3. Memory Management Internals
4. Security Internals
110
Memory Management
Core Memory Management Services
Working Set Management
Unassigned Memory
Page Files
111
Memory Manager Features
Demand paged virtual memory
Pages are read in on demand and written out when
necessary (to make room for other memory needs)
Provides flat virtual address space
32-bit: 4 GB, 64-bit: 16 Exabytes (theoretical)
Shared memory with copy on write
Mapped files (fundamental primitive)
Provides basic support for file system
cache manager
112
Virtual Address Space Allocation
Virtual address space is sparse
Address spaces contain reserved, committed, and
unused regions
Unit of protection and usage is one page
Page size can vary
On x86, default page size for applications is 4 KB
On Itanium, default page size is 8 KB
Large pages
If a “large memory system”, large (4 MB on x86; 16MB
on Itanium) pages are used to map the OS and HAL
Disables kernel write protection
New in 2003: applications can VirtualAlloc large pages
with MEM_LARGE_PAGES flag
113
Shared Memory
Like most modern OSs, Windows provides a way for processes to share memory
High speed IPC (used by LPC, which is used by RPC)
Threads share address space, but applications may be divided into multiple processes for stability reasons
Windows shares memory automatically for shareable pages
E.g., code pages in an .EXE
Processes can also create shared memory sections
Called page file backed file mapping objects
Full Windows security
114
Mapped Files
A way to take part of a file and map it to a range of
virtual addresses
(Address space is 2 GB, but files can be much larger)
Called “file mapping objects” in Windows API
Bytes in the file then correspond one-for-one with
bytes in the region of virtual address space
Read from the “memory” fetches data from the file
Pages are kept in physical memory as needed
Changes to the memory are eventually written back to the file
(can request explicit flush)
Initial mapped files in a process include
The executable image (EXE)
One or more Dynamically Linked Libraries (DLLs)
Processes can map additional files as desired (data
files or additional DLLs)
115
Section Objects
Mapped files
Called “file mapping objects” in Windows API
Files may be mapped into v.a.s.
// first, do EITHER ...
hMapObj = CreateFileMapping (hFile, security, protection, sizeHigh, sizeLow, mapname);
// ... OR ...
hMapObj = OpenFileMapping (accessMode, inheritflag, mapname);
// ... then, pass the resulting handle to a mapping object (section) to ...
lpvoid = MapViewOfFile (hMapObj, accessMode, offsetHigh, offsetLow, cbMap);
Bytes in the file then correspond one-for-one with bytes in the region
of virtual address space
Read from the “memory” fetches data from the file
Changes to the memory are written back to the file
Pages are kept in physical memory as needed
If desired, can map to only a part of the file at a time
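The one-for-one correspondence between mapped bytes and file bytes can be seen with a short runnable sketch. It uses Python's standard `mmap` module, which wraps the operating system's file-mapping facility, rather than calling the Windows API directly; the temporary file and the helper name `uppercase_prefix_via_mapping` are made up for the demonstration.

```python
import mmap
import os
import tempfile


def uppercase_prefix_via_mapping(path, length):
    """Map only the first `length` bytes of a file and uppercase them."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), length) as view:
            view[:length] = view[:length].upper()  # plain memory writes
            view.flush()                           # explicit flush to the file


# Create a small file, modify it through the mapping, then read it back.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello mapped file")
os.close(fd)
uppercase_prefix_via_mapping(path, 5)
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```

No write call is issued on the file itself; the change reaches the file because the view and the file share the same bytes. Mapping only 5 bytes also shows that a view can cover just part of a file.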
116
Copy-On-Write Pages
Used for sharing between process
address spaces
Pages are originally set up as shared,
read-only, faulted from the common file
Access violation on write attempt alerts pager
Pager makes a copy of the page and allocates it privately to
the process doing the write, backed to the paging file
So, only need unique copies for the pages in the
shared region that are actually written (example of
“lazy evaluation”)
Original values of data are still shared
E.g., writeable data initialized with C initializers
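Copy-on-write can be observed from user mode with a private mapping. The sketch below uses Python's `mmap.ACCESS_COPY`, which gives the process a writable view whose modifications are kept private instead of being written back; the file name and contents are invented for the demonstration, and the sketch shows the visible effect, not the pager's internals.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"original data")
os.close(fd)

with open(path, "r+b") as f:
    # ACCESS_COPY: writes go to a private copy of the page, not to the file.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as private_view:
        private_view[:8] = b"MODIFIED"
        seen_in_view = private_view[:13]

with open(path, "rb") as f:
    seen_in_file = f.read()
os.remove(path)
```

The writer sees its modified bytes, while the file (and any other process mapping it) still holds the original values.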
117
How Copy-On-Write Works
[Figure, before the write: two process address spaces map the same Page 1, Page 2, and Page 3 (original data) in physical memory]
118
How Copy-On-Write Works
[Figure, after the write: the writing process maps a private copy of page 2 holding the modified data; Page 1, Page 2 (original data), and Page 3 remain in physical memory for the other process]
119
Physical Memory
32-bit Windows supports systems with 64GB
physical memory
But, the virtual address space is still 4 GB, so
how can this memory be used?
1. Although each process can only address 2 (or 3) GB, many may be in memory at the same time (e.g., 5 * 2 GB processes = 10 GB)
2. New Address Windowing Extensions allow Win32 processes to use more than 2 GB of memory
3. Files in system cache remain in physical memory
Although file cache doesn’t know it, memory manager keeps unmapped data in physical memory
120
Address Windowing Extensions
AWE functions allow
Win32 processes to
allocate large amounts of
physical memory and then
map “windows” into that
memory
Applications: Database
servers can cache large
databases
Up to programmer to
control
Like DOS expanded
memory (EMS) with more
bits…
64-bit Windows removes
this need
121
File System Virtual Block Cache
Virtual block cache (not logical block)
Managed in terms of blocks within files, not blocks within partition
Caching occurs above file system, not below
Advantages
Permits access to cached data without translation of file to sector
Allows maintaining coherency between normal file I/O and memory
mapped file I/O
Intelligent read-ahead
Predicts next read location based on history of last 2 reads
Shared by all file systems
Local or remote
Includes file data and file system metadata (e.g. MFT, file
attributes, …)
Write back cache
Data held in memory and written later by mapped page writer
system thread
122
Cache Virtual Structure
Virtual size: 64-960 MB
In system virtual address space, so
visible to all processes
Divided into 256 KB “views”
Cache slots are mapped to 256 KB
segments of cached files
Uses same services as Win32 memory
mapped files
But remember, this is virtual, not
physical
Relies on memory manager to read and
write actual file data via normal paging
Virtual size of the cache is not related
to amount of cached file data
Memory manager will still “cache”
unmapped file data on the standby list
So larger cache size just reduces # of
mapping/unmappings
123
Controlling The Cache
Per-file basis
File open flags affect how cache influences the memory
manager on what data to keep in RAM
If nothing specified, automatic asynchronous read-ahead
Predicts next read location based on history of last 2 reads
Touches the pages to fault them in
FILE_FLAG_SEQUENTIAL_SCAN increases size of read-ahead
And, causes cache to re-use same cache slot (instead of filling
cache)
Also puts unmapped pages at end of standby list
FILE_FLAG_RANDOM_ACCESS disables read ahead
Can disable file cache completely on a per-file open basis
CreateFile with FILE_FLAG_NO_BUFFERING
Requires reads/writes to be done on sector boundaries
Buffers must be aligned in memory on sector boundaries
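The sector-boundary requirement for unbuffered I/O usually means widening a request so that both its start and its length are multiples of the sector size. The arithmetic is sketched below; the default of 512 bytes is an assumption for the example (real code would ask the volume for its sector size), and the helper names are hypothetical.

```python
def align_down(value, sector_size):
    """Largest multiple of sector_size that is <= value."""
    return value - (value % sector_size)


def align_up(value, sector_size):
    """Smallest multiple of sector_size that is >= value."""
    return align_down(value + sector_size - 1, sector_size)


def aligned_request(offset, length, sector_size=512):
    """Widen (offset, length) to whole sectors; returns (start, size)."""
    start = align_down(offset, sector_size)
    end = align_up(offset + length, sector_size)
    return start, end - start
```

A 100-byte read at offset 1000 becomes a 1024-byte read at offset 512, and the caller picks the wanted bytes out of the larger buffer.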
124
Memory Management
Core Memory Management Services
Working Set Management
Unassigned Memory
Page Files
125
Working Set
Working set: All the physical pages “owned”
by a process
Essentially, all the pages the process can reference without
incurring a page fault
Working set limit: The maximum pages the process can
own
When limit is reached, a page must be released for every page
that’s brought in (“working set replacement”)
Default upper limit on size for each process
System-wide maximum calculated and stored in
MmMaximumWorkingSetSize
Approximately RAM minus 512 pages (2 MB on x86) minus min size of
system working set (1.5 MB on x86)
Interesting to view (gives you an idea how much memory you’ve “lost”
to the OS)
True upper limit: 2 GB minus 64 MB
126
Birth Of A Working Set
Pages are brought into memory as a result of page faults
Prior to Windows XP, no pre-fetching at image startup
But readahead is performed after a fault
See MmCodeClusterSize, MmDataClusterSize, MmReadClusterSize
Can see with Filemon
If the page is not in memory, the appropriate block in the associated
file is read in
Physical page is allocated
Block is read into the physical page
Page table entry is filled in
Exception is dismissed
Processor re-executes the instruction that caused the page fault (and this
time, it succeeds)
The page has now been “faulted into” the process “working set”
127
Working Set List
newer pages
older pages
PerfMon
Process “WorkingSet”
A process always starts with an empty
working set
It then incurs page faults when referencing a page that
isn’t in its working set
Many page faults may be resolved from memory (to be
described later)
128
Working Set Replacement
PerfMon
Process “WorkingSet”
When working set max reached (or working set trim
occurs), must give up pages to make room for new pages
Local page replacement policy (most Unix systems
implement global replacement)
To standby
or
modified
page list
E.g. a single process cannot take over all of physical memory
Page replacement algorithm is least recently accessed
(pages are aged)
On UP systems only in Windows 2000 – done on all systems in
Windows XP/2003
New VirtualAlloc flag in XP/2003: MEM_WRITE_WATCH
129
Working Set System Services
Min/Max set on a per-process basis
Can view with !process in Kernel Debugger
Can adjust with SetProcessWorkingSetSize –
but has little effect
Limits are “soft” (many processes larger than max)
Memory Manager decides when to grow/shrink
working sets
New function in 2003 Server:
SetProcessWorkingSetSizeEx
Supports hard working set limits
Can also self-initiate working set trimming
Pass -1, -1 as min/max working set size
(minimizing a window does this for you)
130
Locking Pages
Pages may be locked into the process working set
Pages are guaranteed in physical memory (“resident”) when any thread in
process is executing
Windows:
status = VirtualLock(baseAddress, size);
status = VirtualUnlock(baseAddress, size);
Number of lockable pages is a fraction of the maximum working set
size
Changed by SetProcessWorkingSetSize
Pages can be locked into physical memory (by kernel mode code
only)
Pages are then immune from “outswapping” as well as paging
MmProbeAndLockPages
131
Process Memory Information
Task Manager Processes tab
1 “Mem Usage” = physical memory used by process (working set size, not working set limit)
Note: Shared pages are counted in each process
2 “VM Size” = private (not shared) committed virtual space in processes == potential pagefile usage
3 “Mem Usage” in status bar is not total of “Mem Usage” column (see later slide)
Screen snapshot from: Task Manager | Processes tab
132
Process Memory Information
PerfMon –
Process Object
“Virtual Bytes” = committed
+ reserved virtual space,
including shared pages
“Working Set” = working
set size (not limit)
(physical)
“Private Bytes” = private
virtual space (same as
“VM Size” from Task
Manager Processes list)
Also: In Threads object,
look for threads in
Transition state - evidence
of swapping (usually
caused by severe memory
pressure)
Screen snapshot from: Performance Monitor
counters from Process object
133
Viewing The Working Set
Working set size counts shared pages in
each working set
Vadump (Resource Kit) can dump the
breakdown of private, shareable, and
shared pages
C:\> Vadump -o -p 3968
Module Working Set Contributions in pages
Total  Private  Shareable  Shared  Module
   14        3         11       0  NOTEPAD.EXE
   46        3          0      43  ntdll.dll
   36        1          0      35  kernel32.dll
    7        2          0       5  comdlg32.dll
   17        2          0      15  SHLWAPI.dll
   44        4          0      40  msvcrt.dll
134
Prefetch Mechanism
File activity is traced and used to prefetch data
the next time
First 10 seconds are monitored
Pages referenced & directories opened
Prefetch “trace file” stored in \Windows\Prefetch
Name of .EXE-<hash of full path>.pf
Also applies to system boot
First 2 minutes of boot process logged
Stops 30 seconds after the user starts the shell or 60 seconds
after all services are started
Boot trace file: NTOSBOOT-B00DFAAD.pf
135
Prefetch Mechanism
When the application runs again, the system
automatically
Reads in directories referenced
Reads in code and file data
Reads are asynchronous
But waits for all prefetch to complete
In addition, every 3 days, system automatically
defrags files involved in each application startup
Bottom line: Reduces disk head seeks
This was seen to be the major factor in slow
application/system startup
136
Memory Management
Core Memory Management Services
Working Set Management
Unassigned Memory
Page Files
137
Managing Physical Memory
System keeps unassigned physical pages on
one of several lists
Free page list
Modified page list
Standby page list
Zero page list
Bad page list – pages that failed memory test at
system startup
Lists are implemented by entries in the “PFN
database”
Maintained as FIFO lists or queues
138
Paging Dynamics
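[Diagram: page movement between working sets and the page lists]
Zero Page List -> Working Sets: demand zero page faults
Free Page List -> Working Sets: page read from disk or kernel allocations
Standby Page List and Modified Page List -> Working Sets: “soft” page faults
Working Sets -> Standby Page List or Modified Page List: working set replacement
Modified Page List -> Standby Page List: modified page writer
Free Page List -> Zero Page List: zero page thread
Private pages at process exit -> Free Page List
Other labels: “global valid” faults, Bad Page List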
139
Standby And Modified Page Lists
Modified pages go to modified (dirty) list
Avoids writing pages back to disk too soon
Unmodified pages go to standby
(clean) list
They form a system-wide cache of
“pages likely to be needed again”
Pages can be faulted back into a process
from the standby and modified page list
These are counted as page faults, but not
page reads
140
Modified Page Writer
Moves pages from modified to standby list, and
copies their contents to disk
I.e., this is what writes the paging file and updates
mapped files (including the file system cache)
Two system threads
One for mapped files, one for the paging file
Triggered when
Memory is over-committed (too few free pages)
Or modified page threshold is reached
Does not flush entire modified page list
141
Free And Zero Page Lists
Free Page List
Used for page reads
Private modified pages go here on process exit
Pages contain junk in them (e.g., not zeroed)
On most busy systems, this is empty
Zero Page List
Used to satisfy demand zero page faults
References to private pages that have not been created
yet
When free page list has 8 or more pages, a priority
zero thread is awoken to zero them
On most busy systems, this is empty too
142
Memory Management Information
Task Manager
Performance tab
“Available” = sum of free, standby, and zero page lists (physical)
Majority are likely standby pages
“System Cache” = size of standby list + size of system working set (file cache, paged pool, pageable OS/driver code & data)
Screen snapshot from: Task Manager | Performance tab
143
PFN Database
Only way to get actual size of physical memory lists is to
use !memusage in Kernel Debugger
lkd> !memusage
loading PFN database
          Zeroed:      0 (     0 kb)
            Free:      3 (    12 kb)
         Standby:  98248 (392992 kb)
        Modified:    563 (  2252 kb)
 ModifiedNoWrite:      0 (     0 kb)
    Active/Valid:  93437 (373748 kb)
      Transition:      1 (     4 kb)
         Unknown:      0 (     0 kb)
           TOTAL: 192252 (769008 kb)
Screen snapshot from: kernel debugger !memusage command
144
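The kilobyte figures in the !memusage output are the page counts multiplied by the 4 KB x86 page size, and the Task Manager "Available" value corresponds to the zeroed, free, and standby lists added together. The sketch below redoes that arithmetic with the numbers from the snapshot; the variable names are illustrative.

```python
PAGE_SIZE_KB = 4  # x86 page size

# Page counts copied from the !memusage snapshot.
counts = {
    "Zeroed": 0,
    "Free": 3,
    "Standby": 98248,
    "Modified": 563,
    "ModifiedNoWrite": 0,
    "Active/Valid": 93437,
    "Transition": 1,
    "Unknown": 0,
}


def to_kb(pages):
    """Convert a page count to kilobytes."""
    return pages * PAGE_SIZE_KB


total_pages = sum(counts.values())
available_kb = to_kb(counts["Zeroed"] + counts["Free"] + counts["Standby"])
```

On this system almost all of the available memory is on the standby list, as the Task Manager slide predicts.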
Memory Management
Core Memory Management Services
Working Set Management
Unassigned Memory
Page Files
145
Page Files
What gets sent to the paging file?
Not code – only modified data (code can be re-read from image
file anytime)
When do pages get paged out?
Only when necessary
Page file space is only reserved at the time pages are written
out
Once a page is written to the paging file, the space is occupied
until the memory is deleted (e.g., at process exit), even if the
page is read back from disk
Can run with no paging file
Windows NT4/Windows 2000: Zero pagefile size actually
created a 20MB temporary page file (\temppf.sys)
146
Sizing The Page File
Given understanding of page file usage, how big should the total
paging file space be?
(Windows supports multiple paging files)
Size should depend on total private virtual memory used by
applications and drivers
Therefore, not related to RAM size (except for taking a full memory
dump)
Worst case: Windows has to page all private data out to make room
for code pages
To handle, minimum size should be the maximum of VM usage
(“Commit Charge Peak”)
Hard disk space is cheap, so why not double this
Normally, make maximum size same as minimum
But, max size could be much larger if there will be infrequent demands
for large amounts of page file space
Performance problem: Page file extension will likely be very fragmented
Extension is deleted on reboot, thus returning to a contiguous page file
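The sizing rule above can be written down as a few lines of arithmetic. This is a sketch of the slide's rule of thumb only; the function name, parameters, and the sample figures are hypothetical.

```python
import math


def suggest_page_file_mb(commit_charge_peak_mb, safety_factor=2.0, max_mb=None):
    """Return (minimum, maximum) page file size in MB.

    minimum = Commit Charge Peak times a safety factor ("why not double this");
    maximum = minimum by default, or a larger explicit cap for rare spikes.
    """
    if commit_charge_peak_mb < 0:
        raise ValueError("commit charge peak must be non-negative")
    if safety_factor < 1.0:
        raise ValueError("safety factor below 1.0 would undersize the file")
    minimum = math.ceil(commit_charge_peak_mb * safety_factor)
    maximum = minimum if max_mb is None else max(minimum, max_mb)
    return minimum, maximum


print(suggest_page_file_mb(600))             # (1200, 1200)
print(suggest_page_file_mb(600, 1.0, 4096))  # (600, 4096)
```

RAM size does not appear anywhere in the calculation, which is the point of the slide: the input is observed commit charge, not installed memory.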
147
Memory Management Information
Task Manager
Performance tab
Total committed private virtual
memory (total of “VM Size” in
process tab + Kernel Memory
Paged)
Not all of this space has actually
been used in the paging files; it is
“how much would be used if it was
all paged out”
“Commit charge limit” = sum of
physical memory available for
processes + current total size of
paging file(s)
Does not reflect true maximum
page file sizes (expansion)
When “total” reaches “limit”, further
VirtualAlloc attempts by any
process will fail
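The relationship between commit total and commit limit can be modelled in a few lines. This is a simplified sketch with hypothetical names; it treats the limit as fixed and ignores page file expansion, which the slide notes is not reflected in the reported limit anyway.

```python
class CommitAccounting:
    """Toy model: limit = physical memory available + current page file sizes."""

    def __init__(self, physical_mb, page_files_mb):
        self.limit_mb = physical_mb + sum(page_files_mb)
        self.total_mb = 0

    def commit(self, mb):
        """Charge `mb` of private memory; fail if it would exceed the limit."""
        if self.total_mb + mb > self.limit_mb:
            return False  # analogous to a failed VirtualAlloc
        self.total_mb += mb
        return True

    def release(self, mb):
        """Return commit charge, e.g. when memory is freed or a process exits."""
        self.total_mb = max(0, self.total_mb - mb)


acct = CommitAccounting(physical_mb=512, page_files_mb=[768])
print(acct.limit_mb)        # 1280
print(acct.commit(1000))    # True
print(acct.commit(300))     # False
```

The charge is taken when memory is committed, not when it is written to disk, which is why the total is "how much would be used if it was all paged out".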
Screen snapshot from:
Task Manager | Performance tab
148
When Page Files Are Full
When page file space runs low
1. “System running low on virtual memory”
First time: Before pagefile expansion
Second time: When committed bytes reach the commit limit
2. “System out of virtual memory”
Page files are full
Look for who is consuming pagefile space
Process memory leak: Check Task Manager, Processes tab, VM
Size column
Or Perfmon “private bytes”, same counter
Paged pool leak: Check paged pool size
Run poolmon to see what object(s) are filling pool
Could be a result of processes not closing handles – check process
“handle count” in Task Manager
149
Outline
1. System Architecture
2. Processes and Thread Internals
3. Memory Management Internals
4. Security Internals
150
Security
Introduction
Components
Logon
Protecting Objects
Privileges
151
Windows Security Support
Microsoft’s goal was to achieve C2, which requires:
Secure Logon: NT provides this by requiring user name and
password
Discretionary Access Control: fine grained protection over
resources by user/group
Security Auditing: ability to save a trail of important security
events, such as access or attempted access of a resource
Object reuse protection: must initialize physical resources that are
reused e.g. memory, files
Certifications achieved:
Windows NT 3.5 (workstation and server) with SP3 earned C2 in
July 1995
In March 1999 Windows NT 4 with SP3 earned an E3 rating under the
UK’s Information Technology Security Evaluation Criteria (ITSEC), equivalent to C2
In November 1999 NT4 with SP6a earned C2 in stand-alone and
networked environments
152
Windows Security Support
Windows meets two B-level requirements:
Trusted Path Functionality: way to prevent trojan
horses with “secure attention sequence” (SAS) - Ctrl-Alt-Del
Trusted Facility Management: ability to assign different
roles to different accounts
Windows does this through account privileges (described later)
153
Common Criteria
Common Criteria (CC) is the new standard for software and OS
security ratings
Consortium of US, UK, Germany, France, Canada, and the
Netherlands in 1996
Became ISO standard 15408 in 1999
For more information, see http://www.commoncriteriaportal.org/
and http://csrc.nist.gov/cc
CC is more flexible than TCSEC trust ratings
A Protection Profile (PP) collects security requirements
A Security Target (ST) is a set of security requirements that can be made
by reference to a PP
Windows 2000 was certified as compliant with the CC
Controlled Access Protection Profile (CAPP) in October
2002
Windows XP and Server 2003 are undergoing evaluation
154
Security
Introduction
Components
Logon
Protecting Objects
Privileges
155
Security Components
[Diagram: security components within the Windows architecture]
User mode: WinLogon (MSGINA); LSASS (NetLogon, LSA Server, SAM Server, Active Directory, Event Logger, Msv1_0.dll, Kerberos.dll); LSA Policy, SAM, and Active Directory databases
Kernel mode: System Threads; System Service Dispatcher (kernel mode callable interfaces)
NtosKrnl.Exe: I/O Mgr, File System Cache, Object Mgr., Plug and Play Mgr., Power Mgr., Security Reference Monitor, Virtual Memory, Processes & Threads, Configuration Mgr (registry), Local Procedure Call, Kernel
Also in kernel mode: Device & File Sys. Drivers; Windows USER, GDI; Graphics Drivers; Hardware Abstraction Layer (HAL)
Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.)
156
Original copyright by Microsoft Corporation. Used by
Security Reference Monitor
Performs object access checks,
manipulates privileges, and generates
audit messages
Group of functions in Ntoskrnl.exe
Some documented in DDK
Exposed to user mode by Windows API calls
Demo: Open Ntoskrnl.exe with
Dependency Walker and view functions
starting with “Se”
157
Demo: Viewing Security
Processes
Run Process Explorer
Collapse Explorer process tree and focus
on upper half (system processes)
158
Security Components
Local Security Authority
User-mode process (\Windows\System32\Lsass.exe)
that implements policies (e.g. password, logon),
authentication, and sending audit records to the
security event log
LSASS policy database: registry key
HKLM\SECURITY
[Diagram: WinLogon (MSGINA) and LSASS (NetLogon, LSA Server, SAM Server, Active Directory, Event Logger, Msv1_0.dll, Kerberos.dll), with the LSA Policy, SAM, and Active Directory databases]
159
LSASS Components
SAM Service
A set of subroutines (\Windows\System32\Samsrv.dll ) responsible
for managing the database that contains the usernames and
groups defined on the local machine
SAM database: A database that contains the defined local users
and groups, along with their passwords and other attributes. This
database is stored in the registry under HKLM\SAM.
Password crackers attack the local user account password hashes
stored in the SAM
Demo: look at SAM service
Open Lsass.exe process properties – click on services tab
Click Find DLL – search for Samsrv.dll
160
Demo: Looking at the SAM
Look at HKLM\SAM permissions
SAM security allows only the local system account to access it
Run Regedit
Look at HKLM\SAM - nothing there?
Check permissions (right click->Permissions)
Close Regedit
Look in HKLM\SAM
Running Regedit in the local system account allows you to view the SAM:
psexec -s -i -d c:\windows\regedit.exe
or
sc create cmdassystem type= own type= interact binpath= "cmd /c start cmd /k"
sc start cmdassystem
View local usernames under
HKLM\SAM\SAM\Domains\Account\Users\Names
Passwords are under Users key above Names
161
LSASS Components
Active Directory
A directory service that contains a database that stores
information about objects in a domain
A domain is a collection of computers and their associated
security groups that are managed as a single entity
The Active Directory server is implemented as a service,
\Windows\System32\Ntdsa.dll, that runs in the Lsass process
Authentication packages
DLLs that run in the context of the Lsass process and that
implement Windows authentication policy:
LanMan: \Windows\System32\Msv1_0.dll
Kerberos: \Windows\System32\Kerberos.dll
Negotiate: uses LanMan or Kerberos, depending on which is most
appropriate
162
LSASS Components
Net Logon service (Netlogon)
A Windows service (\Windows\System32\Netlogon.dll) that runs
inside Lsass and responds to Microsoft LAN Manager 2 Windows
NT (pre-Windows 2000) network logon requests
Authentication is handled as local logons are, by sending them to
Lsass for verification
Netlogon also has a locator service built into it for locating
domain controllers
[Diagram: WinLogon and LSASS components, repeated from the Security Components slide]
163
Winlogon
Logon process (Winlogon)
A user-mode process running \Windows\System32\Winlogon.exe
that is responsible for responding to the SAS and for managing
interactive logon sessions
Graphical Identification and Authentication (GINA)
A user-mode DLL that runs in the Winlogon process and that
Winlogon uses to obtain a user's name and password or smart
card PIN
Default is \Windows\System32\Msgina.dll
[Diagram: WinLogon and LSASS components, repeated from the Security Components slide]
164
Security
Introduction
Components
Logon
Protecting Objects
Privileges
165
What Makes Logon Secure?
Before anyone logs on, the visible desktop is Winlogon’s
Winlogon registers CTRL+ALT+DEL, the Secure
Attention Sequence (SAS), as a standard hotkey
sequence
SAS takes you to the Winlogon desktop
No application can deregister it because only the thread
that registers a hotkey can deregister it
When Windows’ keyboard input processing code sees
SAS it disables keyboard hooks so that no one can
intercept it
166
Logon
After getting security identification (account name,
password), the GINA sends it to the Local Security
Authority Subsystem (LSASS)
LSASS calls an authentication package to verify the logon
If the logon is local or to a legacy domain, MSV1_0 is the
authenticator. User name and password are encrypted and
compared against the Security Accounts Manager (SAM)
database
Cached domain logons are also handled by MSV1_0
If the logon is to an AD domain the authenticator is Kerberos, which
communicates with the AD service on a domain controller
If there is a match, the SIDs of the corresponding user
account and its groups are retrieved
Finally, LSASS retrieves account privileges from the
Security database or from AD
167
Logon
LSASS creates a token for your logon session
and Winlogon attaches it to the first process of
your session
Tokens are created with the NtCreateToken API
Every process gets a copy of its parent’s token
SIDs and privileges cannot be added to a token
A logon session is active as long as there is at
least one token associated with the session
Lab
Run “LogonSessions -p” (from Sysinternals) to view
the active logon sessions on your system
168
Security
Introduction
Components
Logon
Protecting Objects
Privileges
169
The Access Validation
Algorithm
Access validation is a security equation that
takes three inputs:
Desired Access
Process Token
Or Thread’s token if the thread is “impersonating”
The object’s Security Descriptor, which contains a
Discretionary Access Control List (DACL)
The output is access allowed or access denied
170
Tokens
The main components of a token are:
SID of the user
SIDs of groups the user account belongs to
Privileges assigned to the user (described in
next section)
[Diagram: token layout: Account SID, Group 1 SID ... Group n SID, Privilege 1 ... Privilege n]
171
Labs: Viewing Access Tokens
Process Explorer: double click on a
process and go to Security tab
Examine groups list
Use RUNAS to create a CMD
process running under another
account (e.g. your domain account)
Examine groups list
Viewing tokens with the Kernel
Debugger
Run !process 0 0 to find a process
Run !process <PID> 1 to dump the
process
Get the token address and type
!token -n <token address>
Type dt _token <token address> to
see all fields defined in a token
172
Impersonation
Lets an application adopt the security profile of another user
Used by server applications
Impersonation is implemented at the thread level
The process token is the “primary token” and is always accessible
Each thread can be impersonating a different client
Can impersonate with a number of client/server
networking APIs – named pipes, RPC, DCOM
[Diagram: a client process calling a server process, whose server threads access an object on the client's behalf]
173
Process And Thread Security Structures
[Diagram: a process with Threads 1, 2, and 3. The process has an access token, and two thread-level access tokens are also shown. Each access token holds: User’s SID, Group SIDs, Privileges, Owner SID, Primary Group SID, Default ACL. The process, threads, and tokens each carry their own ACL (labelled 1 to 6).]
Thread tokens (where present) completely supersede
process token (basis for “security impersonation”)
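The "completely supersede" rule amounts to a simple selection: if the thread carries its own token, that token is used for access checks; otherwise the process's primary token is. A minimal sketch, with hypothetical class and account names:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Token:
    user: str
    groups: Tuple[str, ...] = ()


@dataclass
class Process:
    primary_token: Token


@dataclass
class Thread:
    process: Process
    impersonation_token: Optional[Token] = None  # set while impersonating


def effective_token(thread):
    """Thread token wins outright when present; there is no merging."""
    if thread.impersonation_token is not None:
        return thread.impersonation_token
    return thread.process.primary_token


server = Process(Token("ServerAccount", ("Administrators",)))
worker = Thread(server)
print(effective_token(worker).user)   # ServerAccount

worker.impersonation_token = Token("Alice", ("Users",))
print(effective_token(worker).user)   # Alice
```

Because nothing from the process token is merged in, a server thread impersonating a low-privileged client loses the server account's group memberships for the duration.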
174
SIDs
Windows uses Security Identifiers (SIDs) to identify security
principals:
Users, Groups of users, Computers, Domains
SIDs consist of:
A revision level e.g. 1
An identifier-authority value e.g. 5 (SECURITY_NT_AUTHORITY)
One or more subauthority values
Who assigns SIDs?
Setup assigns a computer a SID
Dcpromo assigns a domain a SID
Users and groups on the local machine are assigned SIDs that are
rooted with the computer SID, with a Relative Identifier (RID) at the end
RIDs start at 1000 (built-in account RIDs are pre-defined)
Some local users and groups have pre-defined SIDs (e.g., World = S-1-1-0)
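The textual SID layout described above (revision, identifier authority, subauthorities, with the RID as the last subauthority) can be pulled apart with a few lines of string handling. A sketch only: the function names are hypothetical, and it handles just the decimal `S-R-A-S1-...` form used on these slides.

```python
def parse_sid(text):
    """Split 'S-<revision>-<authority>-<sub1>-...' into its parts."""
    parts = text.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID string: {text!r}")
    revision = int(parts[1])
    authority = int(parts[2])
    subauthorities = tuple(int(p) for p in parts[3:])
    return revision, authority, subauthorities


def rid(text):
    """The Relative Identifier is the last subauthority value."""
    return parse_sid(text)[2][-1]


def issuing_sid(text):
    """Computer or domain SID that an account SID is rooted in."""
    return text.rsplit("-", 1)[0]


account = "S-1-5-21-34125455-5125555-1251255-1000"
print(parse_sid("S-1-1-0"))   # (1, 1, (0,))
print(rid(account))           # 1000
print(issuing_sid(account))   # S-1-5-21-34125455-5125555-1251255
```

An account whose RID is 1000 or higher was created after setup, while low RIDs such as 500 belong to built-in accounts.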
175
Demo: SIDs
Example SIDs
Domain SID: S-1-5-21-34125455-5125555-1251255
First account: S-1-5-21-34125455-5125555-1251255-1000
Admin account: S-1-5-21-34125455-5125555-1251255-500
System account: S-1-5-18
Demo: run PsGetSid (Sysinternals) to view the
SID of your username and of the computer
176
Security Descriptors
Descriptors are associated with objects: e.g.
files, Registry keys, application-defined objects
Descriptors are variable length
[Diagram: security descriptor layout: Owner SID, Primary Group (defined for POSIX), DACL pointer, SACL pointer, followed by the DACL and the SACL]
177
DACLs
DACLs consist of zero or more Access Control
Entries
A security descriptor with no DACL allows all access
A security descriptor with an empty (0-entry) DACL
denies everybody all access
An ACE is either “allow” or “deny”
[Diagram: ACE layout: ACE Type, SID, Access Mask (Read, Write, Delete, ...)]
178
Demo: Viewing a Security
Descriptor Structure
Get the address of an EPROCESS block with
!process
Type !object on that address
Type “dt _OBJECT_HEADER” on the object
header address to get the security descriptor
address
Type !sd <security descriptor address> & -8 1
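The `& -8` in the last step clears the three low-order bits of the value before it is handed to !sd. The slide does not say why; a reasonable assumption is that the low bits of the stored pointer are used for something other than the address. The arithmetic itself is easy to check, here with a made-up value:

```python
def clear_low_bits(value, alignment=8):
    """Equivalent to the debugger expression `value & -alignment`."""
    return value & -alignment


raw = 0xE1B5A2CB  # hypothetical field value, not taken from a real system
print(hex(clear_low_bits(raw)))   # 0xe1b5a2c8
```

In two's complement, -8 has every bit set except the lowest three, so the AND keeps the address bits and drops the rest.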
179
Access Check
The Security Reference Monitor (SRM)
implements an explicit allow model
ACEs in the DACL are examined in order
Does the ACE have a SID matching a SID in the
token?
If so, do any of the access bits match any remaining
desired accesses?
If so, what type of ACE is it?
Deny: return ACCESS_DENIED
Allow: grant the specified accesses and if there are no
remaining accesses to grant, return ACCESS_ALLOWED
If we get to the end of the DACL and there are
remaining desired accesses, return
ACCESS_DENIED
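The walk through the DACL can be sketched directly from the steps above. This is an illustration of the algorithm as the slides describe it, not the SRM's code: the access bits, the `Ace` class, and the use of plain names in place of SIDs are all assumptions made for readability. The null-DACL case follows the earlier DACLs slide.

```python
from dataclasses import dataclass

# Hypothetical access bits; the real access mask has many more.
READ, WRITE, DELETE = 0x1, 0x2, 0x4
ALL = READ | WRITE | DELETE


@dataclass(frozen=True)
class Ace:
    ace_type: str  # "allow" or "deny"
    sid: str
    mask: int


def access_check(token_sids, dacl, desired):
    """Return True for ACCESS_ALLOWED, False for ACCESS_DENIED."""
    if dacl is None:
        return True  # no DACL at all: all access allowed
    remaining = desired
    for ace in dacl:  # ACEs are examined in order
        if ace.sid not in token_sids:
            continue  # SID does not match the token
        if (ace.mask & remaining) == 0:
            continue  # none of the remaining desired bits
        if ace.ace_type == "deny":
            return False
        remaining &= ~ace.mask  # grant these accesses
        if remaining == 0:
            return True
    return remaining == 0  # leftovers at the end mean denied


token = {"Mark", "Authors", "Developers"}
dacl = [Ace("deny", "Authors", READ), Ace("allow", "Mark", ALL)]
print(access_check(token, dacl, WRITE))   # True
print(access_check(token, dacl, READ))    # False
print(access_check(token, [], READ))      # False
```

The write request succeeds because the deny ACE only covers Read, so it is skipped and the later allow ACE grants everything still outstanding.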
180
Access Check Example
[Diagram: access check example]
Token: Mark; groups Authors, Developers; Privilege 1 ... Privilege n
Access Request: Write
Object DACL: (1) Deny, Authors, Read; (2) Allow, Mark, All
181
ACE Ordering
The order of ACEs is important!
Low-level security APIs allow the creation of DACLs with ACEs in
any order
All security editor interfaces and higher-level APIs order ACEs
with denies before allows
Example:
[Diagram: ACE ordering example]
Token: Mark; groups Authors, Developers; Privilege 1 ... Privilege n
Access Request: Read
First DACL: (1) Deny, Authors, Read; (2) Allow, Mark, All
Second DACL: (1) Allow, Mark, All; (2) Deny, Authors, Read
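The effect of ordering can be shown with the two DACLs from the example. The sketch below is a simplification that only handles a request for a single access right, in which case the first matching ACE decides the outcome; the tuple layout and function names are hypothetical.

```python
def canonical_order(dacl):
    """Denies before allows, otherwise preserving order (sorted() is stable)."""
    return sorted(dacl, key=lambda ace: ace[0] != "deny")


def first_decision(token_sids, dacl, requested):
    """For one requested right, the first matching ACE determines the result."""
    for ace_type, sid, access in dacl:
        if sid in token_sids and access in ("All", requested):
            return ace_type
    return "deny"  # nothing matched by the end of the DACL


token = {"Mark", "Authors", "Developers"}
allow_first = [("allow", "Mark", "All"), ("deny", "Authors", "Read")]

print(first_decision(token, allow_first, "Read"))                    # allow
print(first_decision(token, canonical_order(allow_first), "Read"))   # deny
```

The same two ACEs give opposite answers depending on order, which is why the security editors and higher-level APIs always place denies first.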
182
Demo: ACE ordering
Go to a NTFS file
Add an Everyone deny-all to a file
Will the Administrator be able to look at the file?
Verify your answer by checking Effective Permissions
183
Access Special Cases
An object’s owner can always open an
object with WRITE_DACL and
READ_CONTROL permission
An account with “take ownership” privilege
can claim ownership of any object
An account with backup privilege can open
any file for reading
An account with restore privilege can open
any file for write access
184
Controllable Inheritance
In NT 4.0, objects only inherit ACEs from a parent
container (e.g. Registry key or directory) when
they are created
No distinction made between inherited and non-inherited ACEs
No prevention of inheritance
In Windows 2000 and higher inheritance is
controllable
SetNamedSecurityInfoEx and SetSecurityInfoEx
Will apply new inheritable ACEs to all child objects
(subkeys, files)
Directly applied ACEs take precedence over inherited
ACEs
185
Security
Introduction
Components
Logon
Protecting Objects
Privileges
186
Privileges
Specify which system actions a
process (or thread) can perform
Privileges are associated with groups
and user accounts
There are sets of pre-defined
privileges associated with built-in
groups (e.g. System, Administrators)
Examples include:
Backup/Restore
Shutdown
Debug
Take ownership
Privileges are disabled by default
and must be programmatically turned
on with a system call
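The distinction between holding a privilege and having it enabled can be modelled as a flag per privilege. A toy sketch following the slide's description: the class is hypothetical, and the real mechanism is a system call that adjusts the token, not a Python method.

```python
class TokenPrivileges:
    """Toy model: privileges are fixed at token creation and start disabled."""

    def __init__(self, assigned):
        self._enabled = {name: False for name in assigned}

    def enable(self, name):
        """Turn on a privilege the token already holds; none can be added."""
        if name not in self._enabled:
            raise PermissionError(f"{name} is not held by this token")
        self._enabled[name] = True

    def disable(self, name):
        if name in self._enabled:
            self._enabled[name] = False

    def check(self, name):
        """A privileged operation succeeds only if held AND enabled."""
        return self._enabled.get(name, False)


privs = TokenPrivileges(["SeBackupPrivilege", "SeShutdownPrivilege"])
print(privs.check("SeBackupPrivilege"))   # False
privs.enable("SeBackupPrivilege")
print(privs.check("SeBackupPrivilege"))   # True
```

Keeping privileges off until a program explicitly asks for them limits the damage a bug can do in code that never intended to use them.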
187
Demo: Privileges
Run Secpol.msc and examine full list
Click on Local Policies->User Rights assignment
Process Explorer: double click on a process, go
to security tab, and examine privileges list
Watch changes to privilege list:
1. Run Process Explorer and put it in paused mode
2. Open Control Panel applet to change system time
3. Go back to Process Explorer & press F5
4. Examine privilege list in new process that was created
5. Notice in privilege list that system time privilege is enabled
188
Powerful Privileges
There are several privileges that give an account that has them full
control of a computer:
Debug: can open any process, including System processes, to
Inject code
Modify code
Read sensitive data
Take Ownership: can access any object on the system
Replace system files
Change security
Restore: can replace any file
Load Driver
Drivers bypass all security
Create Token
Can spoof any user (locally)
Requires use of undocumented NT API
Trusted Computing Base (Act as Part of Operating System)
Can create a new logon session with arbitrary SIDs in the token
189
Demo: Powerful Privileges
View the use of the backup privilege:
Make a directory
Create a file in the directory
Use the security editor to remove inherited security and give Everyone full access
to the file
Remove all access to the directory (do not propagate)
Start a command-prompt and do a “dir” of the directory
Run \Sysint\Solomon\PView and enable the Backup privilege for the command
prompt
Do another “dir” and note the different behavior
View the use of the Bypass-Traverse Checking privilege (internally called
“Change Notify”)
From the same command prompt run notepad to open the file (give the full path)
in the inaccessible directory
Extra credit: disable Bypass-Traverse Checking so that you get access denied
trying to open the file (hint: requires use of secpol.msc and then RUNAS)
190
The End!
Thanks for coming!
For more information:
Windows Internals, 4th edition
5th edition will be updated for Vista (will ship
when Vista ships)
We’ll stay for questions (we’re not here the
rest of the week)
Or, email us (see slide 1 for addresses)
191
© 2005 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
192