Windows Crash Dump Analysis

Daniel Pearson
David Solomon Expert Seminars
Daniel Pearson
• Started working with Windows NT 3.51
• Three years at Digital Equipment Corporation
• Supporting Intel and Alpha systems running Windows NT
• Seven years at Microsoft
• Senior Escalation Lead in Windows base team
• Worked in the Mobile Internet sustained engineering team
• Instructor for David Solomon, co-author of the Windows Internals
book series
Agenda
• Causes of Windows crashes
• What happens during a crash
• Configuring Windows crash options
• Writing a crash dump
• Automated and manual crash analysis
• Using Driver Verifier to detect errors
• Attaching a kernel debugger
* Portions of this session are based on material developed by
Mark Russinovich and David Solomon
Why Analyze a Crash?
• When Windows Error Reporting has no solution or when it blames “a
device driver”
Why Does Windows Crash?
• A device driver or part of the operating system incurs an
unhandled exception
• A device driver or part of the operating system explicitly crashes the
system due to an unrecoverable condition
• A page fault occurs at an interrupt request level (IRQL) of DISPATCH_LEVEL or higher
• A hardware condition occurs, such as a nonmaskable interrupt (NMI), faulty
memory, or a failing disk
Causes of Windows Crashes
Percentage of Top 500 Crashes for Windows Vista with Service Pack 1 [1]
[Pie chart. Categories: third-party device drivers, Microsoft code, crash too corrupt for analysis, hardware errors. Slice values: 6%, 11%, 13%, 70%.]
[1] Microsoft Corporation. 2008. Online Crash Analysis research performed in
September 2008.
What Happens During a Crash
• When a condition is detected that requires a crash, the kernel API
KeBugCheckEx is called
• KeBugCheckEx accepts a bugcheck code that indicates the reason for
the crash and four parameters that supply additional information
VOID KeBugCheckEx(
    IN ULONG     BugCheckCode,
    IN ULONG_PTR BugCheckParameter1,
    IN ULONG_PTR BugCheckParameter2,
    IN ULONG_PTR BugCheckParameter3,
    IN ULONG_PTR BugCheckParameter4
);
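To make the code-plus-four-parameters shape concrete, here is a minimal Python sketch that renders a bugcheck the way a stop screen typically prints it. The function name format_stop_line and the sample values are hypothetical, and the exact on-screen layout is an assumption.

```python
def format_stop_line(code, p1, p2, p3, p4, pointer_bits=32):
    """Render a bugcheck code and its four parameters in stop-screen style."""
    width = pointer_bits // 4  # hex digits needed for one ULONG_PTR parameter
    params = ",".join(f"0x{p:0{width}X}" for p in (p1, p2, p3, p4))
    # BugCheckCode is a ULONG, so it is always shown as 8 hex digits.
    return f"*** STOP: 0x{code:08X} ({params})"


# Hypothetical values for illustration only.
print(format_stop_line(0xD1, 0x10, 0x2, 0x0, 0xF8A01234))
```

Passing pointer_bits=64 widens each parameter to 16 hex digits, which matches ULONG_PTR being pointer-sized.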
Inside of KeBugCheckEx
• KeBugCheckEx performs several functions
• Disables interrupts
• Notifies other CPUs to halt execution
• Notifies registered drivers
• Writes crash dump information to disk*
• Restarts the system*
* Only if the system is configured to do so
The Windows Stop Screen
Bugcheck Codes
• Shared by many components and drivers
• The Windows Driver Kit currently documents over 250 unique
bugcheck codes
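Because the same code can be raised by many components, a lookup table only gives a name, not a culprit. The sketch below maps a handful of well-known codes to their symbolic names; the table is a small illustrative subset, and bugcheck_name is a hypothetical helper.

```python
# A small, illustrative subset of documented bugcheck codes.
KNOWN_BUGCHECKS = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0x000000C4: "DRIVER_VERIFIER_DETECTED_VIOLATION",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x000000E2: "MANUALLY_INITIATED_CRASH",
}


def bugcheck_name(code):
    """Return the symbolic name for a code, or a placeholder if it is not in the table."""
    return KNOWN_BUGCHECKS.get(code, f"UNKNOWN_BUGCHECK_0x{code:08X}")


print(bugcheck_name(0xD1))
```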
Memory Dump Types
• Small memory dump
• Records the smallest set of useful information
• Kernel memory dump*
• Records only kernel memory, which speeds up the process of writing
a crash dump
• Complete memory dump*
• Records the entire contents of system memory
* If either a Kernel or Complete memory dump is selected, the system will also
create a minidump and store it in the %SystemRoot%\minidump directory
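The selected dump type is stored in the CrashDumpEnabled value under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl. As a rough sketch, the mapping below decodes the classic values 0 through 3; the helper names are hypothetical, and newer Windows releases may define additional values not covered here.

```python
# Classic CrashDumpEnabled values; later Windows versions may add more.
CRASH_DUMP_ENABLED = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
}


def describe_dump_setting(value):
    """Translate a CrashDumpEnabled registry value into a readable dump type."""
    return CRASH_DUMP_ENABLED.get(value, f"unrecognized value {value}")


def also_writes_minidump(value):
    """Kernel and complete dumps also produce a minidump, per the note above."""
    return value in (1, 2)


print(describe_dump_setting(2), also_writes_minidump(2))
```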
Configuring Debugging Information Options
Writing a Crash Dump
• Crash dump information is written to the paging file on the boot volume
or to a dedicated dump file if specified
• Too risky to create a new file on the system
• How does the system know it’s safe?
• The boot volume paging file’s on-disk mapping is obtained when the
system starts
• Critical crash components are checksummed
• When a crash occurs, if the checksum doesn’t match, a memory
dump is not written
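The checksum idea can be illustrated with a toy model: record a checksum over the critical components at boot, then recompute it at crash time and refuse to write if it differs. This is only a sketch. CRC32 is an assumed stand-in (the slides do not say which checksum Windows uses), and the component names are hypothetical.

```python
import zlib


def checksum(components):
    """Checksum a dict of name -> bytes in a fixed order so the result is repeatable."""
    crc = 0
    for name in sorted(components):
        crc = zlib.crc32(components[name], crc)
    return crc


def can_write_dump(components, boot_checksum):
    """A dump is written only if the components still match the boot-time checksum."""
    return checksum(components) == boot_checksum


# Hypothetical stand-ins for the dump driver code and the paging file's disk map.
components = {"dump_driver": b"\x90" * 64, "pagefile_map": b"\x01\x02\x03\x04"}
boot_checksum = checksum(components)           # taken when the system starts
print(can_write_dump(components, boot_checksum))

components["pagefile_map"] = b"\x01\x02\xff\x04"  # simulated corruption before the crash
print(can_write_dump(components, boot_checksum))
```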
Why Would You Not Get a Dump?
• Problems with page file configuration
• The paging file on the boot volume is too small or one does not exist
• The system crashed before the paging file was initialized
• Critical crash components are corrupted
• Windows didn’t crash!
• The system spontaneously restarted
• The system is hung
Analyzing a Crash Dump
• The Microsoft kernel debuggers can be used to open and analyze a
crash dump
• kd, a command-line tool, and WinDbg, a GUI tool
• Available as part of the Debugging Tools for Windows
http://www.microsoft.com/whdc/devtools/debugging/default.mspx
• Configure the debugger to point to symbols
srv*C:\SYMBOLS*http://msdl.microsoft.com/download/symbols
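The symbol path above follows the pattern srv*<local cache>*<symbol server>. A small sketch of building it programmatically is shown below; symbol_server_path is a hypothetical helper, and the resulting string is what you would place in the debugger's symbol path setting or the _NT_SYMBOL_PATH environment variable.

```python
def symbol_server_path(cache_dir,
                       server_url="http://msdl.microsoft.com/download/symbols"):
    """Build a srv*<cache>*<server> symbol path element."""
    return f"srv*{cache_dir}*{server_url}"


def join_symbol_paths(*elements):
    """Multiple symbol path elements are separated by semicolons."""
    return ";".join(elements)


print(symbol_server_path(r"C:\SYMBOLS"))
```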
Automated Analysis
• When you open a crash dump with WinDbg or kd, the debugger
performs basic crash analysis*
• Displays stop code and parameter information
• Takes a guess at the offending driver
• The analysis is the result of the automated execution of the !analyze
debugger command
• !analyze uses the bugcheck parameters and a set of heuristics to
determine what component is the likely cause of the crash
* Set the environment variable DBGENG_NO_BUGCHECK_ANALYSIS=1
to disable
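For batch work, the same analysis can be driven from the command line. The sketch below only assembles a kd command line using the -z (dump file), -y (symbol path), and -c (initial command) options; kd_analyze_command and the example paths are hypothetical, and nothing is executed here.

```python
def kd_analyze_command(dump_path, symbol_path, kd_exe="kd.exe"):
    """Build an argument list that opens a dump, runs !analyze -v, then quits."""
    return [
        kd_exe,
        "-z", dump_path,          # crash dump file to open
        "-y", symbol_path,        # symbol path
        "-c", "!analyze -v; q",   # verbose analysis, then exit the debugger
    ]


# Hypothetical paths for illustration.
cmd = kd_analyze_command(
    r"C:\Windows\MEMORY.DMP",
    r"srv*C:\SYMBOLS*http://msdl.microsoft.com/download/symbols",
)
print(cmd)
```

On a machine with the Debugging Tools for Windows installed, a list like this could be handed to subprocess.run to capture the analysis output.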
Automated Analysis Using !analyze
Memory Corruption
• Occurs when a driver writes past the end (an overrun) or before the
beginning (an underrun) of its memory allocation
• Usually detected when overwritten data is referenced by the kernel or
another driver
• It’s possible there’s a long delay between corruption and detection
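The delay between corruption and detection is easier to see in a toy model. In the sketch below, two 16-byte allocations sit side by side in a simulated pool; a buggy write overruns the first one, and nothing goes wrong until the owner of the second allocation reads its data. All names and sizes are hypothetical.

```python
ALLOC_SIZE = 16
pool = bytearray(2 * ALLOC_SIZE)       # A occupies [0, 16), B occupies [16, 32)
pool[16:32] = b"B" * ALLOC_SIZE        # B's valid data


def buggy_write(pool, offset, data):
    """Copy data into the pool without checking the allocation size (the bug)."""
    pool[offset:offset + len(data)] = data


buggy_write(pool, 0, b"A" * 20)        # 4 bytes past the end of A: an overrun

# No error was raised above. The damage surfaces only when B is read later,
# so the "victim" that trips over it is B's owner, not the culprit.
b_data = bytes(pool[16:32])
corrupted = b_data != b"B" * ALLOC_SIZE
print(corrupted, b_data)
```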
Viewing the Effects of Memory Corruption
Crash Transformation
• For crashes that are difficult to analyze
• The “victim” crashed the system, not the culprit
• The debugger points to ntoskrnl.exe, win32k.sys or other
Windows components
• You get many different crash dumps all pointing at different causes
• Your goal isn’t to analyze difficult crashes …
It’s to try to make an “unanalyzable” crash into one that can be easily
analyzed
Driver Verifier
• Useful for identifying code defects in drivers
• Performs more thorough checks on the system and device drivers as
well as simulating failures
• Support is built into the operating system
• The requirements for the Windows logo program state that a driver must
not fail while running under Driver Verifier
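One of Driver Verifier's checks, special pool, places an allocation at the end of a page with an inaccessible guard page right after it, so an overrun faults at the moment it happens rather than later. The following is a toy simulation of that idea only; the class and exception names are hypothetical, and underruns are outside this model.

```python
PAGE_SIZE = 4096


class GuardPageViolation(Exception):
    """Raised when a write lands on the simulated guard page."""


class SpecialPoolAllocation:
    """Toy model: the buffer ends exactly at a page boundary; the next page is a guard page."""

    def __init__(self, size):
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("size must fit in one page")
        self.size = size
        self.start = PAGE_SIZE - size      # buffer is pushed to the end of the page
        self.page = bytearray(PAGE_SIZE)

    def write(self, index, value):
        if index < 0:
            raise IndexError("negative indices are outside this toy model")
        offset = self.start + index
        if offset >= PAGE_SIZE:            # first byte past the buffer hits the guard page
            raise GuardPageViolation(f"overrun by {index - self.size + 1} byte(s)")
        self.page[offset] = value


alloc = SpecialPoolAllocation(16)
alloc.write(15, 0xFF)                      # last valid byte: fine
try:
    alloc.write(16, 0x00)                  # one byte too far: caught immediately
except GuardPageViolation as exc:
    print("caught:", exc)
```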
Using Driver Verifier to Catch a Buffer Overrun
Manual Analysis
• Sometimes !analyze isn’t enough
• It might not tell you anything useful
• You want to know in more detail what was happening at the time of
the crash
• Several useful commands and techniques
• Verify the time of the crash, .time
• A short uptime value can mean frequent problems
• Check the stack on each CPU; stacks are read from the bottom to
the top
• !cpuinfo will display a list of all the CPUs
• Use ~<n>s (for example, ~1s) to switch to CPU <n> for investigation
• k to display the stack
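When triaging many dumps, the uptime check can be scripted against captured debugger output. The sketch below assumes the .time output contains a line shaped like "System Uptime: 0 days 0:07:31.218"; that layout, the one-hour threshold, and the function names are assumptions to adjust against real output.

```python
import re
from datetime import timedelta

# Assumed layout of the uptime line in .time output.
UPTIME_RE = re.compile(r"System Uptime:\s*(\d+) days (\d+):(\d+):(\d+)\.(\d+)")


def parse_uptime(time_output):
    """Return the system uptime as a timedelta, or None if no uptime line is found."""
    match = UPTIME_RE.search(time_output)
    if match is None:
        return None
    days, hours, minutes, seconds = (int(g) for g in match.groups()[:4])
    micros = int(match.group(5).ljust(6, "0")[:6])   # fractional seconds to microseconds
    return timedelta(days=days, hours=hours, minutes=minutes,
                     seconds=seconds, microseconds=micros)


def is_short_uptime(uptime, threshold=timedelta(hours=1)):
    """Flag crashes that happened soon after boot (threshold is an arbitrary choice)."""
    return uptime is not None and uptime < threshold


sample = "System Uptime: 0 days 0:07:31.218"   # hypothetical sample line
print(parse_uptime(sample), is_short_uptime(parse_uptime(sample)))
```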
Manual Analysis
• Several useful commands and techniques
• Look at memory usage, !vm
• Make sure memory pools are not depleted or contain errors
• Use !poolused to identify large users
• Check the currently running thread, !thread
• May or may not be related to the crash
• Check pending I/O requests using !irp
• List all processes on the system, !process 0 0
• Make sure you understand what was running at the time
• List loaded drivers, lm t n
• Make sure all the drivers are recognizable and up to date
Manual Analysis of a Crash Dump
Attaching a Kernel Debugger
• Required for debugging initialization failures and crashes where no
dump file is created
• Requires that the target system be started with kernel debugging enabled
• Supports null-modem, IEEE 1394, and USB 2.0 cables, as well as
virtual machines and network debugging in Windows 7
• Limited support for local kernel debugging
Attaching a Kernel Debugger to a Live System
Hung Systems
• Sometimes a system becomes unresponsive
• Keyboard and mouse frozen
• Two types of hangs
• Instant lockup
• Kernel synchronization deadlock
• Infinite loop at a high IRQL or in a very high-priority thread
• Slowly grinding to a halt
• Resource depletion
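A synchronization deadlock can be pictured as a cycle in a "waits-for" graph: thread 1 waits on a lock owned by thread 2, which in turn waits on a lock owned by thread 1. The sketch below finds such a cycle in a simplified model where each blocked thread waits on exactly one owner; the thread names and the find_deadlock helper are hypothetical.

```python
def find_deadlock(waits_for):
    """waits_for maps a blocked thread to the thread that owns the lock it needs.

    Returns the list of threads forming a cycle, or None if there is no cycle.
    """
    for start in waits_for:
        seen = []
        thread = start
        # Follow the chain of owners until it ends or revisits a thread.
        while thread in waits_for and thread not in seen:
            seen.append(thread)
            thread = waits_for[thread]
        if thread in seen:
            return seen[seen.index(thread):]
    return None


# T1 and T2 block each other; T3 is stuck behind them but is not part of the cycle.
print(find_deadlock({"T1": "T2", "T2": "T1", "T3": "T1"}))
```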
Initiating a Manual Crash
• Using the keyboard
• Requires a PS/2 keyboard + registry key
• HKLM\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters\CrashOnCtrlScroll
• Using an NMI button
• Requires specialized hardware + registry key
• HKLM\SYSTEM\CurrentControlSet\Control\CrashControl\NMICrashDump
• Using the debugger
• Break in and execute the .crash command
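Both registry settings are REG_DWORD values set to 1, and a restart is needed before they take effect. With the keyboard option enabled, the crash is triggered by holding the right Ctrl key and pressing Scroll Lock twice. As a convenience sketch, the hypothetical helper below generates .reg file text for either or both settings; review the output before importing it on a real system.

```python
def manual_crash_reg(keyboard=True, nmi=True):
    """Return .reg file text enabling keyboard- and/or NMI-initiated crashes."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    if keyboard:
        lines += [
            r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters]",
            '"CrashOnCtrlScroll"=dword:00000001',
            "",
        ]
    if nmi:
        lines += [
            r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]",
            '"NMICrashDump"=dword:00000001',
            "",
        ]
    return "\r\n".join(lines)   # .reg files conventionally use CRLF line endings


print(manual_crash_reg())
```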
Debugging a Hung System
Additional Information
• Windows Internals 5th edition
• Debugging Tools for Windows documentation
• Mark Russinovich’s Blog
• http://blogs.technet.com/markrussinovich
• Advanced Windows Debugging Blog
• http://blogs.msdn.com/ntdebugging
• Crash Dump Analysis and Debugging Portal
• http://www.dumpanalysis.org
Additional Information
• David Solomon Expert Seminars offers training on Windows Internals
both as public and private workshops and public webinars via
the Internet
• Currently scheduled upcoming classes
• Public workshop in London, April 12th – April 16th
• Public webinar, April 26th & April 28th
• Public workshop in New York, May 3rd – May 7th
• Public workshop in San Francisco, November 8th – November 12th
• Visit http://www.solsem.com for further course descriptions and
up-to-date information