Windows crashes (i.e., stops executing and displays the blue screen) for many different reasons: a reference to a memory address that causes an access violation, an unexpected exception or trap, a faulting kernel-mode driver and so on. It's important to
understand that Windows could go on even in the presence of serious problems during its execution, isolating the error and trying to recover somehow; but the detected problem could be caused by a deeper and more serious error that could result in more exceptions
raised during operating system processing and could finally lead to RAM and/or disk data corruption. This is unacceptable, of course, so Windows adopts
a sort of "fail fast and safe" policy that consists in stopping execution, switching the display to a low-resolution VGA mode, painting a blue background, displaying a stop code containing a message and some indications for the user, and writing memory status and crash information to a file (the memory dump file).
"Blue Screen of Death", "bugcheck" and "Stop error" are different terms for the same class of unhandled error that occurs in kernel-mode execution and causes the
system to shut down (and possibly reboot). The source of the issue can be anything from a power fluctuation in the system to a damaged component or a software/hardware bug.
In Windows 7 and previous versions, the BSOD looks like the following:
[Figure: BSOD in Windows 7 and previous versions]
whereas in Windows 8 Developer Preview it currently looks like the following (a little less "scary" than the previous one):
[Figure: BSOD in Windows 8 Developer Preview]
It's interesting to observe how bugchecks are distributed according to their causes: the book "Windows Internals, 5th Edition" provides a chart displaying the
distribution of error categories for Windows Vista SP1 in September 2008.
Blue screen: when the system encounters a hardware problem, data inconsistency, or similar error, it may display a blue screen containing information that can be used to determine the cause of the error. This information includes the STOP
code and whether a crash dump file was created. It may also include a list of loaded drivers and a stack trace.
Crash dump file: you can configure the system to write information to a crash dump file on your hard disk whenever a STOP code is generated. The file (memory.dmp) contains information the debugger can use to analyze the error. This file can
be as big as the physical memory contained in the computer. By default, memory.dmp is written to the Windows folder (%SystemRoot%\MEMORY.DMP), while small memory dumps (minidumps) are written to the Windows\Minidump folder.
Debugger: a program designed to help detect, locate, and correct errors in another program. It allows the user to step through the execution of the process and its threads, monitoring memory, variables, and other elements of process and thread
Kernel mode: the processor mode in which system services and device drivers run. All interfaces and CPU instructions are available, and all memory is accessible.
Minidump file: a minidump is a smaller version of a complete or kernel memory dump. Microsoft support will usually ask for a kernel memory dump, but the debugger can analyze a minidump and will quite possibly give the information needed to resolve the problem. If a minidump is
all you have, debug it rather than waiting for the machine to crash again. Open the file in the debugger (see below) just as memory.dmp is opened in the demonstration.
STOP code: the error code that identifies the error that stopped the system kernel from continuing to run. It is the first set of hexadecimal values displayed on the blue screen. At a minimum, frontline admins should be required to note this
code, the four other codes displayed in parentheses and any drivers identified on the screen. Often, this is all you really need.
Symbol files: all system applications, drivers, and DLLs are built such that their debugging information resides in separate files known as symbol files. Therefore, the system is smaller and faster, yet it can still be debugged if the symbol
files are available. You don't need to download the symbol files in advance: once the symbol path is set, the debugger will automatically fetch the ones it needs from Microsoft's public symbol server.
Regardless of the reason for a system crash, the function that actually performs the crash is
KeBugCheckEx, documented in the Windows Driver Kit (WDK). This function takes a stop code (also called a bugcheck code) and four parameters that must be interpreted on a per–stop code basis. After KeBugCheckEx masks out all interrupts on
all processors of the system, it switches the display into a low-resolution VGA graphics mode (one implemented by all Windows-supported video cards), paints a blue background and displays the stop code, followed by some text suggesting what
the user can do. KeBugCheckEx then calls any registered device driver bugcheck callbacks (registered by calling the KeRegisterBugCheckCallback
function), allowing drivers an opportunity to stop their devices. Finally, it calls registered reason callbacks (registered by calling the KeRegisterBugCheckReasonCallback
function), which allow drivers to append data to the crash dump or write crash dump information to alternate devices. On the blue screen, a text line near the top provides the text equivalent of the stop code's numeric identifier, while the first line of the
Technical Information section, at the bottom, lists the numeric stop code and the four additional parameters passed to KeBugCheckEx. Sometimes system data structures have been so seriously corrupted that the blue screen isn't displayed at all.
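The relationship between the stop code, its four parameters and the first line of the Technical Information section can be sketched in a few lines of Python. This is only an illustration: the BugCheck class and its technical_line method are hypothetical names, the parameter values are invented, and the 8-digit layout assumes the classic 32-bit blue screen format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BugCheck:
    """A stop code plus the four stop-code-specific parameters (hypothetical model)."""
    code: int
    params: Tuple[int, int, int, int]

    def technical_line(self, width: int = 8) -> str:
        """Format the data the way the Technical Information line shows it."""
        digits = f"0{width}X"  # zero-padded, upper-case hexadecimal
        args = ",".join(f"0x{p:{digits}}" for p in self.params)
        return f"*** STOP: 0x{self.code:{digits}} ({args})"


# Invented parameter values, for illustration only.
example = BugCheck(0xD1, (0x0C, 0x02, 0x00, 0xF86B5A89))
print(example.technical_line())
```

Passing width=16 gives the wider layout used for 64-bit values.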
Many different types of Stop errors occur: each has its own possible causes and requires a unique troubleshooting process; therefore, the first step in troubleshooting a Stop error is to identify the Stop error. You need the following information about the
Stop error to begin troubleshooting:
The Stop message reports information about the Stop error and assists the system administrator (who understands how to interpret it) in isolating and eventually resolving the problem that caused the Stop error. The Stop message
provides a great deal of useful information, including the Stop error number, or bugcheck code. It uses a full-screen character-mode format and consists of several major sections, as shown in
Figure 1, which display the following information:
Most modern desktop installations of Windows are configured to collect small memory dumps automatically. The file dump generation settings can be configured in the "Advanced" tab of the "System Properties" window, as you can see in the
Table 1 summarizes the different locations that Windows uses to store the memory dump files (also read the Microsoft Knowledge Base article KB254649 "Overview of memory dump file options
for Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7 and Windows Server 2008 R2").
Memory Dump Type | Default Location (variable) | Default Location (typical) | Paging File Requirements
Table 1: memory dump file location and size.
The first step is getting the Debugging Tools you need to analyze the crash dump files produced after a system crash.
Older versions of the Debugging Tools were provided as standalone installers that you can download from the
Microsoft Windows Hardware Dev Center, taking care to download and
install the appropriate version according to your system's architecture (32 bit or 64 bit); modern versions are included with the Microsoft Windows SDK and the Windows
If you decide to install the Windows SDK, be sure to check the check box to include the Debugging Tools in the installation process, as you can see in
After installation, the symbols path needs to be set to ensure that there are enough symbols for the debugger to determine what actually occurred and what was loaded. The entire symbol collection offered to the public can be downloaded and placed on a local
drive, or an Internet location can be specified to pull the symbols on demand. I suggest pulling them from the Internet: the correct version of the symbols will be
downloaded on demand and will not become outdated after the installation of hotfixes and service packs. The Microsoft Knowledge Base article "Use the Microsoft Symbol Server to obtain
debug symbol files" (KB311503) explains how to use the Microsoft Symbol Server to obtain debug symbol files: basically, you can create a folder (for example, C:\Symbols) and set the environment variable
_NT_SYMBOL_PATH = srv*c:\Symbols*http://msdl.microsoft.com/download/symbols
as you can see in Figure 6.
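The structure of that value is srv*<local cache folder>*<symbol server URL>. As a small sketch, the following Python helper assembles it from its two parts; build_symbol_path and cmd_set_line are hypothetical helper names, and the folder and URL are the ones used in the text above.

```python
def build_symbol_path(cache_dir: str, server_url: str) -> str:
    """Return a symbol path in the srv*<cache>*<server> form."""
    return f"srv*{cache_dir}*{server_url}"


def cmd_set_line(symbol_path: str) -> str:
    """Return the line that sets the variable for the current Command Prompt session."""
    return f"set _NT_SYMBOL_PATH={symbol_path}"


symbol_path = build_symbol_path(
    r"c:\Symbols",
    "http://msdl.microsoft.com/download/symbols",
)
print(cmd_set_line(symbol_path))
```

The first part is the downstream cache where downloaded symbols are stored; the second is the server they are fetched from on demand.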
Start WinDbg from the Start menu (the exact position of WinDbg will vary according to your Windows version) and select
File -> Open Crash Dump... (or press CTRL+D). Select the appropriate .DMP file and let the debugger perform its initial operations: the kernel symbols are loaded and the debugger displays some basic information about the analyzed system and the reported bugcheck, along
with an indication of the module that probably made the system crash.
After that, you need to get detailed information about the current exception or bugcheck: in the lower pane of the Command window, type the command "!analyze -v" and press ENTER (the "-v" option displays verbose output).
As you can see, the system crashed because of a DRIVER_IRQL_NOT_LESS_OR_EQUAL bugcheck, whose Stop code is 0x000000D1. The faulting module seems to be "e1k6232" (the image file is e1k6232.sys): we enter the "lm" command with some options
("v" causes the display to be verbose, including the symbol file name, the image file name, checksum information, version information, date stamps, time stamps, and information about whether the module is managed code; "m"
specifies a pattern that the module name must match), passing the module name as the pattern,
and we can get more information about that module.
Then we perform a quick search on the web (http://systemexplorer.net/db/e1k6232.sys.html) and discover that "e1k6232.sys" is a driver belonging to the Intel Gigabit Adapter developed by Intel Corporation:
in this case, we could fix the issue by downloading and installing an updated version of this driver (this DMP file comes from a PC that was actually affected by this problem, and updating the driver did solve the issue). Further troubleshooting is dependent
on the specific error. Some errors may require Driver Verifier to be enabled to determine a root cause: this tool verifies that drivers are not making illegal function calls or causing system corruption, and it can identify conditions such as memory corruption,
mishandled I/O request packets (IRPs), invalid direct memory access (DMA) buffer usage and possible deadlocks. The
!verifier extension in the kernel debugger can be used to monitor and report on statistics related to Driver Verifier in the context of a debugging session.
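The interactive steps above (open the dump, run "!analyze -v", inspect the module with "lm") can also be run in one shot with the console kernel debugger kd.exe, which ships with the Debugging Tools. The sketch below only builds the command line: build_kd_command is a hypothetical helper, the dump path is an assumed example, and it relies on the kd options -z (dump file), -y (symbol path) and -c (initial commands).

```python
from typing import List


def build_kd_command(dump_path: str, symbol_path: str, module: str,
                     debugger: str = "kd.exe") -> List[str]:
    """Build an argument list that analyzes a dump and then quits the debugger."""
    commands = [
        "!analyze -v",       # verbose bugcheck analysis
        f"lm v m {module}",  # verbose details for the suspected module
        "q",                 # quit when done
    ]
    return [debugger, "-z", dump_path, "-y", symbol_path, "-c", "; ".join(commands)]


argv = build_kd_command(
    r"C:\Windows\MEMORY.DMP",  # assumed dump location, adjust as needed
    r"srv*c:\Symbols*http://msdl.microsoft.com/download/symbols",
    "e1k6232",
)
print(argv)
```

On a machine with the Debugging Tools installed and kd.exe on the PATH, the list can be passed to subprocess.run(argv, capture_output=True, text=True) to collect the analysis as text.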
The following Stop error descriptions can help you to troubleshoot problems that cause Stop errors.
The Stop 0xA message indicates that a kernel-mode process or driver attempted to access a memory location to which it did not have permission or at a kernel IRQL that was too high. A kernel-mode process can access only other processes that have an IRQL lower
than or equal to its own. This Stop message is typically the result of faulty or incompatible hardware or software. This Stop message has four parameters.
If the last parameter is within the address range of a device driver used by the system, the driver itself can be determined by reading the line that begins with
**Address 0xZZZZZZZZ has base at <address>- <driver name>
If the third parameter is the same as the first parameter, a special condition exists in which a system worker routine—carried out by a worker thread to handle background tasks known as work items—returned at a higher IRQL. In that case, some of the four
parameters take on new meanings
To resolve an error caused by a faulty device driver, system service or basic input/output system (BIOS), follow these steps
To resolve an error caused by an incompatible device driver, system service, virus scanner or backup tool, follow these steps
If the Stop 0xA message is encountered while upgrading to a newer Windows version, the problem might be due to an incompatible driver, system service, virus scanner or backup. To avoid problems while upgrading, simplify hardware configuration and remove
all third-party device drivers and system services (including virus scanners) prior to running setup. After successfully installing Windows, contact the hardware manufacturer to obtain compatible updates.
If the Stop error occurs when resuming from hibernation or suspend, read the Microsoft Knowledge Base articles
If the Stop error occurs when starting a mobile computer that has the lid closed, refer to the Microsoft Knowledge Base article
The Stop 0xD1 message indicates that the system attempted to access pageable memory using a kernel process IRQL that was too high. Drivers that have used improper addresses typically cause this error. This Stop message has four parameters.
Stop 0xD1 messages can occur after you install faulty drivers or system services. If a driver is listed by name, disable, remove, or roll back that driver to resolve the error. If disabling or removing drivers resolves the error, contact
the manufacturer about a possible update. Using updated software is especially important for backup programs, multimedia applications, antivirus scanners, DVD playback, and CD mastering tools.
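As a rough aid for reading the four parameters of Stop 0xA and Stop 0xD1, the following Python sketch labels them. It assumes the parameter layout given in Microsoft's bug check reference for these two codes (memory address referenced, IRQL at the time of the reference, access type, address of the instruction that referenced the memory); describe_irql_bugcheck is a hypothetical helper, and only the access values 0 (read) and 1 (write) are decoded.

```python
from typing import Dict, Sequence

# Assumed meaning of the third parameter; other values are reported as-is.
ACCESS_TYPES = {0: "read", 1: "write"}


def describe_irql_bugcheck(code: int, params: Sequence[int]) -> Dict[str, str]:
    """Label the four parameters of a Stop 0xA or Stop 0xD1 error."""
    if code not in (0x0A, 0xD1):
        raise ValueError(f"unsupported stop code 0x{code:X}")
    if len(params) != 4:
        raise ValueError("a bugcheck always carries exactly four parameters")
    address, irql, access, instruction = params
    return {
        "memory_referenced": f"0x{address:08X}",
        "irql": str(irql),
        "access": ACCESS_TYPES.get(access, f"other (0x{access:X})"),
        "instruction_address": f"0x{instruction:08X}",
    }


# Invented parameter values, for illustration only.
info = describe_irql_bugcheck(0xD1, (0x0C, 0x02, 0x00, 0xF86B5A89))
for name, value in info.items():
    print(f"{name}: {value}")
```

The last parameter is the one to compare against the address ranges of the loaded drivers when looking for the faulting module.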
The Stop 0x00000124 message (WHEA_UNCORRECTABLE_ERROR) indicates a fatal hardware error reported through the Windows Hardware Error Architecture (WHEA). A common case is a problem handling a PCI-Express device: most often, this occurs when adding or removing a hot-pluggable PCI-Express card; however, it can occur with driver- or hardware-related problems for PCI-Express
To troubleshoot 0x00000124 stop errors, first make sure you have applied all Windows updates and driver updates. If you recently updated a driver, roll back the change. If the stop error continues to occur, remove PCI-Express cards one by one to identify
the problematic hardware. When you have identified the card causing the problem, contact the hardware manufacturer for further troubleshooting assistance. The driver might need to be updated, or the card itself could be faulty.
The meanings of the parameters are described in Table 2.
Cause of error: a machine check exception occurred. Parameters: address of the WHEA_ERROR_RECORD structure; high 32 bits of the MCi_STATUS MSR for the MCA bank that had the error; low 32 bits of the MCi_STATUS MSR for the MCA bank that had the error. These parameter descriptions apply if the processor is based on the x64 architecture, or the x86 architecture that has the MCA feature available (for example, Intel Pentium Pro, Pentium 4, or Xeon).
Cause of error: a corrected machine check exception occurred.
Cause of error: a corrected platform error occurred.
Cause of error: a nonmaskable interrupt (NMI) error occurred.
Cause of error: an uncorrectable PCI Express error occurred.
Cause of error: a generic hardware error occurred.
Cause of error: an initialization error occurred. Parameters: address of the WHEA_ERROR_RECORD structure.
Cause of error: a BOOT error occurred.
Cause of error: a Scalable Coherent Interface (SCI) generic error occurred.
Cause of error: an uncorrectable Itanium-based machine check abort error occurred. Parameters: length, in bytes, of the SAL log; address of the SAL log.
Cause of error: a corrected Itanium-based machine check error occurred.
Cause of error: a corrected Itanium platform error occurred.
Table 2: meanings of the parameters.
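The causes listed in Table 2 are selected by the first bugcheck parameter. The following Python sketch maps that parameter to a cause; it assumes the Parameter 1 values 0x0 through 0xB given in Microsoft's documentation for bug check 0x124, which are not reproduced in the table here, and describe_whea_source is a hypothetical helper name.

```python
# Assumed Parameter 1 values for Stop 0x124, taken from Microsoft's documentation.
WHEA_ERROR_SOURCES = {
    0x0: "machine check exception",
    0x1: "corrected machine check exception",
    0x2: "corrected platform error",
    0x3: "nonmaskable interrupt (NMI) error",
    0x4: "uncorrectable PCI Express error",
    0x5: "generic hardware error",
    0x6: "initialization error",
    0x7: "BOOT error",
    0x8: "Scalable Coherent Interface (SCI) generic error",
    0x9: "uncorrectable Itanium-based machine check abort error",
    0xA: "corrected Itanium-based machine check error",
    0xB: "corrected Itanium platform error",
}


def describe_whea_source(parameter1: int) -> str:
    """Return the cause of a Stop 0x124 error for the given first parameter."""
    return WHEA_ERROR_SOURCES.get(parameter1, "unknown error source")


print(describe_whea_source(0x4))
```

Values outside the table are reported as unknown rather than guessed.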