Overview

This wiki article is an example of how WinDbg can be used to extract information about a failed process.  The example is taken from a situation where an ASP.NET WCF solution running in an Azure Cloud Service faulted with a stack overflow exception.  This failure is particularly challenging because the worker process fails and the application pool is torn down completely, so .NET error trapping cannot capture the exception.

WinDbg is a kernel-mode and user-mode debugger, included in Debugging Tools for Windows, that can be used to view a crash dump file (*.dmp).

Example

The failure was partly masked by the fault-tolerant nature of the Azure Cloud Service, which ran multiple instances of the web role.  As a result, the intermittent failures did not completely disrupt the user experience on the hosted website.  The issue was instead reported by an HTTP client that received connection reset exceptions from an API endpoint on the same website.  The failures were logged in the Windows Event Log as:

Log Name: Application
Source: Windows Error Reporting
Date: 8/25/2017 10:33:54 AM
Event ID: 1001
Task Category: None
Level: Information
Keywords: Classic
User: N/A
Computer: xxxxxxxxxx
Description:
Fault bucket , type 0
Event Name: CLR20r3
Response: Not available
Cab Id: 0
Problem signature:
P1: w3wp.exe
P2: 8.5.9600.16384
P3: 5215df96
P4: System.Core
P5: 4.0.30319.36389
P6: 58d87805
P7: 11a0
P8: 0
P9: System.StackOverflowException
P10:
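
The problem signature above can be read more easily once each field is labeled.  The following is a minimal sketch, not part of the original investigation: the field meanings in CLR20R3_FIELDS are an assumption based on the commonly described CLR20r3 layout, and the helper names parse_signature and hex_timestamp_to_utc are hypothetical.  It also assumes that P3 and P6 are build timestamps stored as hexadecimal seconds since the Unix epoch.

```python
from datetime import datetime, timezone

# Assumed meaning of each CLR20r3 field (not stated in the event itself).
CLR20R3_FIELDS = {
    "P1": "application name",
    "P2": "application version",
    "P3": "application timestamp",
    "P4": "faulting assembly",
    "P5": "assembly version",
    "P6": "assembly timestamp",
    "P7": "method token",
    "P8": "IL offset",
    "P9": "exception type",
}


def parse_signature(text):
    """Return {label: value} for every 'Pn: value' line found in the text."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in CLR20R3_FIELDS:
            result[CLR20R3_FIELDS[key]] = value.strip()
    return result


def hex_timestamp_to_utc(hex_value):
    """Interpret a hex string as seconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


# Values copied from the event shown above.
EVENT = """\
P1: w3wp.exe
P2: 8.5.9600.16384
P3: 5215df96
P4: System.Core
P5: 4.0.30319.36389
P6: 58d87805
P7: 11a0
P8: 0
P9: System.StackOverflowException
"""

signature = parse_signature(EVENT)
print(signature["exception type"], "in", signature["faulting assembly"])
print("assembly built:", hex_timestamp_to_utc(signature["assembly timestamp"]).date())
```

Read this way, the event already shows that the worker process (P1) died from a System.StackOverflowException (P9) surfacing in System.Core (P4), but it does not say which application code caused it.  That is why the crash dump is needed.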

The website's error handling and logging did not report any errors.  As stated earlier, this was because the IIS application pool worker process (w3wp.exe) faulted and was torn down without triggering the Catch(Exception) block.

Finding the Crash Dump

The first step was finding the crash dump files.  These were located at c:\ProgramData\Microsoft\Windows\WER\ReportQueue as illustrated below:
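
On a role instance with many queued reports, it can help to list the dump files rather than browse for them.  The following is a minimal sketch, assuming the dumps sit somewhere under the ReportQueue folder named above and carry a .dmp extension; the function name find_dumps is hypothetical.

```python
from pathlib import Path

# Folder taken from the text above; adjust if WER is configured differently.
REPORT_QUEUE = Path(r"c:\ProgramData\Microsoft\Windows\WER\ReportQueue")


def find_dumps(root):
    """Return every *.dmp file under root, newest first."""
    root = Path(root)
    if not root.is_dir():
        return []
    dumps = [
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".dmp"
    ]
    return sorted(dumps, key=lambda path: path.stat().st_mtime, reverse=True)


for dump in find_dumps(REPORT_QUEUE):
    print(dump, dump.stat().st_size, "bytes")
```

Sorting by modification time puts the most recent crash first, which is usually the one to open in WinDbg.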


Open Crash Dump File

After installing the Debugging Tools for Windows, WinDbg can be launched and the crash dump opened from the File menu:



It is important to use the version of WinDbg (x64 or x86) that matches the crash dump when loading it, and this can be detected as shown below:
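
As a complementary check outside the debugger, the architecture can also be read from the dump file itself.  This is not the method used in the original investigation; it is a minimal sketch that assumes the file is a standard user-mode minidump, which starts with the MDMP signature and contains a system info stream whose first field is the processor architecture.  The function name dump_architecture is hypothetical.

```python
import struct

MINIDUMP_SIGNATURE = b"MDMP"
SYSTEM_INFO_STREAM = 7  # stream type holding MINIDUMP_SYSTEM_INFO
ARCHITECTURES = {0: "x86", 5: "ARM", 9: "x64", 12: "ARM64"}


def dump_architecture(stream):
    """Return the processor architecture recorded in an open binary dump file."""
    # MINIDUMP_HEADER starts with: Signature, Version, NumberOfStreams,
    # StreamDirectoryRva (four 32-bit little-endian fields).
    header = stream.read(16)
    if len(header) < 16 or header[:4] != MINIDUMP_SIGNATURE:
        raise ValueError("not a minidump file")
    _sig, _version, stream_count, directory_rva = struct.unpack("<4sIII", header)

    for index in range(stream_count):
        # Each directory entry is 12 bytes: StreamType, DataSize, Rva.
        stream.seek(directory_rva + 12 * index)
        entry = stream.read(12)
        if len(entry) < 12:
            break
        stream_type, _size, rva = struct.unpack("<III", entry)
        if stream_type == SYSTEM_INFO_STREAM:
            # The first 16-bit field of the stream is ProcessorArchitecture.
            stream.seek(rva)
            (arch,) = struct.unpack("<H", stream.read(2))
            return ARCHITECTURES.get(arch, "unknown (%d)" % arch)
    raise ValueError("no system info stream found")
```

A typical call would be `with open(path, "rb") as f: print(dump_architecture(f))`, where path is a dump file found earlier.  The value reflects how the dump was captured, so treat it as a hint alongside what WinDbg reports.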


 

Finding the stack trace

The goal is to get a stack trace that indicates where the failure happened.  The first step is to verify the exact version of .NET.  This can be done by entering the lmDvmclr command:



Once the exact version is determined, the corresponding SOS.dll (love the name!) can be used to analyze the memory dump.  This can be done using the .load command:
.load c:\windows\Microsoft.NET\Framework64\v4.0.30319\sos.dll

The next step is to get the list of threads running at the time of the crash using !sos.threads:



Note that additional information is available by clicking the hyperlink at the far right, and that the Lock Count column indicates the active threads.  Also note the 40 in the first column.  This number can be used to switch the debugger to the active thread with the ~40s command.

The stack trace can then be shown using the !sos.clrstack command:
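
When the same analysis has to be repeated for several dumps, the commands used in this section can be run without the GUI through cdb.exe, the console debugger that ships in the same Debugging Tools for Windows package.  The following is a minimal sketch, assuming the commands can be chained with semicolons and that the thread number found with !sos.threads is passed in.  The function names, the cdb.exe location, and the dump path are hypothetical.

```python
import subprocess

# SOS path copied from the .load command above.
SOS_PATH = r"c:\windows\Microsoft.NET\Framework64\v4.0.30319\sos.dll"


def build_cdb_command(cdb_exe, dump_path, thread_id):
    """Build the cdb command line that replays the steps from this section."""
    commands = [
        ".load " + SOS_PATH,   # load the SOS extension
        "!sos.threads",        # list managed threads
        "~%ds" % thread_id,    # switch to the thread of interest
        "!sos.clrstack",       # print its managed stack trace
        "q",                   # quit so the process returns
    ]
    # -z opens a dump file, -c runs the given commands at startup.
    return [cdb_exe, "-z", dump_path, "-c", "; ".join(commands)]


def run_analysis(cdb_exe, dump_path, thread_id):
    """Run cdb and return everything it printed."""
    completed = subprocess.run(
        build_cdb_command(cdb_exe, dump_path, thread_id),
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

For the dump in this example the call would look like `run_analysis(cdb_path, dump_path, 40)`.  The thread number differs from dump to dump, so a first pass with only !sos.threads may be needed to find it.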


Conclusion

Production errors can be extremely difficult to diagnose, even more so when no specific error information was logged.  A stack overflow exception is one example of a fault that can bring down the otherwise very reliable ASP.NET worker process.

Hopefully this wiki will help a DevOps engineer or developer in the future!