Periodic server core BSOD/Reboot, need help with bugcheck report

  • Question

  • Hi,

    I have a brand new Dell PowerEdge R710 that has been crashing occasionally for a while now. I SUSPECT it's RAM related (I removed and re-added RAM before the problem started occurring), and I will reseat the modules as soon as I can (during off hours). In the meantime I thought I would post my debug report and see if anyone can make something of it. From what I see it sounds like a processor issue, but then again, I have no idea how to interpret this.


    0: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    CLOCK_WATCHDOG_TIMEOUT (101)
    An expected clock interrupt was not received on a secondary processor in an
    MP system within the allocated interval. This indicates that the specified
    processor is hung and not processing interrupts.
    Arguments:
    Arg1: 000000000000000d, Clock interrupt time out interval in nominal clock ticks.
    Arg2: 0000000000000000, 0.
    Arg3: fffff88002200180, The PRCB address of the hung processor.
    Arg4: 000000000000000a, 0.

    Debugging Details:
    ------------------


    BUGCHECK_STR:  CLOCK_WATCHDOG_TIMEOUT_10_PROC

    DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

    PROCESS_NAME:  System

    CURRENT_IRQL:  d

    STACK_TEXT: 
    fffff800`02c2f938 fffff800`0166f443 : 00000000`00000101 00000000`0000000d 00000000`00000000 fffff880`02200180 : nt!KeBugCheckEx
    fffff800`02c2f940 fffff800`016cb5f7 : fffffa80`00000000 fffff800`0000000a 00000000`00026160 fffff880`035cc958 : nt! ?? ::FNODOBFM::`string'+0x4e3e
    fffff800`02c2f9d0 fffff800`01612895 : fffff800`01637460 fffff800`02c2fb80 fffff800`01637460 fffff800`00000000 : nt!KeUpdateSystemTime+0x377
    fffff800`02c2fad0 fffff800`016bf3f3 : 00000000`00000000 fffff800`02c2fb80 fffffa80`18e1a900 fffff800`02c2fb90 : hal!HalpHpetClockInterrupt+0x8d
    fffff800`02c2fb00 fffff800`01625652 : fffff800`016d0a3a 00000000`00000000 0000cc2c`00000001 fffff800`01849c00 : nt!KiInterruptDispatchNoLock+0x163
    fffff800`02c2fc98 fffff800`016d0a3a : 00000000`00000000 0000cc2c`00000001 fffff800`01849c00 fffffa80`1a09a6e8 : hal!HalProcessorIdle+0x2
    fffff800`02c2fca0 fffff800`016cb6cc : fffff800`0183be80 fffff800`00000000 00000000`00000000 fffff880`035cc010 : nt!PoIdle+0x53a
    fffff800`02c2fd80 00000000`00000000 : fffff800`02c30000 fffff800`02c2a000 fffff800`02c2fd40 00000000`00000000 : nt!KiIdleLoop+0x2c


    STACK_COMMAND:  kb

    SYMBOL_NAME:  ANALYSIS_INCONCLUSIVE

    FOLLOWUP_NAME:  MachineOwner

    MODULE_NAME: Unknown_Module

    IMAGE_NAME:  Unknown_Image

    DEBUG_FLR_IMAGE_TIMESTAMP:  0

    FAILURE_BUCKET_ID:  X64_CLOCK_WATCHDOG_TIMEOUT_10_PROC_ANALYSIS_INCONCLUSIVE

    BUCKET_ID:  X64_CLOCK_WATCHDOG_TIMEOUT_10_PROC_ANALYSIS_INCONCLUSIVE

    Followup: MachineOwner
    ---------
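When comparing a dump like this with one from another machine, it can help to reduce the STACK_TEXT listing to its module!function frames and compare those lists. A minimal Python sketch, assuming the `kb` column layout shown above (child SP, return address, four argument words, then the symbol); `stack_symbols` is a made-up helper name and not part of any debugger tooling:

```python
import re
from typing import List

# One frame of `kb` output as pasted above:
# child-SP, return address, " : ", four argument words, " : ", symbol(+offset).
_FRAME = re.compile(
    r"^\s*[0-9a-f`]{17}\s+[0-9a-f`]{17}\s+:\s+"
    r"(?:[0-9a-f`]{17}\s+){4}:\s+(?P<symbol>.+?)\s*$",
    re.IGNORECASE,
)

def stack_symbols(stack_text: str) -> List[str]:
    """Return the symbol column of each frame, with the +0x... offset removed."""
    symbols = []
    for line in stack_text.splitlines():
        match = _FRAME.match(line)
        if match:
            symbol = re.sub(r"\+0x[0-9a-f]+$", "", match.group("symbol"), flags=re.IGNORECASE)
            symbols.append(symbol)
    return symbols

# Three frames copied from the STACK_TEXT above.
sample = """
fffff800`02c2f938 fffff800`0166f443 : 00000000`00000101 00000000`0000000d 00000000`00000000 fffff880`02200180 : nt!KeBugCheckEx
fffff800`02c2fad0 fffff800`016bf3f3 : 00000000`00000000 fffff800`02c2fb80 fffffa80`18e1a900 fffff800`02c2fb90 : hal!HalpHpetClockInterrupt+0x8d
fffff800`02c2fd80 00000000`00000000 : fffff800`02c30000 fffff800`02c2a000 fffff800`02c2fd40 00000000`00000000 : nt!KiIdleLoop+0x2c
"""
print(stack_symbols(sample))
```

Two dumps whose symbol lists are equal crashed along the same call path, even if the addresses and offsets differ between builds.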

    Wednesday, September 2, 2009 10:00 PM

Answers

  • Hi,

     

    According to the error message, this appears to be a system crash, and the crash dump file needs to be analyzed to narrow down the root cause. Unfortunately, debugging a crash dump file is not practical here in the forum. I would therefore suggest that you contact Microsoft Customer Service and Support (CSS) by telephone so that a dedicated Support Professional can assist with your request.

     

    To find the phone number for a specific technology, please see the web site listed below:

     

    http://support.microsoft.com/default.aspx?scid=fh;EN-US;OfferProPhone#faq607

     

    Hope the issue will be resolved soon.

     

    Best Regards,

    Vincent Hu

     

    Friday, September 4, 2009 4:47 AM
    Moderator

All replies

  • Mike -

    Hi. We are having the EXACT same issue you are having on our new Dell R710.  What version of Windows are you running, R2? Any luck with this?
    Thursday, September 3, 2009 10:20 PM
  • hi there,

    Based on the dump, I see that the operating system you are using is Vista. There is a dedicated forum for Vista-related queries; please post your query at the link below.

    http://social.technet.microsoft.com/Forums/en-US/category/windowsvistaitpro


    best of luck.
    sainath !analyze
    Friday, September 4, 2009 3:15 AM
    Moderator
  • Hi, I also had this problem with my six R710s. The only role I have on the servers is Hyper-V.

    After installing the latest BIOS, 1.2.6, released 31 August 2009, they have worked as expected = no reboot. Look at the Windows Server 2008 R2 section on the Dell support site.

    Regards,
    Kent

    Monday, September 14, 2009 10:32 PM
  • Same BSODs here.

    What is interesting: I have the same STACK TRACE from "Bugcheck Analysis" as Mike Mackie (every time, the kernel memory dumps were created without a problem).

    The only non-Microsoft drivers are: Intel gigabit 8257EB network, Matrox G200e video, Adaptec 5805 RAID. All drivers are up to date, as are the BIOSes of the motherboard (Intel 5520HC) and the Adaptec controller.

    It looks like this is NOT a hardware problem.
    Thursday, September 24, 2009 1:48 AM
  • This seems related to an issue covered in KB955076:

     

    http://support.microsoft.com/kb/955076/en-us

    We are having this issue on a Hyper-V host running Windows 2008 R2. The KB covers a hotfix for Windows 2008, but MS has not released one for Windows 2008 R2 yet.

    We had to roll back our previous installs of Windows 2008 R2 and replace them with Windows 2008 without R2 to resolve the issue.

    There is some thinking that disabling the C-STATE (power state) in the BIOS will prevent this issue from happening on R2. We have set this on one of our Windows 2008 R2 installations and have not seen a reboot for two days, but we had also gone two days without a reboot before changing the setting. I'll re-post if this setting prevents the issue for more than 4 days.
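How long a quiet period has to be before it says anything can be estimated roughly. The Python sketch below assumes crashes arrive independently at a constant average rate (an exponential model, which nothing in this thread establishes) and uses a mean of two days between crashes purely as an illustrative figure; both function names are made up:

```python
import math

def chance_of_quiet_period(days_quiet: float, mean_days_between_crashes: float) -> float:
    """Probability of seeing no crash for `days_quiet` days by luck alone,
    assuming exponentially distributed gaps between crashes."""
    return math.exp(-days_quiet / mean_days_between_crashes)

def days_needed(mean_days_between_crashes: float, max_chance: float = 0.05) -> float:
    """Quiet days required before luck alone explains them with probability
    no greater than `max_chance`, under the same assumption."""
    return -mean_days_between_crashes * math.log(max_chance)

# Illustrative numbers only: mean gap of 2 days, 4 quiet days observed.
print(round(chance_of_quiet_period(4, 2), 3))
print(round(days_needed(2), 1))
```

Under these assumptions, four quiet days would still happen by chance roughly one time in seven, so a longer wait is needed before crediting the BIOS change.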

    Does Microsoft have a projected date that this hotfix will be available for Windows 2008 R2?

    Saturday, October 3, 2009 3:28 PM
  • I'm having the same issue with some Dell R710s, Server 2008 R2 w/ Hyper-V. I tried disabling the C-STATE in the BIOS as suggested and it did not help. Still getting blue screens with the same bugcheck results.
    Thursday, October 8, 2009 12:56 PM
  • same exact issue here too
    Monday, October 12, 2009 7:48 AM
  • Has anyone found a good fix for this issue yet? I'm having the exact same problems as everyone else. The BIOS update did not correct our issue. Is everyone downgrading back to SP2 to correct the issue? Thanks!
    Monday, October 12, 2009 9:10 PM
  • Have 3 r710's in a cluster.  Have you seen a reboot after applying the bios changes?
    Tuesday, October 13, 2009 2:16 PM
  • After changing the setting in the BIOS, one of my R710s blue-screened. The other has not. Both have been stable for a few days now, so I'm unsure whether the BIOS changes helped or not.

    Anyone else experience a BSOD after BIOS changes?
    Tuesday, October 13, 2009 2:22 PM
  • I made the following changes and so far no problems but it's too early to tell for sure:

    Disable ALL of the following in CPU options:

    Clock spread spectrum
    EIST
    C1E Support
    C-STATE
    Hardware prefetcher
    adjacent cache line prefetcher

    of course watchdog should already be disabled too




    Tuesday, October 13, 2009 6:25 PM
  • I have not had any issues since the above changes.  John Paul Cook also posted the official fix for this issue in another thread:

    See this KB article http://support.microsoft.com/kb/975530 for a hotfix for Intel Nehalem processors.
    Friday, October 16, 2009 11:21 PM
  • Does anyone know if this update will work on a Hyper-V 2008 R2 standalone server?
    Saturday, November 14, 2009 9:22 PM
  • In our case the patch didn't work. We had that problem on a Dell R710 running 2008 R2 Datacenter, so I installed the patch. A week later the server blue-screened again with the same bugcheck. I verified that the patch was indeed installed and that the file versions are correct. I have now disabled processor C-states in the BIOS and the server has been OK for a couple of days. If it blue-screens again, I'm going to disable the CPU options as BarrySDCA suggested and also disable power management in the registry as described in KB 975530.
    Friday, November 20, 2009 7:16 AM
  • I have an R710, also with that problem :-(

    Will install the patch tonight.
    Friday, November 20, 2009 11:08 AM
  • In the last two days, I got the same BSOD on my lab server (also running Server 2008 R2).
    The thing is, it does not have a Xeon 5500 but a Xeon 3460 CPU, on an Intel 3420GP mainboard with 16GB of RAM. Is the hotfix from KB 975530 also applicable?

    Sunday, November 22, 2009 2:10 PM
  • Any update?
    MSP - Microsoft Student Partner (2009-2010) http://www.yusufozturk.info
    Wednesday, November 25, 2009 10:26 PM
  • We are running a Dell PowerEdge R710 and the only thing I did was disable the C1E state in the BIOS.

    I've never had a BSOD since disabling this setting and it's been a couple of months now since making this change.

    My personal preference is to make a BIOS setting as opposed to applying an ineffective hotfix.

    Note: You'll need to upgrade to the latest BIOS firmware to be able to disable the C1E state specifically.
    Saturday, January 16, 2010 3:50 PM
  • Looks like I spoke too soon.

    After disabling the C1E state in the server's BIOS our Dell R710 running Hyper-V R2 on 2008 R2 bluescreened last night.

    It was about 7 weeks since the last bluescreen due to this bug.

    Is anyone else still seeing bugchecks because of this problem even after disabling C1E in the BIOS?
    Friday, January 29, 2010 3:16 PM
  • I have 3 R710s running 2008R2 Core in pre-production testing.   I disabled the C1E state last week without applying the hotfix or registry key and haven't had any bugchecks since.   Before the change I was able to go for a couple weeks without problems, so it's a bit early to see if the BIOS setting made a difference or not.  

    Has anyone gotten any feedback from Dell on this issue?  After sending a complete detailed explanation of the issue with references to the KB article etc their response was less than helpful.  (Quoted below for your entertainment.)

    "without knowing the error or symptoms you have, you can go with the hotfix since you think it is related.  But if you are getting that exact error as described in the MS KB article then I say you should go for it."

    Thursday, February 4, 2010 10:26 PM
  • To be honest I haven't even brought this up with Dell, because I have no confidence in their ability to resolve it and/or they'll just blame Microsoft or Intel for the issue.

    In fairness, the root cause of this problem is an erratum in the Intel CPU, which Dell tried to work around in BIOS rev 1.3.6 by giving us the option to disable the C1E state. Then Microsoft came out with a hotfix in an attempt to resolve the problem.

    To me it seems like the problem is still lingering although not as often as before.
    Friday, February 5, 2010 5:50 PM
  • Hi Mike, how are you?

    I have the same issue you are having, with a Dell PowerEdge R710 (Hyper-V Server 2008 R2). After disabling the C1E and C-States settings in the BIOS, the problem was resolved.

    I expect that will work for you too.

    Any questions, post here...
    • Proposed as answer by Rafa Poch Tuesday, March 9, 2010 11:34 AM
    Tuesday, March 2, 2010 7:54 PM
  • Interesting timing, because yes, we are still having problems after disabling the C1E state in the BIOS on our Dell R710.

    Therefore, it's time to bring out all weapons against this issue.

    I have no choice but to do the following and keep my fingers crossed:

    1) Windows Update.
    2) Disable All C states in the BIOS (including C1E).
    3) Install KB975530.
    4) Add the following registry entry although I think it's unnecessary since I'll be disabling all C states.

    reg add HKLM\System\CurrentControlSet\Control\Processor /v Capabilities /t REG_DWORD /d 0x0007c044
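If the registry step has to be repeated on several hosts, the command can be generated rather than retyped. A minimal Python sketch that only builds the argument list and does not run it; the key, value name, and data are copied from the command above and not independently verified, and `build_reg_add` is a hypothetical helper name:

```python
from typing import List

# Values copied verbatim from the command in the post above.
KEY = r"HKLM\System\CurrentControlSet\Control\Processor"
VALUE_NAME = "Capabilities"
VALUE_DATA = 0x0007C044  # REG_DWORD data, as given in the post

def build_reg_add(key: str, name: str, data: int) -> List[str]:
    """Build the argument list for `reg add` with a REG_DWORD value."""
    if not 0 <= data <= 0xFFFFFFFF:
        raise ValueError("REG_DWORD data must fit in 32 bits")
    return ["reg", "add", key, "/v", name, "/t", "REG_DWORD", "/d", f"0x{data:08x}"]

command = build_reg_add(KEY, VALUE_NAME, VALUE_DATA)
print(" ".join(command))
```

On a Windows host the list could be passed to `subprocess.run(command, check=True)` from an elevated session. Note that `reg add` asks for confirmation when the value already exists unless `/f` is appended.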

    I'm more apprehensive about disabling the following settings, so I may try the steps above first and fall back to these only if I still get bluescreens:

    Clock spread spectrum
    EIST
    Hardware prefetcher
    adjacent cache line prefetcher
    watchdog
    Tuesday, March 2, 2010 8:36 PM
  • Hi Mike,

    What happened when you disabled these settings in the BIOS?


    Monday, March 8, 2010 1:19 PM
  • So far everything is stable and there have been no more bluescreens. It appears that disabling just the C1E state is insufficient and that all C-states have to be disabled in order to stop the bluescreens.
    Monday, March 8, 2010 4:54 PM
  • Here too, everything was stable again after disabling these settings.

    Now I'll try to find out why these settings are incompatible with Windows Hyper-V Server 2008 R2 and the PowerEdge R710.

    So glad everything went well.

    Tuesday, March 9, 2010 11:40 AM
  • NOTHING went well. One of the reasons for buying Intel Xeon 5500 processors is to use their efficient power management (and other features like turbo boost). And now we HAVE TO disable all of it completely? Is this a joke? Is Intel playing a joke on customers all over the world?
    Sunday, March 14, 2010 1:05 PM
  • No kidding.  Must be a joke.  I'm having the same issue with our R710 running R2 and HyperV role.  Frustrating.
    Friday, March 19, 2010 6:43 PM
  • EDIT: FIXED! (Sort of)

    After disabling all C-States in BIOS, the computer has been running fine for two days so far. As opposed to 1-10 minutes before.

    Original post for reference:

    "Hi guys, found this thread from a different thread http://social.technet.microsoft.com/Forums/en-US/w7itproperf/thread/9e71f600-7c62-4869-8236-964e93d17936

    Just wanted to add that I'm having the same issue with a Core i5 750 desktop system in Windows 7 x64 Ultimate. Tried disabling all C-state functionality in BIOS and will get back with an updated status.

    Suspecting either defunct CPUs (unlikely) or defunct CPU power management support in Windows 7/2008 R2 - possibly related to I/O processing?

    To trigger the errors I simply start copying 300 GBs of data via the LAN and it freezes and reboots within 1-10 minutes with C-states enabled in BIOS.

    I have changed ALL parts except for the CPU, system disk (Intel SSD) and the chassis, and it still failed on me today. I don't think it's the CPU hardware failing, but that's the next part I'll try to change if disabling C-states doesn't prove to work long term."

    • Edited by TNator Thursday, April 15, 2010 8:21 PM Found a solution
    Wednesday, April 14, 2010 2:01 AM
  • I encountered this issue on the Dell Power Edge R510.  Installing the latest version of 975530 (v4) looks to have resolved the issue.  No other changes, BIOS or otherwise were made.
    Wednesday, May 19, 2010 4:31 AM
  • Has anyone managed to resolve this issue as yet?

    I have the same problem with the same hardware ( Dell R710 )

     

    Wednesday, June 16, 2010 1:23 AM
  • To my misfortune I have a Dell R710, and it appears this is a widespread problem with no fix. My blue screen followed by a system dump is also random, but what do you tell your client? "Sorry, the system build is at fault"? Unfortunately the nature of today’s technology is to put profit first so we can socialise the misery. It is a disgrace at best; the technology should have been tested further before the end user had the misfortune to get to this stage.

    I too am waiting on Dell to get back to me. Can’t wait to see what they have to say on this issue.  

    I applied the Dell updates sent to me at 2pm on 21st June. Since then the server blue-screened on 23/06/2010 at 05:51:09 and a minidump was created. Prior to the minidump, Kernel-Processor-Power looked to be doing some unusual things?

    The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

    The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000101 (0x000000000000000d, 0x0000000000000000, 0xfffff88001f46180, 0x0000000000000004).
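The bugcheck line in the event log carries the same code and four parameters that `!analyze -v` prints. A minimal Python sketch for splitting them out follows. The first three parameter meanings are taken from the dump at the top of this thread; treating the fourth as the index of the hung processor is an assumption based on Microsoft's description of bug check 0x101 (the dump above only labels it "0"). `parse_bugcheck` is a made-up helper name:

```python
import re
from typing import Dict

# Matches "bugcheck was: 0x... (0x..., 0x..., 0x..., 0x...)".
_BUGCHECK = re.compile(
    r"bugcheck was:\s*(0x[0-9a-f]+)\s*\(\s*(0x[0-9a-f]+),\s*(0x[0-9a-f]+),"
    r"\s*(0x[0-9a-f]+),\s*(0x[0-9a-f]+)\s*\)",
    re.IGNORECASE,
)

def parse_bugcheck(message: str) -> Dict[str, int]:
    """Split an event-log bugcheck message into its code and four parameters."""
    match = _BUGCHECK.search(message)
    if match is None:
        raise ValueError("no bugcheck code found in message")
    code, arg1, arg2, arg3, arg4 = (int(group, 16) for group in match.groups())
    return {
        "code": code,
        "timeout_ticks": arg1,    # Arg1: timeout interval in nominal clock ticks
        "arg2": arg2,             # Arg2: shown as 0 in the dump
        "prcb_address": arg3,     # Arg3: PRCB address of the hung processor
        "processor_index": arg4,  # Arg4: assumed to be the hung processor's index
    }

line = ("The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000101 "
        "(0x000000000000000d, 0x0000000000000000, 0xfffff88001f46180, 0x0000000000000004).")
info = parse_bugcheck(line)
print(hex(info["code"]), info["timeout_ticks"], info["processor_index"])
```

Collecting these values across crashes would show whether the same logical processor is reported each time, which is one way to tell a single faulty part from a platform-wide problem.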

     

     

    Wednesday, June 23, 2010 1:51 PM
  • Somewhat the same issue here with the R710. A co-worker had a similar issue with a Hyper-V server on an HP DL380. The fix so far (it has been 2 weeks; the issue was happening every 4 days or so) was to disable the C-State option in the BIOS. Apparently there is an issue with that feature and Windows 2008. I applied the fix today and am waiting to see. Has anyone else attempted this workaround?

    Wednesday, June 30, 2010 5:09 PM
  • I have managed to resolve this issue on the Dell R710 running in a cluster with CSV

    Disable the Following:

    C-State
    C1E
    Clock spread spectrum
    EIST
    Hardware prefetcher
    adjacent cache line prefetcher
    Watchdog

    Then install KB975530.

     

    I found you also need to remove the Broadcom management Suite 3 from the machine and delete the teams. Doing this also resolved the Status_Connection_disconnected issue.

    • Proposed as answer by braden Voigt Thursday, July 1, 2010 12:46 AM
    Thursday, July 1, 2010 12:40 AM
  • Hmmm, Probably should have followed up on this a long time ago.  All I did was disable OS Watchdog Timer in the BIOS and it has been stable since then.  Sorry for not replying earlier.
    Monday, August 16, 2010 8:26 PM
  • I've had the same issue on an HP ProLiant DL180 G6.
    Again, random BSOD and reboot. Windows Server 2008 Standard R2 with the Hyper-V role (and DFS). Seems to be the trend here.

    Will try disabling that OS Watchdog Timer service if it exists. My dump is below.

    Debugging Details:
    ------------------

    PEB is paged out (Peb.Ldr = 000007ff`fffd5018).  Type ".hh dbgerr001" for details
    PEB is paged out (Peb.Ldr = 000007ff`fffd5018).  Type ".hh dbgerr001" for details

    BUGCHECK_STR:  CLOCK_WATCHDOG_TIMEOUT_10_PROC

    DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

    PROCESS_NAME:  dfsrs.exe

    CURRENT_IRQL:  d
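To see how an excerpt like this differs from the original poster's dump, the `NAME:  value` lines can be collected into a dictionary and compared. A minimal Python sketch, assuming the field layout seen in the two dumps in this thread; `analyze_fields` is a hypothetical helper name:

```python
import re
from typing import Dict

# Upper-case field name, a colon, then a non-empty value on the same line.
_FIELD = re.compile(r"^\s*([A-Z][A-Z0-9_]+):\s+(\S.*?)\s*$")

def analyze_fields(text: str) -> Dict[str, str]:
    """Collect `NAME:  value` lines from pasted !analyze -v output."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

# Fields copied from the first dump in this thread.
original = """
BUGCHECK_STR:  CLOCK_WATCHDOG_TIMEOUT_10_PROC
DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT
PROCESS_NAME:  System
CURRENT_IRQL:  d
"""

# Fields copied from the excerpt in this post.
this_post = """
BUGCHECK_STR:  CLOCK_WATCHDOG_TIMEOUT_10_PROC
DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT
PROCESS_NAME:  dfsrs.exe
CURRENT_IRQL:  d
"""

a, b = analyze_fields(original), analyze_fields(this_post)
differing = {key: (a[key], b[key]) for key in a.keys() & b.keys() if a[key] != b[key]}
print(differing)
```

For these two excerpts only PROCESS_NAME differs, which fits the crash not being tied to one particular process.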

    Tuesday, September 14, 2010 1:42 PM
  • Hello,

    I have the same problem with our HP DL180 G6, which has 2 Intel "Nehalem" processors installed. The OS is 2008 R2 with the Hyper-V role, and I found the following hotfix:

    http://support.microsoft.com/kb/975530 

    I will try it at the weekend. I hope that I don't have to disable any functions/features of the processors...

     

    • Proposed as answer by cloehr Tuesday, January 25, 2011 1:37 PM
    Tuesday, January 25, 2011 12:40 PM
  • cloehr: KB975530 has been superseded by this KB:

    http://support.microsoft.com/kb/2264080

    Monday, March 7, 2011 4:42 PM