Hyper-V Server 2008 R2 bluescreen... probably CLOCK_WATCHDOG_TIMEOUT

    Question

  • I recently installed Hyper-V Server 2008 R2 on two new IBM x3400 M2 servers, and both machines intermittently and unpredictably crash with the following bluescreen error.

    Problem signature:
      Problem Event Name: BlueScreen
      OS Version: 6.1.7600.2.0.0.272.42
      Locale ID: 1033

    Additional information about the problem:
      BCCode: 101
      BCP1: 000000000000000D
      BCP2: 0000000000000000
      BCP3: FFFFF880023C4180
      BCP4: 000000000000000E
      OS Version: 6_1_7600
      Service Pack: 0_0
      Product: 272_3

    In Event Viewer I can find the following event:

    The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000101 (0x000000000000000d, 0x0000000000000000, 0xfffff88002200180, 0x000000000000000a) .

    I think the problem is related to the CLOCK_WATCHDOG_TIMEOUT bug described in KB955076 (http://support.microsoft.com/kb/955076/en-us), which is fixed by KB967170.
    However, both of those KBs apply to Windows Server 2008, not to Windows Server 2008 R2.

    Can you help me?

    best regards

    Sunday, September 27, 2009 1:58 PM
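
For readers comparing the reports in this thread, the bugcheck line from the event log can be split into its stop code and four parameters. The following is a minimal Python sketch, not part of the original post; the names `parse_bugcheck`, `describe_0x101` and `SAMPLE_EVENT` are illustrative. For stop code 0x101 (CLOCK_WATCHDOG_TIMEOUT), Microsoft's bug check reference describes parameter 1 as the clock interrupt time-out interval in nominal clock ticks and parameter 3 as the address of the processor control block (PRCB) of the unresponsive processor. Reading parameter 4 as a processor index is an assumption.

```python
import re

# Event text as quoted in the question above.
SAMPLE_EVENT = (
    "The computer has rebooted from a bugcheck.  The bugcheck was: "
    "0x00000101 (0x000000000000000d, 0x0000000000000000, "
    "0xfffff88002200180, 0x000000000000000a) ."
)

# Captures the stop code and the comma-separated parameter list.
BUGCHECK_RE = re.compile(r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")


def parse_bugcheck(message):
    """Return (code, [params...]) as integers, or None if nothing matches."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe_0x101(params):
    """Label the parameters of a 0x101 bugcheck (parameter 4 is an assumption)."""
    return {
        "timeout_clock_ticks": params[0],
        "prcb_address": hex(params[2]),
        "processor_index_assumed": params[3],
    }


code, params = parse_bugcheck(SAMPLE_EVENT)
if code == 0x101:
    print(describe_0x101(params))
```

Running the same function over the event lines posted later in the thread shows that parameters 1 and 2 are identical across reports, while parameters 3 and 4 differ.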

Answers

  • See this KB article http://support.microsoft.com/kb/975530 for a hotfix for Intel Nehalem processors.
    • Proposed as answer by BarrySDCA Friday, October 16, 2009 11:20 PM
    • Marked as answer by Vincent HuModerator Monday, October 19, 2009 7:41 AM
    Friday, October 16, 2009 6:50 PM
  • Hi,

    From the description, this appears to be a system crash, and the crash dump file needs to be analyzed to narrow down the root cause. Unfortunately, debugging a crash dump is not practical here in the forum. I therefore suggest that you contact Microsoft Customer Service and Support (CSS) by telephone so that a dedicated Support Professional can assist with your request.

    To find the phone number for a specific technology, see the following page:

    http://support.microsoft.com/default.aspx?scid=fh;EN-US;OfferProPhone#faq607

    I hope the issue will be resolved soon.

    Best Regards,

    Vincent Hu

    Monday, September 28, 2009 8:06 AM
    Moderator

All replies

  • We're getting the same error repeatedly on an HP DL180 G6 with Windows 2008 R2 and Hyper-V.

    Problem signature:
      Problem Event Name:   BlueScreen
      OS Version:   6.1.7600.2.0.0.274.10
      Locale ID:    1031
     
    Additional information about the problem:
      BCCode:       101
      BCP1: 000000000000000D
      BCP2: 0000000000000000
      BCP3: FFFFF88001FE8180
      BCP4: 0000000000000006
      OS Version:   6_1_7600
      Service Pack: 0_0
      Product:      274_3

    The corresponding event log entry is:
    The computer has rebooted from a bugcheck. The bugcheck was: 0x00000101 (0x000000000000000d, 0x0000000000000000, 0xfffff88002200180, 0x000000000000000a).


    I hope someone can help with this bluescreen as it blocks our Win2008 R2 HyperV rollout.
    I'm happy to provide mini- and memdumps or detailed system info if it helps to tackle the issue.

    best regards and thanks in advance

    Sunday, September 27, 2009 8:30 PM
  • Confirmed, I have the same error on an HP DL380 G6.  We called Microsoft PSS, and they confirmed that there is a fix for Windows 2008 for this issue, but not for Windows 2008 R2.  Like the previous poster, this is holding up our rollout of Windows 2008 R2 to many clients.  In fact, we have been pushing out Windows 2008 R2 to ALL Hyper-V installations since R2 went gold, and they are sizable installs.

    This appears to affect only the new Intel Xeon 5500 series, and is covered in Intel Errata document 321324. All of these facts were given to us by Microsoft PSS.  We have Windows 2008 R2 installed on HP DL380 G5s without this issue.

    As an MS Partner, we now have to go back to our previous 5 Hyper-V installs and roll the OS back to Windows 2008 without R2.  While this is easy enough (export the VMs to external media, reinstall Windows, and re-import the VMs back), it probably means that these clients will NEVER get to run Windows 2008 R2 on the hyper-V hosts, and they will run Windows 2008 without R2 for the lifetime of the server.

    Does Microsoft have a projected release date for a Windows 2008 R2 version of the hotfix that addresses this issue in Windows 2008 (KB955076)?  We can't responsibly push Windows 2008 R2 until this is resolved, and we are rolling out many Hyper-V installs.

    • Proposed as answer by Naveed iop Tuesday, June 15, 2010 10:14 AM
    Saturday, October 03, 2009 3:20 PM
  • Hello;

    We confirm the same exact error with 2K8/R2 HyperV servers with 5500 Series Xeons.

    The problem seems intermittent and hits us at least every other day when the machines are under load.

    We are losing credibility quickly with our customers after selling them on the stability benefits of R2 and the incredible performance advantages.

    Please continue to post status on this issue and if there are any workarounds.

    Thanks.


    Tuesday, October 13, 2009 12:27 PM
  • Hi,

    Same problem here. We have a Win2k8 R2 Hyper-V host running on 2x E5520. The problem is intermittent... We are "fixing" it for now by removing one CPU while waiting for a bugfix :S
    Tuesday, October 13, 2009 7:37 PM
  • BEAUTIFUL!  THANK YOU!
    Friday, October 16, 2009 11:20 PM
  • Thanks John for the link!

    Monday, October 19, 2009 7:41 AM
    Moderator
  • I believe the "new" x3400 series Intel Xeons also have this BSoD problem.

    update:  After installing this hotfix the ~every 12 hour BSoD on the x3400's has also disappeared.

    48 hour update:  No crashes what-so-ever.

    Perhaps we can get MS to update the KB for the x3400s as well as the x5500?
    Monday, October 26, 2009 6:52 PM
  • Sorry to bump an old thread, but we are about to order new servers and I need to make sure that Nehalem (X5550) is going to be worth it (vs cheaper AMD).

    1) I believe the hotfix disables the turbo boost feature of the Nehalem processors - how much of a performance impact is this going to have?
    2) Is anyone here running Nehalem without the hotfix?
    3) Any other stability issues on Nehalem that anyone is experiencing?

    We've been an AMD shop for a long time and have never had any issues. I've only found VMWare benchmarks for AMD vs Intel, and the Nehalem destroys the Opteron in these tests. Has anyone done performance comparisons of Intel vs AMD for Hyper-V?
    Friday, January 15, 2010 6:08 AM
  • we continue to use these boxes.  works great with the patch.  would not think of running w/out the patch.  has been exceptionally stable.  good luck
    Friday, January 15, 2010 7:06 PM
  • If your server's BIOS supports it you can disable the C1E state in the BIOS and it will also resolve the problem.

    I personally prefer this method over installing a hotfix.

    The C1E state can be disabled in a Dell PowerEdge R710. I'm not sure about other make\models.

    Once the C1E state was disabled the problem went away.
    Saturday, January 16, 2010 3:47 PM
  • This worked for us, but at the expense of the power saving features of C1E.  Waiting for a fix from Dell or Intel.....
    Thursday, February 11, 2010 10:27 PM
  • There is a hotfix available from Microsoft but it is ineffective.

    There will be no fix from either Dell or Intel because the hotfix must come from the operating system developer (i.e. Microsoft).

    Dell provided a way to work around this problem by adding the ability to disable the C1E state in the BIOS.

    MCCCSam: Are you seeing BSOD's with this issue?
    Friday, February 12, 2010 2:45 AM
  •   Not really. The hotfix should come from Intel, because that is where the problem lies (but that won't happen). The next best fix would be a BIOS upgrade. It is a hardware problem, not an OS problem.


    Bill
    Friday, February 12, 2010 9:52 AM
  • Yes, we were seeing BSOD's.  None so far with the MS Hotfix.
    Tuesday, February 16, 2010 2:22 PM
  • I second that the hotfix is ineffective: even the latest v4 of KB975530 doesn't fix the problem on our four Tyan S7012D servers with Intel X5540 CPUs.
    Since the hotfix doesn't address the issue, at least not completely, we modify the registry instead:
    reg add HKLM\System\CurrentControlSet\Control\Processor /v Capabilities /t REG_DWORD /d 0x0007c044
    Wednesday, February 17, 2010 6:47 AM
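
The registry change in the post above can be scripted when several hosts need it. This is a minimal Python sketch, assuming the key, value name and data exactly as posted; it only builds the `reg add` and `reg query` argument lists and does not run them. The function names are hypothetical, and the `/f` switch (overwrite an existing value without prompting) is an optional addition that is not in the original command.

```python
# Key and value name as given in the post above.
KEY = r"HKLM\System\CurrentControlSet\Control\Processor"
VALUE_NAME = "Capabilities"


def reg_add_command(data, force=False):
    """Build the argument list for `reg add` with a REG_DWORD value."""
    if not 0 <= data <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD must fit in 32 bits")
    cmd = ["reg", "add", KEY, "/v", VALUE_NAME,
           "/t", "REG_DWORD", "/d", f"0x{data:08x}"]
    if force:
        cmd.append("/f")  # not in the original post; skips the overwrite prompt
    return cmd


def reg_query_command():
    """Build the argument list to read the value back for verification."""
    return ["reg", "query", KEY, "/v", VALUE_NAME]


# Data value taken from the post; printing shows the command that would run.
print(" ".join(reg_add_command(0x0007C044)))
print(" ".join(reg_query_command()))
```

On a Hyper-V host, either list could be passed to `subprocess.run(...)` from an elevated prompt; writing under HKLM requires administrator rights.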
  • So, what's the status of this?

    Our Dell R710 is still bluescreening with Hyper-V R2.

    This is even with the C1E state disabled in the BIOS.

    Is anyone still having issues with BSOD's?
    Tuesday, March 02, 2010 7:30 PM
  • The server was bluescreening way too often so I implemented the following changes:

    1) Disabled all C-States in the BIOS (C1E was already disabled).
    2) Installed hotfix for KB975530.

    http://support.microsoft.com/kb/975530

    (keeping fingers crossed)
    Wednesday, March 03, 2010 3:55 PM
  • MikeSTL:  How are your servers coming along?  Did any of those steps help?

    I also applied those fixes, and reloaded fresh firmware, drivers, and OS on 2 of our R710s, and they are still randomly rebooting.  The odd thing is that one of our other R710s is running fine without any processor settings disabled and without hotfix KB975530 applied.

    What gives?

    Monday, March 08, 2010 7:23 PM
  • That is odd.

    I can report that after disabling all C-States in the BIOS and applying KB975530 the problem has gone away.

    However, it has only been since March 3 (5 days) so I would like a longer time period before saying the problem is resolved for sure.
    Monday, March 08, 2010 8:32 PM
  • I guess I'm not having any such luck.  Our servers reboot approximately every 2hrs.  Thanks for the info MikeSTL
    Tuesday, March 09, 2010 3:03 PM
  • With v3 of the KB975530 update installed and C2 & C3 disabled in the BIOS, our server with 2 Xeon 5500 processors gets approximately 1 BSOD every 2-3 weeks.
    The latest BIOS (Intel motherboard) doesn't help at all :(
    I'll install v4 of the update and see what happens :-\

    P.S. Looks like Intel doesn't care about customers at all... next time server will use AMD processors...

    Sunday, March 14, 2010 12:53 PM
  • Update:  Our Hyper-V R2 servers running on Dell PowerEdge R710s were constantly rebooting, so I rebuilt one of the R710 servers with a full installation of Windows 2008 R2 configured with the Hyper-V role.  While the server with the "FREE" Hyper-V R2 continues to reboot, the server with the full Windows 2008 R2 installation has been running for an entire week.  I have also built another R710 server with the original Hyper-V release, and it has been running stable.  I don't understand this, especially with 1 of the R710s running stable on Hyper-V R2 without any hotfix or firmware change.

    MikeSTL:  Are you using HyperVR2 free or Windows 2008 with HyperV?

    Friday, March 19, 2010 8:42 PM
  • I've got 4x Dell R610s running Windows Server 2008 R2 Ent Hyper-V.  I didn't have any problems for 5 months - then two of the servers crashed with the watchdog BSOD within a few weeks of each other.

    I've now patched (v4) all four servers and not had a problem since - that was about a month ago.  It's hard to know if it's fixed the problem as I've only ever had two BSODs.

     

    Those of you having extremely frequent crashes - is your average CPU load quite high?

    • Proposed as answer by Megantheweasel Thursday, April 08, 2010 2:16 PM
    Saturday, March 20, 2010 10:27 PM
  • DJL, how are C2 & C3 levels configured?

     

    BTW, my crashes happened mostly at ~0% CPU load. I think this is because all Nehalem's power saving technologies become active at this moment.

    Monday, March 22, 2010 12:30 AM
  • All C states are enabled.  Disabling C states in the BIOS was going to be my next move if things didn't settle down.

     

    Both of our crashes happened in the afternoon on nodes hosting one of our Sharepoint SQL VM's (+ others) so CPU usage would have been reasonably high - although I guess not on all cores.

    Monday, March 22, 2010 9:14 PM
  • So, should I install the hotfix? Or should I apply the workaround (disabling the C-states or editing the registry, as the KB describes)?

    I have a ProLiant DL380 G6 with two Intel Xeon 5530 CPUs running Windows Server 2008 R2 with Hyper-V, and I don't know what to do. Some of you say that the hotfix does nothing to solve the issue, and others say that it does...

    One question: if I disable the C-states, as the KB says, my servers' energy consumption will increase. Is that true? Is it the only disadvantage?

    Please answer as soon as possible! Thanks.

     

    Tuesday, April 06, 2010 4:04 PM
  • Luiggi,  I would start with the patch, if that doesn't help then try disabling C-states.

    Disadvantages to disabling C-states are increased power consumption and therefore increased heat output - not a huge problem with only one server. 

    Tuesday, April 06, 2010 7:31 PM
  • Hello Luiggi,

    DL380 G6 2x x5550 Windows 2008 R2 with Hyper-V same issue here, resolved by disabling c-states. See DJL for energy question. No patches from this article installed.

    This issue also happens with a DL460c G6 by the way.

    Gr,

    Peter

    Saturday, May 29, 2010 10:22 AM
  • I am also experiencing this issue.  I'm running 2008 R2 and the Hyper-V role on a Dell XPS 8100 - the CPU is an Intel Core i7-860 - a candidate for the BSOD.  I went into the BIOS and changed the ACPI setting from S3 to S1 - is this the same as disabling the C2/C3 states?  I was getting the STOP errors every few hours for the last 4 days, but since I changed this BIOS setting 3 hours ago, no issue yet.  The real test will be if it makes it through the night...
    Sunday, June 06, 2010 5:12 PM
  • I actually ended up disabling all of the C-States in addition to installing the hotfix.

    As I understand it, the C-States are only there to reduce power consumption.

    Monday, June 07, 2010 4:31 PM
  • I am also experiencing this issue...

    HP DL180 G6 with 2x Xeon 5520 running Windows 2008 R2 with Hyper-V. I've installed the hotfix for KB975530 but the issue persists...

    Critical 6/7/2010 12:18:03 PM Kernel-Power 41 (63)
    Critical 6/1/2010 6:27:45 AM Kernel-Power 41 (63)
    Critical 5/29/2010 5:03:42 AM Kernel-Power 41 (63)
    Critical 4/6/2010 9:07:30 PM Kernel-Power 41 (63)
    Critical 3/22/2010 7:58:49 AM Kernel-Power 41 (63)
    Critical 1/9/2010 2:21:45 PM Kernel-Power 41 (63)
    Critical 12/31/2009 2:39:25 PM Kernel-Power 41 (63)

    Can someone confirm whether this has to be resolved by disabling the C1 state?

    Tuesday, June 08, 2010 7:42 PM
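
The Kernel-Power 41 entries listed in the post above can be turned into crash intervals to judge how often the host goes down. This is a minimal Python sketch, assuming the lines keep the exact column layout shown above (level, date, time, AM/PM, source, event ID); `parse_crash_times` and `gaps_in_days` are illustrative names.

```python
from datetime import datetime

# Event list copied from the post above.
LOG = """\
Critical 6/7/2010 12:18:03 PM Kernel-Power 41 (63)
Critical 6/1/2010 6:27:45 AM Kernel-Power 41 (63)
Critical 5/29/2010 5:03:42 AM Kernel-Power 41 (63)
Critical 4/6/2010 9:07:30 PM Kernel-Power 41 (63)
Critical 3/22/2010 7:58:49 AM Kernel-Power 41 (63)
Critical 1/9/2010 2:21:45 PM Kernel-Power 41 (63)
Critical 12/31/2009 2:39:25 PM Kernel-Power 41 (63)
"""


def parse_crash_times(log):
    """Return the timestamps of Kernel-Power 41 entries, oldest first."""
    times = []
    for line in log.splitlines():
        parts = line.split()
        # Skip anything that is not a Kernel-Power event with ID 41.
        if len(parts) < 6 or parts[4] != "Kernel-Power" or parts[5] != "41":
            continue
        stamp = " ".join(parts[1:4])  # date, time, AM/PM
        times.append(datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"))
    return sorted(times)


def gaps_in_days(times):
    """Return the time between consecutive crashes, in fractional days."""
    return [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]


crashes = parse_crash_times(LOG)
for when, gap in zip(crashes[1:], gaps_in_days(crashes)):
    print(f"{when:%Y-%m-%d %H:%M}  {gap:6.1f} days since previous crash")
```

Comparing the gaps before and after a change such as the hotfix or a BIOS setting gives a rough indication of whether it helped, although with this few events the evidence is weak.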
  • Apparently the ACPI setting I changed in BIOS is not for c-state - another BSOD that night.  So I changed it back, and went the reg add route instead (no hotfix yet - saving that for last resort) - so far, so good!
    Wednesday, June 09, 2010 12:16 PM
  • Reg settings didn't work either.  Going for the hotfix...
    Friday, June 18, 2010 12:01 PM
  • Unfortunately I have a Dell R710, so it appears this is a widespread problem with the new Intel Xeon 5500 on a Windows 2008 R2 Hyper-V installation. My blue screen followed by a system dump is also random, but what do you tell your client? "Sorry, the system build is at fault, but unfortunately the nature of today’s technology rollout is to put profits first, so that we can all socialise from this blunder." It is a disgrace at best; the technology should have been tested further before the end user had the misfortune to get to this stage.

    I am waiting on Dell to get back to me. Can’t wait to see what they have to say on this issue.

    I have applied the Dell updates sent to me on 21st June. Since then the server blue screened on 23/06/2010 at 05:51:09. Prior to the minidump, Kernel-Processor-Power appeared to be logging some unusual things.

    My logs.

    The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

    The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000101 (0x000000000000000d, 0x0000000000000000, 0xfffff88001f46180, 0x0000000000000004). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 061610-12776-01.

    Unexpected failure. Error code: D@01010004

     

     

    Wednesday, June 23, 2010 1:27 PM
  • Got my first BSOD today on an R610 (Xeon X5670 2.93, 12M cache, 1333) with Hyper-V 2008 R2... it restarted after I installed a printer on one VM.
    Friday, June 25, 2010 12:24 AM