BSOD after creating DAG

    Question

  • Hi,

    Since I created a DAG consisting of two Mailbox servers, one of the hosts has been throwing a BSOD every few hours. The likely culprit seems to be Health Monitoring (again, I might add). Analyzing the dump shows this:

    CRITICAL_PROCESS_DIED (ef)
            A critical system process died
    Arguments:
    Arg1: fffffa800708f980, Process object or thread object
    Arg2: 0000000000000000, If this is 0, a process died. If this is 1, a thread died.
    Arg3: 0000000000000000
    Arg4: 0000000000000000

     

    Debugging Details:
    ------------------


    PROCESS_OBJECT: fffffa800708f980

    IMAGE_NAME:  wininit.exe

    DEBUG_FLR_IMAGE_TIMESTAMP:  0

    MODULE_NAME: wininit

    FAULTING_MODULE: 0000000000000000

    PROCESS_NAME:  MSExchangeHMWo

    BUGCHECK_STR:  0xEF_MSExchangeHMWo

    DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

    CURRENT_IRQL:  0

    MANAGED_STACK: !dumpstack -EE
    OS Thread Id: 0x0 (0)
    TEB information is not available so a stack size of 0xFFFF is assumed
    Current frame:
    Child-SP         RetAddr          Caller, Callee

     

    STACK_COMMAND:  kb

    FOLLOWUP_NAME:  MachineOwner

    FAILURE_BUCKET_ID:  0xEF_MSExchangeHMWo_IMAGE_wininit.exe

    BUCKET_ID:  0xEF_MSExchangeHMWo_IMAGE_wininit.exe

    Followup: MachineOwner

    Any idea what to do next? Can I disable this service?

    Thanks!

    Edit: the machine is a VM on a Hyper-V host (both Server 2012).
    • Edited by steve_mail Wednesday, April 24, 2013 8:21 AM
    Wednesday, April 24, 2013 8:16 AM

Answers

  • OK, this will provide you with relief for this issue for now.

    Disable the responder “ActiveDirectoryConnectivityConfigDCServerReboot” to prevent forced reboots of the Exchange server. The change will be effective for 60 days from the date of creation.

    1. Run the following command once, on any one of the Exchange 2013 servers:

    Add-GlobalMonitoringOverride -Identity Exchange\ActiveDirectoryConnectivityConfigDCServerReboot  -ItemType Responder -PropertyName Enabled -PropertyValue 0 -Duration 60.00:00:00

    2. Force or wait for AD replication to complete.  After AD replication is complete, Exchange will pick up the change in about 10 minutes.

    3. Use the following command to make sure the responder is disabled:

    (Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/responderdefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ?{$_.Name -like "ActiveDirectoryConnectivityConfigDCServerReboot"} | ft name,enabled

    If the responder is disabled, Enabled should have a value of 0, as shown below.

    [Screenshot: responder showing Enabled = 0, which means the responder is disabled]
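
    For later reference, a minimal sketch of how the override could be reviewed or removed early, for example once a fix is installed. It assumes the Exchange Management Shell and reuses the responder name from step 1; it is not part of the workaround itself.

    ```powershell
    # List all global monitoring overrides with their full details
    Get-GlobalMonitoringOverride | Format-List

    # Remove the override before it expires (this re-enables the responder)
    Remove-GlobalMonitoringOverride -Identity Exchange\ActiveDirectoryConnectivityConfigDCServerReboot -ItemType Responder -PropertyName Enabled
    ```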


    This posting is provided "AS IS" with no warranties, and confers no rights.

    Sunday, August 18, 2013 11:16 PM

All replies

  • Hi
    You can disable this service. The BSOD seems to be a memory problem.
    You can increase the memory and test again.

    Terence Yu

    TechNet Community Support

    Thursday, April 25, 2013 8:18 AM
  • Thank you. Actually, I started to experience BSODs on both DAG members and decided to just disable the health monitoring service. The servers have been stable since then, so I will leave it turned off. Unfortunately, nobody can really tell me what the consequences of disabling this service are...
    Thursday, April 25, 2013 9:31 AM
  • Hi,

    Same problem here. Have you experienced any problems since disabling the service?

    cheers

    peter

    Monday, July 22, 2013 6:43 AM
  • Hi!

    We're experiencing the same problem after installation of CU2.
    So any updates would be helpful. I'll try disabling the service on one of the servers to check.

    Apparently this monitoring uses some mailboxes.
    Maybe this is related? (I get this error for all monitoring mailboxes.)

    [PS] C:\Windows\system32>Get-Mailbox -Monitoring

    Name                      Alias                ServerName       ProhibitSendQuota
    ----                      -----                ----------       -----------------
    HealthMailbox424535c2d... HealthMailbox4245... exch01       Unlimited
    WARNING: The object customer.com/Microsoft Exchange System Objects/Monitoring
    Mailboxes/HealthMailbox428635c2d0d0443db6f4e689cc704ee2 has been corrupted, and it's in an inconsistent state. The
    following validation errors happened:
    WARNING: Database is mandatory on UserMailbox.
    WARNING: Database is mandatory on UserMailbox.
    • Edited by dgoossens Thursday, August 01, 2013 8:21 AM
    Thursday, August 01, 2013 8:16 AM
  • Same problem here, but no DAG in our case.
    Friday, August 02, 2013 12:08 PM
  • Sorry for this, but in my opinion Exchange 2013 is the biggest bug Microsoft has ever released...
    Friday, August 02, 2013 12:29 PM
  • The Exchange Health Manager service doesn't start up automatically in 2013 CU2 Version 2. Maybe to fix this issue ;-)

    Rajith Enchiparambil | http://www.howexchangeworks.com |

    HowExchangeWorks.Com

    Friday, August 02, 2013 9:49 PM
  • For both issues, I would recommend investigating the status of the HealthMailbox on each of your servers, or simply recreating the mailbox.  You can safely delete and recreate health mailboxes. Be aware that any local Managed Availability probes that are using these mailboxes will fail until the Microsoft Exchange Health Manager is restarted.  Once that service is restarted, it will recreate any mailboxes that it needs.  Hopefully that will resolve these issues.
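
    A minimal sketch of that check-and-restart sequence, assuming the Exchange Management Shell on the affected server and that the Health Manager runs under the service name MSExchangeHM:

    ```powershell
    # Show each health mailbox and the database it is homed on;
    # broken ones tend to surface as validation warnings in the output
    Get-Mailbox -Monitoring | Format-Table Name, Database, ServerName -AutoSize

    # After removing broken health mailboxes, restart the Health Manager
    # so that it recreates the mailboxes it needs
    Restart-Service MSExchangeHM
    ```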

    This posting is provided "AS IS" with no warranties, and confers no rights.

    Monday, August 05, 2013 7:32 PM
  • Hi!

    I just removed the mailboxes, and restarted the service.
    I'll let you know if it worked.

    In the meantime all servers are still rebooting regularly, caused by this service.

    Wednesday, August 07, 2013 7:28 AM
  • Servers are still rebooting randomly, and it is still caused by the "MSExchangeHMWo" service.
    After installing CU2v2, some Exchange services don't even start up anymore after the reboot.

    Thursday, August 08, 2013 9:08 AM
  • We've been experiencing exactly the same issue reported by D_Goossens (and others...), with persistent BSODs on our 2-member DAG servers. I can clearly state that this issue appeared after the CU2 installation and continues with CU2 V2. We had several weeks with CU1 without any related or visible problems.

    Already tried to delete and recreate the health mailboxes, as suggested by Scott, but still the problem remains.

    Monday, August 12, 2013 9:59 AM
  • We have seen this occur too and currently have a support case open with MS.  Will report back anything from the MS support case as soon as it progresses.

    Just to confirm: stopping and disabling the Health Service on both nodes has settled the servers down.
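
    For anyone wanting to do the same, a sketch of stopping and disabling the service from an elevated PowerShell prompt. It assumes the Microsoft Exchange Health Manager runs under the service name MSExchangeHM; note that Managed Availability monitoring on that server stays off while the service is disabled.

    ```powershell
    # Stop the Health Manager and keep it from starting at boot
    Stop-Service MSExchangeHM
    Set-Service MSExchangeHM -StartupType Disabled

    # To revert later, once a fix is in place
    Set-Service MSExchangeHM -StartupType Automatic
    Start-Service MSExchangeHM
    ```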

    Wednesday, August 14, 2013 4:57 PM
  • I agree wholeheartedly. Authentication issues, adding a brand new PC with Outlook 2013 - trying to connect to the brand new server - crashed it.

    This is insane, never -  in 15 years of exchange -  have I seen such a buggy mess.  Shame on me for being a beta tester for Microsoft.

    I usually WAIT until at least a year has gone by (2014) and then think about it. This time I dove in after 8 months thinking it should be safe and BOOM !!!

    Now I am stuck with this unfinished piece of junk that I am afraid to apply patches to - just read the issues on CU2, it is ridiculous.

    Wednesday, August 14, 2013 6:33 PM
  • I agree asctech. You can tell Bill G isn't running the show anymore, MS has gotten very sloppy with their flagship application.
    Monday, August 19, 2013 10:32 PM
  • Scott, what happens after 60 days?

    Does the BSOD return?

    Is this a final fix? At the moment I have a case open with MSFT to review this situation.

    Regards


    GRF

    Tuesday, August 20, 2013 2:58 PM
  • MS have confirmed to me that this is now a known issue and a patch is being worked on.  Their recommendation is to follow the PowerShell responder workaround or disable the Health Service.  Hope this info helps.
    Thursday, August 22, 2013 9:29 AM
  • "This is insane, never -  in 15 years of exchange -  have I seen such a buggy mess.  Shame on me for being a beta tester for Microsoft."

    Agree. I've been around since Exchange 4.0 and this new version is a buggy mess. Every patch seems to break something. I have installed Exchange 2013 for many different companies now and I have yet to have a single one go smoothly. Every single install has been a problem with the "new" and "improved" Exchange. In contrast I've installed Exchange 2010 over 100 times and never had a lick of problems. 

    This thing simply isn't baked yet and their update process appears severely flawed.

    • Edited by ABCFED Friday, August 23, 2013 10:14 PM adsas
    Friday, August 23, 2013 10:07 PM
  • We also have a ticket opened at MS for this. I'll update if we have a fix! (thanks for the workaround)
    Monday, August 26, 2013 6:20 AM
  • Lotus Notes ...
    Tuesday, August 27, 2013 4:18 PM
  • GRF

    BSOD is by design (I believe).

    If MA detects a problem and is unable to fix it - it simply stops with BSOD to prevent further damage. What is wrong with this ?

    The overall problem with Exchange is that it requires all sorts of "other" bits to actually work that are separate from Exchange server - so you may end up with a scenario that BSOD is a result of some other service not behaving properly on some other machine ... and instead of self healing (this is why MA was really introduced - I think) you have BSOD.

    Scott - why don't you just simply read configuration from AD every so often and store it on Exchange servers (hey you can even replicate among Exchange servers) and have an air-gap to isolate a bit ... from other services.

    Tuesday, August 27, 2013 4:29 PM
  • I wrote about this topic today in http://windowsitpro.com/blog/trauma-exchange-2013-servers-when-managed-availability-goes-bad.

    Clearly there are some troubling issues in product quality that the Exchange team has to address. I am sure that they are all over this problem and that it will be addressed in due course. Disabling the responder makes sense for the moment and allows the dust to settle and a more measured solution to be found.

    - Tony

    Tuesday, August 27, 2013 5:58 PM
  • Dear All,

    I have exactly the same issue: after creating the DAG, all 4 Mailbox servers and all 4 transport servers are facing random reboots!

    All of them blue screen, which is damn irritating. All the machines are VMs on VMware, and when I investigated the BSOD minidump I got the same output as posted above.

    Has Microsoft released any solution for this so far, or is disabling the service as mentioned above the only workaround for now?

    Awaiting inputs and thanks in advance


    Apoorv Mehrotra

    Tuesday, October 29, 2013 6:54 AM
  • Hi!

    These are the ones we've disabled, and since then we have had no more reboots.

    This disables them for 60 days (which is the maximum you can set). It should be fixed in CU3.

    Add-GlobalMonitoringOverride -Identity Exchange\ActiveDirectoryConnectivityServerReboot  -ItemType Responder -PropertyName Enabled -PropertyValue 0 -Duration 60.00:00:00

    (Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/responderdefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ?{$_.Name -like "ActiveDirectoryConnectivityConfigDCServerReboot"} | ft name,enabled

     

    Add-GlobalMonitoringOverride -Identity Exchange\ServiceHealthMSExchangeReplForceReboot  -ItemType Responder -PropertyName Enabled -PropertyValue 0 -Duration 60.00:00:00

    (Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/responderdefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ?{$_.Name -like "ServiceHealthMSExchangeReplForceReboot"} | ft name,enabled

     

    Add-GlobalMonitoringOverride -Identity Exchange\ServiceHealthActiveManagerForceReboot -ItemType Responder -PropertyName Enabled -PropertyValue 0 -Duration 60.00:00:00

    (Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/responderdefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ?{$_.Name -like "ServiceHealthActiveManagerForceReboot"} | ft name,enabled
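
    The override and check pairs above could also be scripted in one pass. A sketch, assuming the Exchange Management Shell and using the responder names exactly as they appear in the Add-GlobalMonitoringOverride commands above; the variable name $responders is only for illustration:

    ```powershell
    # Responder names taken from the Add-GlobalMonitoringOverride commands above
    $responders = 'ActiveDirectoryConnectivityServerReboot',
                  'ServiceHealthMSExchangeReplForceReboot',
                  'ServiceHealthActiveManagerForceReboot'

    # Disable each responder for 60 days
    foreach ($r in $responders) {
        Add-GlobalMonitoringOverride -Identity "Exchange\$r" -ItemType Responder -PropertyName Enabled -PropertyValue 0 -Duration 60.00:00:00
    }

    # Once AD replication has completed, check the Enabled value of each responder
    (Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/responderdefinition |
        ForEach-Object { [xml]$_.ToXml() }).event.userData.eventXml |
        Where-Object { $responders -contains $_.Name } |
        Format-Table Name, Enabled
    ```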

    Tuesday, October 29, 2013 6:56 AM
  • Dear,

    So that means I have to disable the three responders mentioned above on all servers, i.e. Mailbox as well as transport servers, since all of them are facing random reboots?

    Thanks in Advance


    Apoorv Mehrotra

    Tuesday, October 29, 2013 7:03 AM
  • You only need to run them once.
    They'll be applied to all servers.
    Tuesday, October 29, 2013 7:23 AM
  • I'm seeing the same BSOD as you. Our Exchange 2013 CU2 machines are also configured in a DAG, and most of the dump file results are the same as yours. The BSOD is caused by wininit.exe, with "FAILURE_BUCKET_ID:  0xEF_MSExchangeHMWo_IMAGE_wininit.exe". I hope Microsoft will fix it as soon as possible.
    Thursday, December 12, 2013 1:25 AM
  • Hi,

    Follow the above procedure in order to resolve this; however it is only going to work for 60 days then you need to redo the command.

    Secondly, CU3 is out. I haven't had time to read through the full release highlights, but it might have the fix for this random reboot behavior.

    If someone in this thread has already applied CU3, please share the result


    Apoorv Mehrotra

    Thursday, December 12, 2013 5:01 AM
  • We have CU3 installed on our system and it sure HAS NOT resolved the issue. Dump files still point to wininit.exe.

    To add more to this case, event log is pointing to ForceReboot-<servername>1-ServiceHealthMSExchangeReplForceReboot: Throttling rejected the operation

    (Windows Logs>Application and Service Logs>Microsoft Exchange>Managed Availability)
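
    A sketch for pulling these entries from PowerShell instead of Event Viewer. The channel name in the second command is an assumption; the first command lists the channels actually present on the server, so substitute one from that list if it differs.

    ```powershell
    # List the Managed Availability event channels present on this server
    Get-WinEvent -ListLog 'Microsoft-Exchange-ManagedAvailability/*' |
        Format-Table LogName, RecordCount -AutoSize

    # Read recent entries that mention ForceReboot (channel name assumed)
    Get-WinEvent -LogName 'Microsoft-Exchange-ManagedAvailability/RecoveryActionResults' -MaxEvents 200 |
        Where-Object { $_.Message -like '*ForceReboot*' } |
        Format-List TimeCreated, Id, Message
    ```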

    Our servers are balanced and in DAG mode. We too have to suppress a probe but understand that this is merely a workaround to the problem.

    Does anyone else with CU3 have a (better) update?
    • Edited by edminister Monday, March 17, 2014 11:30 AM
    Monday, March 17, 2014 11:17 AM
  • edminister,

    Exchange 2013 SP1 has been released; try deploying SP1.

    http://www.microsoft.com/en-us/download/details.aspx?id=41994


    Yavuz Eren Demir

    Monday, March 17, 2014 11:20 AM
  • I can confirm this.

    We have three Ex2013 servers in a DAG updated to CU3, and we got two BSODs in 10 days.

    Also, I applied the solution above (which is described in KB 2883203) and the result is confusing:

     

    The setting is still "Enabled" (value = 1). Does anyone know if this is normal behaviour for CU3?

    We can't upgrade to SP1, as we have 3rd-party tools that currently do not support SP1.

    Any help would be highly appreciated.

    Thanks!

    Thursday, March 20, 2014 10:15 AM
  • I set up a new DAG on our brand new Exchange 2013 servers. Everything looked great; 5 minutes later, BSOD (CRITICAL_PROCESS_DIED) on server1, then a moment later BSOD (CRITICAL_PROCESS_DIED) on server2.

    We already have SP1 for 2013 installed, by the way.

    I also get the same result when checking the settings with:

    [PS] C:\>(Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/responderdefinition | % {[XML]$_.toXml()}).event.userData.eventXml | ?{$_.Name -like "ActiveDirectoryConnectivityConfigDCServerReboot"} | ft name,enabled

    Name                                            Enabled
    ----                                            -------
    ActiveDirectoryConnectivityConfigDCServerReboot 1

    I did set:

    [PS] C:\>Add-GlobalMonitoringOverride -Identity Exchange\ActiveDirectoryConnectivityConfigDCServerReboot -ItemType Responder -PropertyName Enabled -PropertyValue 0 -ApplyVersion "15.0.712.24"

    A day before running the check.
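
    One thing that may be worth checking here: as far as I know, an override created with -ApplyVersion only applies to servers running that exact build, and 15.0.712.24 is a CU2 build rather than SP1. A sketch for comparing the two, assuming the Exchange Management Shell:

    ```powershell
    # Build that each server is actually running
    Get-ExchangeServer | Format-Table Name, AdminDisplayVersion -AutoSize

    # Overrides currently defined, including the version or duration they target
    Get-GlobalMonitoringOverride | Format-List
    ```

    If the builds differ, the override would probably need to be recreated with the matching version, or with -Duration instead of -ApplyVersion.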




    • Edited by NathanHD Monday, March 31, 2014 9:37 PM
    Monday, March 31, 2014 9:34 PM
  • Has anyone found a solution to this issue? I have 2 Exchange 2013 SP1 servers in a DAG and I am getting the BSOD for wininit.exe on the secondary server only.

    I have tried the cmdlets mentioned above and they don't change the Enabled value, which is still set to 1.

    Please advise!

    Wednesday, May 14, 2014 6:11 PM
  • Hi,

    I think our solution was to enable 'register this connection's dns suffix' in the IPv4 properties for the production NIC. Even though DNS entries had been created for the DAG and both hosts, when we ran Test-ReplicationHealth | fl the TCPListener check would fail because it could not find the IP address for the host in DNS.

    After enabling the 'register this connection' option, the test passed immediately. I suspect this was the cause of the BSOD (for us at least).
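
    A sketch of how this could be checked from PowerShell on Server 2012. EXCH01 is a placeholder host name, and the first command needs the Exchange Management Shell:

    ```powershell
    # Run the replication health checks and show the result of each one
    Test-ReplicationHealth | Format-Table Server, Check, Result, Error -AutoSize

    # Confirm that the DAG member's name resolves in DNS
    Resolve-DnsName EXCH01

    # Show the DNS registration settings for each network adapter
    Get-DnsClient | Format-Table InterfaceAlias, RegisterThisConnectionsAddress, UseSuffixWhenRegistering -AutoSize
    ```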

    I hope that helps and makes sense!

    Wednesday, May 28, 2014 11:03 PM
  • Anyone able to confirm if this works?  I have the same issue, brand new three member DAG with SP1.

    I have an Infoblox appliance that doesn't allow DNS registration at the moment, so I always uncheck this. 

    Thursday, June 19, 2014 2:13 PM
  • "Anyone able to confirm if this works?  I have the same issue, brand new three member DAG with SP1.

    I have an Infoblox appliance that doesn't allow DNS registration at the moment, so I always uncheck this."

    I couldn't see any changes in our DNS records after enabling this setting on the MSX servers, so it may not actually register anything anyway. So long as you have manually configured the DNS records correctly, you could tick the box and see if it helps.

    Of course you would want to make sure you are seeing the Test-ReplicationHealth TCPListener check fail for the same reason I mentioned above (and after ticking 'register this connection' the TCPListener test should pass immediately).

    If you are getting other failures in test-replicationhealth then your mileage may vary.

    Good luck!

    Thursday, June 19, 2014 10:57 PM