How to check the RCA for missing heartbeats

  • Question

    • The cluster service was halted to prevent an inconsistency within the failover cluster. The error code was 1359.

    • Server: Windows Server 2016

      My investigation shows that a network adapter reset was observed at the same time, 3:18:26 AM on 07-01-2019. Note that cluster log timestamps are in GMT, so this corresponds to about 07:18 in the log excerpt below.


      00000c64.00001950::2019/07/01-07:18:33.587 INFO  [IM - Cluster Network 1] Resetting interface state calculation state

      00000c64.00001950::2019/07/01-07:18:33.587 INFO  [IM] Leader is sending request for all interfaces in the current view

      00000c64.00000b44::2019/07/01-07:18:33.587 INFO  [DCM] Force disconnect payload: netname \xxxxxxx, requested disconnect status (0), src <null>, dest <null>

      00000c64.00000b44::2019/07/01-07:18:33.587 ERR   [DCM] Force disconnect failed on DisconnectSmbInstance::CSV, status (c000000d)

      00000c64.00000b44::2019/07/01-07:18:33.587 INFO  [DCM] Force disconnect(DisconnectAll): server \169.254.2.228, DisconnectSmbInstance::CSV

      00000c64.00000b44::2019/07/01-07:18:33.587 INFO  [DCM] Releasing RDR handle for target node id 2

      .000006ec::2019/07/01-07:19:02.884 ERR   [NODE] Node 1: Connection to Node 2 is broken. Reason (10054)' because of 'channel to remote endpoint 169.254.2.228:~3343~ has failed with status 10054'

      00000c64.000006ec::2019/07/01-07:19:02.884 WARN  [NODE] Node 1: Initiating reconnect with n2.

      00000c64.000006ec::2019/07/01-07:19:02.884 INFO  [MQ-thpqhms0] Pausing

      00000c64.000008dc::2019/07/01-07:19:02.884 INFO  [Reconnector-thpqhms0] Reconnector from epoch 1 to epoch 2 waited 00.000 so far.

      00000c64.00001930::2019/07/01-07:19:03.012 INFO  [IM] got event: Node with FaultTolerantAddress xxxxx:~0~ has gone down with fatal error\crash

      00000c64.00001930::2019/07/01-07:19:03.013 ERR   [IM] Couldn't find node id for remote virtual IP xxxxxxxx:~0~

      0000194c::2019/07/01-07:19:14.683 DBG   [NETFTAPI] Signaled NetftRemoteUnreachable event, local address 10.81.64.153:3343 remote address 10.81.65.25:3343

      00000c64.00001930::2019/07/01-07:19:14.683 INFO  [IM] got event: Remote endpoint 10.81.65.25:~3343~ unreachable from xxxxx

      00000c64.00001930::2019/07/01-07:19:14.683 INFO  [NDP] Checking to see if all routes for route (virtual) local xxxxx:~0~ to remote 169.254.2.228:~0~ are down

      00000c64.00001930::2019/07/01-07:19:14.683 WARN  [NDP] All routes for route (virtual) local 169.254.1.43:~0~ to remote xxxxxxxxx:~0~ are down

      00000c64.00001924::2019/07/01-07:19:14.683 INFO  [CORE] Node 1: executing node 2 failed handlers on a dedicated thread

    • I also found this in the event logs:

      07-02-2019           7:20:42 AM           Warning thpqghs0.prod.travp.net     10400    Microsoft-Windows-NDIS   N/A         N/A         The network interface 'vmxnet3 Ethernet Adapter' has begun resetting.  There will be a momentary disruption in network connectivity while the hardware resets. Reason: The network driver detected that its hardware has stopped responding to commands. This network interface has reset 1 time(s) since it was last initialized.

    Please let me know whether this is causing the issue.
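Because the cluster log is in GMT and the event log is in local time, lining the two up is easier once the cluster log timestamps are converted. The following is a minimal Python sketch of that conversion, not an official tool. The UTC-4 offset is an assumption inferred from the 3:18 AM local time against 07:18 GMT quoted above, and the function name `parse_cluster_log_line` is made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the server's local time is UTC-4, inferred from the thread
# (3:18 AM local vs 07:18 GMT). Change this for your own environment.
LOCAL_TZ = timezone(timedelta(hours=-4))

# Matches "<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL [COMPONENT] message".
# The process ID part is optional because some pasted lines lost it.
LINE_RE = re.compile(
    r"^(?:[0-9a-f]*\.)?[0-9a-f]+::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[^\]]+)\]\s*(?P<message>.*)$"
)


def parse_cluster_log_line(line, local_tz=LOCAL_TZ):
    """Return the parsed fields of one cluster log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    utc_time = datetime.strptime(
        match.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"
    ).replace(tzinfo=timezone.utc)
    return {
        "utc": utc_time,
        "local": utc_time.astimezone(local_tz),
        "level": match.group("level"),
        "component": match.group("component"),
        "message": match.group("message"),
    }


sample = (
    "00000c64.000006ec::2019/07/01-07:19:02.884 WARN  "
    "[NODE] Node 1: Initiating reconnect with n2."
)
record = parse_cluster_log_line(sample)
print(record["local"].isoformat(), record["level"], record["component"])
# prints: 2019-07-01T03:19:02.884000-04:00 WARN NODE
```

With the timestamps in local time, the WARN and ERR lines can be sorted next to the NDIS 10400 warning to see which came first.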

    Monday, August 19, 2019 5:20 PM

All replies

  • Did you run the cluster validation test? Are there any errors? 

    I would recommend checking your physical network (switches, adapters, cables, etc.) and updating your network adapter drivers and firmware.


    Microsoft Certified Professional

    [If a post helps to resolve your issue, please click the "Mark as Answer" of that post or click Answered "Vote as helpful" button of that post. By marking a post as Answered or Helpful, you help others find the answer faster. ]

    Tuesday, August 20, 2019 5:45 AM
  • Validate Network Communication

      The network validation test reported some warnings:
      Analyzing connectivity results ...
      Node xxxxxx is reachable from Nodexxxxx by only one pair of network interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available, or consider adding additional networks to the cluster.
      Following are the connectivity checks made using UDP on port 3343 from network interfaces on node xxxxxxx to network interfaces on node xxxxxxt
      Result | Source Interface Name | Source IP Address | Destination Interface Name | Destination IP Address | Same Cluster Network | Packet Loss (%)
      Success | xxxxxx - Ethernet0 | xxxxx | txxxxx - Ethernet0 | xxxxx | True | 0
      Node xxxxx is reachable from Node xxxxxxx by only one pair of network interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available, or consider adding additional networks to the cluster.
      Following are the connectivity checks made using UDP on port 3343 from network interfaces on node xxxxxx to network interfaces on node xxxxxxxx
      Result | Source Interface Name | Source IP Address | Destination Interface Name | Destination IP Address | Same Cluster Network | Packet Loss (%)
      Success | xxxxxx - Ethernet0 | xxxxx | xxxxxx - Ethernet0 | xxx | True | 0


    Tuesday, August 20, 2019 2:07 PM
  • The server environment is Windows Server 2016.

    Can we ignore the error below? The article at https://support.microsoft.com/en-us/help/2710487/error-1359-and-the-cluster-service-stops-in-a-windows-server-2008-or-w covers older versions of Windows.


    Server: xxxxxx
    Log Type: System
    Time: 8/5/2019 03:46:11
    Entry Type: Critical
    Provider Name: Microsoft-Windows-FailoverClustering
    Event ID: 1073
    Message: The Cluster service was halted to prevent an inconsistency within the failover cluster. The error code was '1359'.
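For reference, the numeric codes quoted in this thread are standard Windows values rather than anything specific to one OS version. The sketch below is only a small lookup table in Python, limited to the three codes that appear here. The names are the standard Win32, Winsock, and NTSTATUS constants as I understand them, and the helper `describe` is hypothetical.

```python
# Codes that appear in this thread, with their standard Windows names.
# This table is illustrative and deliberately limited to those three codes.
KNOWN_CODES = {
    1359: ("ERROR_INTERNAL_ERROR", "An internal error occurred."),
    10054: (
        "WSAECONNRESET",
        "An existing connection was forcibly closed by the remote host.",
    ),
    0xC000000D: (
        "STATUS_INVALID_PARAMETER",
        "An invalid parameter was passed to a service or function.",
    ),
}


def describe(text, base=10):
    """Look up a code written as it appears in the logs (decimal or bare hex)."""
    code = int(text, base)
    name, meaning = KNOWN_CODES.get(code, ("UNKNOWN", "not in this table"))
    return f"{text} = {name}: {meaning}"


print(describe("1359"))               # error code from event 1073
print(describe("10054"))              # reason in the [NODE] connection error
print(describe("c000000d", base=16))  # status in the [DCM] force disconnect error
```

Note that 1359 only says the Cluster service hit an internal error and stopped. It does not name the cause, so the surrounding events (network resets, lost connections) still have to be examined.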

    Wednesday, August 21, 2019 2:21 PM
  • Was this a one-time event, or is it repeating?

    Did you manage to update network drivers and firmware? 


    Thursday, August 22, 2019 5:14 AM
  • Hello Matej,

    The customer is very concerned about event ID 1073. Until he gets confirmation of the cause, he won't be able to ask the VMware network team to update the network drivers.

    We are getting event ID 1073 repeatedly on this server (OS: Windows Server 2016).

    Please find below the event log entries leading up to event ID 1073:


    07-02-2019 1:22:31 AM Error xxxx.xxx 1069 Microsoft-Windows-FailoverClustering Resource Control Manager NT AUTHORITY\SYSTEM Cluster resource 'SQLAG_1' of type 'SQL Server Availability Group' in clustered role 'SQLAG_1' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
    07-02-2019 1:23:51 AM Information xxxxx Microsoft-Windows-Kernel-General N/A NT AUTHORITY\SYSTEM The system time has changed to 2019-07-02T05:23:51.493000000Z from 2019-07-02T05:22:35.226500800Z. Change Reason: An application or system component changed the time.
    07-02-2019 1:25:01 AM Information xxxxx Microsoft-Windows-Kernel-General N/A NT AUTHORITY\SYSTEM The system time has changed to 2019-07-02T05:25:01.082000000Z from 2019-07-02T05:23:58.359771500Z. Change Reason: An application or system component changed the time.
    07-02-2019 1:25:03 AM Critical xxxxx 1135 Microsoft-Windows-FailoverClustering Node Mgr NT AUTHORITY\SYSTEM Cluster node xxxx was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
    07-02-2019 1:25:05 AM Critical xxxxx 1073 Microsoft-Windows-FailoverClustering N/A NT AUTHORITY\SYSTEM The Cluster service was halted to prevent an inconsistency within the failover cluster. The error code was '1359'.
    1:25:09 AM Error xxx 7024 Service Control Manager N/A N/A The Cluster Service service terminated with the following service-specific error:  An internal error occurred.  
    1:25:09 AM Error xxxxxx t 7031 Service Control Manager N/A N/A The Cluster Service service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 60000 milliseconds: Restart the service.  
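One detail in the entries above that may be worth a closer look is the size of the two system time changes just before event 1135. The Python sketch below simply subtracts the old time from the new time for each Kernel-General event, using the values copied from the log. The 10-second tolerance is an assumption (it corresponds to a one-second heartbeat with 10 allowed misses); the real values are the cluster's SameSubnetDelay and SameSubnetThreshold properties and should be checked on the cluster itself.

```python
from datetime import datetime

# (new time, old time) pairs copied from the two Kernel-General events,
# truncated to microseconds because datetime does not store nanoseconds.
TIME_CHANGES = [
    ("2019-07-02T05:23:51.493000", "2019-07-02T05:22:35.226500"),
    ("2019-07-02T05:25:01.082000", "2019-07-02T05:23:58.359771"),
]

# Assumption: about 10 seconds of missed heartbeats are tolerated.
# Check SameSubnetDelay and SameSubnetThreshold on the actual cluster.
ASSUMED_HEARTBEAT_TOLERANCE_S = 10.0


def clock_jumps(changes):
    """Return how far the clock was stepped forward, in seconds, for each change."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f"
    return [
        (datetime.strptime(new, fmt) - datetime.strptime(old, fmt)).total_seconds()
        for new, old in changes
    ]


for jump in clock_jumps(TIME_CHANGES):
    exceeds = jump > ASSUMED_HEARTBEAT_TOLERANCE_S
    print(f"clock stepped forward {jump:.3f} s, exceeds assumed tolerance: {exceeds}")
```

Both steps come out at more than a minute (about 76 s and 63 s). If the guest was not running during those intervals, heartbeats would have been missed for far longer than the assumed tolerance, which would be consistent with the node being removed in event 1135. This is only a hypothesis; the logs in this thread do not confirm why the clock was changed.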


    Thursday, August 22, 2019 2:31 PM

  • The customer is very concerned about event ID 1073. Until he gets confirmation of the cause, he won't be able to ask the VMware network team to update the network drivers.


    You did not mention that you are hosting your VMs on a VMware platform. Unfortunately, I cannot help with that, because the problem you are having might also be related to your VMware configuration. Maybe someone else on this forum can help, but I would also recommend opening a thread on the VMware support forum or opening a support ticket with Microsoft.

    Friday, August 23, 2019 5:40 AM
  • Hi,

    Thanks for your reply.

    So far I have not been able to find any clue. I'm afraid you may need to contact Microsoft Customer Support Services (CSS) so that a dedicated Support Professional can help you with this issue.

    To obtain the phone numbers for a specific technology request, please refer to the website below:

     

    https://www.microsoft.com/en-us/worldwide.aspx

    Appreciate your support and understanding.

    Best regards,

    Michael


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com



    Thursday, August 29, 2019 8:18 AM
    Moderator