Cluster nodes lost each other

  • General discussion

  • Greetings!

    We ran into a problem with a Failover Cluster on Server 2012 R2. The nodes lost contact with each other, and each one tried to take ownership of the cluster resource. One node failed to do so (the resource was already held by the other node) and went into a paused state, but it periodically kept trying to connect to the resource.

    No network outages were observed. The quorum is built on a network disk that is mapped over iSCSI.

    Below is the log from the node that held the cluster resource.

    The log from the node that dropped out and unsuccessfully tried to take over the cluster resources is attached.

    00000634.00000bd0::2019/11/01-07:54:34.736 INFO  [ACCEPT] :::~3343~: Accepted inbound connection from remote endpoint AAAA::AAAA:AAAA:AAAAA::~64347~.
    00000634.00000894::2019/11/01-07:54:34.876 INFO  [ACCEPT] 0.0.0.0:~3343~: Accepted inbound connection from remote endpoint 10.1.50.38:~64387~.
    00000634.00000be0::2019/11/01-07:54:34.889 WARN  [CHANNEL fe80::5962:4ff3:f091:4b6%14:~55424~] failure, status (10054)
    00000634.00000bd0::2019/11/01-07:54:34.889 INFO  [ACCEPT] :::~3343~: Accepted inbound connection from remote endpoint AAAA::AAAA:AAAA:AAAAA:~64351~.
    00000634.00000bd0::2019/11/01-07:54:34.889 INFO  [ACCEPT] :::~3343~: Accepted inbound connection from remote endpoint AAAA::AAAA:AAAA:AAAAA:~64349~.
    00000634.000017c0::2019/11/01-07:54:35.197 INFO  [SV] Route local (:~3343~) to remote  (~) exists. Forwarding to alternate path.
    00000634.00000be0::2019/11/01-07:54:35.562 INFO  [PULLER NodeA] А
    00000634.00000be0::2019/11/01-07:54:35.562 ERR   [NODE] Node 2: Connection to Node 1 is broken. Reason Closed(1236)' because of 'channel to remote endpoint fe80::5962:4ff3:f091:4b6%14:~55424~ has failed with status (10054)'
    00000634.00000be0::2019/11/01-07:54:35.580 WARN  [NODE] Node 2: Initiating reconnect with n1.
    00000634.00000be0::2019/11/01-07:54:35.580 INFO  [MQ-Node1 ] Pausing
    00000634.00000be0::2019/11/01-07:54:35.580 INFO  [SV] New real route: local (:~3343~) to remote  (:~64387~).
    00000634.00000be0::2019/11/01-07:54:35.603 INFO  [SV] Got a new incoming stream from :~64387~
    00000634.00001a84::2019/11/01-07:54:35.603 INFO  [SV] Route local (:~3343~) to remote  (:~64351~) exists. Forwarding to alternate path.
    00000634.00000be0::2019/11/01-07:54:37.227 INFO  [SV] Authentication and authorization were successful
    00000634.00000be0::2019/11/01-07:54:37.838 INFO  [SV] Security Handshake successful while obtaining SecurityContext for NetFT driver
    00000634.00000be0::2019/11/01-07:54:37.838 INFO  [VER] Got new TCP connection. Exchanging version data.
    00000634.00000b3c::2019/11/01-07:54:37.838 INFO  [IM] got event: LocalEndpoint NodeA:~3343~ has missed two consecutive heartbeats from 10.1.50.38:~3343~
    00000634.00000b3c::2019/11/01-07:54:37.838 INFO  [CHM] Received notification for two consecutive missed HBs to the remote endpoint 10.1.50.38:~3343~ from 10.1.50.39:~3343~
    00000634.00000be0::2019/11/01-07:54:37.838 INFO  [VER] Checking version compatibility for node Node1 id 1 with following versions: highest [Major 8 Minor 9600 Upgrade 3 ClusterVersion 0x00082580], lowest [Major 8 Minor 9600 Upgrade 3 ClusterVersion 0x00082580].
    00000634.00000be0::2019/11/01-07:54:37.838 INFO  [VER] Version check passed: node and cluster highest supported versions match.
    00000634.00000be0::2019/11/01-07:54:37.893 INFO  [SV] Negotiating message security level.
    00000634.00000be0::2019/11/01-07:54:38.097 INFO  [SV] Already protecting connection with message security level 'Sign'.
    00000634.00000be0::2019/11/01-07:54:38.127 INFO  [FTI] Got new raw TCP/IP connection.
    00000634.00000be0::2019/11/01-07:54:38.174 INFO  [FTI][Follower] This node (2) is not the initiator
    00000634.00000be0::2019/11/01-07:54:38.191 WARN  [FTI][Follower] Ignoring duplicate connection: route to remote node found
    00000634.00000be0::2019/11/01-07:54:38.191 INFO  [CHANNEL :~64387~] graceful close, status (of previous failure, may not indicate problem) (0)
    00000634.00000be0::2019/11/01-07:54:38.814 INFO  [CORE] Node 2: Clearing cookie 218dbbc5-e093-48d1-a4d9-50e4f71eb426
    00000634.00000be0::2019/11/01-07:54:38.924 WARN  mscs::ListenerWorker::operator (): GracefulClose(1226)' because of 'channel to remote endpoint 10.1.50.38:~64387~ is closed'
    00000634.00000ec4::2019/11/01-07:54:38.955 INFO  [Reconnector-Node1 ] Reconnector from epoch 1 to epoch 2 waited 03.000 so far.
    00000634.00000740::2019/11/01-07:54:38.955 INFO  [SV] Route local (:~3343~) to remote  (:~64349~) exists. Forwarding to alternate path.
    00000634.00000ec4::2019/11/01-07:54:38.994 INFO  [CHM] Sending route weight vector for nodes (1 2) to nodes (1)
    00000634.00000ec4::2019/11/01-07:54:40.088 INFO  [CHM] Sending route weight vector for nodes (1 2) to nodes (1)
    00000634.00001710::2019/11/01-07:54:40.158 INFO  [DCM] HandleSweeperRecheck
    00000634.00001710::2019/11/01-07:54:40.158 INFO  [CLI] LsaCallAuthenticationPackage: 0, 0 size: 4, buffer: HDL( 1108c10000 )
    00000634.00001710::2019/11/01-07:54:40.986 INFO  [Reconnector-Node1 ] Reconnector from epoch 1 to epoch 2 waited 05.000 so far.
    00000634.00000ec4::2019/11/01-07:54:42.564 INFO  [CHM] Sending route weight vector for nodes (1 2) to nodes (1)
    00000634.00001b1c::2019/11/01-07:54:42.971 INFO  [Reconnector-Node1 ] Reconnector from epoch 1 to epoch 2 waited 07.000 so far.
    00000634.00000ec4::2019/11/01-07:54:44.962 INFO  [Reconnector-Node1 ] Reconnector from epoch 1 to epoch 2 waited 09.000 so far.
    00000634.00000b5c::2019/11/01-07:54:45.424 DBG   [NETFTAPI] Signaled NetftRemoteUnreachable event, local address :3343 remote address :3343
    00000634.00000b3c::2019/11/01-07:54:45.439 INFO  [IM] got event: Remote endpoint :~3343~ unreachable from :~3343~
    00000634.00000b3c::2019/11/01-07:54:45.439 INFO  [IM] Marking Route from :~3343~ to :~3343~ as down
    00000634.00000b3c::2019/11/01-07:54:45.455 INFO  [NDP] Checking to see if all routes for route (virtual) local fe80::dc1e:f7b9:b1c9:5f76:~0~ to remote fe80::5962:4ff3:f091:4b6:~0~ are down
    00000634.00000b3c::2019/11/01-07:54:45.455 INFO  [NDP] All routes for route (virtual) local fe80::dc1e:f7b9:b1c9:5f76:~0~ to remote fe80::5962:4ff3:f091:4b6:~0~ are down
    00000634.00000b38::2019/11/01-07:54:45.471 INFO  [CORE] Node 2: executing node 1 failed handlers on a dedicated thread
    00000634.00000b38::2019/11/01-07:54:45.533 INFO  [NODE] Node 2: Cleaning up connections for n1.
    00000634.00000b38::2019/11/01-07:54:45.533 INFO  [MQ-Node1 ] Clearing 3 unsent and 0 unacknowledged messages.
    00000634.00000b38::2019/11/01-07:54:45.533 INFO  [NODE] Node 2: n1 node object is closing its connections
    00000634.00000b38::2019/11/01-07:54:45.533 INFO  [NODE] Node 2: closing n1 node object channels
    00000634.00000b38::2019/11/01-07:54:45.533 INFO  [CORE] Node 2: Clearing cookie 218dbbc5-e093-48d1-a4d9-50e4f71eb426
    00000634.00000b38::2019/11/01-07:54:45.533 INFO  [StreamDb] Cleaning all routes for route (virtual) local :~0~ to remote:~0~
    00000634.00000b38::2019/11/01-07:54:45.642 INFO  [NETFT] Route <struct mscs::FaultTolerantRoute>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO    <realLocal>10.1.50.39:~3343~</realLocal>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO    <realRemote>10.1.50.38:~3343~</realRemote>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO    <virtualLocal>fe80::dc1e:f7b9:b1c9:5f76:~0~</virtualLocal>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO    <virtualRemote>fe80::5962:4ff3:f091:4b6:~0~</virtualRemote>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO    <Delay>1000</Delay>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO    <Threshold>10</Threshold>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO    <Priority>79840</Priority>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO    <Attributes>2147483649</Attributes>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO  </struct mscs::FaultTolerantRoute>
    00000634.00000b38::2019/11/01-07:54:45.642 INFO   removed
    00000634.00000b38::2019/11/01-07:54:45.642 INFO  [NODE] Node 2: Pausing queue sending for n1.
    00000634.00000b3c::2019/11/01-07:54:45.650 INFO  [IM] Not sending connectivity report for probe routes, unknown adapters, and disconnected adapters
    00000634.00000b48::2019/11/01-07:54:46.189 INFO  [RGP] node 2: Node Disconnected 1 00000000000000000000000000000000000000000000000000000000000000100
    00000634.00000b48::2019/11/01-07:54:46.189 INFO  [RGP] node 2: MergeAndRestart +() -()
    00000634.00000b44::2019/11/01-07:54:46.221 INFO  [CORE] Node 2: Proposed View is <ViewChanged joiners=() downers=() newView=11602(1 2) oldView=11501(1 2) joiner=false form=false/>
    00000634.00000bdc::2019/11/01-07:54:46.611 INFO  [RGP] node 2: Tick
    00000634.00000bdc::2019/11/01-07:54:46.611 INFO  [RGP] node 2: considering shortcut for stragglers (1)
    00000634.00000bdc::2019/11/01-07:54:46.611 INFO  [RGP] node 2: I see no issue with 1. no shortcut
    00000634.00000bdc::2019/11/01-07:54:46.611 INFO  [RGP] sending to 64 nodes 2: 11501(1 2) => 11602(1 2) +() -() [(2)] (2 1)
    00000634.00000be0::2019/11/01-07:54:46.861 DBG   [NETFTAPI] received NsiParameterNotification for 169.254.2.59 (IpDadStateDeprecated)
    
    

    00003680.00002c78::2019/11/01-07:54:36.125 INFO  [IM] got event: LocalEndpoint :~3343~ has missed two consecutive heartbeats from :~3343~
    00003680.00002c78::2019/11/01-07:54:36.125 INFO  [CHM] Received notification for two consecutive missed HBs to the remote endpoint :~3343~ from :~3343~
    00003680.000061b4::2019/11/01-07:54:36.125 INFO  [CHM] My weights have changed: <vector len='65'>
    00003680.000061b4::2019/11/01-07:54:36.125 INFO      <item>0</item>
    00003680.000061b4::2019/11/01-07:54:36.125 INFO      <item>0</item>
    00003680.000061b4::2019/11/01-07:54:36.125 INFO      <item>98</item>


    The cluster itself reported that each node had lost the other node of the cluster. No errors were found in the Application or System event logs.

    I would appreciate any advice on where else to dig.
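
    To make the excerpts above easier to scan, here is a rough Python sketch that keeps only the WARN and ERR entries. It assumes every entry follows the layout seen in the excerpts (`pid.tid::date-time LEVEL message`). The names `parse_line` and `filter_levels` are made up for this illustration and are not part of any cluster tooling.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts in this post:
#   <pid>.<tid>::<YYYY/MM/DD-HH:MM:SS.mmm> <LEVEL> <message>
LINE_RE = re.compile(
    r"^\s*(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for one log entry, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "pid": m.group("pid"),
        "tid": m.group("tid"),
        "time": datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
        "level": m.group("level"),
        "message": m.group("msg").strip(),
    }


def filter_levels(lines, levels=("WARN", "ERR")):
    """Keep only the parsed entries whose level is in `levels`."""
    entries = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["level"] in levels:
            entries.append(entry)
    return entries


# Four lines copied from the log above.
sample = [
    "00000634.00000be0::2019/11/01-07:54:34.889 WARN  [CHANNEL fe80::5962:4ff3:f091:4b6%14:~55424~] failure, status (10054)",
    "00000634.00000be0::2019/11/01-07:54:35.562 ERR   [NODE] Node 2: Connection to Node 1 is broken. Reason Closed(1236)' because of 'channel to remote endpoint fe80::5962:4ff3:f091:4b6%14:~55424~ has failed with status (10054)'",
    "00000634.00000be0::2019/11/01-07:54:35.580 WARN  [NODE] Node 2: Initiating reconnect with n1.",
    "00000634.00000be0::2019/11/01-07:54:35.580 INFO  [MQ-Node1 ] Pausing",
]

for entry in filter_levels(sample):
    print(entry["time"].time(), entry["level"], entry["message"])
```

    The same functions could be applied to a full log file by passing the open file object as `lines`. Lines that do not match the assumed layout are skipped.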

    November 4, 2019, 8:24

All replies