Trouble getting Network Load Balancing to work on 3rd node in a second server room

  • Question

  • Hi!

    I am having trouble getting NLB to work on a 3rd node (these are Windows RDP servers).

    Two nodes (Windows Server 2016 VMs) residing on the same vHost work fine in the NLB cluster. The 3rd node, also a Windows Server 2016 VM, resides on a second vHost in a different server room.

    The vSphere hosts are connected via 4 Cisco switches/routers. NLB is set to use multicast. Our network team has configured static ARP entries, populated the MAC address tables with the MAC of the virtual cluster node, and set up a VLAN for this cluster on all trunks (I have no direct access to the shell on the switches myself).
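    One way to cross-check the static ARP and MAC table entries is to compute the cluster MAC that NLB is expected to derive from the cluster IP. The sketch below assumes the default derivation (02-BF plus the four IP octets for unicast, 03-BF plus the four IP octets for multicast, 01-00-5E-7F plus the last two IP octets for IGMP multicast). The function name and the example address 192.0.2.10 are placeholders, not values from this cluster.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str = "multicast") -> str:
    """Return the cluster MAC that NLB derives by default from the cluster IP.

    Assumption: the default derivation is in use and no custom MAC was set.
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 bytes, network order
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        # IGMP multicast mode only embeds the last two octets of the IP.
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in mac)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address used as a placeholder.
    for mode in ("unicast", "multicast", "igmp"):
        print(mode, nlb_cluster_mac("192.0.2.10", mode))
```

    The value for the real cluster IP can then be compared with what the network team entered on each switch. A mismatch on any one device along the path would be enough to keep frames from reaching the second server room.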

    Yet when node 3 is enabled, the cluster breaks: running RDP sessions get disconnected and new connections to the cluster fail with an unknown error on the RDP connection.

    I installed MS Network Monitor 3.4 and ran it on the third node while that node was in the NLB stopped state. Packets sent to the cluster node are received on the third node as well.

    On the first two nodes I see packets with the protocol NLBHB being sent from both nodes to the cluster node. The only strange thing here is that the source is not a regular IP but a "VMWare, Inc. B77167 [00-50-56-B7....". I don't see any messages with the NLBHB protocol on node 3.
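    A note on the source field: to my knowledge NLB heartbeats are plain Ethernet frames with EtherType 0x886F and carry no IP header, so a capture tool can only show the sender's MAC, and 00-50-56 is a VMware vendor prefix. The sketch below classifies a raw frame on that basis. It assumes an untagged frame (no 802.1Q header), and the function name and sample MAC addresses are made up for illustration.

```python
import struct

NLB_HEARTBEAT_ETHERTYPE = 0x886F        # EtherType used by NLB heartbeats
VMWARE_OUI = bytes.fromhex("005056")    # vendor prefix of VMware virtual NICs


def _fmt_mac(raw: bytes) -> str:
    return "-".join(f"{b:02X}" for b in raw)


def describe_frame(frame: bytes) -> dict:
    """Summarise the Ethernet header of an untagged frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Destination MAC, source MAC, EtherType in network byte order.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": _fmt_mac(dst),
        "src": _fmt_mac(src),
        "is_nlb_heartbeat": ethertype == NLB_HEARTBEAT_ETHERTYPE,
        "src_is_vmware_vnic": src[:3] == VMWARE_OUI,
    }


if __name__ == "__main__":
    # Hypothetical frame: multicast cluster MAC as destination,
    # a made-up VMware vNIC MAC as source, zero-filled payload.
    sample = bytes.fromhex("03BFC000020A" "005056000001" "886F") + bytes(46)
    print(describe_frame(sample))
```

    If that holds, filtering a switch-side capture on EtherType 0x886F within the cluster VLAN should show how far the heartbeats from nodes 1 and 2 travel before they disappear.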

    All nodes can reach the other two nodes by name and by IP.
    RDP sessions to the 3rd node work if you connect to it directly, but not via the cluster name/IP.

    What are my best debugging options? Our server support team says the cluster needs to be set up from scratch. That is not an option, as the first two nodes are in production and downtime is not possible. Setting up a new NLB cluster on new servers that are not in production would also mean a lot of additional effort.

    How can I verify that the network side is set up correctly? I suspect that a network trunk somewhere is not set up correctly. I have double- and triple-checked the cluster settings in Windows and don't see any issue there.

    Thank you!

    Andy

     

    Wednesday, September 9, 2020 9:03 AM

All replies

  • On the first two nodes I see packets with the protocol NLBHB being sent from both nodes to the cluster node. ... I don't see any messages with the NLBHB protocol on node 3.

    So currently I am focusing on this part. I believe the NLBHB packets sent to the cluster must also be received on the 3rd node. As they don't reach the 3rd node, I take that as evidence that the network configuration is not yet right and that the packets are most probably being dropped somewhere along the path. I am waiting for the network team to look into this.

    The other possibility is that the packets never reach the physical network at all and are instead switched internally on the vHost; at least the source of the packet looks strange. But then again, the network team should be able to determine whether those packets reach the first switch.

    Thanks,

    Andy 

    Thursday, September 10, 2020 6:35 AM