VMM - Cluster - Host not responding

  • Question

  • Hi guys

    Something weird is happening with our VMM installation. We currently have 2 clusters in place.

    Core Cluster
      RGPH101
      RGPH102
      RGPH103
      RGPH104
      RGPH105
      RGPH106

    Dev Cluster
      RGPH201
      RGPH202
      RGPH203
      RGPH204
      RGPH205

    All of the hosts except RGPH101 are showing "Host not responding" (WinRM issues). I have checked and confirmed on multiple hosts that WinRM is working.

    I removed RGPH106 from the cluster, set it back up and tried to rejoin it, but it won't let me. It started off healthy, but after 30 minutes that host also became "Host not responding".

    I have tried updating some of the hosts in case it's a patch issue, but this has had no effect.

    Any advice on how to resolve this? The hosts and the VMM server have been rebooted, and the WinRM services restarted.

    Thanks
    Alec

    Wednesday, August 15, 2018 2:06 PM

All replies

  • Hi,

    Could you post the exact errors and error IDs your VMM is giving you about the WinRM?

    In which state are the VMM host agents?


    Also provide us with OS versions, VMM version & build.

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com LinkedIn:

    Wednesday, August 15, 2018 2:14 PM
  • Hi

    The hosts all show "Host Not Responding", but the VMs running on them are fine.

    The WinRM error:

    Error (20506)
    Virtual Machine Manager cannot complete the Windows Remote Management (WinRM) request on the computer RGPH106.ridgiandm1.ridgian.co.uk.


    Recommended Action
    Ensure that the Windows Remote Management (WinRM) service and the Virtual Machine Manager Agent service are installed and running. If a firewall is enabled on the computer, ensure that the following firewall exceptions have been added: a) Port exceptions for HTTP/HTTPS; b) A program exception for scvmmagent.

    Thanks

    Wednesday, August 15, 2018 2:26 PM
  • Have you verified that the VMM service account is a member of the local administrators group on the hosts?

    Wednesday, August 15, 2018 2:34 PM
  • Hi Leon,

    Yep, the VMM service account is a local admin on the hosts. Nothing has really changed in terms of accounts and security.

    Thanks

    Wednesday, August 15, 2018 2:39 PM

  • Hi Leon

    I have followed that article and tested WinRM access from the VMM server to the hosts and from the hosts to the VMM server; it works in both directions. WinRM is functioning as expected, yet VMM still shows that it can't manage the hosts because of a WinRM issue.

    Thanks

    Thursday, August 16, 2018 9:44 AM
  • It might be a certificate-related issue you are running into, which may be fixed by re-registering the managed computers with VMM:

    # Prompt for credentials with administrative rights on the hosts
    $Credential = Get-Credential
    # Re-associate every VMM-managed computer using those credentials
    Get-SCVMMManagedComputer | Register-SCVMMManagedComputer -Credential $Credential

    Thursday, August 16, 2018 1:10 PM
  • Hi Leon,

    I just tried running that command on the VMM server and received the following errors (there are more, but I copied only the first few):

    PS C:\Windows\system32> Get-SCVMMManagedComputer | Register-SCVMMManagedComputer -Credential $Credential
    Register-SCVMMManagedComputer : VMM is unable to complete the request. The connection to the agent
    RGPH106.ridgiandm1.ridgian.co.uk was lost.
    WinRM: URL: [http://rgph106.ridgiandm1.ridgian.co.uk:5985], Verb: [ENUMERATE], Resource:
    [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_OperatingSystem], Filter: []
     (Error ID: 2916, Detailed Error: Unknown error (0x80338126))

    Ensure that the Windows Remote Management (WinRM) service and the VMM agent are installed and running and that a
    firewall is not blocking HTTP/HTTPS traffic. Ensure that VMM server is able to communicate with
    RGPH106.ridgiandm1.ridgian.co.uk over WinRM by successfully running the following command:
     winrm id -r:RGPH106.ridgiandm1.ridgian.co.uk
    This problem can also be caused by a Windows Management Instrumentation (WMI) service crash. If the server is running
    Windows Server 2008 R2, ensure that KB 982293 (http://support.microsoft.com/kb/982293) is installed on it.
    If the error persists, restart RGPH106.ridgiandm1.ridgian.co.uk and then try the operation again. /nRefer to
    http://support.microsoft.com/kb/2742275 for more details.

    To restart the job, run the following command:
    PS> Restart-Job -Job (Get-VMMServer rgpsvr-vmm.ridgiandm1.ridgian.co.uk | Get-Job | where { $_.ID -eq
    "{c24bcb7c-5a68-4440-9fdc-1bdb05ee4fce}"})
    At line:1 char:28
    + Get-SCVMMManagedComputer | Register-SCVMMManagedComputer -Credential $Credential
    +                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : ReadError: (:) [Register-SCVMMManagedComputer], CarmineException
        + FullyQualifiedErrorId : 2916,Microsoft.SystemCenter.VirtualMachineManager.Cmdlets.ReassociateAgentCmdlet
    Register-SCVMMManagedComputer : An internal error has occurred trying to contact the rgph102.ridgiandm1.ridgian.co.uk
    server: : .
    WinRM: URL: [http://rgph102.ridgiandm1.ridgian.co.uk:5985], Verb: [ENUMERATE], Resource:
    [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_OperatingSystem], Filter: []
     (Error ID: 2912, Detailed Error: The request is not supported (0x80070032))

    Check that WS-Management service is installed and running on server rgph102.ridgiandm1.ridgian.co.uk. For more
    information use the command "winrm helpmsg hresult". If rgph102.ridgiandm1.ridgian.co.uk is a host/library/update
    server or a PXE server role then ensure that VMM agent is installed and running. Refer to
    http://support.microsoft.com/kb/2742275 for more details.

    To restart the job, run the following command:
    PS> Restart-Job -Job (Get-VMMServer rgpsvr-vmm.ridgiandm1.ridgian.co.uk | Get-Job | where { $_.ID -eq
    "{6f1bbfac-c2f0-41c1-b308-75cde9b97e1a}"})
    At line:1 char:28
    + Get-SCVMMManagedComputer | Register-SCVMMManagedComputer -Credential $Credential
    +                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : ReadError: (:) [Register-SCVMMManagedComputer], CarmineException
        + FullyQualifiedErrorId : 2912,Microsoft.SystemCenter.VirtualMachineManager.Cmdlets.ReassociateAgentCmdlet
    Register-SCVMMManagedComputer : An internal error has occurred trying to contact the RGPH105.ridgiandm1.ridgian.co.uk
    server: : .
    WinRM: URL: [http://rgph105.ridgiandm1.ridgian.co.uk:5985], Verb: [ENUMERATE], Resource:
    [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_OperatingSystem], Filter: []
     (Error ID: 2912, Detailed Error: The request is not supported (0x80070032))

    Check that WS-Management service is installed and running on server RGPH105.ridgiandm1.ridgian.co.uk. For more
    information use the command "winrm helpmsg hresult". If RGPH105.ridgiandm1.ridgian.co.uk is a host/library/update
    server or a PXE server role then ensure that VMM agent is installed and running. Refer to
    http://support.microsoft.com/kb/2742275 for more details.

    To restart the job, run the following command:
    PS> Restart-Job -Job (Get-VMMServer rgpsvr-vmm.ridgiandm1.ridgian.co.uk | Get-Job | where { $_.ID -eq

    Thursday, August 16, 2018 1:15 PM
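The two "Detailed Error" values in that output are standard 32-bit HRESULTs, the same kind of value the error text suggests passing to `winrm helpmsg`. As a hedged illustration in Python (the helper `decode_hresult` is hypothetical), splitting them into their fields shows that 0x80070032 is facility 7 (FACILITY_WIN32) wrapping Win32 error 50, ERROR_NOT_SUPPORTED ("The request is not supported"), while 0x80338126 comes from facility 51, the WinRM facility. The facility mask follows the `HRESULT_FACILITY` macro in winerror.h; only the two facility names seen in this thread are mapped.

```python
# Only the facilities that appear in this thread; others report "unknown".
FACILITY_NAMES = {7: "FACILITY_WIN32", 51: "FACILITY_WINRM"}


def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility and code fields."""
    hresult &= 0xFFFFFFFF                 # accept signed values too (e.g. -2144108250)
    severity = hresult >> 31              # 1 = failure, 0 = success
    facility = (hresult >> 16) & 0x1FFF   # same mask as HRESULT_FACILITY in winerror.h
    code = hresult & 0xFFFF               # low 16 bits, e.g. the Win32 error number
    return {
        "severity": severity,
        "facility": facility,
        "facility_name": FACILITY_NAMES.get(facility, "unknown"),
        "code": code,
    }


if __name__ == "__main__":
    # The two "Detailed Error" values from the Register-SCVMMManagedComputer output
    for value in (0x80338126, 0x80070032):
        print(hex(value), decode_hresult(value))
```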
  • Have you checked the firewalls to make sure nothing is blocking the connection?

    You could try to reinstall the agent manually on one of your hosts to see if it helps.
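A minimal sketch of the firewall check, assuming it is run from the VMM server: it only tests whether the WinRM HTTP listener port (5985, as in the URLs of the error output) accepts a TCP connection. `port_open` is a hypothetical helper, and an open port does not prove that WinRM authentication or CredSSP negotiation will succeed.

```python
import socket

WINRM_HTTP_PORT = 5985  # default WinRM HTTP listener port, as in the VMM error output


def port_open(host: str, port: int = WINRM_HTTP_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host name not resolvable
        return False


# Example usage (host names from this thread), run on the VMM server:
# for host in ["RGPH101.ridgiandm1.ridgian.co.uk", "RGPH106.ridgiandm1.ridgian.co.uk"]:
#     print(host, "open" if port_open(host) else "closed or filtered")
```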


    Thursday, August 16, 2018 1:20 PM
  • Hi Leon,

    I tested with the firewall on and off, including exceptions for the WinRM management ports.
    I also tried reinstalling the agent, which had no effect either.

    I have noticed that all of the non-responding hosts report the same virtualisation software version, which is different from the version on the working host:

    RGPH101 - 6.3.9600.19000
    Other hosts - 6.3.9600.18623

    How do I update the "Virtualisation software version"?
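Since the two builds differ only in the last field, a quick way to list the hosts that lag behind RGPH101 is to compare the versions numerically, field by field (plain string comparison can misorder fields of different lengths, e.g. "9600" vs "10240"). A minimal Python sketch; `parse_version` and `hosts_behind` are hypothetical helpers, and `reported` only restates the versions posted above.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '6.3.9600.19000' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def hosts_behind(reported: dict) -> list:
    """Return, sorted, the hosts whose version is lower than the newest reported one."""
    newest = max(parse_version(v) for v in reported.values())
    return sorted(h for h, v in reported.items() if parse_version(v) < newest)


# Versions as posted: RGPH101 on 6.3.9600.19000, every other host on 6.3.9600.18623
reported = {"RGPH101": "6.3.9600.19000"}
for name in ["RGPH102", "RGPH103", "RGPH104", "RGPH105", "RGPH106",
             "RGPH201", "RGPH202", "RGPH203", "RGPH204", "RGPH205"]:
    reported[name] = "6.3.9600.18623"

if __name__ == "__main__":
    print(hosts_behind(reported))
```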


    Thanks


    • Edited by AlecJ Friday, August 17, 2018 8:45 AM Added more
    Friday, August 17, 2018 8:35 AM
  • Fixed the issue.

    It turned out to be the CredSSP update blocking the connections from VMM to the hosts.
    I set the Encryption Oracle Remediation policy to Enabled, with the Protection Level set to Vulnerable, which instantly fixed the issue.
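For reference, the Encryption Oracle Remediation policy (Computer Configuration > Administrative Templates > System > Credentials Delegation) is stored as the `AllowEncryptionOracle` DWORD under `HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\System\CredSSP\Parameters`, where 0 = Force Updated Clients, 1 = Mitigated and 2 = Vulnerable; Vulnerable is the least restrictive of the three. The Python sketch below only models that mapping and builds the equivalent `reg add` command line as a string; it does not touch the registry, and the helper name `reg_add_command` is hypothetical.

```python
# Registry location and values behind the "Encryption Oracle Remediation" policy
CREDSSP_KEY = r"HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\System\CredSSP\Parameters"
PROTECTION_LEVELS = {
    "Force Updated Clients": 0,
    "Mitigated": 1,
    "Vulnerable": 2,
}


def reg_add_command(level: str) -> str:
    """Build the reg.exe command that sets AllowEncryptionOracle for a protection level."""
    value = PROTECTION_LEVELS[level]  # raises KeyError for an unknown level name
    return (
        f'reg add "{CREDSSP_KEY}" /v AllowEncryptionOracle '
        f"/t REG_DWORD /d {value} /f"
    )


if __name__ == "__main__":
    # The setting described in the fix above
    print(reg_add_command("Vulnerable"))
```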

    Thanks for the help!

    • Proposed as answer by Leon Laude Friday, August 17, 2018 9:24 AM
    Friday, August 17, 2018 9:09 AM
  • Glad to hear you located the problem, and thank you for sharing!

    Friday, August 17, 2018 9:17 AM