VMM - Cluster - Host not responding

Question
-
Hi guys
Something weird is happening with our VMM installation. We currently have 2 clusters in place.
Core Cluster
RGPH101
RGPH102
RGPH103
RGPH104
RGPH105
RGPH106
Dev Cluster
RGPH201
RGPH202
RGPH203
RGPH204
RGPH205
All of the hosts except RGPH101 are showing "Host not responding" (WinRM issues). I have checked and confirmed on multiple hosts that WinRM is working.
I have removed RGPH106 from the cluster, set it back up and tried to rejoin it, but it won't let me. It started off healthy, but after 30 minutes that host also changed to "Host not responding". I have tried updating some of the hosts in case it's a patch issue, and this has had no effect.
Any advice to resolve this? The hosts and the VMM server have been rebooted, and the WinRM services have been restarted.
Thanks
Alec
Wednesday, August 15, 2018 2:06 PM
All replies
-
Hi,
Could you post the exact errors and error IDs your VMM is giving you about the WinRM?
In which state are the VMM host agents?
Also provide us with OS versions, VMM version & build.
Best regards,
Leon
Blog: https://thesystemcenterblog.com
Wednesday, August 15, 2018 2:14 PM -
Hi
The hosts are all saying "Host Not Responding" but they are hosting servers that are running fine.
The WinRM issue error:
Error (20506)
Virtual Machine Manager cannot complete the Windows Remote Management (WinRM) request on the computer RGPH106.ridgiandm1.ridgian.co.uk.
Recommended Action
Ensure that the Windows Remote Management (WinRM) service and the Virtual Machine Manager Agent service are installed and running. If a firewall is enabled on the computer, ensure that the following firewall exceptions have been added: a) Port exceptions for HTTP/HTTPS; b) A program exception for scvmmagent.
Thanks
Wednesday, August 15, 2018 2:26 PM -
Have you verified that the VMM service account is a member of the local administrators group on the hosts?
Wednesday, August 15, 2018 2:34 PM -
Hi Leon,
Yep the VMM service account is local admin on the hosts. Nothing has really changed in terms of accounts and security.
Thanks
Wednesday, August 15, 2018 2:39 PM -
Have you had a look at the following troubleshooting site?
Wednesday, August 15, 2018 10:07 PM -
Hi Leon
I have followed that article and tested WinRM access from the VMM server to the hosts and from the hosts to the VMM server; it works in both directions. WinRM is functioning as expected, yet VMM still shows that it can't manage the hosts due to a WinRM issue.
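One possible reason a manual WinRM test can pass while VMM still fails is that a basic identify request (what `Test-WSMan` sends by default) is unauthenticated and does not touch WMI. The following is a minimal sketch, not VMM's own check: it runs the identify request and then an authenticated WS-Man enumeration of `Win32_OperatingSystem` against each host. `Test-HostWsMan` is a hypothetical helper name, and the sketch assumes it is run on the VMM server under an account with rights on the hosts.

```powershell
function Test-HostWsMan {
    # Hypothetical helper, not part of VMM or WinRM.
    param([string[]]$ComputerName = @())

    foreach ($name in $ComputerName) {
        $result = [ordered]@{ Host = $name; Identify = $false; Enumerate = $false; Error = $null }
        try {
            # Unauthenticated identify request: only proves the listener answers.
            Test-WSMan -ComputerName $name -ErrorAction Stop | Out-Null
            $result.Identify = $true

            # Authenticated enumeration of a WMI class over WS-Man.
            Get-WSManInstance -Enumerate -ResourceURI 'wmicimv2/Win32_OperatingSystem' -ComputerName $name -ErrorAction Stop | Out-Null
            $result.Enumerate = $true
        } catch {
            # Record the first failure and move on to the next host.
            $result.Error = $_.Exception.Message
        }
        [pscustomobject]$result
    }
}
```

For example, `Test-HostWsMan -ComputerName 'RGPH101', 'RGPH106' | Format-List` would show side by side which step, if any, fails on a working host and on a non-responding one.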
Thanks
Thursday, August 16, 2018 9:44 AM -
It might be a cert related issue that you are running into, and this can be fixed by running:
$Credentials = Get-Credential
Get-SCVMMManagedComputer | Register-SCVMMManagedComputer -Credential $Credentials
Thursday, August 16, 2018 1:10 PM -
Hi Leon,
Just tried running that command on the VMM server and received the following errors (There is more but I copied the first few)
PS C:\Windows\system32> Get-SCVMMManagedComputer | Register-SCVMMManagedComputer -Credential $Credential
Register-SCVMMManagedComputer : VMM is unable to complete the request. The connection to the agent
RGPH106.ridgiandm1.ridgian.co.uk was lost.
WinRM: URL: [http://rgph106.ridgiandm1.ridgian.co.uk:5985], Verb: [ENUMERATE], Resource:
[http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_OperatingSystem], Filter: []
(Error ID: 2916, Detailed Error: Unknown error (0x80338126))
Ensure that the Windows Remote Management (WinRM) service and the VMM agent are installed and running and that a
firewall is not blocking HTTP/HTTPS traffic. Ensure that VMM server is able to communicate with
RGPH106.ridgiandm1.ridgian.co.uk over WinRM by successfully running the following command:
winrm id -r:RGPH106.ridgiandm1.ridgian.co.uk
This problem can also be caused by a Windows Management Instrumentation (WMI) service crash. If the server is running
Windows Server 2008 R2, ensure that KB 982293 (http://support.microsoft.com/kb/982293) is installed on it.
If the error persists, restart RGPH106.ridgiandm1.ridgian.co.uk and then try the operation again. /nRefer to
http://support.microsoft.com/kb/2742275 for more details.
To restart the job, run the following command:
PS> Restart-Job -Job (Get-VMMServer rgpsvr-vmm.ridgiandm1.ridgian.co.uk | Get-Job | where { $_.ID -eq
"{c24bcb7c-5a68-4440-9fdc-1bdb05ee4fce}"})
At line:1 char:28
+ Get-SCVMMManagedComputer | Register-SCVMMManagedComputer -Credential $Credential
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ReadError: (:) [Register-SCVMMManagedComputer], CarmineException
+ FullyQualifiedErrorId : 2916,Microsoft.SystemCenter.VirtualMachineManager.Cmdlets.ReassociateAgentCmdlet
Register-SCVMMManagedComputer : An internal error has occurred trying to contact the rgph102.ridgiandm1.ridgian.co.uk
server: : .
WinRM: URL: [http://rgph102.ridgiandm1.ridgian.co.uk:5985], Verb: [ENUMERATE], Resource:
[http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_OperatingSystem], Filter: []
(Error ID: 2912, Detailed Error: The request is not supported (0x80070032))
Check that WS-Management service is installed and running on server rgph102.ridgiandm1.ridgian.co.uk. For more
information use the command "winrm helpmsg hresult". If rgph102.ridgiandm1.ridgian.co.uk is a host/library/update
server or a PXE server role then ensure that VMM agent is installed and running. Refer to
http://support.microsoft.com/kb/2742275 for more details.
To restart the job, run the following command:
PS> Restart-Job -Job (Get-VMMServer rgpsvr-vmm.ridgiandm1.ridgian.co.uk | Get-Job | where { $_.ID -eq
"{6f1bbfac-c2f0-41c1-b308-75cde9b97e1a}"})
At line:1 char:28
+ Get-SCVMMManagedComputer | Register-SCVMMManagedComputer -Credential $Credential
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ReadError: (:) [Register-SCVMMManagedComputer], CarmineException
+ FullyQualifiedErrorId : 2912,Microsoft.SystemCenter.VirtualMachineManager.Cmdlets.ReassociateAgentCmdlet
Register-SCVMMManagedComputer : An internal error has occurred trying to contact the RGPH105.ridgiandm1.ridgian.co.uk
server: : .
WinRM: URL: [http://rgph105.ridgiandm1.ridgian.co.uk:5985], Verb: [ENUMERATE], Resource:
[http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_OperatingSystem], Filter: []
(Error ID: 2912, Detailed Error: The request is not supported (0x80070032))
Check that WS-Management service is installed and running on server RGPH105.ridgiandm1.ridgian.co.uk. For more
information use the command "winrm helpmsg hresult". If RGPH105.ridgiandm1.ridgian.co.uk is a host/library/update
server or a PXE server role then ensure that VMM agent is installed and running. Refer to
http://support.microsoft.com/kb/2742275 for more details.
To restart the job, run the following command:
PS> Restart-Job -Job (Get-VMMServer rgpsvr-vmm.ridgiandm1.ridgian.co.uk | Get-Job | where { $_.ID -eq
Thursday, August 16, 2018 1:15 PM -
Have you checked or monitored the firewalls to confirm that nothing is blocking the connection?
You could try to reinstall the agent manually on one of your hosts to see if it helps.
Thursday, August 16, 2018 1:20 PM -
Hi Leon,
Tested with the firewall on and off, including exclusions for the WinRM management ports.
Also tried reinstalling the agent, which had no effect either.
I have noticed that all of my non-responding hosts report the same virtualisation software version, which is different from that of the host that is working.
RGPH101 - 6.3.9600.19000
Other hosts - 6.3.9600.18623
How do I update the "Virtualisation software version"?
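A difference in reported build numbers may simply mean the hosts are at different patch levels. As a rough sketch (the function names are hypothetical, and this is not something VMM provides), the installed hotfix lists of a working host and a non-responding host can be compared with `Get-HotFix`. Its `-ComputerName` parameter queries WMI remotely rather than going through WinRM, so it may still work against a host that VMM cannot manage.

```powershell
function Get-MissingHotFixId {
    # Hypothetical pure helper: IDs present in $Reference but absent from $Candidate.
    param([string[]]$Reference = @(), [string[]]$Candidate = @())
    $Reference | Where-Object { $_ -notin $Candidate } | Sort-Object -Unique
}

function Compare-HostHotFix {
    # Hypothetical helper: lists hotfixes installed on only one of the two hosts.
    param([string]$WorkingHost, [string]$FailingHost)

    $working = @(Get-HotFix -ComputerName $WorkingHost | ForEach-Object HotFixID)
    $failing = @(Get-HotFix -ComputerName $FailingHost | ForEach-Object HotFixID)

    [pscustomobject]@{
        OnlyOnWorking = Get-MissingHotFixId -Reference $working -Candidate $failing
        OnlyOnFailing = Get-MissingHotFixId -Reference $failing -Candidate $working
    }
}
```

For example, `Compare-HostHotFix -WorkingHost RGPH101 -FailingHost RGPH102 | Format-List` lists the updates that only one of the two hosts has installed.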
Thanks
- Edited by AlecJ Friday, August 17, 2018 8:45 AM Added more
Friday, August 17, 2018 8:35 AM -
Fixed the issue.
Turned out to be the CredSSP update blocking the connections from VMM to hosts.
I set the Encryption Oracle Remediation policy to Enabled and the Protection Level to Vulnerable, which instantly fixed the issue.
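For anyone checking the same setting without opening the Group Policy editor: the policy is stored in the registry as the `AllowEncryptionOracle` DWORD (0 = Force Updated Clients, 1 = Mitigated, 2 = Vulnerable). The snippet below is a minimal read-only sketch; the two function names are hypothetical helpers, and when the value is absent the operating system's default applies.

```powershell
# Registry location written by the "Encryption Oracle Remediation" policy
# (CredSSP update for CVE-2018-0886).
$CredSspKey = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\CredSSP\Parameters'

function ConvertTo-OracleLevelName {
    # Hypothetical helper: maps the AllowEncryptionOracle DWORD to its policy name.
    param([Parameter(Mandatory)][int]$Value)
    switch ($Value) {
        0 { 'Force Updated Clients' }
        1 { 'Mitigated' }
        2 { 'Vulnerable' }
        default { "Unknown ($Value)" }
    }
}

function Get-OracleRemediationLevel {
    # Hypothetical helper: reads the setting on the local machine only.
    $item = Get-ItemProperty -Path $CredSspKey -Name AllowEncryptionOracle -ErrorAction SilentlyContinue
    if ($null -eq $item) { return 'Not configured' }
    ConvertTo-OracleLevelName -Value $item.AllowEncryptionOracle
}

Get-OracleRemediationLevel
```

Microsoft's guidance for this update describes Vulnerable as a setting that leaves CredSSP connections exposed, so it is probably best treated as a temporary workaround until the VMM server and all hosts are at the same patch level.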
Thanks for the help!
- Proposed as answer by Leon Laude Friday, August 17, 2018 9:24 AM
Friday, August 17, 2018 9:09 AM -
Glad to hear you located the problem, and thank you for sharing!
Friday, August 17, 2018 9:17 AM