VMM Administrator Console randomly crashes with event ID 1612, event ID 19999, and event ID 1

  • Question

  • Hi buddy,

    The symptom I ran into matches this post, http://social.technet.microsoft.com/Forums/en-US/virtualmachinemanager/thread/33305cf0-4bf9-4129-8a81-f6ae6a2cf9ca/, and this technical article: http://blogs.technet.com/scvmm/archive/2009/01/21/vmm-admin-console-crashes-with-errors-19999-and-1-as-logged-in-vm-manager.aspx

    The Administrator Console crashes at random with event ID 1612, within 1 to 30 minutes. In Event Viewer I can also see event ID 19999 together with event ID 1 appearing periodically. (These two events do not appear after every console crash.)

    I don't think a duplicated NIC device name or a ghost NIC is the cause. I tried manually refreshing each blade host several times, and sometimes the console did not crash even after all hosts had completed their refresh jobs. Besides, we have been using VMM to manage these blades for half a year, but this issue only started last month, and we have not changed our host configuration.

    I enabled the VMM debug log. It shows that the crash always happens right after a failed attempt to delete a folder on C: that is named after a connection UUID. The connection UUID folder is different each time, and the hosts associated with that connection are different each time as well.

    Could anyone help pinpoint the cause? I can send the debug logs I recorded if you need them.

    P.S. When the console crashes, all running tasks fail partway through. If VMM is deploying VMs, the BITS transfer continues to the end, but the remaining subjobs stop until the job is repaired or restarted in the console.
    Friday, April 17, 2009 9:20 AM

Answers

  • Sorry for the late reply. I do not have admin privileges on all the hosts managed by VMM, so a lot of time went into waiting for feedback from others. Anyway, back to the topic.

    I located the issue. The problem is not caused by a duplicated NIC device name, but it is related to a NIC error. One host has had one of its two paths to the EMC SAN dead for a month. I'm not yet sure whether the fault lies in the NIC, the EMC SAN client driver, or even the iSCSI initiator of the Windows Server 2008 OS; somebody else is looking into it. But after simply removing this host, VMM never logged error 19999 again.

    Here is how I tracked it down. First, enable VMM debug tracing on the server: http://blogs.technet.com/chengw/archive/2008/05/08/how-to-collect-scvmm-traces.aspx

    Then search the traces for the time at which error 19999 appeared in Event Viewer. I found these traces during that second:

    [3492] 0DA4.13F8::05/05-08:02:38.890#26:VmRefresher.cs(182): VM Light Refresher caught an unexpected exception and will crash the engine. Host hqdevblade7.dev.aspentech.com
    [3492] 0DA4.13F8::05/05-08:02:38.891#26:VmRefresher.cs(182): System.NullReferenceException: Object reference not set to an instance of an object.
    [3492] at Microsoft.VirtualManager.Engine.BitBos.VMRefresherBase.UpdateNICs(IVMComputerSystem vmComputer)
    [3492] at Microsoft.VirtualManager.Engine.BitBos.VMRefresherBase.UpdateFullVMObjectToCarmine(IVMComputerSystem vmComputer, VMData vmData, UpdateRequired updateRequired, Boolean vmObjectHasChanged)
    [3492] at Microsoft.VirtualManager.Engine.BitBos.VMRefresherBase.RunFullRefresher()
    [3492] at Microsoft.VirtualManager.Engine.BitBos.VMRefresherBase.UpdateHostandVMs(VMRefresherType refresherType, Guid vmObjectId, VM tempVm)
    [3492] at Microsoft.VirtualManager.Engine.BitBos.VMRefresherBase.RunLightRefresher()
    [3492] at Microsoft.VirtualManager.Engine.BitBos.VMRefresherBase.UpdateHostandVMs(VMRefresherType refresherType, Guid vmObjectId, VM tempVm)

    I traced this several times. Every time error 19999 shows up in Event Viewer, the same trace appears. It tells me that while blade7 was being refreshed, an exception was thrown during the NIC update, and that this exception "crash the engine".
    • Marked as answer by Ricky Ren Wednesday, May 6, 2009 3:26 AM
    • Edited by Ricky Ren Wednesday, May 6, 2009 3:29 AM
    Wednesday, May 6, 2009 3:26 AM
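    The trace search described in the answer above can also be scripted. The following is a minimal Python sketch, not part of the original answer: it assumes the trace has been exported as plain text and that every relevant line has the same "will crash the engine. Host <name>" shape as the excerpt quoted above. The function name hosts_blamed_for_crashes is hypothetical.

```python
import re
from collections import Counter

# Matches the refresher message quoted in the answer above.
# Assumption: the host name follows "Host" and contains no whitespace.
CRASH_LINE = re.compile(
    r"caught an unexpected exception and will crash the engine\.\s+Host\s+(\S+)"
)


def hosts_blamed_for_crashes(trace_text: str) -> Counter:
    """Count how often each host is named in a 'will crash the engine' line."""
    return Counter(match.group(1) for match in CRASH_LINE.finditer(trace_text))


if __name__ == "__main__":
    # Sample line copied from the trace excerpt in this thread.
    sample = (
        "[3492] 0DA4.13F8::05/05-08:02:38.890#26:VmRefresher.cs(182): "
        "VM Light Refresher caught an unexpected exception and will crash "
        "the engine. Host hqdevblade7.dev.aspentech.com\n"
    )
    for host, count in hosts_blamed_for_crashes(sample).most_common():
        print(host, count)
```

    A host that keeps showing up across several crashes is a reasonable first candidate to investigate or temporarily remove from VMM.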
  • You can usually confirm if you have two NICs by the same name by simply looking in the control panel (Control Panel\Network Connections\).  Switch to the details view and compare the device names.  You can also launch wbemtest.  Connect to the root\virtualization namespace.  Open class Msvm_ExternalEthernetPort.  Click the instances button.  Click on each instance and check that each ElementName is unique.  If you find one that is not unique, you need to follow the steps in the blog link you list above, using Devcon to reset this NIC's element name.
    Saturday, April 18, 2009 12:42 AM
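    The uniqueness check on ElementName described above could be automated once the names have been copied out of wbemtest. This is only a sketch under assumptions: reading the instances from WMI is left out, the function name duplicate_element_names is hypothetical, the sample names are made up, and the comparison is an exact match after trimming whitespace.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_element_names(names: Iterable[str]) -> List[str]:
    """Return, sorted, every name that occurs more than once.

    Assumption: names are compared exactly after trimming whitespace.
    """
    counts = Counter(name.strip() for name in names)
    return sorted(name for name, seen in counts.items() if seen > 1)


if __name__ == "__main__":
    # Hypothetical ElementName values, typed in by hand for illustration.
    element_names = [
        "Example Gigabit Adapter",
        "Example Gigabit Adapter #2",
        "Example Gigabit Adapter",
    ]
    for name in duplicate_element_names(element_names):
        print("Duplicate ElementName:", name)
```

    Any name it reports is a candidate for the Devcon reset mentioned above.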

All replies

  • Hi, I had the same problem. SCVMM crashed all the time after I added a NIC to both of my host servers (a cluster).
    I found that the new NICs had the same names as the old ones.

    So I disabled the new cards, uninstalled them, and let Windows detect them again. This time they got different names.

    Now SCVMM works fine without crashes.

    Good luck, everyone!
     
    • Proposed as answer by Yakov M Wednesday, May 6, 2009 9:33 AM
    Wednesday, May 6, 2009 9:32 AM
  • Can you tell us which NIC model you saw this with?  Also, can you give details about your cluster configuration and whether the node was active or passive when you added and later re-added the NIC?
    Friday, May 8, 2009 4:57 PM
  • I had the same problem when we upgraded SCOM to R2.
    The problem was solved only after we reinstalled the OpsMgr console on the SCVMM server.

    yuval

    http://blogs.microsoft.co.il/blogs/yuvalts7/
    Thursday, September 24, 2009 1:35 PM
  • Thanks Ricky. This explains just what I've been seeing. I have a SAN with problematic MELIO drivers and any hosts that are attached to this SAN are causing problems with our SCVMM R2 server. The VMM service keeps stopping and restarting (or just stopping) and of course can't manage any hosts under that condition. Removing the hosts appears to remove the problem, although it leaves us with precious few hosts to manage. :)

    So, anyway, your post confirms my own experience.

    Thursday, February 24, 2011 10:19 AM