Strange lockup problem when booting Hyper-V Server

    Question

  • Hi,

    I'm having a strange issue on one of our longest running Hyper-V servers (about a year now). Backup is done using BackupExec 12.5 with the Hyper-V aware backup option.

    When booting the machine, it locks up for roughly 50 minutes, and then continues working normally.

    After much research, I've arrived at what COULD be the problem, but I'm not sure:

    C:\windows\system32\config\SYSTEM has an unusual size of 170MB

    When trying to find out why using dureg.exe, I found several trees that suggest some sort of enumeration problem with the Hyper-V VSS Writer and its backups.

    All of these keys have 500-1000 subkeys, which I find highly unusual:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\DeviceClasses\{53f56307-b6bf-11d0-94f2-00a0c91efb8b}
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\STORAGE\VolumeSnapshot
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk

    Can someone running Hyper-V with backups for quite some time check the size of their registry keys?

    Regards,

    Lukas Beeler
    Sunday, 31 May 2009 14:03
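
For anyone who wants to compare their own hosts against the numbers in the question above, here is a minimal sketch that counts the direct subkeys of the three keys. It assumes Python is available on the host and uses the standard-library `winreg` module. The threshold of 100 is an arbitrary assumption for illustration, not a documented limit, and `flag_large` is a hypothetical helper name.

```python
"""Count direct subkeys under the registry keys named in the question."""
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

# Key paths quoted from the post, relative to HKEY_LOCAL_MACHINE.
KEYS = [
    r"SYSTEM\CurrentControlSet\Control\DeviceClasses\{53f56307-b6bf-11d0-94f2-00a0c91efb8b}",
    r"SYSTEM\CurrentControlSet\Enum\STORAGE\VolumeSnapshot",
    r"SYSTEM\CurrentControlSet\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk",
]


def count_subkeys(path):
    """Return the number of direct subkeys of HKLM\\<path>, or 0 if the key is missing."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            subkeys, _values, _modified = winreg.QueryInfoKey(key)
    except FileNotFoundError:
        return 0
    return subkeys


def flag_large(counts, threshold=100):
    """Return (path, count) pairs above the threshold, largest first.

    The default threshold is an assumption chosen for illustration.
    """
    big = [(path, n) for path, n in counts.items() if n > threshold]
    return sorted(big, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and winreg is not None:
    counts = {path: count_subkeys(path) for path in KEYS}
    for path, n in counts.items():
        print(f"{n:6d}  {path}")
    for path, n in flag_large(counts):
        print(f"suspicious: {path} has {n} subkeys")
```

Run it from an elevated prompt on the Hyper-V host. It only reads the registry and does not modify anything.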


All Replies

  • Hi Lukas,

     

    I don't have Symantec Backup Exec installed on my Hyper-V host. I checked the registry keys you mentioned; most of them contain only a few entries, and the maximum is around 30 (possibly because I don't have Backup Exec installed). Note also that dureg.exe is a Windows Server 2003 Resource Kit tool, so there is no guarantee that it works on Windows Server 2008.

     

    In order to isolate the problem, please help me to collect the following information:

     

    1.    When does this symptom occur?

    2.    What exactly are the symptoms when the server locks up?

     

    Please perform the following tests to see whether the same issue persists:

     

    1.    Boot the computer into safe mode and check whether the lockup still occurs.

    2.    Uninstall Symantec Backup Exec and check again.

     

     

    Best regards,

    Vincent Hu

     

    Monday, 01 June 2009 06:44
    Moderator
  • Hi Vincent,

    Thanks for your reply.

    1. Symptoms occur during server startup (sometimes before logon, sometimes after), never at any other time.

    2. Server does not accept any mouse or keyboard input. Can't connect using RDP or RPC. Mouse still moving on screen. Clock in Taskbar is stopped. Keyboard still handling Numlock lights. After about 40-50 Minutes, server continues working, completes starting up. May then run for weeks without issues, until the next reboot with the same symptoms.

    1. The issue persists in safe mode.
    2. I have uninstalled the BE agent and rebooted; the issue persists.

    A question for you: do you back up the Hyper-V host you checked using the Hyper-V VSS writer? I've seen this issue on all our production Hyper-V machines that are backed up using the Hyper-V VSS Writer, but not on our test machines, which are either not backed up or backed up using imaging software.

    Regards,

    Lukas Beeler
    Monday, 01 June 2009 11:41
  • Hi,

     

    Yes, the SYSTEM hive is unusually large, which may cause startup issues. Judging by the registry keys, this problem typically occurs when a SAN exposes its LUNs slightly differently. As a result, the system thinks that the disk is a different one and adds a new entry for it when the server boots. Normally you should see only 2 or 3 entries under the \scsi branch. To resolve this, back up the registry and then remove the extra entries from each of the control sets. After that, you can use the following command against the hive to compact and repair it:

     

    chkreg /f system /c /l /r

     

    How To Use CHKREG.EXE To Check A Hive To Determine What Is Taking Up Space

     

    Here are the steps to determine what part of the registry is taking up the most space

     

    1. Get your system hive and place it in c:\bin

    2. Get the chkreg.exe utility (please download chkreg.exe from the SkyDrive link below).

    3. Run the following command at a cmd prompt

     

    chkreg /f c:\bin\system /d 5 /s>c:\bin\chkreg.txt

     

    The /d 5 switch determines how many levels of the registry tree are displayed. You may need to increase it, but 5 is usually enough to get you heading in the right direction.

     

    4. In the chkreg.txt file delete everything above the following section

     

    Keys,Values, Cells, Size, SubKeys

    1, 2, 6, 2824, ControlSet001\Control\Arbiters\AllocationOrder

    What is listed may differ for your hive. Be sure to keep the column headings.

     

    5. Open chkreg.txt in Excel. When prompted, select Delimited, click Next, select Comma as the delimiter, click Next, then Finish

    6. Select all data in the spreadsheet (click the upper-left corner cell, between column A and row 1)

    7. Click Data, Sort

    8. Sort by Size

    9. Go to the bottom of the results

    10. At this point, look at the entries with the largest size. For example:

     

    Keys  Values  Cells  Size     SubKeys
    2179  8372    18538  1201808  ControlSet002\Enum\SCSI\Disk&Ven_EMC&Prod_SYMMETRIX&Rev_5567\
    2179  8372    18538  1203232  ControlSet003\Enum\SCSI\Disk&Ven_EMC&Prod_SYMMETRIX&Rev_5567\
    2685  10600   23443  1482000  ControlSet002\Enum\SCSI\
    2685  10600   23443  1483408  ControlSet003\Enum\SCSI\
    2939  13773   32787  1566872  ControlSet002\Control\
    2939  13773   32787  1567072  ControlSet003\Control\
    3383  12975   28936  1764288  ControlSet002\Enum\
    3383  12975   28936  1765760  ControlSet003\Enum\
    4484  16262   37354  2067568  ControlSet001\
    7592  31210   70993  3914608  ControlSet002\

     

    As you can see, the ControlSet001 and ControlSet002 rows are naturally the largest, since each is the total for its whole control set. Moving up the list, the Enum\SCSI section is quite large. When we looked at these branches of the registry, there were hundreds of entries for LUNs, which is what caused the hive to grow so much. If the culprit is not obvious, increase the /d value to go deeper into the registry tree.

     

     

    More Information

    -----------------------

    302594  The System hive memory limitation is improved in Windows Server 2003

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;302594

     

    306038  Your computer does not start if the SYSTEM hive is too large

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;306038

     

    http://cid-2f1b62a6fb925ab7.skydrive.live.com/self.aspx/Share/chkreg.zip

     

     

    Best regards,

    Vincent Hu

     

    Wednesday, 03 June 2009 02:33
    Moderator
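
The Excel steps in the reply above (trim the preamble, split on commas, sort by size) can also be sketched as a short script. This is only an illustration: it assumes chkreg.txt has exactly the comma-separated layout shown in the post (a `Keys,Values, Cells, Size, SubKeys` heading followed by one row per key), and `largest_entries` is a hypothetical helper name.

```python
"""Sort chkreg /s output by size, as an alternative to the Excel steps."""
import sys


def largest_entries(text, top=10):
    """Return up to `top` (size, path) tuples, largest size first."""
    lines = text.splitlines()
    # Step 4: ignore everything above the column headings (if they are present).
    start = next(
        (i for i, line in enumerate(lines)
         if line.replace(" ", "").lower().startswith("keys,values,cells,size")),
        -1,
    )
    rows = []
    for line in lines[start + 1:]:
        # Split into at most 5 fields so a comma inside the key path survives.
        parts = [part.strip() for part in line.split(",", 4)]
        if len(parts) != 5:
            continue  # blank or unrelated line
        try:
            _keys, _values, _cells, size = (int(p) for p in parts[:4])
        except ValueError:
            continue  # not a data row
        rows.append((size, parts[4]))
    # Steps 7-10: sort by size and look at the largest entries.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # The file was written by cmd redirection, so tolerate odd characters.
    with open(sys.argv[1], errors="replace") as handle:
        for size, path in largest_entries(handle.read(), top=20):
            print(f"{size:>12,d}  {path}")
```

Usage, assuming the file name from step 3: `python sort_chkreg.py c:\bin\chkreg.txt` (the script name is arbitrary).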
  • Hi Vincent,

    Yes, using chkreg.zip I found the same keys causing the issue as I previously found using dureg.exe, namely:

    ControlSet001\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk\
    ControlSet001\Enum\VMBUS\

    I'm not using a SAN - local disks only, and the disk entries created are _all_ from the Hyper-V VSS writer.

    Now I don't know more than before - I could go and delete those keys, but that seems rather risky. And every time the Hyper-V VSS Writer is used, new keys are created again.
    Thursday, 04 June 2009 06:12
  • Hi,

     

    You mentioned that "the disk entries created are _all_ from the Hyper-V VSS writer."

     

    Do you mean that the disk entries are added automatically each time you perform an action that involves the Hyper-V VSS writer, such as taking a snapshot or backing up the VMs with Backup Exec? If I have misunderstood your concern, please feel free to let me know.

     

     

    Best regards,

    Vincent Hu

    Thursday, 04 June 2009 11:41
    Moderator
  • Hi Vincent,

    I didn't test with Hyper-V snapshots, but when using BackupExec with the Hyper-V VSS writer, additional devices are created. I've looked at test machines that are backed up using Windows Server Backup with the Hyper-V VSS Writer enabled, and they have exactly the same symptoms, albeit to a much lesser degree (fewer VHDs, not backed up frequently).

    This is why I suspect a problem with the Hyper-V VSS Writer, and unfortunately I don't know anyone who has been running Hyper-V for a long time (6 months to a year) with daily backups.

    Regards,

    Lukas
    Thursday, 04 June 2009 11:54
  • Looks like my guess was correct: removing the devices using Device Remover (http://www.pro-it-education.de/software/deviceremover/) worked perfectly. Device Remover detected 9500 devices; I deleted them, and the system is now at roughly 350 devices (which is similar to other servers).

    The boot-up issue is resolved, but the entries are still being created when running backups.
    Friday, 05 June 2009 16:43
  • Hi Lukas,

     

    Sorry for the late reply; it was the weekend.

     

    I am glad to hear that your boot-up issue is resolved.

     

    I performed a backup using Windows Server Backup on my Hyper-V computer and monitored the following registry keys you mentioned; no new entries were added.

     

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\DeviceClasses\{53f56307-b6bf-11d0-94f2-00a0c91efb8b}

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\STORAGE\VolumeSnapshot

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Disk&Ven_Msft&Prod_Virtual_Disk

     

    I guess there may be something that you and I have missed. Device Remover can serve as a workaround until we find the root cause.

     

     

    Best regards,

    Vincent Hu

     

    • Proposed as answer by svuksano, Monday, 02 November 2009 15:13
    Monday, 08 June 2009 02:35
    Moderator
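
To check whether a backup run really adds device entries, as discussed in the replies above, one option is to record the subkey names before the backup and compare them afterwards. The following is a rough sketch, assuming Python on the host and the standard-library `winreg` module. The snapshot-file approach and the helper names `list_subkeys` and `added_entries` are illustrative assumptions, not part of any tool mentioned in the thread.

```python
"""Detect registry subkeys added between two runs (for example, around a backup)."""
import json
import os
import sys

try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

# One of the keys quoted in the thread, relative to HKEY_LOCAL_MACHINE.
KEY = r"SYSTEM\CurrentControlSet\Enum\STORAGE\VolumeSnapshot"


def list_subkeys(path):
    """Return the names of the direct subkeys of HKLM\\<path>."""
    names = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        count = winreg.QueryInfoKey(key)[0]  # number of subkeys
        for index in range(count):
            names.append(winreg.EnumKey(key, index))
    return names


def added_entries(before, after):
    """Return the names present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


if __name__ == "__main__" and winreg is not None and len(sys.argv) == 2:
    snapshot_file = sys.argv[1]
    current = list_subkeys(KEY)
    if os.path.exists(snapshot_file):
        # Second run: compare against the saved snapshot.
        with open(snapshot_file) as handle:
            before = json.load(handle)
        new = added_entries(before, current)
        print(f"{len(new)} new subkeys since the snapshot")
        for name in new:
            print("new:", name)
    else:
        # First run: save the current state.
        with open(snapshot_file, "w") as handle:
            json.dump(current, handle)
        print(f"saved {len(current)} subkey names to {snapshot_file}")
```

Run it once before the backup and once after, passing the same snapshot file name both times.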
  • Hi Lukas

    I have the same problem. Did you find a solution?

    Regards

    Sime
    Monday, 02 November 2009 15:14
  • Hi Lukas,

    Did either Microsoft or Symantec provide a solution for this problem?

    I also have the same problem. C:\windows\system32\config\SYSTEM is around 130MB. It takes at least 15 minutes to boot. The server is running Server 2008 x64 SP2 and Hyper-V. 

    We are performing daily backups of the VMs using Backup Exec 12.5 SP3 with the Hyper-V agent (with GRT enabled). The HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\DeviceClasses\{53f56307-b6bf-11d0-94f2-00a0c91efb8b} and HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\VMBUS registry keys contain several thousand (6800) entries.

    On the first server, 5 VMs are backed up daily, also using the verify option. I have a second Hyper-V server on which I back up only 2 VMs every night. On this server, C:\windows\system32\config\SYSTEM is only 40MB.

    On the backup server itself, both registry keys also contain thousands of entries. On this server, C:\windows\system32\config\SYSTEM is 110MB.

    I used DeviceRemover to remove the Msft Virtual Disk SCSI Disk devices but this only removes the entries in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\DeviceClasses\{53f56307-b6bf-11d0-94f2-00a0c91efb8b} key. The VMBUS registry key still contains lots of entries.

    Regards,
    Emiel
    Thursday, 11 February 2010 10:21
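
Since several posters compare hosts by the size of the SYSTEM hive file, here is a small sketch for checking it from a script. It assumes Python on the host and an elevated prompt. The 100 MB warning level is an assumption drawn loosely from the sizes reported in this thread (40MB on a host without the problem, 110-170MB on affected ones), not a documented limit.

```python
"""Report the size of the SYSTEM hive file."""
import os

# Path as given in the posts.
HIVE = r"C:\Windows\System32\config\SYSTEM"

# Assumed warning level, based only on the sizes reported in this thread.
WARN_MB = 100


def bytes_to_mb(size_in_bytes):
    """Convert a byte count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return size_in_bytes / (1024 * 1024)


def is_oversized(size_mb, limit_mb=WARN_MB):
    """Return True if the hive size exceeds the assumed warning level."""
    return size_mb > limit_mb


if __name__ == "__main__" and os.path.exists(HIVE):
    size_mb = bytes_to_mb(os.path.getsize(HIVE))
    status = "unusually large" if is_oversized(size_mb) else "ok"
    print(f"{HIVE}: {size_mb:.1f} MB ({status})")
```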
  • I have exactly the same problem except we are using Windows Server Backup. We are not using any 3rd party software. Our System hive is 343MB and growing! We have over 24,000 devices in the registry! This is happening on two of our servers.

    Please can Microsoft treat this as a bug and find the cause and a permanent fix? We are running Windows Server 2008 R2 Datacenter with Hyper-V. WSB was taking incremental backups every 30 minutes, but we have recently changed it to 60 minutes. No other roles or 3rd party software are running on these host machines, which are both Dell T710s (11th generation). Thanks.

    Event log has loads of these entries:

    Failed to delete the shadow copy (VSS snapshot) set with id '1A1938A0-1590-4BF4-8173-20DF5FD69E36' in the running virtual machine 'MG01': Unspecified error (0x80004005). (Virtual machine ID A3F241F1-ED7F-48E9-9CD7-CB7C28A6604B)

    Windows' failure to delete VSS snapshots is filling up the registry and preventing Windows from booting properly. Our servers freeze for up to 2 hours each after a reboot!

    Wednesday, 10 March 2010 09:51
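
To see which virtual machines produce the most "Failed to delete the shadow copy" events quoted above, the messages can be tallied with a regular expression. This sketch assumes the event messages have already been exported as plain text, one message per string, and that they follow the wording of the sample in the post. `parse_failure` and `failures_per_vm` are hypothetical helper names.

```python
"""Tally VSS snapshot-deletion failures per virtual machine."""
import re
from collections import Counter

# Pattern modelled on the single sample message quoted in the post.
FAILURE = re.compile(
    r"Failed to delete the shadow copy \(VSS snapshot\) set with id "
    r"'(?P<set_id>[0-9A-Fa-f-]+)' in the running virtual machine "
    r"'(?P<vm>[^']+)': (?P<error>.+?) \((?P<code>0x[0-9A-Fa-f]+)\)"
)


def parse_failure(message):
    """Return a dict with set_id, vm, error and code, or None if no match."""
    match = FAILURE.search(message)
    return match.groupdict() if match else None


def failures_per_vm(messages):
    """Count matching failure messages for each virtual machine name."""
    counts = Counter()
    for message in messages:
        parsed = parse_failure(message)
        if parsed:
            counts[parsed["vm"]] += 1
    return counts
```

For example, `failures_per_vm(lines).most_common(5)` lists the five VMs with the most failures, where `lines` is the list of exported messages.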
  • See the following blog post for a hotfix if you are still experiencing these problems:

    http://blogs.msdn.com/b/virtual_pc_guy/archive/2010/06/14/hotfix-hyper-v-backup-can-cause-slow-system-boot-large-registry-files.aspx

    Or just go straight to the fix:

    http://support.microsoft.com/kb/982210

    • Proposed as answer by Borgquite, Tuesday, 15 June 2010 10:27
    Tuesday, 15 June 2010 10:27