Need help analysing Procmon logs for log files (VMware quiesced snapshots cause disk errors on a server running Windows Server 2012 R2 Datacenter)


  • Hello Team,

    Server: Windows Server 2012 R2

    When creating a quiesced VMware snapshot, the server logs the following event IDs: 50, 137, 140, 157

    I need help analysing the Procmon trace (I have reproduced the issue by creating a quiesced snapshot in VMware).

    Thanks


    swathi

    Thursday, December 5, 2019 1:48 PM


  • In the case of a snapshot or a backup via the VMware VSS Provider, those events are purely cosmetic: Windows logs them, as the following official VMware and Microsoft articles explain:

    https://kb.vmware.com/s/article/2115932

    https://blogs.msdn.microsoft.com/ntdebugging/2013/12/27/event-id-157-disk-has-been-surprise-removed/

    So, just ignore them.

    HTH
    -mario

    Thursday, December 5, 2019 2:16 PM
  • But can you also check the other event IDs: 50, 137 and 140?

    I have also checked the following article from VMware:

    Creating a quiesced snapshot of a Windows virtual machine generates Event IDs 50, 57, 137, 140, 157, or 12289 (2006849)


    swathi

    Thursday, December 5, 2019 2:33 PM
  • Yes, they are all related.

    They are just events logged by Windows. As the VMware KB explains, Windows logs them as errors even though no corruption can actually happen.

    You can ignore them, provided they occur at the same time you are snapshotting the VM or creating a backup of it. If they occur well away from those moments, they become relevant. Otherwise, just ignore them.

    HTH
    -mario

    Thursday, December 5, 2019 3:18 PM
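Mario's rule of thumb (these events are expected when they coincide with a snapshot or backup, and worth investigating when they don't) can be expressed as a simple timestamp check. The sketch below is a minimal illustration under stated assumptions: the event list and snapshot windows are hypothetical inputs you would transcribe from Event Viewer and from the backup job history, the event ID set is the one listed in the VMware KB title quoted in this thread (2006849), and the 2-minute margin is an arbitrary value to tune.

```python
from datetime import datetime, timedelta

# Event IDs named in VMware KB 2006849, as quoted in this thread.
SNAPSHOT_EVENT_IDS = {50, 57, 137, 140, 157, 12289}


def relevant_events(events, snapshot_windows, margin=timedelta(minutes=2)):
    """Return the snapshot-related events that fall outside every snapshot window.

    events: iterable of (event_id, datetime) pairs (hypothetical input format).
    snapshot_windows: iterable of (start, end) datetimes, one per quiesced
        snapshot or backup job.
    margin: assumed slack added before and after each window.
    """
    windows = [(start - margin, end + margin) for start, end in snapshot_windows]
    flagged = []
    for event_id, when in events:
        if event_id not in SNAPSHOT_EVENT_IDS:
            continue  # not one of the events discussed in this thread
        if any(lo <= when <= hi for lo, hi in windows):
            continue  # coincides with a snapshot: most likely cosmetic
        flagged.append((event_id, when))  # far from any snapshot: investigate
    return flagged


if __name__ == "__main__":
    snaps = [(datetime(2019, 12, 5, 13, 0), datetime(2019, 12, 5, 13, 3))]
    evts = [
        (157, datetime(2019, 12, 5, 13, 1)),   # during the snapshot
        (137, datetime(2019, 12, 5, 13, 4)),   # inside the 2-minute margin
        (140, datetime(2019, 12, 5, 18, 30)),  # hours after the snapshot
    ]
    for event_id, when in relevant_events(evts, snaps):
        print(f"Investigate event {event_id} at {when}")
```

With the sample data, events 157 and 137 are treated as side effects of the 13:00 to 13:03 snapshot, and only event 140 at 18:30 is reported for investigation.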
  • Hi Mario,

    The user says that his log files are being corrupted after snapshots are created, because of the disk errors.

    Could you please help me fix this?


    swathi

    Thursday, December 5, 2019 3:44 PM
  • Procmon log files can become corrupted because the disk is put into read-only mode, and that does happen.

    The most important thing here is that the OS can resume working as usual after the snapshot or the backup without any problem, even if you find all those errors in the event log.

    If you are tracing during a snapshot or a backup, then it is reasonable that the trace file may become corrupted unless you trace to another disk. Let's say you have three virtual disks attached to the machine: C:\, D:\ and E:\.

    If you use the VMware quiesced provider to back up disk C:\ while the Procmon trace is being saved to a file on disk E:\, I would expect the trace file not to be corrupted.

    If you are tracing with the default, which is the swap (page) file on disk C:\, then, because VMware uses VSS under the hood, once VSS creates the snapshot of the disk it starts saving the transactions written to NTFS in the meantime. These writes cannot span more than a few seconds, so if the backup takes 3 minutes in total, that is probably too long for VSS to hold all the writes Procmon would normally make in that time, and the result is a corrupted trace. I believe that if you put the file on another disk where no VSS snapshot is taken, you will get a useful, uncorrupted trace.

    So this is really a matter of time. Procmon logs tons of binary data, NTFS cannot hold all of it until the snapshot is released, and so the trace will always end up corrupted. But if you save the trace to another disk, there is a good chance it will be fine.

    HTH
    -mario

    Thursday, December 5, 2019 4:24 PM
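Mario's advice to keep the trace off the snapshotted disk can be applied with Procmon's command-line options: `/BackingFile` makes Procmon log to a file instead of the default virtual memory, and `/Terminate` stops a running capture. The Python helper below is a hypothetical sketch that only assembles the command lines and refuses a backing file on a drive included in the snapshot; the drive set, the trace path and the Procmon executable location are assumptions based on the C:\/E:\ example above.

```python
from pathlib import PureWindowsPath


def procmon_start_command(backing_file, snapshotted_drives, procmon="Procmon.exe"):
    """Build a Procmon command line that logs to a disk outside the snapshot.

    backing_file: absolute path of the .pml trace file (hypothetical location).
    snapshotted_drives: drives included in the quiesced snapshot, e.g. {"C:"}.
    procmon: path to Procmon.exe (assumed to be on PATH by default).
    """
    drive = PureWindowsPath(backing_file).drive.upper()
    if not drive:
        raise ValueError("backing file must be an absolute path with a drive")
    if drive in {d.upper() for d in snapshotted_drives}:
        raise ValueError(f"{drive} is part of the snapshot; trace to another disk")
    # /BackingFile sends the capture to this file instead of virtual memory;
    # /Quiet and /Minimized avoid interactive prompts and windows.
    return [procmon, "/AcceptEula", "/Quiet", "/Minimized", "/BackingFile", backing_file]


def procmon_stop_command(procmon="Procmon.exe"):
    # /Terminate asks the running Procmon instance to stop capturing and exit.
    return [procmon, "/Terminate"]


if __name__ == "__main__":
    print(procmon_start_command(r"E:\traces\snapshot.pml", {"C:"}))
    print(procmon_stop_command())
```

On the server, you would start the first command with `subprocess.Popen(...)` before taking the quiesced snapshot of C:\ and run the second with `subprocess.run(...)` after the snapshot completes.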
  • Hi Mario,

    Thank you for the detailed description.

    By log files I don't mean the Procmon logs. I have collected the Procmon logs after reproducing the issue.

    I just wanted your help analysing the Procmon logs.


    swathi

    Thursday, December 5, 2019 5:14 PM
  • Share them if you want, but first tell me exactly what problem or error you are facing.

    If the problem is the events in the event log, those are not a problem.

    If you have any other problem, describe it as detailed as possible and share the logs.

    Thanks
    -mario

    Thursday, December 5, 2019 6:46 PM
  • Can you give me your email address so that I can share the logs with you?

    Thank you 


    swathi

    Thursday, December 5, 2019 7:31 PM
  • mariora_@hotmail.com

    -mario

    Thursday, December 5, 2019 7:51 PM
  • Hello Mario,

    I have sent you an email with the Procmon logs and a summary of the issue.

    Thank you 


    swathi

    Thursday, December 5, 2019 8:05 PM
  • Hi Swathi.
    The issue is exactly what I tried to describe to you before.
    This kind of issue happens only on Domain Controller servers, because AD cannot be backed up with any tool other than Windows Backup.
    Trying to do it with VMware quiescing generates those errors. You didn't say at first that this machine was a Domain Controller, so I simply told you that the returned errors are cosmetic and can be ignored. On most servers that's true: they can be ignored.
    On a DC, however, those errors are relevant, as you have seen.
    I don't know exactly where the issue is. We engaged both VMware and Microsoft at the time, but didn't get a final word on it. If you are in the same situation, let them investigate further.
    The solution we found at the customer I worked for as a consultant was to remove all the DCs from the backup performed via VMware (or any other third-party tool that uses the VMware VSS provider) and back up those machines with the old, native Windows Backup. That solved the problem and we got no more errors.
    As you stated in point 7 of your summary, "As per action plan we have created shadow copies on the drive and taken backup using windows server backup. We haven't got any disk/ntfs event id’s from windows end." That is the only viable solution so far.
    So stick with it, and let Microsoft and VMware investigate further if you can, as this issue resurfaces whenever someone puts a virtual machine with the Domain Controller role under this kind of backup. In that case, just use Windows Backup and you will be OK.
    HTH
    -mario
    • Proposed as answer by mariora_ Friday, December 6, 2019 8:20 AM
    Friday, December 6, 2019 8:20 AM
  • Thank you for the update, Mario. Do you have any articles stating that the issue is caused by DC/AD backups not working with the VMware VSS provider?

    swathi

    Friday, December 6, 2019 1:47 PM
  • Unfortunately, the only articles we were able to find were the two I sent you yesterday:

    https://kb.vmware.com/s/article/2115932

    https://blogs.msdn.microsoft.com/ntdebugging/2013/12/27/event-id-157-disk-has-been-surprise-removed/

    We found out that it happens on DCs by sheer luck. We had two identical environments, one in production and one in test. In test we never had a problem, while in production every backup caused all that mess. The difference? In test we followed the Microsoft best practice of using Windows Backup to back up the DCs. That was because we didn't have enough resources to extend to the test environment the automatic backup, made via the VMware VSS provider with a third-party product, that we were using in production.

    When we looked into it and understood that with Windows Backup there was no problem at all, we simply took the DCs off the automatic backup and went back to the old Windows Backup. Problem solved.

    If you can, work with Microsoft and VMware to pin down where the problem really is, so it gets solved for good. If you can't, simply remove the DCs from the VMware backup and go back to the old Windows Backup. That will solve your problem.

    HTH
    -mario

    Friday, December 6, 2019 2:00 PM
  • Hello Mario,

    Good day!

    I have checked with the user: this server is not a DC.


    swathi

    Monday, December 9, 2019 12:57 PM
  • Well, the log you sent me contains traces of the DNS Service, and LSASS makes lots of calls used to determine the replication status with other DCs, so I would say it is a DC.

    It would be very strange if that server were not a DC.

    In any case, I cannot offer any more clues on this. Engage Microsoft and VMware and let them investigate the problem until they find a solution.

    Thanks
    -mario

    Monday, December 9, 2019 1:39 PM
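Mario's conclusion that the server is a DC came from reading the trace by hand. A rough automated version of that check is sketched below. It is a hypothetical heuristic, not a documented test: it assumes the trace was exported from Procmon as CSV with the default columns (including `Process Name` and `Path`), and it treats the DNS Server process (`dns.exe`) and `lsass.exe` touching the AD database file `ntds.dit` as hints. Neither hint alone proves the role (a member server can also run the DNS Server role), so checking the server's role directly remains the reliable answer.

```python
import csv
import io


def dc_indicators(csv_text):
    """Return hints, found in a Procmon CSV export, that the host is a DC.

    Hypothetical heuristic: only the "Process Name" and "Path" columns of
    Procmon's default CSV export are inspected.
    """
    hints = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        process = (row.get("Process Name") or "").lower()
        path = (row.get("Path") or "").lower()
        if process == "dns.exe":
            # dns.exe is the DNS Server service binary, typical on DCs.
            hints.add("DNS Server service (dns.exe) is running")
        if process == "lsass.exe" and path.endswith("ntds.dit"):
            # ntds.dit is the Active Directory database file.
            hints.add("lsass.exe accesses the AD database (ntds.dit)")
    return sorted(hints)


def dc_indicators_from_file(path):
    # utf-8-sig drops a UTF-8 byte-order mark if the export starts with one.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return dc_indicators(f.read())
```

An empty result only means the heuristic found nothing in the captured window; it does not prove the server is a member server.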