DPM 2016 backups failing - Unknown error ID 104 - Leaving Checkpoints behind

  • Question

  • Hi,

    I have been having a recent issue with DPM 2016 UR7 where my VM backups have been failing and leaving behind "checkpoints" that don't actually exist in Hyper-V.

    If I then run the backup job again, I get Unknown error ID 104. I've seen a few posts about this, but none of the fixes have worked for me.

    I have tried rebooting the virtual machines in question, only to find that their merges fail. I have updated all protection agents to the UR7 agent (5.0.409.0). Some of the virtual machines will merge once shut down and then back up successfully, but a day or two later I am faced with the same issue.

    Has anyone experienced anything similar recently?

    I would love some ideas on this one. In some cases, if I shut down the VM to merge and then try to start it again, I get a power-on error saying the virtual hard disk chain is corrupt. I then run the Set-VHD command to point the disk back to its parent, after which the VM boots and backs up.

    It just seems like a really odd issue, and it has only been happening over the past week or two. I have also noticed that DPM has stopped deleting expired recovery points outside of the retention range, but I believe that is a different issue.

    Cheers.

    Wednesday, July 3, 2019 10:14 AM

All replies

  • Hello Kyle,

    Do you recall any changes within the past 1-2 weeks? You can check the update history of the Hyper-V hosts to see whether any recently installed updates could be causing these issues.

    Could you give us some more information about your environment?

    • Hyper-V host operating system version.
    • Host/guest level backups.
    • Virtual machine configuration version of the failing VMs.
    • Hyper-V cluster/standalone.

    Please check the DPM logs for any more clues:

    DPM Server:

    • %ProgramFiles%\Microsoft System Center 2016\DPM\DPM\Temp\MSDPMCurr.errlog

    Hyper-V hosts:

    • C:\Program Files\Microsoft Data Protection Manager\DPM\Temp\DPMRACurr.errlog

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com

    Wednesday, July 3, 2019 10:52 AM
  • Hi Leon,

    No changes that I am aware of. I have been looking through updates today, but all seems fine. We are using Server 2016 and do host-level backups using System Center DPM 2016 with the latest UR. I have updated the agents on all servers as part of troubleshooting, but to no avail.

    Currently, to try to solve it, I have been removing the PG along with its data, rebooting the host, and then backing up again. That seems to be working, but I won't know until tonight's backup.

    All VM config versions are 8.0, and this is just a standalone Hyper-V scenario.

    Just seems odd that all of a sudden they stopped backing up and left bad checkpoints which need merging. I have checked both log files and they seem to be fine, nothing in there that would worry me.

    Also, with the checkpoints that are being left behind: if I shut down the VM they fail to merge, and when I start the VM again they still fail to merge.
    • Edited by Kyle Moody Wednesday, July 3, 2019 12:41 PM Update
    Wednesday, July 3, 2019 12:18 PM
  • This is not normal at all. The logs should have some clues, and I'd be interested to take a look at them.

    Have you seen any errors in the VSS writers?

    vssadmin list writers

    Also list the output of the following command:

    vssadmin list providers

    Also check your Application log on your Hyper-V standalone host for any VSS errors.
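
    The writer check above can be scripted when there are several hosts to look at. The following is a minimal Python sketch, assuming the usual `vssadmin list writers` text layout ("Writer name:", "State:", "Last error:"). The function name and the sample text are made up for illustration and are not output from this environment.

```python
import re

def unhealthy_writers(vssadmin_output):
    """Return writers that are not Stable or that report a last error."""
    writers = []
    current = None
    for raw in vssadmin_output.splitlines():
        line = raw.strip()
        match = re.match(r"Writer name: '(.+)'$", line)
        if match:
            # A new writer block starts here.
            current = {"name": match.group(1), "state": None, "last_error": None}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return [
        w for w in writers
        if w["state"] is None
        or "Stable" not in w["state"]
        or w["last_error"] != "No error"
    ]

# Invented sample text, only to show the expected layout.
SAMPLE = """\
Writer name: 'Microsoft Hyper-V VSS Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'System Writer'
   State: [7] Failed
   Last error: Timed out
"""

for writer in unhealthy_writers(SAMPLE):
    print(writer["name"], "|", writer["state"], "|", writer["last_error"])
```

    On a real host, the text could be captured from an elevated prompt with `subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout` and passed to the same function.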


    Thursday, July 4, 2019 8:13 AM
  • Hi Leon,

    List writers seems to be absolutely fine, with no retryable errors or anything. Funnily enough, some backups started working again of their own accord, but the following day there is a leftover checkpoint and the same issue again. Frustrating!

    Here is my output from VSS Providers for one of the hosts with an issue:

    Provider name: 'VSS Null Provider'
       Provider type: Software
       Provider Id: {8202aeda-45bd-48c4-b38b-ea1b7017aec3}
       Version: 5.0.158.0

    Provider name: 'Microsoft File Share Shadow Copy provider'
       Provider type: Fileshare
       Provider Id: {89300202-3cec-4981-9171-19f59559e0f2}
       Version: 1.0.0.1

    Provider name: 'Microsoft Software Shadow Copy provider 1.0'
       Provider type: System
       Provider Id: {b5946137-7b9f-4925-af80-51abd60b20d5}
       Version: 1.0.0.7

    Provider name: 'VSS Null Provider'
       Provider type: Fileshare
       Provider Id: {f4a69dd4-f712-40e3-a6b3-faeff03cb2b8}
       Version: 5.0.158.0

    I would love to get this fixed because it is driving me up the wall! The fact that it is also leaving these "checkpoints" behind, which won't merge on shutdown, is odd. I did find this: background disk merge failed to complete: The process cannot access the file because it is being used by another process. (0x80070020).

    Any ideas?

    Thanks
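
    For reference, the 0x80070020 value in the merge error can be split into its HRESULT fields. This is a small Python sketch of the standard HRESULT layout (severity bit, 11-bit facility, 16-bit code). The function name is made up for illustration.

```python
def decode_hresult(hresult):
    """Split a 32-bit HRESULT into severity, facility and code."""
    return {
        "failure": bool((hresult >> 31) & 0x1),  # top bit set means failure
        "facility": (hresult >> 16) & 0x7FF,     # 11-bit facility field
        "code": hresult & 0xFFFF,                # low 16 bits: the error code
    }

fields = decode_hresult(0x80070020)
print(fields)
```

    Facility 7 is FACILITY_WIN32 and code 32 is ERROR_SHARING_VIOLATION, which matches the "being used by another process" text: something still holds the disk file open when the background merge runs.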

    Thursday, July 4, 2019 2:56 PM
  • Does this occur to all virtual machines (VMs), or just specific ones? If it happens to specific VMs only, what differences are there between the working VMs and the non-working VMs?

    Do you have any antivirus/anti-malware software on your Hyper-V host or any other software that may be interfering with DPM?


    Thursday, July 4, 2019 3:27 PM
  • We may have actually found the issue Leon.

    We have been updating our group policies recently, and after doing some research it looks like the User Rights Assignment / "Log on as a service" group policy setting has been causing some issues.

    After temporarily disabling this group policy and running a gpupdate /force on all affected hosts the backups seem to be running again!

    The only thing I find strange about this is why?
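
    One way to see which accounts actually hold the right on a host is to export the user rights with `secedit /export /cfg rights.inf /areas USER_RIGHTS` and read the `SeServiceLogonRight` line. The following is a rough Python sketch that parses such an export. The helper names and every SID in the sample are hypothetical, and the real file is typically UTF-16 encoded, so it would need to be opened with `encoding="utf-16"`.

```python
def service_logon_principals(inf_text):
    """Return the principals listed for SeServiceLogonRight in a secedit export."""
    in_section = False
    for raw in inf_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Track whether we are inside the [Privilege Rights] section.
            in_section = (line == "[Privilege Rights]")
            continue
        if in_section and "=" in line:
            key, value = line.split("=", 1)
            if key.strip() == "SeServiceLogonRight":
                # SIDs are exported with a leading '*'.
                return [p.strip().lstrip("*") for p in value.split(",") if p.strip()]
    return []

def missing_principals(inf_text, expected):
    """Return the expected principals that do not hold the right."""
    granted = set(service_logon_principals(inf_text))
    return [p for p in expected if p not in granted]

# Hypothetical export; the domain SIDs below are invented for illustration.
SAMPLE_INF = """\
[Unicode]
Unicode=yes
[Privilege Rights]
SeNetworkLogonRight = *S-1-5-32-544
SeServiceLogonRight = *S-1-5-80-0,*S-1-5-21-1111111111-2222222222-3333333333-1105
[Version]
signature="$CHICAGO$"
Revision=1
"""

# Hypothetical SID standing in for the DPM service account.
DPM_ACCOUNT_SID = "S-1-5-21-1111111111-2222222222-3333333333-2001"
print(missing_principals(SAMPLE_INF, [DPM_ACCOUNT_SID]))
```

    A likely explanation for the behaviour, stated as an assumption rather than a confirmed cause: a GPO that defines this right replaces the local list instead of adding to it, so any account not named in the policy loses the right at the next policy refresh.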

    Thursday, July 4, 2019 3:44 PM
  • So there were some changes after all :-)

    Could you tell us which exact GPO?


    Thursday, July 4, 2019 3:49 PM
  • It seems that way!

    It was the "Log on as a service" user right that seemed to cause the issue. This was for the service account used by DPM itself. Now I just need to find out why!

    Thanks for your help Leon.

    • Marked as answer by Kyle Moody Monday, July 8, 2019 9:11 AM
    Thursday, July 4, 2019 3:53 PM
  • That makes sense, yeah. I'm happy to hear that you've got it sorted out! Hardening can cause issues, so these changes should always be documented well!

    (Please don't forget to mark as answer and vote as helpful the replies that helped!)

    Thursday, July 4, 2019 4:06 PM