DPM Bare Metal/System state fails ideas

  • Question

  • Hello,

    Currently we are using DPM version 5.1.378 for doing our backups.

    My problem is that all of my Bare metal/System state backups fail... sometimes they work but most of the time they just fail...

    Is there any configuration that must be done for VSS Writer or wbadmin? Some of these errors I have also seen in previous versions of DPM. I guess BMR on DPM is very unreliable or is it just me?

    If anyone has run into these problems or wants to try to help, I can share details on all of the errors on all of my servers...

    Thank you.


    Friday, March 22, 2019 9:48 AM

All replies

  • Hi Bogdan,

    Backups tend to fail every now and then for various reasons. This is very common, as there is a lot going on in an environment that can cause backups to fail.

    There have been a lot of issues reported with DPM 1807 (5.1.378) when backing up with BMR (Bare Metal Recovery), but no real fixes have been found or provided.

    I know that DPM 2016, which is part of the LTSC (Long-Term Servicing Channel), has had some fixes regarding BMR backups in its latest Update Rollup 6.

    Apart from that, as Microsoft has announced that there will not be any more Semi-Annual Channel (SAC) releases for System Center, I would recommend starting to use the LTSC version, either DPM 2016 or preferably DPM 2019.

    In your current situation, you could start by analyzing the DPM logs on both the protected server and the DPM server for any more clues.

    If the failures happen often, try to find out whether there is any pattern to when it fails; that could get you closer to finding the root cause.
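    One rough way to look for such a pattern is to bucket the failure times by hour of day. The sketch below is only an illustration: the function name and the sample timestamps are made up, and it assumes the failure times have already been exported as ISO-8601 strings (for example from the event log).

```python
from collections import Counter
from datetime import datetime


def failures_by_hour(timestamps):
    """Count failures per hour of day from ISO-8601 timestamp strings."""
    hours = Counter()
    for ts in timestamps:
        # Keep only "YYYY-MM-DDTHH:MM:SS"; event logs may carry more
        # fractional digits than datetime.fromisoformat accepts.
        hours[datetime.fromisoformat(ts[:19]).hour] += 1
    return hours


# Hypothetical sample data, not taken from a real server.
sample = [
    "2019-03-22T12:48:57",
    "2019-03-23T12:51:10",
    "2019-03-24T02:05:33",
]
counts = failures_by_hour(sample)
for hour, count in sorted(counts.items()):
    print(f"{hour:02d}:00  {count} failure(s)")
```

    If most failures land in the same hour, it is worth checking what else runs on the protected server or the DPM server at that time.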


    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com

    Friday, March 22, 2019 10:28 AM
  • Hello again Leon,

    As you know we have discussed some issues regarding DPM Bare Metal/System state in a previous thread.

    I had DPM 2016 and had the same problems, so I upgraded to 5.1.378 hoping that would help... unfortunately it didn't. I'm afraid that I will install 2019 and have the same problems, plus others on top of that...

    I can share a list of all the errors on both the DPM server and the backed-up servers. Maybe you know some of them and have an idea. :)

    Would you recommend upgrading to DPM 2019? Are you using it? I haven't performed a downgrade before... should I try that?

    The failures don't happen often, they happen every day. I have to rerun the backup and pray that it will not fail...

    Friday, March 22, 2019 11:56 AM
  • Yes you can post the errors here and I can try to help you as much as I can!

    I never want to "force" anyone to upgrade a software/hardware unless really needed, but there are some good improvements in DPM 2019 which I think most users would probably want to have.

    You could also consider setting up a test DPM 2019 on the side, and then test the BMR backup/backups to see how it works.


    Friday, March 22, 2019 12:33 PM
  • I'll try installing 2019 in a test environment.

    Till then here is one of the errors. 

    DPM:

    Backed-up Server

    From event viewer:

    Event 517, Backup

    The backup operation that started at '2019-03-22T12:48:57.358472900Z' has failed with following error code '0x8078015B' (Windows Backup encountered an error when accessing the remote shared folder. Please retry the operation after making sure that the remote shared folder is available and accessible.). Please review the event details for a solution, and then rerun the backup operation once the issue is resolved.

    From wbadmin:

    Windows backup encountered an error when accessing the remote shared folder. Please retry the operation after making sure that the remote shared folder is available and accessible. Detailed error: The semaphore timeout period has expired

    Backup of volume C: has failed. Windows Backup encountered an error when accessing the remote shared folder. Please retry the operation after making sure that the remote shared folder is available and accessible.

    What I have done:

    1. Did a manual bare metal backup to ensure that wbadmin is working

    2. Checked that the server has rights to the DPM share that is created

    3. Removed the backup and recreated it (the initial backup worked and after that it started failing)

    4. Increased the allocation size to far more than what was needed (gave it 250 GB)

    5. Gave unlimited space for shadow copies on the drives that are backed up during bare metal

    What I noticed:

    The backups don't just fail instantly... they run for a while, transfer some amount of data and then fail...


    Friday, March 22, 2019 1:00 PM
  • Any ideas Leon? :)

    Thank you,

    Bogdan

    Thursday, April 4, 2019 2:54 PM
    Sorry for the late answer, I don't have much to offer, I'm afraid.

    You mentioned that the failures don't happen often, which means they happen either randomly or at similar times (some kind of pattern). I would try to monitor the situation to find out what could be causing this.

    What else is going on at the times when this specific backup (or these backups) is running?

    Since it seems WSB (Windows Server Backup) is having issues accessing the share somehow, you could try to manually create the share and check whether DPM can access it.
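    To check a manually created share under load rather than just for visibility, a sustained write test can show whether long transfers get cut off. This is only a sketch: `sustained_write` and the test file name are made up for illustration, and on a real system the target would be the UNC path of your test share (for example a hypothetical `\\dpm-server\bmr-test`) instead of the temporary directory used in the demo.

```python
import os
import tempfile
import time


def sustained_write(target_dir, total_mb=64, chunk_mb=4):
    """Write about total_mb of data in chunks; return (bytes, seconds, error)."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    path = os.path.join(target_dir, "dpm_share_write_test.bin")
    written = 0
    error = None
    start = time.monotonic()
    try:
        with open(path, "wb") as fh:
            while written < total_mb * 1024 * 1024:
                fh.write(chunk)
                fh.flush()
                os.fsync(fh.fileno())  # push data out instead of leaving it cached
                written += len(chunk)
    except OSError as exc:
        # For example a timeout, or the share going away mid-transfer.
        error = exc
    finally:
        try:
            if os.path.exists(path):
                os.remove(path)
        except OSError:
            pass
    return written, time.monotonic() - start, error


# Demo against a local temporary directory; replace with the share path to test it.
with tempfile.TemporaryDirectory() as demo_dir:
    written, elapsed, error = sustained_write(demo_dir, total_mb=2, chunk_mb=1)
    print(f"wrote {written} bytes in {elapsed:.2f}s, error={error}")
```

    If the write stops partway with an error, the number of bytes written and the elapsed time give a rough idea of when the connection drops.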


    Sunday, April 14, 2019 9:49 PM
  • Hello Leon,

    The backups fail every day :) the behavior changed.

    I keep getting the "semaphore timeout period has expired" error. I'm inclined to think that the connection is cut off after some time.

    Have you gotten this error? I checked whether the share exists/is created on backup start, and it is, so I don't think that is the problem. Any idea?
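    Checking the share only at backup start does not show whether it stays available for the whole run. A minimal sketch for logging the moment a path stops being reachable is below; `watch_path` is a made-up helper, and in practice the path would be the share DPM creates for the job (its exact name is not given in this thread).

```python
import os
import tempfile
import time
from datetime import datetime


def watch_path(path, interval_s=5.0, max_checks=None):
    """Poll path; return the time it was first seen missing, else None."""
    checks = 0
    while max_checks is None or checks < max_checks:
        if not os.path.exists(path):
            gone_at = datetime.now()
            print(f"{gone_at.isoformat(timespec='seconds')}  {path} is not reachable")
            return gone_at
        checks += 1
        time.sleep(interval_s)
    return None


# Demo with a local path that does not exist, so the function returns at once.
missing = os.path.join(tempfile.gettempdir(), "no-such-share-demo")
result = watch_path(missing, interval_s=0, max_checks=1)
```

    Comparing the logged time with the failure time reported by wbadmin would show whether the share disappears before the backup fails.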

    Monday, April 15, 2019 6:19 AM
  • Hi Bogdan,

    I have not received this error myself, but many others in the community have. I've seen many posts about it, although no proper solution has been found yet.

    I'm aware that you're using DPM 1807, which is a Semi-Annual Channel (SAC) release. I would strongly recommend starting to use the Long-Term Servicing Channel (LTSC) (i.e. DPM 2016/2019), as those versions will be able to receive upcoming fixes.

    Update Rollup 7 for DPM 2016 came out this week as well. It didn't have any mention of a fix for this issue, but it could also be "hidden" :-)



    • Edited by Leon Laude Friday, April 26, 2019 7:08 AM
    Friday, April 26, 2019 7:02 AM
  • This software is incredible... 

    How many years has it been out and it still has big issues with it...

    I ended up upgrading last time because of similar issues... It seems like the bugs aren't fixed... I will try moving to Update Rollup 7, if I can downgrade from my current version, that is.

    Thank you Leon.

    Monday, May 6, 2019 6:42 AM
  • Hi Bogdan,

    I would go for the newer version, DPM 2019, although it seems that Update Rollup 7 for DPM 2016 has some fixes that are not yet in DPM 2019.

    DPM 2019 is scheduled to receive its first update rollup, Update Rollup 1, in Q3 2019.


    Tuesday, May 7, 2019 9:24 AM
  • Hello Leon,

    I'm upgrading DPM today and will provide feedback.

    I am following your step by step. 

    Link: https://thesystemcenterblog.com/2019/03/17/upgrading-to-dpm-2019-step-by-step/

    One thing you could add is:

    If the person upgrading also has Azure Backup, they need to stop its services as well.

    Also, after stopping the specified services, they should close the Microsoft Management Console or the upgrade won't go forward.

    Hope this helps.



    Wednesday, May 22, 2019 7:14 AM
  • Thanks for the feedback Bogdan, I will add it.

    Let us know how your BMR/System State goes with the new DPM.


    Wednesday, May 22, 2019 7:19 AM
  • Hello Leon,

    Unfortunately I have the same problem.

    Windows backup encountered an error when accessing the remote shared folder. Please retry the operation after making sure that the remote shared folder is available and accessible. The semaphore timeout period has expired.

    Any other ideas?

    Wednesday, May 22, 2019 7:39 AM
  • Do you have Windows Defender, or any other antivirus / firewall enabled?

    If so, could you try to completely disable all of them and then try backing up?

    Have you checked the permissions to the remote shared folder if DPM can access it?


    Wednesday, May 22, 2019 8:04 AM
  • Hello Leon,

    Windows Defender is disabled. We are using Bitdefender (which I have also disabled to check whether the backup works that way).

    Unfortunately that doesn't seem to be the cause.

    I have checked the folder permissions. When a backup is started from DPM, a share is automatically created and permissions are granted to the specified server... I guess there shouldn't be an issue with that.

    To top it off, SQL and file backups are working without any problems on the servers that fail Bare Metal/System state.

    I have run out of ideas...

    Wednesday, May 22, 2019 8:21 AM
    Disabling might not always be enough. I have both seen and heard that Bitdefender can be quite rough and has been known to cause problems.

    Are these physical servers or virtual machines that you're trying to perform a BMR/System State backup on?

    Does it happen to all servers or just some? What operating system are they running? Are there any more differences?


    Wednesday, May 22, 2019 8:27 AM
    It seems like one of the servers (with the antivirus disabled) has finished the backup successfully, but the others with the same setup keep failing... the inconsistency is really confusing.

    All the servers are physical.

    Yes it did happen to all servers (up until now).

    OS: Windows Server 2016 Datacenter.

    I will try uninstalling the antivirus on one of the servers to see if the issue persists. I will keep you updated.


    Wednesday, May 22, 2019 8:48 AM
  • Hello Leon,

    After many months I have come to some results.

    Here is what I have done/tried.

    1. BMR/System state with Bitdefender disabled --> Result: backup failed

    2. BMR/System state with shadow copies enabled on where the OS is installed + the other partitions that Windows automatically creates --> Result: backup failed

    So what I did was do both things and to my surprise it worked... Finally all my backups worked.

    I need to test whether they will continue working if I don't disable my antivirus, because having the antivirus disabled defeats its purpose. If the issue does persist, I will contact Bitdefender.

    I will keep monitoring and post my results if you are interested. Hope this helps other people that are facing this problem. Would you like me to mark this as answered or first post the results and after mark it?

    Thank you for all the help!

    Thursday, May 23, 2019 7:49 AM
  • Hi Bogdan,

    Thanks for being patient with this, I'm glad to hear that you have some progress on your issue!

    I'd be glad to hear about your results, you can mark as answered after the results to verify that it actually works.


    Thursday, May 23, 2019 8:10 AM
  • Hello Leon,

    As I mentioned, I am returning with details regarding my findings.

    First of all, this is frustrating... since I thought the problem was caused by the antivirus, I left it enabled, expecting the backups to fail as before.

    To my surprise they have been working without any problems... I'm still not sure what caused the problems, but the only thing that comes to mind is that it needed "an initial backup" with the antivirus disabled and the shadow copies enabled.

    I don't really understand how DPM works and if it "adds" something upon initial backup.

    The problem is solved and unfortunately I'm still in the dark regarding the actual cause.

    I don't know which reply to mark as answer since they all helped one way or another.

    Please advise. Thank you for all the help.

    Tuesday, May 28, 2019 10:34 AM
  • Hi Bogdan,

    It would indeed be very interesting to know the exact cause. From personal experience, I've witnessed many issues when having an antivirus enabled. I'm not saying that organizations should not use antivirus software, but it can lead to various kinds of problems.

    When troubleshooting, it is crucial to make only one change at a time (and document it); this way it's easier to identify the root cause.

    So for you it seems to have something to do with the Bitdefender antivirus software or Shadow Copies, or both.

    You can vote as helpful the replies that led you to your solution, and mark as answer the reply that is closest to your solution.


    Tuesday, May 28, 2019 11:12 AM

  • We have recently upgraded much of our Windows estate to Windows Server 2016 and as part of this work we also upgraded our DPM Server to DPM 2016 1801. Since doing so, the BMR/System State backups for Windows 2016 servers frequently fail. The BMR backups on our Windows 2008R2 servers continued to work without any problems. Looking through the backup/event logs we see the following logged when the backups fail:

    0x8078015B  Windows Backup encountered an error when accessing the remote shared folder
    0x80070079 The semaphore timeout period has expired
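    For reading codes like these, it can help to split the HRESULT into its standard fields. The sketch below uses the documented HRESULT layout (failure bit, 11-bit facility, 16-bit code); the function name is made up. For 0x80070079 the facility is 7 (Win32) and the code is 121, which is the Win32 error ERROR_SEM_TIMEOUT and matches the "semaphore timeout period has expired" text.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into its failure bit, facility and code."""
    return {
        "failure": bool((value >> 31) & 0x1),
        "facility": (value >> 16) & 0x7FF,
        "code": value & 0xFFFF,
    }


for hr in (0x8078015B, 0x80070079):
    fields = decode_hresult(hr)
    print(
        f"0x{hr:08X}: failure={fields['failure']} "
        f"facility={fields['facility']} code={fields['code']}"
    )
```

    The first code decodes to facility 120 and code 347; the second shows that the underlying failure is an ordinary Win32 I/O timeout rather than something specific to the backup component.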

    I spent quite some time trying to find a solution to this problem, initially without a great deal of success. If I removed the server from the protection group in DPM, cleared the backup history, and then re-added the server, I would find that the BMR backups worked once or twice before failing consistently once again. If I ran Windows backup on the protected computer, backing up the BMR to local disk it would also work fine. The same BMR backup to a share on the DPM server also worked consistently. Further digging revealed that the DPM server presents a share to the protected computer for the duration of the backup. Testing revealed that the share was removed from the DPM server before the BMR backup completes, resulting in the backup failing. I attempted various configuration changes to try and resolve the problem, but none of these worked. For completeness, here is what I tried unsuccessfully:

    1. Creating/modifying the following registry values on both the DPM server and the protected server:
    HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\SPP\CreateTimeout, set to decimal 3600000, to increase the VSS timeout.
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\VSS\Settings\IdleTimeout, set to decimal 3600000, to increase the VSS timeout.
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue, changed from decimal 60 to 180.
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\TcpMaxDataRetransmissions, set to decimal 5.
    2. Running “wbadmin delete catalog” on the protected server to clear the backup catalog.
    3. Checking for shadow copies on the protected server using the command “vssadmin list shadows” and removing them.
    4. Increasing the paging file from 2 GB to 16 GB on the DPM server.
    5. Attempting to throttle the bandwidth using the bandwidth throttling in DPM.
    6. Modifying the disk allocation to 100 GB for the backup within DPM.
    7. Running chkdsk /f on the protected server.
    8. Modifying the maximum size of storage allocated to volumes for shadow copies to have no limit at all.

    As I said, none of the above helped at all. What has finally led to us being consistently able to run BMR backups reliably on Windows 2016 servers was throttling the bandwidth between the protected server and the DPM server using a QoS profile on the protected server itself. I’ve read somewhere that the throttling built into DPM does not apply to BMR backups. I’m not sure whether that’s true, but trying that did not help us at all. It’s very simple to create a QoS profile in Windows 2016, and it’s not necessary to install any additional software, roles or features. To do so, simply carry out the following steps:

    1. Start “Local Group Policy Editor” by typing gpedit.msc on the protected computer.
    2. Browse to Local Computer Policy > Computer Configuration > Windows Settings > Policy-based QoS.
    3. Right click and select “Create New Policy”.
    4. Provide a name for the policy, uncheck “Specify DSCP Value” and set an outbound throttle rate (I used 100 MBps successfully).
    5. Select “All applications” (though if your backup server provides other functions, you may wish to restrict this to just the Windows Backup application).
    6. Select “Any source IP address” and enter the IP address of the backup server in the destination field.
    7. Select both the TCP and UDP protocols, and any source/destination port.
    8. The policy takes immediate effect without a requirement to reboot.
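    For anyone who prefers scripting over the GUI, the same kind of throttle can probably be expressed with the New-NetQosPolicy PowerShell cmdlet, which takes its rate in bits per second. The sketch below only builds the command line and does not run it. The policy name and the 192.0.2.10 address are placeholders, the helper names are made up, the conversion assumes 1 MB = 1,000,000 bytes, and a policy created this way may not be stored in the same place as one created through Local Group Policy.

```python
def mbps_to_bits_per_second(megabytes_per_second):
    """Convert megabytes/second to bits/second (assumes 1 MB = 1,000,000 bytes)."""
    return int(megabytes_per_second * 1_000_000 * 8)


def qos_policy_command(name, backup_server_ip, megabytes_per_second):
    """Build a New-NetQosPolicy command line as text; nothing is executed."""
    rate = mbps_to_bits_per_second(megabytes_per_second)
    return (
        f'New-NetQosPolicy -Name "{name}" '
        f'-IPDstPrefixMatchCondition "{backup_server_ip}/32" '
        f"-ThrottleRateActionBitsPerSecond {rate}"
    )


# Placeholder name and address; 100 matches the 100 MBps rate in the steps above.
command = qos_policy_command("Throttle BMR to DPM", "192.0.2.10", 100)
print(command)
```

    Review the generated command before running it in an elevated PowerShell session on the protected server, and check the result with Get-NetQosPolicy.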

    Since applying this policy our BMR backups on Windows 2016 have worked reliably and consistently. Hopefully this is of help to others as well, as this can be quite a frustrating problem!

    Simon Edwins
    Senior IT Security Specialist
    Rothamsted Research

    Thursday, June 6, 2019 3:04 PM