DPM 2016 MBS Performance downward spiral

    Question

  • Hi guys!
    I'm so frustrated that I'm at the point where I'd like to just uninstall everything and go home.
    Why? Since we upgraded to DPM 2016 and switched to MBS, we have had massive problems.

    The system:
    We have 3 independent locations.
    At every location there is one physical machine with Windows Server 2016 and DPM 2016 UR4.
    At 2 locations we have RAID 6 storage attached via Fibre Channel.

    At 1 location we use a Buffalo NAS attached via iSCSI over a 1 Gigabit connection.
    At all 3 locations we also use tapes.

    The problem:
    At all 3 locations, performance gradually degrades. Something that took 15 minutes at the start takes 1 hour and 50 minutes a few weeks later.
    Performance keeps dropping until every single backup takes so long that newly scheduled backups pile up in the queue.
    So after 2-3 months my backups basically stop working.

    The current "Workaround":

    1. Take the Storage offline.
    2. Violently delete it from DPM.
    3. Take the Storage online an reattach it to DPM.
    4. DPM now formats that with REFS.
    5. DPM Shell: dmpsync.exe –ReallocateReplica
    6. Get-DPMProtectionGroup -DPMServerName scdpm | Get-DPMDatasource | Start-DPMDatasourceConsistencyCheck
    7. Wait until everything is finishd.
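
    For reference, steps 3-6 can be run from the DPM Management Shell in one go. This is only a sketch under assumptions: the server name "scdpm" comes from step 6, the volume index and friendly name are placeholders, and the DpmSync.exe path depends on your install location.

        # Show the volumes DPM can see, then add the reattached one (DPM formats it ReFS)
        $vols = Get-DPMDiskStorage -DPMServerName "scdpm" -Volumes
        $vols                                               # inspect the list, pick the right entry
        Add-DPMDiskStorage -Volume $vols[0] -FriendlyName "Rebuilt MBS volume"

        # Reallocate the replica volumes (path is an assumption, adjust to your install)
        & "C:\Program Files\Microsoft System Center 2016\DPM\DPM\bin\DpmSync.exe" -ReallocateReplica

        # Kick off consistency checks for every datasource
        Get-DPMProtectionGroup -DPMServerName "scdpm" |
            Get-DPMDatasource |
            Start-DPMDatasourceConsistencyCheck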

    Backups then work fine for a few weeks, then they mostly work, then they only kind of work, then you start pulling your hair out, and then I start again from zero.

    At the moment it's round number 6.


    I have also had very long conversations with the storage manufacturer, and I'm sure: it's not the storage.
    The problem is ReFS and MBS.

    So I'm not the first one with this problem, but please: has anyone found a real solution for this? If yes, can you explain it to me as simply as possible? Not because I wouldn't understand, but I'm already at the point where I need easy words to be sure there is absolutely nothing I'm doing wrong. ^^
    Oh, and it's not the Windows Defender / antivirus problem.


    Thanks in advance. I will also provide a screenshot where I write to tape from the storage. The first 3 entries are from before my last rebuild and the last one is from today; I rebuilt yesterday.
    I also did not wait as long as usual, so if you think 1:53 is not that much longer than 00:19... if I waited longer it would be 5+ hours, and then 8+ hours, and then...

    https://imgur.com/ejf7bJm

    Thursday, January 25, 2018 12:42 PM

All replies

  • You are not alone. The problem in this case is mostly ReFS, the file system behind MBS. DPM does things with ReFS that fill up the server's RAM and also cause slower backups.

    Things that helped, but didn't solve the problem:

    1. Install the latest cumulative update for Windows Server 2016. Somewhere in there is an updated ReFS driver that helps a bit with the memory consumption. See: https://support.microsoft.com/en-us/help/4016173/fix-heavy-memory-usage-in-refs-on-windows-server-2016-and-windows-10

    2. There is also this KB about the server becoming unresponsive: https://support.microsoft.com/en-us/help/4035951/refs-volume-using-dpm-becomes-unresponsive-on-windows-server-2016

    You can also play around with the registry settings in the two KBs. With these articles I was able to get our server mostly stable. Some backups are still super slow, but at least it doesn't completely freeze anymore.
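
    If you prefer scripting those registry changes over regedit, here is a minimal sketch, assuming the value names from KB4016173/KB4035951; the numbers are illustrative, not official recommendations, and a reboot is needed afterwards:

        # Apply the ReFS tuning values from the two KBs, then reboot
        $fs = 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem'
        New-ItemProperty -Path $fs -Name RefsEnableLargeWorkingSetTrim -PropertyType DWord -Value 1 -Force
        New-ItemProperty -Path $fs -Name RefsNumberOfChunksToTrim      -PropertyType DWord -Value 8 -Force
        New-ItemProperty -Path $fs -Name RefsEnableInlineTrim          -PropertyType DWord -Value 1 -Force
        Restart-Computer -Confirm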

    There is also a lengthy discussion in the Veeam Forums about this: https://forums.veeam.com/veeam-backup-replication-f2/refs-4k-horror-story-t40629-780.html

    According to the Veeam forums, another update for the ReFS problems will be included in the February 2018 CU for Windows Server 2016. Let's hope this fixes it for good. I am tired of seeing a backup of a 6 TB fileserver run for almost 22 hours, with a third of the backups never completing.

    Thursday, January 25, 2018 2:13 PM
  • Hi,

    there is a known bug with ReFS on Server 2016; it is not DPM related.

    We have tested a private fix, and the results have been very positive. From Microsoft we heard that the RTM of this fix should be released at the end of February.

    So the only thing you can do is keep waiting, sorry.


    Michael Seidl (MVP)

    SYSCTR Senior Consultant, Blogger, CEO

    Blog | Twitter | Facebook | LinkedIn | Xing | Youtube

    Note: Posts are provided "AS IS" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

    Friday, January 26, 2018 9:06 AM
  • Another thing that helped, at least in the short term, is to put more RAM into your DPM server and reboot it regularly. Not a permanent fix, but for us some backups run a lot faster, at least for a couple of days.
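
    If you go the regular-reboot route, a scheduled restart outside the backup window saves you from doing it by hand. A minimal sketch; the day and time are assumptions, check that no jobs run in that slot:

        # Restart the DPM server every Sunday at 06:00
        $action  = New-ScheduledTaskAction -Execute 'shutdown.exe' -Argument '/r /t 60'
        $trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 06:00
        Register-ScheduledTask -TaskName 'Weekly DPM reboot' -Action $action -Trigger $trigger -User 'NT AUTHORITY\SYSTEM' -RunLevel Highest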
    Friday, January 26, 2018 12:39 PM
  • Hey guys, Microsoft just released an optional patch that includes a new ReFS driver that reportedly improves performance: https://support.microsoft.com/en-us/help/4074590/windows-10-update-kb4074590

    I'm installing it now and testing it out on my server. I'm going to do the "workaround" to start from scratch after I install it.

    Thanks to the guys over at https://forums.veeam.com/veeam-backup-replication-f2/refs-4k-horror-story-t40629-810.html - I've been following that thread for a while.

    Friday, February 23, 2018 3:56 PM
  • Brilliant - after playing around with those "tunable parameters" I no longer get "vhdmp" error messages in the System log, but the backup itself is as slow and buggy as before.

    Microsoft, fix this issue ASAP!

    Wednesday, February 28, 2018 12:48 PM
  • The fix is here: https://support.microsoft.com/en-us/help/4077525

    Michael Seidl (MVP)

    SYSCTR Senior Consultant, Blogger, CEO

    Blog | Twitter | Facebook | LinkedIn | Xing | Youtube

    Note: Posts are provided "AS IS" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

    Wednesday, February 28, 2018 3:04 PM
  • Thanks @Michael - it seems I can't find that update within SCCM/WSUS right now?
    Has it been pulled (I read about some errors caused by that update)?

    Thursday, March 1, 2018 10:17 AM
  • I have no info about that.

    Michael Seidl (MVP)

    SYSCTR Senior Consultant, Blogger, CEO

    Blog | Twitter | Facebook | LinkedIn | Xing | Youtube

    Note: Posts are provided "AS IS" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

    Thursday, March 1, 2018 12:43 PM
  • Is anyone here using LTO7 tape (SAS connection) and seeing the slowness as well? Would 400 GB per hour with compression be a tad slow, even from a SAS 12 Gbps RAID 6 array?

    The specs on our library state a minimum of 2.2 TB up to 5.4 TB / hour, though.


    Tech, the Universe, Everything: http://tech-stew.com Just Plane Crazy http://flight-stew.com



    Thursday, March 1, 2018 4:36 PM
  • It is an optional update.
    Thursday, March 1, 2018 9:03 PM
  • Totally weird stuff again - none of my 2016 servers finds this update, neither via SCCM/WSUS nor directly via Windows Update (and yes, I checked for optional updates as well). By manually downloading the update from the Microsoft Update Catalog, one can clearly see that the update is only available for Windows 10 systems.

    Nonetheless, I manually installed the update on my 2016 DPM server; at least that worked flawlessly. Now let's see how DPM and MBS behave...

    Friday, March 2, 2018 8:40 AM
  • Unfortunately, I still have problems with BMRs on MBS after this update. After more than a year of waiting and many $$$ spent on Premier support, it would be good to finally get these to work.
    Friday, March 2, 2018 4:35 PM
  • Installed the update linked by Michael - absolutely no difference; backups are slow and unreliable. I know we're a bit old-school, running our disk-to-disk storage on Synology NAS systems connected via iSCSI. But that worked flawlessly with DPM 2012 - it could have been faster (how ironic, compared to the situation now), but it simply worked.

    What I see now is DPM reading at around 10 MB/s when doing tape backups, where it read at ten times that speed on DPM 2012. Write speed drops below 1 MB/s - ridiculous.

    I played around a bit with the Synos, switching LUNs from block to file level, using RAID 10 instead of RAID 5, and so on. No changes to be seen, absolutely no changes.

    I'm now going to open a ticket with MS, but from what I've read in several other threads, I think I already know their answer...

    Monday, March 5, 2018 12:23 PM
  • Good luck with that. Maybe you'll get someone who actually cares about getting these issues resolved, but I doubt it.
    Monday, March 5, 2018 2:21 PM
  • Have you also made the changes to the registry described in the KB article?

    Michael Seidl (MVP)

    SYSCTR Senior Consultant, Blogger, CEO

    Blog | Twitter | Facebook | LinkedIn | Xing | Youtube

    Note: Posts are provided "AS IS" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

    Monday, March 5, 2018 7:38 PM
  • I only tried the timeout registry value, without success. Has anyone had luck with the others? I don't really have time to diagnose all of them, especially if it turns out to be a waste of time. Some guidance from MS would be helpful here. I expect DPM to work without all of this effort, as this is not my job.

    Monday, March 5, 2018 9:24 PM
  • Hi, we are having this issue on about 10 DPM servers in our environment. I've installed the February hotfix that is supposed to solve the performance issues with ReFS, but I see no improvement whatsoever. I've configured one of those servers with the registry changes, but there is no improvement either.

    Tape backups drop below 1 GB/min, where we should be able to achieve 10 GB/min.

    However, it looks like disk-based backups and console responsiveness are better with the hotfix.

    Marc

    Tuesday, March 6, 2018 11:22 AM
  • Hello guys,

    we also have performance problems since we started using DPM 2016.
    Does anybody have a recommendation for the "tunable ReFS settings"?

    The truth about MBS is: 3x slower backups with a (big) pinch of unreliability.

    regards
    Stefan

    Wednesday, March 7, 2018 9:13 AM
  • Hi,

    here are the recommended registry settings from our MS support case.


    Michael Seidl (MVP)

    SYSCTR Senior Consultant, Blogger, CEO

    Blog | Twitter | Facebook | LinkedIn | Xing | Youtube

    Note: Posts are provided "AS IS" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

    Wednesday, March 7, 2018 1:12 PM
  • Michael:

    Did those recommended settings work? I've had many recommendations from various Microsoft support cases over more than a year, and none have worked. I'm highly skeptical about wasting more time.

    Thanks.


    Wednesday, March 7, 2018 2:25 PM
  • At the moment we are not seeing any trouble, but the customer related to our support case hasn't been given the official fix yet, because we haven't had time so far.

    Michael Seidl (MVP)

    SYSCTR Senior Consultant, Blogger, CEO

    Blog | Twitter | Facebook | LinkedIn | Xing | Youtube

    Note: Posts are provided "AS IS" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

    Wednesday, March 7, 2018 4:28 PM
  • Hi,

    Good news, maybe.

    I'm not sure if you have a fix yet, but I experienced the same issues: it worked fine for a while, but then backups started to hang, finally to the point where I had to delete the storage and re-create it. This put us at risk with our clients, as we had agreements for 28 days of retention. I logged a case with Microsoft: ReFS is the issue.

    First suggestion: install KB4077525 - didn't make any difference.

    Second suggestion: apply the registry keys for optimal performance, these being:

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
    "RefsEnableLargeWorkingSetTrim"=dword:00000001
    "RefsNumberOfChunksToTrim"=dword:00000032
    "RefsDisableCachedPins"=dword:00000001
    "RefsProcessedDeleteQueueEntryCountThreshold"=dword:00002048
    "RefsEnableInlineTrim"=dword:00000001

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Configuration\DiskStorage]
    "DuplicateExtentBatchSizeinMB"=dword:00000100

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk]
    "TimeOutValue"=dword:00000120

    This did improve the backup speed, but it was still not good enough; backups went from finishing at 09:00 to 07:00.

    Microsoft then stated that it was a known issue and that the ReFS storage team is working on a fix.

    Now the good part, well, for me: going back to basics, I went back to the start and checked all the config. I found that the RAID card was set to 'Write Through'; I changed this to 'Write Back'. Magic... a backup which was taking 8-9 hours to complete now completes in 20-40 minutes.

    I made the change 2 days ago and it is working fine now. Obviously the issues could come back, but for now I'm happy.

    John


    Friday, March 9, 2018 9:16 AM
  • Hi JRH81,

    Did you only change the setting, or did you recreate the volume as well? Some RAID controllers switch from Write Back to Write Through if they detect an issue with the battery.

    Our servers are already on Write Back, so that is not a solution for us.

    Marc

    Friday, March 9, 2018 9:29 AM
  • Hi Marc,

    Just changed the RAID setting; the battery is fine. I think it was simply missed during the initial config.

    Sorry to hear it hasn't helped. It could be a temporary solution, and I could be back to square one next week.

    John

    Friday, March 9, 2018 9:35 AM
  • Installed the mentioned updates and also applied the reg keys, but nothing helps. The backups just keep getting slower and slower... The DPM server is only using 10% CPU and 20% memory, so that is not the problem. Has anybody tried DPM 1801 yet to see if it brings any improvements?

    Monday, March 12, 2018 11:32 AM
  • Hi,

    as this is related to ReFS and not to DPM, 1801 will not change anything.


    Michael Seidl (MVP)

    SYSCTR Senior Consultant, Blogger, CEO

    Blog | Twitter | Facebook | LinkedIn | Xing | Youtube

    Note: Posts are provided "AS IS" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

    Monday, March 12, 2018 1:00 PM
  • OK, thanks for the reply. Has anyone contacted Microsoft regarding this issue? Tomorrow the new patches will be released; is there maybe a new update for this?
    Monday, March 12, 2018 1:51 PM
  • We were having an issue with 2 DPM 2016 servers at 2 different sites where, during a scheduled backup or online recovery point upload, the jobs would hang, and when you tried to log in to the server all you got was a black screen - resetting the machine was the only way to get it back. Installed the February 2018 updates and it helped a little, but eventually the same thing would happen again after a few days. Played with the registry settings a bit - not really a lot of science here, just increasing values.

    The DPM server has 4 cores and 64 GB of RAM and never shows any signs of being stressed at all, but it seems like ReFS must do something with the memory that doesn't show up in the normal tools.

    Right now I have the registry settings below, and I've had successful runs (knock on wood) for the past week straight, which is the longest it's gone in a while without intervention:

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
    "RefsDisableLastAccessUpdate"=dword:00000001
    "RefsEnableInlineTrim"=dword:00000001
    "RefsDisableCachedPins"=dword:00000001
    "RefsProcessedDeleteQueueEntryCountThreshold"=dword:00010000
    "RefsNumberOfChunksToTrim"=dword:00000020
    "RefsEnableLargeWorkingSetTrim"=dword:00000001

    Hope this helps...

    Monday, March 12, 2018 2:09 PM
  • Hi,

    ReFS does not like many "cheap" RAID controllers (including those in SAN storage) and the way they behave.

    I would be really interested in the target storage you guys are using.

    With high-end storage we do not experience those massive impacts.

    Furthermore, DPM 2016 is even more picky about shared storage than 2012 R2, so it does not like it when you share the backend disks (not LUNs!) with another DPM server or other servers.
    But that depends on the storage, of course.

    There is a difference between a 500k NetApp FAS and a QNAP or Synology.

    regards

    /bkpfast


    My postings are provided "AS IS" with no warranties and confer no rights

    Tuesday, March 13, 2018 10:55 AM
  • Tried all the reg keys, but no luck.

    We are not using any RAID controllers. We have made a storage pool from the physical disks in Windows itself. It is a physical server, so no shared storage. The disk cabinets are connected with SAS HBAs.

    So still very poor performance.



    Wednesday, March 14, 2018 12:20 PM
  • That's us too. No RAID but JBOD configured as Windows storage pools.
    Wednesday, March 14, 2018 2:44 PM
  • I have to revise my last entry, at least a little bit:
    Changing the LUNs on my Synologys from block level to file level seems to do part of the trick; read/write speed has indeed increased. According to various advice in forums on the net, only file-level LUNs support caching, whereas block-level LUNs don't.

    It did not make any difference on DPM 2012 with NTFS-formatted storage, but for DPM 2016 using ReFS it obviously does. At least from what I've seen during the last two weeks while changing the LUNs (not completed yet), this seems to solve the issues with disk-to-disk backups.

    Nonetheless, read speed while doing tape backups is still slow as hell. Maybe that also gets solved once all LUNs are file-based LUNs...

    Thursday, March 15, 2018 7:43 AM
  • Hi,

    First of all, congratulations bkpfast on your expensive storage.

    I don't read anywhere in the hardware requirements of DPM that we need an all-flash array to get decent performance. And it also does not explain the fact that if we destroy the filesystem and recreate all replicas, the same system is up to 20-25 times faster and DOES perform at a decent speed, that is, 10 GB/min tape backups to LTO6.

    Only after several weeks/months (or a certain number of iterations of the tape backup) do we see a severe regression in tape backup speed. This is what the initial poster of this thread is reporting, btw.

    For the record, we have a mix of Dell T620 and T630 servers running DPM. They all have a PERC RAID controller with 1 GB cache.

    Marc

    Thursday, March 15, 2018 8:34 AM
  • hi all,

    does a solution exist for the slow backups on DPM 2016 UR4? I have installed the 2018-02 update, but it had no effect.

    OS: Windows Server 2016

    DPM 2016 UR4

    The backup storage is a Dell SC2000 SAN...

    Is the solution to create an NTFS volume and back up to that?


    Falcon

    Sunday, March 18, 2018 1:25 PM
  • Hi all.

    I have the same problem with DPM 2016 on ReFS. Installing KB4077525 did not help, and neither did the registry settings. DPM 2012 R2 had no performance problem on this server and storage. The DPM storage is an IBM Storwize V3700 attached via SAS.

    I think I'll start searching for another backup solution for our company.

    Tuesday, March 20, 2018 8:46 AM
  • Hi guys,

    I think we are seeing some improvements after applying KB4077525.
    The registry settings were already applied.
    Still, the backups are not 100% reliable.

    We are using direct-attached 3.5" 7200 rpm disks in RAID 6 (HP RAID controller).

    regards
    Stefan

    Tuesday, March 20, 2018 9:58 AM
  • What do you use? Is the storage connected to a storage pool which is then added to DPM, or do you add a volume directly with Disk Manager and connect it to DPM?

    Falcon

    Tuesday, March 20, 2018 1:33 PM
  • Hi,

    we have 3 disk enclosures (each enclosure with 12x SATA/midline SAS drives, arranged as one RAID 6 array), resulting in 3 LUNs on the RAID controller.

    I've added these 3 LUNs to one storage pool and created one big vdisk.

    The big vdisk was added to DPM.

    regards

    Tuesday, March 20, 2018 1:50 PM
  • Hi guys,

    today saw a little miracle: all backups finished with no errors and in an acceptable-to-fast time.
    A 10 TB file server recovery point in around 30 minutes.

    Could it be that the update does some secret optimization in the background?

    regards

    Stefan

    Wednesday, March 21, 2018 6:58 AM
  • Applied last week's patch (KB4088787); still no performance improvement. After 17 hours, only 28 GB copied.
    Thursday, March 22, 2018 7:02 AM
  • Applied last week's patch (KB4088787); still no performance improvement. After 17 hours, only 28 GB copied.

    You are talking about tape performance, right?

    Thursday, March 22, 2018 7:10 AM
  • I'm sorry, no, just a simple disk-to-disk recovery point creation.
    Thursday, March 22, 2018 7:23 AM
  • We've been having exactly the same problem.

    Initially we see great performance. The disks sit at around 1.2 GB/s transfer speed during replica creation, and then slowly, as the weeks go by, performance gets worse and worse, until tape jobs that normally finish before we get into work on Monday are finishing Thursday morning!

    I've been following the Veeam thread, have all the patches, and have tried all the registry keys. The memory issue has gone away, but ReFS performance still degrades over time.

    From what I can see, it appears to be a fragmentation issue. Initially most of the reading and writing is sequential, but as time goes on it gets more and more random, which is why performance drops. Blowing the disk away and recreating the replicas lays everything out nicely for sequential access again. An allocate-on-write side effect or something...


    Thursday, March 22, 2018 9:24 AM
  • For us, the latest Windows Server patches fixed most of the performance issues we had. DPM still uses a lot more memory than the system requirements suggest, but at least the backups are stable now. We had a fileserver protection point that ran for ~36 hours; it's down to around 3 hours now.
    Thursday, March 22, 2018 2:32 PM
  • Well, today I installed the March updates (KB4088787, KB4089510) and did a reboot.

    Now the problem is back: a 500 GB recovery point now takes over 3 hours (it was down to 30 minutes).

    Now the question is: did the update change something, or does ReFS get faster with longer uptime?

    I will monitor this again after the weekend.

    Regards

    Stefan

    Friday, March 30, 2018 1:12 PM
  • I mean that after a restart the situation is better, but after a few days.....

    Falcon

    Monday, April 2, 2018 8:07 PM
  • After the March updates we are also seeing slower backups after a couple of days. Not nearly as bad as before, but still noticeable. My guess is that ReFS is still using too much memory on our really large backups (a 6 TB fileserver) and that eventually slows things down a bit. But for us it's manageable; I am just happy that our server doesn't completely lock up anymore.
    Friday, April 6, 2018 1:13 PM
  • Months have passed, situation still unchanged. Installed the May updates, DPM UR5, ... - nothing has changed: disk-to-disk backups take forever, tape backups never finish, the whole system gets blocked...

    Microsoft, get these issues sorted out IMMEDIATELY!

    Monday, May 14, 2018 6:37 AM
  • Hello Joerg,

    we also had very big performance problems with DPM and Modern Backup Storage.

    But since the ReFS driver was updated (March update, if I remember correctly), the performance is now pretty good.

    I don't know if this observation is correct, but to me it seems that after a reboot the performance is slower, and after 1 or 2 days the performance is very satisfying for us.

    We are using DAS storage (local RAID 6 arrays with an HP RAID controller). So maybe there's something special about your iSCSI setup.

    Do you have the possibility to do a test backup to a local disk? If the local backup performs better, you know that you need to dig into your iSCSI setup.
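
    To take DPM out of the equation for that comparison, you can measure raw throughput on both targets with Microsoft's DiskSpd (https://github.com/microsoft/diskspd). A sketch; drive letters and parameters are assumptions (64K sequential writes, 60 seconds, caching disabled):

        .\diskspd.exe -c10G -d60 -b64K -o8 -t4 -w100 -Sh D:\dpmtest.dat    # local disk
        .\diskspd.exe -c10G -d60 -b64K -o8 -t4 -w100 -Sh E:\dpmtest.dat    # iSCSI LUN
        # If the local volume is dramatically faster, dig into the iSCSI path;
        # if both degrade over time, it points back at ReFS/MBS.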

    regards

    Stefan

     

    Monday, May 14, 2018 7:30 AM
  • Thanks Stefan - we see the opposite behaviour: after a reboot the system runs pretty well and then gets slower and slower over days until it gets stuck completely. Had > 500 recovery points in the queue this morning, where > 400 were in a pending state and around 100 were "In Progress" but not doing anything (network usage on my DPM server < 1 Mbps).

    Nonetheless I will (again) try your suggestion and use a local disk as a target...
    It can't be a "general iSCSI issue", as jobs work when newly created and only start getting slow after two or three weeks. From my point of view there still is a big, big bug in the implementation of ReFS.

    Maybe you can give me some details on your setup?
    Is the DPM database on the same machine? How much RAM is it allowed to use?
    How much RAM is installed in your system in total?
    Stuff like that... Thx. ;-)



    Monday, May 14, 2018 8:08 AM
  • Horrible performance for us too - much, much worse than 2012 R2. We're not using iSCSI. We are using JBOD configurations with SATA3 in Windows storage pools for MBS. We have 12 physical cores (not using a VM anymore because performance was even worse) and 80 GB RAM in each box - much more hardware than for 2012 R2, for much less performance.

    While all types of backups are slower and more unreliable, it is almost impossible to get BMRs to work.

    Monday, May 14, 2018 1:31 PM
  • We have the same trouble, DPM 1801 connected to SAN storage... in the system event log is:
    The description for Event ID 129 from source vhdmp cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
    If the event originated on another computer, the display information had to be saved with the event.
    The following information was included with the event:
    \Device\RaidPort4.....

    Falcon

    Monday, May 14, 2018 3:31 PM
  • Still the same here. Horrendous performance.

    We've just bought a new server to try to get things moving again.

    We'll be looking at alternative solutions soon, as DPM is almost useless now.

    Monday, May 21, 2018 6:55 PM
  • Good morning. DPM 1801 on W2016 here as well, backing up to local VHD. Performance is horrible, and now we cannot even remove protection. The MMC will sit for days on "Processing" trying to remove protection from a very small backup set. I've tried all the fixes mentioned throughout this thread. Any updates from anyone else?
    Wednesday, May 23, 2018 12:52 PM
  • Hey guys,

    how much memory does your server have?

    What I forgot to mention: when the problems with our DPM 2016 started, I upgraded the RAM a few times.

    With DPM 2012 R2 we were running on 16 GB of RAM.
    Now the same server is running DPM 2016 with 128 GB.

    Maybe that's the reason why my server runs well.

    regards
    Stefan

    Wednesday, May 23, 2018 1:54 PM
  • Update on our end. Our DPM server is a Hyper-V VM. I found an obscure article about poor performance when "Enable Virtual Machine Queue" is checked on certain servers with certain NICs. UNCHECKING this box in Hyper-V fixed all our issues. Our DPM server is now running better than it ever has.

    I will mention that I had also tried nearly all (if not all) of the suggestions in this thread, so it is possible that a combination of these things resolved our issue.
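
    For anyone who wants to check this from PowerShell on the Hyper-V host instead of the GUI, a small sketch (the adapter name is an assumption, and disabling VMQ briefly interrupts traffic on that NIC):

        Get-NetAdapterVmq | Where-Object Enabled        # which physical NICs have VMQ on
        Disable-NetAdapterVmq -Name 'Ethernet 1'        # same effect as unchecking the box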
    Thursday, May 24, 2018 5:18 PM
  • Would you mind sharing the link you found with the suggested fix of unchecking "enable virtual machine queue"?
    Thursday, May 24, 2018 9:27 PM
  • We're running Dell servers and, guess what, they are using Broadcom NICs that are affected by this VMQ bug... We never had an issue with that before, so I'm wondering if it really can be crucial now, after migrating DPM from 2012 R2 to 2016. Nonetheless, I will give it a try...

    Here's some stuff to read:
    https://www.dell.com/support/article/de/de/debsdt1/sln132131/windows-server-slow-network-performance-on-hyper-v-virtual-machines-with-virtual-machine-queue-vmq-enabled?lang=en

    Don't go too deep into the firmware/driver versions; there are other threads across the net stating this also happens with different Broadcom NICs that already have the latest firmware and drivers.

    And here's what MS has to say:
    https://support.microsoft.com/en-us/help/2902166/poor-network-performance-on-virtual-machines-on-a-windows-server-2012

    Cheers,
    Joerg

    Tuesday, May 29, 2018 12:31 PM
  • Having the performance issues myself, and I find it unfathomable how bad it is.

    I'm using a Dell MD3060e with redundant 12 Gbps SAS HBAs and 20x 8 TB drives. The server is physical, with 160 GB of RAM and 2 CPUs with 16 cores, and the storage is set up as a Windows Storage Spaces pool. This server is well equipped for this task. I'm also on the 2018-06 cumulative update and have applied the registry entries listed in the previous posts' KBs. The server was also rebooted yesterday to test performance again.

    My Hyper-V replicas (as an example) are taking forever to back up. Currently I have a job that has taken 16.5 hours, and the backup snapshot size is only 70 GB (verified by checking the AVHD created during the volume snapshot). I just manually copied the snapshot AVHD over to my DPM server, and it took about 15 minutes (not bad over a gigabit network), so I know it's not a network issue.

    My ReFS volume currently has a queue length of 100. I can't say I've ever seen it that high when the same storage space was formatted as NTFS for the legacy storage setup, so I'll add another confirmation that it's definitely a ReFS problem.

    I have tried everything I can think of, so once again it's back to waiting on Microsoft (and for my backups to complete... maybe by Sunday?). I really don't want to roll back, because despite the performance issues, the space used by replicas in MBS is 1/3 to 1/2 of what it was with legacy storage replicas.

    Wednesday, June 20, 2018 3:16 PM
  • We have applied all the reg fixes mentioned here, all patches, and have even contacted Microsoft tech support. Even worse, when you do a VM backup using VMware backups in DPM, it only does 1 VM at a time. On a daily basis we spend so much time just tweaking DPM to make sure it will run as expected. We are still waiting for Microsoft tech support to get back to us with a fix for the VMware backup issues and the overall slowness. It's been about 3 months and still no updates from the tech support team. DPM, in my opinion, has just gotten worse. It's the worst backup software I have ever experienced. At this point it looks like robocopy would do better at getting daily backups than this bug-ridden DPM.

    Ishan

    Thursday, June 28, 2018 1:16 PM
  • Similar problem here. DPM 1801 with MBS on WS 2016, running in Azure IaaS. A straight file copy between DPM and the file server completes with the expected IOPS and throughput for the VM size (300 MB/s avg). DPM backups get 10 MB/s avg.

    It doesn't seem like ReFS is the culprit, since both my file server and the DPM volume are formatted ReFS, and like I said, I can copy content just using Windows Explorer and performance is great. It's only the DPM backups that have problems.

    Wednesday, July 11, 2018 3:07 AM
  • OK, here's a little write-up that seems to point in the right direction:
    After adding some more RAM to my physical DPM 2016 server, we now have 128 GB available - indeed, just increasing the RAM solved some of the problems, but not all of them.

    We're using Synology NAS systems (3 at this time) as short-term storage, and back with DPM 2012 it was fine (and recommended by Syno users around the globe) to use block-level LUNs on the NAS, connect them via iSCSI (with MPIO, for sure) to DPM, and just add these NTFS volumes to DPM.

    Things (kind of?) changed with DPM 2016/ReFS... My friend "Trial & Error" and I have now reconfigured two LUNs on one of the Synos from block-level to file-based LUNs with advanced features (guess what: with DSM 6.2 and up, Synology stripped out block-level LUNs; you can only use file-based LUNs now); in fact, I just killed and re-created the LUNs after doing the update to DSM 6.2.

    While doing Exchange backups from my DAG, it is now more than obvious that the "old" block-level LUNs (even on updated Synos) are the major issue: after re-creating the LUNs, I removed two Exchange DBs from the PG and re-added them using the new LUNs as the storage target. Now, while the backup runs, DBs located on the new LUNs are backed up within minutes, whereas DBs on the old LUNs stay at 0 MB transferred for an hour or even longer.

    The same goes for Hyper-V VM backups from my Hyper-V clusters, and I guess it will be the same for all other backup types as well. The main difference I can see here: the ability to read and write simultaneously is pretty bad on the old block-level LUNs, whereas it gets dramatically better on the new file-based LUNs.

    For the new LUNs I also decided to set the volumes up as NTFS volumes and then put VHDX files on those NTFS volumes. After that, I created a storage pool using those VHDX files, basically following Charbel's instructions available here: https://charbelnemnom.com/2016/10/how-to-reduce-dpm-2016-storage-consumption-by-enabling-deduplication-on-modern-backup-storage/

    So we're still on our way, but for now it looks promising... ;-)
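
    For readers who want the shape of that layout without following the full article, a very rough sketch (New-VHD/Mount-VHD need the Hyper-V PowerShell module; paths, sizes and names are assumptions):

        # Fixed VHDX on the NTFS volume that sits on the iSCSI LUN
        New-VHD -Path 'N:\DPM\mbs01.vhdx' -SizeBytes 4TB -Fixed
        Mount-VHD -Path 'N:\DPM\mbs01.vhdx'

        # The mounted VHDX appears as a poolable disk; build the pool DPM will use
        $disks = Get-PhysicalDisk -CanPool $true
        New-StoragePool -FriendlyName 'DPM-MBS' -StorageSubSystemFriendlyName 'Windows Storage*' -PhysicalDisks $disks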

    Thursday, July 12, 2018 8:56 AM
  • Hi Cory,

    I wouldn't jump to the conclusion that this is not a ReFS problem; you cannot compare a file copy job with the way DPM syncs data, because DPM uses the copy-on-write feature in ReFS, which could be the cause of the problem.

    A DPM support engineer confirmed to me that the issue is with ReFS and that the Windows team is supposed to fix this. I wonder what is taking them so long to do so.

    Best regards,

    Marc

    Friday, July 13, 2018 8:27 AM
  • Hi Joerg,

    The thing is (see the initial post of this thread) that when you (re-)create your backup storage, backups run fast, but performance degrades over time.

    So you really have to watch this over a period of one or two months from the time you (re-)created the ReFS volume in DPM.

    Marc

    Friday, July 13, 2018 8:32 AM
  • I agree with Marc, because we have a case open with Microsoft as well, and the support engineer told us that ReFS is the issue; the ReFS team is aware of this and is trying to fix it.

    Ishan

    Friday, July 13, 2018 1:35 PM
  • I had several support cases open with Microsoft using a Premier account on DPM 2016. Some lasted almost 2 years, starting near the time of its initial release. I got little for the cost of those cases and the cost of my time. Don't expect much.
    Friday, July 13, 2018 1:55 PM
  • Simdoc,

    Yup, I have the same feeling. It's been 3 months and they haven't even updated me. Not even anything positive, like "hey, we might have a fix". I don't have any expectations of them. It's getting to the point that we might drop DPM if their new release doesn't have all the fixes. It looks like DPM needs a dedicated admin who is hired only to babysit DPM all day.


    Ishan

    Friday, July 13, 2018 2:28 PM
  • Hi Guys

    I just wanted to add that I have been following this post closely for a while to help us out with similar issues.

    We have a number of backup servers running Veeam with Synology iSCSI LUNs of 50 TB+. We upgraded them to Windows Server 2016 with ReFS, and backups ran so slowly that they couldn't complete.

    We changed the config on the Synology away from normal iSCSI block LUNs and ended up formatting our LUNs as file-level EXT4 on the Synology side and NTFS on the Windows side. We have also downgraded one of the backup servers that we had upgraded to Server 2016 back to 2012 R2, as we got much better performance on 2012 R2.

    The downgrade to 2012 R2 was a nightmare, because 2012 R2 can't read a ReFS drive that was formatted by 2016.

    Good luck with ReFS; I'm not going to touch it for at least a couple of years.

    Andrew

    Thursday, August 16, 2018 6:14 AM
  • Hi guys,

    after several weeks of running DPM 2016 connected to a Synology with volume-based LUNs using advanced LUN features, I can't see any performance issues anymore. But be aware that you should use storage pools as described in Charbel's instructions (see my previous post for the link) - if you just mount the LUN directly into DPM, it may lead to data loss in case your Synology gets a hiccup on the iSCSI connections (I suffered from that; ReFS was not capable of repairing the file system, and I lost around 15 TB of data).

    I just skipped all the dedup steps Charbel describes, and since then read and write performance are no longer an issue; my tape backups also no longer get blocked by running disk-to-disk jobs.

    To sum up:
    - increase the memory in your DPM machine; we went from 64 GB to 128 GB
    - use "intelligent" LUNs on your storage
    - mount your LUNs using storage pools on Server 2016

    Having solved this, we can now take care of the "real" problems, like MSDPM crashing around midnight, or not being able to add new VMs to my protection group when the VMs are located on a 2016 Hyper-V cluster running configuration version 8... Sigh...

    Thursday, September 6, 2018 12:26 PM
  • Hi,

    performance is an issue with MBS - we mostly see it while migrating replicas / recovery points.

    But some other ReFS-related issues in our environments disappeared after we switched from mount points to drive letters.

    Perhaps it helps in other environments, too.
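
    In case it helps someone, a minimal sketch of that switch, assuming the volume currently hangs off a folder mount point and the letter X: is free (disk/partition numbers are placeholders):

        Remove-PartitionAccessPath -DiskNumber 3 -PartitionNumber 2 -AccessPath 'C:\MBS\vol01'
        Add-PartitionAccessPath    -DiskNumber 3 -PartitionNumber 2 -AccessPath 'X:'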

    regards

    /bkpfast


    My postings are provided "AS IS" with no warranties and confer no rights

    Thursday, October 18, 2018 3:23 PM
  • Hello guys, I have experienced the same issues everyone else here has, but I was able to resolve them completely. The only thing I can tell that I have different from you is a registry entry I made in my tunable parameters section. I don't see where anyone in this thread has disabled the DPM storage calculation. This has singularly had a major impact on my DPM performance. I am pasting all my parameters below, but pay particular attention to the "DisableReFSStorageComputation"="1" entry.

    This will break the computed storage size value as reported on your protection group status detail screen, but I didn't care, as long as DPM functioned without my constant intervention. Quickly, my setup: the DPM server is a dedicated physical box with DAS - 8 cores, 80 GB RAM, 12x 7.2K SAS2 4 TB drives in RAID 10, running DPM 1801 on the latest Server 2016 build.

    You will notice, after making this entry and rebooting DPM, that the long delay at the start of a job - where your disk queue length shoots up and DPM is seemingly processing the meaning of life for 20 minutes before transferring the first byte of data across the wire - goes away. Good luck, everybody.

    Windows Registry Editor Version 5.00
    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk]
    "TimeOutValue"=dword:00000078
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Data Protection Manager\Configuration\DiskStorage]
    "DuplicateExtentBatchSizeinMB"=dword:00000064
    "DisableReFSStorageComputation"="1"
    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
    "RefsDisableLastAccessUpdate"=dword:00000001
    "RefsEnableLargeWorkingSetTrim"=dword:00000001
    "RefsNumberOfChunksToTrim"=dword:00000020
    "RefsDisableCachedPins"=dword:00000001
    "RefsProcessedDeleteQueueEntryCountThreshold"=dword:00000200
    "RefsEnableInlineTrim"=dword:00000001


    -Jason

    Sunday, November 11, 2018 4:29 PM
  • I had several support cases open with Microsoft using a Premier account on DPM 2016. Some lasted almost 2 years, starting near the time of its initial release. I got little for the cost of those cases and the cost of my time. Don't expect much.

    Complain to your account manager and get your money back. This is a known issue, per this thread and the Veeam forums. I just got directed here finally after another search last night. The Veeam forums were always the only discussion I saw on Google about DPM 2016 and slowness.
    Thursday, December 13, 2018 5:44 PM
  • Hello guys, I have experienced the same issues everyone else here has, but I was able to resolve them completely. [...] (quoting Jason's full post above)
    I made the DisableReFSStorageComputation change per your post. I did notice it starting the data transfer portion faster, but my recovery points are still taking hours.
    Thursday, December 13, 2018 5:46 PM
  • In perfmon, what does the queue length look like for your DPM logical disk while the problem recovery point is running?
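
    If you'd rather sample it from PowerShell than perfmon, something like this works (the instance wildcard is an assumption; narrow it to your DPM volume):

        # One sample every 5 seconds for a minute while the recovery point runs
        Get-Counter -Counter '\PhysicalDisk(*)\Avg. Disk Queue Length' -SampleInterval 5 -MaxSamples 12 |
            Select-Object -ExpandProperty CounterSamples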

    I am not having any further issues with regard to performance. DPM for me is back to the pre-2016 days before ReFS. 

    This is just a screen grab from my completed jobs list for today. Everything completes in a reasonable time for me. A mix of SQL DBs and VMs here.


    -Jason

    Thursday, December 13, 2018 6:33 PM
  • @JVorbeck


    We also "solved" the issue by throwing more disk I/O at it.

    14 instead of 8 spindles; the windows file cache appears to hang the entire system flushing to disk when network I/O exceeds disk I/O and buffers are full, then written to disk. It causes everything to freeze until finished.

    Thursday, December 13, 2018 6:35 PM
  • I ran into the exact same issue you describe, but I began with this same configuration. The registry changes, operating system updates, and software updates culminated in the resolution for me. I didn't have to throw anything extra at it, but then I started with a larger I/O subsystem than you.

    With all the new features that 2016 and ReFS bring to the table in DPM, I am not one bit surprised that the I/O requirements are a lot greater than in previous versions. There's a ton of work going on to cram all that data into a much tighter space than before, and a lot faster. The setup I am working with saturates my 10 GbE NIC at about 8 Gbps while backup sets are running. 2012 R2 was lucky to hit 2 Gbps.


    -Jason

    Thursday, December 13, 2018 6:50 PM
  • The problem is exacerbated if disk-to-tape backups start at the same time that recovery points are slamming the disks.

    Regardless of the actual I/O or cramming more into less space, I firmly consider it faulty OS behavior not to throttle the buffering accordingly and maintain a usable system. Instead, the OS lets this turn into a runaway condition until the system basically freezes.

    Another indication that you're suffering from this issue is a lot of events with ID 51: "An error was detected on device \Device\Harddisk<something>\<something> during a paging operation."

    Thursday, December 13, 2018 6:57 PM
  • You'll get no argument from me. They should never have released this in that state. It's almost 2019, and mine has only been happy for a few months. Talk about an easy thing to catch in dev. It's like they didn't even try it before they went stamping DVDs.

    -Jason

    Thursday, December 13, 2018 7:52 PM
  • Hi guys,

    our DPM servers are also running fine now.

    If somebody still has problems after adding the registry keys, I would suggest trying more RAM.

    Not just more - really, really, really a hell of a lot more RAM.

    5 to 10 times the RAM you had without ReFS.

    Our old DPM 2012 was running with 16 GB -> after the migration to DPM 2016 I ended up with 128 GB.
    Our new backup server (DPM 2016) is now running with 192 GB.

    regards

    Stefan

    Friday, December 14, 2018 12:25 PM
  • We run fine with even 8 GB RAM now.

    We don't have infinite hardware to throw at DPM and other Windows loads when Bacula, for the non-Windows loads, chugs along happily with only 4 GB and causes only 0.1% of the headache...


    Friday, December 14, 2018 12:47 PM

  • Regardless of the actual I/O or cramming more into less space, I firmly consider it faulty OS behavior not to throttle the buffering accordingly and maintain a usable system. Instead, the OS lets this turn into a runaway condition until the system basically freezes.

    That's why it got fixed through patches to the Windows Server OS. It just took them a couple of years to fix things...

    My guess is that they never did long-term tests before DPM 2016 was released. Not sure they did much testing on DPM 2016 anyway; it was pretty buggy on release.

    Friday, December 14, 2018 1:12 PM
  • That's why it got fixed through patches to the Windows Server OS. It just took them a couple of years to fix things...

    Huh? In what version / with what patch?

    I can deterministically provoke this behavior in Server 2016 with KB4478877 (the 2018-11 cumulative update) and DPM 2016 UR6 or DPM 1807.

    Mind you also that more RAM / more disk I/O only masks and works around the problem, because either the buffers never reach the flushing point before the backup is finished, and/or the disks can write the data away fast enough.

    The actual underlying faulty behavior is not resolved.

    Friday, December 14, 2018 3:33 PM
  • I totally agree with Jason. It's a very weak product that needs a lot of improvement. We have new random issues, like when we try to attach a new server in DPM, you can't see which servers you selected. All you see is a blank menu, so you just have to guess and hope you selected the correct ones.

    With the whole push for new features more frequently now, their product quality has gone down so much.


    Ishan

    Friday, December 14, 2018 4:20 PM
  • "With the whole push for new features more frequently now, their product quality has gone down so much."  I wish I could say this statement only applied to DPM, but unfortunately many other products are following the same trend.
    Friday, December 14, 2018 5:30 PM
  • Has anyone solved this problem without clearing all backups?

    Tuesday, January 22, 2019 9:51 AM
  • Hi

    I tried all the settings. Nothing helped. Then I checked the performance of the disks, and you can see the problem!

    [Screenshot: disk read and write with slow performance]

    [Screenshot: disk performance after recreating the backup volume with NTFS]

    [Screenshot: disk performance after the first backup that night]

    I have not changed any settings on the RAID controller. We have a big SAS disk enclosure from Dell with 50 TB of disk space.

    This morning the backup was finished before I came into the office. I'm hoping for the fix in DPM 2019.


    Roendi

    Friday, January 25, 2019 7:34 AM
  • Hi Roendi,

    This is exactly what we have been seeing for the past two years: disk-based backups show a serious regression over time, just as the original poster of this thread reported.

    I'm managing several DPM 2016 servers with MBS and have this issue as well; every 3-6 months we have to reset the DPM disk and recreate all replicas. Of course this is not optimal, because we also lose all disk-based recovery points. After the disk reset, our tape backups go from 6 days to 1.5 days, and the avg. disk queue length perfmon counters drop by a factor of 50-100!

    What bothers me the most is that this has been a known issue for years now (let's say since DPM 2016 came out). I've talked to several DPM support engineers who all tell the same story: it's a ReFS problem in Windows, and the Windows team has to fix it. Well, Microsoft, you know the problem, you know many people are struggling with it, FIX IT! But it is very clear that isn't going to happen.

    So sad,

    Marc

    Friday, January 25, 2019 8:02 AM
  • Exactly the same situation here. We're going to have to reset the disk again shortly, as our servers are beginning to crawl. It's very frustrating.

    I think we'll probably jump ship to Veeam and avoid ReFS soon. I was hoping it would get fixed, but clearly not.

    Friday, January 25, 2019 11:44 AM
  • Hi DJL,

    DPM 2019 is set to release in Q1 2019, and one of its highlighted features is that Modern Backup Storage (aka disk-based backups on ReFS) actually works (or, as MS states, they "made improvements to MBS"). I haven't tested how accurate that statement is, and of course we will only know after having 2019 in production for a couple of months, but it might be enough to keep you on board.

    I have evangelized DPM since DPM 2007 - I was on the TAP program for 2007 and 2010 - but if it doesn't get fixed this time, I'm out.

    Marc

    Friday, January 25, 2019 1:04 PM
  • We are already looking into moving to some other backup solution. This product is a waste of time. DPM VM backup only backs up 1 VM at a time unless you create multiple protection groups. Extremely inefficient. Not sure what the DPM engineers were thinking when they allowed such a limitation. I'm not expecting much from the new version that will be released in Q1.

    Ishan

    Friday, January 25, 2019 3:16 PM
  • We are waiting for DPM 2019 also.

    If this doesn't work out, we will abandon DPM altogether this year for something that actually works. All day, every day, someone has to keep resuming the failed sync jobs.

    Monday, February 11, 2019 12:17 PM
  • Last Friday I ran an in-place upgrade of my performance-affected Win2016/DPM2016/MBS server to Win2019/DPM2019, and there is absolutely no improvement in MBS performance. If anyone from Microsoft is reading this, I strongly suggest enabling an option in DPM 2016 and DPM 2019 to revert to "Legacy Backup Storage" (NTFS-based dynamic volumes) instead of Modern Backup Storage.

    Hey, LBS still works when you upgrade a DPM 2012 R2 server to 2016, so this is not a technical restriction. Reverting to LBS is therefore the quickest way to get our backup systems to an acceptable stability level, rather than waiting for the Windows team to fix ReFS, something they have now been trying to do for the last 3 years...

    Marc

    Sunday, March 17, 2019 10:20 AM
  • Hi,

    If this is still an issue in DPM 2019, I suggest you submit feedback via the link below (if you haven't already) for DPM 2019:

    https://feedback.azure.com/forums/914488-system-center-data-protection-manager

    Once the feedback is submitted, you can post the link in this thread and have others vote on it.

    This is probably the best way to make the DPM product team aware of this issue, and then hopefully they will take action.


    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com

    Sunday, March 17, 2019 10:49 AM
  • Thanks for your suggestion, Leon, but I'm not going to wait for DPM 2022.

    It's a known issue, MS support admits it, and the only answer is that they are waiting for the Windows team to fix ReFS.

    BR,

    Marc

    Sunday, March 17, 2019 11:42 AM
  • Hi!

    Contact David Jenner in our fb.com group "Azure Backup & System Center Data Protection Manager".

    He got excellent results with Modern Backup Storage (MBS).

    Also, I recommend a fresh install and setup of DPM 2019.

    For example, you can use this article for installing DPM 2019:
    "Check how to Install System Center 2019 Data Protection Manager on Windows Server 2019 and SQL Server 2017"


    Have a nice day !!!
    DPM 2012 R2: Remove Recovery Points
    DPM blog
    System Center
    Hyper-V

    Tuesday, March 19, 2019 9:35 AM
    Moderator
  • Now available: Microsoft System Center 2019!

    On March 7, 2019, we shared that System Center 2019 would be coming soon. As of March 14, 2019, we are pleased to let you know that System Center 2019 is generally available. Customers with a valid license for System Center 2019 can download the media from the Volume Licensing Service Center (VLSC). We will also have the System Center 2019 evaluation available on the Microsoft Evaluation Center.

    P.S. Use a fresh DPM installation!


    Have a nice day !!!
    DPM 2012 R2: Remove Recovery Points
    DPM blog
    System Center
    Hyper-V

    Tuesday, March 19, 2019 9:38 AM
    Moderator
  • Enhancing Backup Performance on DPM 2016 Modern Backup Storage
    There is one optimization change you can make in DPM 2016 using Modern Backup Storage that "may" help reduce backup times. The issue is that DPM tries to get the size of the recovery point by asking ReFS to count the blocks associated with that recovery point, so it can update the "storage used" value in the DPM UI.

    To eliminate that additional calculation, which can take a long time, and to allow the next backup to start, you can disable the calculation using the DPM PowerShell command below.

        Manage-DPMDSStorageSizeUpdate.ps1 -ManageStorageInfo StopSizeAutoUpdate

    As a result, the "storage consumed" value for each data source will not be displayed after a new backup completes.

    The above is covered in the release notes: https://docs.microsoft.com/en-us/system-center/dpm/dpm-release-notes

          Section titled: Recovery Points not being pruned, leading to an accumulation of Recovery Points.

    If you need to know the amount of storage used for a data source, you can use the associated UpdateSizeForDS command.
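
    For example (a sketch, run from the DPM Management Shell in DPM's bin folder; the -Datasource parameter and the datasource lookup shown here are assumptions based on the release notes):

        # Stop the automatic "storage consumed" computation after each backup
        .\Manage-DPMDSStorageSizeUpdate.ps1 -ManageStorageInfo StopSizeAutoUpdate

        # Later, compute the size on demand for a single datasource
        $ds = Get-DPMProtectionGroup -DPMServerName 'scdpm' | Get-DPMDatasource | Select-Object -First 1
        .\Manage-DPMDSStorageSizeUpdate.ps1 -ManageStorageInfo UpdateSizeForDS -Datasource $ds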


    Have a nice day !!!
    DPM 2012 R2: Remove Recovery Points
    DPM blog
    System Center
    Hyper-V

    Tuesday, March 19, 2019 9:46 AM
    Moderator
  • I turned off the size auto-update (StopSizeAutoUpdate) myself, with no success. Microsoft worked on my issue for three months last year and essentially gave up with "this is up to the Server team to fix".

    I am inclined to do an in-place upgrade to DPM 2019, but as far as I can see, you can no longer back up workloads on Server 2012 R2 or Server 2008 R2? Unfortunately, I have a very small number of critical workloads that are still on older operating systems.

    Wednesday, March 20, 2019 3:51 PM
  • True, Windows Server 2008 R2 and Windows Server 2012 R2 are not supported, but if they run as guest virtual machines on a host running at least Windows Server 2016, they can be backed up.

    Blog: https://thesystemcenterblog.com

    Wednesday, March 20, 2019 6:51 PM
  • I agree with TheWaker1. A Dell engineer disabled the same settings for us as well, but we didn't see any positive outcome.

    Ishan

    Wednesday, March 20, 2019 9:02 PM
  • I am inclined to do an in-place upgrade to DPM 2019, but as far as I can see you can no longer back up workloads on Server 2012 R2 or Server 2008 R2? Unfortunately, I have a small number of critical workloads that are still on older operating systems.

    Yes indeed that's a disappointment.

    We are running DPM 2019 on Server 2019 now and hope the situation improves; it's too early to tell as it takes weeks to degrade.

    For the "legacy" workloads such as 2012, we went to a scripted solution that lets us use WSB with our BareOS for BMR backups; a sketch follows below.
    If DPM 2019 now doesn't work out for BMR, I think we will finally give up on it completely.
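
    For illustration, a minimal sketch of such a scripted WSB BMR backup, run from PowerShell; the target share is an assumption:

        # Hypothetical scripted BMR backup via Windows Server Backup (wbadmin).
        # The UNC target is an assumption - point it at your own backup location
        # (a remote share may additionally need -user and -password).
        $target = '\\backupsrv\bmr$\' + $env:COMPUTERNAME
        wbadmin start backup "-backupTarget:$target" -allCritical -quiet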


    Thursday, March 21, 2019 7:07 AM
  • After 2 months the performance is down again.

    I am repairing it again today, and on Monday I will do the in-place upgrade.


    Roendi

    Friday, March 29, 2019 6:58 AM
  • Hi,

    we updated right when it came out (out of desperation), but it's too early to tell whether it will stay healthy. We also updated the OS to 2019.

    Friday, March 29, 2019 7:00 AM
  • Is it possible to back up older workloads with DPM 2019, e.g. SharePoint 2013 and Exchange 2013? These products are not in the support matrix, but after upgrading the agent the backups work... I must install DPM 2019 to protect VMware 6.7...

    Falcon

    Friday, March 29, 2019 10:56 PM
  • I did an in-place upgrade from the older version, and everything works fine for the moment.

    I can back up my Hyper-V host with 2012 R2 and all machines on it. I can back up my SharePoint 2013 as a virtual machine on the Hyper-V 2016 cluster. Bare metal is working. It looks good now.

    For the in-place upgrade I repaired the backup store again. Now we will see how long it works.

    Röndi

    Roendi

    Friday, April 5, 2019 7:44 AM
  • Unfortunately, now there is some sort of ReFS exception...

    An unexpected error caused a failure for process 'msdpm'.  Restart the DPM process 'msdpm'.
    
    Problem Details:
    <FatalServiceError><__System><ID>19</ID><Seq>93781</Seq><TimeCreated>4/14/2019 9:30:30 PM</TimeCreated><Source>DpmThreadPool.cs</Source><Line>163</Line><HasError>True</HasError></__System><ExceptionType>KeyNotFoundException</ExceptionType><ExceptionMessage>The given key was not present in the dictionary.</ExceptionMessage><ExceptionDetails>System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
       at System.ThrowHelper.ThrowKeyNotFoundException()
       at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
       at Microsoft.Internal.EnterpriseStorage.Dls.StorageManager.ReFSQueueDepthManager.SignalQueueDepth(ReFSInstrumentation key)
       at Microsoft.Internal.EnterpriseStorage.Dls.StorageManager.ReFSQueueDepthManager.ManageQueueDepth.Dispose(Boolean A_0)
       at Microsoft.Internal.EnterpriseStorage.Dls.StorageManager.ReFSQueueDepthManager.ManageQueueDepth.Dispose()
       at Microsoft.Internal.EnterpriseStorage.Dls.StorageManager.RefsDuplicateExtentcontext.CallBackDuplicateExtents(Object stateInfo)
       at Microsoft.Internal.EnterpriseStorage.Dls.EngineUICommon.DpmThreadPool.Function(Object state)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()</ExceptionDetails></FatalServiceError>
    
    
    The message resource is present but the message was not found in the message table
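
    As a stopgap, restarting the DPM engine as the message instructs can be scripted; a minimal sketch, where the service name 'MSDPM' is my assumption for the 'msdpm' process:

        # Restart the DPM engine after an msdpm crash; the service name
        # 'MSDPM' is an assumption - verify with Get-Service first.
        Restart-Service -Name 'MSDPM' -Force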
    

    Monday, April 15, 2019 6:08 AM
  • Update Rollup 7 has been released for DPM 2016.

    Update Rollup 7 for System Center 2016 Data Protection Manager
    https://support.microsoft.com/en-us/help/4494084/update-rollup-7-for-system-center-2016-data-protection-manager


    Blog: https://thesystemcenterblog.com LinkedIn:

    Tuesday, April 23, 2019 7:36 PM
  • Thanks for the update Leon, will this soon be available for 2019 too?

    Fixing the constant VHD mount errors would be splendid!

    Tuesday, April 23, 2019 9:04 PM
  • I have no information about this, but hopefully it will be fixed soon. I believe the first update rollup for System Center 2019 will arrive in early autumn, if not sooner.

    Blog: https://thesystemcenterblog.com LinkedIn:

    Tuesday, April 23, 2019 9:31 PM
  • Has anyone managed to install DPM 2016 UR7?

    We just get the error message "RunDpmPatch.exe has stopped working" shortly after starting the update. We have tried multiple downloads on different DPM servers.

    Wednesday, April 24, 2019 5:18 AM
  • Yes, I didn't encounter any issues at least.

    1. Open up a Command Prompt (Admin).

    2. Change directory to where you downloaded the DPM 2016 UR7.

    3. Run the largest file from the Command Prompt.
    (Detectoid for System Center 2016 - Data Protection Manager Server-all-dataprotectionmanager2016-kb4494084_1f4b075ba94cadf136e4535da6f57d10d59d0a44)

    If you run it without the Command Prompt, make sure to run it as administrator.
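
    The same steps in PowerShell, for reference; the download folder and the .exe extension of the package are assumptions:

        # Run from an elevated PowerShell session.
        # The folder is an assumption - use wherever you saved the UR7 package.
        Set-Location 'C:\Temp\DPM2016UR7'

        # Launch the update package and wait for it to finish
        # (filename from the UR7 download; the .exe extension is assumed).
        Start-Process -FilePath '.\dataprotectionmanager2016-kb4494084_1f4b075ba94cadf136e4535da6f57d10d59d0a44.exe' -Wait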


    Blog: https://thesystemcenterblog.com LinkedIn:

    Wednesday, April 24, 2019 5:37 AM
  • Stupid question, but this update won't apply to v1807, correct?
    Thursday, April 25, 2019 6:05 PM
  • Stupid question, but this update won't apply to v1807, correct?

    No, it won't. DPM 2016 is a Long-Term Servicing Channel (LTSC) release, while DPM 1807 is a Semi-Annual Channel (SAC) release.

    More information about these two different releases here:

    Overview of System Center release options
    https://docs.microsoft.com/en-us/system-center/ltsc-and-sac-overview


    Blog: https://thesystemcenterblog.com LinkedIn:

    Thursday, April 25, 2019 6:21 PM
  • I have absolutely the same slowness issue. We all blame DPM, but have you tried copying a big file (20-30 GB) directly to the MBS drive? I noticed the same write speed as during the backups: it starts fast, and after several seconds it slows down to 5-15 MB/sec...

    My DPM server (already on UR7) is a VM on a 4-node Hyper-V cluster. I have dedicated iSCSI storage presented to the Hyper-V nodes and 16 TB pass-through disks. I presented another NTFS-formatted drive the same way and did some tests:

    5 TB LUN, NTFS: ~250-300 MB/sec (cluster disk on node) / ~220-250 MB/sec (pass-through to DPM)

    16 TB LUN, ReFS: ~250-300 MB/sec (cluster disk on node), but ~10-50 MB/sec in pass-through mode!

    I think there is something wrong with how the DPM server manages ReFS...
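
    If you want to reproduce that raw-write test, a minimal sketch is below; the drive letter, file size, and chunk size are assumptions:

        # Hypothetical sequential-write test against an MBS (ReFS) volume.
        # Drive letter and sizes are assumptions - point it at your own storage.
        $target = 'R:\writetest.bin'
        $sizeMB = 20480                      # ~20 GB test file
        $buffer = New-Object byte[] (64MB)   # 64 MB write chunks

        $sw = [System.Diagnostics.Stopwatch]::StartNew()
        $fs = [System.IO.File]::Create($target)
        try {
            for ($written = 0; $written -lt $sizeMB; $written += 64) {
                $fs.Write($buffer, 0, $buffer.Length)
            }
        }
        finally {
            $fs.Close()
        }
        $sw.Stop()

        '{0:N0} MB in {1:N1} s = {2:N1} MB/sec' -f $sizeMB, $sw.Elapsed.TotalSeconds, ($sizeMB / $sw.Elapsed.TotalSeconds)
        Remove-Item $target                  # clean up the test file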



    Thursday, April 25, 2019 11:46 PM
  • Neven, you're right, but I say: who cares? If not for DPM, I wouldn't be forced onto ReFS.

    If they chose this, have they even tried it themselves?

    Why can't it store the VHDXs on NTFS?

    Friday, April 26, 2019 6:35 AM
  • We have the same issue with 2016.

    Reinstalled everything from scratch and it was working perfectly for a couple of months, but now performance has deteriorated to the point where jobs that previously took two days are taking over a week to complete.

    Did upgrading to 2019 fix this performance deterioration issues once and for all for any of you?

    It's going to mean a fair amount of upheaval to rebuild (again), so I want to be sure it's not another false dawn!

    Thursday, May 9, 2019 10:57 AM
  • Don't bother. DPM 2019 suffers from even more issues. Since the upgrade, the "VHD couldn't be mounted" error has appeared, and there isn't a UR or hotfix as there would have been with 2016/UR7.

    The ReFS issue hasn't been fixed either.

    Thursday, May 9, 2019 11:01 AM
  • DPM 2019 on Win2016 or Win2019 suffers from the same ReFS issue. The only thing that resolves it is to reset the disk holding your DPM data and rebuild it with DpmSync (see the sketch below). You'll lose all your recovery points.
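
    A minimal sketch of that destructive rebuild, assuming an elevated DPM Management Shell and that losing the existing recovery points is acceptable:

        # After re-formatting the MBS volume, reallocate the replica volumes.
        # WARNING: all existing recovery points are lost.
        DpmSync.exe -ReallocateReplica

        # Then run a (long) consistency check on every data source so backups
        # can resume. The server name is an assumption - substitute your own.
        Get-DPMProtectionGroup -DPMServerName $env:COMPUTERNAME |
            Get-DPMDatasource |
            Start-DPMDatasourceConsistencyCheck
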
    • Edited by mvds Thursday, May 9, 2019 11:06 AM
    Thursday, May 9, 2019 11:05 AM
  • I agree. I see the same VHD mount errors; in DPM 2016 they are fixed with UR7, but they didn't fix them for 2019. I wonder if these products get tested AT ALL.
    • Edited by mvds Thursday, May 9, 2019 11:08 AM
    Thursday, May 9, 2019 11:08 AM
  • Oh, that doesn't sound too promising then!

    We are also having to rebuild every 3 months or so. How are you guys managing? 

    I see various people saying that adding more memory alleviates the symptoms, but by all standard metrics our server is only using half of the memory available, so I'm reluctant to spend money upgrading.

    Thursday, May 9, 2019 12:06 PM
  • DO NOT move to Server 2019 and DPM 2019. I had a functional 2016/2016 install that I finally got to a place where I didn't have to babysit it constantly. I tragically wiped that out in favor of 2019/2019, wanting to support my Server 2019 workloads, and what a mistake that was.

    In one month I have already had to blow away my DPM storage disk because backups ground to a halt, but worse than that is the constant alert storm of failed backups due to "The VHD could not be mounted or unmounted".

    My God, Microsoft, did you fork the DPM 2019 code from DPM 2016 RTM? Let me guess: if my backup destination were Azure storage, your software would probably work flawlessly? This whole DPM thing is absurd.


    -Jason

    Thursday, May 9, 2019 1:25 PM
  • My server has 64 GB RAM, so that is certainly not the problem. ReFS is the problem. I am recreating the disk now on Server 2019 with DPM 2019, and I'm sure it will stop working again after a few weeks.

    My boss loves MS, so I have to do it.


    Roendi

    Thursday, May 9, 2019 1:30 PM
  • Guys, at the moment the only official way to solve this problem is to install DPM 2016 or CB on WS 2012 R2 with legacy storage (NTFS disks). The issue is OS-dependent, is caused by ReFS fragmentation, and affects "only" DPM deployments configured with Modern Backup Storage (and therefore ReFS).

    MS is working to solve the issue on WS 2019 with an update for ReFS. As far as I know there are no plans to fix it on WS 2016, but once the ReFS updates are released it will be possible to upgrade DPM and the OS in the right order to reach the desired functional level.

     

    Thursday, May 9, 2019 1:36 PM
  • Hi Alberto,

    That fix has been promised since 2017 (for Windows 2016), so I don't have high hopes for it. The easiest fix would be to allow the use of NTFS volumes when running DPM 2016-2019 on Windows 2016-2019, but they block it administratively. It's not a technical issue, because when you do an in-place upgrade of DPM 2012 R2 to 2016 you can still keep using the NTFS volumes...

    It's just scandalous that this bug is still around and doesn't get attention from the Windows team. If you cannot fix it in over two years, just allow us to go back to what worked in the first place; but even that is not possible.

    I hope someone at MS has at least the decency to get to a solution because we are suffering, big time! 

    I guess the message they are sending us is as follows: go cloud or go dead. At this time, I like neither of them.

    Marc


    • Edited by mvds Thursday, May 9, 2019 1:52 PM
    Thursday, May 9, 2019 1:49 PM
  • I see it the same way. No hope for this fix.

    Roendi

    Thursday, May 9, 2019 1:52 PM
  • I agree with Marc. I had an open ticket with Dell 8 months ago, and they said change this and that, blah blah. Nothing has come of it; rather, it's causing more headaches for us. There is no hope for DPM at this point, given the rate at which MS is fixing all their pending issues.

    Ishan

    Thursday, May 9, 2019 1:54 PM
  • Hi,

    If this is still an issue in DPM 2019, I suggest you submit a feedback on the link below (if you haven't already) for DPM 2019:

    https://feedback.azure.com/forums/914488-system-center-data-protection-manager

    Once the feedback is submitted, you can post the link in this thread and have others vote on it.

    This is probably the best way to get the DPM product team to know about this issue and then hopefully they will take action.


    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com LinkedIn:

    Hello everyone,

    As I mentioned earlier in this thread (it might be difficult to see as this is a long thread), if you haven't already, I would suggest giving feedback or reporting this as a bug to the DPM link mentioned above.

    Microsoft rarely checks these forums, but they do check the link above for feedback/bugs.

    So what I suggest is that someone having this issue, creates either a feedback or reports this as a bug in the following link: https://feedback.azure.com/forums/914488-system-center-data-protection-manager

    Once the feedback or bug has been submitted via the link above, post the link to it in this thread, and then we can all go and vote for it; the more votes, the more likely Microsoft will react and actually do something about it.

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com LinkedIn:

    Thursday, May 9, 2019 2:22 PM
  • Hi Leon,

    Oh they know, no need to post feedback. 

    I think most of us posters have had a support call regarding this issue

    Thursday, May 9, 2019 2:25 PM
  • Hi Marc, I worked this problem through an MS Premier Support incident. They said the fix is on the roadmap for WS 2019, but there was no ETA. So, for us, the only fast and definitive solution was to leave MBS and go back to legacy storage.

    I absolutely agree with you on the "lack of flexibility" of a fresh DPM installation on WS 2016/2019. As you said, there are no real technical reasons against the use of legacy storage in protection groups; it's purely a choice of the DPM team to block this functionality. In my opinion they are going to pay for that choice. A lot of people in this thread have said they are frustrated with wasting time managing DPM. I'm not sure they will put up with this situation forever, and "they could discover that there are alternative solutions to DPM ..."

    Thursday, May 9, 2019 2:49 PM
  • ReFS is the problem. I am recreating the disk now on Server 2019 with DPM 2019, and I'm sure it will stop working again after a few weeks.

    The easiest fix would be to allow the use of NTFS volumes when running DPM 2016-2019 on Windows 2016-2019, but they block it administratively.

    Yes, NTFS as storage would be a dream. After this ReFS nightmare I don't think we'll ever use it for anything; no trust anymore.
    They wouldn't even have to officially support storage on NTFS; how much worse than now could it get?

    I see various people saying that adding more memory alleviates the symptoms

    Don't bother either. Whether it's 64 GB or 4 GB or anything in between makes no difference. It might mask the problem for longer because of file caching (see 'modified memory'), but eventually you won't be able to flush the cache to disk during 'idle times' anyway.

    Oh they know, no need to post feedback. 
    I think most of us posters have had a support call 
    regarding this issue

    Second this.

    How are you guys managing? 

    Short/mid-term: split the storage into two disks at the VM level. That way, when a group gets slow, we can migrate from one disk to the other and back, and not wipe everything at once.

    Mid/long-term: with time and extra energy to spend, migrate the BMRs as far away from DPM and this circus as possible, since we don't see any attempt to fix or alleviate this at all.

    EDIT: You might have some temporary success using SetFileSystemCacheSize.exe. At least it can effectively cap the cache and make BMRs work (slow but consistent), since the server won't "hang" while flushing to disk.
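
    For context, that utility wraps the Win32 SetSystemFileCacheSize API; a rough PowerShell equivalent via P/Invoke might look like the sketch below. The 1 GB / 4 GB limits are arbitrary example values, and this needs an elevated session:

        # Hypothetical sketch: cap the system file cache working set, roughly
        # what SetFileSystemCacheSize.exe does via SetSystemFileCacheSize.
        $sig = '[DllImport("kernel32.dll", SetLastError = true)] public static extern bool SetSystemFileCacheSize(IntPtr min, IntPtr max, int flags);'
        Add-Type -Namespace Win32 -Name Cache -MemberDefinition $sig

        $FILE_CACHE_MAX_HARD_ENABLE = 1   # enforce the maximum as a hard limit
        $min = [IntPtr]::new(1GB)         # example minimum cache working set
        $max = [IntPtr]::new(4GB)         # example maximum cache working set
        if (-not [Win32.Cache]::SetSystemFileCacheSize($min, $max, $FILE_CACHE_MAX_HARD_ENABLE)) {
            throw "SetSystemFileCacheSize failed: error $([Runtime.InteropServices.Marshal]::GetLastWin32Error())"
        }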

    The only fast and definitive solution was to leave MBS and go back to legacy storage.
    I'd love to, but how do you migrate back? Even if you don't touch anything after the upgrade, in real life new workloads appear sometimes.

    Thursday, May 9, 2019 2:55 PM
  • The feedback is already present: https://feedback.azure.com/forums/914488-system-center-data-protection-manager/suggestions/36912991-fix-refs-or-allow-pre-mbs-storage-option-in-dpm-20

    I voted/commented and hope everyone who follows this thread is going to do the same.

    Thursday, May 9, 2019 3:09 PM
  • Thanks Alberto, just voted on it.

    Ishan

    Thursday, May 9, 2019 3:11 PM
  • Thank you for your advice everybody, very helpful.

    Looks like using WS 2012 R2 and doing an in-place upgrade of DPM is a workaround.

    I don't recall seeing this anywhere else in this thread, so I'm surprised it hasn't come up previously.

    Has anybody else had success with this method?

    Thanks for the feedback link, I have also up-voted it.

    Thursday, May 9, 2019 3:18 PM
  • In-place works, but you won't be able to add or modify any protection groups afterwards, so it's of limited use.

    Better to stay put where you are unless you HAVE to "upgrade".

    Thursday, May 9, 2019 3:20 PM
  • Absolutely. This is the official upgrade path: https://docs.microsoft.com/en-us/system-center/dpm/upgrade-dpm?view=sc-dpm-2016

    Since ReFS isn't working "well", there's no strong reason to upgrade the OS to WS 2016. In any case, the important thing is to stay on legacy storage.

    Thursday, May 9, 2019 3:31 PM
  • I'm pretty sure I wrote it :-)

    Marc

    Thursday, May 9, 2019 7:39 PM
  • You are right: the only stable option for running DPM right now is DPM 2016 on Win2012R2 (upgrade or fresh installation). I even have one DPM implementation where I run DPM 2019 on Win2012R2.

    The thing is, I also have about 20 DPM 2016 servers running on Win2016, and moving them all back to Win2012R2 is a big effort (time and money), and I guess MS is not going to pay for that…



    • Edited by mvds Thursday, May 9, 2019 7:44 PM
    Thursday, May 9, 2019 7:42 PM
  • Ooh, DPM 2019 on 2012 R2 works? With partition-based recovery (no MBS?!)

    Does it still support 2019 workloads?

    We'd be willing to downgrade to that in a heartbeat.

    Friday, May 10, 2019 10:40 AM
  • DPM 2019 is supported for installation on Windows Server 2016 & Windows Server 2019 only.

    But DPM 2019 does support protecting Windows Server 2012/R2 workloads.


    Blog: https://thesystemcenterblog.com LinkedIn:

    Friday, May 10, 2019 11:20 AM
  • Hi,

    Yes, although it's not officially supported (are we really supported with Win2016-2019/DPM 2016-2019? I don't see fixes for ReFS/MBS coming my way...), I'm running DPM 2019 on Win2012R2 without MBS (MBS is only possible from Windows 2016 onward) on one of my DPM servers.

    I'm not protecting any Windows Server 2019 workloads on that particular installation at the moment; I just wanted the DPM version to be on par with the other DPM servers, so I can switch agents between servers when needed.

    That server runs the file backups of our remote branch file servers (Win2012R2); when I switched these over to Win2019/DPM 2019 it caused major headaches, so I reverted to the old Win2012R2 server and upgraded it to DPM 2019.

    Marc

    Friday, May 10, 2019 12:06 PM
  • Thanks mvds,

    I really don't care about "officially supported" anymore, because it can't get much worse than the supported 2016/201x or 2019/201x combos. None of this works, and Microsoft is just running a dog-and-pony show with its customers on this.

    I'll try this in a test environment.

    Thanks a lot!



    Friday, May 10, 2019 12:08 PM
  •   :-)

    Roendi

    Friday, May 10, 2019 12:25 PM