Performance of SQL Server recovery point jobs with DPM 2016 and modern storage?

  • Question

  • I have been reading promising things about the new "modern storage" in DPM 2016.

    Can I expect better performance for large SQL recovery point jobs if I switch from DPM 2012 R2?

    I would do a complete reinstall of DPM on a fresh Windows Server 2016 installation.

    I would leave SQL Server at 2014 SP1, or maybe SP2, and I would prefer to keep it running on Windows Server 2012 R2.

    Today it can take several hours to run large recovery point jobs, and sometimes they do not finish before the next day's jobs are scheduled to start, depending on how much load the applications using the databases also put on the server.

    The server houses about 12 SQL Server instances each with one large database (0.5 - 3.0 TB each). All databases use simple recovery model. The databases are divided into two protection groups both with the same goals: 1 daily recovery point using Express Full Backup.

    When the SQL Server is under high load and it has to perform recovery point jobs, which need to move a lot of data, it tends to get volsnap error 25 in the server event log. Typically after recovery point jobs have been running for a long time. All running jobs fail when volsnap error 25 occurs.

    I have seen shadowstorage in use (vssadmin list shadowstorage) up to about 500 GB during such long jobs.

    It seems to me that VSS on the SQL Server might simply be overloaded, and I am hoping that DPM 2016 uses another technology. It looks like it does for Hyper-V, but I am not sure about SQL Server. Hence this question.

    Thursday, November 17, 2016 4:18 PM

Answers

  • Hi,

    Using DPM 2016 along with modern storage "could" help if there is an IO bottleneck on the DPM 2012 R2 server due to copy-on-write (COW) when making a new recovery point. Under that circumstance, DPM 2016 could finish the backup quicker and may help eliminate the need to maintain a large snapshot on the protected SQL Server. However, there are things you can do today on the SQL Server to try to eliminate the volsnap 25 event messages.

    1) Schedule an NTFS defrag on the volume(s) hosting the SQL databases.
    -or-
    2) Use vssadmin.exe to move the shadow copy storage space to a separate volume that is either higher performance and / or is not under as heavy IO load when the backups are scheduled.
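
As a rough illustration of option 2, the sketch below only builds and prints the `vssadmin` command lines involved; it does not run them. The target drive `E:` and the `600GB` limit are hypothetical placeholders (nothing in this thread names a second volume), and the delete-then-add ordering is an assumption: an existing association generally cannot be removed while shadow copies still use it, so this would have to happen outside the backup window.

```python
def shadowstorage_move_commands(for_volume, on_volume, max_size):
    """Build the vssadmin calls for re-homing a volume's shadow copy storage.

    for_volume: volume being shadow-copied, e.g. "C:"
    on_volume:  hypothetical volume that should hold the shadow copy storage
    max_size:   size limit in the form vssadmin accepts, e.g. "600GB"
    """
    return [
        # 1. Show the current association and how much space is in use.
        ["vssadmin", "list", "shadowstorage", f"/for={for_volume}"],
        # 2. Remove the existing association (expected to fail while
        #    shadow copies that use it still exist).
        ["vssadmin", "delete", "shadowstorage", f"/for={for_volume}", "/quiet"],
        # 3. Create the new association on the other volume.
        ["vssadmin", "add", "shadowstorage",
         f"/for={for_volume}", f"/on={on_volume}", f"/maxsize={max_size}"],
        # 4. Confirm the result.
        ["vssadmin", "list", "shadowstorage", f"/for={for_volume}"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    for cmd in shadowstorage_move_commands("C:", "E:", "600GB"):
        print(" ".join(cmd))
```

Printing rather than executing is deliberate: the commands need an elevated prompt on the protected server and should be reviewed against the actual volume layout first.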


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Thursday, November 17, 2016 9:40 PM
    Moderator

All replies

  • Thanks for the quick reply. It might be worth a try to go for 2016, although I am not optimistic. I do not think that the DPM server has an IO bottleneck, but I have not been logging the performance on that.

    I already looked at fragmentation. There isn't any.

    Your suggestion to move the shadow copy storage might work, but the server has only one RAID 10 volume, which uses all the disks in the array. It is the C: drive and is used for Windows, shadow copy storage, and all of the database files and logs.

    The server has two SSDs as well, but right now they are configured on the RAID controller as a read-only cache for the RAID 10 volume, to boost performance for the most heavily used tables in the databases. I could look into making the SSDs into a separate RAID 0 or RAID 1 array and moving the shadow copy storage to that, but I am not sure that it will fit.

    My focus with this post was on DPM 2016. I have an open support case for fixing the issues in the current environment.

    Thursday, November 17, 2016 10:01 PM
  • Mike,

    Can I ask your opinion on my current 2012 R2 problem?

    There are multiple issues and it is difficult to determine with any certainty which problem is the root cause and which are side-effects.

    But my view is now that the root cause is this:

    Right now there are two running recovery point jobs. Both are for large (2-3 TB) databases. They have both been running for 60-70 hours. This long runtime causes multiple other problems and is in my view the root cause.

    So why might the jobs take such a long time? That, to me, is the key question.

    Right now, neither the protected SQL Server nor the DPM server is under any serious load. They are practically idling.

    The DPM Server sees 0% CPU usage, loads of available memory and 0.00 disk queue length. The DPMRA process is receiving a steady but pathetic 350KB/s over the network from the SQL Server.

    The SQL Server also sees 0% CPU usage and has available memory. Disk queue length is below 0.1.

    In case you wonder if there are network problems between the two servers: ping reports time<1ms. And if I initiate a large file copy it runs at a steady 1GB/s.

    Finally: the SQL Server is a physical server and the DPM server is a Hyper-V VM. It is the only VM on the Hyper-V host. One of the VM's two VHDX files is local on the host and the other is on a separate file server, which is only there to host that one file. Neither of the two physical servers supporting the VM is experiencing any load either.
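
As a back-of-the-envelope check on the figures quoted in this post, the sketch below works out how much data a steady 350 KB/s stream moves in 60 to 70 hours, and how long a full 2 to 3 TB transfer would take at that rate. It assumes decimal units (1 KB = 1,000 bytes) and a constant rate; the full database size is used only as an upper bound on what a recovery point job might have to move, which is also an assumption.

```python
# Assumption: decimal units throughout.
KB = 1_000
GB = 1_000_000_000
TB = 1_000_000_000_000


def bytes_moved(rate_bytes_per_s, hours):
    """Total bytes transferred at a constant rate over the given hours."""
    return rate_bytes_per_s * hours * 3600


def hours_needed(total_bytes, rate_bytes_per_s):
    """Hours required to transfer total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / 3600


rate = 350 * KB  # DPMRA receive rate reported above

for hours in (60, 70):
    moved_gb = bytes_moved(rate, hours) / GB
    print(f"{hours} h at 350 KB/s moves about {moved_gb:.1f} GB")

for size_tb in (2, 3):
    days = hours_needed(size_tb * TB, rate) / 24
    print(f"{size_tb} TB at 350 KB/s would take about {days:.0f} days")
```

Under those assumptions, the running jobs would have moved only on the order of 76 to 88 GB so far, which is small next to the database sizes and suggests the transfer rate itself, rather than the amount of changed data, is what needs explaining.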

    Saturday, November 19, 2016 12:32 PM