DPM SQL Backup Failures - ID 207 Details: VssError

  • Question

  • I am seeing a number of failed SQL backup jobs in DPM 2012 R2.

    This is the error that I am seeing on each of the failing database backup jobs.

    The replica of SQL Server 2008 database Server\model on server.company.com is inconsistent with the protected data source. All protection activities for data source will fail until the replica is synchronized with consistency check. You can recover data from existing recovery points, but new recovery points cannot be created until the replica is consistent.For SharePoint farm, recovery points will continue getting created with the databases that are consistent. To backup inconsistent databases, run a consistency check on the farm. (ID 3106)

    An unexpected error occurred on DPM server machine during a VSS operation. (ID 207 Details: VssError:The specified object was not found. (0x80042308))

    Running a consistency check after restarting SQL VSS Writer service still fails. 

    I ran a VSSADMIN LIST WRITERS command on the DPM server and saw that the DPM Writer was in a Non-retryable error state.  So, I stopped and restarted each DPM related service in the following stop/start order.

    Stop order

    1) DPM (if started)

    2) DPMRA (if started)

    3) DPM Writer

    4) DPM AccessManager Service

    5) SQL Server Agent (MSDPM2012)

    6) SQL Server (MSDPM2012)

    Start order

    1) SQL Server (MSDPM2012)

    2) SQL Server Agent (MSDPM2012)

    3) DPM AccessManager Service

    4) DPM Writer

    5) DPMRA

    6) DPM

    The DPM Writer is now showing No Error when I re-run a VSSADMIN LIST WRITERS.  
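As a rough sketch only, the stop/start sequence above could be scripted as follows. It assumes the names listed in the post are the services' display names on the DPM server (worth confirming with `Get-Service` first); `Restart-DpmServiceStack` is a hypothetical helper name, not a DPM cmdlet.

```powershell
# Display names as listed in the post; verify them with Get-Service on your server.
$stopOrder = @(
    'DPM',
    'DPMRA',
    'DPM Writer',
    'DPM AccessManager Service',
    'SQL Server Agent (MSDPM2012)',
    'SQL Server (MSDPM2012)'
)

# The start order in the post is the exact reverse of the stop order.
$startOrder = $stopOrder.Clone()
[array]::Reverse($startOrder)

# Hypothetical helper: stop services in one order, then start them in another.
function Restart-DpmServiceStack {
    param(
        [string[]]$StopOrder,
        [string[]]$StartOrder
    )
    foreach ($name in $StopOrder) {
        $svc = Get-Service -DisplayName $name -ErrorAction SilentlyContinue
        # Skip services that are missing or already stopped ("if started" in the post).
        if ($svc -and $svc.Status -ne 'Stopped') {
            Stop-Service -InputObject $svc -Force
        }
    }
    foreach ($name in $StartOrder) {
        $svc = Get-Service -DisplayName $name -ErrorAction SilentlyContinue
        if ($svc) { Start-Service -InputObject $svc }
    }
}
```

From an elevated prompt this would be called as `Restart-DpmServiceStack -StopOrder $stopOrder -StartOrder $startOrder`, followed by `vssadmin list writers` to re-check the DPM Writer state.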

    A consistency check still failed.  

    The DPM Writer is still in a No Errors state.

    Any advice? What other data can I collect to help troubleshoot this?


    David Jenner IT Systems Engineer Colonial Williamsburg Foundation



    Wednesday, February 13, 2019 4:55 PM

All replies

  • Hello David!

    Could you check your Windows Application log of the DPM server for any VSS related errors?

    Also what DPM build are you currently running?


    You could also try removing the SQL database from the protection group with the retain data option and then re-adding it.

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com

    Wednesday, February 13, 2019 5:06 PM
  • Leon,

    I think your advice has given me the epiphany that will lead to a fix! 

    The Application Log revealed:

    Volume Shadow Copy Service error: Volume/disk not connected or not found. Error context: CreateFileW(\Device\HarddiskVolumeShadowCopy4157,0xc0000000,0x00000003,...).

    Operation:
    Query diff area for this volume

    Context:
    Volume Name: \\?\Volume{b999b357-7d0f-11e7-945b-00155dd12703}\
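For anyone wanting to pull the same kind of entry without scrolling through Event Viewer, a query along these lines may help. It is a sketch that assumes the events are logged under the `VSS` provider in the Application log; the seven-day window and the helper name `Get-RecentVssError` are arbitrary choices for illustration.

```powershell
# Filter: error-level events from the VSS provider in the Application log.
$filter = @{
    LogName      = 'Application'
    ProviderName = 'VSS'
    Level        = 2                        # 2 = Error
    StartTime    = (Get-Date).AddDays(-7)   # assumed look-back window
}

# Hypothetical helper: return time, event ID and message for matching events.
function Get-RecentVssError {
    param([hashtable]$Filter)
    # Get-WinEvent raises an error when nothing matches, so suppress that case.
    Get-WinEvent -FilterHashtable $Filter -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, Message
}
```

Running `Get-RecentVssError -Filter $filter | Format-List` on the DPM server should list the recent VSS errors, including the volume name shown in the event's context.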

    I am not sure of the proper steps to remedy this.  If I remove the SQL databases from the protection group while retaining the data and then add them back, will this create new volumes in the DPM storage?

    Oh, and I am using DPM 2012 R2 (4.0.1603, which is the latest version UR14)


    David Jenner IT Systems Engineer Colonial Williamsburg Foundation


    Wednesday, February 13, 2019 6:28 PM
  • I have removed these SQL database backups from their protection group. They happen to make up the whole PG, so the whole Protection Group is gone. I retained data as I stopped protection.

    I recreated my protection group, selecting the same SQL databases. This successfully built the PG, however, the databases are still inconsistent.  A consistency check still fails with the same error.

    Based on the Application Log error mentioned above, there is a problem with the disk or the volume on the disk. When I look at the Administration> Disks area of the DPM console, I do not see any disks that are in error or showing as missing.  There are two disks present. 

    I looked in Disk Management and see no volume named 00155dd12703, the volume name shown in the Application event log error. I am shocked that my DPM server's two iSCSI-attached disks have lost volumes. I imagine that if there were a third iSCSI-attached disk presented to DPM, there would be an error in DPM showing that a disk is missing. I admit, I cannot remember if there was a third one, and my storage guy is out sick today. 
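One note on the lookup: the volume is identified by the whole GUID between the braces in the event, and `00155dd12703` is only its last segment. A sketch of searching by the full GUID is below. It assumes the Storage module's `Get-Volume` cmdlet is available (Windows Server 2012 or later), and `Find-VolumeByGuid` is a hypothetical helper name.

```powershell
# Volume name exactly as it appears in the VSS event.
$volumeName = '\\?\Volume{b999b357-7d0f-11e7-945b-00155dd12703}\'

# Extract the full GUID from between the braces.
$guid = $null
if ($volumeName -match '\{([0-9a-fA-F-]+)\}') {
    $guid = $Matches[1]
}

# Hypothetical helper: list volumes whose unique ID contains the GUID.
function Find-VolumeByGuid {
    param([string]$Guid)
    Get-Volume | Where-Object { $_.UniqueId -like "*$Guid*" }
}
```

`Find-VolumeByGuid -Guid $guid` returning nothing would suggest the volume is no longer presented to the server; running `mountvol` with no arguments prints the same volume names for a cross-check.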

    I am going to ask him when he returns.

    If all else fails, I am going to stop the protection and delete the PG again, this time without retaining the data. This should delete the volume. When I recreate this PG and start protecting the SQL databases, it should create an all new volume on whatever disks are available/accessible. 


    David Jenner IT Systems Engineer Colonial Williamsburg Foundation

    Wednesday, February 13, 2019 8:22 PM
  • Can you check the disks in the DPM console under the Management tab, are they healthy or are they giving errors?


    Blog: https://thesystemcenterblog.com

    Wednesday, February 13, 2019 9:23 PM
  • They are healthy.  I think I am going to carefully read these threads from Mike Jacquet.  I think he may be on to something.

    https://social.technet.microsoft.com/Forums/en-US/a39af676-2c51-41e3-82de-aae8b73550d8/vss-error-id-12289-in-the-event-log?forum=dataprotectionmanager 

    https://social.technet.microsoft.com/Forums/en-US/8b1d03ee-320a-46c6-a09b-3eca019626e9/dpm-errors-when-losing-connection-to-iscsi-volumes?forum=dpmstorage 

    I'll let you know how this goes.


    David Jenner IT Systems Engineer Colonial Williamsburg Foundation

    Tuesday, February 19, 2019 9:48 PM
  • It seems the diff area for this volume is not healthy.

    Run the following PowerShell script:

    Disconnect-DPMServer

    # Get every protection group on the local DPM server
    $pglist = Get-ProtectionGroup (& hostname)

    foreach ($pg in $pglist)
    {
        # Only data sources whose state is 'Invalid'
        $dslist = Get-Datasource $pg | Where-Object { $_.State -eq 'Invalid' } | Sort-Object DisplayPath

        foreach ($ds in $dslist)
        {
            # Mount the replica volume as X: (X: must be unused) and check it
            $volume = $ds.ReplicaPath
            mountvol x: $volume
            chkdsk x:

            Write-Host "`n Running Consistency Check job for datasource" $ds.DisplayPath "from protection group" $pg.FriendlyName -ForegroundColor Yellow
            $job = Start-DatasourceConsistencyCheck $ds

            # Poll until the consistency check job finishes
            while (!($job.HasCompleted))
            {
                Start-Sleep -Seconds 3
                Write-Host '.' -NoNewline
            }

            Write-Host "`n   Job for" $ds.DisplayPath "complete with status: " -NoNewline
            if ($job.Status -eq 'Succeeded')
            { Write-Host $job.Status -ForegroundColor Green }
            else
            { Write-Host $job.Status -ForegroundColor Red }
            Write-Host

            # Remove the temporary X: mount point
            mountvol x: /d
        }
    }


    • Proposed as answer by Tome Lopes Saturday, February 23, 2019 4:26 PM
    • Edited by Tome Lopes Saturday, February 23, 2019 5:43 PM
    Saturday, February 23, 2019 4:14 PM
  • Tome,

    This did not fix the above error. However, another backup on this server did get fixed by your script. That error was:

    The replica of Microsoft Hyper-V [SERVERNAME] on SCVMM [SERVERNAME] Resources.[CLUSTERNAME] is not consistent with the protected data source. DPM error ID = 92. (ID 33123)

    So thank you for that.  Can you provide a little detail about what that script is intended for? Thanks!


    David Jenner IT Systems Engineer Colonial Williamsburg Foundation

    Monday, February 25, 2019 3:48 PM
  • The script mounts a replica volume, runs a check disk against it, and then runs a consistency check.

    You can export the mountvol output and see whether the diff area is present for the volume GUID of this data source, but I would say you will need to stop protection, delete the data, and add the data source back to the protection group. 
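A minimal sketch of that check, assuming the GUID from the VSS event is the one of interest: capture the output of `mountvol` and `vssadmin list shadowstorage` and keep only the lines that mention the GUID. `Select-LineForVolumeGuid` is a hypothetical helper, and the exact wording of the command output is not assumed here, only that the volume name appears in it.

```powershell
# Hypothetical helper: keep only the lines that mention a given volume GUID.
function Select-LineForVolumeGuid {
    param(
        [string[]]$Lines,
        [string]$Guid
    )
    $Lines | Where-Object { $_ -like "*$Guid*" }
}

# On the DPM server (elevated prompt), something like:
#   $guid = 'b999b357-7d0f-11e7-945b-00155dd12703'
#   Select-LineForVolumeGuid -Lines (mountvol) -Guid $guid
#   Select-LineForVolumeGuid -Lines (vssadmin list shadowstorage) -Guid $guid
```

No matching lines from either command would be consistent with the "Volume/disk not connected or not found" error in the Application log.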

    Wednesday, March 6, 2019 9:51 PM
  • The script posted above fixed a couple for me too, but ultimately a lot of data was lost.  In my case an array controller failed.

    -=Chris

    Friday, July 26, 2019 6:42 PM
  • UPDATE:

    All SQL database backups (on a two-node SQL cluster) have been failing for a few weeks.  The error message includes the following text:

    Please check that the Event Service, the VSS service and the shadow copy provider service is running, and check for errors associated with these services in the Application Event Log on the server sqlserver1.domain.com. Please allow 10 minutes for VSS to repair itself and then retry the operation.

    For more information on this error, go to http://go.microsoft.com/fwlink/?LinkId=132612.

    On both nodes of the cluster, I looked at the following services:

    • Hyper-V Volume Shadow Copy Requestor
    • Microsoft Software Shadow Copy Provider
    • SQL Server VSS Writer
    • Volume Shadow Copy

    If they were stopped, I started them.  If they were running, I restarted them.  Then I started/re-started the following service:

    • DPMRA

    Then I ran a vssadmin list writers to confirm that all were stable and without error.  After completing these steps, DPM was able to complete a consistency check of each SQL database.
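The steps above could be sketched roughly as follows, to be run on each cluster node. It assumes the bullet items are the services' display names and that `DPMRA` is the agent's service name; both helper names are hypothetical. The second function parses `vssadmin list writers` output, assuming the usual `Writer name: '...'` and `State: [n] ...` line layout.

```powershell
# Display names as listed in the post; confirm with Get-Service on each node.
$vssServices = @(
    'Hyper-V Volume Shadow Copy Requestor',
    'Microsoft Software Shadow Copy Provider',
    'SQL Server VSS Writer',
    'Volume Shadow Copy'
)

# Hypothetical helper: start stopped services, restart running ones, then DPMRA.
function Restart-VssServiceSet {
    param([string[]]$DisplayNames)
    foreach ($name in $DisplayNames) {
        $svc = Get-Service -DisplayName $name -ErrorAction SilentlyContinue
        if (-not $svc) { continue }
        if ($svc.Status -eq 'Running') { Restart-Service -InputObject $svc -Force }
        else { Start-Service -InputObject $svc }
    }
    Restart-Service -Name DPMRA -Force   # the DPM agent goes last, as in the post
}

# Hypothetical helper: report writers whose state is anything other than Stable.
function Get-UnstableVssWriter {
    param([string[]]$Lines)
    $current = $null
    foreach ($line in $Lines) {
        if ($line -match "^Writer name: '(.+)'") {
            $current = $Matches[1]
        }
        elseif ($line -match '^\s*State: \[\d+\] (.+)$') {
            $state = $Matches[1].Trim()
            if ($state -ne 'Stable') {
                [pscustomobject]@{ Writer = $current; State = $state }
            }
        }
    }
}
```

After `Restart-VssServiceSet -DisplayNames $vssServices`, an empty result from `Get-UnstableVssWriter -Lines (vssadmin list writers)` would match the "stable and without error" check described above.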

    This has only fixed backups on one clustered database instance, one time, so I am not claiming this is a silver bullet, but please consider these steps when troubleshooting backups.


    David Jenner IT Systems Engineer Colonial Williamsburg Foundation

    Wednesday, October 16, 2019 7:35 PM