Regular DPM 2016 service crash

  • Question

  • We run DPM 2016 UR4 on Windows Server 2016 (latest CU) with a local SQL Server 2014 database, also with the latest SP and CU installed. This installation was upgraded from DPM 2012 R2. In the morning I often find that some longer-running backup jobs have failed. In the Event Viewer I see that the MSDPM service terminated unexpectedly and was restarted. Failed backup jobs usually run again later, so this isn't much of a problem. But I also have a larger backup-to-tape job that runs once per month for over 20 hours, and it no longer finishes.

    This happens at around 00:23 almost every day. Before the service terminates, I see an error message about a SQL stored procedure. Does anyone have a recommendation on how to fix this?

    Log Name:      System
    Source:        Service Control Manager
    Date:          02.11.2017 00:23:48
    Event ID:      7031
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      Backup.ub.uni-muenchen.de
    Description:
    The DPM service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 100 milliseconds: Restart the service.


    Log Name:      Application
    Source:        MSDPM
    Date:          02.11.2017 00:23:41
    Event ID:      945
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      Backup.ub.uni-muenchen.de
    Description:
    The description for Event ID 945 from source MSDPM cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

    If the event originated on another computer, the display information had to be saved with the event.

    The following information was included with the event: 

    Unable to connect to the DPM database because of a general database failure.  Make sure that SQL Server is running and that it is configured correctly.

    Problem Details:
    <FatalServiceError><__System><ID>19</ID><Seq>3621</Seq><TimeCreated>01.11.2017 23:23:41</TimeCreated><Source>DpmThreadPool.cs</Source><Line>163</Line><HasError>True</HasError></__System><ExceptionType>SqlException</ExceptionType><ExceptionMessage>Procedure or function prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica has too many arguments specified.</ExceptionMessage><ExceptionDetails>System.Data.SqlClient.SqlException (0x80131904): Procedure or function prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica has too many arguments specified.
       at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
       at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
       at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean&amp; dataReady)
       at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
       at System.Data.SqlClient.SqlDataReader.get_MetaData()
       at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption)
       at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task&amp; task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
       at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task&amp; task, Boolean&amp; usedCache, Boolean asyncWrite, Boolean inRetry)
       at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
       at System.Data.SqlClient.SqlCommand.ExecuteScalar()
       at Microsoft.Internal.EnterpriseStorage.Dls.DB.SqlRetryCommand.ExecuteScalar()
       at Microsoft.Internal.EnterpriseStorage.Dls.PRMCatalog.RecoverySourceFactory.GetValidDatasetCountOnPhysicalReplica(SqlContext sqlContext, Guid datasourceId, Guid physicalReplicaId, Boolean includeDatasetsWithoutSC)
       at Microsoft.Internal.EnterpriseStorage.Dls.PRMCatalog.PrmCatalog.GetValidDatasetCountOnPhysicalReplica(Guid datasourceId, Guid physicalReplicaId, Boolean includeDatasetsWithoutSC)
       at Microsoft.Internal.EnterpriseStorage.Dls.Intent.IntentManager.DeallocateInactiveLogicalReplicaWithNoRecoveryPoints(Replica logicalReplica)
       at Microsoft.Internal.EnterpriseStorage.Dls.Intent.IntentManager.RemoveRecoveryPoint(String recoveryPointsInformationXML)
       at Microsoft.Internal.EnterpriseStorage.Dls.Engine.CIntentServices.RemoveRecoveryPoint(UInt16* removeRecoveryPointInformationXml)
       at Microsoft.Internal.EnterpriseStorage.Dls.Engine.CCoreServices.RemoveRecoveryPoint(CCoreServices* , UInt16* bstrRemoveRecoveryPointInformationXml, tagSAFEARRAY** exceptionResult)
    ClientConnectionId:eb0c087b-d40f-4d88-9962-f967a4debcae
    Error Number:8144,State:2,Class:16</ExceptionDetails></FatalServiceError>

    Friday, November 3, 2017 12:20 PM

Answers

  • Hello,

    Please do the following:

    1) Back up your DPM DB

    2) Run the following SQL script:

    ------------------------------------------------------

    IF EXISTS (SELECT * FROM dbo.sysobjects
               WHERE id = OBJECT_ID(N'prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica')
               AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
    DROP PROCEDURE dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica
    GO

    -- Returns the number of valid datasets for that datasource on the physical replica for the given datasource
    CREATE PROCEDURE dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica
    (
      @DatasourceId GUID,
      @ReplicaId GUID,
      @IncludeDatasetsWithoutSC BIT = 1     -- Default value 1
    )
    AS
        DECLARE @error int
        SET @error = 0
       
        SET NOCOUNT ON

        IF @IncludeDatasetsWithoutSC = 1
        BEGIN
            SELECT COUNT(Dataset.DatasetId)
            FROM dbo.tbl_RM_ReplicaDataset AS Dataset
                LEFT OUTER JOIN tbl_RM_ShadowCopy ShadowCopy
                    ON ShadowCopy.ShadowCopyId = Dataset.ShadowCopyId
                JOIN dbo.tbl_RM_RecoverySource AS RecoverySource
                    ON Dataset.DatasetId = RecoverySource.DatasetId
                JOIN tbl_PRM_LogicalReplica ReplicaDS
                    ON Dataset.ReplicaId = ReplicaDS.ReplicaId
            WHERE ReplicaDS.PhysicalReplicaId = @ReplicaId AND
                ReplicaDS.DatasourceId = @DatasourceId AND
                ReplicaDS.Validity <> 4 AND     -- not destroyed
                (ShadowCopy.Validity = 2 OR Dataset.ShadowCopyId IS NULL) AND
                RecoverySource.IsValid = 1 AND
                RecoverySource.IsGCed = 0 AND
                ReplicaDS.IsGCed = 0 AND
                (ShadowCopy.IsGCed = 0 OR ShadowCopy.IsGCed IS NULL) AND
                Dataset.IsGCed = 0
        END
        ELSE
        BEGIN
            SELECT COUNT(Dataset.DatasetId)
            FROM dbo.tbl_RM_ReplicaDataset AS Dataset
                JOIN tbl_RM_ShadowCopy ShadowCopy
                    ON ShadowCopy.ShadowCopyId = Dataset.ShadowCopyId
                JOIN dbo.tbl_RM_RecoverySource AS RecoverySource
                    ON Dataset.DatasetId = RecoverySource.DatasetId
                JOIN tbl_PRM_LogicalReplica ReplicaDS
                    ON Dataset.ReplicaId = ReplicaDS.ReplicaId
            WHERE ReplicaDS.PhysicalReplicaId = @ReplicaId AND
                ReplicaDS.DatasourceId = @DatasourceId AND
                ReplicaDS.Validity <> 4 AND     -- not destroyed
                ShadowCopy.Validity = 2 AND
                RecoverySource.IsValid = 1 AND
                RecoverySource.IsGCed = 0 AND
                ReplicaDS.IsGCed = 0 AND
                ShadowCopy.IsGCed = 0 AND
                Dataset.IsGCed = 0
        END
         
        SET @error = @@ERROR
        SET NOCOUNT OFF

        RETURN @error
    GO

    ------------------------------------------------------

    This should fix the issue.
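
    A minimal sketch of step 1, not part of the original answer: a copy-only full backup taken from a query window before the script is run. The database name DPMDB and the target path are assumptions for illustration only. DPM databases are often named differently, so check the actual name first.

    ```sql
    -- List database names to find the actual DPM database
    -- (the name used below is an assumption).
    SELECT name FROM sys.databases ORDER BY name;

    -- Copy-only backup so that existing backup chains are not affected.
    -- [DPMDB] and the file path are hypothetical placeholders; adjust both.
    BACKUP DATABASE [DPMDB]
    TO DISK = N'D:\Backup\DPMDB_before_proc_fix.bak'
    WITH COPY_ONLY, CHECKSUM;
    ```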

    Thank you

    • Proposed as answer by Marcel_H Tuesday, November 28, 2017 12:43 PM
    • Marked as answer by Christian_Wimmer Monday, December 4, 2017 2:31 PM
    Monday, November 27, 2017 3:12 PM

All replies

  • Same problem here. Our DPM service crashes at 0:04 am every day. On our second server the same happens at 0:02 am...
    • Edited by Marcel_H Tuesday, November 7, 2017 10:49 AM
    Tuesday, November 7, 2017 9:54 AM
  • Same Problem here as well...

    WARNING No retry on exception System.Data.SqlClient.SqlException (0x80131904): Procedure or function prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica has too many arguments specified.
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.SqlDataReader.get_MetaData()
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption)
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at System.Data.SqlClient.SqlCommand.ExecuteScalar()
    0A30 21F4 11/07 23:13:18.706 09 Catalog.cs(1093)   WARNING    at Microsoft.Internal.EnterpriseStorage.Dls.DB.SqlRetryCommand.ExecuteScalar()

    Wednesday, November 8, 2017 7:29 AM
  • We have the same problem. Has Microsoft provided a solution yet?
    Friday, November 10, 2017 7:36 AM
  • I opened a case with MS but the support engineer told me he is on vacation this week in his initial response...
    Friday, November 10, 2017 7:52 AM
  • I opened a case with MS but the support engineer told me he is on vacation this week in his initial response...

    What? Does MS have only one support engineer for DPM? The answer from MS is laughable. I'm tense.
    Friday, November 10, 2017 8:23 AM
  • I am seeing the exact same issue after upgrading to DPM 2016 UR4. I didn't have this problem with UR2.

    My server was previously upgraded from Server 2012 R2 and DPM 2012 R2 to Server 2016 and DPM 2016.

    Please update this thread if you find a solution.

    Saturday, November 11, 2017 11:34 PM
  • Same problem here. We have two DPM 2016 servers, a primary and a secondary. The secondary doesn't exhibit this problem, but the primary regularly goes to 100% CPU usage and eventually reboots spontaneously. Memory usage goes to 100% too.
    Tuesday, November 14, 2017 3:32 PM
  • Is there any news on the topic?
    Wednesday, November 15, 2017 12:54 PM
  • Could it be the same issue I have? Maybe the server needs more memory to avoid crashing. At least that is how it looks in my case so far. First the DPM service crashes, and then Windows Server 2016 crashes. The errors vary, probably depending on what DPM was about to do.

    It is just strange that swap memory is not used instead of crashing.

    https://social.technet.microsoft.com/Forums/en-US/049e8692-bb4c-4dd8-9293-f8a384278978/dpm-2016-ur24-dpm-exception-and-then-crash-windows-2016?forum=dpmfilebackup

    Wednesday, November 15, 2017 5:04 PM
  • In our case the DPM server is a physical machine with 32 GB RAM, so I don't think the problem is too little memory. With UR2 we didn't have any problems of this kind. I think it's a bug in UR4!
    • Edited by Marcel_H Thursday, November 16, 2017 7:41 AM
    Thursday, November 16, 2017 7:41 AM
  • OK, let me guess. I think you were all eager like me and tested the new UR4 feature of migrating a backup to other MBS storage, and it crashed while migrating?

    I did the same, and after that the DPM 2016 UR4 server started crashing every night :( Because I installed UR4 and tested the migration on the same day, and the crashes started right after, I also thought that was the reason until I started digging deeper ;)

    My guess is that it started creating some physical objects or wrote something to the database, and that therefore this returns too much data:

     WARNING    Message: Procedure or function prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica has too many arguments specified. 

    No matter if you did this or not, here is how to fix it.

    If you know exactly which backup you tried to move, then DELETE it completely and create a new backup. NOTE! There were two backups in the protection group I tried to migrate, and I only moved one of them. To be safe I deleted BOTH of them and recreated them. I don't think that is needed, but I did it in case deleting only the migrated one did not help. That fixed the crashing :)

    In any case, you should check which backup is causing the crash. Look in the DPM temp directory "C:\Program Files\Microsoft System Center 2016\DPM\DPM\Temp" on the DPM server and find the *.crashlog files for the time your server crashed. Find the error above ("prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica") and the lines around it. Look for the Param[0] line with the @DatasourceId value:

    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING    SqlException encountered, SqlRetryCommand diag details - SqlCommandText  => Name=dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica, CommandType=StoredProcedure
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING    CommandDiagInfo => CanRetry=True, CommandTimeout=3600
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING    CommandParams   => Count=4, InTx=False
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING         Param[0]   => ParameterName=@DatasourceId | Value=49ff86e9-c47a-418e-bc0b-a882d72cd475 | Size=0 | DbType=UniqueIdentifier | Direction=Input | IsNullable=False
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING         Param[1]   => ParameterName=@ReplicaId | Value=7e1db26e-6299-40ab-bff6-077709ddaeb6 | Size=0 | DbType=UniqueIdentifier | Direction=Input | IsNullable=False
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING         Param[2]   => ParameterName=@IncludeDatasetsWithoutSC | Value=True | Size=0 | DbType=Bit | Direction=Input | IsNullable=False
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING         Param[3]   => ParameterName=@RETURN_VALUE | Value=[DBNull] | Size=0 | DbType=Int | Direction=ReturnValue | IsNullable=False
    31F8    4808    11/01    00:12:59.413    09    Catalog.cs(1093)            WARNING    No retry on exception System.Data.SqlClient.SqlException (0x80131904): Procedure or function prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica has too many arguments specified.

    Then start SQL Server Management Studio, go to your DPM database, open a query window and run the following SELECT with your own value of the GUID above:

    SELECT *
      FROM [dbo].[tbl_IM_DataSource] WHERE [DataSourceId] = '49ff86e9-c47a-418e-bc0b-a882d72cd475'

    The field "DataSourcename" will give you a hint, but also look at the XML in the field "ApplicationPath". It will tell you which backup is crashing DPM. Copy it to Notepad and look through it.

    Stop protection of the backup you found, including "Delete Replica On disk", and create a new backup. You can keep tape backups. That should fix it.

    If it crashes again, repeat the whole procedure to see whether more backups are making it crash. Keep doing this until no errors like the one above are left. Then it should stop crashing. For me it was only one backup.

    But as I said, I deleted the whole protection group just to be sure that I was rid of the problem ;) and that worked.

    I hope this will help you :)


    • Edited by ChristianJacobsen Thursday, November 16, 2017 9:14 AM
    • Proposed as answer by AleksKey Friday, June 1, 2018 6:25 AM
    Thursday, November 16, 2017 9:04 AM
  • The post from Uncle Arctica was what misled me in my first post ;)

    I guess he has too little memory in his DPM server ;)

    See below.



    Thursday, November 16, 2017 9:13 AM
  • Hi Christian,

    I have found the backup that makes the DPM server crash and have recreated the backup for it. I'll let you know the results of this test. Hope for the best.

    Marcel

    Thursday, November 16, 2017 11:36 AM
  • We're having issues with the crashing behaviour too. We already had some support cases open for DPM 2016 crashing for all manner of reasons. Our contact confirmed that we have the same issue as you all do here and that they are aware of it and looking into it. We just wanted to add our voice to the complaints.

    DPM 2016 has been painful since initial deployment and has shown no sign of improvement.

    Thursday, November 16, 2017 4:00 PM
  • Hi Christian,

    I have found the backup that makes the DPM server crash and have recreated the backup for it. I'll let you know the results of this test. Hope for the best.

    Marcel

    How did it go ?
    Friday, November 17, 2017 12:28 PM
  • I followed Christian's guidance for this issue. In my case I only upgraded to UR4 and started seeing the issue, I did not even attempt migrating datasources to another MBS volume.

    Anyway, I located the data source causing the crash and removed it. The next morning it crashed again due to a different data source. I suppose I can continue down this path. The downside is that many of these are backed up over the WAN, so it's not easy to re-create the replicas.

    Friday, November 17, 2017 5:08 PM
  • I have the same issue as tpullins. I identified the source, deleted it from the PG and created a new replica. The next day the next source caused the crash. I can't create a new replica for every source because we also have low-bandwidth WAN links. This would take too much time...

    MS should solve this problem immediately. So far there has been no official statement about this issue...

    Sunday, November 19, 2017 10:17 AM
  • Thanks ChristianJacobsen.

    We didn't migrate any of our modern backup storage. Most of our Protection Groups were migrated from DPM 2012 R2 legacy storage and moved from a SAN to local storage on our physical DPM 2016 server.

    I've identified the datasource that caused the crashes and removed it. *fingers crossed*

    Monday, November 20, 2017 10:00 AM
  • I am sad to hear that you did not migrate anything and still have this issue :( So there must be some data in the database creating this problem.

    My advice, if somebody has more time: in SSMS, take a look at the stored procedure dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica and the parameters here (this is from my log example above):

    Param[0]   => ParameterName=@DatasourceId | Value=49ff86e9-c47a-418e-bc0b-a882d72cd475 | Size=0 | DbType=UniqueIdentifier | Direction=Input | IsNullable=False

    And find out what it is querying and why it returns this error :

    No retry on exception System.Data.SqlClient.SqlException (0x80131904): Procedure or function prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica has too many arguments specified.

    Just try calling it. My guess is that there is a GUID or the like that returns more than one data source, or something like that.
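
    A hedged sketch of such a manual call, assuming it is run in a query window against the DPM database. The GUIDs are the example values from the log earlier in this thread and must be replaced with your own. According to the procedure definitions posted in this thread, the procedure only runs a SELECT COUNT, so calling it should not modify data.

    ```sql
    -- Call with two values; this should return a single count.
    EXEC dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica
        '49ff86e9-c47a-418e-bc0b-a882d72cd475',   -- @DatasourceId (example value)
        '7e1db26e-6299-40ab-bff6-077709ddaeb6';   -- @ReplicaId (example value)

    -- Call with a third value, as listed under Param[2] in the log.
    -- If the procedure accepts only two parameters, this call is expected
    -- to fail with a "too many arguments specified" error.
    EXEC dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica
        '49ff86e9-c47a-418e-bc0b-a882d72cd475',
        '7e1db26e-6299-40ab-bff6-077709ddaeb6',
        1;                                        -- @IncludeDatasetsWithoutSC
    ```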

    Maybe also compare the stored procedure with the one from UR2 - maybe it was changed and they made it buggy.

    I hope I could point someone in the right direction. I am too busy with other (paying ;) ) customers, so I don't have time right now. Good luck.

    Kind regards Christian

    Monday, November 20, 2017 10:21 AM
  • Thanks ChristianJacobsen.

    We didn't migrate any of our modern backup storage. Most of our Protection Groups were migrated from DPM 2012 R2 legacy storage and moved from a SAN to local storage on our physical DPM 2016 server.

    I've identified the datasource that caused the crashes and removed it. *fingers crossed*

    So after upgrading from DPM 2012 R2 to DPM 2016 and Windows Server 2016, you used the procedure from DPM 2016 RTM or UR1: you stopped protection of the old Protection Group without deleting the backups and then created the Protection Group again on new MBS (Modern Backup Storage)?


    Monday, November 20, 2017 10:25 AM
  • Yes, we wanted to keep the old backups. So we deleted the protection groups, kept the data and recreated the Protection Groups and pointed them at the new backup storage (MBS). 

    I followed the Microsoft documentation on each of the steps.

    Migrating legacy storage to Modern Backup Storage

    https://docs.microsoft.com/en-us/system-center/dpm/upgrade-to-dpm-2016?view=sc-dpm-1711

    Monday, November 20, 2017 11:04 AM
  • Yes, we wanted to keep the old backups. So we deleted the protection groups, kept the data and recreated the Protection Groups and pointed them at the new backup storage (MBS). 

    I followed the Microsoft documentation on each of the steps.

    Migrating legacy storage to Modern Backup Storage

    https://docs.microsoft.com/en-us/system-center/dpm/upgrade-to-dpm-2016?view=sc-dpm-1711


    We also followed this guide, and with UR2 everything was fine...
    Monday, November 20, 2017 12:09 PM
  • Yes, we wanted to keep the old backups. So we deleted the protection groups, kept the data and recreated the Protection Groups and pointed them at the new backup storage (MBS). 

    I followed the Microsoft documentation on each of the steps.

    Migrating legacy storage to Modern Backup Storage

    https://docs.microsoft.com/en-us/system-center/dpm/upgrade-to-dpm-2016?view=sc-dpm-1711

    OK, so maybe that is the root of the problem. There is something left over from the old backup in the database.

    My customers have NOT migrated their backups, so there are no crashes for that reason here.

    At the customer where I fixed the crash caused by the migration, we had upgraded from DPM 2012 R2. But at that customer I was not able to do the old-to-new storage migration :( DPM 2016 UR2 just protects the OLD storage again if I follow the description in the link above, and I could no longer find anything about how to do it :(

    Did you migrate the backups on DPM 2016 UR2, UR1 or RTM?

    At all my other customers we have either not upgraded to DPM 2016 yet or we have built completely new DPM 2016 servers with new storage. So I don't see the problem there either.

    If one of you is somewhat versed in SQL, take a look at what I suggested above. Or just compare the SQL stored procedure from UR2 with the one from UR4.

    Or, if you open a case or have a case open with MS, show them this thread and let them look at that.

    Monday, November 20, 2017 1:07 PM
  • Our migration happened on DPM 2016 UR1. The datasource I found and removed for now is one of the datasources we migrated to DPM 2016.

    Monday, November 20, 2017 5:18 PM
  • I had to recreate two protection groups (deleting the data). Since then it hasn't crashed: it has been running for 4 days without a crash now.
    Tuesday, November 21, 2017 11:42 AM
  • For us this is not an option...
    • Edited by Marcel_H Tuesday, November 21, 2017 4:24 PM
    Tuesday, November 21, 2017 12:48 PM
  • I had a little more time. There is no difference between the SP in UR2 and UR4.

    It must definitely be a bug in DPM 2016 UR4. A programmer made a mistake.

    The Stored Procedure has TWO arguments defined:

    CREATE PROCEDURE [dbo].[prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica]
    (
      @DatasourceId GUID,
      @ReplicaId GUID
    )
    AS

    As we can see in the error log below, the stored procedure is called with THREE arguments (plus one return value) and then of course fails with "Procedure or function prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica has too many arguments specified."

    The first two are expected, the last one is not. Someone messed it up.

    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING    SqlException encountered, SqlRetryCommand diag details - SqlCommandText  => Name=dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica, CommandType=StoredProcedure
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING    CommandDiagInfo => CanRetry=True, CommandTimeout=3600
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING    CommandParams   => Count=4, InTx=False
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING         Param[0]   => ParameterName=@DatasourceId | Value=49ff86e9-c47a-418e-bc0b-a882d72cd475 | Size=0 | DbType=UniqueIdentifier | Direction=Input | IsNullable=False
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING         Param[1]   => ParameterName=@ReplicaId | Value=7e1db26e-6299-40ab-bff6-077709ddaeb6 | Size=0 | DbType=UniqueIdentifier | Direction=Input | IsNullable=False
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING         Param[2]   => ParameterName=@IncludeDatasetsWithoutSC | Value=True | Size=0 | DbType=Bit | Direction=Input | IsNullable=False
    31F8    4808    11/01    00:12:59.273    09    Catalog.cs(1091)            WARNING         Param[3]   => ParameterName=@RETURN_VALUE | Value=[DBNull] | Size=0 | DbType=Int | Direction=ReturnValue | IsNullable=False
    31F8    4808    11/01    00:12:59.413    09    Catalog.cs(1093)            WARNING    No retry on exception System.Data.SqlClient.SqlException (0x80131904): Procedure or function prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica has too many arguments specified.

    So they must fix it.

    Whoever has a case open should report this to MS.

    Whoever has the problem could also try changing the procedure to accept the third argument and see whether DPM then continues. I will look at posting an example that might work in the next post.
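
    The number of parameters the installed procedure currently declares can be checked directly instead of reading the script. A minimal sketch, assuming it is run in a query window against the DPM database; it only reads catalog metadata. Two rows would match the two-argument definition shown above, while a third row for @IncludeDatasetsWithoutSC would match what the log shows the service sending.

    ```sql
    -- List the parameters declared by the installed stored procedure.
    SELECT p.parameter_id,
           p.name,
           TYPE_NAME(p.user_type_id) AS type_name
    FROM sys.parameters AS p
    WHERE p.object_id = OBJECT_ID(N'dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica')
    ORDER BY p.parameter_id;
    ```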



    Wednesday, November 22, 2017 8:34 AM
  • Whoever has the problem could also try changing the procedure to accept the third argument and see whether DPM then continues.

    But be warned!!! You are changing your DPM database at your own risk!! Don't blame me for any losses. Don't blame me if MS says your database is no longer supported or anything like that.

    Better to try this in a test environment.

    Or at least

    1. back up the DPM database

    2. and script the procedure as ALTER to a query window so you can change it back. In SSMS, expand the database, then Programmability, then Stored Procedures. Find "prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica", right-click it and choose "Script Stored Procedure as / ALTER To / New Query Editor Window". Then you should have something like the script below. Save it in a secure place.

    I just added one line, the @IncludeDatasetsWithoutSC parameter, which sets an optional parameter as it is now needed by UR4.

    I have no idea what happens AFTER this, so as I said, you do this at your own risk.


    /****** Object:  StoredProcedure [dbo].[prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica]    Script Date: 22-11-2017 09:39:01 ******/
    SET ANSI_NULLS OFF
    GO

    SET QUOTED_IDENTIFIER OFF
    GO


    -- Returns the number of valid datasets for that datasource on the physical replica for the given datasource
    ALTER PROCEDURE [dbo].[prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica]
    (
      @DatasourceId GUID,
      @ReplicaId GUID,
      @IncludeDatasetsWithoutSC Bit = NULL   -- added: optional third parameter that UR4 passes
    )
    AS
        DECLARE @error int
        SET @error = 0
        
        SET NOCOUNT ON

        SELECT COUNT(Dataset.DatasetId)
        FROM dbo.tbl_RM_ReplicaDataset AS Dataset
            LEFT OUTER JOIN tbl_RM_ShadowCopy ShadowCopy
                ON ShadowCopy.ShadowCopyId = Dataset.ShadowCopyId
            JOIN dbo.tbl_RM_RecoverySource AS RecoverySource
                ON Dataset.DatasetId = RecoverySource.DatasetId
            JOIN tbl_PRM_LogicalReplica ReplicaDS
                ON Dataset.ReplicaId = ReplicaDS.ReplicaId
        WHERE ReplicaDS.PhysicalReplicaId = @ReplicaId AND
            ReplicaDS.DatasourceId = @DatasourceId AND
            ReplicaDS.Validity <> 4 AND     -- not destroyed
            (ShadowCopy.Validity = 2 OR Dataset.ShadowCopyId IS NULL) AND
            RecoverySource.IsValid = 1 AND
            RecoverySource.IsGCed = 0 AND
            ReplicaDS.IsGCed = 0 AND
            (ShadowCopy.IsGCed = 0 OR ShadowCopy.IsGCed IS NULL) AND
            Dataset.IsGCed = 0
              
        SET @error = @@ERROR
        SET NOCOUNT OFF

        RETURN @error

    GO

    Good luck ;)
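
    In addition to scripting the procedure in SSMS, the stored definition and its last modification date can be read from the catalog views, which gives a copy to keep before any change. A small sketch, assuming it is run in the DPM database; it only reads metadata.

    ```sql
    -- Show when the procedure was created and last altered, plus its full text.
    SELECT o.name,
           o.create_date,
           o.modify_date,
           m.definition
    FROM sys.objects AS o
        JOIN sys.sql_modules AS m
            ON m.object_id = o.object_id
    WHERE o.object_id = OBJECT_ID(N'dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica');
    ```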


    Wednesday, November 22, 2017 8:45 AM
  • I've noticed that datasources that cause this service crash all have display problems in the GUI when changing the protection groups.

    Some show the C:\ drive twice, others only show the C: drive while the protected data is on the D: drive.

    Thursday, November 23, 2017 2:39 AM
  • I've noticed that datasources that cause this service crash all have display problems in the GUI when changing the protection groups.

    Some show the C:\ drive twice, others only show the C: drive while the protected data is on the D: drive.

    Interesting. Maybe that is the reason why they removed the possibility to migrate old storage to new storage in UR2. It was buggy, and instead of fixing it, they just removed it.

    • Proposed as answer by Jon7219 Thursday, November 23, 2017 4:27 PM
    • Unproposed as answer by Christian_Wimmer Thursday, November 23, 2017 9:38 PM
    Thursday, November 23, 2017 6:24 AM
  • I had to recreate two protection groups (deleting the data). Since then it hasn't crashed: it has been running for 4 days without a crash now.

    Christoph von Wittich,

    You had a case open at Microsoft.

    What did they say about it?

    Or did you just follow my advice?

    Thursday, November 23, 2017 6:26 AM
  • Patiently waiting to see if anyone gets an answer from Microsoft. We migrated multiple TB worth of protection groups to MBS, and each migrated datasource is causing this issue. It isn't feasible to recreate each replica.
    Thursday, November 23, 2017 6:19 PM
  • Maybe someone can reproduce the problem in a test environment.

    Install Windows Server 2012 R2 with DPM 2012 R2 and protect a couple of production servers in different protection groups. Then update DPM to 2016 UR1 and Windows to 2016. Create a VMware/Hyper-V snapshot, migrate one datasource, and upgrade to DPM 2016 UR4. Then see if it crashes.

    If it does, apply the SQL fix I posted above and see what happens then.

    If it does not crash, go back to the snapshot, run a consistency check (CC) on the backups, maybe migrate both protection groups, take the upgrade path via UR2 to UR4, and again see if it crashes.


    Friday, November 24, 2017 10:05 AM
  • Christoph von Wittich,

    You had a case open at Microsoft.

    What did they say about it?

    Or did you just follow my advice?

    I just followed your advice as the protection groups I had to recreate were quite small and the support engineer did not have a solution yet.
    Monday, November 27, 2017 9:58 AM
  • Hello,

    Please do the following:

    1) Back up your DPM DB

    2) Run the following SQL script:

    ------------------------------------------------------

    IF EXISTS (SELECT * FROM dbo.sysobjects
               WHERE id = OBJECT_ID(N'prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica')
               AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
    DROP PROCEDURE dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica
    GO

    -- Returns the number of valid datasets on the physical replica for the given datasource
    CREATE PROCEDURE dbo.prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica
    (
      @DatasourceId GUID,
      @ReplicaId GUID,
      @IncludeDatasetsWithoutSC BIT = 1     -- Default value 1
    )
    AS
        DECLARE @error int
        SET @error = 0
       
        SET NOCOUNT ON

        IF @IncludeDatasetsWithoutSC = 1
        BEGIN
            SELECT COUNT(Dataset.DatasetId)
            FROM dbo.tbl_RM_ReplicaDataset AS Dataset
                LEFT OUTER JOIN tbl_RM_ShadowCopy ShadowCopy
                    ON ShadowCopy.ShadowCopyId = Dataset.ShadowCopyId
                JOIN dbo.tbl_RM_RecoverySource AS RecoverySource
                    ON Dataset.DatasetId = RecoverySource.DatasetId
                JOIN tbl_PRM_LogicalReplica ReplicaDS
                    ON Dataset.ReplicaId = ReplicaDS.ReplicaId
            WHERE ReplicaDS.PhysicalReplicaId = @ReplicaId AND
                ReplicaDS.DatasourceId = @DatasourceId AND
                ReplicaDS.Validity <> 4 AND     -- not destroyed
                (ShadowCopy.Validity = 2 OR Dataset.ShadowCopyId IS NULL) AND
                RecoverySource.IsValid = 1 AND
                RecoverySource.IsGCed = 0 AND
                ReplicaDS.IsGCed = 0 AND
                (ShadowCopy.IsGCed = 0 OR ShadowCopy.IsGCed IS NULL) AND
                Dataset.IsGCed = 0
        END
        ELSE
        BEGIN
            SELECT COUNT(Dataset.DatasetId)
            FROM dbo.tbl_RM_ReplicaDataset AS Dataset
                JOIN tbl_RM_ShadowCopy ShadowCopy
                    ON ShadowCopy.ShadowCopyId = Dataset.ShadowCopyId
                JOIN dbo.tbl_RM_RecoverySource AS RecoverySource
                    ON Dataset.DatasetId = RecoverySource.DatasetId
                JOIN tbl_PRM_LogicalReplica ReplicaDS
                    ON Dataset.ReplicaId = ReplicaDS.ReplicaId
            WHERE ReplicaDS.PhysicalReplicaId = @ReplicaId AND
                ReplicaDS.DatasourceId = @DatasourceId AND
                ReplicaDS.Validity <> 4 AND     -- not destroyed
                ShadowCopy.Validity = 2 AND
                RecoverySource.IsValid = 1 AND
                RecoverySource.IsGCed = 0 AND
                ReplicaDS.IsGCed = 0 AND
                ShadowCopy.IsGCed = 0 AND
                Dataset.IsGCed = 0
        END
         
        SET @error = @@ERROR
        SET NOCOUNT OFF

        RETURN @error
    GO

    ------------------------------------------------------

    This should fix the issue.
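
    For anyone unsure how to do step 1, and how to check that step 2 took effect, here is a minimal T-SQL sketch. It is not from Microsoft support and rests on assumptions: the database name `DPMDB` (on many installations the name has a suffix, so check it first) and the backup path `D:\Backup\DPMDB_before_sp_fix.bak` are placeholders that need to be adjusted to your environment.

    ```sql
    -- Assumption: the DPM database is named DPMDB. Check the real name first, e.g.:
    --   SELECT name FROM sys.databases WHERE name LIKE N'DPMDB%';
    -- Assumption: D:\Backup exists and is writable by the SQL Server service account.
    BACKUP DATABASE [DPMDB]
        TO DISK = N'D:\Backup\DPMDB_before_sp_fix.bak'
        WITH COPY_ONLY,   -- leave any existing backup chain untouched
             CHECKSUM,    -- verify page checksums while writing the backup
             INIT;        -- overwrite an older file of the same name
    GO

    -- After running the script from step 2, confirm the procedure was recreated.
    USE [DPMDB];
    GO
    SELECT name, create_date, modify_date
    FROM sys.procedures
    WHERE name = N'prc_RM_ReplicaDataset_GetValidDatasetCountOnPhysicalReplica';
    GO
    ```

    If the script ran successfully, the final query should return one row whose `create_date` matches the time you ran it, since the script drops and recreates the procedure.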

    Thank you

    • Proposed as answer by Marcel_H Tuesday, November 28, 2017 12:43 PM
    • Marked as answer by Christian_Wimmer Monday, December 4, 2017 2:31 PM
    Monday, November 27, 2017 3:12 PM
  • This solved the problem for me.
    Tuesday, November 28, 2017 12:43 PM
  • The service has not crashed since I ran that. However, I have noticed that the DPM console has been slow and unresponsive ever since. Perhaps it's just a coincidence.
    Friday, December 1, 2017 10:07 PM
  • The DPM 2016 console on the server where DPM is installed has always been kind of slow, especially when a lot of backups start at the same time.

    One problem at a time.. :)

    Updating the stored procedure fixed the service crash for us.

    Monday, December 4, 2017 2:33 PM