none
Consistency check fails with error description "The DPM service terminated unexpectedly during completion of the job..." RRS feed

  • Question

  • Hi,

    Consistency checks on our fileserver are failing with error description

    The DPM service terminated unexpectedly during completion of the job. The termination may have been caused by a system reboot. (ID 910)

    Of course no system reboot occurred on the protected server nor on the DPM server.

    DPMRA log on fileserver cluster node is showing below entries when the job fails

    ...

    1870 2854 12/09 23:32:26.649 18 filefixupreaderhelper.cpp(71) [0000000000E107F0] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CFileFixupReaderHelper, ~CFileFixupReaderHelper
    1870 2854 12/09 23:32:26.649 20 session.cpp(1560) [0000000000E055A0] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL Hr: = [0x00000000] DM: Will attempt to post DM_SESSION_CLOSED_BY_REMOTE, pSes=0000000000E055A0 rcv=0000000000000000 snd=0000000000000000 sesop=0000000000DF16D0 seserr=0000000000000000
    1870 2854 12/09 23:32:32.504 18 dsmsendersubtaskbase.cpp(188) [0000000000DB8D10] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CDsmSubTaskBase::Cancel, hCancelReason: 0xdf52a0, subtask state: 3
    1870 2854 12/09 23:32:32.504 18 dsmsendersubtaskbase.cpp(227) [0000000000DB8D10] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL Cancel received in CLEANUP state
    1870 2854 12/09 23:32:32.504 31 readdatasetfixupsubtask.cpp(413) [0000000000DB8870] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CReadDatasetFixupSubTask::CleanUp [0000000000DB8870]
    1870 2854 12/09 23:32:32.504 31 datasetfixupsubtaskbase.cpp(361) [0000000000DB8870] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CDatasetFixupSubTaskBase::CleanUp [0000000000DB8870]
    1870 2854 12/09 23:32:32.504 18 perftimer.cpp(288) [0000000000DF0CB0] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CCInstr: Lifetime for Subtask: LWVSenderSubTask =         85980 sec, Trustable: TRUE         Time for Proc: 15264 sec, trustable: TRUE         Time for Proc_Specific: 0 sec, trustable: TRUE         Time for IO: 13903 sec, Trustable: TRUE         Time for IO_Specific: 0 sec, Trustable: TRUE         Time for Network: 70716 sec, Trustable: TRUE
    1870 2854 12/09 23:32:32.504 18 dsmsubtaskbase.cpp(86) [0000000000DB8D10] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CDsmSubTaskBase: destructor [0000000000DB8D10]
    1870 2854 12/09 23:32:32.504 18 dsmsubtaskbase.cpp(86) [0000000000DF1030] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CDsmSubTaskBase: destructor [0000000000DF1030]
    1870 2854 12/09 23:32:32.504 18 filefixupreaderhelper.cpp(71) [0000000000E10C30] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CFileFixupReaderHelper, ~CFileFixupReaderHelper
    1870 2854 12/09 23:32:32.504 18 perftimer.cpp(288) [0000000000DF60B0] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CCInstr: Lifetime for Subtask: FixupSenderSubTask =         85980 sec, Trustable: TRUE         Time for Proc: 83216 sec, trustable: TRUE         Time for Proc_Specific: 1623 sec, trustable: TRUE         Time for IO: 42983 sec, Trustable: TRUE         Time for IO_Specific: 0 sec, Trustable: TRUE         Time for Network: 2764 sec, Trustable: TRUE
    1870 2854 12/09 23:32:32.504 18 dsmsubtaskbase.cpp(86) [0000000000DF1450] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CDsmSubTaskBase: destructor [0000000000DF1450]
    1870 2854 12/09 23:32:32.504 31 aasubtask.cpp(906) [0000000000DB8870] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING <?xml version="1.0"?>
    1870 2854 12/09 23:32:32.504 31 aasubtask.cpp(906) [0000000000DB8870] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING <Status xmlns="http://schemas.microsoft.com/2003/dls/StatusMessages.xsd" StatusCode="-2137453999" Reason="Error" CommandID="RAReadDatasetFixup" CommandInstanceID="51751b11-9e35-487d-a781-db34aabf3ed8" GuidWorkItem="53eaca2b-d6b3-4df6-91bc-e6e8b7ca1829" TETaskInstanceID="84ec8f69-b6bb-42f0-8cad-16487dc65e1e"><ErrorInfo xmlns="http://schemas.microsoft.com/2003/dls/GenericAgentStatus.xsd" ErrorCode="998" DetailedCode="-2137453999" DetailedSource="2"/><RAStatus><RAReadDatasetFixup xmlns="<LWVStatus">http://schemas.microsoft.com/2003/dls/ArchiveAgent/StatusMessages.xsd"><LWVStatus BytesTransferred="242475614" NumberOfFilesTransferred="2158080" NumberOfFilesFailed="0" DataCorruptionDetected="false"/><FixupStatus BytesTransferred="738547138560" NumberOfFilesTransferred="496173" NumberOfFilesFailed="0" DataCorruptionDetected="false"/></RAReadDatasetFixup></RAStatus></Status>
    1870 2854 12/09 23:32:32.520 03 runtime.cpp(1376) [0000000000D54320] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E FATAL Subtask failure, sending status response XML=[<?xml version="1.0"?>
    1870 2854 12/09 23:32:32.520 03 runtime.cpp(1376) [0000000000D54320] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E FATAL <Status xmlns="http://schemas.microsoft.com/2003/dls/StatusMessages.xsd" StatusCode="-2137453999" Reason="Error" CommandID="RAReadDatasetFixup" CommandInstanceID="51751b11-9e35-487d-a781-db34aabf3ed8" GuidWorkItem="53eaca2b-d6b3-4df6-91bc-e6e8b7ca1829" TETaskInstanceID="84ec8f69-b6bb-42f0-8cad-16487dc65e1e"><ErrorInfo xmlns="http://schemas.microsoft.com/2003/dls/GenericAgentStatus.xsd" ErrorCode="908" DetailedCode="-2137453999" DetailedSource="2"><Parameter Name="AgentTargetServer" Value="SSECFSN01.mydomain.net"/></ErrorInfo><RAStatus><RAReadDatasetFixup xmlns="<LWVStatus">http://schemas.microsoft.com/2003/dls/ArchiveAgent/StatusMessages.xsd"><LWVStatus BytesTransferred="242475614" NumberOfFilesTransferred="2158080" NumberOfFilesFailed="0" DataCorruptionDetected="false"/><FixupStatus BytesTransferred="738547138560" NumberOfFilesTransferred="496173" NumberOfFilesFailed="0" DataCorruptionDetected="false"/></RAReadDatasetFixup></RAStatus></Status>
    1870 2854 12/09 23:32:32.520 03 runtime.cpp(1376) [0000000000D54320] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E FATAL ]
    1870 2854 12/09 23:32:32.551 18 dsmsendersubtaskbase.cpp(157) [0000000000DF1450] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING Failed: Hr: = [0x80072746] : Encountered Failure: : lVal : OnSessionClosed(dwNumberOfBytes, pAgentOvl, dwError)
    1870 2854 12/09 23:32:32.551 18 dsmsendersubtaskbase.cpp(278) [0000000000DF1450] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING Failed: Hr: = [0x80072746] : Encountered Failure: : lVal : ProcessWaitCompletion(dwNumberOfBytes, pAgentOvl, dwError)
    1870 2854 12/09 23:32:32.551 18 dsmsendersubtaskbase.cpp(157) [0000000000DB8D10] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING Failed: Hr: = [0x80072746] : Encountered Failure: : lVal : OnSessionClosed(dwNumberOfBytes, pAgentOvl, dwError)
    1870 2854 12/09 23:32:32.551 18 dsmsendersubtaskbase.cpp(278) [0000000000DB8D10] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING Failed: Hr: = [0x80072746] : Encountered Failure: : lVal : ProcessWaitCompletion(dwNumberOfBytes, pAgentOvl, dwError)
    1870 2854 12/09 23:32:32.551 18 hwvreceiversubtask.cpp(144) [0000000000DF1030] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING Failed: Hr: = [0x80990a51] : Encountered Failure: : lVal : m_pBuffersQueue->AddElement(pBuffer)
    1870 2854 12/09 23:32:32.551 18 dsmreceiversubtaskbase.cpp(445) [0000000000DF1030] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING Failed: Hr: = [0x80990a51] : Encountered Failure: : lVal : hr
    1870 2854 12/09 23:32:32.551 18 dsmreceiversubtaskbase.cpp(247) [0000000000DF1030] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING Failed: Hr: = [0x80990a51] : Encountered Failure: : lVal : OnReadCompleted(dwNumberOfBytes, pAgentOvl, dwError)
    1870 2854 12/09 23:32:32.551 18 dsmreceiversubtaskbase.cpp(198) [0000000000DF1030] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E WARNING Failed: Hr: = [0x80990a51] : Encountered Failure: : lVal : ProcessWaitCompletion(dwNumberOfBytes, pAgentOvl, dwError)
    1870 2854 12/09 23:32:32.754 31 freesnapshotsubtask.cpp(46) [000000000743BC50] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CFreeSnapshotSubTask: constructor [000000000743BC50]
    1870 2854 12/09 23:32:32.770 31 freesnapshotsubtask.cpp(123) [000000000743BC50] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CFreeSnapshotSubTask::TriggerDone [000000000743BC50]
    1870 2854 12/09 23:32:32.817 31 freesnapshotsubtask.cpp(562) [000000000743BC50] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CFreeSnapshotSubTask: StartBackupComplete [000000000743BC50]
    1870 2854 12/09 23:32:32.817 31 vssbaserequestor.cpp(1067) [0000000000D73640] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CVssBaseRequestor::StartGatherWriterStatus [0000000000D73640]
    1870 2854 12/09 23:32:34.825 31 vssbaserequestor.cpp(943) [0000000000D73640]  NORMAL QueryStatus returned 0x4230a, Releasing VssAsync [0000000001056670]
    1870 2854 12/09 23:32:34.825 31 freesnapshotsubtask.cpp(675) [000000000743BC50]  NORMAL CFreeSnapshotSubTask: ReleaseSnapshot [000000000743BC50]
    1870 2854 12/09 23:32:34.825 31 vssbaserequestor.cpp(1100) [0000000000D73640]  NORMAL CVssBaseRequestor::CheckWriterStatus [0000000000D73640]
    1870 2854 12/09 23:32:34.825 31 vsssnapshotrequestor.cpp(1472) [0000000000D73640]  NORMAL CVssSnapshotRequestor::ReleaseSnaphot [0000000000D73640]
    1870 2854 12/09 23:32:34.825 31 vsssnapshotrequestor.cpp(2148) [0000000000D73640]  NORMAL CVssSnapshotRequestor::CleanUp [0000000000D73640]
    1870 2854 12/09 23:32:34.825 31 vsssnapshotrequestor.cpp(1734) [0000000000D73640]  NORMAL CVssSnapshotRequestor::ReleaseVolumesForSnapshot [0000000000D73640]
    1870 2854 12/09 23:32:36.608 31 vsssnapshotrequestor.cpp(117) [0000000000D73640] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CVssSnapshotRequestor::~CVssSnapshotRequestor [0000000000D73640]
    1870 2854 12/09 23:32:36.608 31 vsssnapshotrequestor.cpp(2148) [0000000000D73640] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CVssSnapshotRequestor::CleanUp [0000000000D73640]
    1870 2854 12/09 23:32:36.608 31 vsssnapshotrequestor.cpp(1734) [0000000000D73640] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CVssSnapshotRequestor::ReleaseVolumesForSnapshot [0000000000D73640]
    1870 2854 12/09 23:32:36.608 31 vssbaserequestor.cpp(69) [0000000000D73640] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E NORMAL CVssBaseRequestor: destructor [0000000000D73640]
    1870 2854 12/09 23:32:36.608 03 workitem.cpp(272) [0000000000DB74D0] 84EC8F69-B6BB-42F0-8CAD-16487DC65E1E ACTIVITY WorkItem stopping

    Unfortunately the guy who administered DPM has fleed the company and it has silently been handover to me :-).

    Any ideas on how to properly troubleshoot this is much appreciated.

    Thanks!


    • Edited by BallieWallie Monday, December 11, 2017 4:41 PM type
    Monday, December 11, 2017 4:39 PM

All replies

  • What DPM version? What rollup?
    Wednesday, December 13, 2017 6:40 PM
  • Sorry, version is 2012 R2 Rollup Update 10.

    In the meanwhile the consistency check succeeded! I unchecked "Run a daily consistency check according the following schedule" which was set at 02:00AM and max duration of 480 hours (which it never reached) and relaunched the check. Don't know if that did the trick.

    I've started the check the day before yesterday. Yesterday afternoon it transferred more data than it had ever before in the last couple of weeks. Does a consistency check in fact continues where it left of or does it restart from the beginning if you restart the job?

    Grts.

    Thursday, December 14, 2017 10:55 AM
  • You are couple rollups behind, better update your DPM server and agents to a more up-to-date version before troubleshooting this any further.
    Thursday, December 14, 2017 10:08 PM