Protection of one volume on client consistently fails with 'Replica is inconsistent'

  • Question

  • Hi

    I'm running DPM 2012 on Windows Server 2012. One of the protected clients is running Windows Server 2003 and has multiple disk volumes. All of the volumes are protected successfully except one, which consistently fails with 'replica is inconsistent'. I've tried re-running the backup, but it keeps failing. I can't see what the issue is on either the client or the server side. Does anybody have any pointers on how I can get to the bottom of this?

    Additional info

    There seems to be enough free space on this particular volume for snapshots.

    The error in the DPM console is listed as Internal error code 0x8099090E (the agent is not responding), although the agent works for the other volumes on the same server.


    Monday, January 27, 2014 2:26 PM

All replies

  • I suggest starting with the event log on that server and looking for error events. One can't magically guess what the error is :)
    Monday, January 27, 2014 3:39 PM
  • There are no errors in the event log on the client (protected server) side, so I'm looking for other ways to investigate the error. As far as I'm aware, DPM just uses ntbackup on the client side. Does anybody know a way of getting hold of those logs?

    Some more information: that particular volume is a dynamic volume. Could this be an issue?
    Monday, January 27, 2014 3:46 PM
  • Hi,

    DPM does not use Ntbackup for volume protection, only for system state. The error looks like a timeout, so maybe the agent is crashing. Look at the DPMRA*.errlog files on the protected server under C:\Program files\Microsoft Data Protection Manager\DPM\temp and see whether any meaningful errors appear when the problem occurs.
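To pull up the most recent agent error logs quickly, here is a minimal Python sketch. It assumes the default agent path quoted above and that it is run on the protected server itself; `newest_errlogs` is a hypothetical helper, not part of DPM.

```python
from pathlib import Path
from typing import List

# Default agent log folder on the protected server (path taken from this thread).
AGENT_TEMP = Path(r"C:\Program files\Microsoft Data Protection Manager\DPM\temp")


def newest_errlogs(folder: Path, count: int = 3) -> List[Path]:
    """Return up to `count` DPMRA*.errlog files in `folder`, newest first.

    Crash copies (DPMRA##.errlog.<timestamp>.Crash) do not match the pattern,
    so only the live error logs are returned.
    """
    if not folder.is_dir():
        return []
    logs = [p for p in folder.glob("DPMRA*.errlog") if p.is_file()]
    logs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return logs[:count]


if __name__ == "__main__":
    for log in newest_errlogs(AGENT_TEMP):
        print(log)
```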


    Please remember to click “Mark as Answer” on the post that helps you, and to click “Unmark as Answer” if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread. Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.

    Monday, January 27, 2014 11:35 PM
    Moderator
  • Hi Mike

    Yes, I've been looking at that error log, which is why I was hoping there were better ways of tackling the issue. The error log is not easy to understand (to say the least), and I'm having a lot of trouble just trying to find the part of the log that is relevant to the error. Do you have any pointers on how to use the log for troubleshooting?

    One thing I have noticed is that if I force a resync it completes OK, but after a while (I'm not sure exactly, but longer than 30 minutes) the error reappears. It's set to sync every 15 minutes.

    Regards

    Roger


    Tuesday, January 28, 2014 2:12 PM
  • Hi,

    All times in the logs are in GMT, so add or subtract the offset for your local time zone, then look for problems within a few minutes of the failed job.
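As a worked example of that time-zone adjustment, the Python sketch below converts a local failure time into a UTC (GMT) search window and parses the date and time columns of a log line. The log lines carry no year, so the year must be supplied; the UTC+1 offset in the example is purely hypothetical, and both helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def utc_window(local_failure: datetime, minutes: int = 5) -> Tuple[datetime, datetime]:
    """Turn a timezone-aware local failure time into a +/- window in UTC (GMT)."""
    if local_failure.tzinfo is None:
        raise ValueError("local_failure must be timezone-aware")
    centre = local_failure.astimezone(timezone.utc)
    span = timedelta(minutes=minutes)
    return centre - span, centre + span


def parse_log_time(date_field: str, time_field: str, year: int) -> datetime:
    """Parse the MM/DD and HH:MM:SS.mmm columns of a DPM log line as UTC."""
    naive = datetime.strptime(f"{year}/{date_field} {time_field}", "%Y/%m/%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


# Hypothetical: a job that failed at 10:21 local time in a UTC+1 zone.
start, end = utc_window(datetime(2014, 1, 30, 10, 21, tzinfo=timezone(timedelta(hours=1))))
stamp = parse_log_time("01/30", "09:20:14.549", 2014)
print(start <= stamp <= end)  # True
```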

    If you have any .Crash logs (e.g. DPMRA##.errlog.YYYY-MM-DD_HH-MM-SS.Crash) from around the time of the failure, DPMRA is crashing and you will need to open a support case to find out why.
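Checking for crash logs near a failure can be scripted. The Python sketch below assumes the file-name pattern quoted above (DPMRA##.errlog.YYYY-MM-DD_HH-MM-SS.Crash). It does not know which clock the embedded timestamp uses, so compare it against the failure time on the same clock as your logs and keep the window generous. `crash_time` and `crashes_near` are hypothetical helpers.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

# Pattern from the thread: DPMRA##.errlog.YYYY-MM-DD_HH-MM-SS.Crash
CRASH_RE = re.compile(
    r"^DPMRA\d*\.errlog\.(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.Crash$", re.IGNORECASE
)


def crash_time(filename: str) -> Optional[datetime]:
    """Return the timestamp embedded in a crash-log name, or None if it is not one."""
    match = CRASH_RE.match(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")


def crashes_near(filenames: Iterable[str], failure: datetime, minutes: int = 15) -> List[str]:
    """List crash logs whose timestamp lies within `minutes` of the failed job."""
    window = timedelta(minutes=minutes)
    hits = []
    for name in filenames:
        stamp = crash_time(name)
        if stamp is not None and abs(stamp - failure) <= window:
            hits.append(name)
    return sorted(hits)
```

On the protected server you would feed it the file names from the agent temp folder, for example via `os.listdir`.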

    If it isn't crashing, you can enable verbose logging and see if it always fails on a certain file or directory.

    COLLECT DPM VERBOSE LOGS 
    =======================

    Make sure no jobs are active or scheduled to run while doing these steps.

    To enable full VERBOSE logging add the following on both the protected server and the DPM Server:

    HKLM\Software\Microsoft\Microsoft Data Protection Manager
    add a DWORD value named TraceLogLevel
    set it to 43e in hexadecimal
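If you would rather import a file (by double-clicking it or with `reg import`) than edit the registry by hand, the Python sketch below builds the same setting as .reg text, plus a second text that deletes the value again when you are finished. The key, value name and 0x43e level come from the steps above; the helper names are hypothetical, and the service stop/restart steps still apply.

```python
# Registry key and value from the steps in this thread.
KEY = r"HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft Data Protection Manager"
VALUE = "TraceLogLevel"
HEADER = "Windows Registry Editor Version 5.00\r\n\r\n"


def enable_verbose_reg(level: int = 0x43E) -> str:
    """.reg text that sets TraceLogLevel as a REG_DWORD, zero-padded to 8 hex digits."""
    return f'{HEADER}[{KEY}]\r\n"{VALUE}"=dword:{level:08x}\r\n'


def disable_verbose_reg() -> str:
    """.reg text that deletes TraceLogLevel ("=-" removes a value in .reg syntax)."""
    return f'{HEADER}[{KEY}]\r\n"{VALUE}"=-\r\n'


if __name__ == "__main__":
    print(enable_verbose_reg())
```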

    Stop the DPM services you want to enable verbose logging for (dpmra, dpmla, DPM AccessManager Service, MSDPM service).
    Delete all the old logs. (BE CAREFUL NOT TO DELETE THE MTA FOLDER)
    Now reproduce the problem / error.
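The "delete the old logs, but not the MTA folder" step can be done with a small Python sketch like the one below. It assumes the old logs are the files whose names contain `.errlog` (including the .Crash copies) directly in the temp folder; widen that filter if your folder holds other log files. It never descends into subfolders such as MTA, defaults to a dry run, and should only be run after the services are stopped. `clear_old_logs` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def clear_old_logs(temp_dir: Path, dry_run: bool = True) -> List[Path]:
    """Delete DPM error logs in temp_dir, leaving every subfolder (e.g. MTA) alone."""
    targets = []
    for entry in sorted(temp_dir.iterdir()):
        if not entry.is_file():
            continue  # never touch directories such as MTA
        if ".errlog" not in entry.name.lower():
            continue  # assumption: only *.errlog and *.errlog.*.Crash count as old logs
        targets.append(entry)
        if not dry_run:
            entry.unlink()
    return targets
```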

    The DPM log files will now contain MUCH MORE logging to help troubleshoot issues.

    DPM Server 2010 (or DPM 2012 upgraded from 2010): logs are in the C:\Program files\Microsoft DPM\DPM\temp folder.
    DPM Server 2012 and SP1: logs are in the C:\Program Files\System Center 2012\DPM\DPM\temp folder.
    DPM Server 2012 R2: logs are in the C:\Program Files\System Center 2012 R2\DPM\DPM\temp folder.
    Protected server: logs are always in the C:\Program files\Microsoft Data Protection Manager\DPM\temp folder.
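Because the log folder depends on the DPM release, a script that gathers logs needs a lookup like the Python sketch below. The paths are exactly the ones listed above; the dictionary keys and the `log_folder` helper are hypothetical labels.

```python
from pathlib import PureWindowsPath

# Log folders as listed in this thread; keys are illustrative labels only.
_DPM2010 = PureWindowsPath(r"C:\Program files\Microsoft DPM\DPM\temp")
_DPM2012 = PureWindowsPath(r"C:\Program Files\System Center 2012\DPM\DPM\temp")

LOG_FOLDERS = {
    "DPM 2010": _DPM2010,
    "DPM 2012 (upgraded from 2010)": _DPM2010,
    "DPM 2012": _DPM2012,
    "DPM 2012 SP1": _DPM2012,
    "DPM 2012 R2": PureWindowsPath(r"C:\Program Files\System Center 2012 R2\DPM\DPM\temp"),
    "protected server": PureWindowsPath(r"C:\Program files\Microsoft Data Protection Manager\DPM\temp"),
}


def log_folder(role: str) -> PureWindowsPath:
    """Return the temp/log folder for a DPM server version or the protected server."""
    try:
        return LOG_FOLDERS[role]
    except KeyError:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(LOG_FOLDERS)}") from None
```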


    When done, be sure to delete or rename the TraceLogLevel registry value and restart the DPM services so that normal (non-verbose) logging resumes.


    Regards, Mike J. [MSFT]

    Tuesday, January 28, 2014 3:13 PM
    Moderator
  • Mike

    Thanks, this gives me something to work with. I'm not sure whether the client DPM agent is crashing. I know there are crash logs in the folder, but I'm not sure how recent they are. I'll take a look and report back. If the client is crashing, you say to open a support case; I'm assuming you mean with MS PSS? Do you know whether this would be chargeable?

    Roger

    Wednesday, January 29, 2014 11:45 AM
  • Hi,

    If DPMRA is crashing, and you need to open a support ticket to help diagnose why it's crashing, it will not be chargeable if it turns out to be a Microsoft code defect.

    Although I don't believe it's related to your issue, you mentioned at the beginning that you are running DPM 2012 on Server 2012. I hope you are running DPM 2012 SP1, as that is the first version that supported running on Server 2012. See the announcement. In any case, make sure you have the latest DPM update rollup installed and the agents updated, so you are running the latest fixes before opening a support ticket.


    Regards, Mike J. [MSFT]

    Wednesday, January 29, 2014 3:29 PM
    Moderator
  • Mike

    So I'm running DPM 2012 SP1 (just to confirm). There is no current crash log, but there are error logs. Is it OK to post an excerpt of an error log? The sync error kicks in at 9.20 (per the log timestamps), and there are warnings and errors around the 9.23 mark:

    At 9.20 I get this:

    1030    17A8    01/30    09:20:14.549    20    destination.cpp(1283)    [00E5EDE0]        NORMAL    DM: Doing Connection TimeOut for  Destination 00E5EDE0: connection 00E9D908 m_dwLastCompletionTime: 4, m_bUseLongDMConnectionTimeOut: 0, Diff: 301
    1030    13FC    01/30    09:20:14.549    20    cc_extcalls.cpp(517)    [00E9D908]    4FCE162C-174D-44C3-8557-7A45EE8459C9    WARNING    DM: TempErr: err=0x40 read=1 write=0
    1030    17A8    01/30    09:20:14.549    20    destination.cpp(1283)    [00E60020]        NORMAL    DM: Doing Connection TimeOut for  Destination 00E60020: connection 00EA3FE8 m_dwLastCompletionTime: 4, m_bUseLongDMConnectionTimeOut: 0, Diff: 301
    1030    17A8    01/30    09:20:14.549    20    destination.cpp(1283)    [00E5F028]        NORMAL    DM: Doing Connection TimeOut for  Destination 00E5F028: connection 00EC4BF8 m_dwLastCompletionTime: 4, m_bUseLongDMConnectionTimeOut: 0, Diff: 301
    1030    06F8    01/30    09:20:14.549    20    cc_extcalls.cpp(517)    [00EC4BF8]    FCD18731-BD16-4B6C-A5FD-2391FEA85750    WARNING    DM: TempErr: err=0x40 read=1 write=0
    1030    17A8    01/30    09:20:14.549    20    destination.cpp(1283)    [00E8F128]        NORMAL    DM: Doing Connection TimeOut for  Destination 00E8F128: connection 00ECC218 m_dwLastCompletionTime: 4, m_bUseLongDMConnectionTimeOut: 0, Diff: 301
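To cut a long DPMRA log down to just the warning and failure lines, a minimal Python sketch like the one below can help. It assumes the raw log file is tab-separated (the forum paste renders the tabs as runs of spaces) with ten columns in the order seen in these excerpts; the field names, including `col5`, whose meaning is not documented in this thread, are hypothetical labels.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class LogLine(NamedTuple):
    pid: str
    tid: str
    date: str        # MM/DD, no year
    time: str        # HH:MM:SS.mmm, in GMT
    col5: str        # meaning not documented in this thread
    source: str      # e.g. destination.cpp(1283)
    obj: str         # e.g. [00E5EDE0]
    task_guid: str   # may be empty
    severity: str    # NORMAL, WARNING, FATAL, ...
    message: str


def parse_line(line: str) -> Optional[LogLine]:
    """Split one tab-separated log line; return None for continuation or short lines."""
    parts = line.rstrip("\r\n").split("\t", 9)
    if len(parts) != 10:
        return None
    return LogLine(*parts)


def problems(lines: Iterable[str], levels=("WARNING", "ERROR", "FATAL")) -> Iterator[LogLine]:
    """Yield only the records whose severity is in `levels`."""
    for line in lines:
        record = parse_line(line)
        if record is not None and record.severity in levels:
            yield record
```

Open the log with whatever encoding it actually uses (this sketch does not detect it) and pass the file object to `problems`.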

    And at 9.23 it develops into this:

    13F8    1698    01/30    09:22:02.032    03    runtime.cpp(1376)    [008E76C0]    9EF6428F-CACF-40B8-99F0-1C0CE8755B37    FATAL    Subtask failure, sending status response XML=[<?xml version="1.0"?>
    13F8    1698    01/30    09:22:02.032    03    runtime.cpp(1376)    [008E76C0]    9EF6428F-CACF-40B8-99F0-1C0CE8755B37    FATAL    <Status xmlns="http://schemas.microsoft.com/2003/dls/StatusMessages.xsd" StatusCode="-2137454160" Reason="Error" CommandID="RAGetWorkItemInfo" CommandInstanceID="4834a0b7-9a26-43f8-954b-984c592c3797" GuidWorkItem="309d147c-9912-4b2e-a306-3445ccc868a3" TETaskInstanceID="9ef6428f-cacf-40b8-99f0-1c0ce8755b37"><ErrorInfo xmlns="http://schemas.microsoft.com/2003/dls/GenericAgentStatus.xsd" ErrorCode="536872913" DetailedCode="-2137454160" DetailedSource="2"><Parameter Name="AgentTargetServer" Value="servername"/></ErrorInfo></Status>
    13F8    1698    01/30    09:22:02.032    03    runtime.cpp(1376)    [008E76C0]    9EF6428F-CACF-40B8-99F0-1C0CE8755B37    FATAL    ]
    13F8    1698    01/30    09:22:02.032    29    radefaultsubtask.cpp(360)    [00E5E278]    9EF6428F-CACF-40B8-99F0-1C0CE8755B37    WARNING    Failed: Hr: = [0x809909b0] : CRADefaultSubTask: WorkitemID does not exist, {309D147C-9912-4B2E-A306-3445CCC868A3}
    13F8    1698    01/30    09:22:02.032    05    defaultsubtask.cpp(546)    [00E5E278]    9EF6428F-CACF-40B8-99F0-1C0CE8755B37    WARNING    Failed: Hr: = [0x809909b0] : Encountered Failure: : lVal : CommandReceivedSpecific(pCommand, pOvl)
    13F8    1698    01/30    09:22:02.032    05    defaultsubtask.cpp(751)    [00E5E278]    9EF6428F-CACF-40B8-99F0-1C0CE8755B37    WARNING    Failed: Hr: = [0x809909b0] : Encountered Failure: : lVal : CommandReceived(pAgentOvl)
    13F8    1698    01/30    09:23:01.635    03
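One detail in this excerpt is easier to see with a conversion: the status XML prints codes as signed 32-bit decimals, while the trace lines print hex HRESULTs. Reinterpreting the bits shows that StatusCode="-2137454160" is the same value as the 0x809909b0 on the WARNING lines that follow it. A minimal Python sketch (helper names are illustrative):

```python
def to_hex32(value: int) -> str:
    """Format a signed or unsigned 32-bit code as an 8-digit hex HRESULT."""
    return f"0x{value & 0xFFFFFFFF:08X}"


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit code as the signed decimal some logs print."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


print(to_hex32(-2137454160))    # 0x809909B0, the Hr on the WARNING lines
print(to_signed32(0x8099090E))  # -2137454322, the console's 0x8099090E as a signed decimal
```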

    Have you seen this before? As mentioned, the initial backup seems to work; it's just the subsequent syncs that fail.

    One further thing that may be relevant is that the volume was resized just before we started getting these errors, so I'm assuming this may be connected.


    Thursday, January 30, 2014 10:09 AM
  • Hi,

    No new revelations from the DPM-side logs; they just detail what we already know: the communication is timing out.

    You may need a network trace to see what is occurring. Have you tried removing the problematic volume from protection and then re-protecting it to see if that makes a difference?


    Regards, Mike J. [MSFT]

    Monday, February 3, 2014 7:33 PM
    Moderator
  • Mike

    That is exactly what I tried, and weirdly it has worked! I removed protection for the volume, ticking the options to remove the on-disk data and the tape data as well, then re-applied protection, and it has been working OK since last Friday. All I can think is that DPM had an old configuration for the volume, and deleting the backup removed it and allowed the correct one to be seen, though I could be way off on this. Anyway, it seems to be working again. Thanks for the help.

    Regards

    Roger

    Tuesday, February 4, 2014 1:04 PM