DPM: One of two Agents fails to make a checkpoint, error 41 is reported

  • Question

  • We just setup a new DPM configuration with 3 systems:

    DPM server = W2K8-R2  on xxx.yyy.78.116 address

    Agent-1 running Windows 7, on xxx.yyy.78.149 address

    Agent-2 running W2K8-SP2, on xxx.yyy.78.113 address

    The Agent install log shows version:  SHIP UNICODE 4.05.6002.00

    Agent-1 runs checkpoints and restore fine.

    Agent-2 fails to run the 1st checkpoint with an error 41, "DPM failed to communicate with <server name> because the computer is unreachable.".

    Agent-2 seems to have complete network connectivity with the DPM server.  I traced the network packets on the failing Agent-2 and noticed two things.  First, the DPM server starts off by ARP'ing for xxx.yyy.78.112, but after 5 tries it uses xxx.yyy.78.113.  There are then many DCE RPC packets between the DPM server and Agent-2.  Second, after a while, a Kerberos AP-REQ/AP-REP exchange goes out to a different server than the one the working Agent-1 is using.  There is no obvious failure packet, but the data transfers never seem to start.

    How can I get more details on the root cause of this failure?

    Thanks,  Jim

    Thursday, February 3, 2011 11:27 PM

All replies

  • Hi,

    Please run ipconfig /flushdns followed by ipconfig /registerdns on the Agent-2 machine having the problem, then run a Consistency Check for that data source from the DPM Server - that has a good chance of fixing the issue.


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Friday, February 4, 2011 2:56 AM
    Moderator
  • We found that Agent-2 had 2 DNS entries.  This happens sometimes because we reload servers without releasing the IP address.  So the DNS had two entries:  Agent-2 and Agent-2-1.  We corrected that problem and verified the fix by running the nslookup <ipv4-addr> command.  We also ran the ipconfig /flushdns and ipconfig /registerdns commands.  The consistency check still failed.  We then removed the share and re-added it in case it had somehow saved an old address.  Agent-2 still fails.  The last error was "Synchronization failed", DPM error ID 51.  Any recommendations on what to try?
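
    The duplicate-entry check described above can also be done offline against a list of records. This is only a rough sketch: it assumes the DNS zone has already been exported into (name, address) pairs by some other means, and the function name and the sample addresses (from the 192.0.2.0/24 documentation range) are made up for illustration.

```python
from collections import defaultdict


def addresses_with_multiple_names(records):
    """Map each address that has more than one distinct host name to those names.

    `records` is an iterable of (name, address) pairs. How the pairs are
    obtained (zone export, spreadsheet, etc.) is outside this sketch.
    """
    names_by_addr = defaultdict(set)
    for name, addr in records:
        # DNS names are case-insensitive, so normalise before comparing.
        names_by_addr[addr].add(name.lower())
    return {
        addr: sorted(names)
        for addr, names in names_by_addr.items()
        if len(names) > 1
    }


# Illustrative data only; these are not the real lab addresses.
example_records = [
    ("Agent-1", "192.0.2.149"),
    ("Agent-2", "192.0.2.113"),
    ("Agent-2-1", "192.0.2.113"),
]
duplicates = addresses_with_multiple_names(example_records)
```

    Any address that shows up in the result has a stale or extra record worth cleaning up before retrying the job.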

    Tuesday, February 8, 2011 5:19 PM
  • Jim,

        Can you check what the error message says and what the recommended action is?  Can you also paste both here?  Is this DPM v3 (DPM 2010)?

    Thanks, GeethaKrishna [As is provided without warranties and confers no rights]

     

    Wednesday, February 9, 2011 6:00 AM
  • When I open the Management tab, it shows version 3.0.7696.0

    We deleted and created the protection group again.  We are selecting a very small folder on the C: drive (~20 KBytes).  For 5-10 minutes the status is "Replica creation in progress".  Why would this take so long?

    Then it reports success.  I then try to create a restore point.  This is what has been failing.

    What I noticed this time is that in the "Create recovery point" dialog, only the middle option, "Create recovery point without synchronizing" is available.  On the working system, all 3 options are available.  Does this indicate what the problem might be?

    Then if I select OK, the Errors Tab shows:

    Triggering create recovery point on C:\ failed:

    Error 97: Job failure on replica of C:\ on win106.labdom.xxx.com caused by ongoing consistency check.

    Recommended action: Cancel the operation, or wait for it to complete. Then retry the job.

    Under Monitoring : Alerts I see that it's reporting that the replica is inconsistent.  When I went back to the Protection tab, it also reports "Replica is inconsistent.".

    I went to the Monitoring:Jobs tab and did a "Retry" of the Consistency Check.  It ran for 8 minutes and then failed with:  DPM failed to communicate with win106.labdom.xxx.com because the computer is unreachable. (ID 41 Details: No such host is known (0x80072AF9))

     

    I don't think this matters, but will mention...

    We loaded this as an evaluation copy.  When I open the Management tab, it shows version 3.0.7696.0 .  It also shows that we need more licenses.  It does report the Agent status as Okay.

     The Update "DPM licenses" action is greyed out.

    Wednesday, February 9, 2011 9:54 PM
  • Hi,

    Lack of licensing will not cause this error. On the protected server, please open the c:\program files\microsoft data protection manager\dpm\temp\dpmracurr.errlog and search for the error code 0x80072AF9 for details.
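
    If the log is large, the search can be scripted. The following is a minimal sketch, not a DPM tool: the function names are hypothetical, and it assumes the log can be read as plain text (the encoding is a guess). The match ignores case because the console reports 0x80072AF9 while the agent log may write the same code in lower case.

```python
def find_error_lines(lines, code):
    """Return (line_number, text) pairs for lines that mention the error code.

    The comparison is case-insensitive so that 0x80072AF9 also matches
    0x80072af9.
    """
    needle = code.lower()
    return [
        (number, line.rstrip("\r\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


def scan_log_file(path, code):
    """Open a log file and return its matching lines.

    Assumption: the file decodes as UTF-8; undecodable bytes are replaced
    rather than raising an error.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle, code)
```

    For example, scan_log_file(r"c:\program files\microsoft data protection manager\dpm\temp\dpmracurr.errlog", "0x80072AF9") would return the matching lines with their line numbers, which makes it easier to pull out the surrounding entries.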


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Wednesday, February 9, 2011 10:03 PM
    Moderator
  • The log runs from Jan 21 until Feb 10th.  The most recent occurrence of the 0x80072af9 error was on January 31st.  Another person did the original DPM setup, so I don't know what was being done at that time.  Here are a few error log entries around the most recent occurrence of the error you asked about.

    15C0 17C0 01/31 16:25:38.686 31 basewriterhelperplugin.cpp(155) [00000000041800B0] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL SetOperationType(6)
    15C0 17C0 01/31 16:25:38.797 18 dsmsendersubtaskbase.cpp(47) [000000000175E560] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL CDsmSenderSubTaskBase: constructor [000000000175E560] openovl[000000000175E6C8] msgovl[000000000175E7B8] closeovl[000000000175E710]
    15C0 17C0 01/31 16:25:38.797 31 backupsubtask.cpp(53) [0000000001833230] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL CBackupSubTask: constructor [0000000001833230]
    15C0 17C0 01/31 16:25:38.797 31 basewriterhelperplugin.cpp(174) [00000000041800B0] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL SetSystemFlags(0)
    15C0 17C0 01/31 16:25:38.797 18 readeriterator.cpp(257) [0000000004182080] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL CFileReaderIterator::SetFileSystemFlags (0x1) called
    15C0 17C0 01/31 16:25:38.797 18 iteratorutils.cpp(637)  B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL CaseSensitivity NOT enabled on the machine
    15C0 17C0 01/31 16:25:38.797 31 filewriterhelperplugin.cpp(455) [00000000041800B0] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL CFileWriterBackupHelper::GetIterator Adding read path \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1\4f1edb23-df43-4408-86fa-a8cd4ee51c92\Full\automation\ [00000000041800B0]
    15C0 17C0 01/31 16:25:38.797 18 readeriterator.cpp(108) [0000000004182080] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL CFileReaderIterator:AddIncludeFiles(filepath:\\?\Volume{5a5894a3-ba8e-11df-b03b-806e6f6e6963}\automation\, filespec:*, snapshotpath:\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1\4f1edb23-df43-4408-86fa-a8cd4ee51c92\Full\automation\) called
    15C0 17C0 01/31 16:25:38.797 18 dsmsendersubtaskbase.cpp(625) [000000000175E560] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL Page size to be used: 4096
    15C0 17C0 01/31 16:25:38.797 18 dsmsendersubtaskbase.cpp(644) [000000000175E560] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL Failed to read MaxBufferSize from registry. Using default size: 262144
    15C0 17C0 01/31 16:25:38.798 31 backupsubtask.cpp(131) [0000000001833230] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL CBackupSubTask::TriggerDone [0000000001833230]
    15C0 17C0 01/31 16:25:38.798 31 backupsubtask.cpp(518) [0000000001833230] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL CBackupSubTask::BeginDataMove [0000000001833230]
    15C0 17C0 01/31 16:25:38.799 20 agentutils.hpp(68)  B71D6FF7-9A9F-4BDC-8240-197368F23130 WARNING Failed: Hr: = [0x80070002] : F: lVal : r.GetValue(pszKey, pT)
    15C0 17C0 01/31 16:25:38.799 20 destination.cpp(1442) [0000000004188100] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL DM: GetPreferredProtocolFamilyToConnect for WIN106 : 0, PingBeforeConnect : 0
    15C0 17C0 01/31 16:25:38.800 20 cc_base.cpp(913)  B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL DM: - Spotted QoS IPv6, Index:4
    15C0 17C0 01/31 16:25:38.800 20 cc_base.cpp(908)  B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL DM: - Spotted QoS IPv4, Index:5
    15C0 17C0 01/31 16:25:38.800 20 cc_base.cpp(1036)  B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL Created Socket with Family: 2, QoS Index: 5
    15C0 17C0 01/31 16:25:38.800 20 cc_base.cpp(761) [00000000041883F0] B71D6FF7-9A9F-4BDC-8240-197368F23130 NORMAL DM: - EndPoint:0000000000000000, Family: 2, Indx:5, Port:53373
    15C0 05EC 01/31 16:25:38.800 20 cc_extcalls.cpp(272) [00000000041883F0]  NORMAL Doing DNS Lookup for host:win106.labdom.stratus.com, Port: 5718, Address Family: 2
    15C0 05EC 01/31 16:25:38.800 20 cc_extcalls.cpp(292) [00000000041883F0]  NORMAL DM: - Name lookup without DNS failed
    15C0 05EC 01/31 16:25:38.801 20 cc_extcalls.cpp(338) [00000000041883F0]  WARNING Failed: Hr: = [0x80072af9] : DLS_ERROR_HOST_UNREACHABLE ssServerName win106.labdom.stratus.com
    15C0 05EC 01/31 16:25:38.801 20 cc_extcalls.cpp(344) [00000000041883F0]  WARNING DM: - DNS Lookup FAILURE for win106.labdom.stratus.com
    15C0 05EC 01/31 16:25:38.801 20 cc_base.cpp(1226) [00000000041883F0]  NORMAL DM: Aborting  http state machine for reason 8
    15C0 05EC 01/31 16:25:38.801 20 agentutils.hpp(68)   WARNING Failed: Hr: = [0x80070002] : F: lVal : r.GetValue(pszKey, pT)
    15C0 05EC 01/31 16:25:38.801 20 destination.cpp(1442) [0000000004188100]  NORMAL DM: GetPreferredProtocolFamilyToConnect for WIN106 : 0, PingBeforeConnect : 0
    15C0 05EC 01/31 16:25:38.801 20 cc_base.cpp(913)   NORMAL DM: - Spotted QoS IPv6, Index:4
    15C0 05EC 01/31 16:25:38.801 20 cc_base.cpp(908)   NORMAL DM: - Spotted QoS IPv4, Index:5
    15C0 05EC 01/31 16:25:38.801 20 cc_base.cpp(1036)   NORMAL Created Socket with Family: 2, QoS Index: 5
    15C0 05EC 01/31 16:25:38.802 20 cc_base.cpp(761) [000000000418C7D0]  NORMAL DM: - EndPoint:0000000000000000, Family: 2, Indx:5, Port:53374
    15C0 05EC 01/31 16:25:38.802 20 cc_base.cpp(1226) [000000000418C7D0]  NORMAL DM: Aborting  http state machine for reason 8
    15C0 05EC 01/31 16:25:38.802 20 session.cpp(1693) [0000000004187DF0]  NORMAL Hr: = [0x80072af9] DM: Will attempt to post DM_SESSION_ERROR, pSes=0000000004187DF0 rcv=0000000000000000 snd=0000000000000000 sesop=000000000175E6C8 seserr=0000000000000000
    15C0 17C0 01/31 16:25:38.802 18 dsmsendersubtaskbase.cpp(162) [000000000175E560]  WARNING Failed: Hr: = [0x80072af9] CDsmSenderSubTaskBase received session error completion in WAIT state
    15C0 17C0 01/31 16:25:38.802 18 dsmsubtaskbase.cpp(252) [000000000175E560]  WARNING Session error before data move completed
    15C0 17C0 01/31 16:25:38.802 18 dsmsendersubtaskbase.cpp(163) [000000000175E560]  WARNING Failed: Hr: = [0x80072af9] : F: lVal : OnSessionError(dwNumberOfBytes, pAgentOvl, dwError)
    15C0 17C0 01/31 16:25:38.802 18 dsmsendersubtaskbase.cpp(278) [000000000175E560]  WARNING Failed: Hr: = [0x80072af9] : F: lVal : ProcessWaitCompletion(dwNumberOfBytes, pAgentOvl, dwError)
    15C0 17C0 01/31 16:25:38.802 18 dsmsubtaskbase.cpp(277) [000000000175E560]  NORMAL Hr: = [0x80072af9] CDsmSubTaskBase::ErrorCleanup: subtask state: 1
    15C0 0BAC 01/31 16:25:38.807 31 backupsubtask.cpp(648) [0000000001833230]  NORMAL CBackupSubTask::CleanUp [0000000001833230]
    15C0 0BAC 01/31 16:25:38.808 31 backupsubtask.cpp(204) [0000000001833230]  NORMAL CBackupSubTask::GetFinalStatus [0000000001833230]
    15C0 0BAC 01/31 16:25:38.808 31 aasubtask.cpp(913) [0000000001833230]  WARNING <?xml version="1.0"?>

    Thursday, February 10, 2011 3:10 PM
  • Hi,

     

    OK, based on the log, we have DNS problems for the host named win106.labdom.stratus.com

    15C0 05EC 01/31 16:25:38.800 20 cc_extcalls.cpp(272) [00000000041883F0]  NORMAL Doing DNS Lookup for host:win106.labdom.stratus.com, Port: 5718, Address Family: 2
    15C0 05EC 01/31 16:25:38.800 20 cc_extcalls.cpp(292) [00000000041883F0]  NORMAL DM: - Name lookup without DNS failed
    15C0 05EC 01/31 16:25:38.801 20 cc_extcalls.cpp(338) [00000000041883F0]  WARNING Failed: Hr: = [0x80072af9] : DLS_ERROR_HOST_UNREACHABLE ssServerName win106.labdom.stratus.com
    15C0 05EC 01/31 16:25:38.801 20 cc_extcalls.cpp(344) [00000000041883F0]  WARNING DM: - DNS Lookup FAILURE for win106.labdom.stratus.com

    Please go to that host and run ipconfig /flushdns and ipconfig /registerdns - then check the dns server and make sure you only have a single entry for win106.labdom.stratus.com.
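
    To double-check the result after cleaning up DNS, a forward lookup followed by a reverse lookup on each returned address will show whether the name and address still disagree. This is a rough sketch using Python's standard socket module; check_name is a hypothetical helper, and the resolver functions are passed in as parameters only so the logic can be exercised without a live DNS server.

```python
import socket


def check_name(hostname,
               forward=socket.gethostbyname_ex,
               reverse=socket.gethostbyaddr):
    """Resolve hostname, then reverse-resolve every address it maps to.

    Returns (addresses, mismatches). `mismatches` maps an address to the
    name returned by the reverse lookup when that name differs from
    `hostname`, or to None when the reverse lookup fails.
    """
    _, _, addresses = forward(hostname)
    mismatches = {}
    for addr in addresses:
        try:
            reverse_name = reverse(addr)[0]
        except OSError:
            # socket.herror and socket.gaierror are OSError subclasses.
            reverse_name = None
        if reverse_name is None or reverse_name.lower() != hostname.lower():
            mismatches[addr] = reverse_name
    return addresses, mismatches
```

    Run it with the fully qualified name on both the DPM server and the protected host. An empty mismatches dictionary suggests forward and reverse records agree; a reverse lookup that returns only the short host name will also be reported as a mismatch, so read the output with that in mind.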


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Thursday, February 10, 2011 3:51 PM
    Moderator