DPM agents lose connection to the DPM server

  • Question

  • Hi,

     

    I have DPM 2010, which I'm using to back up different domains.

    Agents are losing their connection to the DPM server on a daily basis. I have to run the agent install again from the DPM server; if that does not work, I have to run setdpmserver.exe on the protected server and then run the agent install from the DPM server again to re-establish the agent-server connection.

    It happens many times per day, so I wonder if you know the cause of and the solution to this problem?

    // Laith.

    Monday, November 15, 2010 11:59 AM

All replies

  • Hi Laith,

    Could you explain a little further, please?
    Are you saying you have agents that are already protecting something, then there is some communication error, and then all of a sudden the agent no longer belongs to the DPM server and/or the DPM server thinks no agent is installed?
    (I ask because you refer to "re-establish", and 'connection' seems to mean the agent/server relationship rather than network communication.)

    -or-
    Are you referring to installing and protecting the first time which fails due to communication errors?

    In general DPM server/agent/domain-controllers/DNS must be 'well connected' and able to maintain proper communication.


    \R2 This posting is provided "AS IS" with no warranties, and confers no rights
    Sunday, November 21, 2010 11:18 AM
  • Hi Ruud,

    I have a DPM 2010 that is protecting several untrusted servers.

    The problem is that when I go to the agent tab in DPM, I see an error that the agent is unavailable.

    I have to run setdpmserver.exe on the protected server and then attach the agent one more time on the DPM side.

    This is happening on a daily basis on different servers.

    P.S.: most of the servers that I'm losing connection to are VMs on a cluster.

    Any idea about the cause?

    Tuesday, December 7, 2010 6:27 AM
  • setdpmserver/attach should only be done/needed once.

    Agent "unavailable" indicates a generic connectivity problem.
    Right-click and Refresh should do the same; does that work as well?

     


    \R2 This posting is provided "AS IS" with no warranties, and confers no rights
    Tuesday, December 7, 2010 10:18 AM
  • The refresh does not work.

    Sometimes it is enough to attach the server from DPM, and sometimes I have to run setdpmserver.exe on the protected server and then attach it in DPM.

    Tuesday, December 7, 2010 12:32 PM

  • The only thing I can think of that could make sense is some policy resetting firewall rules that get defined again through setdpmserver/attach.

     


    \R2 This posting is provided "AS IS" with no warranties, and confers no rights
    Tuesday, December 7, 2010 12:41 PM
  • What I was thinking is that the servers I'm protecting are VMs on CSV, so they might be migrated from one node to another. Could that be a reason for DPM to lose connection with the agents?
    Tuesday, December 7, 2010 1:30 PM
  • No, that shouldn't be an issue. Is the issue still seen?

    Did you check the network health of the DPM and the VM being protected when the issue is seen?

    Is the issue seen only on VMs on that particular CSV Cluster? Or Is the issue seen only for agents on the DPM Server?

    Thanks, GeethaKrishna [As is provided without warranties and confers no rights]

    Wednesday, February 9, 2011 8:35 AM
  • Hello,

    Does this problem still exist?

    Thanks
    Shane
    Monday, April 25, 2011 12:25 PM
    Moderator
  • Hi,

     

    The problem still exists, and not with one DPM server but with many. I've been trying to find out what the problem is, but I couldn't.

    Is there any program that can check the connectivity between the DPM agent and the protection server?

    Any suggestions?

    // Laith.

    Friday, July 1, 2011 5:09 AM
  • Hello,

    You may have seen this in other responses by me already, but the general connectivity tests are as follows.


    From protected server to the DPM server
     ********************************
     ping <DPM server name>  <---succeed or fail
     net view \\<DPM server name>  <---succeed or fail
     sc \\<DPM server name> query  <---succeed or fail
     wmic /node:"<DPM server name>" OS list brief   <---succeed or fail

    From the DPM server to the protected server
     ************************************
     ping <protected server name> <---succeed or fail
     net view \\<protected server name> <---succeed or fail
     Sc \\<protected server name> query <---succeed or fail
     Wmic /node:"<protected server name>" OS list brief <---succeed or fail
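
    If you want to repeat these four checks on many servers, a small script can help. The following is only a minimal Python sketch, not a DPM tool: the function names are made up for illustration, it assumes a Windows machine with ping, net, sc and wmic on the PATH, and it treats exit code 0 as "succeed", which is an approximation.

```python
import subprocess


def build_checks(target):
    """Return the four connectivity checks from this post as command strings."""
    return [
        f"ping {target}",
        rf"net view \\{target}",
        rf"sc \\{target} query",
        # The node name is quoted because wmic can reject unquoted names
        # that contain characters such as hyphens.
        f'wmic /node:"{target}" OS list brief',
    ]


def run_checks(target):
    """Run each check and map the command to 'succeed' or 'fail'.

    Assumption: exit code 0 means the check succeeded.
    """
    results = {}
    for command in build_checks(target):
        completed = subprocess.run(
            command, shell=True, capture_output=True, text=True
        )
        results[command] = "succeed" if completed.returncode == 0 else "fail"
    return results
```

    Run run_checks("<DPM server name>") on the protected server and run_checks("<protected server name>") on the DPM server, then compare the results from a working state and a failing state.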


    When this issue occurs do any of these tests fail?
    If you believe that the migration of the CSV may be causing an issue then this can be easily tested. Check out the agent status via DPM. Migrate the guest and see if the communication is broken again.
    Do you see any 80042308 errors?


    Thanks,
    Shane

     

     

    Tuesday, July 5, 2011 1:32 PM
    Moderator
  • I tested one agent that had lost contact, and the results are below.

     

    From protected server to the DPM server
    ********************************
    ping <DPM server name> <---succeed


    net view \\<DPM server name>

    System error 5 has occurred.

    Access is denied.


    sc \\<DPM server name> query

    [SC] OpenSCManager FAILED 5:

    Access is denied.


    wmic /node:"<DPM server name>" OS list brief

    Node - DPMSERVER
    ERROR:
    Description = Access is denied.

     

    From the DPM server to the protected server
    ************************************
    ping <protected server name> <---succeed


    net view \\<protected server name>

    System error 5 has occurred.

    Access is denied.


    sc \\<protected server name> query

    [SC] OpenSCManager FAILED 5:

    Access is denied.


    Wmic /node:"<protected server name>" OS list brief

    Invalid Global Switch.

     

    I migrated the server and nothing went wrong with the communication.

    What happened is that after I ran the above tests, the communication with the agent came back again.

     

    Wednesday, July 6, 2011 11:24 AM
  • Hello,

    So do you also get "access denied" when things are working well? Or do you get positive responses?


    Thanks
    Shane
    Friday, July 8, 2011 10:17 PM
    Moderator
  • access denied.

    Monday, July 11, 2011 10:24 AM
  • Hello,

    Your statement "What happened is that after I ran the above tests, the communication with the agent came back again" is odd in itself.  I am leaning more toward a networking issue.
    Are you experienced in taking and analyzing netmon captures? When you get into this failed state, take a simultaneous capture on both the DPM server and the target server having the failure. Go to DPM and perform a refresh of the agent.  Once done, look at the captures and see if the traffic is making it there.

    Also, look at the Windows firewall logs and see if you are logging any dropped communication at this time.
    You will have to enable logging on the servers though.

    - Configure the Windows Firewall Log
    http://technet.microsoft.com/en-us/library/cc947815(WS.10).aspx

    Question: Do machines in the same subnet as the DPM server also see this behavior?

    Thanks
    Shane

    Monday, July 11, 2011 1:01 PM
    Moderator
  • Hi shane,

     

    Unfortunately, I haven't used netmon before. I will google it and see how it works.

    The Windows firewall is disabled on both servers.

    All the machines that I'm protecting are in an untrusted domain.

     

    Thanks

    Laith.

    Thursday, July 14, 2011 5:39 AM
  • Hello,

    If they are in an untrusted domain, be aware that the following scenarios are not supported:
    http://technet.microsoft.com/en-us/library/ff634170.aspx
    • DMZ or perimeter network machine protection
    • Clustered servers (except for Exchange Server 2010)
    • Mirrored servers (SQL)
    • Microsoft SharePoint
    • Laptop
    • End-user recovery
    • DPM – DPM Disaster recovery
    • Bare Metal Restore

    You must perform the attach as described here: http://technet.microsoft.com/en-us/library/ff399479.aspx
    In addition, name resolution needs to be as solid as possible between the domains. How is your DNS set up to provide this?

    Thanks,
    Shane
    Thursday, July 14, 2011 5:24 PM
    Moderator
  • Hi Shane,

     

    The DPM server is protecting servers in an untrusted domain. Some are in a DMZ.

    The problem is not whether DPM can or cannot protect the servers, because I have already added the servers to DPM and am already protecting them. The problem is that now and then I see that the agent on a protected server has lost its connection to the DPM server. With more than 200 servers to protect, checking the agent status would be a full-time job, and I don't think that is doable. Do you have any suggestions for finding the cause of the problem?

    Thanks,

    Laith.

    Tuesday, July 19, 2011 5:31 AM
  • Hello,

    At the time of the occurrence, take a netmon trace on the DPM server and a protected server.  Perform a refresh of the agent and see if the traffic is making it that far.  If the traffic is never received, then you have a device in between (router, switch, firewall) that is having issues (flooded ports and dropped packets, routing issues, etc.).

    A capture of what is going on the wire is a good way to rule out a connectivity issue.  If you see nothing in the firewall logs or event logs, which you usually would if you are getting "access denied", then most likely it's a connectivity issue. 
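
    Before taking a full capture, a quick TCP probe can show whether the basic ports are reachable at all. This is only a rough Python sketch: the port list (135 for the RPC endpoint mapper, 5718 and 5719 for the DPM agent) is an assumption about the usual DPM agent ports, the function names are made up for illustration, and it does not test the dynamic RPC ports, so a clean result does not rule out a firewall problem.

```python
import socket

# Assumed fixed ports used by DPM agent communication; adjust as needed.
DPM_AGENT_PORTS = (135, 5718, 5719)


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def probe_all(host, ports=DPM_AGENT_PORTS):
    """Probe every port and return a {port: reachable} mapping."""
    return {port: probe(host, port) for port in ports}
```

    Calling probe_all("<protected server name>") from the DPM server while the agent shows as unavailable, and again after it recovers, gives a simple before/after comparison.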

    Other than a capture, I am curious...if you restart the protected machine does it make a difference? If you restart the VM host, does it make a difference? OR if you restart the DPM server does it make a difference?

    Thanks
    Shane
    Tuesday, July 19, 2011 2:40 PM
    Moderator
  • Hi shane,

    Let me add the following. The traffic is there when the DPM agent loses the connection to the DPM server. You can ping both sides.

    I installed netmon on the protected server and started to capture the connection.

    The DPM agent status is "error" (No mapping between account names and security IDs was done). I refreshed the agent and there was still no traffic. When I ran the add agent on the DPM server, the connection came back.

    Netmon shows "Unavailable" traffic (IPv4, DPM server IP):

    TCP

    MSRPC IObjectExporter DCOM

    TCP

    MSRPC IRemoteSCMActivator DCOM

     

    Unfortunately, I'm not an expert with netmon, so any suggestions for moving forward?

    // Laith.

    Tuesday, July 19, 2011 4:01 PM
  • And yes, restarting the server and re-attaching the agent solves the problem most of the time.
    Wednesday, July 20, 2011 5:01 AM
  • Did you ever resolve this ongoing issue?  We have been experiencing the same thing.  1 DPM 2010 Server, Untrusted agents "disconnect" randomly (perhaps 1 server out of 20 per week), then we have to re-install the agent.  Very frustrating!
    Friday, July 22, 2011 3:05 PM
  • Hello,

    I'd suggest you try using the FQDN, or -ProductionServerDnsSuffix if multiple DNS suffixes are in use.  Does this work instead of relying on NetBIOS?
    http://technet.microsoft.com/en-us/library/ff399479.aspx

    Also, what is the "maximum password age (days)" policy set to for the local user account? If this interval is being reached you should see an event as below:

    ******************
    Log Name: Security
    Source: Microsoft-Windows-Security-Auditing
    Date: 8/12/2011 1:18:48 AM
    Event ID: 4625
    Task Category: Logon
    Level: Information
    Keywords: Audit Failure
    User: N/A
    Computer: "computername"
    Description:
    An account failed to log on.

     

    Subject:
    Security ID: NULL SID
    Account Name: -
    Account Domain: -
    Logon ID: 0x0

    Logon Type: 3

    Account For Which Logon Failed:
    Security ID: NULL SID
    Account Name: "username"
    Account Domain: DPM01

    Failure Information:
    Failure Reason: The specified account's password has expired.
    Status: 0xc000006e
    Sub Status: 0xc0000071

     

    Process Information:
    Caller Process ID: 0x0
    Caller Process Name: -

    Network Information:
    Workstation Name: "DPM servername"
    Source Network Address: fe80::14af:e6a1:24ea:add8
    Source Port: 55043

    Detailed Authentication Information:
    Logon Process: NtLmSsp
    Authentication Package: NTLM
    Transited Services: -
    Package Name (NTLM only): -
    Key Length: 0
    **********************
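
    If you export these 4625 events as text, the status codes can be decoded automatically. The following is a minimal Python sketch under stated assumptions: the function names are made up for illustration, the input is the event description text in the layout shown above, and the lookup table is only a small subset of NTSTATUS values.

```python
import re

# Small subset of NTSTATUS codes that can appear in 4625 events.
NTSTATUS_MEANINGS = {
    0xC000006E: "account restriction",
    0xC0000071: "password expired",
    0xC0000193: "account expired",
    0xC0000224: "password must change at next logon",
}


def _field(label, text):
    """Return the value after 'label:' on its own line, or None."""
    pattern = rf"^[ \t]*{re.escape(label)}:[ \t]*(.*?)[ \t]*$"
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


def parse_failed_logon(event_text):
    """Extract the failed account and status codes from a 4625 description."""
    # "Account Name" also appears under "Subject", so only read the part
    # that follows the failed-account header.
    _, _, tail = event_text.partition("Account For Which Logon Failed:")
    account = _field("Account Name", tail)
    status = _field("Status", tail)
    sub_status = _field("Sub Status", tail)
    return {
        "account": account.strip('"') if account else None,
        "status": int(status, 16) if status else None,
        "sub_status": int(sub_status, 16) if sub_status else None,
    }


def explain(event_text):
    """Return (account, meaning), preferring the more specific sub status."""
    parsed = parse_failed_logon(event_text)
    code = parsed["sub_status"] or parsed["status"]
    return parsed["account"], NTSTATUS_MEANINGS.get(code, "unknown")
```

    For the event above, the sub status 0xc0000071 maps to "password expired", which matches the failure reason in the description.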



    Thanks
    Shane

    Friday, July 22, 2011 6:00 PM
    Moderator
  • Hi shane,

    I was tracing the connection and the firewall log, and I found out that the allowed TCP port range did not go up to the maximum of 65535. I also found some dropped packets above the range specified in the firewall, which might explain the randomness in losing the connection.

    I changed it, and I will keep an eye on it to see if that solves the problem.

    Sunday, July 31, 2011 6:48 AM
  • Hello,

    Will you be able to either:

    a.) Increase the port range on the firewall
    b.) bypass the firewall
    c.) Use:  154596 How to configure RPC dynamic port allocation to work with firewalls  http://support.microsoft.com/default.aspx?scid=kb;EN-US;154596   to restrict the ports in use by the server.  There are two points to remember though.
         1.) This method can have negative effects if the chosen range is too small.  Example: You set it up to allow a range of 2000-2500 and everything works fine. You then add extra machines to the domain and/or install a few applications that require RPC connections.  You can easily exhaust that range.
         2.) If you use this article you may have to apply it on both the DPM server side and the target server as well.

    Given this, most people opt to NOT restrict the RPC port range on the server side.  If you do go this route, I'd highly suggest you test it out in your test lab.
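
    To compare a firewall rule with the ports RPC may actually pick, a small range check can help. This is only a Python sketch under stated assumptions: the function name is made up for illustration, and the required range of 49152-65535 is the default dynamic port range on Windows Server 2008 and later, which may have been changed on your servers.

```python
# Default dynamic port range on Windows Server 2008 and later (assumption:
# it has not been customized on the servers in question).
DYNAMIC_RPC_RANGE = (49152, 65535)


def uncovered_ports(allowed_ranges, required=DYNAMIC_RPC_RANGE):
    """Return the sub-ranges of `required` that no allowed range covers.

    `allowed_ranges` is a list of inclusive (low, high) port tuples taken
    from the firewall rule.
    """
    gaps = []
    next_needed, end = required
    for low, high in sorted(allowed_ranges):
        if high < next_needed:
            continue  # range lies entirely below what is still needed
        if low > end:
            break  # range lies entirely above the required range
        if low > next_needed:
            gaps.append((next_needed, low - 1))
        next_needed = max(next_needed, high + 1)
        if next_needed > end:
            break
    if next_needed <= end:
        gaps.append((next_needed, end))
    return gaps
```

    For example, a rule that allows only 49152-60000 leaves 60001-65535 uncovered, so connections that land on a port in that gap would be dropped at random.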


    Thanks

    Shane

    Friday, August 5, 2011 12:18 PM
    Moderator
  • Hello,

    Since you increased the RPC range on the firewall do you still see the problem?


    Thanks,
    Shane
    Saturday, August 6, 2011 3:44 PM
    Moderator
  • Yes.

    It was also because some of the accounts, on both the DPM servers and the protected servers, had expired passwords or were set to change the password at next logon. So I'm setting them to never expire, and we will see.

    Thursday, August 11, 2011 11:13 AM
  • It seems that both increasing the RPC port range to the maximum and making sure that the accounts on both the DPM servers and the protected servers are set to "Password never expires" solved the problem.
    • Proposed as answer by Laith_IT Wednesday, August 24, 2011 5:28 AM
    Wednesday, August 24, 2011 5:28 AM
  • It's also a good idea to check the firewall on the protected server.

    DPM creates exceptions in the Windows firewall, and it seems that these exceptions may change depending on the UDP port in use at the time. That becomes a problem when you are using IPsec tunnels: the tunnel might be idle while DPM wants to communicate over the old UDP port.

    So if you are using an external firewall to control all your communication, I recommend disabling the Windows firewall.

    Friday, September 2, 2011 5:43 AM
  • It was this password expiry that caught me by surprise as well.   The errors indicate that RPC is failing, so we look for firewall/network connection problems and not actual account problems.  So, anyone else who has a DPM agent (regardless of version: 2007, 2010, 2012) should check the accounts used for communication and ensure their passwords have not expired!  Otherwise, you'll encounter RPC errors and agents that are unavailable or offline.
    Monday, August 20, 2012 8:59 PM