DPM 2010 Secondary Server Stuck trying to cancel running jobs

  • Question

  • All servers are running Windows Server 2008 R2 SP1. How can a running job be forcefully removed? Rebooting has not helped.

    DPM 2010 Primary and Secondary server are stuck with jobs that I can't cancel.

    DPM Primary server has a protected 2-node cluster, whose jobs have been running for 60+ hours. The management tab shows the agents as 'Attempting to connect' but nothing happens. The cancelled jobs have a status of 'Attempting to cancel'.

    In addition, the DPM Secondary server is stuck with 60+ hour synchronizations for 8 different protected servers without data transfer. The 8 jobs were cancelled, and have a status of 'Attempting to cancel'.

    After waiting a few hours, I have since rebooted the Primary DPM, Secondary DPM, Secondary DPM's SQL server, and an inactive node of the Cluster protected by the primary server.

    New jobs on the DPM servers are running just fine for the other protected servers, but failing for the mentioned ones with stuck 'Attempting to cancel' jobs. Even after the reboot of the inactive cluster node, DPM is still unable to communicate with that server. Firewall, agent installation, DCOM, WMI, and authentication settings have all been tested.


    Monday, March 18, 2013 6:32 PM

Answers

  • Turns out there was a software version change on our Cisco ASA firewalls, and it enabled or changed the way RPC inspection functioned, causing the firewalls to interrupt the actual RPC/DCOM traffic.

    We went from version 8.2 up to 9.1.1 on two different HA-configured firewall pairs (Cisco ASA 5540 and Cisco ASA 5510). After the upgrade, the interfaces needed DCERPC inspection disabled on the global policy. On top of that, an additional access-list bypass extended permit ip <source> <destination> entry was needed.

    The overall diagnostics never pointed to any dropped or malformed traffic, and since the firewall rules were set to IP ANY ANY, you would think the firewalls would allow all traffic. As with TMG firewalls, RPC inspection needs to be explicitly disabled. Otherwise, as you can see above, and as testing with Microsoft showed, all configurations look good, all test applications work, and all commands work, but the actual agent communication hangs. The jobs stayed stuck in 'Attempting to cancel' and 'Attempting to connect' instead of failing because the TCP keep-alive packets were still making it through the firewall.
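
    For anyone hitting the same thing, here is a rough sketch of what the change described above might look like in the ASA configuration. It assumes the default policy-map name global_policy and the default class inspection_default, neither of which is named in this post, so check your own running configuration first. The <source> and <destination> placeholders are left as written, and where the bypass access list gets referenced is not stated here.

    ```
    ! Assumption: default policy-map and class names; adjust if yours differ.
    policy-map global_policy
     class inspection_default
      no inspect dcerpc
    !
    ! ACL entry as given in the post; fill in your own source and destination.
    access-list bypass extended permit ip <source> <destination>
    ```

    Comparing the policy-map section of the running configuration before and after the upgrade should show whether the inspection was added or changed.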

    • Marked as answer by Kyle SA3 Monday, April 1, 2013 2:38 PM
    Monday, April 1, 2013 2:38 PM

All replies

  • Made a call into Microsoft, and the initial support member said that he had never seen an agent stuck with 'Attempting to connect' as its status. The connection attempt never fails, and the DPM server never initiates any network traffic (checked with Wireshark).

    In particular, he had not seen this happen while other agent communications were working just fine.

    I'll update this if there is a resolution from MS support.

    Thursday, March 21, 2013 1:43 PM