DPM has detected that the cluster configuration has changed for resource group SQL and has marked the replica inconsistent (30209)

  • Question

  • Hello,

    About a month ago we experienced the following event on DPM, which marked all the SQL database replicas of our SQL cluster inconsistent: "DPM has detected that the cluster configuration has changed for resource group SQL and has marked the replica inconsistent" - error 30209.

    We did not change anything in the cluster configuration, and there have been no cluster events or failovers. At the time we did a quick check of the event log but could not find anything unusual that could have led to the event. The "incident" was logged, and after a consistency check everything worked as before, until last Sunday night, when it happened again. Because the SQL node has 553 SQL databases, the consistency check takes a long time and we are missing quite a lot of recovery points.

    We have searched for the DPM error code and elsewhere; the error is documented, but never with a solution or a hint of what triggers it.

    We are using DPM 2010 on a Windows 2008 R2 server; the SQL Server node runs Windows 2003 R2 with SQL 2005 SP3.

    Two things that may be related to the issue:

    1. The cache battery of the internal array controller died a few days ago, and we are waiting for a replacement. However, this only affects the SAS RAID1 boot disk; the SQL databases are on an iSCSI SAN.

    2. In September last year, the SQL cluster was moved to the current DPM server. However, the old DPM server still exists as "Unprotected Computer" and the Protection Agent on the DPM server was disabled. I was getting DCOM errors on the cluster for the account of the old DPM server. After removing the cluster nodes from the old DPM server with "Remove-ProductionServer.ps1", the DCOM errors no longer occurred.

    Has anybody seen DPM error 30209 in the field and found a resolution for it?

    Marcel

    Thursday, March 31, 2011 2:43 PM

All replies

  • After a failover (at night) of the SQL cluster group to another node, the DPM backup was stable again and we got no errors at all during the day. During the day we replaced the BBWC battery of the array controller and booted the cluster node. After a failback the following night everything worked as expected, including DPM. The first synchronisation job at 06:00 was successful.

    At 10:00, however (we sync every 4 hours with a 2-hour offset), 51 of the 553 databases failed with the same error: "DPM has detected that the cluster configuration has changed for resource group SQL and has marked the replica inconsistent - error 30209". After 15 minutes the job was automatically tried a second time and triggered a recovery check on the 51 databases.

    Marcel



    Wednesday, April 6, 2011 8:03 AM
  • In the meantime we fully patched the cluster nodes and DPM with rollup 2 (3.0.7707.0), since I guess this would be the first question asked when we call MS Support. The problem went away for 2 days but eventually came back. When I logged in at the console of the cluster node I saw an error:

    DPMRA.exe Application Error:
    The Exception unknown software exception (0xc0000417) occurred in the application at location 0x786752d4

    Nothing was logged in the event log. I looked at DPMRACurr.errlog but could not make anything of the cryptic errors throughout the log.

    When I searched the internet I found others who had the same error with the DPM agent on a Windows 2003 server, but either the errors just disappeared or no solution was posted: http://social.technet.microsoft.com/Forums/en-US/dpmsetup/thread/c9195a0b-71d7-490e-ab8d-a557f3af3b2a/

    I guess the next step is opening a case while others have also....

     

    Friday, April 15, 2011 3:37 PM
  • FYI - Sorry for the delay in posting this.

    More information can be found in the TechNet article: http://technet.microsoft.com/en-us/library/ff399290.aspx  Search for error 30209; the following has been added under community content.

     

    Possible cause / solution:
    =================


    DPM queries Active Directory to determine which node is active, and due to network / DNS problems the query fails. This causes all resources to be marked inconsistent, and a consistency check (CC) must be performed.

    Look for similar errors in the DPM error logs:


    <snip>
    05/20 08:12:08.894 03 machinename.cpp(248) [00000000053EEF90] 47C9F5D0-81B2-4EDB-B35D-5E2A9C908766 WARNING Failed: Hr: = [0x80072020] : F: lVal : ADsOpenObject( ldapQuery.PeekStr(), 0, 0, ADS_SECURE_AUTHENTICATION, __uuidof(IDirectorySearch), (void **)&pds )
    05/20 08:12:08.894 03 clusterutil.cpp(565) [00000000053EF480] 47C9F5D0-81B2-4EDB-B35D-5E2A9C908766 WARNING Failed: Hr: = [0x80072020] : F: lVal : machineName.GetDnsName(ssName, ssDomainName, ssFqdnNodeName)
    05/20 08:12:08.894 03 clusterutil.cpp(969) [00000000053EF480] 47C9F5D0-81B2-4EDB-B35D-5E2A9C908766 WARNING Failed: Hr: = [0x80072020] : F: lVal : ConvertToFqdn(lpszNodeName, ssfqdnNodeName)
    05/20 08:12:08.894 61 necluster.cpp(123) [0000000006EF1020] 47C9F5D0-81B2-4EDB-B35D-5E2A9C908766 WARNING Failed to Get ActiveNode for ResourceGroup SQL_Group_Name
    >snip<
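
As a rough aid for spotting these warnings in a large log, the following Python sketch counts the failing HRESULT values and the resource groups named in "Failed to Get ActiveNode" lines. The regular expressions are modelled only on the sample lines quoted above, so they are an assumption about the log format rather than a documented schema, and the function names `scan_dpm_log` and `decode_hresult` are hypothetical. The log encoding is also assumed; adjust `encoding` if the file does not read cleanly.

```python
import re
import sys
from collections import Counter

# Patterns modelled on the sample lines in this thread (assumed format).
HRESULT_RE = re.compile(r"WARNING\s+Failed: Hr: = \[(0x[0-9A-Fa-f]{8})\]")
ACTIVE_NODE_RE = re.compile(r"Failed to Get ActiveNode for ResourceGroup\s+(\S+)")


def scan_dpm_log(lines):
    """Count failing HRESULTs and resource groups whose active node lookup failed."""
    hresults = Counter()
    groups = Counter()
    for line in lines:
        match = HRESULT_RE.search(line)
        if match:
            hresults[match.group(1).lower()] += 1
        match = ACTIVE_NODE_RE.search(line)
        if match:
            groups[match.group(1)] += 1
    return hresults, groups


def decode_hresult(text):
    """Split an HRESULT into its severity bit, facility and code fields."""
    value = int(text, 16)
    return {
        "failure": bool(value & 0x80000000),  # bit 31: severity
        "facility": (value >> 16) & 0x7FF,    # bits 16-26: facility (7 = Win32)
        "code": value & 0xFFFF,               # bits 0-15: underlying error code
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_dpm_log.py <path to DPMRACurr.errlog>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        found_hresults, found_groups = scan_dpm_log(handle)
    for hr_text, count in found_hresults.most_common():
        print(hr_text, count, decode_hresult(hr_text))
    for group, count in found_groups.most_common():
        print("ActiveNode lookup failed for", group, count)
```

For 0x80072020 the decoded fields are facility 7 (Win32) and code 8224, which is the Win32 directory service "operations error", consistent with a failed directory query.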


    Fix the intermittent network problems that are causing the LDAP query to fail.
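
To help confirm whether name resolution is failing intermittently, a small probe along the following lines could be left running from the DPM server or a cluster node. This is only a sketch: it exercises ordinary DNS resolution through Python's `socket.getaddrinfo`, not the ADSI/LDAP call that DPM itself makes, so a clean run does not rule out directory problems. The function name `probe_dns`, the host name in the usage comment and the default attempt count and delay are all hypothetical.

```python
import socket
import time


def probe_dns(hostname, attempts=5, delay=1.0,
              resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve hostname repeatedly and return (attempt, error) for each failure."""
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            resolver(hostname, None)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures.append((attempt, str(exc)))
        if attempt < attempts:
            sleep(delay)  # pause between attempts, not after the last one
    return failures


# Example (hypothetical node name):
#   for attempt, error in probe_dns("sqlnode1.example.local", attempts=60, delay=5):
#       print("attempt", attempt, "failed:", error)
```

The `resolver` and `sleep` parameters exist only so the loop can be exercised without touching the network; in normal use the defaults are left alone.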

     


    Regards, Mike J. [MSFT] This posting is provided "AS IS" with no warranties, and confers no rights.
    Monday, September 26, 2011 3:33 PM
    Moderator