SharePoint backups failing for 2 out of 10 databases

  • Question

  • We have a MOSS 2010 server (v 14.0.4763.1000) that uses a SQL Server 2008 instance (v 10.3.5500.0); both are backed up by a DPM 2010 server (v 3.0.7707.0).  All three are VMware virtual machines running Windows Server 2008 R2 Standard SP1.  The DPM agents are the same version as the DPM server.  The SharePoint backups are failing for 2 of the 10 SharePoint site databases.  These 2 databases were backed up successfully for about 6 weeks after they were created.

    From the SQL server Application log:

    Log Name:      Application
    Source:        DPMRA
    Date:          3/16/2012 12:57:28 PM
    Event ID:      85
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          SYSTEM
    Computer:      <SQLServer>
    Description:
    A DPM agent failed to communicate with the DPM service on <DPMServer> because of a communication error. Make sure that <DPMServer> is remotely accessible from the computer running the DPM agent. If a firewall is enabled on <DPMServer>, make sure that it is not blocking requests from the computer running the DPM agent (Error code: 0x800706f7, full name: <DPMServer> ).
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="DPMRA" />
        <EventID Qualifiers="0">85</EventID>
        <Level>2</Level>
        <Task>0</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2012-03-16T16:57:28.000000000Z" />
        <EventRecordID>22286</EventRecordID>
        <Channel>Application</Channel>
        <Computer><SQLServer></Computer>
        <Security UserID="S-1-5-18" />
      </System>
      <EventData>
        <Data><DPMServer></Data>
        <Data>0x800706f7</Data>
        <Data><DPMServer></Data>
      </EventData>
    </Event>


    From SQLServer DPMRA*.errlog file:

    0240 094C 03/16 16:57:28.265 04 cmdproc.cpp(2469) [00000000002C29C0] 32CEF35E-1E95-4BB7-B1F5-DCD5DE7A59F6 WARNING Failed: Hr: = [0x800706f7] : F: lVal : pAgentCommand->SubmitResponse ( pCommand->m_guidCommandInstance, pCommand->GetCmdType(), m_cmdProcConfig.GetclsidServer(), pbXML, cbXML )
    0240 094C 03/16 16:57:28.265 04 cmdproc.cpp(2482) [00000000002C29C0] 32CEF35E-1E95-4BB7-B1F5-DCD5DE7A59F6 WARNING CCommandProcessor::SendOutboundCommand this:[00000000002C29C0], ServerName: <DPMServer>
    0240 094C 03/16 16:57:28.276 04 cmdproc.cpp(2579) [00000000002C29C0] 32CEF35E-1E95-4BB7-B1F5-DCD5DE7A59F6 WARNING Logging event for error: 4096, detailed: 0x800706f7
    0240 094C 03/16 16:57:28.276 04 events.cpp(89) [000000001DB79970] 32CEF35E-1E95-4BB7-B1F5-DCD5DE7A59F6 WARNING Failed: Hr: = [0x00001000] CCmdProcEvent::GetEventId: unexpected errorCode: detailed hr: 0x800706f7


    From the DPM Monitoring Jobs tab:

    Type: Replica creation
    Status: Failed
    Description: The DPM service was unable to communicate with the protection agent on <SQLServer>. (ID 52)
     More information
    End time: 3/16/2012 12:57:38 PM
    Start time: 3/16/2012 11:28:24 AM
    Time elapsed: 01:29:13
    Data transferred: 0 MB
    Cluster node -
    Source details: <SQLServer>\XXXSQL2\WSS_Content_AAAA
    Protection group: Datacenter SharePoint Protection Group
     This job can be rerun by running the failed job for the corresponding farm datasource(<SQLServer>\XXXSQL2\SharePoint_Config).


    I've tried the 4 connectivity tests as suggested by Steve Light here:
    http://social.technet.microsoft.com/Forums/en-US/dpmsetup/thread/5f53acd7-3758-486d-9d3f-13adc5b7d548

    And I've tried the DCOM connectivity tests outlined here:
    http://support.microsoft.com/kb/259011

    I ran all of these connectivity tests between the SQLServer and the DPMServer in both directions, and all of them were successful.

    None of the SQL databases have full-text indexing enabled.  I've deleted and recreated the SharePoint protection group twice: once keeping the existing data, and once deleting it (ouch).  I've run the ConfigureSharePoint command with different administrator credentials and then rerun the backups.  I've uninstalled and reinstalled the protection agents on the SQLServer and the MOSS server.  I still can't get these 2 databases backed up.

    According to the SQLServer, they are getting backed up.  From SQLServer Application log:

    Log Name:      Application
    Source:        MSSQL$XXXSQL2
    Date:          3/16/2012 11:32:45 AM
    Event ID:      18264
    Task Category: Backup
    Level:         Information
    Keywords:      Classic
    User:          SYSTEM
    Computer:      <SQLServer>
    Description:
    Database backed up. Database: WSS_Content_AAAA, creation date(time): 2011/11/16(14:16:47), pages dumped: 83318, first LSN: 1350:30116:204, last LSN: 1350:30201:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{F85CF57C-2EA8-4080-BDF9-B9EC54714350}1'}). This is an informational message only. No user action is required.
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="MSSQL$XXXSQL2" />
        <EventID Qualifiers="16384">18264</EventID>
        <Level>4</Level>
        <Task>6</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2012-03-16T15:32:45.000000000Z" />
        <EventRecordID>22262</EventRecordID>
        <Channel>Application</Channel>
        <Computer><SQLServer></Computer>
        <Security UserID="S-1-5-18" />
      </System>
      <EventData>
        <Data>WSS_Content_AAAA</Data>
        <Data>2011/11/16</Data>
        <Data>14:16:47</Data>
        <Data>83318</Data>
        <Data>1350:30116:204</Data>
        <Data>1350:30201:1</Data>
        <Data>1</Data>
        <Data>FILE=1, TYPE=VIRTUAL_DEVICE: {'{F85CF57C-2EA8-4080-BDF9-B9EC54714350}1'}</Data>
        <Binary>584700000A00000010000000530052005600530051004C0032005C004E004C004800530051004C0032000000070000006D00610073007400650072000000</Binary>
      </EventData>
    </Event>
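    To line up the timestamps quoted so far, here is a minimal sketch in Python. The times are copied from the DPM job details and the two Application log events above (all local time); the variable names and the gap() helper are only illustrative and are not part of any DPM or SQL Server tooling.

```python
from datetime import datetime, timedelta

# Local-time format used by the DPM console and Event Viewer excerpts above.
FMT = "%m/%d/%Y %I:%M:%S %p"

def parse(stamp: str) -> datetime:
    """Parse a timestamp such as '3/16/2012 12:57:28 PM'."""
    return datetime.strptime(stamp, FMT)

def gap(earlier: str, later: str) -> timedelta:
    """Return the elapsed time between two timestamps."""
    return parse(later) - parse(earlier)

# Values copied from this post.
JOB_START = "3/16/2012 11:28:24 AM"    # DPM replica creation job start
SQL_BACKUP = "3/16/2012 11:32:45 AM"   # SQL event 18264, "Database backed up"
AGENT_ERROR = "3/16/2012 12:57:28 PM"  # DPMRA event 85 on the SQLServer
JOB_END = "3/16/2012 12:57:38 PM"      # DPM job marked Failed

if __name__ == "__main__":
    print("job start  -> SQL backup :", gap(JOB_START, SQL_BACKUP))
    print("SQL backup -> agent error:", gap(SQL_BACKUP, AGENT_ERROR))
    print("agent error -> job failed:", gap(AGENT_ERROR, JOB_END))
```

    The point of the comparison is that SQL logs the backup within minutes of the job starting, while the agent error and the job failure come much later and within seconds of each other.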


    But that backup never makes it to the DPMServer.  Note that roughly 1.5 hours pass between the time the SQLServer reports the database as backed up and the time the DPMServer fails the job.  The DPMRA*.errlog file shows error code 0x800706f7, which is "The stub received bad data."  So I have a stub receiving bad data, which causes a communication timeout error(?), which in turn causes my backup to fail?  How do I fix this?  Any ideas would be appreciated.
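    For anyone checking the error code, here is a minimal sketch of how 0x800706f7 breaks down under the standard HRESULT bit layout (severity bit, 11-bit facility, 16-bit code). The decode_hresult() helper is hypothetical and written only for this post. Facility 7 is FACILITY_WIN32, and Win32 error 1783 is RPC_X_BAD_STUB_DATA, "The stub received bad data."

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    hr &= 0xFFFFFFFF                      # treat the value as unsigned 32-bit
    return {
        "failure": bool(hr >> 31),        # bit 31: 1 = failure, 0 = success
        "facility": (hr >> 16) & 0x7FF,   # bits 16-26: facility number
        "code": hr & 0xFFFF,              # bits 0-15: facility-specific code
    }

if __name__ == "__main__":
    fields = decode_hresult(0x800706F7)   # value from the DPMRA errlog above
    print(fields)
```

    So the value is an ordinary Win32/RPC error wrapped in an HRESULT rather than a DPM-specific code.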

    Friday, March 16, 2012 9:45 PM

All replies