DPM 2012 R2 UR4 Console Constantly Crashing

  • Question

  • I have a case open with Microsoft regarding this and despite working on it for a week straight, they haven't been able to figure out the problem.  I'm in the process of escalating it, but in the meantime I thought I'd see if anyone can help point us in some sort of direction.

    About two months ago I started seeing these console crashes, but they were very infrequent, maybe once or twice a week.  In the past 2-3 weeks, they have increased in frequency, occurring anywhere from every 5 minutes to once an hour.  The console only crashes when jobs are running.  I can't pinpoint any event that triggered these crashes; however, they seemed to increase in frequency after we had a power outage a few weeks ago.  Over the last week and a half, the crashes have been so frequent that it is impossible to get any complete backups.  Microsoft support has examined the DPM logs, run SQL traces and made some tweaks to timeout settings, but the crashes continue.  They tell me the issue is that there are "timeouts" when DPM is connecting to its database.

    If anybody has any ideas, I'd love to hear them and I'd be happy to provide any more information.  We are in a really bad situation at this point.  We have 50 TB of data, so I really don't want to build a new server and start over.

    The following is the error in the Application Log which I see every time the console crashes:


    "The description for Event ID 940 from source MSDPM cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

    If the event originated on another computer, the display information had to be saved with the event.

    The following information was included with the event: 

    Unable to connect to the database because of a fatal database error. It is unlikely that the database itself has been damaged.  Review the event log and take appropriate action. Make sure that SQL Server is running.

    Problem Details:
    <FatalServiceError><__System><ID>19</ID><Seq>119</Seq><TimeCreated>11/22/2014 5:56:29 PM</TimeCreated><Source>DpmThreadPool.cs</Source><Line>163</Line><HasError>True</HasError></__System><ExceptionType>SqlException</ExceptionType><ExceptionMessage>A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)</ExceptionMessage><ExceptionDetails>System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.) ---&gt; System.ComponentModel.Win32Exception (0x80004005): The semaphore timeout period has expired
       at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
       at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
       at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
       at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
       at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
       at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
       at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte&amp; value)
       at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean&amp; dataReady)
       at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
       at System.Data.SqlClient.SqlDataReader.get_MetaData()
       at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
       at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task&amp; task, Boolean asyncWrite, SqlDataReader ds)
       at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task&amp; task, Boolean asyncWrite)
       at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
       at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
       at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
       at Microsoft.Internal.EnterpriseStorage.Dls.DB.SqlRetryCommand.InternalExecuteReader()
       at Microsoft.Internal.EnterpriseStorage.Dls.DB.SqlRetryCommand.ExecuteReader()
       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.DB.Task.GetInstance(Guid taskID)
       at Microsoft.Internal.EnterpriseStorage.Dls.ArmCommon.CommonLoop.WakeupReceived(Message msg)
       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.Fsm.Engine.ChangeState(Message msg)
       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.TaskInstance.Process(Object dummy)
       at Microsoft.Internal.EnterpriseStorage.Dls.TaskExecutor.FsmThreadFunction.Function(Object taskThreadContextObj)
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()
    ClientConnectionId:19a70934-5124-43d7-a55a-11680fb594de</ExceptionDetails></FatalServiceError>


    the message resource is present but the message is not found in the string/message table"
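For anyone collecting several of these events, the "Problem Details" payload is plain XML, so the exception type and message can be pulled out with a few lines of script. The following is only a minimal sketch in Python: the sample string is a shortened copy of the event above (the `ExceptionDetails` stack trace is left out), and the helper names `summarize_fatal_error` and `is_transport_timeout` are made up for illustration, not part of DPM or any Microsoft tool.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the "Problem Details" payload from the event above.
# The <ExceptionDetails> stack trace is omitted here for brevity.
SAMPLE = (
    "<FatalServiceError><__System><ID>19</ID><Seq>119</Seq>"
    "<TimeCreated>11/22/2014 5:56:29 PM</TimeCreated>"
    "<Source>DpmThreadPool.cs</Source><Line>163</Line>"
    "<HasError>True</HasError></__System>"
    "<ExceptionType>SqlException</ExceptionType>"
    "<ExceptionMessage>A transport-level error has occurred when receiving "
    "results from the server. (provider: TCP Provider, error: 0 - The "
    "semaphore timeout period has expired.)</ExceptionMessage>"
    "</FatalServiceError>"
)


def summarize_fatal_error(xml_text):
    """Hypothetical helper: return the key fields of the payload as a dict."""
    root = ET.fromstring(xml_text)
    return {
        "time": root.findtext("__System/TimeCreated"),
        "source": root.findtext("__System/Source"),
        "exception_type": root.findtext("ExceptionType"),
        "message": root.findtext("ExceptionMessage"),
    }


def is_transport_timeout(summary):
    """Hypothetical check: True for a SqlException whose message mentions
    the semaphore timeout seen in this thread."""
    message = (summary["message"] or "").lower()
    return (
        summary["exception_type"] == "SqlException"
        and "semaphore timeout" in message
    )


if __name__ == "__main__":
    info = summarize_fatal_error(SAMPLE)
    print(info["time"], info["exception_type"])
    print("transport-level timeout:", is_transport_timeout(info))
```

Running this over each Event ID 940 payload would make it easier to confirm that every crash carries the same transport-level timeout rather than a mix of different database errors.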

     

    Saturday, November 22, 2014 6:12 PM

Answers

  • It appears I've finally found what's causing this.  When we had our power outage a few weeks ago, one of our iSCSI NAS devices had a problem and we lost a volume for DPM.  We fixed it, but had to remove all the sources in DPM that had data stored on the problem volume and add them back on the new volume we created.  

    Yesterday, I discovered that the problem volume was still showing in Windows Disk Management as missing even though it was completely gone from the DPM console.  On a hunch, I removed the missing volume from Disk Management, and DPM has been stable since (going on 15 hours now).

    As usual, it seems obvious now that I've found the problem... I just wish Microsoft could have pointed me in the right direction given all the logs and SQL traces they've been examining.



    George Moore

    • Marked as answer by WKGeorge Monday, November 24, 2014 5:28 PM
    Monday, November 24, 2014 5:28 PM