none
Failed to store data in the Data Warehouse

    Question

  • Hello,

    We are running SCOM 2007 R2 and just started receiving these alerts on our RMS and also on one of our management servers. I saw one blog site saying this was caused by the KMS management pack, but we don't have this MP installed.  Any suggestions on how to go about troubleshooting this?

    Thanks,
    Tom

    Log Name:      Operations Manager
    Source:        Health Service Modules
    Date:          10/7/2009 2:14:56 PM
    Event ID:      31552
    Task Category: Data Warehouse
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      LA-SCOMRMS01.xxx.com
    Description:
    Failed to store data in the Data Warehouse.
    Exception 'SqlException': Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.

    One or more workflows were affected by this. 

    Workflow name: Microsoft.SystemCenter.DataWarehouse.StandardDataSetMaintenance
    Instance name: State data set
    Instance ID: {CA524D86-58D7-9C71-5B71-466EC8AF7F5A}
    Management group: XXX


    Log Name:      Operations Manager
    Source:        Health Service Modules
    Date:          10/7/2009 2:15:28 PM
    Event ID:      31553
    Task Category: Data Warehouse
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      LA-SCOMRMS01.XXX.com
    Description:
    Data was written to the Data Warehouse staging area but processing failed on one of the subsequent operations.
    Exception 'SqlException': Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.

    One or more workflows were affected by this. 

    Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectEntityHealthStateChange
    Instance name: LA-SCOMRMS01.XXX.com
    Instance ID: {1458430E-9F73-1C41-601F-D87FA689D18C}
    Management group: XXX

    Wednesday, October 07, 2009 9:36 PM

Answers

  • Ok, so we closed our pss case with the resolution, but we didn't find root cause. Below are the note of what the pss engineer found.

    "I determined that the StandardDataSetMaintenance stored procedure was timing out when trying to aggregate the data in the State.vStateRaw table. Unfortunately, I was not able to see why the aggregation was taking so long, the data in the table was not corrupt and there wasn’t an excessive amount of it. To correct the issue I disabled the rule "Standard Data Warehouse Data Set maintenance rule" for the State data set. I then ran the StandardDataSetMaintenance stored procedure manually and passed it the DataSetId for the State data set. After the stored procedure completed the issue was resolved. We reenabled the rule and no critical alerts have returned."

    • Marked as answer by martit01 Friday, October 23, 2009 9:17 PM
    Friday, October 23, 2009 9:17 PM

All replies

  • It looks like your state staging tables are stuck.....   can you count the records in the state staging tables in the datawarehouse database?
    Thursday, October 08, 2009 2:10 AM
    Moderator
  • Hello Kevin,

    I'm not a DBA so please be patient. Are you referring to the "State.StateStage" Table?  If so, that one has a Row count of 123.

    Thanks,
    Tom
    Thursday, October 08, 2009 7:39 PM
  • That's not bad then... wait a little time - and then see if it increases or goes to zero.

    Is your warehouse database server size appropriately?  Is it under load?  Have you examined performance counters for the basic memoty, CPU, and disk I/O?
    Friday, October 09, 2009 6:51 AM
    Moderator
  • Have you looked that the discoveries from the imported MP's are not set to update too frequently?

    I'm here now thinking of MP's like IIS or such which had 60min interval discoveries for example.

    If I remember right, Kevin has also blogged about those best practices what to do to the discoveries.

    -Tero
    MCT | MCSE | MCITP | MCTS SCOM & SCCM
    Friday, October 09, 2009 7:58 AM
  • I've been watching the State.StateStage table, I've seen it go up to 245 row count, but it then drops back down to 0 and started to grow again.

    Warehouse database is sized to 250Gb, currently with 93Gb of space available.

    I've ran performance reports on the SQL server and do not see any performance issue within the past 30 days.

    Monday, October 12, 2009 6:58 PM
  • How often are you getting these alerts?  Your staging tables are behaving normally - staging some data then processing it - so this looks like it might be a transient issue?

    Does this only happen at night?  Or constantly all day?  During backups?


    One thing to try (without much deeper digging) - is to run a sp_updatestats and a full manual reindex of the database.... it might be that the standard dataset maintenance is struggling to complete.


    Performance reports wont tell you what you need to know.  You need to run a perfmon - logical disks - on the database disk volumes, for avg disk sec per read, and avg disk sec per write - and collect this EVERY SECOND, for each disk.  THen analyze that data looking for sustained periods above 15ms.
    Monday, October 12, 2009 11:20 PM
    Moderator
  • Thanks Kevin,

    How do I run the full manual reindex of the OpsMgrDW database?

    Thanks,
    Tom
    Tuesday, October 13, 2009 10:58 PM
  • I have the similar messages all the time, day and night, almost anytime, RMS and 2 MS all have the similar problem, and  in DW, once a while, some processes which I do not know the name had been locked (time out), DW transaction log keeps full, RMS and MS all turned red which is very ugly. that is right after SP1 to R2 upgrade, we have no KMS either,  not sure if Dell 4.0 MP or Exchange MP, Unix MP we just added during the R2 upgrade will cause the issue.

    But I am still able to run report 

    the full alert is:

    Data Warehouse event data dedicated maintenance process failed to perform maintenance operation. Failed to store data in the Data Warehouse.
    Exception 'SqlException': Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
    One or more workflows were affected by this. Workflow name: Microsoft.SystemCenter.DataWarehouse.StandardDataSetMaintenance
    Instance name: Event data set
    Instance ID: {4C8320EB-234A-F8DA-02BB-82BE4C469FD6}"
    Friday, October 16, 2009 6:20 PM
  • I have a PSS case opened with Microsoft to resolve this. I should have an update by early next week.
    Saturday, October 17, 2009 12:27 AM
  • I have a PSS case opened with Microsoft to resolve this. I should have an update by early next week.

    Martit, please keep us updated what Microsoft finds.  

    I doubt some latest MPs may cause what it happened,  have you recently uploaded any MP?

    Regards
    Lisa
    Saturday, October 17, 2009 3:24 AM
  • Ok, so we closed our pss case with the resolution, but we didn't find root cause. Below are the note of what the pss engineer found.

    "I determined that the StandardDataSetMaintenance stored procedure was timing out when trying to aggregate the data in the State.vStateRaw table. Unfortunately, I was not able to see why the aggregation was taking so long, the data in the table was not corrupt and there wasn’t an excessive amount of it. To correct the issue I disabled the rule "Standard Data Warehouse Data Set maintenance rule" for the State data set. I then ran the StandardDataSetMaintenance stored procedure manually and passed it the DataSetId for the State data set. After the stored procedure completed the issue was resolved. We reenabled the rule and no critical alerts have returned."

    • Marked as answer by martit01 Friday, October 23, 2009 9:17 PM
    Friday, October 23, 2009 9:17 PM
  • Thank you Martit!

    It seems that my problem is resolved too.

    1.       I disabled the rule "Standard Data Warehouse Data Set maintenance rule" for the State data set.

    2.       I then ran the StandardDataSetMaintenance stored procedure manually 3 times and passed it the DataSetId for the State data set.

    a.       first time it finished after 24 minutes;

    b.      second time it finished after 58 minutes;

    c.       third time it finished after 02 seconds.

    3.       I reenabled the rule and no critical alerts have returned.

     

    Best regards,

    Aleksandrs

    Tuesday, October 27, 2009 3:08 PM
  • Hello!

    The same error appeared again in my case after approximately 10 hours. I think it is necessary to find the root of this error.

    Best regards,

    Aleksandrs
    Wednesday, October 28, 2009 7:27 AM
  • Hello lisa,

    Just wondering if you do store your overrides in the default Management pack?

    Saturday, November 28, 2009 12:25 AM
  • good point that I will check, thanks, Maher
    I do not store overrides in the the default MP, however I am very new and not sure if someone else did or not

    Thursday, December 03, 2009 1:27 AM
  • 1.       I disabled the rule "Standard Data Warehouse Data Set maintenance rule" for the State data set.

    2.       I then ran the StandardDataSetMaintenance stored procedure manually 3 times and passed it the DataSetId for the State data set.

    a.       first time it finished after 24 minutes;

     

    We are experiencing the same issue. StandardDataSet has been started manually after disabling the rule as above, and has been running for four days, four hours and ten minutes (and counting), generating a tempdb log of 12GB so far.

    Are there any indicators as to how long this will take? I guess the procedure is doing *something* as long as the tempdb log keeps growing...

    LastOptimizationActionSuccessfulCompletionDateTime for 'State data set' is 2010-10-03 01:00:27.657

    Regards,

    Magnus

    Monday, October 25, 2010 12:41 PM
  • hmmm...

     

    StandardDatasetMaintenance has been running for 48 days, 2 hours.

    TempDB is 147GB.

    It is running on 16 cores / 32GB RAM and there does not appear to be any significant load on the server.

     

    Any clues?

     

    Magnus

    Wednesday, December 08, 2010 12:00 PM
  • Magnus - are you saying your MANUAL run of the job has been running for 48 days straight????
    Kevin Holman http://blogs.technet.com/b/kevinholman
    Wednesday, December 08, 2010 1:49 PM
    Moderator
  • Kevin, yep.

    Eventually we had to cancel the query and reboot the server due to pending patches.

    Magnus

    Thursday, January 27, 2011 11:17 PM
  • I finally got around to finding the "how to" steps the pss engineer used to resolve this issue.  Just a heads up, for step #5 you will need to know the dataset, so I've included  the following query to get that information:

    SELECT SchemaName

    FROM StandardDataset

     

    1.Using the instance name section in the 31552 event, find the data set that is causing the problem. (See the highlighted portion in the event below)

     Log Name:      Operations Manager

    Source:        Health Service Modules

    Event ID:      31552

    Task Category: Data Warehouse

    Computer:      LA-SCOMRMS01.caa.com

    Description:

    Failed to store data in the Data Warehouse.

    Exception 'SqlException': Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.

     

    One or more workflows were affected by this. 

     

    Workflow name: Microsoft.SystemCenter.DataWarehouse.StandardDataSetMaintenance

    Instance name: State data set

    Instance ID: {CA524D86-58D7-9C71-5B71-466EC8AF7F5A}

    Management group: CAA-LA

     

    2.Create an override to disable the maintenance procedure for this data set:

    a.In the OpsMgr console go to Authoring-> Rules-> Change Scope to “Standard Data Set”

    b.Right click the rule “Standard Data Warehouse Data Set maintenance rule” -> Overrides -> Override the rule -> For a specific object of class: Standard Data Set”

    c.Select the data set that you found from the event in step 1.

    d.Check the box next to Enabled and change the override value to “False”, then apply the changes.

     

     

    3.Restart the “System Center Management” service on the RMS.

     

    4.Wait approximately 10 minutes and then connect to the SQL server that hosts the OperationsManagerDW database and open SQL Management Studio.

     

    5.Run the query below replacing the highlighted portion with the name of the data set from step 1.

     

    USE [OperationsManagerDW]

    DECLARE @DataSet uniqueidentifier

    SET @DataSet = (SELECT DatasetId FROM StandardDataset WHERE SchemaName = 'Event ')

    EXEC StandardDatasetMaintenance @DataSet

     

    6.Once the query finishes follow steps 1-5 again, only this time set the rule to Enabled = True.

     

    7.Monitor the event log for any timeout events.

     

     


    Tom Martin Email: tmartin@caa.com
    Thursday, February 17, 2011 7:36 PM
  • Seems the problem was resolved by applying CU4
    Monday, February 28, 2011 11:09 AM
  • Hi Martin,

    I tried your solution. In my case the dataset is for the event data set. i changed it and ran the query. But I get the result in the first column as 0 and the second column as null.

    Is this the expexted result ?

    Thanks,

    Dhanraj

    Monday, April 18, 2011 12:43 PM
  • Hypothetical, if the Standard Data Warehouse Data Set maintenance rule was disabled for a month and then this process was executed.  Would the state data from a month ago be available or does is it removed in some fashion?
    Thursday, October 13, 2011 4:31 PM
  • I have reset the the credentials of the Data Reader account and all was fine
    Monday, March 10, 2014 12:08 PM