SCOM SQL Server patching - how to manage outage

  • Question

  • Hi, 

    (SCOM 2019)

    Our SQL Server DBA would like to patch the server that is running the SCOM databases.  This will require a two-hour outage, as he patches them one at a time.  

    Normally we would stop both management servers and let them get on with it, but it did make me wonder about the following: 

    I don't know what the impact is of having one database running while the other is unavailable. Can we run the Operations Manager database without the data warehouse being available, and vice versa?  

    If a monitored server cannot communicate back to a management server, how long does the Microsoft Monitoring Agent retain data for?  

    Thank you, 

    Mark.  

    Thursday, July 9, 2020 9:24 AM

Answers

  • Hi Mark,

    The Agents collect data for both the SCOM database and the SCOM data warehouse. When the data warehouse is not available, the Operations Manager event log on all the Management Servers will start to show alerts about the data warehouse not being present, and the Management Servers will retry the connection many times.

    The Operations Console will also show multiple errors. Besides that, there will be a gap in the collected data for the reports. So there are several things to reckon with before you put your scenario into practice.

    You can run without a data warehouse for a while with a few errors / alerts that are not too concerning (since you know that you took the data warehouse down).

    Generally after one day of downtime, some data will most likely be dropped depending on the volume of data being generated by the Agents.  If you go several days with the data warehouse down, you'll definitely have some data loss for data that is bound for the data warehouse.

    The SCOM database is required for the SCOM environment to work, so when that goes down, your SCOM is basically down.

    If the SCOM agents cannot communicate with a management server, they will start caching the data locally. How long the cache lasts depends on how many workloads are running on the agent computer; normally I'd say a few hours.
    The agent HealthService cache can be increased by modifying the following registry value:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\<ManagementGroupName>\maximumQueueSizeKb


    The default queue size is 100 MB. It can be increased to a maximum of 1500 MB by adding or modifying the DWORD-type registry value. Once you have completed the upgrade of the management group, you can reset it to the default value.
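
    As a rough sketch of the arithmetic involved (not part of the original answer): the value is stored in KB, so a size in MB has to be multiplied by 1024 before it is written. The Python below only builds a `reg add` command line and does not run it. The management group name `MyManagementGroup`, the assumption that the value lives under a subkey named after the management group, and the 25 MB/hour data rate are all illustrative assumptions, not measured figures.

```python
# Sketch only: names and rates below are hypothetical, for illustration.
KB_PER_MB = 1024
MAX_MB = 1500  # upper limit quoted in the answer above

# Assumed location: one subkey per management group under "Management Groups".
BASE_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Services\HealthService"
    r"\Parameters\Management Groups"
)


def queue_size_kb(size_mb: int) -> int:
    """Convert a queue size in MB to the KB figure stored in the registry."""
    if not 0 < size_mb <= MAX_MB:
        raise ValueError(f"size must be between 1 and {MAX_MB} MB")
    return size_mb * KB_PER_MB


def reg_add_command(management_group: str, size_mb: int) -> str:
    """Build (but do not run) a reg.exe command that sets maximumQueueSizeKb."""
    key = BASE_KEY + "\\" + management_group
    return (
        f'reg add "{key}" /v maximumQueueSizeKb '
        f"/t REG_DWORD /d {queue_size_kb(size_mb)} /f"
    )


def hours_of_cache(size_mb: int, agent_mb_per_hour: float) -> float:
    """Very rough estimate of how long the queue lasts at a steady data rate."""
    return size_mb / agent_mb_per_hour


if __name__ == "__main__":
    # Command for the maximum size; review it before running on an agent.
    print(reg_add_command("MyManagementGroup", 1500))
    # Hypothetical agent producing 25 MB/hour against the 100 MB default.
    print(hours_of_cache(100, 25.0))
```

    With those assumed numbers, the 100 MB default would cover about four hours and 1500 MB about sixty, so the real answer depends entirely on how much data your agents generate. The HealthService on the agent would presumably need a restart to pick up the new value.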

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com

    • Proposed as answer by CyrAz Thursday, July 9, 2020 9:38 AM
    • Marked as answer by TheMAW Thursday, July 9, 2020 9:45 AM
    Thursday, July 9, 2020 9:37 AM

All replies

  • Thank you Leon.  Marked that as the answer.  
    Thursday, July 9, 2020 9:45 AM