SCOM 2019 - New Alerts not working/firing

  • Question

  • Hello

    I have a number of virtual machines running on Windows Server 2008 and Windows Server 2012 that I need to monitor for three factors:

    • Disk space usage - if a certain VM server is using a lot of space in a certain disk or all of its disks
    • RAM usage - if a VM is using an unusually high amount of its RAM 
    • CPU usage - if a VM is using an unusually high amount of its CPU power

    Previously I set up a management pack containing eight simple threshold monitors: four for Server 2008 and four for Server 2012. There are four per OS rather than three because disk space usage has two monitors: one checks whether free disk space is below 20% and generates a warning for that server, and the other checks whether it is below 10% and generates a critical error for that server.

    This was a problem because if, for example, one of the servers has 5% free disk space, it generates both a warning (below 20%) and a critical error (below 10%). What I actually want is a single alert: a warning if free space is below 20%, or a critical error if it is below 10%, but not both.

    To fix this I created a new management pack with six monitors: one for each of the above factors for Server 2008, and one for each for Server 2012. They are double-threshold monitors which should generate a warning if, say, RAM usage is between 80% and 90%, generate a critical error if it is above 90%, and go back to healthy if it is below 80%. The CPU monitors work the same way, but the disk space monitors are inverted: a warning between 10% and 20% free disk space, critical if it is below 10%, and healthy if it is above 20%.
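    The intended three-state behaviour can be sketched outside SCOM as below. This is only an illustration of the threshold logic described above, not the management pack itself. The function names and the handling of values that fall exactly on a threshold are assumptions.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    CRITICAL = "Critical"


def classify_usage(percent_used, warning_at=80.0, critical_at=90.0):
    """RAM/CPU style: higher is worse. Returns exactly one state."""
    if percent_used > critical_at:
        return Health.CRITICAL
    if percent_used >= warning_at:  # boundary handling is an assumption
        return Health.WARNING
    return Health.HEALTHY


def classify_free_space(percent_free, warning_below=20.0, critical_below=10.0):
    """Disk style: lower free space is worse. Returns exactly one state."""
    if percent_free < critical_below:
        return Health.CRITICAL
    if percent_free < warning_below:
        return Health.WARNING
    return Health.HEALTHY


if __name__ == "__main__":
    # A server with 5% free disk space yields a single critical state,
    # not a warning and a critical error at the same time.
    print(classify_free_space(5.0).value)
    print(classify_usage(85.0).value)
```

    Because each function returns one state, a value of 5% free space can never raise a warning and a critical error together, which is the behaviour the two separate single-threshold monitors could not give.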

    However, since I deleted the old monitors and created the new ones, the servers that previously generated warnings and critical errors under the old monitors aren't doing the same under the new ones, and I can't figure out why. The agents are working on the servers. I put them in maintenance mode for an hour before I left yesterday in case that was the issue, but when I opened SCOM today and checked that the agents were out of maintenance mode, the alerts were still not firing, even though the servers still have low disk space or high RAM/CPU usage (which should trigger them).

    Friday, June 5, 2020 10:41 AM

All replies

  • Hello,

    First of all, SCOM 2019 does not support monitoring Windows Server 2008/2008 R2 servers. It is possible, but you may see inconsistencies.

    You might want to try clearing the agent cache on the servers. This deletes the "Health Service State" folder and restarts the agents, so the agents get their configuration and management packs from the management servers again.

    I have made PowerShell scripts for clearing the agent's cache:

    SCOM Agent Clear Cache
    https://gallery.technet.microsoft.com/SCOM-Agent-Clear-Cache-54998a47

    Clear SCOM Agent cache for computers from text file
    https://gallery.technet.microsoft.com/SCOM-Agent-Clear-Cache-5a88a8bb

    Here's also the official documentation:

    How and When to Clear the Cache
    https://docs.microsoft.com/en-us/system-center/scom/manage-clear-healthservice-cache?view=sc-om-2019
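    For reference, the cache-clearing procedure described above (stop the agent, delete the "Health Service State" folder, start the agent) can be sketched roughly as follows. This is not the linked script. The service name `HealthService` and the default folder path are assumptions that should be checked against the agent installation, and the function defaults to a dry run that only lists the steps.

```python
import shutil
import subprocess
from pathlib import Path

# Assumptions: verify both values on the target server before a real run.
SERVICE_NAME = "HealthService"
DEFAULT_STATE_DIR = Path(
    r"C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State"
)


def clear_agent_cache(state_dir=DEFAULT_STATE_DIR, dry_run=True):
    """Stop the agent, delete its cache folder, start the agent again.

    With dry_run=True nothing is executed; the planned steps are returned.
    """
    state_dir = Path(state_dir)
    planned = [
        f"net stop {SERVICE_NAME}",
        f"delete {state_dir}",
        f"net start {SERVICE_NAME}",
    ]
    if dry_run:
        return planned

    # Real run: requires an elevated prompt on the monitored server.
    subprocess.run(["net", "stop", SERVICE_NAME], check=True)
    if state_dir.exists():
        shutil.rmtree(state_dir)
    subprocess.run(["net", "start", SERVICE_NAME], check=True)
    return planned


if __name__ == "__main__":
    for step in clear_agent_cache(dry_run=True):
        print(step)
```

    Running it with `dry_run=True` first shows what would be done without touching the agent.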

    If it still doesn't work, check the Operations Manager event log for any clues, and double-check your custom management pack for possible errors.

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com

    Friday, June 5, 2020 11:55 AM
  • Hi,
     
    I agree with Leon: the SCOM 2019 agent does not support Windows Server 2008. We suggest planning an upgrade for these servers.
     
     https://kevinholman.com/2019/03/07/scom-2019-news/
     
    From your description, I understand that the alerts are still not firing even though the servers still have low disk space or high RAM/CPU usage. Could you confirm which Windows version these servers are running? Are they Windows Server 2012? If so, please follow Leon's suggestion to clear the cache and see whether the result is different.
     
    Hope it can help.
     
    Best regards.
    Crystal

    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.

    Monday, June 8, 2020 1:35 AM
  • Hi,

    How are things going? Did you try clearing the cache? Is it working now? If there's any update, please let us know.

    Best regards.

    Crystal



    Friday, June 12, 2020 4:40 AM
  • Thank you Crystal and Leon for your advice. Apologies for the late reply; I haven't had much time to work on this.

    I've tried the script you suggested to clear the agents' cache, but unfortunately the alerts still aren't being triggered. I've checked over the management pack, and the monitors are set up almost exactly like the old ones (with the exception of being double-threshold). There aren't any new logs generated that I can check.

    Friday, June 12, 2020 8:48 AM
  • May I ask why you'd rather not use the native Microsoft MPs for these counters?

    Also, could you show us the exact XML code for the monitors you created?

    Friday, June 12, 2020 9:02 AM