Resetting health for the agent offline alert does not re-trigger the alert

  • Question

  • Hi,

    In SCOM 2012 R2 I wrote a script that resets the health for a closed alert so that SCOM can re-trigger the alert.

    But it didn't work for the agent offline alert 'Health Service Heartbeat Failure', where the agent went to a grey state.

    Wednesday, March 2, 2016 9:51 PM

Answers

  • This is "by design".

    The event only triggers once, when the heartbeat miss occurs (typically after 3 x 60 seconds). If you reset this monitor manually, it will never trigger a new alert.

    My guess is that this is because the management server keeps track of the "last heartbeat", only checks for that "3 min" mark, and probably marks it as "reported". In SCOM 2007 this information was kept in memory, so a reboot of the RMS would resend an alert. In 2012 it is kept in the DB, so no fix here.

    MS could solve this by resetting the last heartbeat from an agent to the "Reset monitor" time, but you'll need to open a support case for that and write a business case explaining why this is important for you. (MS should just understand that monitors being disconnected from alerts is a huge fundamental flaw in SCOM, hence this being a top request in the 2016 feedback.)
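
    The behaviour guessed at above can be pictured with a small toy model. This is only a sketch of the guess, not SCOM's actual implementation: the class `WatcherModel`, its `reported` flag, and the method names are all hypothetical and exist purely for illustration. It shows why a manual reset would not raise a second alert if the server only alerts once per missed-heartbeat episode, and how resetting the last heartbeat time on "Reset monitor" would change that.

```python
from dataclasses import dataclass

HEARTBEAT_INTERVAL = 60  # seconds between expected heartbeats (typical value from the post)
MISSED_ALLOWED = 3       # missed heartbeats before the alert fires ("3x60 seconds")


@dataclass
class WatcherModel:
    """Hypothetical model of the guessed server-side bookkeeping (not a SCOM API)."""
    last_heartbeat: float = 0.0
    reported: bool = False   # assumed "already reported" marker
    healthy: bool = True

    def heartbeat(self, now: float) -> None:
        # A real heartbeat starts a fresh episode.
        self.last_heartbeat = now
        self.reported = False
        self.healthy = True

    def check(self, now: float) -> bool:
        """Return True only when a new alert would be raised."""
        overdue = now - self.last_heartbeat >= HEARTBEAT_INTERVAL * MISSED_ALLOWED
        if overdue and not self.reported:
            self.reported = True
            self.healthy = False
            return True
        return False

    def reset_monitor(self) -> None:
        # Manual reset as guessed today: health flips back, the marker stays.
        self.healthy = True

    def reset_monitor_proposed(self, now: float) -> None:
        # The suggested change: treat the reset time as the last heartbeat.
        self.healthy = True
        self.last_heartbeat = now
        self.reported = False
```

    In this model, `check` returns `True` once at the 3-minute mark, keeps returning `False` after `reset_monitor()`, and returns `True` again three minutes after `reset_monitor_proposed(now)` if no heartbeat arrives in between.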


    Rob Korving
    http://jama00.wordpress.com/


    • Edited by rob1974 Thursday, March 3, 2016 3:28 PM
    • Proposed as answer by Elton_Ji Saturday, March 26, 2016 6:13 AM
    • Marked as answer by Elton_Ji Sunday, March 27, 2016 2:55 PM
    Thursday, March 3, 2016 3:28 PM

All replies

  • The Health Service Heartbeat Failure alert is raised from the source "Health Service Watcher". Make sure that your script has reset the health of this instance.

    Roger

    Thursday, March 3, 2016 5:04 AM
  • Hi,

    It was a PowerShell script, which we configured in Orchestrator to reset the source monitor if the alert is closed.

    I see the alert monitor as Health Service Heartbeat Failure.


    Thursday, March 3, 2016 9:08 AM