SQL Job Duration False Alerts

    Question

  • Hi All,

    I have been facing issues with the job duration monitor in the SQL Server management pack. We get an alert once a job's duration exceeds the specified time limit, which is fine, but the monitor keeps changing health state between Healthy and Critical again and again, even after the job has succeeded and is no longer running. This causes the alert to close and trigger repeatedly. I have attached a screenshot of example health state changes below. The state goes from Healthy to Critical, then from Not Monitored to Healthy, and so on. I am unable to understand how this monitor is configured. Would it alert on job duration even after the job has completed, and why does it keep changing the state back to Healthy again and again?

    Regards,
    Daya Ram

    Wednesday, May 16, 2018 8:13 AM

All replies

  • Hi Daya,

    Can you please give some more details on this one?

    - Which version and update level of SCOM are you running?
    - Which version of the SQL management pack have you imported?
    - Is only one particular job affected, or can the issue be reproduced with any job?
    - What does the context of the state change event say?

    Thanks in advance. Regards,


    (Please take a moment to "Vote as Helpful" and/or "Mark as Answer" where applicable. This helps the community, keeps the forums tidy, and recognizes useful contributions. Thanks!) Blog: https://blog.pohn.ch/ Twitter: @StoyanChalakov


    Wednesday, May 16, 2018 11:48 AM
    Moderator
  • Hi Stoyan Chalakov,

    I am using SCOM 2012 R2 with Update Rollup 14 and SQL MP version 6.7.31.0.

    I am getting this issue for almost every long-running job that is alerted for long job duration. The alert context gives the details of all jobs, and the one being alerted is shown as "The job succeeded". For example, for the job _dba_DatabaseBackup - SYSTEM_DATABASES - FULL it looks as follows:


    Date and Time: 16/05/2018 12:43:20
    Property Name                                        Property Value

    {6249998C-343D-4CAE-B9C4-E65B648D5797}-LastStatus    -1
    {6249998C-343D-4CAE-B9C4-E65B648D5797}-LastMessage
    {D26717B5-43CB-4D8D-B733-E6DA017EE85B}-Duration      0
    {D26717B5-43CB-4D8D-B733-E6DA017EE85B}-LastStatus    1
    {D26717B5-43CB-4D8D-B733-E6DA017EE85B}-LastMessage   The job succeeded. The Job was invoked by Schedule 12 (System_Full_Backup). The last step to run was step 1 (_dba_DatabaseBackup - SYSTEM_DATABASES - FULL).
    {D95C75C7-8548-41F5-B955-F4FEB5312917}-Duration      -1
    {D95C75C7-8548-41F5-B955-F4FEB5312917}-LastStatus    -1
    {D95C75C7-8548-41F5-B955-F4FEB5312917}-LastMessage
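
To make an alert context like the one above easier to read, the "{job GUID}-Property value" lines can be grouped per job. The following is a minimal Python sketch, not part of the management pack. The function name `parse_alert_context` and the shortened `SAMPLE` text are illustrative assumptions. Values are kept as plain strings, since the thread does not document what values such as -1 mean.

```python
import re
from collections import defaultdict

# Matches lines such as "{GUID}-Duration 0" or "{GUID}-LastMessage" (empty value).
LINE_RE = re.compile(
    r"^\{(?P<job_id>[0-9A-Fa-f-]{36})\}-(?P<prop>\w+)(?:\s+(?P<value>.*))?$"
)


def parse_alert_context(text):
    """Group '{job GUID}-Property value' lines into one dict per job GUID."""
    jobs = defaultdict(dict)
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip the header, the timestamp and blank lines
        value = (match.group("value") or "").strip()
        jobs[match.group("job_id")][match.group("prop")] = value
    return dict(jobs)


# Shortened copy of the alert context from this post (illustrative only).
SAMPLE = """\
Property Name Property Value
{D26717B5-43CB-4D8D-B733-E6DA017EE85B}-Duration 0
{D26717B5-43CB-4D8D-B733-E6DA017EE85B}-LastStatus 1
{D26717B5-43CB-4D8D-B733-E6DA017EE85B}-LastMessage The job succeeded.
{D95C75C7-8548-41F5-B955-F4FEB5312917}-Duration -1
"""

if __name__ == "__main__":
    for job_id, props in parse_alert_context(SAMPLE).items():
        print(job_id, props)
```

Grouping by GUID makes it easier to see that the job reported as succeeded carries a Duration of 0 in this sample, which is the combination being asked about.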


    • Edited by D. R Wednesday, May 16, 2018 12:56 PM
    Wednesday, May 16, 2018 12:46 PM
  • Hello Daya,

    Does your agent restart frequently? I had the same issue earlier and found this thread, which helped me fix it. Please check whether it helps you. Alternatively, you can change the threshold if you can afford to :)


    Cheers, Gourav Please remember to mark the replies as answers if it helped.

    Wednesday, May 16, 2018 3:17 PM
  • Hi Gourav,

    I have gone through this one previously, but in my case no agent restarts are observed. One thing that concerns me is this: if the job has already completed successfully, why is it still alerting for long duration? Is it configured to alert on job duration irrespective of job status?

    Regards,
    Daya Ram

    Thursday, May 17, 2018 10:32 AM
  • <<<Is it configured to alert on job duration irrespective of job status?>>> Yes!

    The management pack provides a “Long-running Jobs” monitor targeted at the SQL Server Agent object. The monitor oversees all jobs run by SQL Server Agent and changes state when the duration of any job execution exceeds the threshold. An alert is also raised in this case.

    For further details, you can read this.

    So, in general, it will create an alert if any job has been running longer than the threshold (120 seconds) when checked at the given interval. I suggest reading the section "SQL Server 2016 Agent Job - Unit monitors" in the link given above for further information.
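
As a rough illustration of the behaviour described above, here is a minimal Python sketch of a duration-only check. It is not the management pack's actual implementation. The names `JobSample` and `evaluate_state`, and the choice to treat -1 as "no data", are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class JobSample:
    name: str
    duration: int     # same unit as the threshold; -1 is assumed to mean "no data"
    last_status: int  # carried along, but deliberately ignored by the check below


def evaluate_state(samples, threshold):
    """Return 'Critical' if any job's duration exceeds the threshold, else 'Healthy'."""
    for job in samples:
        if job.duration > threshold:
            return "Critical"
    return "Healthy"


if __name__ == "__main__":
    # A job that already succeeded (last_status=1) still trips the check,
    # because only the reported duration is compared with the threshold.
    print(evaluate_state([JobSample("backup", 150, 1)], threshold=120))
    # On a later poll the reported duration is 0, so the state goes back.
    print(evaluate_state([JobSample("backup", 0, 1)], threshold=120))
```

Under these assumptions the state depends only on the duration value seen at each poll, so a monitor of this kind can alternate between Critical and Healthy while the job's last status stays "succeeded".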


    Cheers, Gourav

    Thursday, May 17, 2018 11:04 AM
  • Hi Daya,

    Is the affected MP the one for SQL 2008 or SQL 2012? Judging by the version, I think this should be the case, because for SQL 2014 and 2016 there are newer MPs, and I assume you keep those updated. Am I right?

    I am trying to understand why the MP fires when the job has completed successfully, and I want to be sure that this version does not introduce a known issue...

    Regards,


    Blog: https://blog.pohn.ch/ Twitter: @StoyanChalakov

    Thursday, May 17, 2018 11:20 AM
    Moderator
  • Hi Stoyan,

    It is the SQL 2012 Management Pack, and the MP version is 6.7.31.0. We do keep updating the management packs, but this one is not the latest. However, I have checked the changes made in the newer versions, and nothing has changed in them from a job monitoring perspective.

    Regards,

    Daya Ram

    23 hours 29 minutes ago