SQL Job duration False Alerts

    Question

  • Hi All,

    I have been facing issues with the Job Duration monitor in the SQL Server management pack. We get an alert for job duration once the job duration exceeds a specified time limit, which is fine, but the monitor keeps switching its health state between Healthy and Critical again and again, even when the job has succeeded and is no longer running. This causes the alert to close and trigger repeatedly. I have attached a screenshot of example health state changes below. It goes from Healthy to Critical, then from Not monitored to Healthy, and so on. I am unable to understand how this monitor is configured. Would it alert on job duration even after the job has completed, and why does it keep changing the state back to Healthy?

    Regards,
    Daya Ram

    Wednesday, May 16, 2018 8:13


All Replies

  • Hi Daya,

    can you please give some more details on this one?

    - Which version and update level of SCOM are you running?
    - Which version of the SQL management pack do you have imported?
    - Is only one particular job affected, or can the issue be reproduced with any job?
    - What does the context of the state change event say?

    Thanks in advance. Regards,


    (Please take a moment to "Vote as Helpful" and/or "Mark as Answer" where applicable. This helps the community, keeps the forums tidy, and recognizes useful contributions. Thanks!) Blog: https://blog.pohn.ch/ Twitter: @StoyanChalakov


    Wednesday, May 16, 2018 11:48
    Moderator
  • Hi Stoyan Chalakov,

    I am using SCOM 2012 R2 with Update Rollup 14 and SQL MP version 6.7.31.0.

    I am getting this issue for almost every long-running job that is alerted for long job duration. The alert context gives the details of all jobs, and the one being alerted is shown as "The job succeeded". Below is an example for the job _dba_DatabaseBackup - SYSTEM_DATABASES - FULL.


    Date and Time: 16/05/2018 12:43:20
    Property Name    Property Value

    {6249998C-343D-4CAE-B9C4-E65B648D5797}-LastStatus -1
    {6249998C-343D-4CAE-B9C4-E65B648D5797}-LastMessage
    {D26717B5-43CB-4D8D-B733-E6DA017EE85B}-Duration 0
    {D26717B5-43CB-4D8D-B733-E6DA017EE85B}-LastStatus 1
    {D26717B5-43CB-4D8D-B733-E6DA017EE85B}-LastMessage The job succeeded. The Job was invoked by Schedule 12 (System_Full_Backup). The last step to run was step 1 (_dba_DatabaseBackup - SYSTEM_DATABASES - FULL).
    {D95C75C7-8548-41F5-B955-F4FEB5312917}-Duration -1
    {D95C75C7-8548-41F5-B955-F4FEB5312917}-LastStatus -1
    {D95C75C7-8548-41F5-B955-F4FEB5312917}-LastMessage
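To make a pasted alert context like the one above easier to read, the properties can be grouped per job. The following Python sketch assumes the "{GUID}-PropertyName value" layout shown in this post, and further assumes (not confirmed by the management pack documentation) that each GUID identifies one SQL Agent job. parse_alert_context and the shortened sample text are hypothetical, for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "{GUID}-PropertyName value", where value may be empty.
CONTEXT_LINE = re.compile(
    r"^\{(?P<guid>[0-9A-Fa-f-]{36})\}-(?P<prop>\w+)\s*(?P<value>.*)$"
)


def parse_alert_context(text):
    """Group alert-context properties by their GUID prefix (hypothetical helper)."""
    jobs = defaultdict(dict)
    for raw_line in text.splitlines():
        match = CONTEXT_LINE.match(raw_line.strip())
        if match is None:
            continue  # skip headers, timestamps and blank lines
        jobs[match["guid"]][match["prop"]] = match["value"].strip()
    return dict(jobs)


# Shortened sample based on the context pasted above.
sample = """Date and Time: 16/05/2018 12:43:20
{D26717B5-43CB-4D8D-B733-E6DA017EE85B}-Duration 0
{D26717B5-43CB-4D8D-B733-E6DA017EE85B}-LastStatus 1
{D26717B5-43CB-4D8D-B733-E6DA017EE85B}-LastMessage The job succeeded.
{D95C75C7-8548-41F5-B955-F4FEB5312917}-Duration -1
{D95C75C7-8548-41F5-B955-F4FEB5312917}-LastMessage"""

for guid, props in parse_alert_context(sample).items():
    print(guid, props)
```

In the real context, the entry with LastStatus 1 is the one whose LastMessage reads "The job succeeded", which suggests 1 means success and -1 means no data, but that mapping is only an inference from this single sample.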


    • Edited by D. R Wednesday, May 16, 2018 12:56
    Wednesday, May 16, 2018 12:46
  • Hello Daya,

    Does your agent restart frequently? I had the same issue earlier and found this thread, which helped me fix it. Please check whether it helps you. Otherwise, you could change the threshold, if you can afford to :)


    Cheers, Gourav Please remember to mark the replies as answers if it helped.

    Wednesday, May 16, 2018 15:17
  • Hi Gourav,

    I had gone through that one previously, but in my case no agent restart has been observed either. Also, one thing that concerns me: if the job has already completed successfully, why is it still alerting for long duration? Is it configured to alert on job duration irrespective of job status?

    Regards,
    Daya Ram

    Thursday, May 17, 2018 10:32
  • <<<Is it configured to alert on job duration irrespective of job status?>>> Yes!

    The management pack provides a “Long-running Jobs” monitor targeted to the SQL Server Agent object. The monitor oversees all jobs run by SQL Server Agent and changes the state when the duration of any job execution exceeds the threshold. An alert is also registered in this case.

    For further details, you can read this.

    So, in general, it will create an alert if any job runs longer than the threshold (120 seconds) within the given monitoring interval. I suggest reading the "SQL Server 2016 Agent Job - Unit monitors" section in the link above for further information.
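The behaviour described here, and later confirmed by Microsoft support in this thread, can be illustrated with a minimal Python sketch: the state depends only on the job duration compared with the threshold, and the job's last status is never consulted. This is not the management pack's actual script; JobRun and job_duration_state are hypothetical names, the 120-second default simply reuses the figure quoted in the post above, the 300-second duration is made up, and the strict greater-than comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """Hypothetical record of one SQL Agent job run."""
    name: str
    duration_seconds: int
    last_status: str  # e.g. "Succeeded"; intentionally not used below


def job_duration_state(run: JobRun, threshold_seconds: int = 120) -> str:
    # Only the duration is compared with the threshold; last_status is
    # ignored, so a job that already succeeded can still turn the state Critical.
    return "Critical" if run.duration_seconds > threshold_seconds else "Healthy"


backup = JobRun("_dba_DatabaseBackup - SYSTEM_DATABASES - FULL", 300, "Succeeded")
print(job_duration_state(backup))                          # Critical
print(job_duration_state(backup, threshold_seconds=600))   # Healthy
```

Raising threshold_seconds is the sketch's equivalent of overriding the monitor threshold, which is one of the two remedies (shorten the job or raise the threshold) mentioned in this thread.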


    Cheers, Gourav

    Thursday, May 17, 2018 11:04
  • Hi Daya,

    Is the affected MP the one for SQL 2008 or for SQL 2012? Judging by the version, I think this should be the case, because for SQL 2014 and 2016 there are newer MPs, and I know you do update them. Am I right?

    I am trying to understand why the MP fires when the job is successful, and I want to be sure that the version does not introduce a known issue...

    Regards,


    Blog: https://blog.pohn.ch/ Twitter: @StoyanChalakov

    Thursday, May 17, 2018 11:20
    Moderator
  • Hi Stoyan,

    It's basically the SQL 2012 Management Pack, and the MP version is 6.7.31.0. Although we keep updating the management packs, this one is not the latest. However, I have checked the changes made in newer versions, and nothing has changed from a job monitoring perspective in those either.

    Regards,

    Daya Ram

    Monday, May 21, 2018 10:19
  • Hi Stoyan,

    Were you able to find anything on this one? Please advise.

    Regards,

    Daya Ram

    Tuesday, May 22, 2018 14:56
  • Hi Daya,

    Please excuse the delay in the response. I am afraid I have to ask further questions to try and build a picture of where the troubleshooting should head...

    - Does the issue happen on only one agent, or are more agents affected?
    - Does a SQL job run at all during the health state change? In other words, is the monitor triggering a state change even when no job is running?
    - Can you please check the event log (Operations Manager) on the affected agent and see if there are any events at the time of the state change?
    - Would you please post the exact name of the monitor here, so that I can check its configuration?

    Thanks in advance!

    Regards,


    Blog: https://blog.pohn.ch/ Twitter: @StoyanChalakov

    Wednesday, May 23, 2018 7:25
    Moderator
  • Hi Stoyan,

    I am observing the issue on more than one agent, and I did see that when the agent gets a new configuration or restarts, the "Job Duration" monitor gets initialized and the health state changes to Healthy. It then goes to Critical again and triggers an alert based on a job duration that exceeded the threshold in the past. The interesting thing is that it triggers the alert again even if the job has completed successfully. So I believe the monitor is configured to check only the job duration, irrespective of job status. But it shouldn't trigger an alert again when it sees that the job has completed.
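The sequence described here can be replayed with a small Python simulation. It assumes, as reported in this post, that re-initialization (new configuration or agent restart) resets the monitor to Healthy, which closes the open alert, and that the next evaluation compares the same stale duration with the threshold again. replay, its event names, and the 300-second duration are hypothetical and do not come from SCOM.

```python
def replay(events, last_duration_seconds, threshold_seconds=120):
    """Replay hypothetical monitor events; return (event, state, alert action) tuples."""
    state, alert_open, history = "Not monitored", False, []
    for event in events:
        if event == "initialize":
            # Assumption: re-initialization resets the monitor to Healthy.
            state = "Healthy"
        elif event == "evaluate":
            # The stale duration is compared again; job status is not considered.
            state = "Critical" if last_duration_seconds > threshold_seconds else "Healthy"
        else:
            raise ValueError(f"unknown event: {event!r}")
        if state == "Critical" and not alert_open:
            alert_open = True
            history.append((event, state, "alert raised"))
        elif state == "Healthy" and alert_open:
            alert_open = False
            history.append((event, state, "alert closed"))
        else:
            history.append((event, state, "no change"))
    return history


for step in replay(["evaluate", "initialize", "evaluate"], last_duration_seconds=300):
    print(step)
```

Under these assumptions the same long-finished job yields raise, close, raise, which matches the Critical/Healthy flapping and the repeated alerts described in the thread.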

    Regards,
    Daya Ram 

    Wednesday, May 23, 2018 14:05
  • Hi Daya,

    This is pretty odd... does a trivial thing like re-initializing the cache help?

    P.S. I will try to check the monitor config to see how it is actually configured, and will post back if I find any clues... What about a support case? Is this an option for you?
    Are there any events on the agent side at the time of the issue?

    Regards,


    Blog: https://blog.pohn.ch/ Twitter: @StoyanChalakov

    Wednesday, May 23, 2018 14:14
    Moderator
  • I opened a case with Microsoft, and according to them this monitor will generate an alert even if the job has succeeded. So either the job should be fixed so that it does not run for a long duration, or the threshold should be increased. Also, according to them, it is quite normal for the monitor's health state to change and for the alert to close when the agent restarts.
    • Marked as answer by D. R Monday, June 25, 2018 12:39
    • Edited by D. R Monday, June 25, 2018 12:39
    Monday, June 25, 2018 12:39
  • Hi Daya,

    Many thanks for sharing this! It seems that I faced the same behaviour just a couple of days ago!

    Regards,


    Blog: https://blog.pohn.ch/ Twitter: @StoyanChalakov

    Tuesday, June 26, 2018 7:16
    Moderator