Help diagnosing a sporadic long-running proc

  • Question

  • We have a nightly ETLV job that our 30 clients run every night. Depending on the size of the client, it can run from 1 to 7 hours. We have one client, clientX, where the ETLV job takes about 7 hours. But about 3 times a month, it takes 20 hours. Looking at the logs, it is always the same proc, procx, that is the issue. Normally procx runs at clientX for 35 minutes. But these 3 times a month it runs 838, 839, 828, 835, or 825 minutes. And then procx finishes successfully! There does NOT seem to be a pattern in the day of week, time of day procx starts, time of day procx finishes, amount of data, etc. But it is so strange that if it does not finish in 35 or so minutes, it finishes in 832 ± 7 minutes.

     Yesterday, when the ETLV at clientX was running over by several hours, I could tell from our log that it was in procx. I ran sp_who2 and there was nothing in the BlkBy column for any SPID. Unfortunately I am not good at diagnosing performance problems and do not remember exactly what sp_who2 said about procx. I know that it was not taking any CPU. I believe that the Status was SUSPENDED. The Command said UPDATE.

     What can I do so that in the future I can better diagnose the issue? I know that it is procx, which has 3 insert and 10 update statements against the same db. I do not want to do something that will impact the entire ETLV job, which is already pushing its 7-hour limit.

    I want some diagnosis steps that I can run if I am called while procx is running. And I would like some diagnosis steps that I can set up in advance, because many times I am just told, "oh, yesterday we had that issue again."

     Thanks

    Thursday, December 11, 2014 4:45 PM

All replies

  • Did you have a look at sp_lock and the wait stats DMVs?

    Also Profiler:

    https://www.simple-talk.com/sql/performance/how-to-identify-slow-running-queries-with-sql-profiler/
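
    If you are called while procx is still running, a minimal sketch along these lines (the session-id filter is a rough heuristic) shows the wait type, wait time, blocker, and current statement for every active request - more detail than sp_who2 gives:

        -- What is each active request doing / waiting on right now?
        SELECT r.session_id,
               r.status,
               r.command,
               r.wait_type,
               r.wait_time,              -- ms spent in the current wait
               r.blocking_session_id,
               r.cpu_time,
               r.logical_reads,
               SUBSTRING(t.text, r.statement_start_offset / 2 + 1,
                         (CASE WHEN r.statement_end_offset = -1
                               THEN DATALENGTH(t.text)
                               ELSE r.statement_end_offset END
                          - r.statement_start_offset) / 2 + 1) AS current_statement
        FROM sys.dm_exec_requests AS r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
        WHERE r.session_id > 50;          -- skip most system sessions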


    Visakh

    Thursday, December 11, 2014 5:57 PM
    Answerer
  • Do the following:

    1.

    >We have one client, clientX, where the ETLV job takes about 7 hours

    Have development optimize for a half-hour run time. You don't want a 7-hour job in any IT shop, because it will bite you sooner or later.

    2. Eliminate missing indexes, REBUILD indexes on a schedule, and UPDATE STATISTICS nightly (see the sketch below the links):

    http://www.sqlusa.com/articles/query-optimization/

    http://www.sqlusa.com/bestpractices/sargable/

    3. Optimize memory & disk resources:

    http://www.sqlusa.com/bestpractices/memory-configuration/

    http://www.sqlusa.com/bestpractices/configureharddisk/
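
    As a starting point for item 2, a sketch along these lines (the ordering heuristic is arbitrary) lists the missing indexes the optimizer has recorded for the current database and refreshes statistics:

        -- Missing indexes the optimizer has asked for in this database
        SELECT d.statement AS table_name,
               d.equality_columns,
               d.inequality_columns,
               d.included_columns,
               s.user_seeks,
               s.avg_user_impact
        FROM sys.dm_db_missing_index_details AS d
        JOIN sys.dm_db_missing_index_groups AS g
             ON g.index_handle = d.index_handle
        JOIN sys.dm_db_missing_index_group_stats AS s
             ON s.group_handle = g.index_group_handle
        WHERE d.database_id = DB_ID()
        ORDER BY s.user_seeks * s.avg_user_impact DESC;

        -- Refresh statistics nightly (sampled; use UPDATE STATISTICS ... WITH FULLSCAN
        -- on the key ETL tables if the maintenance window allows)
        EXEC sp_updatestats;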




    Kalman Toth, Database & OLAP Architect





    Thursday, December 11, 2014 6:28 PM
  • Start by looking at the estimated execution plan of the query - it should shed some light on any missing indexes or parts that need tuning. The variation in timings may be a parameter-sniffing problem - if this is new to you, read up on it. You can also turn on options to trace the time of each individual statement within the SP with SET STATISTICS TIME ON. Likewise, the IO load of each statement within the SP can be measured by turning on SET STATISTICS IO ON.
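
    For example, when running the proc manually against a test copy in SSMS (dbo.procx is assumed from the question), something like:

        SET STATISTICS TIME ON;   -- CPU and elapsed time per statement
        SET STATISTICS IO ON;     -- logical/physical reads per table, per statement

        EXEC dbo.procx;           -- output appears on the Messages tab

        SET STATISTICS TIME OFF;
        SET STATISTICS IO OFF;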

    Satish Kartan www.sqlfood.com

    Friday, December 12, 2014 4:56 AM
  • >> I know that it is procx, which has 3 insert and 10 update statements against the same db. I do not want to do something that will impact the entire ETLV job, which is already pushing its 7-hour limit.

    Right now you are asking about monitoring, which is a great question, but remember that we could also help you improve the proc itself, and maybe turn those 7 hours into 7 minutes (this is not just a number - I have seen improvements like that many times, and once even a 1-hour execution improved to 4 seconds). At the same time, maybe we can't :-)

    This is a different question, but if you want us to take a look at your underlying issue, we need more information that will let us reproduce it.

    >> I want some diagnosis steps that I can run if I am called while procx is running. And I would like some diagnosis steps that I can set up in advance, because many times I am just told, "oh, yesterday we had that issue again."

    In addition to the other great responses:

    * Performance Dashboard Reports

    * In SQL Server 2014 there is a new dynamic management view, sys.dm_exec_query_profiles, which tracks what a query is actually doing at the physical-operator level and monitors real-time query progress while the query is executing (see the sketch below).

    * Setting SET STATISTICS PROFILE ON or SET STATISTICS XML ON inside the query

    * Creating an Extended Events session and adding sqlserver.query_post_execution_showplan as an event.

    *** Check these links as well:

    http://www.sqlshack.com/performance-dashboard-reports-sql-server-2014/

    http://www.mssqltips.com/sqlservertip/1949/sql-server-sysdmoswaitstats-dmv-queries/
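
    A rough sketch of the sys.dm_exec_query_profiles idea (SQL Server 2014+; the session running procx must have SET STATISTICS PROFILE/XML ON or an equivalent trace enabled, and 55 is a hypothetical session id):

        -- Live, per-operator progress of the query running on session 55
        SELECT p.node_id,
               p.physical_operator_name,
               p.row_count,              -- rows produced so far
               p.estimate_row_count,     -- optimizer's estimate
               p.elapsed_time_ms,
               p.cpu_time_ms,
               p.logical_read_count
        FROM sys.dm_exec_query_profiles AS p
        WHERE p.session_id = 55
        ORDER BY p.node_id;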


    Ronen Ariely

    Friday, December 12, 2014 11:46 AM
  • Have a very large ETL process. There is one sproc that executes one truncate, 3 inserts, and then 5 updates, all to the same table. This sproc finishes in under 35 minutes 95% of the time. But 5% of the time it takes 8+ hours. I only get notified after the fact that "yesterday it ran 8+ hours again". What can I turn on to help diagnose the issue? Whatever I turn on cannot increase the execution time of the ETL process. So obviously I want to narrow it down to the table, the query, etc. And it would have to be saved so that I can look at it the next day to see what happened.

    Thanks.

    • Merged by Shanky_621MVP Thursday, January 8, 2015 9:36 AM duplicate
    Monday, January 5, 2015 9:25 PM
  • If you use SSIS, you are probably low on memory, so the data flow swaps to disk, which makes it very slow. Improve your lookups and use non-blocking transformations if you can. Increase the memory on the server if you cannot improve the ETL performance.
    Monday, January 5, 2015 9:40 PM
  • My question is how to monitor a table/query/proc without affecting the overall run time of the job. I just mentioned ETL to let people know that the problem is a very small subset of a much larger job.
    Monday, January 5, 2015 11:01 PM
  • It is a similar question, but that discussion got off the subject of how to set up some type of monitoring for a small subset of work that will not affect the overall run time.

    I have looked at the execution plan. Again, 95% of the time everything is great, but just occasionally one of the queries in the sproc runs off the rails.

    Monday, January 5, 2015 11:03 PM
  • Hi TheBrenda,

    From your description, you want to monitor the long-running query. Based on my research, SQL Server Profiler, dynamic management views and functions, and SQL Server Activity Monitor could be your choices.

    SQL Server Profiler provides a graphical user interface to create, manage, analyze, and replay SQL traces. SQL Server Profiler provides several built-in templates where tracked events and columns for each event are defined. You could select TSQL_Duration from the templates, and select the save to file option to save the captured event information into a trace file (.trc) that you can analyze later, or replay in SQL Server Profiler. It could be your best choice. For how to monitor a query using SQL Server Profiler, please refer to the article: http://solutioncenter.apexsql.com/monitor-sql-server-queries-find-poor-performers-sql-server-profiler/

    Dynamic management views and functions provide the information needed to find the most expensive queries currently running, or recently executed where their execution plan is still in the cache. You could choose sys.dm_exec_sql_text and sys.dm_exec_query_stats. sys.dm_exec_query_stats provides information about the average, minimal, maximal, and total processor time used by the plan, and sys.dm_exec_sql_text provides the text of the SQL batch. For how to monitor a query using dynamic management views and functions, please refer to the article: http://solutioncenter.apexsql.com/monitor-sql-server-queries-find-poor-performers-dynamic-management-views-and-functions/
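
    As a minimal sketch of that DMV approach, something like this lists the cached statements with the highest elapsed times (run it the morning after a slow night, before the plan ages out of the cache; the DMV reports times in microseconds):

        SELECT TOP (20)
               qs.execution_count,
               qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
               qs.max_elapsed_time / 1000                        AS max_elapsed_ms,
               qs.total_worker_time / qs.execution_count / 1000  AS avg_cpu_ms,
               SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
                         (CASE WHEN qs.statement_end_offset = -1
                               THEN DATALENGTH(st.text)
                               ELSE qs.statement_end_offset END
                          - qs.statement_start_offset) / 2 + 1)  AS statement_text
        FROM sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
        ORDER BY qs.max_elapsed_time DESC;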

    SQL Server Activity Monitor can list the recent expensive queries. To view Activity Monitor, the SQL Server login must have the VIEW SERVER STATE permission. For how to monitor a query using SQL Server Activity Monitor, please refer to the article: http://solutioncenter.apexsql.com/monitor-sql-server-queries-find-poor-performers-activity-monitor-and-data-collection/

    Regards,
    Michelle Li

    • Marked as answer by TheBrenda Thursday, January 8, 2015 12:30 PM
    Thursday, January 8, 2015 9:25 AM
  • Hi Brenda,

    What is the point of raising 3 identical threads and wasting people's time? This is the third similar thread, and it is really bad practice on the forum. If you did not get an answer, please follow up on the very first thread - where I am sure you got responses - but I cannot see much effort put in by you.

    Why can't you stay at night in your office and see what is actually causing the job to run for 8 hours? You could also assign someone to night duty to monitor the job.

    Moreover, since you did not post the code, we can only give you plausible solutions and you would have to find the root cause; don't expect us to provide a full and final solution from here, virtually. The forum is to assist people in the best possible way; we do not always guarantee a solution, and we expect the OP to put in as much effort as he/she can.

    Thanks for understanding. Now, is the problem still there? Can you post the complete code? How did you monitor? What were the findings?



    Thursday, January 8, 2015 9:48 AM
  • My question is not about the code. My question is how to set up some monitoring for code that sporadically acts up. We write the code and then send it to the client. This is running at a client site and I do not have access. Also, it runs well 95% of the time. And I have looked at the execution plan at the client site, and it was fine when I looked at it.

    Thursday, January 8, 2015 12:31 PM
  • Michelle Li,

    I do not want to affect the run time of the ETL process, or any other process that runs. Which of these three will allow me to pinpoint a specific table, sproc, or query?

    Thursday, January 8, 2015 12:33 PM
  • My question is not about the code. My question is how to set up some monitoring for code that sporadically acts up. We write the code and then send it to the client. This is running at a client site and I do not have access. Also, it runs well 95% of the time. And I have looked at the execution plan at the client site, and it was fine when I looked at it.

    It's not that easy up front; you have to spend some time. Again, ONLY mark an answer if you are totally satisfied with the solution. For your case you would have to set up either a server-side trace or an Extended Events trace.

    Did you try to correlate this with any jobs that run at the same time (maybe weekly/monthly) and might interfere with the execution of your current job? Like I said, you can also set up a team, or one or two people in rotation, to monitor the job if that suits you. Profiler is not going to help if you are not aware of when the issue might occur.

    Try to find a pattern by looking at the job history; you might find some clue about the date or week when the job runs slow (see the sketch below).

    As you said, your query is fine but performs badly at certain times, which makes me think about contention/blocking. What is the version and edition of SQL Server?
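
    To look for that pattern, a sketch along these lines against msdb lists the slow runs of the job (the job name is a placeholder; run_duration is stored as an HHMMSS integer):

        SELECT j.name,
               h.run_date,                       -- YYYYMMDD integer
               h.run_time,                       -- HHMMSS integer
               h.run_duration / 10000 * 3600
                 + h.run_duration / 100 % 100 * 60
                 + h.run_duration % 100          AS duration_seconds
        FROM msdb.dbo.sysjobhistory AS h
        JOIN msdb.dbo.sysjobs AS j
             ON j.job_id = h.job_id
        WHERE j.name = 'Nightly ETLV'             -- placeholder job name
          AND h.step_id = 0                       -- job outcome rows
          AND h.run_duration >= 10000             -- an hour or more
        ORDER BY h.run_date DESC, h.run_time DESC;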




    Thursday, January 8, 2015 12:43 PM
  • > I do not want to affect the run time of the ETL process, or any other process that runs

    Using SQL Trace or XEvents to capture the batch-level and rpc-level events along with the CPU, duration, reads and writes should help you pin it down with minimal overhead.  An ETL process usually consists of a small number of expensive queries, and so this kind of tracing (while never very expensive) is especially cheap.  In fact it may be so cheap that you can capture the actual execution plan for every query against your database by tracing the Showplan XML Statistics Profile event, although I would test that that level of tracing doesn't affect performance and filter it as much as possible.

    These will capture events to a file, and you can analyze them offline.

    Once you have identified the queries that are causing the performance problem, you may want to adjust your trace to selectively gather execution plans.
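
    A minimal sketch of that kind of Extended Events session, capturing completed batches and RPC calls longer than 60 seconds to a file for next-day analysis (the session name, file path, and duration threshold are assumptions; duration here is in microseconds):

        CREATE EVENT SESSION etl_slow_statements ON SERVER
        ADD EVENT sqlserver.rpc_completed (
            ACTION (sqlserver.database_name, sqlserver.sql_text)
            WHERE duration >= 60000000),
        ADD EVENT sqlserver.sql_batch_completed (
            ACTION (sqlserver.database_name, sqlserver.sql_text)
            WHERE duration >= 60000000)
        ADD TARGET package0.event_file (
            SET filename = N'C:\Temp\etl_slow_statements.xel',  -- assumed path
                max_file_size = 100,                            -- MB per file
                max_rollover_files = 4)
        WITH (MAX_DISPATCH_LATENCY = 30 SECONDS, STARTUP_STATE = ON);

        ALTER EVENT SESSION etl_slow_statements ON SERVER STATE = START;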

    David


    David http://blogs.msdn.com/b/dbrowne/

    Thursday, January 8, 2015 1:54 PM