none
some queries with runnable state

    Question

  • hello,

    i am facing very weired problem on my production which is,some query sent from web application captuerd as runnable for over 10 hour untill i killed them manuelly... in spite of most of time working normally 

    i had the same issue with scheduled job which was running in 2 sec and for no reason start running in 7 hour and i had to kill the syntax this job runing manuelly as well.. and the job back again for normal time 2 sec without do any action from my side!!!!!!

    any suggestion please.. 

    i have sql 2008 x64 clustered on windows 2008 IA

    Saturday, February 18, 2012 10:02 AM

Answers

  • Thank you Harsh,

    i have applied the missing index on the table involved in query and is working normally now ,

    this job under monitroing for a couple of days to see if the same problem will happen again , will update my post in case smth happen..

    Thanks

    Sunday, March 04, 2012 10:19 AM

All replies

  • First don't kill the session's you'll never figure out the root cause doing that. A runnable status means that the task is waiting for its turn to execute on the CPU scheduler so you need to look at why it was waiting to begin with.  Look at the wait_type and last_wait_type columns in sys.dm_exec_requests for the session_id, and you will see what the task was waiting for and that would be the first place to start.

    How many CPU's are in the server?  How many concurrent tasks does the normal workload entail?


    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    • Proposed as answer by Iric WenModerator Monday, February 20, 2012 8:35 AM
    • Unproposed as answer by SQL Kitchen Monday, February 20, 2012 12:39 PM
    • Proposed as answer by SivaReddyG Thursday, February 23, 2012 10:35 AM
    • Unproposed as answer by SQL Kitchen Thursday, February 23, 2012 10:43 AM
    Saturday, February 18, 2012 4:08 PM
    Moderator
  • actually , i had to kill it  plus the query was very simple select * from .... where !!!

     server CPUs are 4

    to be honest this server is my nightmare since i have a lot of process running on that server plus the mixing of function don't give me any chance to standrize the configuration (i.e:same database working online for saerch and upoding data and users quiers ) and aroud another 60 db on the same node with same behaviours

     

    Saturday, February 18, 2012 6:12 PM
  • Runnable means it is waiting to execute on the CPU scheduler.  If it were Suspended, then you'd have another problem, but with it being runnable, you likely don't have enough CPU cores or fast enough processors to meet the workload.

    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    • Proposed as answer by Iric WenModerator Monday, February 20, 2012 8:35 AM
    • Unproposed as answer by SQL Kitchen Monday, February 20, 2012 12:40 PM
    Saturday, February 18, 2012 6:51 PM
    Moderator
  • Jonathan Kehayias is right. You should not kill the process unless & untill you have root cause. Following few questions may help you for further debugging?

    1) How frequent problem occurs? Do you see any pattern?

    2) Is it occurring at particular time? (It might happen that at particular time there are several jobs are running which is taking more CPU time)

    3) Did you check for Deadlock?

    4) Did you check performance counder?

    5) Did you run Profiler to see whats going on?

    Hope this helps...!!!


    Please click “Mark as Answer” if this post answers your question and click "Vote as Helpful" if this Post helps you.

    Saturday, February 18, 2012 7:38 PM
    • The frequency very dynamic and happen any time throughout the course of a day, even when server activities are less....
    • the highest count is CPU
    • since the problem no running all the time i don't have particular time to run a profiler and issue happen on diffrent databases so won't be possible to run profiler for extended period of time to captuer all activities on all databases (around 60 DBs) analysis such trace would be very difficult

    thanks in advance


    • Edited by SQL Kitchen Sunday, February 19, 2012 8:07 AM
    Sunday, February 19, 2012 8:06 AM
  • When Problem occurs, check Activiry monitor (Mainly Process & Expensive Queries).

    Additionally you may find following link useful.

    http://support.microsoft.com/kb/298475


    Please click “Mark as Answer” if this post answers your question and click "Vote as Helpful" if this Post helps you.

    • Proposed as answer by Iric WenModerator Monday, February 20, 2012 8:35 AM
    • Unproposed as answer by SQL Kitchen Monday, February 20, 2012 11:03 AM
    Sunday, February 19, 2012 8:34 AM
  •  

     It will be good to capture the select * from sys.sysproceses after every 15 seconds till the issue gets reproduced. Atleast, we will have data to analyze and check the issue.

     


    Kind regards| Harsh Chawla | Personal Blog:- SQL-blogs
    |Team Blog:- Team Blog

    Monday, February 20, 2012 5:41 AM
  • Hello,

    the porblem occured again toady and i didn't kill the process as recommened and here the status of last wit type ... since no wait type while it runnable

    last wait type is :SOS_SCHEDULER_YIELD

    kindly be informed that process was running in 4 min max before currently it is running since 2 hour with no any oupout??

    Monday, February 20, 2012 11:06 AM
  • Check your power configuration settings in Window Server and make sure that your system setup in High Performance mode and not Balanced.

    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    Monday, February 20, 2012 3:58 PM
    Moderator
  • Thank you Jonathan , any refrences i can use ?

    thanks in advance

    Monday, February 20, 2012 5:14 PM
  • Adding my 2 cents : - The query seems to be CPU intensive as SOS_scheduler_yield is the waittype. It's not waiting on any locks etc and just waiting to get scheduled on the CPU.  

      After checking what Jonathan recommended, It'll be good to check the execution plan of the query(compile time atleast) and see if there are any full table scans etc. it's highly possible that some parameter is causing the bad plan .  If possible, please share the execution plan.

       

     


    Kind regards| Harsh Chawla | Personal Blog:- SQL-blogs
    |Team Blog:- Team Blog

    Tuesday, February 21, 2012 5:06 AM
  • Hi Harsh ,

    What you have stated makes total sens, but what is the case of sometimes queries working very smoothly and perfromance very good and sometime take that time which 100 double original duration

    thanks


    Tuesday, February 21, 2012 10:16 AM
  •  What I can think of is , some parameter in the where clause causing the bad plan. compile time plan whilst the query is running will be good to start with.

    run this query to get the execution plan at the time of the issue:

    select b.query_plan,a.* from sys.dm_exec_requests a cross apply sys.dm_exec_query_plan(a.plan_handle)  b
    where a.session_id = <session_id>


    Kind regards| Harsh Chawla | Personal Blog:- SQL-blogs
    |Team Blog:- Team Blog

    Tuesday, February 21, 2012 10:53 AM
  • Hi Jonathan,

    Unfortunately the job which was working normally 2 weeks now, I just found the same issue
    even after change the power configuration..

    any Suggestion

    Thanks



    • Edited by SQL Kitchen Thursday, February 23, 2012 7:12 AM
    Thursday, February 23, 2012 7:11 AM
  • Hi Jonathan,

    Unfortunately the job which was working normally 2 weeks now, I just found the same issue
    even after change the power configuration..

    any Suggestion

    Thanks



    Look at sys.dm_exec_requests to see what the last_wait_type was for the job.  Also look at the execution plan and try running the code manually with OPTION(RECOMPILE) to see if the problem goes away, which would be a sign that you have a parameter sensitivity issue as mentioned by Harsh Chawla.

    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    Thursday, February 23, 2012 7:14 AM
    Moderator
  • am attaching here the exeution plan of this query ,

    What do u think ?

    Thursday, February 23, 2012 10:33 AM
  • Seems like RID lookup seems to be costly in the entire execution plan which can possibly be sorted out by including the column in the clustered index(if any). Do you see the same execution plan when the query runs normal?


    Kind regards| Harsh Chawla | Personal Blog:- SQL-blogs
    |Team Blog:- Team Blog

    Thursday, February 23, 2012 11:07 AM
  • Without all the other information from the plan I don't think much about the plan.  The % costs don't tell you enough information, there are actual cost valuse associated with the operations are what matters most and those are used to determine the % numbers shown by SSMS.  I don't know if that is 72% of a plan cost of 6 or 6000. 

    I would look at the column being filtered on for the IndexScan that has the 25% cost and check the data distribution for the value the plan is based off of, versus the value currently being used.  It is likely that the plan is based off a value with a relatively small number of occurrences which makes the RID Lookup an efficient way of doing things.  However, if there are hundreds or thousands of rows for another value, a Table Scan would be much faster.  This is typically the problem when you have parameter sniffing of the plan:

    http://www.simple-talk.com/sql/t-sql-programming/parameter-sniffing/

    Why are all the tables here Heaps?  Typically unless you have better performance for most workloads from having a clustered index on the appropriate column for most tables:

    http://msdn.microsoft.com/en-us/library/cc917672.aspx


    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    Thursday, February 23, 2012 1:59 PM
    Moderator
  • Hi Harsh,
    i have caught compile time plan whilst the query is running and i start compare to execution plan above and seems diff ? whar do u think ?

    here is

    Saturday, February 25, 2012 8:24 AM

  • Could you please upload the

    1.  compile time plan of the query during problem time (as the query doesn't finish and compile time plan is the only option here so, showplan all)
    2.  Run time plan of the query when it runs normal. (showplan statistics profile )

    somewhere and share the link.

    As mentioned by Jonathan, we will need to have a look at the parameters, no. of rows etc. to find the exact problem.

    But roughly, there is a difference in parallelism of the query. In the good run, parallelism is not there. I remember working on a scenario, where we found the issue to be due to intera - parallel thread waits.

    Could you please share, how many CPUs have you got and whats the Max degree of parallelism for SQL? ( Just to verify if the configuration is alright)

    If you really want to dig in further, you could configure PSSDIAG with just one script : - SQL 2005 Perf stats.  I had wrriten one blog on the configuration of PSSDIAG: - http://sql-blogs.com/2012/02/18/configure-pssdiag-to-capture-high-cpu-issues/

    Also, you could run the query  select * from sys.dm_os_waiting_tasks at time when the issue reoccurs and look at the column blocking_ecid to get more clarity on whats going on.


    Kind regards| Harsh Chawla | Personal Blog:- SQL-blogs
    |Team Blog:- Team Blog

    Monday, February 27, 2012 6:27 AM
  • Thank you Harsh,

    i am currently collecting all details can help in pinpoint this issue as stated ,

    will get back to you by this time tommorw, as this problem run once at 3 AM to captuer all details requierd during the problem running ..

    Monday, February 27, 2012 1:12 PM
  • Hi,

    In the last 2 days I was working on that issue and what suprissed me really , that i have found another job running the same query with execution duration 2 min ?? while this job still running with no end !!!

    i have compared the execution plan and compile plan they are the same ..

    Harsh,

    here are answers for you inquiers,

    • both plan attched in above posts
    •  4 CPUs and my MAXDOP set to 1
    • while problem running i used this select * from sys.dm_os_waiting_tasks and was nothing there

    one more thing that, I found missing index and as DTA suggestion it would improve the proformance 96%

    finally , when kill the process the job reported succsfully completed and job done succssefuly !! which mean it wasn't do anything ???

    appriciate your support



    • Edited by SQL Kitchen Wednesday, February 29, 2012 10:28 AM
    Wednesday, February 29, 2012 9:54 AM
  • If the maxdop is 1, the query shouldn't go for parallelism unless we are using query hint for maxdop. It's difficult to find out the issue without some data. Could you please run the PSSDIAG as mentioned in my previous answer?

    Also, try creating the index recommended by DTA and check if the issue gets reproduced. Also, run the command to find the missing indexes and check if you see any recommendation for the table being used in the query: -

    DECLARE @runtime datetime

    DECLARE @cpu_time_start bigint, @cpu_time bigint, @elapsed_time_start bigint, @rowcount bigint

    DECLARE @queryduration int, @qrydurationwarnthreshold int

    DECLARE @querystarttime datetime

    SET @runtime = GETDATE()

    SET @qrydurationwarnthreshold = 5000

    PRINT ”

    PRINT ‘===============================================================================================’

    PRINT ‘Missing Indexes: ‘

    PRINT ‘The “improvement_measure” column is an indicator of the (estimated) improvement that might ‘

    PRINT ‘be seen if the index was created.  This is a unitless number, and has meaning only relative ‘

    PRINT ‘the same number for other indexes.  The measure is a combination of the avg_total_user_cost, ‘

    PRINT ‘avg_user_impact, user_seeks, and user_scans columns in sys.dm_db_missing_index_group_stats.’

    PRINT ”

    PRINT ‘– Missing Indexes –’

    SELECT CONVERT (varchar, @runtime, 126) AS runtime,

    mig.index_group_handle, mid.index_handle,

    CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS improvement_measure,

    ‘CREATE INDEX missing_index_’ + CONVERT (varchar, mig.index_group_handle) + ‘_’ + CONVERT (varchar, mid.index_handle)

    + ‘ ON ‘ + mid.statement

    + ‘ (‘ + ISNULL (mid.equality_columns,”)

    + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ‘,’ ELSE ” END + ISNULL (mid.inequality_columns, ”)

    + ‘)’

    + ISNULL (‘ INCLUDE (‘ + mid.included_columns + ‘)’, ”) AS create_index_statement,

    migs.*, mid.database_id, mid.[object_id]

    FROM sys.dm_db_missing_index_groups mig

    INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle

    INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle

    WHERE CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10

    ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC

    PRINT ”

    GO

    If possible, please share the execution plan on skydrive or some fileshare location on web and share the path with me.


    Kind regards| Harsh Chawla | Personal Blog:- SQL-blogs
    |Team Blog:- Team Blog

    Thursday, March 01, 2012 7:09 AM
  • appriciating your support Harsh .. thank you

    could you please repost above script as T-SQL script

    Thursday, March 01, 2012 4:10 PM
  •  

    I think there was some error in formatting. please check this one:

    DECLARE @runtime datetime

    DECLARE @cpu_time_start bigint, @cpu_time bigint, @elapsed_time_start bigint, @rowcount bigint

    DECLARE @queryduration int, @qrydurationwarnthreshold int

    DECLARE @querystarttime datetime

    SET @runtime = GETDATE()

    SET @qrydurationwarnthreshold = 5000

    PRINT ''

    PRINT '==============================================================================================='

    PRINT 'Missing Indexes: '

    PRINT 'The "improvement_measure" column is an indicator of the (estimated) improvement that might '

    PRINT 'be seen if the index was created. This is a unitless number, and has meaning only relative '

    PRINT 'the same number for other indexes. The measure is a combination of the avg_total_user_cost, '

    PRINT 'avg_user_impact, user_seeks, and user_scans columns in sys.dm_db_missing_index_group_stats.'

    PRINT ''

    PRINT '-- Missing Indexes --'

    SELECT CONVERT (varchar, @runtime, 126) AS runtime,

    mig.index_group_handle, mid.index_handle,

    CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS improvement_measure,

    'CREATE INDEX missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)

    + ' ON ' + mid.statement

    + ' (' + ISNULL (mid.equality_columns,'')

    + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + ISNULL (mid.inequality_columns, '')

    + ')'

    + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,

    migs.*, mid.database_id, mid.[object_id]

    FROM sys.dm_db_missing_index_groups mig

    INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle

    INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle

    WHERE CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10

    ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC

    PRINT ''

    GO


    Kind regards| Harsh Chawla | Personal Blog:- SQL-blogs
    |Team Blog:- Team Blog


    Friday, March 02, 2012 6:55 AM
  • Thank you Harsh,

    i have applied the missing index on the table involved in query and is working normally now ,

    this job under monitroing for a couple of days to see if the same problem will happen again , will update my post in case smth happen..

    Thanks

    Sunday, March 04, 2012 10:19 AM