Recompilation and parameter sniffing

    Question

  • Earlier I posted this question to the wrong forum (mirroring). I apologize.

    In SQL Server there is quite often a problem caused by parameter sniffing. On average my SPs have at least 5 parameters.
    The first SP call stores the execution plan for subsequent calls. If you are lucky, the first call uses typical parameter values and only atypical parameter values get a bad execution plan. But if the first call uses an atypical value, then most of the executions run with a bad plan.

    One of the biggest performance hits also comes from sort warnings (the sort spills to disk), which are usually the result of a bad plan (though they can also happen for other reasons).
    So, how do you deal with it?

    1. Create local variables or use the OPTIMIZE FOR UNKNOWN hint. This only solves the case where the first call is atypical; atypical values will still get a bad plan (even typical values can get a plan that is not quite as good as with recompile, but usually the difference is small).

    2. Create more stored procedures for different parameter values. Since the SPs have 5 parameters on average, that would create so many combinations that it is almost impossible to manage. Even with one parameter it is hard to deal with many different parameter values.

    3. Recompile the SP every time.

    4. Dynamic SQL, which stores a different plan for every parameter combination, but not for different parameter values within the same combination (see the sketch after this list).
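
    For illustration, a minimal sketch of what option 4 could look like (the table, columns and procedure name are made up; sp_executesql caches one plan per distinct statement text, so every parameter combination gets its own plan, while different values within the same combination reuse it):

    CREATE PROCEDURE dbo.SearchOrders   -- hypothetical search procedure
        @CustomerID int  = NULL,
        @OrderDate  date = NULL
    AS
    BEGIN
        DECLARE @sql nvarchar(max) =
            N'SELECT OrderID, CustomerID, OrderDate FROM dbo.Orders WHERE 1 = 1';

        -- Append a predicate only for the parameters that were actually supplied,
        -- so each combination produces a different statement text and its own cached plan.
        IF @CustomerID IS NOT NULL SET @sql += N' AND CustomerID = @CustomerID';
        IF @OrderDate  IS NOT NULL SET @sql += N' AND OrderDate  = @OrderDate';

        EXEC sp_executesql @sql,
             N'@CustomerID int, @OrderDate date',
             @CustomerID = @CustomerID, @OrderDate = @OrderDate;
    END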

    So, I have found that recompiling every time is the best solution. Yes, it is CPU intensive, but still not as intensive as a query running with a bad plan - or am I wrong?

    Take an example:

    The SP is executed 100 times per second. The EP is in cache; 90% of executions get the optimal plan and 10% get a bad plan that executes 100 times slower (I assume 100 times slower is roughly average).
    Using some dummy maths:

    90 + 10x100=1090

    So it takes the same time as 1090 executions of the SP with a good plan in cache.

    If the recompilation time is 10 times the SP execution time, then I get:

    100+100x10=1100

    That is the same time as 1100 executions of the SP with a good plan in cache.

    So the recompile time would have to be about 10 times the SP execution time before plan caching starts to pay off in this example.

    But in reality the recompile time is often 10 times lower than the SP execution time. I have checked a couple of SPs, but can't find any where the recompile time is even the same as the execution time.
    (I'm not sure how to measure recompile time, so I used SET STATISTICS TIME ON and compared executions with recompile and without.)
    I have also found an example where the SP with recompile is faster than the SP executed from cache - I don't know how that is possible, but cache maintenance obviously takes some resources. However, I tested as a single user; with many users there could be a difference.
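
    Roughly, such a comparison could look like this (dbo.MyProc and @p1 are placeholders; the "SQL Server parse and compile time" lines in the Messages output show the compile cost of each run):

    SET STATISTICS TIME ON;

    DBCC FREEPROCCACHE;                        -- force a fresh compile (test server only!)
    EXEC dbo.MyProc @p1 = 1;                   -- first run: compile + execute

    EXEC dbo.MyProc @p1 = 1;                   -- second run: plan from cache, compile time ~0

    EXEC dbo.MyProc @p1 = 1 WITH RECOMPILE;    -- compiled on every call

    SET STATISTICS TIME OFF;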

    Recompilation also takes a schema stability lock (Sch-S), which could be a problem when many users call the same SP at the same time.
    By definition there can be many concurrent Sch-S locks, and Sch-S is compatible with all other lock types except Sch-M, which is very rare (it is taken only when the table is modified). So many users can hold the lock at the same time.
    Or is there some different scenario during recompilation where one user must wait for another user to release a lock?
    Then this could be a bottleneck, and whether recompiling is still faster than caching would depend on the number of users. Maybe in that case option 4 would be the best.
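
    One way to see which locks the compiling session actually holds is to query sys.dm_tran_locks from a second window while the procedure is executing (53 stands in for the session_id of the connection running the procedure):

    SELECT resource_type,
           resource_subtype,
           request_mode,        -- expect Sch-S during compilation, S/IS afterwards
           request_status
    FROM   sys.dm_tran_locks
    WHERE  request_session_id = 53;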

    I guess the default should be to recompile every time for procedures with parameters, and then deal only with the cases where this is not acceptable and causes problems.
    I would like to hear other people's experience and opinions.

    Interesting reading:
    Elephant and mouse

    Thanks,
    Simon

    Monday, August 19, 2013 12:01 PM

All replies

  • Simon

    If it were the default setting you would quickly fill the entire plan cache with execution plans that are used only once - why do that?

    Also, recompilation might take too long. We have a client with a JOIN across several tables where the recompile takes almost 40 seconds; by the way, during the recompilation all processes were hung.

    http://rusanu.com/2013/08/01/understanding-how-sql-server-executes-a-query/


    Best Regards, Uri Dimant, SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance

    Monday, August 19, 2013 1:01 PM
    Answerer
  • Uri,

    with that default setting the EP wouldn't be put in the cache, and SQL Server wouldn't bother with the cache at all.
    And if you create the procedure WITH RECOMPILE, then there is no plan in the cache.

    "Also recompile might take too long"
    I know it can happen, but I'm speaking for general usage. For example:
    I have 100 SP with parameters and maybe 5 of them are problematic for recompiling. While on the other hand, all of them can execute with bad plan.
    So, it is better do deal with only 5 of them and make those 5 cacheable and use some hints, local variables and so on...rather than deal with 100 of them because of sort warnings and wrong index usage....

    Only if the Sch-S locks turned out to be a problem when the SP is executed by many users at the same time would I stay with the cache. I'm not sure about that.

    Here is an article about compile locks:
    Compile locks
    "In Microsoft SQL Server, only one copy of a stored procedure plan is generally in cache at a time."

    If the SP is created with CREATE PROCEDURE dbo.myProcedure WITH RECOMPILE AS ..., then there won't be any copy of the stored procedure plan in the cache. So I guess recompiling this way won't take any compile lock, since there is no cache involved at all?
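
    For reference, the two recompile variants side by side (the body and object names are placeholders). With the procedure-level option the plan is never cached at all; with the statement-level hint only that statement is recompiled on each execution:

    -- Procedure-level: no plan is stored in the cache for this procedure.
    CREATE PROCEDURE dbo.myProcedure (@p int)
    WITH RECOMPILE
    AS
        SELECT col1 FROM dbo.someTable WHERE col2 = @p;
    GO

    -- Statement-level: only this statement is recompiled on every execution,
    -- optimized for the actual parameter value.
    CREATE PROCEDURE dbo.myProcedure2 (@p int)
    AS
        SELECT col1 FROM dbo.someTable WHERE col2 = @p
        OPTION (RECOMPILE);
    GO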

    br, Simon


    Monday, August 19, 2013 1:21 PM
  • There is no straight answer to this question, and the common answer can easily be used here: it depends!

    - How long does the sp take to execute?

    - How frequently do you execute it, and do you execute it concurrently?

    - How frequently are the typical / atypical parameters used?

    - Is it OK (is the response time acceptable) to use the "All density" instead of the histogram?

    You need to evaluate your environment and try to use the options that better serve you.

    Take for example a DW environment, targeting stored procedures used for the ETL process. Those sps usually take a long time to execute and are mostly not executed concurrently, so compiling them before execution will not hurt time-wise, assuming statistics are up to date.

    On the other hand, take an OLTP system with sps being called concurrently and at a high frequency. In this case compilation time could affect response time, but you have other options depending on your answers. One scenario could be that most of the time the users pass typical parameters (you can also use such a value in an OPTIMIZE FOR hint, though I tend not to use this option), so letting the sp run long when atypical parameters are used will serve this case well. The same goes the opposite way, when atypical parameters are used most frequently.
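
    For example, the OPTIMIZE FOR variant could look roughly like this (the table, column and "typical" value are invented), so the plan is always built for the value you consider typical instead of whatever the first call happened to pass:

    CREATE PROCEDURE dbo.GetOrdersByType (@CustomerType varchar(20))
    AS
        SELECT OrderID, OrderDate
        FROM   dbo.Orders
        WHERE  CustomerType = @CustomerType
        OPTION (OPTIMIZE FOR (@CustomerType = 'Retail'));   -- 'Retail' = assumed typical value
    GO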


    AMB

    Some guidelines for posting questions...


    Monday, August 19, 2013 1:49 PM
    Moderator
  • Just to give my 2 cents.

    I have a stored proc which does heavy ETL work. It takes 93 seconds to compile and runs in 15 seconds. So I am very happy that it is compiled and cached.

    Monday, August 19, 2013 3:48 PM
    Moderator
  • I have run some more tests with Adam Machanic's stress-test tool. I called a fairly complex procedure which has 20 parameters and contains 8 queries with 8 execution plans, some of which look quite complex. I created 200 concurrent connections calling this SP at the same time
    (this is probably more than will ever happen in production for my application). Inside the procedure there are many joins, some against tables with millions of records.
    The results are:
    1. with recompile:
        average execution time= 546 ms
        client avg time: 1310 ms

    2. without recompile(EP is always from cache, I have checked):
        average execution time= 538 ms
        client avg time: 851 ms

    So the CPU time is about the same (interesting), but the client time is greater - I don't know why, probably because of locking.

    I always called the SP with the same parameters, so the plan from cache was therefore optimal.
    If I called the procedure with different parameter values, I would also get executions with a bad plan in the second case, and the results would probably be much worse than in the first case. I'll also try to test calls with different parameter values.

    br, Simon

    Tuesday, August 20, 2013 1:28 PM

  • 1. Create local variables or use the OPTIMIZE FOR UNKNOWN hint. This only solves the case where the first call is atypical; atypical values will still get a bad plan (even typical values can get a plan that is not quite as good as with recompile, but usually the difference is small).

    To add my 2 cents: which values are atypical is not determined by the data in the table, but by the number of queries that use a value that is either very common in the table or almost unique.

    Using local variables instead of parameters, or optimizing for unknown, will result in a query plan that gives optimal performance for the worst-case situation. So typically, this would be for values that are common in the table. If you ever use such values as parameters, then this is the safe choice.

    The optimizer does not create multiple plans based on values / value ranges (unlike DB2, I have heard). This means there is no out-of-the-box solution. You could make a suggestion to Microsoft (at connect.microsoft.com) to better support situations like this, or vote for any existing suggestion to that effect.


    If the stored procedure is called 100 times per second, then any hint that forces recompilation every time sounds like a very bad idea to me, because it is highly unlikely that the plan should change anywhere close to 100 times per second. If your query plan is trivial then it might not hurt a lot, but of course this very much depends on your query.


    If I understand you correctly, your situation is not so much about different values of the same parameter; your problem seems to be one of optional parameters. If that is not the case, then please respond, because then what I am about to write below doesn't apply.

    If you are talking about 5 parameters, and this query is very important to you, and all of the parameters are optional, then I would definitely consider creating / generating 2^5 (= 32) stored procedures which are identical (except for their name), plus a "router" stored procedure that is called from your application and that calls one of these 32 stored procedures based on which parameters are actually used. This way, each of the 32 stored procedures can benefit from parameter sniffing. Already in the very first second, this approach will outperform your "recompile" approach.
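
    A trimmed-down sketch of that router with just two optional parameters (all names are invented; the four child procedures are identical copies of the original query, one per parameter combination):

    CREATE PROCEDURE dbo.SearchOrders_Router
        @CustomerID int  = NULL,
        @OrderDate  date = NULL
    AS
    BEGIN
        -- Dispatch to a dedicated copy per combination, so each copy gets its own
        -- plan sniffed for the parameters it actually uses.
        IF @CustomerID IS NOT NULL AND @OrderDate IS NOT NULL
            EXEC dbo.SearchOrders_Cust_Date @CustomerID, @OrderDate;
        ELSE IF @CustomerID IS NOT NULL
            EXEC dbo.SearchOrders_Cust @CustomerID;
        ELSE IF @OrderDate IS NOT NULL
            EXEC dbo.SearchOrders_Date @OrderDate;
        ELSE
            EXEC dbo.SearchOrders_All;
    END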


    Gert-Jan

    Wednesday, August 21, 2013 9:41 AM
  • Gert,

    thanks for your 2 cents.
    My situation is focused on different values of the same parameter.

    Can you explain a bit:
    "Using local variables instead of parameters, or optimizing for unknown will result in a query plan that gives optimal performance for the worst case situation"

    When optimizing for unknown, the optimizer always uses the same estimated number of rows.

    If I simplify and suppose that the different parameter values are equally distributed, you will not have enough memory for half of your queries and too much memory for the other half. And also bad plans - a scan instead of a seek, for example.

    With parameter sniffing you can get similar results.

    So either way is not optimal. If you have bad luck you can even end up with all executions using a bad plan.

    On the other hand, recompilation will (in most cases) always produce a good plan and always grant the right amount of memory (simplifying again). And there is no cache maintenance of execution plans, since they don't exist in the cache. You will have far fewer troubles.
    But at what cost?

    In SQL 2000 there was a problem with compile locks at the procedure level, so recompilation had to be avoided in a many-users scenario.
    In SQL 2008, which I work with at the moment, recompilation only takes Sch-S locks, which are not a problem as long as I don't modify objects at the same time (which is usually not the case).

    And in almost all procedures (in my database, which has some quite complex procedures), the recompilation time is almost negligible compared to the execution time.
    And I believe that the additional CPU cost of recompilation is in most cases negligible compared to the additional CPU and IO cost of bad plans from the cache - which can happen quite often.

    So I think it is better, as a default, to recompile all procedures with parameters every time (except for the ones where you know the same plan is fine for different values) than to suffer from bad plans.

    And then maybe fix only those which suffer because of the recompilation.

    So I'm just asking what your experience is. Do you agree, or do you have procedures with heavy recompilation cost where you prefer caching, even though it produces bad plans quite often?

    br, Simon

    Wednesday, August 21, 2013 3:08 PM
  • "Using local variables instead of parameters, or optimizing for unknown will result in a query plan that gives optimal performance for the worst case situation"

    When optimizing for unknown, the optimizer always uses the same estimated number of rows.

    If I simplify and suppose that the different parameter values are equally distributed, you will not have enough memory for half of your queries and too much memory for the other half. And also bad plans - a scan instead of a seek, for example.

    You are right that one estimate will be used. When I say the worst case, I mean the value with the highest number of estimated rows. This can result in a scan, while it might choose a seek for a low-occurrence value when sniffing parameters.

    If it chooses a scan, then for that value that is (assumed to be) the best query plan. So you will not have "not enough memory for half of your queries", although the memory grant for the other half might indeed be over-sized. You will not get poor performance for that value, only potentially suboptimal performance for other values.

    Like I said, this is the out-of-the-box behavior. If only one estimate is used, then the optimizer's approach is the best one. The alternative is much worse for an individual query, because the performance degradation of a "seek" plan when it should have been a scan is typically much bigger than the degradation of a "scan" when it should have been a seek - often orders of magnitude.

    So in practice, adding a recompilation hint is the easy fix. And I think I said it before: if your query only needs a trivial query plan, then the compilation cost will be low, which would make it an ideal permanent fix. So as usual with performance-related questions, it really depends on your situation. And as always, you should test. In the end, tests will answer all your questions.

    If the cost of continuous recompilation is too high for your situation, then you could consider optimizing the situation manually. You would have to figure out at what number of value occurrences the optimal query plan changes. Then you would have to find a way to distinguish these two categories of values, and put that logic in a "managing" stored procedure. That stored procedure could then call one of two identical stored procedures (your original stored procedure, but duplicated), one for each group. You can rely on the optimizer to use parameter sniffing to find the correct query plan for each of the two stored procedures.
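
    As a sketch of that "managing" procedure (the table, the 1000-row threshold and the child procedure names are all invented), the wrapper classifies the value and calls one of two identical copies of the original procedure, each of which keeps a plan sniffed for its own category:

    CREATE PROCEDURE dbo.GetOrdersByCustomer (@CustomerID int)
    AS
    BEGIN
        DECLARE @rows int;

        -- Cheap pre-check; 1000 is the assumed point where the optimal plan
        -- flips from a seek to a scan.
        SELECT @rows = COUNT(*) FROM dbo.Orders WHERE CustomerID = @CustomerID;

        IF @rows <= 1000
            EXEC dbo.GetOrdersByCustomer_Rare   @CustomerID;   -- identical copy #1
        ELSE
            EXEC dbo.GetOrdersByCustomer_Common @CustomerID;   -- identical copy #2
    END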

    This approach adds cost to your query execution and complexity to your code base. How much depends on how you distinguish the categories. Again, whether this is worth it depends; it will always be a tradeoff. If you consider it, you should test whether it outperforms the other approaches, and whether it is worth the extra complexity and future risk.



    Gert-Jan

    Wednesday, August 21, 2013 4:33 PM
  • Thank you Gert.
    "When I say the worst case, this means the value with the highest number of estimated rows. "

    I think that is not true. Example:

    If you create this procedure in the AdventureWorks database:

    ALTER PROCEDURE [dbo].[test] (@pid int)
    AS
    SELECT * FROM Sales.SalesOrderDetail
    WHERE ProductID = @pid
    OPTION (OPTIMIZE FOR UNKNOWN)

    and execute it: 

    EXEC dbo.test 709

    you will see that the estimated number of rows is 456, while the actual number of rows is 188.

    If you look at statistics:

    DBCC SHOW_STATISTICS('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)


    You will see that the "All density" is 0.003759399. And if I multiply it by the total number of rows, the estimated number of rows is 0.003759399 * 121317 = 456. That is greater than 188, so enough memory will be granted.

    But if you execute the procedure for a different product:

    EXEC dbo.test 712

    you will see that the actual number of rows is 3382, while the estimate is still only 456. So it could happen that not enough memory is granted.

    What is interesting here:

    The table has a computed column. The optimizer extracts all columns from the table and uses a Compute Scalar to calculate the computed column for all rows, and only after that does it filter by ProductID to extract the requested rows.

    I don't understand why it doesn't filter first and then run the Compute Scalar only on the small number of rows.

    If you change the procedure to:

    ALTER PROCEDURE [dbo].[test] (@pid int)
    AS
    SELECT SalesOrderID, CarrierTrackingNumber FROM Sales.SalesOrderDetail
    WHERE ProductID = @pid
    OPTION (OPTIMIZE FOR UNKNOWN)

    you will see more clearly that this query could suffer from insufficient memory.

    Can you confirm that there is only a Sch-S lock during recompilation if the procedure is created WITH RECOMPILE?

    Otherwise I agree with everything you have written.

    br, Simon

    Thursday, August 22, 2013 8:50 AM
  • EXEC dbo.test 709

    you will see that the estimated number of rows is 456, while the actual number of rows is 188.

    I am getting a clustered index scan in the OPTIMIZE FOR UNKNOWN situation, so I am getting an estimate of 121317 rows.

    If you look at the statistics: You will see that the "All density" is 0.003759399. And if I multiply it by the total number of rows, the estimated number of rows is 0.003759399 * 121317 = 456. That is greater than 188, so enough memory will be granted.

    First of all, I have no way of establishing the memory claim. I do see the estimated number of rows and the estimated row size, but AFAIK the execution plan does not report memory claims.

    Secondly, in both the index scan and the index seek scenario, every operator in the query plan is a streaming operator. In other words, every operator only requires memory to process a single row. There is no spool or sort (or whatever) in the query plan for which a serious memory claim is relevant / needed.

    What is interesting here:

    The table has a computed column. The optimizer extracts all columns from the table and uses a Compute Scalar to calculate the computed column for all rows, and only after that does it filter by ProductID to extract the requested rows.

    I don't understand why it doesn't filter first and then run the Compute Scalar only on the small number of rows.

    Query plans can be hard to read. I don't know whether the computed column is actually calculated before the filter is applied. Note that in reality the query plan is driven from left to right, for each individual row that applies (see Paul White's article).

    If you change the procedure to ... you will see more clearly that this query could suffer from insufficient memory.

    I'd love you to tell me how I can check the memory claim and when the memory was not sufficient.

    Can you confirm that there is only Sch-S lock at recompilation if procedure is created with recompilation?

    In the snapshot I took, I only saw shared key locks. Most likely this snapshot was taken after the (re)compilation. If I lock the table exclusively in another process and then run the OPTION (RECOMPILE) version, I see an intent shared (IS) table lock that is being blocked.


    Gert-Jan

    Thursday, August 22, 2013 7:54 PM
  • "I am getting a clustered index scan in the OPTIMIZE FOR UNKNOWN situation, so I am getting an estimation of 121317."

    Gert, that is because of the computed column. It first does a scan to calculate the computed column for all rows. If you hover over the last step of the execution plan you will see that the estimated number of rows is 456 while the actual number of rows is 3382. And that could lead to an insufficient memory grant in some cases, I guess.

    If you leave the computed column out of the procedure (my second procedure DDL) you will get a better picture.

    "I'd love you to tell me how I can check the memory claim "
    Put execution of procedure into loop in one connection:

    while 1=1
    begin
    exec [dbo].[test] 709
    end

    and then in a second window run the following query:

    SELECT * FROM sys.dm_exec_query_memory_grants WHERE [session_id] = @spId -- the session_id of the looping connection

    And you will see that the granted memory is 12976 KB, while the used memory was only 16 KB.
    So the memory was overestimated by a factor of about 811, which is a really bad outcome.

    If you replace 709 with 712, the used memory is 208 KB, 15 times greater.
    In both cases the estimate was the same, but the memory consumption was very different.
    But in both cases there was enough memory, as you said; in fact, the memory was overestimated by so much that queries with OPTIMIZE FOR UNKNOWN would probably lead to disaster on a server with many users.

    If you run the procedure with recompile, you will see that the granted memory for 709 is only 1024 KB, which is much better.

    If you use parameter sniffing and execute the SP first with 712 and then with 709, you get the same result as with OPTIMIZE FOR UNKNOWN, i.e. an overestimation of about 811 times.

    Probably not the best example, but it shows that with a recompile each time the memory pressure is much lower and, thanks to better plans, the CPU time can also be much lower (even though recompilation costs some CPU).

    The shared lock is because of the SELECT inside the procedure, not because of the recompilation. Even if you remove WITH RECOMPILE you will see the same shared locks.

    I have checked. When recompiling, SQL Server puts Sch-S locks on all objects referenced inside the SP
    (procedure, table, constraint, index, ...). After the recompile it takes shared locks, just as in the case without recompile. I couldn't find any other lock taken while recompiling.
    Is there some internal lock that is not visible with sp_lock, or something else?

    Otherwise, Sch-S locks are concurrent and compatible with all others locks(execpt Sch-M) and therefore negligible for most of the cases.

    It seems to me that compile locks were making problems only on SQL 2000.

    br, Simon

    Friday, August 23, 2013 9:24 AM
  • "I'd love you to tell me how I can check the memory claim "
    SELECT * FROM sys.dm_exec_query_memory_grants WHERE [session_id] = @spId -- the session_id of the looping connection

    Thanks for that, I'll have a look

    ... the memory was overestimated by so much that queries with OPTIMIZE FOR UNKNOWN would probably lead to disaster on a server with many users.

    Maybe, maybe not. You simply can't argue this position unless you have tested it for your situation.

    Memory is one important resource. But so is CPU, so is I/O, so is bandwidth, etc. When you push your server to the max, one of those resources will become the bottleneck. Which one depends on your hardware, your configuration, your workload, etc. There is no single answer that is true for all situations. Or to sum it up in two famous words: it depends.



    Gert-Jan

    Friday, August 23, 2013 2:44 PM