we have query which is running fine with temp variable

Answered we have query which is running fine with temp variable

  • Wednesday, March 20, 2013 3:22 PM
     
     
    we have query which is running fine with temp variable but it is taking time with temp table? temp table and temp variable contains 1200 rows while query. Why temp variable is better than temp table in sql server 2008?

All Replies

  • Wednesday, March 20, 2013 3:28 PM
     
     
    disk versus main storage. 

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

  • Wednesday, March 20, 2013 3:42 PM
     
     Answered
    Compare the plans - that is the only way.  Do not assume that there is a fixed rule.
  • Wednesday, March 20, 2013 3:48 PM
     
      Has Code

    HI Scott,

    Hi all.

    I'm currently writing an article for my blog entry and stumbling about a couriosity which maybe can be solved by the experts here. For the example I'm using the AdventureWorks2012 database.

    Situation:
    I'm using a table variable @DateRange and try to compare it's effect in a query plan against a temporary table called #DateRange. The code is as follows:

    SET NOCOUNT ON;
    GO
    
    DECLARE @StartDate date = '20080701';
    DECLARE @EndDate   date = '20080731';
    
    IF OBJECT_ID('tempdb..#DateRange') IS NOT NULL
    	DROP TABLE #DateRange
    
    CREATE TABLE #DateRange (OrderDate date PRIMARY KEY CLUSTERED);
    DECLARE	@DateRange TABLE (OrderDate date PRIMARY KEY CLUSTERED);
    
    WHILE @StartDate <= @EndDate
    BEGIN
        INSERT INTO #DateRange (OrderDate) VALUES (@StartDate);
        INSERT INTO @DateRange (OrderDate) VALUES (@StartDate);
        SET @StartDate = DATEADD(dd, 1, @StartDate)
    END;
    
    SET STATISTICS IO ON;
    
    SELECT	dr.OrderDate,
    		oh.SalesOrderID,
    		oh.PurchaseOrderNumber,
    		COUNT(od.SalesOrderDetailID)	AS	Positions,
    		SUM(od.LineTotal)				AS	TotalAmount
    FROM	@DateRange dr LEFT JOIN sales.SalesOrderHeader oh
    		ON	(dr.OrderDate = oh.OrderDate) LEFT JOIN sales.SalesOrderDetail od
    		ON (oh.SalesOrderId = od.SalesOrderID)
    GROUP BY
    		dr.OrderDate,
    		oh.SalesOrderId,
    		oh.PurchaseOrderNumber;
    
    PRINT '-------------------------------------'
    
    SELECT	dr.OrderDate,
    		oh.SalesOrderID,
    		oh.PurchaseOrderNumber,
    		COUNT(od.SalesOrderDetailID)	AS	Positions,
    		SUM(od.LineTotal)				AS	TotalAmount
    FROM	#DateRange dr LEFT JOIN sales.SalesOrderHeader oh
    		ON	(dr.OrderDate = oh.OrderDate) LEFT JOIN sales.SalesOrderDetail od
    		ON (oh.SalesOrderId = od.SalesOrderID)
    GROUP BY
    		dr.OrderDate,
    		oh.SalesOrderId,
    		oh.PurchaseOrderNumber;
    
    SET STATISTICS IO OFF;

    The example is quite simple. First I create a temporay table and declare a table variable which will be filled in a do-while-loop. Than I execute the same relations of AdventureWorks with both - temporary table and table variable. When I have a look to the execution plan I was really astonished about the result. I expected a better performance when using temporay table but a "common" difference of app. 5 - 10 % of faster execution... but the execution plan looks as follows:

    Please ignore the 21% of the execution costs - these are the costs for the insert-operations! When I check the IO i see that the query plan with the temporary table uses less IO than usage of the table variable. But in relation the query with temporary table is 12 times more expensive than the query with the table variable.

    SalesOrderDetail-Tabelle. scans 976, logical reads 3162, ...
    SalesOrderHeader-Tabelle. scans 31, logical reads 64, ...
    #B3ECD235-Tabelle. scans 1, logical reads 2, ...

    SalesOrderHeader-Tabelle. scans 31, logical reads 84, ...
    #DateRange-Tabelle. scans 1, logical reads 2, ...
    SalesOrderDetail-Tabelle. scans 1, logical reads 1246, ...

    Do you have an explanation for this - really extrem - difference in execution costs?
    Thank you all for your support...

    Why this is the difference?

  • Wednesday, March 20, 2013 3:51 PM
     
     Proposed

    It depends.  In some cases the table variable will be faster, in some cases the temp table.  Generally, the larger the table the more likely it is that the temp table will be faster.  And 1200 rows is about where you would normally expect to see the temp table being better.

    But as I said, it depends.  Why in this case the table variable is better is hard to say without much more information.  One possible cause is a temp table of that size will have statistics computed on it.  Computing the statistics costs some time, this cost would not be present in a table variable.  Often times, having statistics improves overall performance enough so that the cost of computing them is well worth it.  But sometimes the statistics are of no help, and rarely, they might even cause the Query Optimizer to choose a slower plan than it would without the statistics. Another difference is that changes to a temp table are written to the tempdb log.  Changes to table variables are not logged.

    So, in a different case you might find the temp table to be faster.  The only way to know for sure is to test.  Which you have done.  If this test was with production data or test data that is approximately the size and complexity of the expected production data, then congratulations.  You now know which is faster and if speed is your primary consideration in this case, you know which one to pick.

    My usual practice is to use a table variable if there are a very small number of rows (say less than 500 - 1000) and a temp table if there are more rows.  If the performance of that is adequate, great.  If the performance is not adequate, then I would try to make it better.  One thing I would be likely to try is switching to/from a table variable/temp table.

    One further comment, Joe Celko said the difference is disk versus main storage.  That is an oversimplification.  Both temp tables and table variables can be in memory and the both can be written to disk and need to be reread later.  As noted above, there is additional disk I/O due to logging with a temp table, but do not think that because it is a table variable, it must always be in memory.

    Tom

  • Wednesday, March 20, 2013 4:57 PM
     
     

    So which one is fastest? As in: which one has the lowest elapsed time?

    Because there are a few problems with they way you conducted your analysis.

    First of all, you included the estimated query plan. The percentages displayed there are estimates. The actual performance might be substantially different.

    Secondly, unfortunately one logical read is not the other logical read. For an index seek, a logical read locates one row. For an index scan, a logical read locates a page with rows. Because of that, it is not strange that the query plan with the two index scan has fewer logical reads than the query plan with one index scan.


    Gert-Jan

  • Wednesday, March 20, 2013 6:12 PM
     
     Proposed Has Code

    I agree with Gert-Jan.  The important question is almost always lowest elapsed time.  And while both estimated and actual execution plans can be helpful guides they are not always correct.  You usually want to look at elapsed time as the final authority on which is the best method in a particular case.  So, I took your example and made some changes to it.

    The changes are 1) I took out the loop and loaded the temp table and table variable from a calendar table I have - this has nothing to do with the question at hand, it's just the easiest/fastest way to load those tables, 2) instead of selecting the result and returning it to the client, I just performed a calculation that forces SQL to look at every column and every row it would need to do the original select.  I did this because what we are concerned with here is the time on the server and I didn't want my elapsed times affected by the transport between the client and the server 3) for both the temp table and the table variable I performed 50 loops - in each loop I freed the procedure cache and dropped the clean buffers so that each time the data in the AdventureWorks tables would have to be read from disk and not from memory.  Then for each loop, I got the time in milliseconds for that loop and summed those times. At the end of the 50 loops, I print out the total time.  Code is below

    use AdventureWorks2012
    go
    SET NOCOUNT ON;
    GO
    
    DECLARE @StartDate date = '20080701';
    DECLARE @EndDate   date = '20080731';
    
    declare @starttime datetime2, @endtime datetime2;
    declare @t int;
    declare @loopcounter int;
    declare @elaspedtime int;
    
    IF OBJECT_ID('tempdb..#DateRange') IS NOT NULL
    	DROP TABLE #DateRange
    
    CREATE TABLE #DateRange (OrderDate date PRIMARY KEY CLUSTERED);
    DECLARE	@DateRange TABLE (OrderDate date PRIMARY KEY CLUSTERED);
    
    INSERT INTO #DateRange (OrderDate) SELECT dt FROM AuxDB.dbo.Calendar WHERE dt BETWEEN @StartDate AND @EndDate;
    INSERT INTO @DateRange (OrderDate) SELECT dt FROM AuxDB.dbo.Calendar WHERE dt BETWEEN @StartDate AND @EndDate;
    
    set @elaspedtime = 0;
    
    set @loopcounter = 1;
    while @loopcounter <= 50
    begin
    DBCC FREEPROCCACHE
    DBCC DROPCLEANBUFFERS
    select @starttime = SYSDATETIME();
    WITH cte AS
    (SELECT	dr.OrderDate,
    		oh.SalesOrderID,
    		oh.PurchaseOrderNumber,
    		COUNT(od.SalesOrderDetailID)	AS	Positions,
    		SUM(od.LineTotal)				AS	TotalAmount
    FROM	@DateRange dr LEFT JOIN sales.SalesOrderHeader oh
    		ON	(dr.OrderDate = oh.OrderDate) LEFT JOIN sales.SalesOrderDetail od
    		ON (oh.SalesOrderId = od.SalesOrderID)
    GROUP BY
    		dr.OrderDate,
    		oh.SalesOrderId,
    		oh.PurchaseOrderNumber)
    SELECT @t = COUNT(DISTINCT DateDiff(day, OrderDate, '20000101')+Case When PurchaseOrderNumber Is Null Then 1 Else 0 End +SalesOrderID+Positions+TotalAmount) FROM cte;
    set @loopcounter = @loopcounter + 1
    select @endtime = SYSDATETIME()
    select @elaspedtime = @elaspedtime + datediff(ms, @starttime, @endtime);
    end
    select @elaspedtime as 'Elasped ms for 50 loops of table variable'
    
    set @elaspedtime = 0;
    
    set @loopcounter = 1;
    while @loopcounter <= 50
    begin
    DBCC FREEPROCCACHE
    DBCC DROPCLEANBUFFERS
    select @starttime = SYSDATETIME();
    WITH cte AS
    (SELECT	dr.OrderDate,
    		oh.SalesOrderID,
    		oh.PurchaseOrderNumber,
    		COUNT(od.SalesOrderDetailID)	AS	Positions,
    		SUM(od.LineTotal)				AS	TotalAmount
    FROM	#DateRange dr LEFT JOIN sales.SalesOrderHeader oh
    		ON	(dr.OrderDate = oh.OrderDate) LEFT JOIN sales.SalesOrderDetail od
    		ON (oh.SalesOrderId = od.SalesOrderID)
    GROUP BY
    		dr.OrderDate,
    		oh.SalesOrderId,
    		oh.PurchaseOrderNumber)
    SELECT @t = COUNT(DISTINCT DateDiff(day, OrderDate, '20000101')+Case When PurchaseOrderNumber Is Null Then 1 Else 0 End +SalesOrderID+Positions+TotalAmount) FROM cte;
    set @loopcounter = @loopcounter + 1
    select @endtime = SYSDATETIME()
    select @elaspedtime = @elaspedtime + datediff(ms, @starttime, @endtime);
    end
    select @elaspedtime as 'Elasped ms for 50 loops of temp table'
    

    Now when I looked at the actual execution plans, the cost the optimizer assigned to the table variable query was 0.792731 and the cost for the  temp table was 2.08574.  So looking at this you would think the table variable was better.  But looking at the elapsed times, the total time to do 50 loops of the table variable was usually between 16.5 and 17.5 seconds.  And the total time for the temp table was usually between 14 and 15 seconds.  (As expected the times varied from run to run, but were fairly consistent and I never had a case where the table variable was faster.  If you try this, it is likely the times will be different on your system.)

    Since, in most cases, you are more concerned about elapsed time rather than any other measure, in this case, I would say the temp table was better.

    Tom