none
Optimize SQL query

    Question

  • Hi,

    I have a Transaction table having 5 billion rows.There is one SP which takes data from Transaction table based on different BL and stores in another table. Now, the issue is this SP takes almost 3-4 hours to execute. I have to optimize the SP.

    Below is the pseudo code. Please suggest how to optimize the query. Similarly there are multiple conditions.

    Insert into Transactiontb
       select      
       Product,      
       country,      
       M1 = 0.003 * sum(case when period = 1 then  ISNULL(Sales,0) else 0 end),      
       M2 = 0.003 * sum(case when period = 2 then  ISNULL(Sales,0) else 0 end),      
       M3 = 0.003 * sum(case when period = 3 then  ISNULL(Sales,0) else 0 end),      
       ...........
       ...........
       M60 = 0.003 * sum(case when period = 60 then ISNULL(Sales,0) else 0 end)      
         
       FROM       
       dbo.Transaction       
       where Type='A'
       group by Product,country    
       
       
       Insert into Transactiontb
        select      
       Product,      
       country,      
       M1 = 0.003 * sum(case when period = 1 then  ISNULL(Sales,0) else 0 end),      
       M2 = 0.003 * sum(case when period = 2 then  ISNULL(Sales,0) else 0 end),      
       M3 = 0.003 * sum(case when period = 3 then  ISNULL(Sales,0) else 0 end),      
       ...........
       ...........
       M60 = 0.003 * sum(case when period = 60 then ISNULL(Sales,0) else 0 end)      
         
       FROM       
       dbo.Transaction       
       where Type='C'
       group by Product,country    
       
       

    Wednesday, September 11, 2013 1:07 PM

All replies

  • First thing.. does the table has index.if it has, do you do regular index maintenance.

    If it has index and in maintained regularly could you post the execution plan and the table DDL

    vt

     

    Please mark answered if I've answered your question and vote for it as helpful to help other user's find a solution quicker


    Wednesday, September 11, 2013 1:13 PM
  • This Transaction & Transactiontb table truncates and loads fresh data every month.
    Wednesday, September 11, 2013 1:33 PM
  • There are some 'tricks' to improve the query but based on your narrative it is hard to suggest something without seeing the table structure + indexes definition

    1) 

    Insert into Transactiontb with (tablock)
       select      
       Product,      
       country,      
       M1 = 0.003 * sum(case when period = 1 then  ISNULL(Sales,0) else 0 end),      
       M2 = 0.003 * sum(case when period = 2 then  ISNULL(Sales,0) else 0 end),      
       M3 = 0.003 * sum(case when period = 3 then  ISNULL(Sales,0) else 0 end),     

    2)

    DECLARE @x INT
    SET @x = 1
    WHILE @x < 10,000,000  -- Set appropriately
    BEGIN

       

    Insert into Transactiontb

    select      
       Product,      
       country,      
       M1 = 0.003 * sum(case when period = 1 then  ISNULL(Sales,0) else 0 end),      
       M2 = 0.003 * sum(case when period = 2 then  ISNULL(Sales,0) else 0 end),      
       M3 = 0.003 * sum(case when period = 3 then  ISNULL(Sales,0) else 0 end),  

    where ID BETWEEN @x AND @x + 10000

        SET @x = @x + 10000
    END

    ID column should have clustered index.

    PS I might be wrong, but I think that gives some  material for thinking


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance


    Wednesday, September 11, 2013 1:44 PM
  • Of course it depends on what the current bottleneck is. Assuming it is I/O, then from your pseudo code, it looks like both queries can be joined into one. If both queries currently scan the entire table, then joining the two statements would save one scan.

    By the way: your pseudo code does not show the difference in selected values between the two statements. If I take that literally, then you could simply change "WHERE Type = 'A'" to "WHERE Type = 'A' OR Type = 'C'" and omit the second statement.


    Gert-Jan

    Wednesday, September 11, 2013 4:57 PM
  • Create a index on (Product, Country, Type) INCLUDE (Period, Sales).

    The order of the columns in the index should be based on their data distribution, e.g. selectivity. Most selective column should be left first.


    Regards, Dean Savović

    Wednesday, September 11, 2013 5:10 PM
  • we already having the below index structure.

    CREATE UNIQUE CLUSTERED INDEX [UIX] ON [dbo].[Transaction] 
    (
    	[period] ASC,
    	[Product] ASC,
    	[Country] ASC,
    	[Type] ASC
    )

    Thanks,

    Saikat

    Thursday, September 12, 2013 6:17 AM
  • Is the index UIX being used in the query?

    Include Sales column in the index (INCLUDE part of the statement) it will help SQL server generate better exec plan.


    Regards, Dean Savović

    Thursday, September 12, 2013 9:18 AM
  • I cannot include the sales column in the index because there are other columns as well like Tax,Revenue,ST,CT etc which are also used in other different conditions in the SP.

    Thanks,

    Saikat

    Thursday, September 12, 2013 9:50 AM
  • What is taking long in the 3 hours?  INSERT or SELECT?

    How long the SELECT query takes without the INSERT?

    Consider using FILLFACTOR in indexing.

    BOL: "When you create a clustered index, the data in the table is stored in the data pages of the database according to the order of the values in the indexed columns. When new rows of data are inserted into the table or the values in the indexed columns are changed, Microsoft® SQL Server™ 2000 may have to reorganize the storage of the data in the table to make room for the new row and maintain the ordered storage of the data. This also applies to nonclustered indexes. When data is added or changed, SQL Server may have to reorganize the storage of the data in the nonclustered index pages. When a new row is added to a full index page, SQL Server moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. Page splitting can impair performance and fragment the storage of the data in a table. For more information, see Table and Index Architecture.

    When creating an index, you can specify a fill factor to leave extra gaps and reserve a percentage of free space on each leaf level page of the index to accommodate future expansion in the storage of the table's data and reduce the potential for page splits. The fill factor value is a percentage from 0 to 100 that specifies how much to fill the data pages after the index is created. A value of 100 means the pages will be full and will take the least amount of storage space. This setting should be used only when there will be no changes to the data, for example, on a read-only table. A lower value leaves more empty space on the data pages, which reduces the need to split data pages as indexes grow but requires more storage space. This setting is more appropriate when there will be changes to the data in the table.

    The fill factor option is provided for fine-tuning performance. However, the server-wide default fill factor, specified using the sp_configure system stored procedure, is the best choice in the majority of situations. "

    LINK: http://technet.microsoft.com/en-us/library/aa933139(v=sql.80).aspx

    Optimization bible:

    http://www.sqlusa.com/articles/query-optimization/


    Kalman Toth Database & OLAP Architect IPAD SELECT Query Video Tutorial 3.5 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012


    Thursday, September 12, 2013 12:36 PM
  • Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Learn how to follow ISO-11179 data element naming conventions and formatting rules (you have no idea so your code is full of vague useless data element names). Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect. 

    This is minimal polite behavior on SQL forums. 

    >> I have a Transaction table having 5 billion rows. There is one SP which takes data from Transaction table based on different BL and stores in another table. <<

    It would be better to have a VIEW and not a new table. 

    >> Below is the pseudo code. Please suggest how to optimize the query. Similarly there are multiple conditions. <<

    Did you actually put “tb” in a table name? That design error has a name it is so bad; tibbling. Did you know the ISO-3166 country codes? Is your “country” that code or the population size? Can you be more vague? We use COALESCE() today. What is “sales”? I would guess a dollar amount which means we need to cast to decimal not integer. What is “type”? I will guess blood type! Why not? This is why we follow ISO-11179 rules. 

    There is no obvious need to have two queries: 

    INSERT INTO Transaction_Summary
    SELECT product_id, country_code, blood_type,
      0.003 * SUM(CASE WHEN report_period = 1 
                  THEN COALESCE(sales_amt, 0.00) ELSE 0.00 END) AS m01
      0.003 * SUM(CASE WHEN report_period = 2  
                  THEN COALESCE(sales_amt, 0.00) ELSE 0.00 END) AS m02, 
      0.003 * SUM(CASE WHEN report_period = 3 
                  THEN COALESCE(sales_amt, 0.00) ELSE 0.00 END) AS m03, 
      ...
      0.003 * SUM(CASE WHEN report_period = 60  
                  THEN COALESCE(sales_amt, 0.00) ELSE 0.00 END) AS m60
      FROM DBO. Something_Transactions
     WHERE blood_type IN ('A', 'C')
     GROUP BY product_id, country_code, blood_type;
     
     

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Thursday, September 12, 2013 5:17 PM