none
Performance Issue with slow running query

    Question

  • Hi all,
    I have a performance issue that I need help with. The query is taking 40 seconds to run.
    I tried creating indexes but it hasnt really helped. Please suggest indexes/other methods that might help with this quey.

    There are 2 main tables: Table1 and Table2.
    ----------------------------------------------
    create table Table1
    (
     Num [nvarchar](255) NULL,
     Code [nvarchar](255) NULL,
     Amt [float] NULL,
     Description [nvarchar](255) NULL,
     -- And other fields ---
    )

    -- Sample data below. There are over 300,000 records in this table with 37,000 distinct [Num] field values.
    --No indexes on this table.

    -- sample data
    Insert Table1 values ('00000000000011111', 'a',2.3,'test')
    Insert Table1 values ('00000000000011111', 'a',2.4,'test2')
    Insert Table1 values ('00000000000011111', 'b',2.5,'test3')

    Insert Table1 values ('00000000000011112', 'c',2.7,'test11')
    Insert Table1 values ('00000000000011112', 'a',2.8,'test12')
    Insert Table1 values ('00000000000011112', 'b',2.9,'test13')
    Insert Table1 values ('00000000000011112', 'c',2.9,'test14')

    ----------------------------------------------
    --The second table has 19000 records with 3000 distinct [NUM]
    --No indexes on this table.

    create table Table2
    (
     ID [int] IDENTITY(1,1) NOT NULL,
     NUM [nvarchar](255) NULL,
     -- And other fields ---
    )

    -- sample data
    Insert Table2 values ('00000000000011111')
    Insert Table2 values ('00000000000011113')
    Insert Table2 values ('00000000000011114')
    Insert Table2 values ('121213FM')
    Insert Table2 values ('ABC000002821056')

    ----------------------------------------------

    -- QUERY THAT IS TAKING 40 seconds to return data.
    -- I need to find all data from table1 that does not have a matching [NUM] from table2
    -- Rows Returned: 290,000 (which is almost all of the data from table1)

    SELECT   T1.Amt, 
        T1.Code,
        T1.Description,
        T1.Num
      FROM Table1 T1 (NOLOCK)
      LEFT JOIN (SELECT DISTINCT NUM FROM TABLE2 (NOLOCK)) T2
       ON T1.Num = T2.NUM
      WHERE T2.Num IS NULL


    ---------------------------------------------- 

    Tuesday, December 17, 2013 3:12 PM

Answers

  • I suggest you start over - completely.  First, let's look at the schema you currently have.  Your tables have no primary key, no constraints, and all columns appear to be nullable.  I use "appear to be" since you did not post complete DDL, but your pattern suggests that this is accurate.  You have also fallen prey to the myth that a table needs an identity column - though oddly only one table has been victimized.  The column "Amt" is defined as float when the name suggests that it should be a fixed decimal.  And really, why "Amt" and not "Amount"?  Columns "Num" and "Code" are defined as nvarchar(255) though it seems unlikely that number of characters will ever be stored in these columns.  In general, a poorly defined schema will only be a constant source of problems.

    Next, you indicate that the resultset consists of 290,000 rows.  That is a large number; such a large resultset will require time to transmit from server to client as well as time for the client to consume all those rows and "do something" with them.  Since we don't know exactly what your client application is doing, I'll ignore that last bit.   The better approach would be to avoid generating and consuming such a large resultset in the first place - especially if this is intended to human consumption.  One person cannot really manage a resultset that large; a better (and usual) interface is to page large resultsets and let the consumer (i.e., the human in this example) view/navigate through each page when needed.  In short, 40 seconds to consume 300k rows doesn't seem so bad.

    Next, your query has no order by clause.  If the order of rows is important, it must have an order by clause.  Do not assume otherwise, even if your current testing seems to indicate an order.  Your query also uses the nolock hint.  Why?  Do you understand the implications? If not, don't use it.

    Next, you said that you had tried adding indexes.  So what exactly did you try?  Given that you currently have a heap, you might benefit from past discussions regarding heaps and their issues as well as discussions about the choice of a clustered index.  I cannot say whether a heap is a good choice for these tables, but it seem unlikely.   You could try using the Database Engine Tuning Advisor (DTA) as a first step.  Since you join on NUM, indexing on that column seems a good first step.  Keep in mind that nothing is free.  Your choices for this query may make others less efficient - in particular, those that populate or manipulate these tables.

    Lastly, you should always indicate which version of sql server that you are using since that fact frequently impacts the issue. 

    Tuesday, December 17, 2013 5:04 PM

All replies

  • Try -

     -- I need to find all data from table1 that does not have a matching [NUM] from table2
     
     SELECT  a.Num ,
             a.Code ,
             a.[Amt] ,
    	 a.[Description]
     FROM dbo.Table1 AS a
     WHERE NOT EXISTS (SELECT 1
    		   FROM Table2 AS b
    		   WHERE a.Num = b.Num
    		   )


    Narsimha

    Tuesday, December 17, 2013 3:27 PM
  • Narasimha, your query is still taking 37 seconds. I  think i need the right indexes to be created.
    Tuesday, December 17, 2013 3:48 PM
  • I suggest you start over - completely.  First, let's look at the schema you currently have.  Your tables have no primary key, no constraints, and all columns appear to be nullable.  I use "appear to be" since you did not post complete DDL, but your pattern suggests that this is accurate.  You have also fallen prey to the myth that a table needs an identity column - though oddly only one table has been victimized.  The column "Amt" is defined as float when the name suggests that it should be a fixed decimal.  And really, why "Amt" and not "Amount"?  Columns "Num" and "Code" are defined as nvarchar(255) though it seems unlikely that number of characters will ever be stored in these columns.  In general, a poorly defined schema will only be a constant source of problems.

    Next, you indicate that the resultset consists of 290,000 rows.  That is a large number; such a large resultset will require time to transmit from server to client as well as time for the client to consume all those rows and "do something" with them.  Since we don't know exactly what your client application is doing, I'll ignore that last bit.   The better approach would be to avoid generating and consuming such a large resultset in the first place - especially if this is intended to human consumption.  One person cannot really manage a resultset that large; a better (and usual) interface is to page large resultsets and let the consumer (i.e., the human in this example) view/navigate through each page when needed.  In short, 40 seconds to consume 300k rows doesn't seem so bad.

    Next, your query has no order by clause.  If the order of rows is important, it must have an order by clause.  Do not assume otherwise, even if your current testing seems to indicate an order.  Your query also uses the nolock hint.  Why?  Do you understand the implications? If not, don't use it.

    Next, you said that you had tried adding indexes.  So what exactly did you try?  Given that you currently have a heap, you might benefit from past discussions regarding heaps and their issues as well as discussions about the choice of a clustered index.  I cannot say whether a heap is a good choice for these tables, but it seem unlikely.   You could try using the Database Engine Tuning Advisor (DTA) as a first step.  Since you join on NUM, indexing on that column seems a good first step.  Keep in mind that nothing is free.  Your choices for this query may make others less efficient - in particular, those that populate or manipulate these tables.

    Lastly, you should always indicate which version of sql server that you are using since that fact frequently impacts the issue. 

    Tuesday, December 17, 2013 5:04 PM
  • Without any indexes, the optimizer has very few options, and will settle on a Hash Join ten times out of ten. For heaps of this size, the 40 second range do not sound unreasonable.

    If you have only experimented with nonclustered indexes, then you probably won't see any improvement, because all of these you probably created noncovering. The fact that the end result has a large number of rows basically rules out any efficient use noncovering indexes for a query that basically has to do table scans (as is the case here).

    Scott has many, many, MANY good comments on your data structure. You probably should get help from someone who knows more about databases.


    Gert-Jan

    Wednesday, December 18, 2013 12:09 PM
  • I created clustered index on the [Num]  field on the Main Table. I also created covering and noncovering nonclustered indexes. None of which helped, as the end result is most of the data from the table.

    A clustered index scan happens, but the duration is long due to the number of records.

    To Scott's comments on the data structure....

    I inherited this database schema/tables a few days ago when I joined this place. I cannot fix the issues with the database/table design at this point. I can add indexes to help with this report's performance at the most. But thanks for the notes.

    Wednesday, December 18, 2013 3:00 PM
  • Hi BGop,

    There are a number of common reasons for slow-running queries and updates. Here, you can see the link: Checklist for Analyzing Slow-Running Queries.

    In general, the slow running query is caused by blocking, blocking per object, locks or waits. If we know which query statement take more time, we can optimize the query statement by changing them. For the more information, you can refer to the blogs as follows:

    How to Identify Slow Running Queries with SQL Profiler
    Find the TOP n Longest Running Procedures (or Queries), Ordered by Total Impact on Server

    Regards,
    Heidi Duan


    Heidi Duan
    TechNet Community Support

    Thursday, December 26, 2013 8:34 AM