none
Sql query slowness due to rank and columns with null values:

    Question

  •     

    Sql query slowness due to rank and columns with null values:

    I have the following table in database with around 10 millions records:

    Declaration:

    create table PropertyOwners ( [Key] int not null primary key, PropertyKey int not null,     BoughtDate DateTime,     OwnerKey int null,     GroupKey int null    )    go

       

    [Key] is primary key and combination of PropertyKey, BoughtDate, OwnerKey and GroupKey is unique.

    With the following index:

    CREATE NONCLUSTERED INDEX [IX_PropertyOwners] ON [dbo].[PropertyOwners]     
    (    	
       [PropertyKey] ASC,    	
       [BoughtDate] DESC,    	
       [OwnerKey] DESC,    	  
       [GroupKey] DESC    
    )    
    go
       

    Description of the case:
    For single BoughtDate one property can belong to multiple owners or single group, for single record there can either be OwnerKey or GroupKey but not both so one of them will be null for each record. I am trying to retrieve the data from the table using following query for the OwnerKey. If there are same property rows for owners and group at the same time than the rows having OwnerKey with be preferred, that is why I am using "OwnerKey desc" in Rank function.

    declare @ownerKey int = 40000    
    select PropertyKey, BoughtDate, OwnerKey, GroupKey    
    from (	    	
       select PropertyKey, BoughtDate, OwnerKey, GroupKey,              
              RANK() over (partition by PropertyKey order by BoughtDate desc, OwnerKey desc, GroupKey desc) as [Rank]     
       from PropertyOwners    
    ) as result    
    where result.[Rank]=1 and result.[OwnerKey]=@ownerKey

    It is taking 2-3 seconds to get the records which is too slow, similar time it is taking as I try to get the records using the GroupKey. But when I tried to get the records for the PropertyKey with the same query, it is executing in 10 milliseconds. May be the slowness is due to as OwnerKey/GroupKey in the table  can be null and sql server in unable to index it. I have also tried to use the Indexed view to pre ranked them but I can't use it in my query as Rank function is not supported in indexed view.

    Please note this table is updated once a day and using Sql Server 2008 R2. Any help will be greatly appreciated.

    Thursday, August 14, 2014 7:17 AM

Answers

All replies

  • What indexes do you have? It is important to have an index on PropertyKey,BoughtDate,OwnerKey,GroupKey columns 

    Also add a WHERE condition into a  sub query is it possible? 


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, August 14, 2014 7:23 AM
    Answerer
  • @Uri Dimant

    If you check my question you will see I have a index "IX_PropertyOwners" on all the columns. I am pasting it again here

    CREATE NONCLUSTERED INDEX [IX_PropertyOwners] ON [dbo].[PropertyOwners]     
    (    	
       [PropertyKey] ASC,    	
       [BoughtDate] DESC,    	
       [OwnerKey] DESC,    	  
       [GroupKey] DESC    
    )    
    go

    For your question about moving the where clause in the sub query. I can't do that as I want to apply this check after getting all the latest property information from the table. If I will apply it in the inner query then it will be wrong as query will retieve the record for that owner even that property doesn't belong to it now.

    Thursday, August 14, 2014 7:28 AM
  • Not sure but based on the query the plan seems to be simple consisting of segment operators. The reason why with PropertyKey you are getting faster result can may be that the PropertyKey is the leading column in the index definition so Optimizer may be having the right estimates.

    Did you try comparing both the plans to see where the difference is ?


    Thanks and regards, Rishabh K

    Thursday, August 14, 2014 8:18 AM
  • @Rishabh K

    Thanks for your concern in the question. Yes I compared both of the query plans, the query when I use with PropertyKey is using "Index seek" while when I use query with OwnerKey/GroupKey then it is using "Index scan". I have tried defferent variation of index to change the "Index scan" to "Index seek" while quering with those nullable columns "OwnerKey/GroupKey" but with no success. I want to solve this problem desperately now as it has taken so much of my time.


    Thursday, August 14, 2014 8:31 AM
  • yes , the index seek is because the propertykey is the leading column in the index definition. Can you try creating another index with ownerkey as the first key but again this will not help you with the PARTITION BY Clause as it requires data in sorted manner and also  there is always an overhead of having too many indexes. 

    Thanks and regards, Rishabh K


    • Edited by Rishabh K Thursday, August 14, 2014 8:48 AM
    Thursday, August 14, 2014 8:38 AM
  • @Rishabh K

    I have tried your suggestion but it doesn't help. I think sql server has some problem indexing the query for columns with nullable values as in my case.

    Thursday, August 14, 2014 8:53 AM
  • Yep, sorry just missed it... If it is NULL you still want tot retrieve them?If no, you can create a filtered index (WHERE col IS NOT NULL) or something but add all columns with INCLUDE   and I think you also need to specify in the query where col is not null

    Can you show an execution plan of the query ?


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, August 14, 2014 9:17 AM
    Answerer
  • @Uri Dimant

    Yes I still want to retrive the latest record for the property even it is null so I can show the latest OwnerKey(s) or GroupKey for the property.

    Here is the execution plan for the query. When ever I am trying to apply [Rank]=1 with any of the key like "PropertyKey=@propertyKey" or OwnerKey=@ownerKey index become "Scan" from "Seek".

    Thursday, August 14, 2014 10:10 AM
  • Hmm, is it possibly to try insert the subquery result into a temporary table and then apply a filter?

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, August 14, 2014 10:58 AM
    Answerer
  • @Uri Dimant

    I tried as you have suggestion below but it doesn't helped, results became more slower.

    declare @result table (PropertyKey int not null, BoughtDate datetime, OwnerKey int null, GroupKey int null, [Rank] int not null)
    insert into @result(PropertyKey, BoughtDate, OwnerKey, GroupKey, [Rank])
    select PropertyKey, BoughtDate, OwnerKey, GroupKey,
            RANK() over (partition by PropertyKey order by BoughtDate desc, OwnerKey desc, GroupKey desc) as [Rank]
    from PropertyOwners
    
    declare @ownerKey int = 1
    select PropertyKey, BoughtDate, OwnerKey, GroupKey
    from @result as result
    where result.[Rank]=1 
    and result.[OwnerKey]=@ownerKey


    Thursday, August 14, 2014 11:14 AM
  • No, please use a temporary table (#t) not a table variable..

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, August 14, 2014 11:17 AM
    Answerer
  • I believe what Uri suggested was to create a Temp table not table variable and then create index after the data load on uniquekey and then apply the filter and see if the execution time is coming down or not

    Thanks and regards, Rishabh K

    Thursday, August 14, 2014 11:19 AM
  • @Rishabh K

    Do we mean creating a cached table in database with the subquery results and then apply a filter? But I don't want this because this is dynamic table and updated anytime, with the solution you are suggesting after every insert/update/delete I have to update the cached table with result in database.

    Thursday, August 14, 2014 11:30 AM
  • Honestly I don't know what is a cached table. However what I meant was Create table #Temp (cols....) and then index as you do with normal tables.

    Thanks and regards, Rishabh K

    Thursday, August 14, 2014 11:36 AM
  • No, just create a temporary table  and run the SELECT, when connection is dropped SQL Server will drop the temporary  table  as well

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, August 14, 2014 11:42 AM
    Answerer
  • Here you go

    CREATE TABLE #tmp (PropertyKey INT,

    BoughtDate INT, OwnerKey INT , GroupKey  INT,rnk INT   )

    insert into #tmp ( PropertyKey, BoughtDate, OwnerKey, GroupKey,rnk) select PropertyKey, BoughtDate, OwnerKey, GroupKey,        RANK() over (partition by PropertyKey order by BoughtDate desc, OwnerKey desc, GroupKey desc) as [Rank]    from PropertyOwners   

    SELECT * FROM #tmp WHERE........


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, August 14, 2014 11:45 AM
    Answerer
  • @Uri Dimant

     So you are talking about the memory table, sorry I missed your comment in between mine and "Rishabh K". So my next question is that do I need to create this memory table everytime I will run the query where I want to get the latest property information for the owner? If that is the case the I have tried it with your suggestion and still its taking the same amount of time.


    Thursday, August 14, 2014 12:01 PM
  • No, such temporary tables are created and stored in the tempdb system database.. Why not just trying the code I posted above and tell us whether it is improved or not..

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, August 14, 2014 12:20 PM
    Answerer
  • I have executed the following query which is taking 1 second more than the older query due to overhead of creating the memory table which is same as creating a table variable. That what you mean?

    create table #result (PropertyKey int not null, BoughtDate datetime, OwnerKey int null, GroupKey int null, [Rank] int not null)
    insert into #result(PropertyKey, BoughtDate, OwnerKey, GroupKey, [Rank])
    select PropertyKey, BoughtDate, OwnerKey, GroupKey,
            RANK() over (partition by PropertyKey order by BoughtDate desc, OwnerKey desc, GroupKey desc) as [Rank]
    from PropertyOwners
    go
    
    declare @ownerKey int = 1
    select PropertyKey, BoughtDate, OwnerKey, GroupKey
    from #result as result
    where result.[Rank]=1 
    and result.[OwnerKey]=@ownerKey
    go
    
    drop table #result
    go


    Thursday, August 14, 2014 12:26 PM
  • >>When ever I am trying to apply [Rank]=1 with any of the key like >>>"PropertyKey=@propertyKey" or OwnerKey=@ownerKey index become "Scan" >>>from "Seek".

    So having index on OwnerKey, GroupKey  does not help? But do you see in the execution plan that SQL Server uses that index?

    CREATE TABLE #tmp (PropertyKey INT,

    BoughtDate INT, OwnerKey INT , GroupKey  INT,rnk INT   )

    Create index idx ON #tmp (OwnerKey ,rnk)

    Same problem?


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, August 14, 2014 12:35 PM
    Answerer
  • @Uri Dimant

    Your suggestion will help if we create the memory table and index first like this:

    create table #result (PropertyKey int not null, BoughtDate datetime, OwnerKey int null, GroupKey int null, [Rank] int not null)
    insert into #result(PropertyKey, BoughtDate, OwnerKey, GroupKey, [Rank])
    select PropertyKey, BoughtDate, OwnerKey, GroupKey,
            RANK() over (partition by PropertyKey order by BoughtDate desc, OwnerKey desc, GroupKey desc) as [Rank]
    from PropertyOwners
    go
    
    CREATE NONCLUSTERED INDEX [IX_Result_OwnerKey_Rank]
    ON [dbo].[#result] ([OwnerKey],[Rank])
    INCLUDE ([PropertyKey],[BoughtDate],[GroupKey])
    GO
    

    Then we execute this query:

    declare @ownerKey int = 113371
    select PropertyKey, BoughtDate, OwnerKey, GroupKey
    from #result as result
    where result.[Rank]=1 
    and result.[OwnerKey]=@ownerKey
    go
    But creating a table and index every-time I execute this query is more then using it directly as sub-query. Or you are suggesting that every-time I update/delete/insert into this table I update this memory table and keep it in memory then why not create a physical/cached table in database? which will be my last option if I can't find any other solution.

    Thursday, August 14, 2014 12:51 PM
  • No, No, you do not need to update a temporary table every time.. If two users at the same time running the stored procedure (I assume  you have a stored procedure that does INSERT INTO a temporary table) each uses gets his/her copy of THAT temporary table so there is no problem..

    >>>Or you are suggesting that every-time I update/delete/insert into this table I >>update this memory table and keep it in memory then why not create a >>physical/cached table in database? 

    Again , you do not need to operate with a temporary table (DELETE/UPDATE) but just getting the result from the SELECT (former subquery) .Every time you run the script SQL Server will re-create the table (pretty cheap operation) and you just insert the data.

    Having a permanent table introduces more problem (concurrent maintainace,delete old data) all of those operations are pretty costly.. 


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, August 14, 2014 12:59 PM
    Answerer
  • @Uri Dimant

    Sorry I didn't understand what you are saying about temporary tables. Do you mean when ever user want to get the results from the query I will need to create the temporary table with index and drop it after executing the query? Can you give me the steps of operation so I unsedstand it better.

    Thursday, August 14, 2014 1:55 PM
  • You can check the "Ennor" answer here on the following link from Stackoverflow:

    http://stackoverflow.com/questions/25302024/sql-query-slowness-due-to-rank-and-columns-with-null-values

    • Marked as answer by Faheem Ramzan Monday, August 18, 2014 7:31 AM
    Monday, August 18, 2014 6:45 AM
  • create table #result (PropertyKey int not null, BoughtDate datetime, OwnerKey int null, GroupKey int null, [Rank] int not null)

    Create index idx ON #result(OwnerKey ,rnk) insert into #result(PropertyKey, BoughtDate, OwnerKey, GroupKey, [Rank]) select PropertyKey, BoughtDate, OwnerKey, GroupKey, RANK() over (partition by PropertyKey order by BoughtDate desc, OwnerKey desc, GroupKey desc) as [Rank] from PropertyOwners go declare @ownerKey int = 1 select PropertyKey, BoughtDate, OwnerKey, GroupKey from #result as result where result.[Rank]=1 and result.[OwnerKey]=@ownerKey go


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Monday, August 18, 2014 8:06 AM
    Answerer