Hi,
I’m trying to solve a problem of query optimization.
I had a solution on SQL Server 2000 that used to generate this query:
SELECT DISTINCT cast(cast(SOC as nvarchar(2))+cast(RES as nvarchar(1))+cast(LIV as nvarchar(8)) as sql_variant) as Val_Non_Integre, 'TIERS' as Tab_Reference,'TIE' as Cle_TabRef, 'PROFIL_LOGIST' as Tab_Centrale, 'LIV' as Cle_TabCentrale, 'J1' as Tra_Code, 'ADVCCD' as App_Nom, 'XXX' as Div_code
FROM [XXXADVCCD].[DBO].[PROFIL_LOGIST]
WHERE cast(SOC as nvarchar(2))+cast(RES as nvarchar(1))+cast(LIV as nvarchar(8)) NOT IN
(SELECT DISTINCT cast(cast(SOC as nvarchar(2))+cast(RES as nvarchar(1))+cast(TIE as nvarchar(8)) as sql_variant) as result
FROM [XXXADVREF].[DBO].[TIERS])
This worked fine before and query time was very fast (less than 20s).
Now, I’m working on SQL Server 2008 and the same query is solved very slowly (20 min.)
My first idea was that problem could come from statistics not updated, so before the execution, I introduced an sp_updatestats, but without any success. I tried also with specific UPDATE STATISTIC with fullscan on each table, but no success neither…
After looking at the execution plan it seems that the SQL Server 2008 Query optimizer is under-estimating the number of rows it has to go through in table TIERS. As a result, the query optimizer chooses to use a nested loop instead of a hash merge.
To solve this, I decided to use query hint like this:
SELECT DISTINCT cast(cast(SOC as nvarchar(2))+cast(RES as nvarchar(1))+cast(LIV as nvarchar(8)) as sql_variant) as Val_Non_Integre, 'TIERS' as Tab_Reference,'TIE' as Cle_TabRef, 'PROFIL_LOGIST' as Tab_Centrale, 'LIV' as Cle_TabCentrale, 'J1' as Tra_Code, 'ADVCCD' as App_Nom, 'XXX' as Div_code
FROM [XXXADVCCD].[DBO].[PROFIL_LOGIST]
WHERE cast(SOC as nvarchar(2))+cast(RES as nvarchar(1))+cast(LIV as nvarchar(8)) NOT IN (
SELECT DISTINCT cast(cast(SOC as nvarchar(2))+cast(RES as nvarchar(1))+cast(TIE as nvarchar(8)) as sql_variant) as result FROM [XXXADVREF].[DBO].[TIERS])
OPTION(hash join, merge join)
It works fine, and query time speeds up to 20s, but how can I manage to make the query optimizer choose the good execution plan without helping him with the option?
Hope you have enough information !