locked
query inside if exists() lasts longer than without exists() RRS feed

  • Question

  • for examle:

    if exists(select * from mydb.mytable where col1=x and col2>y and col3 not in ('a','b','c')) begin ... end

    take 4 seconds, while:

    select * from mydb.mytable where col1=x and col2>y and col3 not in ('a','b','c')

    take less than one second!

    additional example:

    if exists (select * from mydb.tableA,mydb.tableB where tableA.fk=tableB.pk and tableA.col1=0 and tableB.col2>0) begin ... end

    take 2 seconds, while:

    select * from mydb.tableA,mydb.tableB where tableA.fk=tableB.pk and tableA.col1=0 and tableB.col2>0

    take less than one second.

    i have sql server 2008 r2 sp3

    • Moved by Tom Phillips Monday, August 28, 2017 6:05 PM Query performance question
    Monday, August 28, 2017 6:04 PM

Answers

  • I cop-out is simply to do
    IF (SELECT COUNT(*) FROM ...) > 0
    BEGIN
    This is often scorned out as being less efficient than EXISTS, but sometimes is is the other way round.

    You could also use hints to encourage the optimizer to use the same plan when you run a plain SELECT, but that could be messy to get right.

    • Marked as answer by edbbpc Sunday, September 3, 2017 4:07 PM
    Wednesday, August 30, 2017 9:53 PM

All replies


  • > turn on statistics and run both queries an check

    > compare execution plans for both queries 

    Both are two different queries and will have different plans. 

    IF Exists looks for first TRUE value and proceeds to statement within BEING.

    Where are SELECT * will return all rows matching rows


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.


    • Edited by Sarat (SS) Monday, August 28, 2017 8:11 PM
    Monday, August 28, 2017 6:16 PM
  • i have allready do all those steps.

    yes there is differences in the query plan,

    in the first examlpe, with the exists(): compute scalar->nested loops->constant scan , clustered index scan.

    without exists: parallelism -> clustered index scan.

    in the second example, with the exists: 

      |--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [Expr1008] THEN (1) ELSE (0) END))
           |--Nested Loops(Left Semi Join, DEFINE:([Expr1008] = [PROBE VALUE]))
                |--Constant Scan
                |--Nested Loops(Inner Join, WHERE:(tableA.fk=tableB.pk))
                     |--Clustered Index Scan(OBJECT:(tableA.PK), WHERE:(tableA.col1=0))
                     |--Clustered Index Scan(OBJECT:(tableB.PK), WHERE:(tableB.col2>0))

    and without exists:

      |--Parallelism(Gather Streams)
           |--Hash Match(Inner Join, HASH:(tableB.pk)=(tableA.fk), RESIDUAL:(tableA.fk=tableB.pk))
                |--Parallelism(Distribute Streams, Broadcast Partitioning)
                |    |--Clustered Index Scan(OBJECT:(tableB.PK), WHERE:(tableB.col2>0))
                |--Filter(WHERE:(tableA.col1=0))
                    |--Clustered Index Scan(OBJECT:(tableA.PK))

    the qustion is, why is not there use of parallelism inside exists()?

    and why it used nested loops inside the exists, while without exist it used hash?

    in the second example, when specify inner hash join inside the exists, it was faster.

    thanks


    • Marked as answer by edbbpc Tuesday, August 29, 2017 2:17 PM
    • Unmarked as answer by edbbpc Tuesday, August 29, 2017 3:55 PM
    Monday, August 28, 2017 7:42 PM
  • >>the question is, why is not there use of parallelism inside exists()?

    >>and why it used nested loops inside the exists, while without exist it used hash?

    When one uses EXISTS, SQL Server doesn't return all matching rows and it looks for at least one satisfying row and makes condition true. So, that should be one reason for not picking parallelism   . Example Query with EXISTS needs only Col1,Col2,Col3 values and ignores if there are more columns in table.

    Nested loops operator > join operator , that  depends type of indexes on tables and distribution of data in those columns ....which would be difficult to tell without knowing tables.


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Monday, August 28, 2017 7:57 PM
  • if exists(select * from mydb.mytable where col1=x and col2>y and col3 not in ('a','b','c')) begin ... end

    take 4 seconds, while:

    select * from mydb.mytable where col1=x and col2>y and col3 not in ('a','b','c')

    take less than one second!

    And how many rows does the SELECT return?

    The IF EXISTS plan is probably optimized to find the first row, and it is optimistic that it will. But the SELECT returns no rows, it will traverse the data in a more inefficient way.

    Monday, August 28, 2017 9:45 PM
  • the table has more than 3,500,000 rows.

    all the columns in the where clause has indexes.

    there is 1450 matched rows.

    i worked around in this manner (less than 1 second):

    declare @var int;
    set @var=(select top(1) 1 from mydb.mytable where col1=x and col2>y and col3 not in ('a','b','c'));
    if (@var is not null) begin select 1 end

    instead of (more than 3 seconds):

    if exists(select * from mydb.mytable where col1=x and col2>y and col3 not in ('a','b','c')) begin select 1 end

    thanks

    Tuesday, August 29, 2017 2:33 PM
  • here is complete structure:

    -- first table, dbo.pays
    CREATE TABLE dbo.pays(
    paysPK int IDENTITY(1,1) PRIMARY KEY,
    payscustomerid int,
    paysdestination int,
    paystype bit);
    GO
    CREATE INDEX indpayscustomerid ON dbo.pays (payscustomerid);
    GO
    CREATE INDEX indpaysdestination ON dbo.pays (paysdestination);
    GO

    -- second table, dbo.dest
    CREATE TABLE dbo.dest(
    destPK int IDENTITY(1,1) PRIMARY KEY,
    desttype [bit]);
    GO

    -- INSERT INTO dbo.pays SELECT * FROM xyz; -- 3,500,000 rows inserted
    -- INSERT INTO dbo.dest SELECT * FROM zyx; -- 277 rows inserted

    -- first query lasts 12 seconds.
    IF EXISTS (SELECT * FROM dbo.pays,dbo.dest WHERE paysdestination=destPK AND paystype<>desttype) BEGIN
      select 1;
    END

    -- second query lasts 2 seconds, 19 rows returned:
    SELECT * FROM dbo.pays,dbo.dest WHERE paysdestination=destPK AND paystype<>desttype;

    -- third query lasts 10 seconds, 19 rows returned:

    SELECT TOP(1) * FROM dbo.pays,dbo.dest WHERE paysdestination=destPK AND paystype<>desttype;

    -- fourth query lasts 4 seconds, 19 rows returned:

    SELECT TOP(101) * FROM dbo.pays,dbo.dest WHERE paysdestination=destPK AND paystype<>desttype;

    • Edited by edbbpc Tuesday, August 29, 2017 4:13 PM
    • Proposed as answer by Mauricio Feijo Tuesday, August 29, 2017 4:41 PM
    Tuesday, August 29, 2017 3:58 PM
  • Judging from the numbers you posted, this agrees quite well with my assumptions. That is, the optimizer thinks it will get a hit quickly, but has to go through fare more rows than it expected. When you have SELECT * without TOP or EXISTS, it knows that it has to go through the full table anyway.

    Tuesday, August 29, 2017 10:14 PM
  • is there any way to tell him not to be so optimistic abaout the hit?

    the next query still last 5 seconds:

    IF EXISTS (SELECT * FROM dbo.pays INNER HASH JOIN dbo.dest ON paysdestination=destPK AND paystype<>desttype) BEGIN  select 1; END

    and what type of hint should be used in query like this, where there is no join:

    if exists(select * from mydb.mytable where col1=x and col2>y and col3 not in ('a','b','c')) begin select 1 end

    thanks

    Wednesday, August 30, 2017 2:30 PM
  • I cop-out is simply to do
    IF (SELECT COUNT(*) FROM ...) > 0
    BEGIN
    This is often scorned out as being less efficient than EXISTS, but sometimes is is the other way round.

    You could also use hints to encourage the optimizer to use the same plan when you run a plain SELECT, but that could be messy to get right.

    • Marked as answer by edbbpc Sunday, September 3, 2017 4:07 PM
    Wednesday, August 30, 2017 9:53 PM
  • the sql planner is so clever, so it treats "if (select count(*) from ..) >0" as "select top (1)"...

    only this is good:

    DECLARE @cnt int = (SELECT COUNT(*) FROM ...);

    IF (@cnt>0) BEGIN ... END

    thanks

    Sunday, September 3, 2017 4:07 PM
  • Egads! It is that smart! I have missed that. Thinking that it might have been changed in a recent version, I checked a couple of versions, but it is not exactly new. In SQL 2000 it does make a count, but this was changed already in SQL 2005.

    The life with an ever-changing optimizer is that yesterday's truths can become the lies of today. But not having noticed this for so many years is shameful for a person who plays an SQL expert on TV.

    Sunday, September 3, 2017 5:14 PM