none
adding left joins changes results

    Question

  • i have simple query with left join 

    select x.id , count(y.xid)
    from x
    left join y on y.xid = x.id
    group by x.id 

    this qouery gives me the correct results
    but the same query with more left joins return different results

    select x.id , count(y.xid)
    from x
    left join y on y.xid = x.id
    left join z on z.xid = x.id
    left join t on t.xid = x.id
    group by x.id 


    does this make any sence?
     
    Saturday, August 31, 2013 5:15 PM

Answers

  • Yes, you can get larger counts with the second query.  Consider

    declare @x table(id int);
    insert @x values (1);
    declare @y table(id int, xid int);
    insert @y values (1, 1);
    declare @z table(id int, xid int);
    insert @z values (1, 1), (2, 1);
    
    select x.id as xid, y.id as yid
    from @x x
    left join @y y on y.xid = x.id;
    
    select x.id as xid, y.id as yid, z.id as zid
    from @x x
    left join @y y on y.xid = x.id
    left join @z z on z.xid = x.id;
    
    select x.id , count(y.xid)
    from @x x
    left join @y y on y.xid = x.id
    group by x.id;
    
    select x.id , count(y.xid)
    from @x x
    left join @y y on y.xid = x.id
    left join @z z on z.xid = x.id
    group by x.id;

    The first query returns only one row for x.id = 1, the second query returns 2 rows with x.id = 1.  So when you do the group by and count (queries 3 and 4) you get two different counts: a count of 1 for query 3 and a count of 2 for query 4.

    Tom

    Saturday, August 31, 2013 5:33 PM

All replies

  • Yes, you can get larger counts with the second query.  Consider

    declare @x table(id int);
    insert @x values (1);
    declare @y table(id int, xid int);
    insert @y values (1, 1);
    declare @z table(id int, xid int);
    insert @z values (1, 1), (2, 1);
    
    select x.id as xid, y.id as yid
    from @x x
    left join @y y on y.xid = x.id;
    
    select x.id as xid, y.id as yid, z.id as zid
    from @x x
    left join @y y on y.xid = x.id
    left join @z z on z.xid = x.id;
    
    select x.id , count(y.xid)
    from @x x
    left join @y y on y.xid = x.id
    group by x.id;
    
    select x.id , count(y.xid)
    from @x x
    left join @y y on y.xid = x.id
    left join @z z on z.xid = x.id
    group by x.id;

    The first query returns only one row for x.id = 1, the second query returns 2 rows with x.id = 1.  So when you do the group by and count (queries 3 and 4) you get two different counts: a count of 1 for query 3 and a count of 2 for query 4.

    Tom

    Saturday, August 31, 2013 5:33 PM
  • Your skeleton code is a mess. You have a magical "id" that changes names from table to table. You meant:

    SELECT X.something_id, COUNT(Y.something_id) AS something_cnt
    FROM X
    LEFT OUTER JOIN Y ON Y.something_id = X.something_id
    LEFT OUTER JOIN Z ON Z.something_id = X.something_id
    LEFT OUTER JOIN T ON T.something_id = X.something_id
    GROUP BY X.something_id;

    Here is how OUTER JOINs work in ANSI/ISI Standard SQL and beyond. Assume you are given:

    Table1       Table2
     a   b        a   c
     ======       ======
     1   w        1   r
     2   x        2   s
     3   y        3   t
     4   z

    and the outer join expression: 

     Table1 
     LEFT OUTER JOIN 
     Table2
     ON Table1.a = Table2.a      <== join condition
        AND Table2.c = 't';      <== single table condition

    We call Table1 the "preserved table" and Table2 the "unpreserved table" in the query. What I am going to give you is a little different, but equivalent to the ANSI/ISO standards.

    1) We build the CROSS JOIN of the two tables. Scan each row in the result set. 

    2) If the predicate tests TRUE for that row, then you keep it. You also remove all rows derived from it from the CROSS JOIN

    3) If the predicate tests FALSE or UNKNOWN for that row, then keep the columns from the preserved table, convert all the columns from the unpreserved table to NULLs and remove the duplicates.

    So let us execute this by hand:

     Let @ = passed the first predicate
     Let * = passed the second predicate

     Table1 CROSS JOIN Table2
     a   b        a   c
     =========================
     1   w       1   r @
     1   w       2   s
     1   w       3   t *
     2   x       1   r
     2   x       2   s @
     2   x       3   t *
     3   y       1   r
     3   y       2   s
     3   y       3   t @* <== the TRUE set
     4   z       1   r
     4   z       2   s
     4   z       3   t *

     Table1 LEFT OUTER JOIN Table2
     a   b        a   c
     =========================
     3   y     3      t      <= only TRUE row
     -----------------------
     1   w     NULL   NULL   Sets of duplicates
     1   w     NULL   NULL
     1   w     NULL   NULL
     -----------------------
     2   x     NULL   NULL
     2   x     NULL   NULL
     2   x     NULL   NULL
     3   y     NULL   NULL  <== derived from the TRUE set - Remove  
     3   y     NULL   NULL
     -----------------------
     4   z     NULL   NULL
     4   z     NULL   NULL
     4   z     NULL   NULL

    the final results:

     Table1 LEFT OUTER JOIN Table2
     a   b        a   c
     =========================
     1   w     NULL   NULL
     2   x     NULL   NULL
     3   y     3      t
     4   z     NULL   NULL

    The basic rule is that every row in the preserved table is represented in the results in at least one result row. 

    There are limitations and very serious problems with the extended equality version of an outer join used in some diseased mutant products. Consider the two Chris Date tables

     Suppliers        SupParts
     sup_nbr            sup_nbr part_nbr qty
     =========        ==============
     S1               S1   P1    100
     S2               S1   P2    250
     S3               S2   P1    100
                      S2   P2    250

    and let's do an extended equality outer join like this:

     SELECT *
      FROM Supplier, SupParts
     WHERE Supplier.sup_nbr *= SupParts.sup_nbr
       AND qty < 200;

    If I do the outer first, I get:

     Suppliers LOJ SupParts
     sup_nbr sup_nbr part_nbr qty
     =======================
     S1     S1   P1    100
     S1     S1   P2    250
     S2     S2   P1    100
     S2     S2   P2    250
     S3   NULL  NULL   NULL

    Then I apply the (qty < 200) predicate and get

     Suppliers LOJ SupParts
     sup_nbr sup_nbr part_nbr qty
     ===================
     S1   S1   P1    100
     S2   S2   P1    100

    Doing it in the opposite order

     Suppliers LOJ SupParts
     sup_nbr sup_nbr part_nbr qty
     ===================
     S1   S1   P1    100
     S2   S2   P1    100
     S3   NULL NULL  NULL

    Sybase did it one way, Oracle did it the other and Centura (nee Gupta) let you pick which one -- the worst of both non-standard worlds! In ANSI/ISO Standard SQL, you have a choice and can force the order of execution. Either do the predicates after the join ...

     SELECT *
       FROM Supplier
            LEFT OUTER JOIN
            SupParts
            ON Supplier.sup_nbr = SupParts.sup_nbr
     WHERE qty < 200;

     ... or do it in the joining:

     SELECT *
      FROM Supplier
           LEFT OUTER JOIN
           SupParts
           ON Supplier.sup_nbr = SupParts.sup_nbr 
              AND qty < 200;

    Another problem is that you cannot show the same table as preserved and unpreserved in the extended equality version, but it is easy in ANSI/ISI Standard SQL. For example to find the students who have taken Math 101 and might have taken Math 102:

     SELECT C1.student, C1.math, C2.math
      FROM (SELECT * FROM Courses WHERE math = 101) AS C1
           LEFT OUTER JOIN
           (SELECT * FROM Courses WHERE math = 102) AS C2
           ON C1.student = C2.student;


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Saturday, August 31, 2013 6:31 PM