Question

• i have simple query with left join

select x.id , count(y.xid)
from x
left join y on y.xid = x.id
group by x.id

this qouery gives me the correct results
but the same query with more left joins return different results

select x.id , count(y.xid)
from x
left join y on y.xid = x.id
left join z on z.xid = x.id
left join t on t.xid = x.id
group by x.id

does this make any sence?

Saturday, August 31, 2013 5:15 PM

• Yes, you can get larger counts with the second query.  Consider

```declare @x table(id int);
insert @x values (1);
declare @y table(id int, xid int);
insert @y values (1, 1);
declare @z table(id int, xid int);
insert @z values (1, 1), (2, 1);

select x.id as xid, y.id as yid
from @x x
left join @y y on y.xid = x.id;

select x.id as xid, y.id as yid, z.id as zid
from @x x
left join @y y on y.xid = x.id
left join @z z on z.xid = x.id;

select x.id , count(y.xid)
from @x x
left join @y y on y.xid = x.id
group by x.id;

select x.id , count(y.xid)
from @x x
left join @y y on y.xid = x.id
left join @z z on z.xid = x.id
group by x.id;```

The first query returns only one row for x.id = 1, the second query returns 2 rows with x.id = 1.  So when you do the group by and count (queries 3 and 4) you get two different counts: a count of 1 for query 3 and a count of 2 for query 4.

Tom

Saturday, August 31, 2013 5:33 PM

All replies

• Yes, you can get larger counts with the second query.  Consider

```declare @x table(id int);
insert @x values (1);
declare @y table(id int, xid int);
insert @y values (1, 1);
declare @z table(id int, xid int);
insert @z values (1, 1), (2, 1);

select x.id as xid, y.id as yid
from @x x
left join @y y on y.xid = x.id;

select x.id as xid, y.id as yid, z.id as zid
from @x x
left join @y y on y.xid = x.id
left join @z z on z.xid = x.id;

select x.id , count(y.xid)
from @x x
left join @y y on y.xid = x.id
group by x.id;

select x.id , count(y.xid)
from @x x
left join @y y on y.xid = x.id
left join @z z on z.xid = x.id
group by x.id;```

The first query returns only one row for x.id = 1, the second query returns 2 rows with x.id = 1.  So when you do the group by and count (queries 3 and 4) you get two different counts: a count of 1 for query 3 and a count of 2 for query 4.

Tom

Saturday, August 31, 2013 5:33 PM
• Your skeleton code is a mess. You have a magical "id" that changes names from table to table. You meant:

SELECT X.something_id, COUNT(Y.something_id) AS something_cnt
FROM X
LEFT OUTER JOIN Y ON Y.something_id = X.something_id
LEFT OUTER JOIN Z ON Z.something_id = X.something_id
LEFT OUTER JOIN T ON T.something_id = X.something_id
GROUP BY X.something_id;

Here is how OUTER JOINs work in ANSI/ISI Standard SQL and beyond. Assume you are given:

Table1       Table2
a   b        a   c
======       ======
1   w        1   r
2   x        2   s
3   y        3   t
4   z

and the outer join expression:

Table1
LEFT OUTER JOIN
Table2
ON Table1.a = Table2.a      <== join condition
AND Table2.c = 't';      <== single table condition

We call Table1 the "preserved table" and Table2 the "unpreserved table" in the query. What I am going to give you is a little different, but equivalent to the ANSI/ISO standards.

1) We build the CROSS JOIN of the two tables. Scan each row in the result set.

2) If the predicate tests TRUE for that row, then you keep it. You also remove all rows derived from it from the CROSS JOIN

3) If the predicate tests FALSE or UNKNOWN for that row, then keep the columns from the preserved table, convert all the columns from the unpreserved table to NULLs and remove the duplicates.

So let us execute this by hand:

Let @ = passed the first predicate
Let * = passed the second predicate

Table1 CROSS JOIN Table2
a   b        a   c
=========================
1   w       1   r @
1   w       2   s
1   w       3   t *
2   x       1   r
2   x       2   s @
2   x       3   t *
3   y       1   r
3   y       2   s
3   y       3   t @* <== the TRUE set
4   z       1   r
4   z       2   s
4   z       3   t *

Table1 LEFT OUTER JOIN Table2
a   b        a   c
=========================
3   y     3      t      <= only TRUE row
-----------------------
1   w     NULL   NULL   Sets of duplicates
1   w     NULL   NULL
1   w     NULL   NULL
-----------------------
2   x     NULL   NULL
2   x     NULL   NULL
2   x     NULL   NULL
3   y     NULL   NULL  <== derived from the TRUE set - Remove
3   y     NULL   NULL
-----------------------
4   z     NULL   NULL
4   z     NULL   NULL
4   z     NULL   NULL

the final results:

Table1 LEFT OUTER JOIN Table2
a   b        a   c
=========================
1   w     NULL   NULL
2   x     NULL   NULL
3   y     3      t
4   z     NULL   NULL

The basic rule is that every row in the preserved table is represented in the results in at least one result row.

There are limitations and very serious problems with the extended equality version of an outer join used in some diseased mutant products. Consider the two Chris Date tables

Suppliers        SupParts
sup_nbr            sup_nbr part_nbr qty
=========        ==============
S1               S1   P1    100
S2               S1   P2    250
S3               S2   P1    100
S2   P2    250

and let's do an extended equality outer join like this:

SELECT *
FROM Supplier, SupParts
WHERE Supplier.sup_nbr *= SupParts.sup_nbr
AND qty < 200;

If I do the outer first, I get:

Suppliers LOJ SupParts
sup_nbr sup_nbr part_nbr qty
=======================
S1     S1   P1    100
S1     S1   P2    250
S2     S2   P1    100
S2     S2   P2    250
S3   NULL  NULL   NULL

Then I apply the (qty < 200) predicate and get

Suppliers LOJ SupParts
sup_nbr sup_nbr part_nbr qty
===================
S1   S1   P1    100
S2   S2   P1    100

Doing it in the opposite order

Suppliers LOJ SupParts
sup_nbr sup_nbr part_nbr qty
===================
S1   S1   P1    100
S2   S2   P1    100
S3   NULL NULL  NULL

Sybase did it one way, Oracle did it the other and Centura (nee Gupta) let you pick which one -- the worst of both non-standard worlds! In ANSI/ISO Standard SQL, you have a choice and can force the order of execution. Either do the predicates after the join ...

SELECT *
FROM Supplier
LEFT OUTER JOIN
SupParts
ON Supplier.sup_nbr = SupParts.sup_nbr
WHERE qty < 200;

... or do it in the joining:

SELECT *
FROM Supplier
LEFT OUTER JOIN
SupParts
ON Supplier.sup_nbr = SupParts.sup_nbr
AND qty < 200;

Another problem is that you cannot show the same table as preserved and unpreserved in the extended equality version, but it is easy in ANSI/ISI Standard SQL. For example to find the students who have taken Math 101 and might have taken Math 102:

SELECT C1.student, C1.math, C2.math
FROM (SELECT * FROM Courses WHERE math = 101) AS C1
LEFT OUTER JOIN
(SELECT * FROM Courses WHERE math = 102) AS C2
ON C1.student = C2.student;

--CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

Saturday, August 31, 2013 6:31 PM