Workarounds for Predicates that Don't Get Pushed Down to Columnstore Scan (Including OR)

Workarounds for Predicates that Don't Get Pushed Down to Columnstore Scan (Including OR)

Sometimes predicates (filters) don't get pushed down to columnstore scans. Common reasons for this include use of predicates on string columns, and use of OR across different columns. A separate topic here discusses how to work around the inability to push down string filters.

If you use OR on a single number or date column (like 't.col = 1 or t.col = 3'), the predicate can be pushed down to the storage engine so there is no problem. If you use OR across multiple columns, it can cause filters to not be pushed down. Usually this is not a significant problem because the filter operator typically will still run in batch mode and be fast. But a lot of data may stream out of the columnstore scan operator and the performance may be less than optimal. In particular, unless a filter is pushed down into the scan, segment elimination won't be performed.

If you absolutely must have the best performance, you can work around the inability to push down OR in some cases using the following logical relationship:

A or B = (A and !B) or (!A and B) or (A and B)


Each clause on the right side of the above equation is mutually exclusive. No row can match more than one of or'ed clauses at a time.

Consider the following query:

select p.Date, sum(p.PriceRetail) as PriceRetail
from dbo.Purchase p
where p.Date = 20071114 or PriceId = 6
group by p.Date

The predicate in the WHERE clause is not pushed down. The query takes 236 ms to run on a 4 core laptop with 101 million rows in Purchase. Using the logical relationship above, you can rewrite the query as follows:

select p.Date, sum(p.PriceRetail)
from
(
select p.Date, sum(p.PriceRetail) as PriceRetail
from dbo.Purchase p with (index=ncci)
where p.Date = 20071114 and PriceId <> 6
group by p.Date
UNION ALL
select p.Date, sum(p.PriceRetail) as PriceRetail
from dbo.Purchase p with (index=ncci)
where p.Date <> 20071114 and PriceId = 6
group by p.Date
UNION ALL
select p.Date, sum(p.PriceRetail) as PriceRetail
from dbo.Purchase p with (index=ncci)
where p.Date = 20071114 and PriceId = 6
group by p.Date
) as p
group by p.Date

This query is correct because the WHERE clauses of each of the three subqueries combined with UNION ALL are mutually exclusive, so each row in dbo.Purchase that qualifies will be counted once and only once. This query gets the same result as the previous one, and runs in only 40 ms. All the WHERE clause filters are pushed down into columnstore scans. This reduces the number of batches that flow out of the storage engine, and eliminates segments effectively. We used the hint index=ncci to force use of the columnstore indexes, because the optimizer wanted to seek the clustered B-tree indexes on Date instead in some cases. We wanted to demonstrate a pure columnstore plan.

Return to main SQL Server columnstore index tuning page.
Sort by: Published Date | Most Recent | Most Useful
Comments
  • To me, the relationship A or B = (A and !B) or (!A and B) or (A and B), though correct, seems overly complicated for this situation. Why not simplify this to A or B = A or (!A and B)?

    In that case, the query would become as follows:

    select p.Date, sum(p.PriceRetail)

    from

    (

    select p.Date, sum(p.PriceRetail) as PriceRetail

    from dbo.Purchase p with (index=ncci)

    where p.Date = 20071114

    group by p.Date

    UNION ALL

    select p.Date, sum(p.PriceRetail) as PriceRetail

    from dbo.Purchase p with (index=ncci)

    where p.Date <> 20071114 and PriceId = 6

    group by p.Date

    ) as p

    group by p.Date;

    I don't have these tables so I can't test it, but I would expect this to be somewhat faster than the query above.

Page 1 of 1 (1 items)