select
top
1000 *
into
MediaStudyGroup
from
dbo.Media;
Now, consider the following two equivalent queries:
p.
Date
,
count
(*)
dbo.Purchase p
where
p.MediaId
in
(
MediaId
dbo.MediaStudyGroup)
group
by
order
;
exists (
m.MediaId
dbo.MediaStudyGroup m
m.MediaId = p.MediaId)
Both of these queries take about five seconds to run on a 4 core processor. The query plans for them scan the columnstore index on dbo.Purchase but they don't use batch processing. It turns out that you can rewrite the above two queries using a regular (inner) join as follows:
dbo.Purchase p, dbo.MediaStudyGroup m
p.MediaId = m.MediaId
Most of the work of this query runs in batch mode, and it takes only a tenth of a second to run.
The plan for the first two queries looks like the following (click to expand):
Notice that the hash join operator is not a batch mode operator. In fact, none of the operators run in batch mode in this plan.
The plan for the third query (with the inner join replacing IN and EXISTS), is as follows:
Notice the Batch Hash Table Build operator. This is an indication that hash join and aggregation is running fully in batch mode. You can also hover the mouse over the operators in Management Studio's showplan view, and the tooltip will show that the estimated execution mode is Batch.
The key idea here is to use regular (inner) join instead of IN and EXISTS in your queries when possible to get the best performance with columnstores.
Return to main columnstore performance tuning page.