We've already given an overview of SQL Server columnstore segment (rowgroup) elimination in the fundamentals section. To illustrate further, if you examine sys.column_store_segments, you'll see that the minimum and maximum values are stored for each column segment. The important columns in this table for the purposes of understanding segment elimination are: column_id, segment_id, min_data_id, and max_data_id.
The min_data_id and max_data_id columns identify the minimum and maximum values in segment segment_id for column column_id. We say "identify" because these min and max values may either be literal values, or they may identify those values in some other way, such as with a reference into a dictionary. Assume they actually contain the values for purposes of this discussion. Segment elimination works for number, date, and time data types. Strings aren't supported for segment elimination in SQL Server 2012 RTM.
Suppose there is a table Purchase(Date, ...) where Date is an int column containing date keys of the form YYYYMMDD. Assume the following information is maintained for the Date column:
While scanning the columnstore index for the following query, the SQL Server storage engine would recognize by looking at the information above that only segments 2 and 3 could possibly contain qualifying rows.
select Date, count(*)
where Date >= 20120201
group by Date
Segment 1 would be skipped. Its data wouldn't be read from disk or scanned.
SQL Server can even skip segments using join operations. This is especially useful when joining a fact table that has a columnstore index with a Date dimension table. For example, consider this query:
select p.Date, count(*)
from dbo.Purchase p, dbo.Date d
where p.Date = d.DateId
and d.Year = 2012
and d.Month = 2
group by p.Date
This query will also skip segment 1. In general, segments can be skipped for joins with the Date dimension for simple ranges, or multiple non-overlapping ranges. For example, segments can be skipped for a pattern that identifies Date dimension rows for Saturdays and Sundays over a one year period. Moreover, segments can be skipped based on combinations of joins on multiple columns and direct filter predicates on multiple columns.
In the past, there have been situations where partitions could have been skipped by SQL Server, but they ended up being scanned anyway. This problem was common when trying to do date range elimination via a hash join between a fact able and and a Date dimension. Now, with columnstores, each partition will be composed of one or more segments. Even if a partition is examined during query processing, the segments within it can be skipped if no data in the partition qualifies. So columnstore query processing can effectively skip partitions if they contain no data that qualifies.
Return to main SQL Server columnstore index tuning page.
Hi Eric... thanks for the details! The documentation for sys.column_store_segments says there in one row for each column, but your article here indicates there is one for each segment, and that there can be more than one segment per column. I'm interested in finding out how it is determined how many segments there will be. Thanks again! ~Kalen
Hi Kalen. The sys.column_store_segments catalog view has one row for each segment of each column. Regarding your question, each physical partition of a columnstore index is broken into one-million-row chunks called segments (a.k.a. row groups). The index build process creates as many full segments as possible. Because multiple threads work to build an index in parallel, there may be a few small segments (typically equal to the number of threads) at the end of each partition with the remainder of the data after creating full segments. That's because each thread might hit the end of its input at different times. Non-partitioned tables have one physical partition. -Eric