String filters on a columnstore indexed table, where you apply a filter containing string literals directly to a string column that is contained in the index, do not get pushed down into the columnstore index scan operator in the storage engine. Usually this is not a problem. The string filter operations still can work in batch mode and are typically quite fast. As an example, consider this query: select d.YearNum, d.MonthNumOfYear, m.Category, SUM(p.PriceRetail) as PriceRetail, COUNT(*) as [Count] from dbo.Purchase p, dbo.Date d, dbo.Media m where p.MediaId = m.MediaId and d.DateId = p.Date and m.Category in ('Horror', 'Action', 'Comedy') and p.PriceRetailCurrency in ( N'British Pound', N'Euro') group by d.YearNum, d.MonthNumOfYear, m.Category order by m.Category, d.YearNum, d.MonthNumOfYear The query plan for this query has a section like the following: The highlighted operator is a filter to evaluate p.PriceRetailCurrency in ( N'British Pound', N'Euro'). It runs in batch mode but is not used to eliminate data early during the scan. Nevertheless, data is eliminated early during the scan for the join condition with the Media table. Overall, this is a good plan and is not a cause for concern. The plan only takes a quarter of a second to run. But if the data volume is very large, you may notice that performance is not ideal. The cost of moving data out of the storage engine, even in batch mode, may be significant. In addition, segment elimination will not be performed for filters on strings. Segment elimination is discussed in a separate topic. Just as described above in the discussion of performance issues when joining on strings, you can work around string filter performance issues by normalizing the strings into dimension tables and replacing them with integer surrogate keys, or you can encode the information contained in the string directly into an integer.
Sometimes you need to filter on string columns of a large table that has a columnstore index, because it may not always be possible or convenient to factor the strings out into a separate dimension table. In this case, a helpful trick can be to group on the string column first, then filter after the grouping. GROUP BY on string columns is normally extremely fast. Filter operations on strings involving string functions such as LIKE and CHARINDEX can be somewhat slow, even when they run in batch mode. Provided that the string column is low cardinality (i.e. has a relatively low number of discrete values), you may benefit from grouping first and filtering second. For example, consider the following query: select PriceWholesaleCurrency, count(*) from dbo.Purchase where PriceWholesaleCurrency like 'E%' group by PriceWholesaleCurrency This query take 3 seconds to run. The following equivalent query takes only 0.08 seconds.
select T.* from (select PriceWholesaleCurrency, count(*) c from dbo.Purchase group by PriceWholesaleCurrency) as T where T.PriceWholesaleCurrency like 'E%'
The reason the second query is so much faster is that GROUP BY on strings is a lot faster than LIKE, and after the GROUP BY, only a few rows are left to which to apply the LIKE filter. The first of the two queries above has to perform the LIKE computation on every row. This rewrite follows the principal established in other examples -- reduce the number of rows as much as possible before resorting to expensive operations (such as row mode processing, or in this case, evaluation of an expensive string expression). Simple string filters that use equality checks are usually quite fast, so you don't need to worry about query rewrites like this. For example, this query runs in 0.07 seconds: select PriceWholesaleCurrency, count(*) from dbo.Purchase where PriceWholesaleCurrency = 'USD' or PriceRetailCurrency = 'EUR' group by PriceWholesaleCurrency Return to main SQL Server columnstore index tuning page.