Creating a columnstore index on a table makes the table read-only, meaning that bulk loads and update DML operations against it will fail. You can still add data to a table with a columnstore index, however, by using partition management operations. This can make your ETL process run faster, because the alternative is to drop the columnstore index, perform the load or update, and then rebuild the index. Since older data is normally not updated, dropping and rebuilding throws away the work already done to index that older data and forces you to redo it. The following example shows how to add data using partition switching.
This example shows how to add data to a partitioned table with a columnstore index
using partition switching operations, in a way that can be repeated every time new
data needs to be added, assuming data is added once a day.
-- Clean up objects left behind from prior runs.
if object_id('Fact', 'U') is not null
drop table Fact;
if object_id('Staging', 'U') is not null
drop table Staging;
if exists (select * from sys.partition_schemes where name = 'ps')
drop partition scheme ps;
if exists (select * from sys.partition_functions where name = 'pf')
drop partition function pf;
-- Create a partition function with 3 partitions, and an associated partition scheme.
create partition function pf (int) as range left for values (20111012, 20111013);
create partition scheme ps as partition pf all to ([PRIMARY]);
-- Create a partitioned table.
create table Fact (date_key int not null, measure int) on ps(date_key);
-- Add some data, leaving an empty partition on each end.
insert Fact values(20111013, 10);
-- Create a partition-aligned columnstore index on the table.
create nonclustered columnstore index ncci on Fact(date_key, measure);
-- Show the data and the partition number of each row.
select $partition.pf(date_key) as partition_number, *
from Fact;
-- Designate the filegroup for the new partition, then split the uppermost partition
-- (which is empty, so this is a fast, metadata-only operation).
alter partition scheme ps next used [PRIMARY];
alter partition function pf() split range (20111014);
-- Create a staging table with the same schema as the partitioned table,
-- to hold data to be appended to it via a partition switch.
create table Staging(date_key int not null, measure int);
-- Create a constraint so that the staging table has the same bounds as the newly created partition.
alter table Staging with check add constraint bounds check (date_key > 20111013 and date_key <= 20111014);
-- Add data to the staging table.
insert Staging values (20111014, 60);
-- Create a columnstore on the staging table with the same columns in it as the
-- columnstore index on the target partitioned table.
create nonclustered columnstore index ncci on Staging(date_key, measure);
-- Switch the staging table into the proper empty partition of the target table.
-- Afterwards, an empty partition remains as the rightmost partition,
-- so it can be split later as a metadata-only operation.
declare @p int = $partition.pf(20111014);
alter table Staging switch to Fact partition @p;
-- Retrieve the rows in the partitioned table, showing the partition of each row.
select $partition.pf(date_key) as partition_number, *
from Fact;
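Once the initial setup above is in place, the split/stage/switch steps are the only part that needs to repeat for each daily load. The script below is a sketch of that repeatable portion; the boundary value 20111015, the sample row, and the reuse of the Staging table name are illustrative assumptions, not part of the original example.

```sql
-- Hedged sketch: the repeatable daily-load steps, assuming one new partition
-- per day and integer date keys of the form yyyymmdd.

-- 1. Designate the filegroup for the new partition, then split the empty
--    rightmost partition (metadata-only, since it holds no rows).
alter partition scheme ps next used [PRIMARY];
alter partition function pf() split range (20111015);

-- 2. Rebuild the staging table with a check constraint matching the bounds
--    of the newly created partition.
if object_id('Staging', 'U') is not null
    drop table Staging;
create table Staging (date_key int not null, measure int);
alter table Staging with check add constraint bounds
    check (date_key > 20111014 and date_key <= 20111015);

-- 3. Load the day's data, then build the matching columnstore index.
insert Staging values (20111015, 70);   -- illustrative row
create nonclustered columnstore index ncci on Staging (date_key, measure);

-- 4. Switch the staging table into the partition its constraint targets.
declare @p int = $partition.pf(20111015);
alter table Staging switch to Fact partition @p;
```

Each run appends a day's worth of data without touching the columnstore index on the existing partitions, which is what makes the pattern cheaper than dropping and rebuilding the index.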
Existing techniques and guidelines for using SQL Server table partitioning are documented in other places, and in general they still apply with columnstore indexes. Below are links to other partitioning resources:
Thanks for posting these, Eric. I have a 1tb table (compressed) that's already been separated into live/archive so that I could do this when 2012 came out. We'll see how it does, and if it's as fast as you guys have said for our workload. Thanks also for all the other workarounds - I've written them down since I have no doubt we'll use them. FWIW, ColumnStore and AlwaysOn are responsible for our early adoption of 2012.
How would this work in an AlwaysOn setup, and when you have two data files per partition scheme? I assume having all the partitions on one data file doesn't make much sense in reality, since a columnstore index is meant for large amounts of data.