Need to build a job that purges old data and rebuilds indexes

    Question

  • I need to put together a job that purges old data as well as rebuilds indexes. Should the two run sequentially in one job, or separately, with the index rebuild done nightly?

    I'd like to set up a config table that holds the cutoff date and use it in the purge filter to delete records older than that date.

    The only catch is that the initial run would hit hundreds of thousands of records in tables that have never been purged, which means taking frequent transaction log backups so the log doesn't fill up.
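
    Roughly what I have in mind for the purge side, just as a sketch (the table, column, and path names here are all invented):

    -- config table holding the purge cutoff date per table
    CREATE TABLE dbo.PurgeConfig
    (
    	TableName	sysname	NOT NULL	PRIMARY KEY,
    	CutoffDate	date	NOT NULL
    );
    
    -- purge in batches so each delete stays small and the log space
    -- can be reused after the frequent log backups
    DECLARE @cutoff date;
    
    SELECT	@cutoff = CutoffDate
    FROM	dbo.PurgeConfig
    WHERE	TableName = N'dbo.SomeBigTable';
    
    WHILE 1 = 1
    BEGIN
    	DELETE TOP (10000)
    	FROM	dbo.SomeBigTable
    	WHERE	CreatedDate < @cutoff;
    
    	IF @@ROWCOUNT = 0
    		BREAK;
    
    	-- with the database in FULL recovery, a log backup between batches
    	-- keeps the transaction log from filling up, e.g.:
    	-- BACKUP LOG MyDb TO DISK = N'S:\Backup\MyDb.trn';
    END;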

    Any suggestions?

    Monday, October 28, 2013 6:36 PM

Answers

  • Jeremy,

    Setting the database to BULK_LOGGED will not help here: a DELETE is not a minimally logged operation, so the bulk-logged recovery model gains you nothing for the purge. I've run a few tests with the following setup:

    -- Create a demo db
    IF db_id('DeleteRecord') IS NOT NULL
    BEGIN
    	ALTER DATABASE DeleteRecord SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    	DROP DATABASE DeleteRecord;
    END
    
    CREATE DATABASE DeleteRecord
    ON PRIMARY 
    ( 
    	NAME = N'DelRecord', 
    	FILENAME = N'S:\Backup\DelRecord.mdf',
    	SIZE = 100MB,
    	MAXSIZE = 1000MB,
    	FILEGROWTH = 100MB
    )
    LOG ON 
    (
    	NAME = N'DelRecord_log',
    	FILENAME = N'S:\Backup\DelRecord.ldf',
    	SIZE = 100MB,
    	MAXSIZE = 1GB,
    	FILEGROWTH = 100MB
    );
    GO
    
    -- change the recovery model to SIMPLE
    ALTER DATABASE DeleteRecord SET RECOVERY SIMPLE;
    GO
    
    -- Now create a table with a clustered index on Id and add 100,000 records
    USE DeleteRecord;
    GO
    
    CREATE TABLE dbo.bigtable
    (
    	Id	int				NOT NULL	IDENTITY (1, 1),
    	c1	char(100)		NOT NULL	DEFAULT ('only stupid stuff'),
    	c2	varchar(100)	NOT NULL	DEFAULT ('more stupid stuff'),
    	c3	date			NOT NULL	DEFAULT (getdate()),
    
    	CONSTRAINT pk_bigTable PRIMARY KEY CLUSTERED (Id)
    );
    GO
    
    SET NOCOUNT ON
    GO
    
    INSERT INTO dbo.bigtable DEFAULT VALUES
    GO 100000

    The database above ends up with a table [dbo].[bigtable] containing 100,000 records, half of which should be deleted. I've chosen two different scenarios:

    Scenario 1:

    Deleting half of the records with a DELETE operation

    BEGIN TRANSACTION
    	-- delete half of the records
    	DELETE	dbo.bigTable WHERE Id % 2 = 0;
    
    	-- rebuild the index because it is fragmented
    	ALTER INDEX ALL ON dbo.bigTable REBUILD;
    
    	SELECT	database_transaction_log_bytes_used
    	FROM	sys.dm_tran_database_transactions
    	WHERE	database_id = db_id('DeleteRecord');
    ROLLBACK TRANSACTION

    The script above deletes all even-numbered records and then rebuilds the clustered index, which is left fragmented (low page density) by the delete and has to be covered by the measurement as well. Finally it reads how many bytes of transaction log the transaction has used.

    Scenario 2:

    TRUNCATE table after half of the records have been moved to a staging table

    IF OBJECT_ID('dbo.bigtable_intermediate', 'U') IS NOT NULL
    	DROP TABLE dbo.bigtable_intermediate;
    GO
    
    -- Different scenario
    BEGIN TRANSACTION
    	-- SELECT INTO is a bulk-logged operation
    	SELECT	*
    	INTO	dbo.bigtable_intermediate
    	FROM	dbo.bigtable
    	WHERE	Id % 2 = 0;
    	
    	-- minimally logged because TRUNCATE is a DDL operation
    	TRUNCATE TABLE dbo.bigtable;
    	
    	SET IDENTITY_INSERT dbo.bigTable ON;
    	INSERT INTO dbo.bigtable (Id, c1, c2, c3)
    	SELECT Id, c1, c2, c3 FROM dbo.bigtable_intermediate
    	ORDER BY Id;
    	SET IDENTITY_INSERT dbo.bigtable OFF;
    	
    	SELECT	database_transaction_log_bytes_used
    	FROM	sys.dm_tran_database_transactions
    	WHERE	database_id = db_id('DeleteRecord');
    ROLLBACK TRANSACTION

    The most interesting part is the amount of transaction log used by each operation type under each recovery model:

    DELETE operation
    SIMPLE:       11,064,328 Bytes
    FULL:         18,086,744 Bytes
    BULK_LOGGED:  11,064,304 Bytes
    
    TRUNCATE operation
    SIMPLE:       12,006,508 Bytes
    FULL:         19,091,820 Bytes
    BULK_LOGGED:  12,006,508 Bytes

    As you can see, both scenarios produce roughly the same amount of logged work. The reason is quite simple: the pages already exist before the load operation. When a bulk load causes a new page to be allocated, all of the rows sequentially filling that new page are minimally logged. Rows inserted into pages that were allocated before the bulk load occurred are still fully logged, as are rows that are moved as a result of page splits during the load. This means that for some tables you may still get some fully logged inserts!

    You can speed up the bulk loads by using trace flag 610 (TF 610). But there is a catch:

    When you commit a bulk load transaction that was minimally logged, all of the loaded pages must be flushed to disk before the commit completes. Any flushed pages not caught by an earlier checkpoint operation can create a great deal of random I/O. Contrast this with a fully logged operation, which creates sequential I/O on the log writes instead and does not require loaded pages to be flushed to disk at commit time.
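
    Just as a sketch (this is not one of the measured variants above), the reload from scenario 2 could be combined with TF 610 and a TABLOCK hint on the target table, which minimal logging of an INSERT generally wants:

    -- sketch only: enable TF 610 for the instance and hint the reload
    DBCC TRACEON (610, -1);
    
    SET IDENTITY_INSERT dbo.bigtable ON;
    
    INSERT INTO dbo.bigtable WITH (TABLOCK) (Id, c1, c2, c3)	-- TABLOCK helps qualify for minimal logging
    SELECT	Id, c1, c2, c3
    FROM	dbo.bigtable_intermediate
    ORDER BY Id;
    
    SET IDENTITY_INSERT dbo.bigtable OFF;
    
    DBCC TRACEOFF (610, -1);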

    These are the measurements for the same workloads with trace flag 610 enabled:

    DELETE operation + TF 610
    SIMPLE:       11,064,256 Bytes
    FULL:         18,086,696 Bytes
    BULK_LOGGED:  11,064,296 Bytes
    
    TRUNCATE operation + TF 610
    SIMPLE:          732,392 Bytes
    FULL:         19,323,216 Bytes
    BULK_LOGGED:     732,392 Bytes

    The DELETE scenario is unaffected because it contains no bulk operation, but the gain in the TRUNCATE scenario is enormous, for three reasons:

    - SELECT INTO is a bulk-logged operation

    - TRUNCATE is minimally logged because it is a DDL operation

    - the INSERT that reloads the staged rows can be minimally logged as well (here thanks to TF 610)

    To dive deeper into the performance gains from minimally logged operations, I recommend this great article:

    http://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)


    Tuesday, October 29, 2013 6:46 AM
  • Hello Shiftbit,

    yes, that's correct. The reason is that TRUNCATE is DDL while DELETE is DML, and the DDL changes the metadata of the underlying object (roughly: it recreates the table with all of its constraints).

    Just as dropping a table fails when foreign keys or other constraints on other objects refer to it, TRUNCATE fails in exactly the same situation (think of it as dropping the table and recreating it from its schema definition).

    So the TRUNCATE solution only works for "stand-alone" tables that nothing else references. :)
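
    As a rough sketch with made-up names (dbo.OrderDetail referencing dbo.Orders): the referencing foreign key has to be dropped before the TRUNCATE (disabling it is not enough) and recreated after the reload:

    -- sketch: drop the referencing FK, truncate and reload, then recreate the FK
    ALTER TABLE dbo.OrderDetail DROP CONSTRAINT fk_OrderDetail_Orders;
    
    TRUNCATE TABLE dbo.Orders;
    
    -- reload dbo.Orders from the staging table here
    
    ALTER TABLE dbo.OrderDetail WITH CHECK
    	ADD CONSTRAINT fk_OrderDetail_Orders
    	FOREIGN KEY (OrderId) REFERENCES dbo.Orders (Id);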


    MCM - SQL Server 2008
    MCSE - SQL Server 2012
    db Berater GmbH
    SQL Server Blog (german only)

    • Marked as answer by shiftbit Tuesday, October 29, 2013 1:38 PM
    Tuesday, October 29, 2013 1:10 PM

All replies

  • Hi shiftbit,

    I would handle them separately, as two jobs. I would start with the purge, using the following actions.

    Purge
    During off hours or off-peak hours.

    1.  Perform a full backup of the database.
    2.  Set the database to BULK_LOGGED recovery.
    3.  Run the purge scripts.
    4.  Verify that the correct data was purged.
    5.  Change the database back to FULL recovery mode (see the sketch after this list).
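
    Something along these lines, just a sketch (the database name and backup paths are placeholders):

    -- 1. full backup before the purge
    BACKUP DATABASE YourDb TO DISK = N'D:\Backup\YourDb_prepurge.bak';
    
    -- 2. switch to bulk-logged for the purge window
    ALTER DATABASE YourDb SET RECOVERY BULK_LOGGED;
    
    -- 3. run the purge scripts here
    -- 4. verify the purged row counts
    
    -- 5. switch back and take a log backup, since point-in-time restore is not
    --    available within a log backup that contains minimally logged operations
    ALTER DATABASE YourDb SET RECOVERY FULL;
    BACKUP LOG YourDb TO DISK = N'D:\Backup\YourDb_postpurge.trn';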

    Index maintenance
    1.  After your initial purge, run a rebuild or reorganize based on the percentage of fragmentation (see the sketch below this list).
    2.  It is good to do scheduled maintenance on the indexes. Rather than using a maintenance plan, try Ola Hallengren's script here: http://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html. You probably don't want to run it nightly; it also depends on how frequently your data changes.
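
    If you want to see the bare-bones idea behind the fragmentation check, something like this sketch works (the 5%/30% thresholds are the usual rules of thumb, not hard limits):

    -- sketch: list fragmented indexes with a suggested action for each
    SELECT	OBJECT_NAME(ips.object_id) AS table_name,
    	i.name AS index_name,
    	ips.avg_fragmentation_in_percent,
    	CASE WHEN ips.avg_fragmentation_in_percent > 30
    		THEN 'REBUILD' ELSE 'REORGANIZE' END AS suggested_action
    FROM	sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN	sys.indexes AS i
    	ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE	ips.avg_fragmentation_in_percent > 5
    	AND i.index_id > 0;	-- skip heaps
    
    -- then run, per index, either of:
    -- ALTER INDEX ix_SomeIndex ON dbo.SomeTable REORGANIZE;
    -- ALTER INDEX ix_SomeIndex ON dbo.SomeTable REBUILD;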

    Good luck!

    Monday, October 28, 2013 8:33 PM
  • Excellent info Uwe, thank you! I have come across a few other discussions that use a truncate/reload strategy for purging old data, as opposed to using DELETE. The trick is that for tables that have constraints or foreign keys attached to them, you have to drop or disable the constraints first and then re-enable them afterwards, it would seem. Correct?
    • Edited by shiftbit Tuesday, October 29, 2013 1:05 PM
    Tuesday, October 29, 2013 1:05 PM
  • Jeremy,

    Setting the database to BULK_LOGGED will not help here: a DELETE is not a minimally logged operation.

    Thanks for correcting me on that. It makes sense that BULK_LOGGED only helps bulk operations, not deletes.


    Tuesday, October 29, 2013 2:59 PM