SQL Server: Delete a Huge Amount of Data from a Table


How often do you find yourself needing to remove old records from a table to free storage for new data? If you have a very large table with billions of records, the right deletion strategy is essential for the workload and all other underlying operations:

  • Consider the huge amount of transaction log a DELETE operation will generate (a quick way to check log usage is shown after this list)
  • In a high availability scenario every logged operation has to be transferred to the mirror / AlwaysOn secondaries / log shipping partners
  • Each logged operation is time consuming
  • Resources stay locked until the operation is confirmed
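
A quick way to see how much transaction log a database is currently using is DBCC SQLPERF(LOGSPACE); the following one-liner is only an illustration and is not part of the test scripts later in this article:

-- returns log size and the percentage of log space used for every database on the instance
DBCC SQLPERF(LOGSPACE);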

The points above give an idea of what a correct deletion strategy has to consider. Very often the following recommendation is given in forums:

  • Set the database to the SIMPLE recovery model
  • Make a full backup
  • Delete the data
  • Rebuild the indexes
  • Return to the FULL recovery model
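
Expressed as T-SQL, that recommendation usually looks roughly like the following sketch; the database, table, filter, and backup path names are placeholders and not taken from this article:

-- the frequently recommended approach (sketch with placeholder names)
ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;
GO
BACKUP DATABASE YourDatabase TO DISK = N'S:\Backup\YourDatabase.bak';
GO
DELETE FROM dbo.YourBigTable WHERE ChangeDate < '20120101';
GO
ALTER INDEX ALL ON dbo.YourBigTable REBUILD;
GO
ALTER DATABASE YourDatabase SET RECOVERY FULL;
GO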

The above strategy may work on a single server, but it cannot be used in a high availability scenario because such a scenario requires the FULL recovery model! This article compares two different solutions and describes what happens behind the scenes.

Environment for tests

The first script creates a demo database and fills a demo table with 100,000 records. This table will be used for the investigation of all of the following scenarios:

IF db_id('DeleteRecord') IS NOT NULL
BEGIN
    ALTER DATABASE DeleteRecord SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE DeleteRecord;
END
GO

CREATE DATABASE DeleteRecord
ON PRIMARY
(
    NAME = N'DelRecord',
    FILENAME = N'S:\Backup\DelRecord.mdf',
    SIZE = 100MB,
    MAXSIZE = 1000MB,
    FILEGROWTH = 100MB
)
LOG ON
(
    NAME = N'DelRecord_log',
    FILENAME = N'S:\Backup\DelRecord.ldf',
    SIZE = 100MB,
    MAXSIZE = 1GB,
    FILEGROWTH = 100MB
);
GO

-- change the recovery model to SIMPLE
ALTER DATABASE DeleteRecord SET RECOVERY SIMPLE;
GO

-- create a table with a clustered index on an Id column and add 100,000 records
USE DeleteRecord;
GO

CREATE TABLE dbo.bigtable
(
    Id  int           NOT NULL    IDENTITY (1, 1),
    c1  char(100)     NOT NULL    DEFAULT ('only stupid stuff'),
    c2  varchar(100)  NOT NULL    DEFAULT ('more stupid stuff'),
    c3  date          NOT NULL    DEFAULT (getdate()),

    CONSTRAINT pk_bigTable PRIMARY KEY CLUSTERED (Id)
);
GO

SET NOCOUNT ON;
GO

INSERT INTO dbo.bigtable DEFAULT VALUES;
GO 100000

Setting the recovery model of the database to BULK_LOGGED before running the DELETE

This recommendation is given quite often in answer to questions about fast deletion of data. It does not help, because DELETE is a fully logged DML operation that does not support bulk logging. The following script runs a DELETE against the above table and removes all even-numbered records. The operation is run under all three recovery models (SIMPLE, FULL, BULK_LOGGED):

-- Change the recovery model for each run!
ALTER DATABASE DeleteRecord SET RECOVERY SIMPLE;
GO

BEGIN TRANSACTION;

    -- delete half of the records
    DELETE dbo.bigTable WHERE Id % 2 = 0;

    -- rebuild the index because it's fragmented
    ALTER INDEX ALL ON dbo.bigTable REBUILD;

    SELECT  database_transaction_log_bytes_used
    FROM    sys.dm_tran_database_transactions
    WHERE   database_id = db_id('DeleteRecord');

ROLLBACK TRANSACTION;

The DELETE operation is wrapped in a dedicated transaction in order to measure the amount of transaction log written within the transaction. After half of the records have been deleted the clustered index needs to be rebuilt as part of the complete delete process, because the data pages are only about 50% full once the DELETE has finished. The ROLLBACK TRANSACTION is used so the test scenario does not have to be rebuilt for every run.
For this operational process the generated amount of transaction log is as follows:

Recovery Model     Transaction Log
SIMPLE             11,065,344 Bytes
BULK_LOGGED        11,096,036 Bytes
FULL               18,086,656 Bytes

The result shows that the BULK_LOGGED recovery model does not significantly change the amount of generated transaction log. That is not really surprising, because DELETE is not a DML operation that supports bulk logging. If the recovery model is set to FULL, all operations are fully logged anyway. So the above strategy does not seem to be the best choice with regard to the amount of transaction log it produces. Some people recommend deleting the data in small portions and backing up the log in between; that scenario is not measured in this article, but a rough sketch is shown below for completeness.
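
A batched delete typically looks like the following sketch; the batch size of 10,000 rows and the backup path are assumptions, and this variant has not been measured here:

-- delete in small batches and back up the log between the batches (sketch only)
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (10000) dbo.bigtable
    WHERE  Id % 2 = 0;

    SET @rows = @@ROWCOUNT;

    -- keeps the log small when the database runs in FULL or BULK_LOGGED recovery
    BACKUP LOG DeleteRecord TO DISK = N'S:\Backup\DelRecord_log.trn';
END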

Use of TRUNCATE instead of DELETE for the deletion of records

Using TRUNCATE instead of DELETE looks inviting, because TRUNCATE does not log the deletion of each individual record. TRUNCATE is a DDL (Data Definition Language) operation, not a DML (Data Manipulation Language) operation, and only the system tables are part of the transaction. TRUNCATE removes the complete metadata structure of the underlying table together with ALL records. Therefore an intermediate step is required: all records that should be kept have to be stored in a staging table first. The next code demonstrates the process in detail:


-- Drop the staging table if it exists
IF OBJECT_ID('dbo.bigtable_intermediate', 'U') IS NOT NULL
    DROP TABLE dbo.bigtable_intermediate;
GO

BEGIN TRANSACTION;

    -- bulk logged
    SELECT  *
    INTO    dbo.bigtable_intermediate
    FROM    dbo.bigtable
    WHERE   Id % 2 = 1;

    -- minimally logged because it is a DDL operation
    TRUNCATE TABLE dbo.bigtable;

    SET IDENTITY_INSERT dbo.bigTable ON;
    INSERT INTO dbo.bigtable (Id, c1, c2, c3)
    SELECT Id, c1, c2, c3 FROM dbo.bigtable_intermediate ORDER BY Id;
    SET IDENTITY_INSERT dbo.bigtable OFF;

    SELECT  database_transaction_log_bytes_used
    FROM    sys.dm_tran_database_transactions
    WHERE   database_id = db_id('DeleteRecord');

ROLLBACK TRANSACTION;

The above code first checks whether the staging table exists. Because a bulk operation ("SELECT ... INTO") will be used, the staging table must not exist; otherwise the operation fails. After the records that should remain have been copied into the staging table, the source table is rebuilt with TRUNCATE. Once the table has been emptied, the data from the staging table is transferred back into the original table. An index rebuild is not necessary, because TRUNCATE creates a new metadata scheme for the table which contains no "old" pages. The ORDER BY in the INSERT INTO statement ensures that no index fragmentation occurs (this can be verified with the fragmentation check sketched further below). The resulting amounts of transaction log for all recovery models are as follows:
Recovery Model     Transaction Log
SIMPLE             12,006,508 Bytes
BULK_LOGGED        12,006,508 Bytes
FULL               19,091,820 Bytes

The recovery models SIMPLE and BULK_LOGGED perform "best" in contrast to the FULL recovery model. The main part of the transaction log is generated by the INSERT INTO operation, while under SIMPLE and BULK_LOGGED the first two operations ("SELECT ... INTO" and "TRUNCATE") generate less than 1% of the total amount of transaction log:
  • SELECT ... INTO consumes ~40 KBytes
  • TRUNCATE consumes ~1 KByte
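
The earlier claim that the ORDER BY during the reload avoids index fragmentation can be verified with sys.dm_db_index_physical_stats; this small helper query is not part of the measured scripts:

-- check the fragmentation of dbo.bigtable after the reload (verification helper)
SELECT  index_id, avg_fragmentation_in_percent, page_count
FROM    sys.dm_db_index_physical_stats(db_id('DeleteRecord'), OBJECT_ID('dbo.bigtable'), NULL, NULL, 'LIMITED');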

Why is the amount of transaction log so high for the INSERT INTO ... SELECT? The INSERT INTO command can only benefit from minimal (bulk) logging if the target table is exclusively locked. An exclusive lock can be forced with the TABLOCK hint, as the following scenario shows:


BEGIN TRANSACTION;

    -- bulk logged
    SELECT  *
    INTO    dbo.bigtable_intermediate
    FROM    dbo.bigtable
    WHERE   Id % 2 = 0;

    -- minimally logged because it is a DDL operation
    TRUNCATE TABLE dbo.bigtable;

    -- bulk logged because the target table is exclusively locked!
    SET IDENTITY_INSERT dbo.bigTable ON;
    INSERT INTO dbo.bigtable WITH (TABLOCK) (Id, c1, c2, c3)
    SELECT Id, c1, c2, c3 FROM dbo.bigtable_intermediate ORDER BY Id;
    SET IDENTITY_INSERT dbo.bigtable OFF;

    SELECT  database_transaction_log_bytes_used
    FROM    sys.dm_tran_database_transactions
    WHERE   database_id = db_id('DeleteRecord');

ROLLBACK TRANSACTION;

The difference in the amount of generated transaction log is tremendous:

Recovery Model     Transaction Log
SIMPLE             89,568 Bytes
BULK_LOGGED        89,568 Bytes
FULL               14,128,668 Bytes

The FULL recovery model never allows minimally logged operations; everything is fully logged. Only the SIMPLE and BULK_LOGGED recovery models make it possible to work with minimally logged operations.
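
When repeating these tests it is easy to check which recovery model is currently active; the following helper query (not part of the original scripts) reads it from sys.databases:

-- show the active recovery model of the demo database
SELECT name, recovery_model_desc
FROM   sys.databases
WHERE  name = 'DeleteRecord';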

Conclusion

With the growing amount of data, the requirements for fast workloads grow as well. Microsoft SQL Server meets these requirements with minimally logged (bulk logged) operations. Basically, a distinction has to be made between DML operations that support bulk logging and DML operations that do not.
DELETE is not a DML operation that supports bulk logging. If a table has no constraints that prevent a TRUNCATE (a helper query for this check is sketched below), the following workflow seems to be the best choice with regard to bulk logged operations:

  • Separate the data that should not be deleted by using SELECT ... INTO
  • Rebuild the table by using TRUNCATE
  • Fill the newly created table by using INSERT INTO ... SELECT with the TABLOCK hint on the target table

SELECT ... INTO and INSERT INTO both support bulk logged operations, and TRUNCATE is an extremely efficient way to get rid of data because it is only a schema-changing DDL operation!
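
Since TRUNCATE TABLE fails as soon as the table is referenced by a foreign key, it is worth checking for such constraints before choosing this workflow. The following helper query is an illustrative addition and not part of the tests above:

-- foreign keys referencing dbo.bigtable would prevent a TRUNCATE TABLE
SELECT OBJECT_NAME(parent_object_id) AS referencing_table,
       name                          AS foreign_key_name
FROM   sys.foreign_keys
WHERE  referenced_object_id = OBJECT_ID('dbo.bigtable');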


See Also

DELETE-Command: http://technet.microsoft.com/en-us/library/ms191244.aspx
TRUNCATE-Command: http://technet.microsoft.com/en-us/library/ms177570.aspx
INSERT INTO ...: http://technet.microsoft.com/en-us/library/ms174335.aspx



Note: This article refers to the following question(s) in the MSDN forums of Microsoft:
http://social.msdn.microsoft.com/Forums/sqlserver/en-US/6bc6b5c0-99da-4299-9661-7cb96683bdec/need-to-build-a-job-that-purges-old-data-and-rebuilds-indexes?forum=sqldatabaseengine
and
http://social.msdn.microsoft.com/Forums/sqlserver/en-US/76a0976d-4d27-4374-a2e9-c6ca1d83521f/removing-large-number-of-records-with-truncate?forum=sqldatabaseengine

Both threads are looking for similar solutions.