Introduction

How to delete all rows from a table except the Top (N) rows, for example, leaving only the top (1000) rows? This article is the right answer to such questions and others like this oneAll Code samples in this article are downloadable from this link

Problem

To explore this scenario we need a test database. The following code creates a sample table and inserts just five rows into it.

CREATE TABLE ErrorLog
    (
      ErrorLogId BIGINT ,
      ErrorDate DATETIME
    );
GO
 
INSERT  dbo.ErrorLog
        ( ErrorLogId, ErrorDate )
VALUES  ( 1, '2015-01-01' ),
        ( 2, '2015-01-02' ),
        ( 3, '2015-01-03' ),
        ( 4, '2015-01-04' ),
        ( 5, '2015-01-05' )
GO

The problem is to delete all rows but do not touch the top (2) rows ( based on the ErrorDate values).  The script to solve this problem would be the following:

DELETE 
FROM dbo.ErrorLog
WHERE  
ErrorLogId NOT IN ( SELECT TOP ( 2 )
                            ErrorLogId
                    FROM    dbo.ErrorLog
                    ORDER BY ErrorDate )

Problem solved.
But is this a good - efficient - solution?
As illustrated in the next picture, this query scans the index two times . This approach is fine for smaller tables but could be a problem with huge tables  What solution could offer better performance?

Solutions

Huge Table

Deleting data from a huge table takes a long time to complete. In such a situation, we have two alternatives:

Truncate Table

In this solution, we can insert the remaining data into a temporary table, truncate the original and finally insert data back in from the temporary table. The following script illustrates the approach:

SELECT TOP 2 *
INTO #temp
FROM dbo.ErrorLog
ORDER BY ErrorDate
 
--fast delete
TRUNCATE TABLE dbo.ErrorLog
 
INSERT dbo.ErrorLog
SELECT *
FROM #temp ;

Delete in Chunks

Truncate table has some restrictions. One important ( and unfortunately quite common) limitation is that we cannot truncate a table which is referenced by a Foreign Key constraint. If we cannot use truncate table, the delete command is the alternative to use. In order to reduce the transaction size, we can use an old technique and apply the delete in segments. The following script illustrates this approach:

SELECT TOP 2
        *
INTO    #temp
FROM    dbo.ErrorLog
ORDER BY ErrorDate
 
-- delete in chunks
DECLARE @RowCount INT = 1;
  
WHILE @RowCount > 0
    BEGIN
        BEGIN TRANSACTION;
        DELETE TOP ( 25000 )
                dbo.ErrorLog
        WHERE   ErrorLogId NOT IN ( SELECT  ErrorLogId
                                    FROM    #temp );
  
        SET @RowCount = @@ROWCOUNT;
        COMMIT TRANSACTION;
    END

Small Table

Back to the problem section, we saw one solution to solve this issue for small tables. If user is using SQL Server 2012 and above , he can use OFFSET N ROWS in Order BY clause that help in achieving better performance in this problem. The following code show this solution:

;WITH cte AS
    (
    SELECT  ErrorLogId
    FROM    dbo.ErrorLog
    ORDER BY ErrorDate
    OFFSET 2 ROWS
    )
DELETE cte ;

As illustrated in the next picture, this query scans the index one time which leads to better query performance. Because of existing this new feature, we could get rid of NOT IN. So, we can directly delete the final rows.

Conclusion

This problem shows how OFFSET N ROWS as an extra feature introduced in SQL Server 2012 can impact our queries. Also, we saw three traditional solutions for solving this problem.

See Also