none
Delete duplicate rows from table RRS feed

  • Question

  • Hi,

    I have a table with more than 1000 records. The columns look like this

    Import id   |   Reference   |  Date  |  Username  | Filename

    Some of the rows are duplicates except for the fact that they have different values for the first column alone (import id)

    11111      test   21-07-2015   xxxxxx   yyyyyy

    22222      test   21-07-2015   xxxxxx   yyyyyy

    Is it possible to delete these kind of duplicate rows by ignoring the import id ? 

    Tuesday, July 21, 2015 1:48 PM

Answers

  • Hi ganeshsg,

    To delete the duplicated rows in your case, the below demonstration would work.

    USE TestDB;
    GO
    
    CREATE TABLE TBL
    ([Import id] INT,
     Reference VARCHAR(99), 
     [Date] DATE,  
     Username VARCHAR(99),
     [Filename] VARCHAR(99));
    GO
    
    INSERT INTO TBL 
    VALUES
    (11111,'test','20150721','xxxxxx','yyyyyy'),
    (22222,'test','20150721','xxxxxx','yyyyyy'),
    (33333,'test2','20150721','xxxxxx','yyyyyy');
    
    SELECT * FROM TBL -- 3 rows
    
    DELETE T FROM TBL T
    WHERE EXISTS( SELECT 1 FROM TBL WHERE Reference=t.Reference 
    								  AND [Date]=T.[Date]
    								  AND Username=T.Username
    								  AND [Filename]=t.[Filename]
    								  AND [Import id]<t.[Import id])
    SELECT * FROM TBL -- 2 rows
    
    DROP TABLE TBL

    If you have any question, feel free to let me know.

    Eric Zhang
    TechNet Community Support

    Wednesday, July 22, 2015 6:07 AM
    Moderator
  • ;with cte as (select *, row_number() over (partition by Reference, [Date], UserName, FileName order by [Import Id]) as Rn from DataFile)

    delete from cte where Rn > 1 -- this will delete all duplicates with greater import id and same other columns


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Tuesday, July 21, 2015 2:47 PM
    Moderator
  • CREATE TABLE #table (Import_id INT, Reference VARCHAR(10), Date DATE,  Username VARCHAR(20), Filename VARCHAR(20))
    INSERT INTO #table (Import_id, Reference, Date, Username, Filename) VALUES
    (11111, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (22222, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (33333, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (44444, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (55555, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (66666, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (77777, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (55555, 'test1', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (66666, 'test1', '2015-07-21', 'xxxxxx', 'yyyyyy')
    
    select * from #table
    
    ;WITH Duplicates AS 
    (
    SELECT
          Import_id,  ROW_NUMBER() OVER( PARTITION BY reference,date,username,filename
              ORDER BY  reference ) AS rownum
    		  from #table
    )
    
    DELETE Duplicates
    WHERE rownum > 1
    Hope it helps!

    Please click "Mark As Answer" if my post helped.


    Tuesday, July 21, 2015 2:56 PM

All replies

  • Read my blog for the subject

    http://dimantdatabasesolutions.blogspot.co.il/2007/02/dealing-with-duplicates.html


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Tuesday, July 21, 2015 1:50 PM
    Answerer
  • Hi,

    I read your blog. But still my case here is a bit different since I have a delta with one column. The import id column is different and ignoring it other columns are duplicates. I want to ignore this column and then find the duplicates!!!!!

    Tuesday, July 21, 2015 1:56 PM
  • Identify and Delete duplicate records from a table.

    Check the 1st method, and within the OVER() clause, use all columns except the first to ignore it.


    ~manoj | email: http://scr.im/m22g
    http://sqlwithmanoj.wordpress.com
    MCCA 2011 | My FB Page

    Tuesday, July 21, 2015 2:03 PM
  • Try an approach like this:

    CREATE TABLE #table (Import_id INT, Reference VARCHAR(10), Date DATE,  Username VARCHAR(20), Filename VARCHAR(20))
    INSERT INTO #table (Import_id, Reference, Date, Username, Filename) VALUES
    (11111, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (22222, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (33333, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (44444, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (55555, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (66666, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (77777, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (55555, 'test1', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (66666, 'test1', '2015-07-21', 'xxxxxx', 'yyyyyy')
    
    SELECT *
     INTO #temp
      FROM #table t
     WHERE Import_id = (SELECT MAX(import_ID) FROM #table WHERE Reference = t.Reference AND Date = t.Date AND Username = t.Username AND Filename = t.Filename)
    
    TRUNCATE TABLE #table
    
    INSERT INTO #table
    SELECT *
      FROM #temp
    
    DROP TABLE #temp
    
    SELECT *
      FROM #table
    
    
    DROP TABLE #table
    I'm assuming you have more dupes than clean rows, so we're staging the clean rows in a temp table, truncating the actual table, and then putting the clean rows back again.


    Don't forget to mark helpful posts, and answers. It helps others to find relevant posts to the same question.

    Tuesday, July 21, 2015 2:40 PM
  • ;with cte as (select *, row_number() over (partition by Reference, [Date], UserName, FileName order by [Import Id]) as Rn from DataFile)

    delete from cte where Rn > 1 -- this will delete all duplicates with greater import id and same other columns


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Tuesday, July 21, 2015 2:47 PM
    Moderator
  • CREATE TABLE #table (Import_id INT, Reference VARCHAR(10), Date DATE,  Username VARCHAR(20), Filename VARCHAR(20))
    INSERT INTO #table (Import_id, Reference, Date, Username, Filename) VALUES
    (11111, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (22222, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (33333, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (44444, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (55555, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (66666, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (77777, 'test', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (55555, 'test1', '2015-07-21', 'xxxxxx', 'yyyyyy'),
    (66666, 'test1', '2015-07-21', 'xxxxxx', 'yyyyyy')
    
    select * from #table
    
    ;WITH Duplicates AS 
    (
    SELECT
          Import_id,  ROW_NUMBER() OVER( PARTITION BY reference,date,username,filename
              ORDER BY  reference ) AS rownum
    		  from #table
    )
    
    DELETE Duplicates
    WHERE rownum > 1
    Hope it helps!

    Please click "Mark As Answer" if my post helped.


    Tuesday, July 21, 2015 2:56 PM
  • Hi ganeshsg,

    To delete the duplicated rows in your case, the below demonstration would work.

    USE TestDB;
    GO
    
    CREATE TABLE TBL
    ([Import id] INT,
     Reference VARCHAR(99), 
     [Date] DATE,  
     Username VARCHAR(99),
     [Filename] VARCHAR(99));
    GO
    
    INSERT INTO TBL 
    VALUES
    (11111,'test','20150721','xxxxxx','yyyyyy'),
    (22222,'test','20150721','xxxxxx','yyyyyy'),
    (33333,'test2','20150721','xxxxxx','yyyyyy');
    
    SELECT * FROM TBL -- 3 rows
    
    DELETE T FROM TBL T
    WHERE EXISTS( SELECT 1 FROM TBL WHERE Reference=t.Reference 
    								  AND [Date]=T.[Date]
    								  AND Username=T.Username
    								  AND [Filename]=t.[Filename]
    								  AND [Import id]<t.[Import id])
    SELECT * FROM TBL -- 2 rows
    
    DROP TABLE TBL

    If you have any question, feel free to let me know.

    Eric Zhang
    TechNet Community Support

    Wednesday, July 22, 2015 6:07 AM
    Moderator