none
Need a way to delete duplicate entries.

    Question

  • Good Morning,

    I have identified duplicates in my code table, 4,661 to be exact. I need a way to delete them. I have look at examples out there and when test them I'm not getting the results I like. Here is my select code to identify them.

    Select COUNT(*), AccountNumber, CodeType,CodeValue from DS_PROD.dbo.A01cAccountCodes

    group by AccountNumber,CodeType,CodeValue

    having COUNT(*) >1

    What is the best way to get rid of my duplicates so I can concentrate on why it is occurring?

    Thursday, October 03, 2013 12:30 PM

Answers

All replies

  • One method:

    WITH dups AS (
    	SELECT 
    		ROW_NUMBER() OVER(ORDER BY AccountNumber, CodeType,CodeValue) AS row_num
    	FROM DS_PROD.dbo.A01cAccountCodes
    	)
    DELETE FROM dups
    WHERE row_num > 1;

    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Thursday, October 03, 2013 12:38 PM
  • You ay try the below to identify the duplicates. If everything looks fine, use Delete instead of Select as below:

    ;With cte
    as
    (
    	Select AccountNumber, CodeType,CodeValue, 
    		ROW_NUMBER()over(partition by AccountNumber, CodeType,CodeValue order by AccountNumber, CodeType,CodeValue) Rn
    	from DS_PROD.dbo.A01cAccountCodes
    ) 
    --Delete	
    Select * From cte Where Rn>1
    


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Thursday, October 03, 2013 12:41 PM
  • This is great. But if I wanted to further test it with a single account number could I do the following?

    With cte as
    (
    Select AccountNumber, CodeType,CodeValue, ROW_NUMBER()over(partition by AccountNumber, CodeType,CodeValue order by AccountNumber, CodeType,CodeValue) Rn
    from DS_PROD.dbo.A01cAccountCodes

    Where accountnumber = 78500
    )
    --Delete
    Select * From cte Where Rn>1

    Thursday, October 03, 2013 12:48 PM
  • Just a small piece of advice.

    Find out how many rows are going to be deleted before deleting. You might want to run your deletes in batches if there are lots to be deleted. Running it in batches will avoid excessive log growth.

    Thursday, October 03, 2013 12:48 PM
  • This is great. But if I wanted to further test it with a single account number could I do the following?

    With cte as
    (
    Select AccountNumber, CodeType,CodeValue, ROW_NUMBER()over(partition by AccountNumber, CodeType,CodeValue order by AccountNumber, CodeType,CodeValue) Rn
    from DS_PROD.dbo.A01cAccountCodes

    Where accountnumber = 78500
    )
    --Delete
    Select * From cte Where Rn>1

    Yes, it will work.

    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Thursday, October 03, 2013 12:57 PM
  • DECLARE @A01cAccountCodes table (AccountNumber int, CodeType int,CodeValue int)
    
    INSERT INTO @A01cAccountCodes
    SELECT 1,2,3 union all
    SELECT 1,2,3 union all
    SELECT 3,2,3 union all
    SELECT 3,2,3 union all
    SELECT 4,2,3 union all
    SELECT 4,2,3 
    
    select *, ROW_NUMBER() OVER(partition by AccountNumber, CodeType,CodeValue ORDER BY AccountNumber, CodeType,CodeValue) AS row_num from @A01cAccountCodes
    
    delete x FROM 
    (select ROW_NUMBER() OVER(partition by AccountNumber, CodeType,CodeValue ORDER BY AccountNumber, CodeType,CodeValue) AS row_num FROM @A01cAccountCodes)x
    where x.row_num > 1
    
    SELECT * from @A01cAccountCodes


    Regards, Dineshkumar,
    Please 'Mark as Answer' if my post answers your question and 'Vote as Helpful' if it helps you

    Dineshkumar's BI Blog

    Thursday, October 03, 2013 1:04 PM
  • Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Learn how to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect. 

    This is minimal polite behavior on SQL forums. 

    >> I have identified duplicates in my code table, 4,661 to be exact. I need a way to delete them. <<

    Not, you really need a way to prevent the problem! First you mop the floor, then you fix the roof. We need the DDL to help you! 

    SELECT COUNT(*), account_number, code_type, code_value 
      FROM DS_PROD.DBO.A01c_Account_Codes
    GROUP BY account_number, code_type, code_value
    HAVING COUNT(*) > 1;

    >> That is the best way to get rid of my duplicates so I can concentrate on why it is occurring? <<

    Your design is fundamentally wrong!  Ther are no dup in a valid schema 

    An “account_nbr” is fine. But a “code_type” and “code_value” are absurd! An attribute is a “<something>_code” or “<something>_type” but never this hybrid crap. Likewise, “<something>_code” is a scalar value on a nominal scale. What is your zip_code_value or zip_code_type? 

    Have you read even a popular article on ISO-11179 Standards? A professional SQL person would do at least this! 

    I will guess, without any help from your posting, that you have a OTLT (“One True Lookup Table”) with all the encodings for everything in the data model in one table!! 

    Please post DDL and specs.

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Thursday, October 03, 2013 6:47 PM