Marking duplicates in a table
-
Friday, January 18, 2013 2:55 PM
I have a table where there are multiple duplicates in one of the columns. My need is to find those duplicate values and mark all but one of them as a duplicate. This should be fairly simple, but most of the solutions I've found have to deal with deleting or just removing the dupes in a SELECT.
My output needs to resemble:
findingkey dupefinding ----- ---- 2.38116 0 2.38116 1 2.38116 1 25.45022 0 25.45022 1 25.45022 1
I've written the following query, but it obviously marks ALL the duplicate records. Have been trying to come up with a solution that doesn't mark the first occurrence as s duplicate.
;WITH cte_dupes AS ( SELECT findingkey, COUNT(*) totalCount FROM excelDataRepository WHERE dataSetID=1 GROUP BY findingKey HAVING COUNT(*) >1 ) UPDATE excelDataRepository SET dupeFinding = 1 FROM excelDataRepository as ee INNER JOIN cte_dupes as dupes ON ee.findingKey = dupes.findingKey WHERE dupes.totalCount > 1Would appreciate the help once again. Thanks.
- Edited by ms960 Friday, January 18, 2013 2:56 PM
All Replies
-
Friday, January 18, 2013 3:23 PM
;WITH cte_dupes AS ( SELECT findingkey, dupefinding, Row_Number() Over (Partition By findingkey Order By findingkey) As rn FROM excelDataRepository WHERE dataSetID=1 ) UPDATE cte_dupes SET dupeFinding = 1 WHERE rn > 1That will mark all but one of each findingkey. Note that with the above, you don't know which of the duplicates with be marked. If you have an algorithm for which of the duplicates should be the one selected to not be marked, just change the order by in the Over clause so that the one you don't want marked comes first.
As with any update, carefully test and be sure to have a good backup before running against a production database.
Tom
- Marked As Answer by ms960 Friday, January 18, 2013 3:30 PM
-
Friday, January 18, 2013 3:31 PM
I was inching closer but your solution solves my problem. Thank you very much!;WITH cte_dupes AS ( SELECT findingkey, dupefinding, Row_Number() Over (Partition By findingkey Order By findingkey) As rn FROM excelDataRepository WHERE dataSetID=1 ) UPDATE cte_dupes SET dupeFinding = 1 WHERE rn > 1That will mark all but one of each findingkey. Note that with the above, you don't know which of the duplicates with be marked. If you have an algorithm for which of the duplicates should be the one selected to not be marked, just change the order by in the Over clause so that the one you don't want marked comes first.
As with any update, carefully test and be sure to have a good backup before running against a production database.
Tom
-
Friday, January 18, 2013 3:32 PMModerator
;WITH cte_dupes AS ( SELECT findingkey, dupeFinding, ROW_NUMBER() OVER(PARTITION BY findingkey ORDER BY findingkey ) rn FROM excelDataRepository WHERE dataSetID=1 ) UPDATE cte_dupes SET dupeFinding = CASE WHEN rn=1 THEN 0 ELSE 1 END

