Eliminate duplicate data and merge after removing duplicates

Answered Eliminate duplicate data and merge after removing duplicates

  • Wednesday, September 19, 2012 3:47 PM
     
     

    I need to emilinate duplicates for cars based on vin number, license number and year of car. I m little confused on what exactly do we need to check on either license number or vin number. Problem with data is some records have license number and no vin number and some of them have vin number and license. I need to check correct way of eliminating duplicates and then merge some thing like below

    year                  license              vin

                             dgdsg123        

    NULL                dgdsg123          dsgdsghsh

    2000                NULL                  NULL

    then i need to merge above data and get something like

    year                  license              vin

    2000                 dgdsg123         dsgdsghsh

    Thanks in advance for help

All Replies

  • Wednesday, September 19, 2012 3:50 PM
    Moderator
     
     Answered

    How about

    select max(year) as year, license, max(vin) as vi

    from DataTable WHERE License IS NOT NULL

    GROUP BY license

    HAVING COUNT(*) > 1

    We can do the same queries for Year or Win, assuming that two same non-null values in one of the column determine duplicates.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


  • Wednesday, September 19, 2012 8:37 PM
     
      Has Code

    I'm going to assume that you actually have a rego column and an ID column.  All tables should have an ID.  This code first creates a sample table to demonstrate the query after it.  You only need the query portion.

    -- create sample table and populate
    create table car (id int identity, rego varchar(8), [year] int, license varchar(20), vin varchar(50))
    insert car select 'ABC123', null, 'dgdsg123', null
    insert car select 'ABC123', null, 'dgdsg123', 'dsgdsghsh'
    insert car select 'ABC123', 2000, null, null
    insert car select 'XYZ111', 2012, 'abcd', 'xxyyxx'
    
    -- this first query chooses the first record of each
    -- group (rego) and updates it from the "max"
    -- subquery which arbitrarily keeps the best (max)
    -- data from each column of that group
    ;with C as
    (
    	select row_number() over (
    		partition by rego order by id) rn, *
    	from car
    )
    update C set
    	[year] = U.[year],
    	license = U.license,
    	vin = U.vin
    from (
    	select rego, max([year]) [year], max(license) license, max(vin) vin
    	from car
    	group by rego) U
    where U.rego = C.rego
      and c.rn = 1
    
    -- this next query removes all the duplicates
    -- other than the ones just updated above
    ;with C as
    (
    	select row_number() over (
    		partition by rego order by id) rn, *
    	from car
    )
    delete C where rn > 1
    

    If you don't have a rego column, and the license is actually the rego, then this is unwinnable because how does one know which record the "year 2000 with no license/vin" belongs to with no other correlation??

  • Wednesday, September 19, 2012 10:30 PM
     
     

    Hi Cyberkiwi

    Can you tell me what is Rego column i have only vin, license and Year in my table?

  • Wednesday, September 19, 2012 10:50 PM
     
     Answered Has Code

    Then there is almost no chance of winning this battle, if your table has ONLY these three columns and many many records.

    Consider the sample dataset

    NULL    dgdsg123     NULL   
    NULL    dgdsg123     dsgdsghsh
    2000    NULL         NULL
    2002    xya          1234567
    2002    xyc          NULL
    2002    null         goobledygook
    2001    NULL         NULL
    2003    NULL         abced
    NULL    xyx          NULL
    

    How can you possible say which license/vin the year 2001 belongs to?  Or, which of the VINs "abcde" and "goobledygook" should be paired up with license "xyx"?
    • Marked As Answer by SQL Insane Monday, September 24, 2012 4:46 AM
    •