Eliminate duplicate data and merge after removing duplicates
-
Wednesday, September 19, 2012 3:47 PM
I need to emilinate duplicates for cars based on vin number, license number and year of car. I m little confused on what exactly do we need to check on either license number or vin number. Problem with data is some records have license number and no vin number and some of them have vin number and license. I need to check correct way of eliminating duplicates and then merge some thing like below
year license vin
dgdsg123
NULL dgdsg123 dsgdsghsh
2000 NULL NULL
then i need to merge above data and get something like
year license vin
2000 dgdsg123 dsgdsghsh
Thanks in advance for help
All Replies
-
Wednesday, September 19, 2012 3:50 PMModerator
How about
select max(year) as year, license, max(vin) as vi
from DataTable WHERE License IS NOT NULL
GROUP BY license
HAVING COUNT(*) > 1
We can do the same queries for Year or Win, assuming that two same non-null values in one of the column determine duplicates.
For every expert, there is an equal and opposite expert. - Becker's Law
My blog- Edited by Naomi NMicrosoft Community Contributor, Moderator Wednesday, September 19, 2012 11:00 PM
- Marked As Answer by SQL Insane Monday, September 24, 2012 4:46 AM
-
Wednesday, September 19, 2012 8:37 PM
I'm going to assume that you actually have a rego column and an ID column. All tables should have an ID. This code first creates a sample table to demonstrate the query after it. You only need the query portion.
-- create sample table and populate create table car (id int identity, rego varchar(8), [year] int, license varchar(20), vin varchar(50)) insert car select 'ABC123', null, 'dgdsg123', null insert car select 'ABC123', null, 'dgdsg123', 'dsgdsghsh' insert car select 'ABC123', 2000, null, null insert car select 'XYZ111', 2012, 'abcd', 'xxyyxx' -- this first query chooses the first record of each -- group (rego) and updates it from the "max" -- subquery which arbitrarily keeps the best (max) -- data from each column of that group ;with C as ( select row_number() over ( partition by rego order by id) rn, * from car ) update C set [year] = U.[year], license = U.license, vin = U.vin from ( select rego, max([year]) [year], max(license) license, max(vin) vin from car group by rego) U where U.rego = C.rego and c.rn = 1 -- this next query removes all the duplicates -- other than the ones just updated above ;with C as ( select row_number() over ( partition by rego order by id) rn, * from car ) delete C where rn > 1
If you don't have a rego column, and the license is actually the rego, then this is unwinnable because how does one know which record the "year 2000 with no license/vin" belongs to with no other correlation??
-
Wednesday, September 19, 2012 10:30 PM
Hi Cyberkiwi
Can you tell me what is Rego column i have only vin, license and Year in my table?
-
Wednesday, September 19, 2012 10:50 PM
Then there is almost no chance of winning this battle, if your table has ONLY these three columns and many many records.
Consider the sample dataset
NULL dgdsg123 NULL NULL dgdsg123 dsgdsghsh 2000 NULL NULL 2002 xya 1234567 2002 xyc NULL 2002 null goobledygook 2001 NULL NULL 2003 NULL abced NULL xyx NULL
How can you possible say which license/vin the year 2001 belongs to? Or, which of the VINs "abcde" and "goobledygook" should be paired up with license "xyx"?- Marked As Answer by SQL Insane Monday, September 24, 2012 4:46 AM

