none
Custom, partial distinct

    Question

  • Hello,

    I have following table which has repeated codes in the Code column for each names.

    How can I eliminate the rows just consequently duplicated codes for each names?

    CREATE TABLE #T
    (
    Name nvarchar(50)
    ,ID int
    ,Dt datetime
    ,Code  nvarchar(50)
    )

    INSERT INTO #T
    SELECT 'NameA',1, '1.1.2013 00:00','Code1'
    UNION SELECT   'NameA',2, '1.1.2013 01:00','Code1'
    UNION SELECT   'NameA',3, '1.1.2013 02:00','Code2'
    UNION SELECT   'NameA',4, '1.1.2013 03:00','Code2'
    UNION SELECT   'NameA',5, '1.1.2013 04:00','Code1'
    UNION SELECT   'NameA',6, '1.1.2013 05:00','Code1'
    UNION SELECT   'NameB',7, '1.1.2013 00:00','Code1'
    UNION SELECT   'NameB',8, '1.1.2013 01:00','Code1'
    UNION SELECT   'NameB',9, '1.1.2013 02:00','Code2'
    UNION SELECT   'NameB',10, '1.1.2013 03:00','Code2'
    UNION SELECT   'NameB',11, '1.1.2013 04:00','Code1'
    UNION SELECT   'NameB',12, '1.1.2013 05:00','Code1'
    select Name,dt,ID,Code from #T  group by Name,dt,ID,code
    DROP TABLE #T

     I need to get following output that eliminates the rows with IDs: 2-4-6 (for name A)  8-10-12 (for name B)

    NameA---- 01.01.2013 00:00:00---- 1---- Code1



    NameA---- 01.01.2013 02:00:00---- 3---- Code2




    NameA---- 01.01.2013 04:00:00---- 5---- Code1




    NameB---- 01.01.2013 00:00:00---- 7---- Code1




    NameB---- 01.01.2013 02:00:00---- 9---- Code2




    NameB---- 01.01.2013 04:00:00---- 11---- Code1




    thank you...



    • Edited by mech3 Saturday, December 07, 2013 10:17 PM layout
    Saturday, December 07, 2013 10:15 PM

Answers

  • SQL 2012:

    WITH CTE AS (
         SELECT Name, ID, dt, code,
                        prevcode = LAG(code) OVER (PARTITION BY Name ORDER BY ID)
         FROM     #T
    )
    SELECT Name, ID, dt, code
    FROM     CTE
    WHERE    prevcode IS NULL OR prevcode <> code

    LAG() gives you the value for the previous row, as defined by the order clause.

    SQL 2008 and SQL 2005:

    WITH CTE AS (
         SELECT Name, ID, dt, code,
                        rownum = row_number() OVER (PARTITION BY Name ORDER BY ID)
         FROM     #T
    )
    SELECT a.Name, a.ID, a.dt, a.code
    FROM     CTE a
    LEFT     JOIN CTE b ON b.Name = a.Name
                                     AND b.rownum = a.rownum - 1
    WHERE    b.code IS NULL OR b.code <> a.code

    No LAG() here, but you can number the rows and self-join the CTE. Less efficient, though.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Saturday, December 07, 2013 10:25 PM
  • CREATE TABLE #T ( Name nvarchar(50) ,ID int ,Dt datetime ,Code nvarchar(50) ) INSERT INTO #T SELECT 'NameA',1, '1.1.2013 00:00','Code1' UNION SELECT 'NameA',2, '1.1.2013 01:00','Code1' UNION SELECT 'NameA',3, '1.1.2013 02:00','Code2' UNION SELECT 'NameA',4, '1.1.2013 03:00','Code2' UNION SELECT 'NameA',5, '1.1.2013 04:00','Code1' UNION SELECT 'NameA',6, '1.1.2013 05:00','Code1' UNION SELECT 'NameB',7, '1.1.2013 00:00','Code1' UNION SELECT 'NameB',8, '1.1.2013 01:00','Code1' UNION SELECT 'NameB',9, '1.1.2013 02:00','Code2' UNION SELECT 'NameB',10, '1.1.2013 03:00','Code2' UNION SELECT 'NameB',11, '1.1.2013 04:00','Code1' UNION SELECT 'NameB',12, '1.1.2013 05:00','Code1' select Name,dt,ID,Code from #T group by Name,dt,ID,code ; with cte as ( select *, case when lag([name],1,null) over (order by id ) = [name] and lag([code],1,null) over (order by id) = code then 0 else 1 end as FirstRecord from #T ) select * from cte where FirstRecord = 1 DROP TABLE #T


    This includes Name in the condition for the SQL 2012 code

    Russel Loski, MCT, MCSE Data Platform/Business Intelligence. Twitter: @sqlmovers; blog: www.sqlmovers.com

    Saturday, December 07, 2013 10:45 PM

All replies

  • SQL 2012:

    WITH CTE AS (
         SELECT Name, ID, dt, code,
                        prevcode = LAG(code) OVER (PARTITION BY Name ORDER BY ID)
         FROM     #T
    )
    SELECT Name, ID, dt, code
    FROM     CTE
    WHERE    prevcode IS NULL OR prevcode <> code

    LAG() gives you the value for the previous row, as defined by the order clause.

    SQL 2008 and SQL 2005:

    WITH CTE AS (
         SELECT Name, ID, dt, code,
                        rownum = row_number() OVER (PARTITION BY Name ORDER BY ID)
         FROM     #T
    )
    SELECT a.Name, a.ID, a.dt, a.code
    FROM     CTE a
    LEFT     JOIN CTE b ON b.Name = a.Name
                                     AND b.rownum = a.rownum - 1
    WHERE    b.code IS NULL OR b.code <> a.code

    No LAG() here, but you can number the rows and self-join the CTE. Less efficient, though.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Saturday, December 07, 2013 10:25 PM
  • CREATE TABLE #T ( Name nvarchar(50) ,ID int ,Dt datetime ,Code nvarchar(50) ) INSERT INTO #T SELECT 'NameA',1, '1.1.2013 00:00','Code1' UNION SELECT 'NameA',2, '1.1.2013 01:00','Code1' UNION SELECT 'NameA',3, '1.1.2013 02:00','Code2' UNION SELECT 'NameA',4, '1.1.2013 03:00','Code2' UNION SELECT 'NameA',5, '1.1.2013 04:00','Code1' UNION SELECT 'NameA',6, '1.1.2013 05:00','Code1' UNION SELECT 'NameB',7, '1.1.2013 00:00','Code1' UNION SELECT 'NameB',8, '1.1.2013 01:00','Code1' UNION SELECT 'NameB',9, '1.1.2013 02:00','Code2' UNION SELECT 'NameB',10, '1.1.2013 03:00','Code2' UNION SELECT 'NameB',11, '1.1.2013 04:00','Code1' UNION SELECT 'NameB',12, '1.1.2013 05:00','Code1' select Name,dt,ID,Code from #T group by Name,dt,ID,code ; with cte as ( select *, case when lag([name],1,null) over (order by id ) = [name] and lag([code],1,null) over (order by id) = code then 0 else 1 end as FirstRecord from #T ) select * from cte where FirstRecord = 1 DROP TABLE #T


    This includes Name in the condition for the SQL 2012 code

    Russel Loski, MCT, MCSE Data Platform/Business Intelligence. Twitter: @sqlmovers; blog: www.sqlmovers.com

    Saturday, December 07, 2013 10:45 PM
  • >> I have following table which has repeated codes in the Code column for each names. <<

    Please post valid DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. Learn how to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats (you screwed up big time. Nothing was right). Code should be in Standard SQL as much as possible and not local dialect. 

    What you did post cannot ever be a table. You used the magical default NVARCHAR(50) that is a common MS default, not a careful design. Do you really need enough Chinese characters for a sutra? There is not generic “name”, “id” “date” or “code” in RDBMS. Those are called attribute properties and they need an attribute to make sense. 

    >> How can I eliminate the rows just consequently duplicated codes for each names? <<

    By not allow them into the schema in the first place! This is why we have keys in RDBMS. What you have done is copy a deck of punch cards in SQL, then you want to physical remove the extra card. Here is what  you have now, cleaned up a bit. I also used the correct INSERT INTO syntax. 

    CREATE TABLE Foobars
    (foo_name VARCHAR(35) NOT NULL, -- post officd standards
     floob_seq INTEGER NOT NULL, --- possible key! 
     screwup_timestamp DATETIME2(0) NOT NULL,  -- note the datatype
     screwup_code CHAR(5) NOT NULL -- valid size
    );

    INSERT INTO Foobar
    VALUES 
    ('NameA', 1, '2013-01-01 00:00:00', 'Code1'), 
    ('NameA', 2, '2013-01-01 01:00:00', 'Code1'), 
    ('NameA', 3, '2013-01-01 02:00:00', 'Code2'), 
    ('NameA', 4, '2013-01-01 03:00:00', 'Code2'), 
    ('NameA', 5, '2013-01-01 04:00:00', 'Code1'), 
    ('NameA', 6, '2013-01-01 05:00:00', 'Code1'), 
    ('NameB', 7, '2013-01-01 00:00:00', 'Code1'), 
    ('NameB', 8, '2013-01-01 01:00:00', 'Code1'), 
    ('NameB', 9, '2013-01-01 02:00:00', 'Code2'), 
    ('NameB', 10, '2013-01-01 03:00:00', 'Code2'), 
    ('NameB', 11, '2013-01-01 04:00:00', 'Code1'), 
    ('NameB', 12, '2013-01-01 05:00:00', 'Code1');

    >>  I need to get following output that eliminates the rows with IDs: 2-4-6 (for name A) 8-10-12 (for name B) <<

    CREATE TABLE Foobars
    (foo_name VARCHAR(35) NOT NULL, 
     screwup_timestamp DATETIME2(0) NOT NULL, 
     screwup_code CHAR(5) NOT NULL,
     PRIMARY KEY (foo_name, screwup_timestamp)
    );

    Now the bad data cannot appear in the schema and you have an actual table. What were you trying to do?? 

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Sunday, December 08, 2013 12:39 AM
  • This includes Name in the condition for the SQL 2012 code

    Yes, I was about to start to write something like that, but then I decided that it made more sense to partition on Name. But obviously that depends on the actual business requirements. If elimination should happen per name, no matter whether names are interleaved or not, my solution is the correct one. But if names are interleaved (with regards to ID) and we should always include the row where the name changes, your solution is the correct one.

    Mech3 did not indicate which of these two he/she wanted, but hopefully Mech3 is able to sort out which is the best approach for the acutal problem at hand.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, December 08, 2013 11:59 AM
  • Please post valid DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are.

    Drop dead, Celko. Mech3 included sufficient information to solve the problem. (Well, almost, see my reply to Russ.) It's a throwaway table to demonstrate the table, which is likely to be condensed version of the actual problem that hides the actual business domain. You know, some people in these forum feel obliged to respect the intellectual property of their employer/client.

    I don't expect people in this forum to post the actual

    Learn how to follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats (you screwed up big time. Nothing was right). Code should be in Standard SQL as much as possible and not local dialect. 

    By not allow them into the schema in the first place!

    Who says that is his/her schema? A lot of people who post here are in not the position that they can control the schema they work with, or they taking over someone else's mess. Or maybe data was born this way, because it was born in the real word and not in theoratical world of relational algebra.

    And on top of all, there were alread two posts adequately answering the question. Absolutely no need for your useless insults.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, December 08, 2013 12:06 PM