sql query

Question

• Hi ,,

I have a table with some columns , but am concerned about this column.

If you see the values for this column below, the 1st and 2nd, 3rd and 4th ,5th and 6th are actually same..just a number is appended in front of them.

example lets take first two values: 20208296 and 000000208296. These two are same , some how 2 is appended in front and its making those two disitnct values.

How to find how many of those are like that in table. There are million rows in the table.

This column also has other kind of values , such as the last three values.I am not concerned about those kind.

ColumnA

20208296

000000208296

50822404

000000822404

60984072

000000984072

144233434

234354355

345346676

Monday, February 03, 2014 6:56 PM

• ```CREATE TABLE [dbo].[Data](
ColumnA [varchar](20) NOT NULL
) ON [PRIMARY]

GO
-- Jan
INSERT INTO [dbo].[Data]  VALUES ('20208296')
,('000000208296')
,('50822404')
,('000000822404')
,('60984072')
,('000000984072')
,('144233434')
,('234354355')
,('345346676')

;with mycte as
(
select ColumnA, Cast(Right(ColumnA,6) as int) right6,
ROW_NUMBER() Over(Partition by Cast(right(ColumnA,6) as int)  Order by  Cast(right(ColumnA,6) as int)   ) rn from data
)
Select * from mycte
Where right6 IN
(Select right6 from mycte
Where rn=2)

drop table data```

Monday, February 03, 2014 8:57 PM

All replies

• ```SELECT t.ColumnA,t1.ColumnA
FROM table t
INNER JOIN table t1
ON t1.ColumnA LIKE '%' + CAST(CAST(t.ColumnA AS int) AS varchar(10))
WHERE t.ColumnA LIKE '000000%'
AND t1.ColumnA NOT LIKE '000000%'```

Please Mark This As Answer if it helps to solve the issue Visakh ---------------------------- http://visakhm.blogspot.com/ https://www.facebook.com/VmBlogs

Monday, February 03, 2014 7:04 PM
• there is no guarantee that the values will have fixed 6 leading zeros.

It may have >= or < 6 leading zeros.

Monday, February 03, 2014 7:20 PM
• Better suggestion is Redo the ETL than trying to fix up the erraneous data..

Second option : do you have a table that atleast maps to the values in this table? if so we can do a join and find it over.. if you don't have that you have to go back to option 1.

See the below example.

```DECLARE @Input1 VARCHAR(20) = '111222', @Input2 VARCHAR(20) = '00010111222', Declare @input3 VARCHAR(20) = '00051111222'
```

You cannot really differentiate between @input1, @input2, @input3 using like operator until or unless you know a specific pattern. I hope you understood.

Good luck :) ... visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.

Monday, February 03, 2014 8:02 PM
• Hi Leo,

From the small sample provided:

All the ones without 0's appended to the front are 8 digits in length and the ones with 0's in front are 6 digits in length.

Why do some have 0's and some not have 0s?

How do you know that the 20208296 is the same as the 000000208296? What logic was used to figure this out?

Do these IDs get loaded into this table from different sources? Do these errors occur as a result of simple input error?

Please provide a bit more information so we can assist you with finding a resolution. Thanks!

Monday, February 03, 2014 8:05 PM
• ```CREATE TABLE [dbo].[Data](
ColumnA [varchar](20) NOT NULL
) ON [PRIMARY]

GO
-- Jan
INSERT INTO [dbo].[Data]  VALUES ('20208296')
,('000000208296')
,('50822404')
,('000000822404')
,('60984072')
,('000000984072')
,('144233434')
,('234354355')
,('345346676')

;with mycte as
(
select ColumnA, Cast(Right(ColumnA,6) as int) right6,
ROW_NUMBER() Over(Partition by Cast(right(ColumnA,6) as int)  Order by  Cast(right(ColumnA,6) as int)   ) rn from data
)
Select * from mycte
Where right6 IN
(Select right6 from mycte
Where rn=2)

drop table data```

Monday, February 03, 2014 8:57 PM
• hi

Try this

create table #temp(columnA varchar(25) )
go
Insert into #temp
Values('20208296')
Insert into #temp
Values('000000208296')

Insert into #temp
Values('50822404')

Insert into #temp
Values('000000822404')

Insert into #temp
Values('60984072')

Insert into #temp
Values('000000984072')

Insert into #temp
Values('144233434')

Insert into #temp
Values('234354355')
Insert into #temp
Values('345346676')
go
Select cast(rtrim(ltrim(columnA)) as int) as test ,* from #temp
Select distinct cast(rtrim(ltrim(columnA)) as int) as test  from #temp
go
Drop table #temp
go

Mark as Answer if you find it useful

Shridhar J Joshi

Thanks alot

Tuesday, February 04, 2014 12:38 PM
• See if this helps, I am assuming that you have a unique ID in your table.

```/*create table #temp(Id int identity(1,1),col varchar(25))
go
Insert into #temp
Values('20208296'),
('000000208296'),
('50822404'),
('000000822404'),
('60984072'),
('000000984072'),
('144233434'),
('234354355'),
('345346676');
*/
select t1.id,t1.col,t2.id from #temp t1 join #temp t2 on convert(varchar(100),t1.col) like '%'+convert(varchar(100),convert(bigint,t2.col))+'%'
--drop table #temp```

You can remove the ending percentage '%'+convert(varchar(100),convert(bigint,t2.col))+'%'

which will avoid matching the row 10 & 11( ID 7 & 8)

Satheesh
My Blog

• Edited by Tuesday, February 04, 2014 1:37 PM
• Proposed as answer by Tuesday, February 11, 2014 4:51 PM
Tuesday, February 04, 2014 1:34 PM