none
Function to replace special character embedded with char into just character.

    Question

  • Hi,

    I have a situation where i am getting special characters from data eg. @, &, #, É , Ã. Is there is way that these can be replaced. For example:

    @ with space/blank ''

    , & with space/blank ''

    , #  with space/blank ''

    , É with E

    ,  Ã with A.

    I have table with all these special characters. (@, &, #, É , Ã...) - dbo.splchars , I can use replace function to replace all of them with blank. But the character like É , Ã should be replaced with their corresponding chars like E, A.

    Is there is any way to do that automated? Any help is appreciated. Thanks.

    Punia

    Friday, July 26, 2013 3:21 PM

Answers

  • Try

    create table #replacements (original char(1),replacement char(1))
    insert into #replacements values('@',' ')
    insert into #replacements values('É','E')
    insert into #replacements values('Ã','A')
    go
    
    create table #t (val varchar(100))
    insert into #t values ('bla@spam.com')
    insert into #t values ('BÉÃCON@thatsright.com')
    insert into #t values ('BÉÃÉÃCON@tÉÃhatÃÃsright.com')
    go
    -- Solution starts here
    ;with cte as (select T.val, COALESCE(REPLACE(T.val, R.original, R.Replacement), T.val) as Corrected, 1 as Level
    from #t T LEFT JOIN #replacements R ON T.val like '%' + R.original + '%'
    UNION ALL
    
    select T.Val, REPLACE(T.Corrected, R.original, R.Replacement) as Corrected, T.Level + 1 as Level
    from cte T INNER JOIN #replacements R ON T.Corrected like '%' + R.original + '%'),
    
    cte2 as (select *, ROW_NUMBER() over (PARTITION by Val order by Level DESC) as Rn 
    from cte)
    
    select val, Corrected from cte2 where Rn = 1
    


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    • Proposed as answer by Prajesh Friday, July 26, 2013 5:47 PM
    • Marked as answer by Kalman TothModerator Wednesday, August 07, 2013 9:28 PM
    Friday, July 26, 2013 5:17 PM
  • Here's one way:

    create table replacements (original char(1),replacement char(1))
    insert into replacements values('@',' ')
    insert into replacements values('É','E')
    insert into replacements values('Ã','A')
    go
    
    create table #t (val varchar(100))
    insert into #t values ('bla@spam.com')
    insert into #t values ('BÉÃCON@thatsright.com')
    go
    
    create function dbo.Normalize(@val varchar(100))
     returns varchar(100)
    as begin
      select @val=replace(@val,original,replacement)
      from replacements
      return @val
    end
    go
    
    select dbo.Normalize(val)
    from #t
    go
    
    drop function dbo.Normalize
    drop table replacements
    drop table #t
    


    Gert-Jan

    Friday, July 26, 2013 4:32 PM
  • There's one too many underscores in the collation name. You probably meant Latin1_General_BIN


    Gert-Jan

    Monday, July 29, 2013 10:51 PM

All replies

  • Do something like this

    select REPLACE(REPLACE('Hello Élephant @', 'É', 'E'), '@', ' ')


    Thanks and Regards, Prajesh Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Friday, July 26, 2013 3:27 PM
  • Thanks for the reply Prajesh.

    But i have to do that manually and write the replace stmt for each character. Is there is anything that can be done dynamically/ automated?

    Punia

    Friday, July 26, 2013 3:29 PM
  • Does any body have any other resolution? This is urgent. Any help is appreciated.

    Thanks,

    Punia

    Friday, July 26, 2013 4:13 PM
  • I am creating a function for you

    Thanks and Regards, Prajesh Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Friday, July 26, 2013 4:18 PM
  • Here's one way:

    create table replacements (original char(1),replacement char(1))
    insert into replacements values('@',' ')
    insert into replacements values('É','E')
    insert into replacements values('Ã','A')
    go
    
    create table #t (val varchar(100))
    insert into #t values ('bla@spam.com')
    insert into #t values ('BÉÃCON@thatsright.com')
    go
    
    create function dbo.Normalize(@val varchar(100))
     returns varchar(100)
    as begin
      select @val=replace(@val,original,replacement)
      from replacements
      return @val
    end
    go
    
    select dbo.Normalize(val)
    from #t
    go
    
    drop function dbo.Normalize
    drop table replacements
    drop table #t
    


    Gert-Jan

    Friday, July 26, 2013 4:32 PM
  • Try

    create table #replacements (original char(1),replacement char(1))
    insert into #replacements values('@',' ')
    insert into #replacements values('É','E')
    insert into #replacements values('Ã','A')
    go
    
    create table #t (val varchar(100))
    insert into #t values ('bla@spam.com')
    insert into #t values ('BÉÃCON@thatsright.com')
    insert into #t values ('BÉÃÉÃCON@tÉÃhatÃÃsright.com')
    go
    -- Solution starts here
    ;with cte as (select T.val, COALESCE(REPLACE(T.val, R.original, R.Replacement), T.val) as Corrected, 1 as Level
    from #t T LEFT JOIN #replacements R ON T.val like '%' + R.original + '%'
    UNION ALL
    
    select T.Val, REPLACE(T.Corrected, R.original, R.Replacement) as Corrected, T.Level + 1 as Level
    from cte T INNER JOIN #replacements R ON T.Corrected like '%' + R.original + '%'),
    
    cte2 as (select *, ROW_NUMBER() over (PARTITION by Val order by Level DESC) as Rn 
    from cte)
    
    select val, Corrected from cte2 where Rn = 1
    


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    • Proposed as answer by Prajesh Friday, July 26, 2013 5:47 PM
    • Marked as answer by Kalman TothModerator Wednesday, August 07, 2013 9:28 PM
    Friday, July 26, 2013 5:17 PM
  • Thanks Gert and Naomi!

    Right now we are manually insert the replacements into #replacements table. Is there is any way to write sql code that will automaticaly replace the

    É with E

    ,  Ã with A..........

    . I mean automated code that will take out the symbols from any alphabets [taking out any symbol ( - , ^ ,  etc) from any alphabet (A...Z) ].

    Thanks for your help.

    Punia

    Monday, July 29, 2013 2:54 PM
  • Hi,

    Unfortunately, SQL doesn't know that [à] derives from [a].

    In all cases, it's up to you to define a strategy to deal with non-English characters.

    Some people will replace any character above 127 with a [?].

    Others configure substitution tables like [é] = [e] and [µ] = [m].



    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu

    Monday, July 29, 2013 3:09 PM
  • I am not sure it's possible.

    Also, I am getting an error on this line in SQL Server 2012 I am not sure why

    declare @t table (Original nchar(1), replacement char(1))
    
    insert into @t (Original) values (N'É'), (N'Ã')
    
    select Original, cast(original as char(1)) collate Latin_1_General_BIN as Replacement from @t 

    I read several BOL pages and 2012 version of documentation lists that collation. So, why am I getting an error 'Collation not found' here?


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Monday, July 29, 2013 3:09 PM
  • Hi,

    Please, try the following code

    declare @t table (Original nchar(1), replacement char(1) collate latin1_general_bin)

    insert into @t (Original) values (N'É'), (N'Ã')

    select Original, cast(original as char(1)) as Replacement from @t order by replacement


    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu

    Monday, July 29, 2013 3:20 PM
  • I am just not sure why I can not use COLLATE clause with that binary collation in the CAST command.

    In any case, this idea didn't work anyway - I can not automatically convert 'Ã' into A


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Monday, July 29, 2013 3:44 PM
  • I completely agree that babbupunia needs to determine the appropriate translations.

    Sebastian gives an interesting example: µ

    It depends on the source language (English? something else), and also how the inappropriate character arise.

    If your text is Spanish, a character such as ¡ is perfectly fine, and you don't want to convert it to anything else (such as an i).

    If you are scanning document and have used OCR to get to the text that you store in the database, then you may want to convert a character like ¦ to l, i or 1, but in other situations you may want to discard it.

    And these are only examples of Latin-1. If you have to handle Unicode too, then this approach is no longer feasible.


    Gert-Jan

    Monday, July 29, 2013 6:16 PM
  • Hey Guys!

    Thanks a lot for your replies. But unfortunately, it is not working.

    Any other work around??

    Thanks,

    Punia

    Monday, July 29, 2013 8:23 PM
  • What exactly do you mean by not working? If you have replacement table, does my code work for you?

    There is nothing in SQL Server to automatically figure out the translations. You need to have a mapping table.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Monday, July 29, 2013 8:46 PM
  • There's one too many underscores in the collation name. You probably meant Latin1_General_BIN


    Gert-Jan

    Monday, July 29, 2013 10:51 PM
  • Looks like you're right, but I am sure I copied/pasted from documentation. I guess there may be a typo in Documentation then.

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Monday, July 29, 2013 11:05 PM