Answered Extended Ascii problem

  • Thursday, May 17, 2012 3:22 PM
     
     

    I have two packages in 2005. The first reads data from the database and writes it to a flat file. The second reads the same flat file. When the file is written, extended ascii characters like ñ look fine when I open the file in notepad. When the second package reads the file the ñ looks like this, ñ  , when viewed in the dataviewer and in UltraEdit. This throws off the columns by one character and the subsequent fields end up one character off. I can tell because the package dies on converting a date field from the line containing the ñ. I can see in the dataviewer that what should be something like "01012005" ends up " 0101200".  

    What gives?  The database fields are all DT_STR, the database is codepage 1252, everything in the packages is codepage 1252. Nothing is unicode and unicode is currently not an option.  I've tried setting AlwaysUseDefaultCodePage to true and false, data conversions to unicode and OEM code pages, setting the flatfile to be codepage 437,  and about 5 other things all with no success.  

All Replies

  • Thursday, May 17, 2012 3:28 PM
     
     
    When you open the file in ultraedit, what is it reporting for the codepage/encoding of the file?  If ultraedit can not tell you then download Editpad Lite, which has that capability.

    Chuck

  • Thursday, May 17, 2012 3:31 PM
    Moderator
     
     
    Why not to strip out the offending chars if they are not part of the load?

    Arthur My Blog

  • Thursday, May 17, 2012 3:40 PM
     
     
    EditPad lite says it's 1252. It also shows the ñ as ñ

    Glenn Plott

  • Thursday, May 17, 2012 3:42 PM
     
     
    The ñ is part of peoples names. I can't strip them out or convert them, ñ -> n

    Glenn Plott

  • Thursday, May 17, 2012 3:47 PM
    Moderator
     
     

    Hmm... you say nothing is Unicode, I simply cannot imagine how the names lurked in with the extended ASCII chars in their names?


    Arthur My Blog

  • Thursday, May 17, 2012 5:52 PM
     
     

    Hi,

    Latin American Spanish text usually comes as codepage 850, not 1252.

    Here are the settings I use to import Spanish 8-bit ASCII text files :

    1. SSIS origin source (flat file)

    Code Page 850, OEM Multilingual Latin 1

    Local : English US

    Unicode : No

    2. Destination source (SQL)

    Check that your destination fields are either NCHAR or NVARCHAR (that'd be the ideal choice)

    If your destination fields are CHAR or VARCHAR, then you'll have to change their collation to SQL_Latin1_General_CP_850_BIN (or the others

    related collations SQL_Latin1_General_CP_850_CI_AS, etc accent-sensitive, etc)

    Here's a piece of code that will create a sample table

    USE [MyBase]
    GO

    CREATE TABLE [dbo].[MyTable](
     [MyKey] [int] IDENTITY(1,1) NOT NULL,
     [MySpanishText] [nvarchar](20) NOT NULL
    ) ON [PRIMARY]

    GO

    Then, save the following text as MyScript.vbs and execute it to create a sample CP850 Spanish text on c:\temp\spanish.txt

    Option Explicit
    Dim I
    Dim SpanishLine
    Dim objFile
    Dim objFSO
    SpanishLine = ""
    SpanishLine = SpanishLine & Chr(74)  & Chr(111) & Chr(115) & Chr(130) & Chr(32)
    SpanishLine = SpanishLine & Chr(78)  & Chr(163) & Chr(164) & Chr(101) & Chr(122) & Chr(32)
    SpanishLine = SpanishLine & Chr(100) & Chr(101) & Chr(32)
    SpanishLine = SpanishLine & Chr(65)  & Chr(114) & Chr(97)  & Chr(103) & Chr(162) & Chr(110)
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.CreateTextFile("c:\temp\spanish.txt")
    objFile.WriteLine(SpanishLine)
    objFile.Close
    set objFile = Nothing
    set objFSO = Nothing
    Wscript.Quit

    If everything goes right, you should have a José Núñez de Aragón string on the sample table after importing via SSIS


    Sebastian Sajaroff Senior DBA Pharmacies Jean Coutu


  • Thursday, May 17, 2012 8:27 PM
     
     

    After some more research here is what I've found

    This data is entered into the app and saved into the database into a varchar (not nvarchar) field using codepage 1252 collation SQL_Latin1_General_CP1_CI_AS with no problems. There are just a few users in the system who know how to enter spanish characters like ñ using Alt+164.  It's no surprise that that all works fine. After all, ñ is in codepage 1252. The problem is with SSIS writing the file. When I look at the file with a hex editor SSIS is writing two characters ( dec 195 and 177 ) instead of the one character ( dec 241 in codepage 1252, dec 164 in codepage 850, dec 241 in UTF-16) that it should be writing. It turns out that what it is writing ( dec 195 and 177 ) is ñ in UTF-8 which is a variable-width encoding scheme

    So my question should be, "why is SSIS writing the file as UTF-8 (code page 65001) when the flat file connection manager specifically says to use code page 1252?"  I'd be okay with UTF-8  except that the data from the file eventually has to go back into a varchar field.


    Glenn Plott


    • Edited by Glenn Plott Thursday, May 17, 2012 8:38 PM
    •  
  • Thursday, May 17, 2012 9:38 PM
    Moderator
     
     Answered

    Hmm, then this sounds like you need a conversion back from the Unicode to ASCII.

    You may use the Data Conversion block or a derived column transformation with (DT_STR, 1252, 500)(Column1)


    Arthur My Blog

  • Thursday, May 24, 2012 2:35 PM
     
     Answered
    Sorry for the delay, got sidetracked. What I found is that instead of using a plain flat file destination the package was using a script to write the file. The connection used by the script was set as codepage 1252 but the StreamWriter that is used to write the file doesn't care what codepage the connection is set as. The StreamWriter class, instead of defaulting to System.Text.Encoding.Default, defaults to UTF-8.  I specified the default encoding and it wrote in 1252. All good. Why it was using a script I have no idea. I changed to using a flat file destination and it worked just as well. Thanks eveyone for your help.

    Glenn Plott

    • Marked As Answer by Glenn Plott Thursday, May 24, 2012 2:35 PM
    •