SSIS flat file connection manager with file encoded in UTF-8 and fixed width
-
Sunday, January 27, 2013 5:34 PM
Hi,
I have flat file connection manager with file encoded in UTF-8. I've set fixed width on it and I've put all information regarding columns widths. The file has data which are from multilingual sites - so some chars are on 1 byte some on 2 bytes.
I've noticed strange behaviour of SSIS flat file manager - it seems to read bytes instead of chars - so if it find 2 bytes char in row it shift result by one position.
Could you please help me with this?
All Replies
-
Tuesday, January 29, 2013 3:09 PMModerator
Where do you see this?
Make sure you use Uncode datatypes all the way (DT_WSTR...).
We need pictures, sample data etc.
Arthur My Blog

-
Wednesday, January 30, 2013 11:02 PM
Hi ArthurZ,
Sample data from UTF-8 file (4 fixed DT_WSTR columns 6,9,8 and 2(end of line) chars):
s6 s9 s8
0000 Pdwopr aaaaa
1111 aaaaaa aaaółtrw
2222 sdrtyi bbbbbThe most important is second line with ół chars which in UTF-8 takes 2 bytes.
I've prepared images shows Connetion manager to sample file(general tab, and columns tab) but I can't submit (something wrong with my forum account)
Only what you have to do is to create connetion manager, set fixed length on it and you will see issue on columns tab.
RafZak
-
Wednesday, January 30, 2013 11:25 PMDoesn't seem to me that it is actually a UTF-8 file. Have you opened the file with a hex editor to see what the data looks like without interpretation?
Chuck Pedretti | Magenic – North Region | magenic.com
-
Thursday, January 31, 2013 4:38 AM
Hi RafZak,
As Arthur stated, it is unicode file. For column s8 change your data type to dt_wstr and length should be around 500 for example.
Thanks, hsbal
-
Friday, February 01, 2013 6:37 PM
Hi
Harry Bal - the idea is that s8 column is not necessary the last - just suppose that we have a lot of text columns in this file.
UTF-8 is unicode as well as UTF-16 - if I have this data in UTF-16 I won't have any issue - in UTF-16 files characters (ex. "ół") are on two bytes like others "normal" chars. Use UTF-16 file can be workaround in this case but that is the point - I have to use UTF-8 file.
If you want to create file as my sample just copy and paste my example and save it as UTF-8.
Unfortunately my account is still not verify by Microsoft (crazy for me), so I can't add images to my post
Best Regards
-
Monday, February 04, 2013 1:07 AMModerator
You can provide links to SkyDrive or ImageShack that host images very well.
I convulsively arrive to a thought that it is the fixed length that spoils the load.
How about you using the Ragged Right instead?
Arthur My Blog

- Marked As Answer by Eileen ZhaoMicrosoft Contingent Staff, Moderator Friday, February 08, 2013 6:55 AM
-
Wednesday, March 13, 2013 7:18 PM
You can provide links to SkyDrive or ImageShack that host images very well.
I convulsively arrive to a thought that it is the fixed length that spoils the load.
How about you using the Ragged Right instead?
Hello, Has this been resolved.
I am also facing the same issue, importing UTF-8 file to SQL db. the UTF-8 flat file is ready byte by byte not character by character.
I have the file like, Family Name(60) MiddleName(60)LastName(60)LocalLanguageName(60)Country(60).....
Local Language Name can be from any country, When I try to specify the Local language name's start and endposition it's all over depending up on the Characters(2 byte/1byte) hence Country is not coming in to correct db column.
Thanks in advance for any help
Regards
SK

