none
Removing commas and quotes from numeric fields in csv file using SSIS

    Question

  • Hi,

    I am creating SSIS package which reads data from a csv file and stores in SQL Server database. There are a few numeric fields in the csv files. They sometimes contain value like "1,008.54"

    How do I remove the quotes and comma from the value?

    I have successfully separated the rows with this kind of data by using Conditional Split Transformation. (SUBSTRING([Column 9],1,1) == "\"")

    After this, I tried using Derived Column Transformation to REPLACE comma and quotes with empty string. But it is not working.

    Please advise.

    Thanks!

    Tuesday, January 10, 2012 8:45 AM

Answers

  • Thank you all for replying.

     

    Making the following changes solved the issue:

    1. Setting Text Qualifier of Flat File Connection Manager to ".

    2. Using REPLACE(REPLACE(Column,",",""),"\"","") in Derived Column Transformation.

     

    • Marked as answer by Khush_D Wednesday, January 11, 2012 8:39 AM
    Wednesday, January 11, 2012 8:39 AM
  • OK

    Step 1: Edit your File connection manager to set text qualifier as ""

    Step 2: Using a derived column transformation, replace the existing column with REPLACE([Column 0],",","") expression

    Step 3: Use NULL check, if required


    Please vote as helpful or mark as answer, if it helps
    • Proposed as answer by Raunak J Wednesday, January 11, 2012 8:57 AM
    • Marked as answer by Khush_D Thursday, January 12, 2012 1:34 AM
    Tuesday, January 10, 2012 9:43 AM

All replies

  • DECLARE @var AS varchar(100)
    SET @var = '"10,000.54"'
    SELECT REPLACE(REPLACE(@var,'"',''),',','')
    

    Try this. But I am curious about the double quotes. Are they text qualifiers?
    Cheers, Raunak Jhawar | t:@raunakjhawar
    • Proposed as answer by Raunak J Wednesday, January 11, 2012 8:56 AM
    Tuesday, January 10, 2012 9:04 AM
  • Thanks for your reply Raunak.

     

    Can I use this syntax in Derived Column Transformation component?

    Adding a Script Component to my SSIS package to accomplish this is my last option.

     

    The quotes have been added so that the comma in the field value is not interpreted as field delimiter.

    So, wherever there is a comma inside a value, it is enclosed in quotes.

     

    But i have to read these values into "money" datatype fields in SQL Server table, so I need to get rid of these commas and quotes.

     


    • Edited by Khush_D Tuesday, January 10, 2012 9:25 AM
    Tuesday, January 10, 2012 9:25 AM
  • OK

    Step 1: Edit your File connection manager to set text qualifier as ""

    Step 2: Using a derived column transformation, replace the existing column with REPLACE([Column 0],",","") expression

    Step 3: Use NULL check, if required


    Please vote as helpful or mark as answer, if it helps
    • Proposed as answer by Raunak J Wednesday, January 11, 2012 8:57 AM
    • Marked as answer by Khush_D Thursday, January 12, 2012 1:34 AM
    Tuesday, January 10, 2012 9:43 AM
  • When I view the data using a data viewer, all the field with values like "1,008.54" is treated as 2 fields: ("1) and (008.54")

    1- first you have to read the file as ONE column (link)

    2- replace the , that represents the seperation of the fields with a piple line (|) this is not the letter L or i it is the character pipeline | ( find the \ on your keyboard)

    assuming that you have a text file like

    1,2,"This is, a test",1,2
    1,r,"gfdgfdsthis is an, example",y , 44
    "s,,x",y," 456456how is,this, , ,  ",r,t
    3,"f,fds,fdsf,dfsa",t,"h",r
    "4","t","h","u","h"
    a,e,",2",4,5
    ,,",",,
    ",s","wqt","tyr","iuy","h,"
    4,t,h,u,h

    3- use a script component to .....

     '
            ' Add your code here
            '
    
            Dim strRow As String
            Dim FirstIndex As Integer
            Dim LastIndex As Integer
            Dim strLen As Integer
            Dim strTemp As String
            Dim strTemp2 As String
    
    
            strRow = Row.Line.ToString
            strLen = strRow.Length
            strRow = Replace(strRow, ",", "|")
    
            FirstIndex = strRow.IndexOf("""", 0)
            If Len(strRow) > 0 And FirstIndex > -1 Then
                LastIndex = strRow.IndexOf("""", FirstIndex + 1)
    
                While FirstIndex <> -1 And LastIndex <> -1
                    strTemp = ""
                    strTemp2 = ""
                    'MsgBox("START ===>>>" & strRow & "--- FirstIndex " & FirstIndex.ToString & " --------- " & "LastIndex" & LastIndex.ToString)
    
                    If FirstIndex > 0 Then
                        strTemp = Left(strRow, FirstIndex)                      ' If the string doesn't start with a "
                        strTemp2 = Right(strRow, strLen - strTemp.Length)       ' only select after the FirstIndex Or after the fields processed
                    Else
                        strTemp2 = strRow.ToString                              ' If string starts with a "
                    End If
    
                    strTemp2 = Left(strTemp2, strTemp2.IndexOf("""", 1) + 1)    ' get string up to the next "
                    strTemp2 = strTemp2.Replace("|", ",")                       ' replace , with |
                    strTemp = strTemp & strTemp2                                ' Concat the 
    
                    If strLen > LastIndex Then                                  ' Concat the rest of the string after the LastIndex
                        strTemp = strTemp & Right(strRow, strLen - LastIndex - 1)
                    End If
    
                    strRow = strTemp.ToString                                   ' save change on each loop field change
                    'MsgBox(strRow & " --- strRow")
    
                    FirstIndex = strRow.IndexOf("""", LastIndex + 1)            ' Find next " field
                    LastIndex = strRow.IndexOf("""", FirstIndex + 1)            ' Find next " field
                    'MsgBox("FirstIndex" & FirstIndex.ToString & " --------- " & "LastIndex" & LastIndex.ToString)
    
                End While
                'Row.Line = strRow.ToString
            End If
            Row.Line = strRow.ToString
        End Sub
    


    4- now you will have

    Line F1 F2 F3 F4
    1|2|"This is, a test"|1|2    
    1|r|"gfdgfdsthis is an, example"|y | 44    
    "s,,x"|y|" 456456how is,this, , ,  "|r|t    
    3|"f,fds,fdsf,dfsa"|t|"h"|r    
    "4"|"t"|"h"|"u"|"h"    
    a|e|",2"|4|5    
    ||","||    
    ",s"|"wqt"|"tyr"|"iuy"|"h,"    
    4|t|h|u|h    

    5- Add to the script to remove " in text and , for numbers

    6- use another script component to split the one column to multiple columns ( end part of link)

     

     


    Sincerely SH -- MCITP 2008, MCTS 2008 & 2005 -- Please kindly mark the post(s) that answered your question and/or vote for the post(s).
    Wednesday, January 11, 2012 4:08 AM
  • Hello Nik/Khush_D,

    I don't think that a script task is required for such an issue. I am sure, tried and tested that SSIS derived task and text qualifiers will sovle the issue.


    Please vote as helpful or mark as answer, if it helps
    • Proposed as answer by Raunak J Wednesday, January 11, 2012 8:57 AM
    Wednesday, January 11, 2012 5:24 AM
  • Thank you all for replying.

     

    Making the following changes solved the issue:

    1. Setting Text Qualifier of Flat File Connection Manager to ".

    2. Using REPLACE(REPLACE(Column,",",""),"\"","") in Derived Column Transformation.

     

    • Marked as answer by Khush_D Wednesday, January 11, 2012 8:39 AM
    Wednesday, January 11, 2012 8:39 AM
  • Hello Khush_D,

    Nice to see that you managed to sove the issue. Please mark the post as answer. It helps the slution provider gain reconition in the community. Please do so.


    Please vote as helpful or mark as answer, if it helps
    Wednesday, January 11, 2012 8:42 AM