none
Adding CRLF to file with no delimiters

    Question

  • I've got a flat file that I need to add a CRLF to after each 129 chars. This is so a Unix program can run against it, which needs/uses CRLF as a delimiter.  I've tried to use a derived column but since I'm adding 2 chars the next 129 char are off by two. I think i need to use a script task and just create a new file from the original and add the CRLF after each 129 chars. Any other ideas? I'm fairly new to SSIS. I'm using VS 2017
    Thursday, December 6, 2018 3:17 PM

All replies

  • Hi XandyX,

    It seems that the SSIS Script Task will work best for your scenario.

    Please provide a sample of both: your input as well as output files.

    Thursday, December 6, 2018 4:11 PM
  • here's what the data coming in looks like: 

    C     COPYRIGHT(C) 02-18 USPS 001                                                                                               

    A00501  INTERNAL REVENUE SERVICE            WAVERLY                     AVE   O19940501                                          A00601  CALLE A PALMIERI                    CALLE ALFREDO PALMIERI            A20040407                                          A00601  CALLE E SANABRIA                    CALLE ENRIQUE SANABRIA            A20110907                                          A00601  CALLE FCO PIETRI                    CALLE FRANCISCO PIETRI            A20040407                                          A00601  CALLE JM DE ANDINO                  CALLE J MARTINEZ DE ANDINO        A20110907                                          A00601  CALLE R GONZALEZ                    CALLE RODULFO GONZALEZ            A20120924                                          A00601  CALLE T FIGUEROA                    CALLE TEODORO FIGUEROA            A20040407                                          A00601  CALLE VEGA DE PUIG                  CALLE AMERICO RODRIGUEZ           

    output would be 129 chars of above data and a CRLF at end. Something like below

    C     COPYRIGHT(C) 02-18 USPS 001                                                                                               

    A00501  INTERNAL REVENUE SERVICE            WAVERLY                     AVE   O19940501                                      A00601  CALLE A PALMIERI                    CALLE ALFREDO PALMIERI            A20040407                                      A00601  CALLE E SANABRIA                    CALLE ENRIQUE SANABRIA            A20110907                                    A00601  CALLE FCO PIETRI                    CALLE FRANCISCO PIETRI            A20040407                                        A00601  CALLE JM DE ANDINO                  CALLE J MARTINEZ DE ANDINO        A20110907                                  A00601  CALLE R GONZALEZ                    CALLE RODULFO GONZALEZ            A20120924                                  A00601  CALLE T FIGUEROA                    CALLE TEODORO FIGUEROA            A20040407                                    A00601  CALLE VEGA DE PUIG                  CALLE AMERICO RODRIGUEZ           

    Thursday, December 6, 2018 4:32 PM
  • that did not format as i expected. it should look more table like and not jagged. If that makes sense
    Thursday, December 6, 2018 4:33 PM
  • Please check if the following code can read you input file line by line:

    string filename = @"e:\Temp\yourfilename.csv";
    var lines = File.ReadAllLines(filename);
    
    foreach (string line in lines)
    {
    	Console.WriteLine(line);
    }

    Thursday, December 6, 2018 4:59 PM
  • I'm theorizing, but since there is no delimiter the code wont work.

    Thursday, December 6, 2018 5:14 PM
  • Ok. Please try the following.

    Input file QuarkUnixInput.txt:

    1111222233334444


    c#:

    void Main()
    {
    	// Read the file as one string.
    	string inputFile = System.IO.File.ReadAllText(@"e:\Temp\QuarkUnixInput.txt");
    
    	var output = Split(inputFile, 4);
    
    	// Create a file and write into it.
    	string outputFile = @"e:\Temp\QuarkUnixOutput.txt";
    	File.WriteAllLines(outputFile, output);
    }
    
    // Define other methods and classes here
    static IEnumerable<string> Split(string str, int chunkSize)
    {
    	return Enumerable.Range(0, str.Length / chunkSize)
    		.Select(i => str.Substring(i * chunkSize, chunkSize));
    }

    Output file QuarkUnixOutput.txt:

    1111
    2222
    3333
    4444


    Thursday, December 6, 2018 5:22 PM
  • Thank you ! I was thrown a curve ball. The file is 900.000 kb. I'm getting a out of memory exception
    Thursday, December 6, 2018 9:48 PM
  • Hi XandyX,

    Yes, file size could be a problem.

    Unfortunately, we cannot read the input file system line by line.

    How much RAM your computer has?

    Do you have a 64-bit OS?

    Is it possible to request your input file in a different format?

    For example, XML format?


    Thursday, December 6, 2018 10:17 PM
  • Here is another version to try. It just reads buffer by buffer from the file system.

    string inputFile = @"e:\Temp\QuarkUnixInput.txt";
    	string outputFile = @"e:\Temp\QuarkUnixOutput.txt";
    
    	const int bufferSize = 4;
    
    	var buffer = new Char[bufferSize];
    	var count = bufferSize;
    	string line = string.Empty;
    
    	// Create a file and write into it.
    	using (StreamWriter file = new StreamWriter(outputFile, false))
    	{
    		using (var sr = new StreamReader(inputFile))
    		{
    			while (count > 0)
    			{
    				count = sr.Read(buffer, 0, bufferSize);
    				line = string.Concat(buffer);
    				file.WriteLine(line);
    			}
    		}
    	}






    Thursday, December 6, 2018 10:42 PM
  • that did not format as i expected. it should look more table like and not jagged. If that makes sense

    Linux lineterms are LF (ASCII 0x0a) only.  Windows lineterms are CRLF (ASCII 0x0d0a).  The HTML lineterm is only <br/> and both CR and LF are completely ignored in browser rendering of text.  Try putting that mess in a <pre>mess here</pre> tag (using the post's HTML Edit Mode) or just dump it into a Code Box (which automagically creates a pre for you, but also does syntax coloration so might make it look funnier still).

    Anyway, if it looks "table like" at all on your home text editor, then there are definitely delimiters.  If this data originates from a Linux system, say as output from a Linux program, then its terminators would be LF only.  See if you can just do a string replacement converting "\n" to "\r\n" with regular expressions or standard C# string replacement, if you're using C#.

    You can probably work out a cat/grep or sed/grep method from the linux commandline too.


    It never hurts to try. In the worst-case scenario, you'll learn something.

    Thursday, December 6, 2018 11:47 PM
  • went old school:

                      

          const int bufferSize = 182;

                var sb = new StringBuilder();
                var buffer = new Char[bufferSize];
                var length = 0L;
                var totalRead = 0L;
                var count = bufferSize;

                using (var sr = new StreamReader(zip4mst4FileToProcess))
                {
                    length = sr.BaseStream.Length;
                    while (count > 0)
                    {
                        count = sr.Read(buffer, 0, bufferSize);
                        sb.Append(buffer, 0, count);
                        totalRead += count;
                    }
                }

    now just need to create a new file with the data in buffer and add CRLF

    Friday, December 7, 2018 12:29 AM
  • Hi XandyX,

    It seems that you missed the 2nd solution I provided a little bit earlier.

    Friday, December 7, 2018 1:19 AM
  • here's what the data coming in looks like: 

    C     COPYRIGHT(C) 02-18 USPS 001                                                                                               

    A00501  INTERNAL REVENUE SERVICE            WAVERLY                     AVE   O19940501                                          A00601  CALLE A PALMIERI                    CALLE ALFREDO PALMIERI            A20040407                                          A00601  CALLE E SANABRIA                    CALLE ENRIQUE SANABRIA            A20110907                                          A00601  CALLE FCO PIETRI                    CALLE FRANCISCO PIETRI            A20040407                                          A00601  CALLE JM DE ANDINO                  CALLE J MARTINEZ DE ANDINO        A20110907                                          A00601  CALLE R GONZALEZ                    CALLE RODULFO GONZALEZ            A20120924                                          A00601  CALLE T FIGUEROA                    CALLE TEODORO FIGUEROA            A20040407                                          A00601  CALLE VEGA DE PUIG                  CALLE AMERICO RODRIGUEZ           

    output would be 129 chars of above data and a CRLF at end. Something like below

    C     COPYRIGHT(C) 02-18 USPS 001                                                                                               

    A00501  INTERNAL REVENUE SERVICE            WAVERLY                     AVE   O19940501                                      A00601  CALLE A PALMIERI                    CALLE ALFREDO PALMIERI            A20040407                                      A00601  CALLE E SANABRIA                    CALLE ENRIQUE SANABRIA            A20110907                                    A00601  CALLE FCO PIETRI                    CALLE FRANCISCO PIETRI            A20040407                                        A00601  CALLE JM DE ANDINO                  CALLE J MARTINEZ DE ANDINO        A20110907                                  A00601  CALLE R GONZALEZ                    CALLE RODULFO GONZALEZ            A20120924                                  A00601  CALLE T FIGUEROA                    CALLE TEODORO FIGUEROA            A20040407                                    A00601  CALLE VEGA DE PUIG                  CALLE AMERICO RODRIGUEZ           

    Hi XandyX, 

    In SSIS, you can use a Fixed width Flat File Connection to import the source data, and in the flat file destination create a Delimited/Ragged right connection, then CRLF will be added automatically in the destination file. 

    Here is how I configure the Flat File connections: 

    1, Source

    2, Destination

    Result:

    It seems that there is a tab at the end of the first line, so I just skip the first two lines(Header rows to skip: 2). 


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Friday, December 7, 2018 2:33 AM