none
How to convert a CR+LF (DOS/Windows) to LF (Unix)? RRS feed

  • Question

  • Hi everyone,

    I work on T-SQL and have no idea about VBScripts. I'm trying to do a bcp on SQL Server. However, the file I'm bcp'ing is writing CRLF to the file. I'll be moving this file later to an Unix environment which only accepts LF. I've searched online and couldn't find any script that I can run on a Windows environment to remove the CRLF in the file. Could you please help me if there is a better solution to do this?

    Here is one of the script I found online which doesn't seem to be working.

    Dim fileTranOK
    
    Function Main()
    
    Dim sFile, tFile
    	fileTranOK = True
    	sFile = GlobalVariables("dPath")  &  GlobalVariables("dFLog1")
    	tFile = GlobalVariables("dPath")  &  GlobalVariables("dFLog1") & ".temp"
    	ManageFileContent sFile, tFile
    	
    End Function
    
    
    Function ManageFileContent(sFile, tFile)
    
    Dim oFSO, objFile, inFile, outFile, sText1, sText2, catLine
    	Set oFSO = CreateObject("Scripting.FileSystemObject")
    	Set inFile = oFSO.OpenTextFile(sFile, 1)
    	Set objFile = oFSO.CreateTextFile(tFile) 
    	set objFile = nothing
    	Set outFile = oFSO.OpenTextFile(tFile, 2)  
    	Do While Not inFile.AtEndOfStream
    		sText1 = inFile.ReadLine
    
    		If (InStr(1,sText1, vbCr & vbLf)) Then
    			Replace(sText1, vbCrLf, vbLf)
    			outFile.WriteLine(sText1)
    		End If
    	Loop
    	inFile.Close
    	outFile.Close
    	oFSO.DeleteFile sFile
    
    	oFSO.MoveFile tFile,sFile
    
    Set oFSO = nothing
    
    End Function
    PS: I know there are bunch of commands on UNIX to change this but I'd like to do this as part of my SQL Server job without any third party tool.

    Thanks,

    Bangaaram


    Known is a DROP, Unknown is an OCEAN.

    Friday, October 3, 2014 3:04 PM

Answers

  • The bcp.exe documentation says the default row terminator is already '\n'.

    -- Bill Stewart [Bill_Stewart]

    Unfortunately in Windows \n is not the same as \l or \r.  It is compound. You must specify  0xA or \a or `a in some utilities.

    Almost all Unix utilities can translate the record terminator.  A couple of databases I have found don't do this natively.

    I just test BCP to be sure it works as expected.

    This command:

    BCP issue.dbo.demo_task out C:\temp\test.txt -c -t"," -r 0xA -T -S .\SQLExpress

    Adds 0xA as a record terminator. -r 0xA.  It is undocumented before SQS 2008R2.

    Look at the format file specs to get all of the variations on converging BCP references.


    ¯\_(ツ)_/¯


    • Edited by jrv Friday, October 3, 2014 6:54 PM
    • Marked as answer by Bangaaram Friday, October 3, 2014 8:55 PM
    Friday, October 3, 2014 6:54 PM

All replies

  • Two simple solutions are: 1) Set whatever is generating the file to use LF-only output, or 2) convert on the unix side after you copy the file over. I believe tr or sed can do it pretty easily on the unix side.


    -- Bill Stewart [Bill_Stewart]

    Friday, October 3, 2014 3:12 PM
    Moderator
  • You can also use Replace-FileString.ps1 on the Windows side (PowerShell) to convert the line endings to LF-only instead of CRLF:


    replace-filestring "`r`n" "`n" inputfile.txt -multiline -overwrite


    -- Bill Stewart [Bill_Stewart]

    Friday, October 3, 2014 3:20 PM
    Moderator
  • With BCP you can create a control file that will determine the field and record terminators.  This si a notmal part of BCP and has nothing to do with scripting nor does it require any scripting.

    See the documentation for BCP or post more questions in the SQLServer forum.

    http://msdn.microsoft.com/en-us/library/ms162802.aspx

    Pay attention to the -f "format file" switch.

    You can also just use -r to set linefeed only.

    -


    ¯\_(ツ)_/¯

    Friday, October 3, 2014 3:41 PM
  • The bcp.exe documentation says the default row terminator is already '\n'.

    -- Bill Stewart [Bill_Stewart]

    Friday, October 3, 2014 3:49 PM
    Moderator
  • The bcp.exe documentation says the default row terminator is already '\n'.

    -- Bill Stewart [Bill_Stewart]

    Unfortunately in Windows \n is not the same as \l or \r.  It is compound. You must specify  0xA or \a or `a in some utilities.

    Almost all Unix utilities can translate the record terminator.  A couple of databases I have found don't do this natively.

    I just test BCP to be sure it works as expected.

    This command:

    BCP issue.dbo.demo_task out C:\temp\test.txt -c -t"," -r 0xA -T -S .\SQLExpress

    Adds 0xA as a record terminator. -r 0xA.  It is undocumented before SQS 2008R2.

    Look at the format file specs to get all of the variations on converging BCP references.


    ¯\_(ツ)_/¯


    • Edited by jrv Friday, October 3, 2014 6:54 PM
    • Marked as answer by Bangaaram Friday, October 3, 2014 8:55 PM
    Friday, October 3, 2014 6:54 PM
  • Typically CR (carriage return, \r, character 0x0D) followed by LF (line feed, \n, 0x0A) is the line-terminator for Windows.

    The bcp documentation says the default is \n, which should mean that lines are terminated by character 0x0A only. If that's not the case (and the default is really \r\n), then the documentation is in error because it specifically says "The default [for the -r option] is \n (newline character)".

    In any case, you and I are saying the same thing: The easiest way to deal with this problem is to generate the output file correctly in the first place. If its easiest for the file to have \n only as the line terminator, then generate it that way to begin with and then you don't have to worry about converting it afterwards.


    -- Bill Stewart [Bill_Stewart]



    Friday, October 3, 2014 7:30 PM
    Moderator
  • JRV, your bcp query works like a charm. No need of additional conversions. I didn't have the additional switch on my query earlier and that was causing me issue until now. I'm now able to move the file to UNIX and load into the Oracle database without any issues with the line feeds.

    Thanks a ton :)

    Just an FYI, the query I was running previously which was creating the CRs was:

    BCP issue.dbo.demo_task out C:\temp\test.txt -c -t, -T -S

    Cheers!


    Known is a DROP, Unknown is an OCEAN.

    Friday, October 3, 2014 8:54 PM
  • As Bill pointed out;  the default is \n which is known as the "line terminating sequence" in Windows. When not specified or when specifid as \n Windows adds a CR to the sequence. It is also known in the teletype world as the EOR sequence. This sequence historically was CrLf or 0xD 0xA or Crtl-M, Ctrl-L.  " 0xA" is the traditional new line character but Windows uses a new line sequence of the old TTY characters due to IBM legacy. AT&T Unix defined EOR and the terminating sequence as a single line feed character.

    Confusing huh?

    Oracle on Unix can be altered per session or per database to use the Windows terminator by default.

    BCP is regularly used to port data to Oracle on Unix so we can be certain it is able to do this.  With a format file we can do almost any conversions needed.


    ¯\_(ツ)_/¯

    Friday, October 3, 2014 9:30 PM