The SORT Command Drops Most Records


  • Thursday, January 24, 2013 3:49 PM
     
     

    Hello,

    Believe it or not, the command line SORT program that has been a part of Windows/DOS for 30 years seems not to be working.

    I have a text file where each line of text is about 400 characters long. I want to sort starting with the first character. The input file is about 24MB. The output file is about 3.7MB. Obviously, much of the input file is missing from the output file.

    I have tried various values for the /M parameter but it seems to make no difference.

    Here's a command line I tried:
    sort /M 5600 MyInput.txt /o MySortedOut.txt

    Does anyone know why this happens? I'm not looking for alternatives so please do not respond with, "Try this..." or "Try PowerShell....". I am trying to determine why the SORT command is not working correctly.

    Thanks

All Replies

  • Thursday, January 24, 2013 4:09 PM
     
     

    Just out of curiosity, why are you limiting the memory usage to 5600 KB? Have you tried not specifying a memory limit?


    sort MyInput.txt /o MySortedOut.txt
  • Thursday, January 24, 2013 4:47 PM
     
     

    Does anyone know why this happens? I'm not looking for alternatives so please do not respond with, "Try this..." or "Try PowerShell....". I am trying to determine why the SORT command is not working correctly.

    Thanks

    I can't duplicate your observation. Is it based on a test performed on just one machine? If so then there is a serious risk of you falling into the infamous Fleischmann and Pons trap.

    It would also help if you posted some sample records.

  • Saturday, January 26, 2013 3:47 AM
     
     

    Does anyone know why this happens? I'm not looking for alternatives so please do not respond with, "Try this..." or "Try PowerShell....". I am trying to determine why the SORT command is not working correctly.

    Thanks

    I can't duplicate your observation. Is it based on a test performed on just one machine? If so then there is a serious risk of you falling into the infamous Fleischmann and Pons trap.

    It would also help if you posted some sample records.

    I've tried it on three machines, each with at least 8 GB RAM. I have also tried the SORT command without the /M option.

    Again, each "record" in the file is about 400 characters (ASCII). I certainly can't post a 24MB file but the data is all text -- names and addresses, etc., nothing exotic.

    Try to duplicate it with a 24+MB file, please.

    Thanks

  • Saturday, January 26, 2013 8:06 AM
     

    Try to duplicate it with a 24+MB file, please.

    Thanks

    The VBScript further down creates a text file of a little over 24 MBytes. Each record consists of a random number plus a fixed string of 420 characters. I sorted it with the command

    sort MyInput.txt /o MySortedOut.txt

    The output is exactly the same size as the input, as expected. My guess is that your problem has nothing to do with file size but that your data contains embedded "end of file markers" ($1a) which would tell sort.exe that this is the end of the file. Did you run any tests with large text files generated in a different way, e.g. like so: dir c:\ /s > Bigfile.txt?

    sRecord = "John Doe, 55 Main Street, Long Island, 0800 1234 5678, January 5 1970 John Doe, 55 Main Street, Long Island, 0800 1234 5678, January 5 1970 John Doe, 55 Main Street, Long Island, 0800 1234 5678, January 5 1970 John Doe, 55 Main Street, Long Island, 0800 1234 5678, January 5 1970 John Doe, 55 Main Street, Long Island, 0800 1234 5678, January 5 1970 John Doe, 55 Main Street, Long Island, 0800 1234 5678, January 5 1970"
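    ' Create (or overwrite) the test file.
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oBigFile = oFSO.CreateTextFile("d:\MyInput.txt", True)
    
    ' Write enough records to total roughly 24 MB; the + 9 allows for the
    ' numeric prefix added to each line.
    For i = 0 To Int(24000000 / (Len(sRecord) + 9))
        ' A random leading number gives sort.exe something to reorder.
        oBigFile.WriteLine Int(1000000000 * Rnd()) & " " & sRecord
    Next
    oBigFile.Close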
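
    One way to check the end-of-file-marker guess directly is to scan the input in binary mode and list where any 0x1A (Ctrl-Z) bytes occur. The following is only a minimal Python sketch, not part of the original reply; the function name find_ctrl_z and its chunk_size parameter are made up for illustration.

```python
def find_ctrl_z(path, chunk_size=1 << 20):
    """Return the byte offsets of every 0x1A (Ctrl-Z) byte in the file.

    The file is opened in binary mode so that no text-mode handling
    can hide or reinterpret the byte we are looking for.
    """
    offsets = []
    base = 0  # offset of the current chunk within the file
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Collect every occurrence inside this chunk.
            pos = chunk.find(b"\x1a")
            while pos != -1:
                offsets.append(base + pos)
                pos = chunk.find(b"\x1a", pos + 1)
            base += len(chunk)
    return offsets
```

    Calling find_ctrl_z("MyInput.txt") returns an empty list for a clean file. If the guess is right, the first offset returned should sit close to the size of the truncated output file, because that is where reading would have stopped.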



  • Sunday, January 27, 2013 5:15 AM
     
     

    My guess is that your problem has nothing to do with file size but that your data contains embedded "end of file markers" ($1a) which would tell sort.exe that this is the end of the file.

    You are 100% correct!! Our client sent us a bad file with an embedded 0x1A. Thank you, and very well done!!!!