locked
BULK INSERT yields inconsistent results RRS feed

  • Question

  • I have a really STRANGE issue that's driving me nuts. I am getting inconsistent results when BULK INSERTING data from a tab-delimited text file. As part of my testing, I run the same code on the same file again and again, and I get different results every time! I get this on SQL 2005 and SQL 2012 R2.

    We have an application that imports data from a spreadsheet. The sheet contains section headers with account numbers and detail rows with transactions by date:

    AAAA.1234 /* (account number)*/

    1/1/2015      $150                 First Transaction

    1/3/2015      $24.233              Second Transaction

    BBBB.5678

    1/1/2015      $350                 Third Transaction

    1/3/2015      $24.233              Fourth Transaction

    My Import program saves this spreadsheet at tab-delimited text, then I use BULK INSERT to bring the data into a generic table full of varchar(255) fields. There are about 90,000 rows in each day's data; after the BULK INSERT about half of them are removed for various reasons.

    Next I add a RowID column to the table with the IDENTITY (1,1) property. This gives my raw data unique row numbers.

    I then run a routine that converts and copies those records into another holding table that's a copy of the final destination table. That routine parses though the data, assigning the account number in the section header to each detail row. It ends up looking like this:

    AAAA.1234     1/1/2015      $150          First Purchase

    AAAA.1234     1/3/2015      $24.233              Second Purchase

    BBBB.5678     1/1/2015      $350                 Third Purchase

    BBBB.5678     1/3/2015      $24.233              Fourth Purchase

    My technique: I use a cursor to get the starting RowID for each Account Number: I then use the upper and lower RowIDs to do an INSERT into the final table. The query looks like this:

    SELECT RowID, SUBSTRING(RowHeader, 6,4) + '.UBC1' AS AccountNumber

    FROM   GenericTable

    WHERE RowHeader LIKE '____.____%'

    Results look like this:

    But every time I run the routine, I get different numbers!

    Needless to say, my results are not accurate. I get inconsistent results EVERY TIME. It's maddening! 

    Here is my code, with table, field and account names changed for business confidentiality. I’m open to suggestions; I’m truly stuck. This is a high profile project at my company; any help would be deeply appreciated.

    TRUNCATE TABLE GenericImportTable;

    ALTER TABLE GenericImportTable DROP COLUMN RowID;

    BULK INSERT GenericImportTable FROM '\\SERVER\General\Appname\DataFile.2015.05.04.tab.txt'

    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 6)

    ALTER TABLE GenericImportTable ADD RowID int IDENTITY(1,1) NOT NULL

    SELECT RowID, SUBSTRING(RowHeader, 6,4) + '.UBC1' AS AccountNumber

    FROM GenericImportTable

    WHERE  RowHeader LIKE '____.____%'


    • Edited by Barry Seymour Wednesday, May 6, 2015 11:58 PM Added SQL Server versions
    Wednesday, May 6, 2015 11:58 PM

Answers

  • There are a lot of moving parts here but I suspect you are assuming the addition of the IDENTITY column will assign the IDENITY values in a particular sequence (e.g. same order the rows were originally inserted).  This is not guaranteed.  You may get different values by happenstance simply by rerunning the same process with the same data. 

    If you need the RowID values assigned in a particular order, you'll need to use another technique that honors an ORDER BY clause, such as ROW_NUMBER() OVER(ORDER BY...), an IDENTITY function in a SELECT...INTO...ORDER BY, or assign a sequence number in the application as you create the delimited file.


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Thursday, May 7, 2015 1:08 AM

All replies

  • There are a lot of moving parts here but I suspect you are assuming the addition of the IDENTITY column will assign the IDENITY values in a particular sequence (e.g. same order the rows were originally inserted).  This is not guaranteed.  You may get different values by happenstance simply by rerunning the same process with the same data. 

    If you need the RowID values assigned in a particular order, you'll need to use another technique that honors an ORDER BY clause, such as ROW_NUMBER() OVER(ORDER BY...), an IDENTITY function in a SELECT...INTO...ORDER BY, or assign a sequence number in the application as you create the delimited file.


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Thursday, May 7, 2015 1:08 AM
  • Dan, thank you so much! This is like discovering that your wife Mary is actually named Katherine! I had always assumed that after a BULK INSERT, the rows in the table would be in the same order as they had been in the source file. Evidently that is not an assumption I can make.

    I have a program that converts an XLS file to XLSX, then again to tab-delimited text. I'll add code to the end of that process to insert my 'RowID' column into the text file, then import that. This will give me the RowID I need to ensure my columns are in the correct order.

    Thanks a million!

    Barry Seymour

    Thursday, May 7, 2015 2:19 PM