Need help writing SQL Query - Capture rows with issues and load the remaining rows into TargetTable

  • Question

  • CREATE TABLE #SourceTable (ID INT, Product VARCHAR(100), Qty int , ErrorMessage varchar(2000))
    CREATE TABLE #TargetTable (ID INT, Product VARCHAR(6), Qty int not null)

    INSERT INTO #SourceTable (ID, Product, Qty)
    SELECT 1, 'Bike' , 10   union all
    SELECT 10, 'Computer', 8   union all
    SELECT 18, 'Phone' , 5    union all
    SELECT 23, 'Tablet' , 45   union all
    SELECT 100, 'Car', NULL 

    SELECT * FROM #SourceTable

    I have to load data from the SourceTable to the TargetTable every day, but the load should not fail even if a few source rows have issues.
    In this example, I have 5 rows that I have to load into the Target table.
    By looking at the data we know the row with ID=10 will fail, because Product is longer in Source than the Target column allows.
    Also, the row with ID=100 will fail because Qty is NULL in the Source table, but it is a NOT NULL column in the Target table.

    The load should not fail because of these 2 rows. The remaining 3 rows should make it to the Target table, and for the 2 rows with issues, the ErrorMessage column in the same SourceTable should be updated with the error messages they would have failed with during the load.

    This is just a small table I gave as an example; in my real case I have more columns and heavy data. There might be many rows which could cause the load to fail.

    Not in SSIS. This should be a SQL query, like INSERT INTO Target ... SELECT ... FROM SourceTable.

    Friday, December 21, 2018 9:21 PM

All replies

  • CREATE TABLE #SourceTable (ID INT, Product VARCHAR(100), Qty int , ErrorMessage varchar(2000))
    CREATE TABLE #TargetTable (ID INT, Product VARCHAR(6), Qty int not null)
    
    INSERT INTO #SourceTable (ID, Product, Qty)
    SELECT 1, 'Bike' , 10   union all
    SELECT 10, 'Computer', 8   union all
    SELECT 18, 'Phone' , 5    union all
    SELECT 23, 'Tablet' , 45   union all
    SELECT 100, 'Car', NULL 
    
    Update #SourceTable
    Set ErrorMessage='truncation occurs'
    WHERE Len(Product)>6;
    
    Update #SourceTable
    Set ErrorMessage=' Qty is null'
    WHERE Qty is null
    
    
    MERGE #TargetTable t
    USING #SourceTable s ON s.ID = t.ID
    WHEN NOT MATCHED AND s.ErrorMessage IS NULL THEN
        INSERT (ID, Product, Qty) VALUES (s.ID, s.Product, s.Qty)
    ;
    
    
    Select * from #TargetTable
    
    SELECT * FROM #SourceTable
    
    drop table #SourceTable,#TargetTable

    Friday, December 21, 2018 9:54 PM
  • Thanks. But I have to capture the errors that occur during the load process and record them in the ErrorMessage column. There might be plenty of reasons the load could fail for a few rows. Even one row can fail because of issues with data in two or more columns, and if one row fails for many reasons, all messages should be concatenated in the ErrorMessage column for that row.
    My real source table has around 15 columns, and an error can occur with data in one or more columns for the same row.
    Friday, December 21, 2018 10:11 PM
  • If you want multiple error message for the same row, change it to:

    Update #SourceTable
    Set ErrorMessage = concat(ErrorMessage, ' Qty is null')
    WHERE Qty is null

    If you want to trap errors that occur when you actually load the data, that is, when you run the INSERT/UPDATE/MERGE or whatever you do, that is not possible. Each statement is atomic, so if one row fails, all other rows are rolled back.
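    To see that atomicity with the sample tables from the question, here is a small sketch (not from the thread): a single INSERT...SELECT wrapped in TRY...CATCH still inserts nothing when any one row fails.

    ```sql
    -- Sketch: the whole INSERT is one atomic statement
    BEGIN TRY
        INSERT INTO #TargetTable (ID, Product, Qty)
        SELECT ID, Product, Qty FROM #SourceTable;
    END TRY
    BEGIN CATCH
        -- ERROR_MESSAGE() reports only the first failure; none of the
        -- five rows were inserted, including the three valid ones.
        SELECT ERROR_MESSAGE() AS LoadError;
    END CATCH;
    ```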


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Friday, December 21, 2018 10:27 PM
  • You can do this using SSIS tasks or using T-SQL.

    Doing it in T-SQL would generally be best, as it uses set-based logic, compared to SSIS, which works in batches.

    In SSIS you can configure the error output within data flow tasks to capture the error code and description, along with column details, for your error rows.

    For capturing business-logic-related errors, you can use an Execute SQL Task, conditional logic inside the data flow, or even a Script Task for doing validations.

    As an example, see how you can do this below:

    https://www.mssqltips.com/sqlservertip/2149/capturing-and-logging-data-load-errors-for-an-ssis-package/

    https://prathy.com/2017/01/validating-data-using-ssis/

    Doing it in T-SQL requires a procedure with separate UPDATE statements that check each condition and append the error details to the error description column.
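    A sketch of that pattern with the sample tables from the question (the rule texts and the '; ' separator are illustrative):

    ```sql
    -- One UPDATE per validation rule. CONCAT treats NULL as empty,
    -- and the "ErrorMessage + '; '" term is NULL when no earlier
    -- message exists, so the separator only appears between messages.
    UPDATE #SourceTable
    SET    ErrorMessage = CONCAT(ErrorMessage + '; ', 'Product longer than 6 characters')
    WHERE  LEN(Product) > 6;

    UPDATE #SourceTable
    SET    ErrorMessage = CONCAT(ErrorMessage + '; ', 'Qty is null')
    WHERE  Qty IS NULL;

    -- Load only the rows that passed every check
    INSERT INTO #TargetTable (ID, Product, Qty)
    SELECT ID, Product, Qty
    FROM   #SourceTable
    WHERE  ErrorMessage IS NULL;
    ```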

    You can also implement this using constraints and rules, as below:

    https://www.red-gate.com/simple-talk/sql/learn-sql-server/robyn-pages-sql-server-data-validation-workbench/


    Visakh

    Saturday, December 22, 2018 9:32 AM
  • Hi Leo00,

    As Erland said, 'Each statement is atomic, so if one row fails, all other rows are rolled back.' When you insert your rows together and one of them raises an error, the whole operation fails.

    To meet your requirement, I suggest you insert your rows one by one. In my following script, you execute the procedure 'CatchErrorDemo' once for each row. Each execution either inserts the row into the target table or updates the error message in the source table.

    Please try it and see if it satisfies your requirement.

    ---drop table #SourceTable
    ---drop table #TargetTable
    
    CREATE TABLE #SourceTable (ID INT, Product VARCHAR(100), Qty int , ErrorMessage varchar(2000))
    CREATE TABLE #TargetTable (ID INT, Product VARCHAR(6), Qty int not null)
    
    INSERT INTO #SourceTable (ID, Product, Qty)
    SELECT 1, 'Bike' , 10   union all
    SELECT 10, 'Computer', 8   union all
    SELECT 18, 'Phone' , 5    union all
    SELECT 23, 'Tablet' , 45   union all
    SELECT 100, 'Car', NULL 
    
    SELECT * FROM #SourceTable
    
    
    go 
    CREATE PROCEDURE CatchErrorDemo
        @a int
    AS
        BEGIN TRY
            INSERT INTO #TargetTable (ID, Product, Qty)
            SELECT ID, Product, Qty
            FROM   (SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn FROM #SourceTable) t
            WHERE  rn = @a
        END TRY
        BEGIN CATCH
            UPDATE a
            SET    a.ErrorMessage = ERROR_MESSAGE()
            FROM   (SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn FROM #SourceTable) t
            JOIN   #SourceTable a ON t.ID = a.ID
            WHERE  t.rn = @a
        END CATCH;
    
    EXEC CatchErrorDemo 1;
    EXEC CatchErrorDemo 2;
    EXEC CatchErrorDemo 3;
    EXEC CatchErrorDemo 4;
    EXEC CatchErrorDemo 5;
    SELECT * FROM #SourceTable
    /*
    ID          Product                                                                                              Qty         ErrorMessage
    ----------- ---------------------------------------------------------------------------------------------------- ----------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1           Bike                                                                                                 10          NULL
    10          Computer                                                                                             8           String or binary data would be truncated.
    18          Phone                                                                                                5           NULL
    23          Tablet                                                                                               45          NULL
    100         Car                                                                                                  NULL        Cannot insert the value NULL into column 'Qty', table 'tempdb.dbo.#TargetTable________________________________________________________________________________________________________00000000000D'; column does not allow nulls. INSERT fails.
    */
    SELECT * FROM #TargetTable
    /*
    ID          Product Qty
    ----------- ------- -----------
    1           Bike    10
    18          Phone   5
    23          Tablet  45
    */
    
    
    



    Hope it can help you.

    Best Regards,

    Rachel


    MSDN Community Support

    Monday, December 24, 2018 6:58 AM
  • I came up with the query below and it is working. But as I have millions of rows to load into the TargetTable, can I modify this query to improve performance?

    DECLARE @ID           INT
    DECLARE @Product      VARCHAR(100)
    DECLARE @Qty          INT
    DECLARE @ErrorMessage VARCHAR(2000)

    DECLARE SourceCur CURSOR FOR
    SELECT ID, Product, Qty FROM #SourceTable
    OPEN SourceCur
    FETCH NEXT FROM SourceCur INTO @ID, @Product, @Qty
    BEGIN_OF_LOOP:
    BEGIN TRY
        WHILE @@FETCH_STATUS = 0
        BEGIN
            INSERT INTO #TargetTable (ID, Product, Qty)
            VALUES (@ID, @Product, @Qty)
            FETCH NEXT FROM SourceCur INTO @ID, @Product, @Qty
        END
    END TRY
    BEGIN CATCH
        SELECT @ErrorMessage = ERROR_MESSAGE()
        UPDATE #SourceTable SET ErrorMessage = @ErrorMessage WHERE ID = @ID
        FETCH NEXT FROM SourceCur INTO @ID, @Product, @Qty
        GOTO BEGIN_OF_LOOP
    END CATCH;
    CLOSE SourceCur;
    DEALLOCATE SourceCur;


    Wednesday, December 26, 2018 3:00 PM
  • Unfortunately, for the general case you do need to do something like that to be able to catch individual errors.

    There are a few things you can do to improve performance. The first is to make the cursor STATIC LOCAL, since the default cursor type is dynamic and not very efficient. I would not expect any dramatic performance improvement in this case, but at least it is a simple change.
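    That is a one-line change to the cursor declaration in the earlier post:

    ```sql
    -- STATIC materializes the result set in tempdb once;
    -- LOCAL scopes the cursor to this batch/procedure.
    DECLARE SourceCur CURSOR STATIC LOCAL FOR
    SELECT ID, Product, Qty FROM #SourceTable
    ```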

    Other ways to speed things up are more complex. What you can do is attempt to insert a number of rows at a time and, if that fails, fall back to a loop which runs them one by one. Exactly how many rows there should be in a batch depends on how common you expect errors to be. If you expect one row in a thousand to have an error and you have a batch size of a thousand, most of the batches will be wasted effort, so in that case a batch size of 50 or 100 is better.

    The technique would be to add a batchno column to the source, and first populate this column so that there are 50 (or whatever) rows with batchno = 1, 50 rows with batchno = 2, etc. When a batch fails with an error, you trap that, renumber its rows with new batch numbers, and retry with a batch size of 1.
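    A hypothetical sketch of that batching idea, using the sample tables from the question; the batchno column, @batchsize, and the RetryCur cursor are illustrative names, and here the fallback retries failed batches row by row rather than renumbering:

    ```sql
    ALTER TABLE #SourceTable ADD batchno int NULL;
    GO  -- new batch so the added column is visible below

    DECLARE @batchsize int = 50,
            @b int = 1, @maxb int,
            @ID int, @Product varchar(100), @Qty int;

    -- Number the rows so each batch holds @batchsize rows
    WITH numbered AS
    (
        SELECT batchno, rn = ROW_NUMBER() OVER (ORDER BY ID)
        FROM   #SourceTable
    )
    UPDATE numbered SET batchno = (rn - 1) / @batchsize + 1;

    SELECT @maxb = MAX(batchno) FROM #SourceTable;

    WHILE @b <= @maxb
    BEGIN
        BEGIN TRY
            -- Optimistic path: insert the whole batch in one statement
            INSERT INTO #TargetTable (ID, Product, Qty)
            SELECT ID, Product, Qty FROM #SourceTable WHERE batchno = @b;
        END TRY
        BEGIN CATCH
            -- The batch was rolled back as a whole: retry its rows one by one
            DECLARE RetryCur CURSOR STATIC LOCAL FOR
                SELECT ID, Product, Qty FROM #SourceTable WHERE batchno = @b;
            OPEN RetryCur;
            FETCH NEXT FROM RetryCur INTO @ID, @Product, @Qty;
            WHILE @@FETCH_STATUS = 0
            BEGIN
                BEGIN TRY
                    INSERT INTO #TargetTable (ID, Product, Qty)
                    VALUES (@ID, @Product, @Qty);
                END TRY
                BEGIN CATCH
                    UPDATE #SourceTable
                    SET    ErrorMessage = ERROR_MESSAGE()
                    WHERE  ID = @ID;
                END CATCH;
                FETCH NEXT FROM RetryCur INTO @ID, @Product, @Qty;
            END;
            CLOSE RetryCur;
            DEALLOCATE RetryCur;
        END CATCH;

        SET @b += 1;
    END;
    ```

    With the sample data, batches containing the rows with ID=10 and ID=100 would fail, their good rows would be inserted on the retry pass, and the two bad rows would end up flagged in ErrorMessage.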


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Wednesday, December 26, 2018 3:50 PM