none
Problems with merge join

    Question

  • Hello everyone

    Am having a raw file source and oledb source and am using a merge join (left outer join)

            raw file            oledb source
                     merge left join

    AM dealing with 10 million records.My problem is if i generate raw file with selected
    data .My join works perfectly whereas if i generate the file for entire 10 million rows and use the file for merge join, my merge join fails (it includes those records for which it succeeded above) .can someone throw some light on this?

                          
    Thursday, October 10, 2013 7:40 AM

Answers

  • hi every one,

    The issue got resolved !!

    it was the problem with way i was generating the raw file .I was using conditional split  and performed some transformations and did a union all which disturbed the sort order.

    Hence rather than splitting the data i took it in script component and used if else block (for achieving the objective of conditional split and hence did the trick for me)

    Thanks

    • Marked as answer by Nishink Friday, October 11, 2013 7:55 AM
    Friday, October 11, 2013 7:55 AM

All replies

  • Merge join is a asynchronous transformation which will cause performance hit for large data. Much better option would be to stage file data to a staging table using data flow task. Then use Execute SQL Task to do LEFT JOIN between Staging and your other source table. 
    Thursday, October 10, 2013 8:04 AM
  • Hi Visakh,

    thanks for the response but the package is existing one and i dont want to add any table .The existing package works fine . i had modified few things for the file generation and i dont want to retain data again as already am having it in raw file so it will be rework for me having it in raw file and again in table and all

    Thursday, October 10, 2013 8:13 AM
  • Hi Visakh,

    thanks for the response but the package is existing one and i dont want to add any table .The existing package works fine . i had modified few things for the file generation and i dont want to retain data again as already am having it in raw file so it will be rework for me having it in raw file and again in table and all

    Thursday, October 10, 2013 8:13 AM
  • But using merge join will perform very poorly for the incoming data as its a blocking transform. So yo've to either live with it or do suggested changes.

    Whats the error you got? Is it related to timeout?

    Thursday, October 10, 2013 8:24 AM
  • am not getting any error .But merge join is failing. the same records exist on left as well as right but still the merge join is failing
    Thursday, October 10, 2013 10:19 AM
  • Sorry didnt get that. If you're not getting any error then what do you mean by merge join is failing? Do you mean you're not getting desired output?
    Thursday, October 10, 2013 10:26 AM
  • Yes visakh,

    i meant i am not getting the correct output!!!

    Thursday, October 10, 2013 2:37 PM
  • if the raw file is having 10M rcds, and you are using merge join in SSIS to perform this, I think this is not a good approach. Merge join component requires sorted input, so you would sort the 10M raw file rcds...As Visakh16 mentioned, you can stage the raw file data in staging table and then use T-SQL to perform load. It will be good from performance prespective..

    Thanks, hsbal

    Thursday, October 10, 2013 5:23 PM
  • Harry,

    the data is sorted when the raw file is generated itself.And am working on the package as a part of enhancement,In the Existing package this task takes 15 mins.i think putting the data again in table will be extra overhead.More over if i generate raw file with selected
    data .My join works perfect whereas if i generate the file for entire 10 million rows and use the file for merge join with oledb source it is not working(also for rows for which it worked perfectly before).

    Thursday, October 10, 2013 5:45 PM
  • hi every one,

    The issue got resolved !!

    it was the problem with way i was generating the raw file .I was using conditional split  and performed some transformations and did a union all which disturbed the sort order.

    Hence rather than splitting the data i took it in script component and used if else block (for achieving the objective of conditional split and hence did the trick for me)

    Thanks

    • Marked as answer by Nishink Friday, October 11, 2013 7:55 AM
    Friday, October 11, 2013 7:55 AM