Why Data Sort is required for Merge Join Transformation


  • Merge Join is going to walk through two sets in the order that you gave in your input or using the Sort transformation. So, it loads one record from one input and one record from the second input.  If the keys match, it will output the row with information from both inputs.  If the left input has a value that is less than the value in right input, then either the row from the left input is sent out with no right input information (if there is an outer join) or if the join is an inner join, the Merge Join component will get the next row from Left input.  If the right input has a value that is less than the value that in the left input, then if there is a full outer join, output the right input with no left input information otherwise, get the next row from the right input.

    Here is the beauty, SSIS only needs to retain a couple rows in memory.

    What if Microsoft decided that there is no requirement for sorting?  Then in order for the Merge join to work is that it would load all of the rows from one input into memory and then the Merge Join would look up the row in memory.  That means a large amount of memory would be needed.

    By the way, if you want Merge Join behavior without the sort, use a Lookup component.

    Russel Loski, MCT, MCSE Data Platform/Business Intelligence. Twitter: @sqlmovers; blog:

    • Marked as answer by aamertaimor Monday, December 30, 2013 9:29 AM
    Sunday, December 29, 2013 6:35 PM