none
Fuzzy Matching Performance RRS feed

  • Question

  • We are running fuzzy matching and fuzzy grouping processes to get the best matches between source and reference records.

    The performance of the SSIS degrades greatly with the increase in the volume of reference records.

    We have tried increasing the buffer size, threads and also using sql commands within the SSIS within the package instead of views.

    Looks getting better but not really satisfactory.

    Would like to know whether anyone has faced this issue and has a solution /suggestion to the problem.

    Thanks in advance

    Wednesday, February 3, 2010 3:08 PM

Answers

  • Hi,

    The primary determinant of Fuzzy Grouping and Fuzzy Lookup performance is data size. So we have to keep the size of the column as small as possible.

    The following article will explain the performance of that two transforms, and it also provides some solution to increase performace.

    Understanding performance section:
    http://msdn.microsoft.com/en-us/library/ms345128(SQL.90).aspx#fzdtssql05_topic6



    Please feel free to let me know if I’ve misunderstood anything.
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    • Marked as answer by Zongqing Li Tuesday, February 9, 2010 6:06 AM
    Friday, February 5, 2010 3:22 AM
  • A small column size may improve performance but it will degrade the quality of the matches.

    Lower similarity scores = more matches = lower performance.


    RE Lookups:

    Have you got Exhaustive set to False?

    Are you maintaining your indexes or recreating them each time? You can probably improve performance by persisting these.

    RE Grouping:

    Memory will be the blocker for performance of this component.

    Cheers, James

    PS: These articles may be of use: http://www.bimonkey.com/tag/fuzzy/


    James Beresford @ http://www.bimonkey.com SSIS / MSBI Consultant in Sydney, Australia
    • Marked as answer by Zongqing Li Tuesday, February 9, 2010 6:06 AM
    Friday, February 5, 2010 4:14 AM

All replies

  • Hi,

    The primary determinant of Fuzzy Grouping and Fuzzy Lookup performance is data size. So we have to keep the size of the column as small as possible.

    The following article will explain the performance of that two transforms, and it also provides some solution to increase performace.

    Understanding performance section:
    http://msdn.microsoft.com/en-us/library/ms345128(SQL.90).aspx#fzdtssql05_topic6



    Please feel free to let me know if I’ve misunderstood anything.
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    • Marked as answer by Zongqing Li Tuesday, February 9, 2010 6:06 AM
    Friday, February 5, 2010 3:22 AM
  • A small column size may improve performance but it will degrade the quality of the matches.

    Lower similarity scores = more matches = lower performance.


    RE Lookups:

    Have you got Exhaustive set to False?

    Are you maintaining your indexes or recreating them each time? You can probably improve performance by persisting these.

    RE Grouping:

    Memory will be the blocker for performance of this component.

    Cheers, James

    PS: These articles may be of use: http://www.bimonkey.com/tag/fuzzy/


    James Beresford @ http://www.bimonkey.com SSIS / MSBI Consultant in Sydney, Australia
    • Marked as answer by Zongqing Li Tuesday, February 9, 2010 6:06 AM
    Friday, February 5, 2010 4:14 AM