SSIS Fuzzy Matching

  • Question

  • Is there any way to use the fuzzy lookup functionality (that exists in SSIS) from .NET code?  I want to call a .NET class rather than an SSIS package.

    Thank you.

    -- AK

    Friday, January 6, 2006 9:49 PM

All replies

  • There is a managed API for loading and running SSIS packages; see Books Online. You can also use the SSIS ADO.NET provider to execute an SSIS package and directly access the data it produces (convenient if you need the data in code rather than in a staging table).
    Saturday, January 7, 2006 12:53 PM
  • Thank you, Michael.

    I looked in Books Online and could find the API and everything else, except what I needed: a way to call the fuzzy matching method directly.  I realize that I could call an SSIS package, but that is exactly what I want to avoid.  Including an SSIS package slows down the process, and I need a fast response.

    -- AK

    Sunday, January 8, 2006 3:35 AM
  • I see what you want. No, there is no way to avoid running a package.

    Why do you think including a package slows down the process? How have you measured this (as there is no way to run a transform without a package)?

    Sunday, January 8, 2006 7:19 AM
  •  Michael Entin SSIS wrote:

    I see what you want. No, there is no way to avoid running a package.

    Why do you think including a package slows down the process? How have you measured this (as there is no way to run a transform without a package)?

    Michael,

    For small, discrete operations I have found the package startup time to be a burden. For example, I had a package that did my custom logging for me. I called it from every event handler in which I wanted to do logging, so this package was being called very often indeed. The startup time of the package on each call made it unworkable, so I resorted to using a sproc. There is nothing wrong with that, of course, but I would have liked to keep all the functionality in SSIS.

    I hope package parts (or whatever it will be called) in the next version will alleviate this.

    -Jamie

    Sunday, January 8, 2006 10:18 AM
    Moderator
  • The real issue here, as I see it, is that people want to leverage the fuzzy routines, particularly lookup, in Web and Windows Forms applications: data-entry scenarios where you could use the lookup to find existing entries that “look like the one I have here” to prevent duplicate registration, or for enhanced searching.

    We can now use fuzzy matching in the back end of ETL to clean data, but would it not be cool to leverage this technology to prevent the problem in the first place? Fuzzy matching in the back end still makes sense for combining multiple sources of data, but to prevent duplication or just plain bad data entry within a single system you would want a non-SSIS, or rather non-package-dependent, API, rather like the one we now have for data mining.

    Jamie MacLennan does a nice demo of data entry in a web form that leverages a DM model to determine how likely the data values are compared to the model. This can detect your 13-year-old father of 4 with 3 cars, but it does not work on individual fields, and fuzzy would. No doubt I am not the first person to ask for this...

    Back to the original point: the only way to leverage fuzzy matching today is to use a package, and firing a package for a simple one-field (one-row) lookup does not scale.

    Sunday, January 8, 2006 3:25 PM
    Moderator
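
    To illustrate the kind of package-independent, in-process lookup described in the post above, here is a minimal sketch in Python. It does not use the SSIS Fuzzy Lookup algorithm; it substitutes the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The function names, the sample data, and the 0.8 threshold are all hypothetical choices made for illustration, and a linear scan like this would not scale to millions of reference rows without an index.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def fuzzy_lookup(candidate, reference, threshold=0.8, limit=5):
    """Return up to `limit` reference entries that look like `candidate`.

    Similarity comes from difflib.SequenceMatcher (a stand-in, NOT the
    SSIS Fuzzy Lookup algorithm). The threshold is an assumed cut-off.
    """
    target = normalize(candidate)
    scored = []
    for entry in reference:
        score = SequenceMatcher(None, target, normalize(entry)).ratio()
        if score >= threshold:
            scored.append((entry, score))
    # Best matches first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical existing registrations checked at data-entry time.
existing = ["Jonathan Smith", "Joanna Smythe", "Maria Garcia"]
matches = fuzzy_lookup("Jonathon Smith", existing)
for name, score in matches:
    print(f"possible duplicate: {name} ({score:.2f})")
```

    A data-entry form could call something like `fuzzy_lookup` before saving a new record and show the user any likely duplicates, which is the scenario the post argues a package cannot serve one row at a time.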
  • Thank you, Michael, Jamie, and Darren, for clarifying the issue.

    Yes, as Jamie wrote, the startup time for an SSIS package is quite significant.  I have a project with a relatively big database (3+ million records) and needed to do fuzzy matching initiated from a UI. The response time was measured in seconds, which was longer than users expected.

    -- AK

    Monday, January 9, 2006 4:33 AM
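
One way to put numbers on the startup cost discussed in this thread is to time repeated launches of the package from the outside. The sketch below is a generic wall-clock timer written in Python; the helper name `time_command` is hypothetical, and the command it times here is a trivial stand-in process so the example runs anywhere. To measure a real package you would, under the assumption that `dtexec` is on the path, pass something like `["dtexec", "/F", r"C:\path\to\package.dtsx"]`, where the package path is a placeholder.

```python
import statistics
import subprocess
import sys
import time


def time_command(command, runs=5):
    """Run `command` `runs` times and return each wall-clock duration in seconds.

    The measurement includes process launch and any startup work, which is
    the overhead being discussed; it does not separate it from the work itself.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            command,
            check=True,  # raise if the process exits with a non-zero code
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in command (assumption): a Python process that does nothing.
# Swap in the dtexec invocation described above to time an actual package.
baseline = time_command([sys.executable, "-c", "pass"], runs=3)
print(f"median launch time: {statistics.median(baseline):.3f}s over {len(baseline)} runs")
```

Comparing the median for a package that does almost nothing against one that performs the lookup gives a rough split between fixed startup overhead and the matching work itself.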