none
How to call DQS matching policy without UI

    Question

  • Hi All,

    Is there any way to call DQS matching policy without UI provided by Data Quality Client?

    I have an existing table included Name & Address columns, and then I created KB included two domains named dm_Name & dm_Address.

    I want to match these columns to itself by using DQS matching policy.

    Ex.  Maybe I can call DQS by using command line/dll/others with variables (like data source, column name, column-to-domain mapping rules, weights setting, matching result destination...etc) without executing from UI.

    Thanks in advanced.

    Thursday, March 15, 2012 10:29 AM

Answers

  • I don't think that is possible for the moment.
    It sure isn't possible in SSIS. (but I think they are working on it)

    MCTS, MCITP - Please mark posts as answered where appropriate.


    That's is correct. No API or SSIS transformation to do that in this version.

    Help Others! Don't forget to mark your thread as answered

    Rafael Salas - SSIS and more

    Thursday, March 15, 2012 2:33 PM
  • I see 2 options:

    A) In SSIS you can use fuzzy grouping/lookup transfromations to do matching. This will not leverage DQS kb at all.

    B) Use the MDS-DQS integration to do matching via xcel add-in. This is more of an interactive matchng process adn requires bringing MDS into the picture.

    Notice that I am new to DQS/MDS so there may be other ways.


    Help Others! Don't forget to mark your thread as answered

    Rafael Salas - SSIS and more

    Thursday, March 15, 2012 4:33 PM

All replies

  • I don't think that is possible for the moment.
    It sure isn't possible in SSIS. (but I think they are working on it)

    MCTS, MCITP - Please mark posts as answered where appropriate.

    Thursday, March 15, 2012 12:06 PM
  • I don't think that is possible for the moment.
    It sure isn't possible in SSIS. (but I think they are working on it)

    MCTS, MCITP - Please mark posts as answered where appropriate.


    That's is correct. No API or SSIS transformation to do that in this version.

    Help Others! Don't forget to mark your thread as answered

    Rafael Salas - SSIS and more

    Thursday, March 15, 2012 2:33 PM
  • Thanks for your reply, but how can I should do to accomplish about this matching task at this moment?

    Thursday, March 15, 2012 2:55 PM
  • I see 2 options:

    A) In SSIS you can use fuzzy grouping/lookup transfromations to do matching. This will not leverage DQS kb at all.

    B) Use the MDS-DQS integration to do matching via xcel add-in. This is more of an interactive matchng process adn requires bringing MDS into the picture.

    Notice that I am new to DQS/MDS so there may be other ways.


    Help Others! Don't forget to mark your thread as answered

    Rafael Salas - SSIS and more

    Thursday, March 15, 2012 4:33 PM
  • I forgot option C

    C) a matching project inside of DQS client. once again this is an interactive solution.


    Help Others! Don't forget to mark your thread as answered

    Rafael Salas - SSIS and more

    Thursday, March 15, 2012 5:08 PM
  • Thanks your suggestions, Rafael!

    So if I need to do this matching automatically(night job/trigger by other services), fuzzy grouping/lookup function is the only choice at this moment, right?

    Friday, March 16, 2012 1:55 AM
  • Hi Jackie,

    Can you tell us a bit more about your scenario - in what kind of situation do you do address matching this often?

    Is the data coming nightly something that could have existing records and you just want to find those to present duplicates? Do you just want to compare to data within that set to prevent duplicate inserts, or you need to compare the new data to the existing data in the database?

    I just want to know to tell the product team why the demand for matching in SSIS/automated fashion may be important for you the customers.

    Thanks, Jason


    Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation - http://support.microsoft.com/select/default.aspx?target=assistance

    Friday, March 16, 2012 5:44 AM
    Moderator
  • Hi Jason,

    There're two steps in my scenario, 
    1. Clean-up my own table(included Name, Addr, City...etc) which may exist duplicates.  --> self-matching automatically
    2. After that, when user create new account through UI (provide the input like name, addr, city...), it may call/send the input to lookup with my own table, check this input exist or not, and then send back the result/suggestion to user what's the next step.  --> call services

    Hope you understand that, and please feel free to reply if you have any question.

    BTW, is it possible to use the store procedures exist in DQS_MAIN database to call DQS matching function?

    Thanks in advanced.


    • Edited by Jackie_tw Friday, March 16, 2012 6:38 AM
    Friday, March 16, 2012 6:34 AM
  • Hi All,

    I found some APIs from MDS document and there name are start with DataQuality.

    I'm not sure these methods could do all matching jobs.

    Does anyone try these before?

    Reference URL : http://msdn.microsoft.com/de-de/library/microsoft.masterdataservices.services.servicecontracts.iservice(v=sql.110).aspx 

    Public method DataQualityCleansingOperationCreate Infrastructure. Creates an operation request for DQS quality cleansing.
    Public method DataQualityInstalledState Infrastructure. Returns the state of the installed DQS.
    Public method DataQualityKnowledgeBasesGet Infrastructure. Retrieves the knowledge bases of the DQS.
    Public method DataQualityMatchingOperationCreate Infrastructure. Creates an operation request for DQS matching.
    Public method DataQualityOperationPurgeOperation Infrastructure. Returns a purge operation for DQS operation.
    Public method DataQualityOperationResultsGet Infrastructure. Retrieves the results of a completed DQS operation.
    Public method DataQualityOperationStart Infrastructure. Starts a DQS operation.
    Public method DataQualityOperationStatus Infrastructure. Retrieves the status of the DQS operation.
    Public method DataQualityStoreInputData Infrastructure. Retrieves the data for the data quality store input.


    Friday, March 16, 2012 7:14 AM
  • Can you tell us a bit more about your scenario - in what kind of situation do you do address matching this often?

    Jason,

    There are several cases where unattended matching is required. In previous project I worked, we included matching logic as part of a nightly ETL to deduplicate individual, household and company records. 2 reasons for doing it that way comes to my mind:
    1. High volume of records. This batch matching was a first pass and 'good enough'.
    2. Data stewards could sample and approve/reject results later and request adjustments to matching logic.

    This is a very common approach in marketing DBs of prospects.


    Help Others! Don't forget to mark your thread as answered

    Rafael Salas - SSIS and more

    Saturday, March 17, 2012 10:54 PM
  • Hi All,

    I found some APIs from MDS document and there name are start with DataQuality.

    I'm not sure these methods could do all matching jobs.

    Does anyone try these before?

    Reference URL : http://msdn.microsoft.com/de-de/library/microsoft.masterdataservices.services.servicecontracts.iservice(v=sql.110).aspx 

    Public method DataQualityCleansingOperationCreate Infrastructure. Creates an operation request for DQS quality cleansing.
    Public method DataQualityInstalledState Infrastructure. Returns the state of the installed DQS.
    Public method DataQualityKnowledgeBasesGet Infrastructure. Retrieves the knowledge bases of the DQS.
    Public method DataQualityMatchingOperationCreate Infrastructure. Creates an operation request for DQS matching.
    Public method DataQualityOperationPurgeOperation Infrastructure. Returns a purge operation for DQS operation.
    Public method DataQualityOperationResultsGet Infrastructure. Retrieves the results of a completed DQS operation.
    Public method DataQualityOperationStart Infrastructure. Starts a DQS operation.
    Public method DataQualityOperationStatus Infrastructure. Retrieves the status of the DQS operation.
    Public method DataQualityStoreInputData Infrastructure. Retrieves the data for the data quality store input.


    Looks like they are not methods/properties that we could call..

    "This API supports the SQL Server 2012 infrastructure and is not intended to be used directly from your code"


    Chandra Sekhar

    Monday, March 19, 2012 8:15 PM
  • Hi All,

    Please vote for the following connect suggestions around automation via SSIS/web services if you agree:

    https://connect.microsoft.com/SQLServer/feedback/details/695885/dqs-run-matching-policy-from-ssis

    https://connect.microsoft.com/SQLServer/feedback/details/696741/dqs-access-matching-cleaning-via-web-service-or-other-endpoint

    Jason, just to give you a bit of feedback from other customers that we've seen. Essentially matching should be an automated task, e.g. you've got 100 new customers today and you want matching to find matches for those 100 customers against your existing store of say 1 million customers. Then, the user involvement should be simply accepting or rejecting the matches, rather than the user actually carrying out the matching as they would with DQS/MDS in Excel.

    Thanks

    Jeremy


    http://blogs.adatis.co.uk/blogs/jeremykashel/default.aspx If a post answers your question, please click the Mark As Answer button.

    Thursday, March 22, 2012 11:34 AM
  • Absolutely agree. Unless we have a completely wrong approach, as we could do cleasing operations through SSIS, we should run matching operations as well.

    I have a similar situation as Jeremy: we have to check if in a new list of hotels that give us an external provider, exist new or just existing ones reviewing fields like name, location, etc.

    Friday, March 23, 2012 10:39 AM
  • Absolutely agree. Unless we have a completely wrong approach, as we could do cleasing operations through SSIS, we should run matching operations as well.

    I have a similar situation as Jeremy: we have to check if in a new list of hotels that give us an external provider, exist new or just existing ones reviewing fields like name, location, etc.


    Until DQS Matching is included in SSIS, you can help yourself with the SSIS Lookup component and the Fuzzy Lookup component (if you have Enterprise edition).

    MCTS, MCITP - Please mark posts as answered where appropriate.

    Friday, March 23, 2012 10:43 AM
  • I know, but I suppose that DQS must be more powerful than the fuzzy components of SSIS
    Friday, March 23, 2012 10:57 AM
  • I guess so too, but we'll have to wait :)

    MCTS, MCITP - Please mark posts as answered where appropriate.

    Friday, March 23, 2012 10:59 AM
  • You can mimic Fuzzy Lookup functionality within a matching policy if you create a knowledge base in a manner that supports it. It took a couple of tries but I found a way. If you stage your data in such a manner that includes the original PKs of all the potentially matching data then you can extrapolate the results back out of it with a self join on the PIVOT_MARK column that is provided in the matching results. You can also determine your original PK for both records in the same self join as well. You just use the 'pivot' record as a means to collapse the hierarchy of records generated in the matching results. While you can't automate a matching policy for now you can definitely take advantage of what the matching policies offer that Jeremy outlines as advantages in his comparison on his blog. I'm not saying what I did is brilliant or anything but it is a way to leverage the matching policy functionality because of its advancements over the fuzzy lookup and fuzzy grouping.


    Brian


    • Edited by MrZaphod Friday, July 06, 2012 2:57 PM
    Friday, July 06, 2012 2:56 PM