DQS to solve enterprise data alignment subject


  • Hi, I need to compare two Excel wordbook. I'm beginner with DQS.

    Now, the DQS could be useful, to align two different tables?

    Each table belongs to a different database. Both databases represent the same information, but are misaligned between them.

    In order to solve this issue, do I join the two workbooks in an unique entity to apply DQS, please?

    In general, how is it possible to operate to match two data source misaligned with DQS?


    • Edited by pscorca Sunday, February 26, 2012 7:19 PM adding details
    Saturday, February 25, 2012 11:41 PM

All replies

  • Please define "misaligned."  Can you give an example of the data between the two workbooks?

    Phil Brammer | | Twitter:

    Saturday, March 03, 2012 1:05 AM
  • Hi Phil,

    I intend dis-aligned or not aligned data. A such issue I could have for two SQL tables and the issue it isn't closely related to the nature of the data source (flat file, workbook, SQL table, ...).


    Tuesday, March 06, 2012 11:39 AM
  • I don't get the question still... sorry.

    Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation -

    Friday, March 09, 2012 3:59 AM
  • Hi Jason, I'm worried about your doubts. I think that DQS couldn't be solved a data alignment problem and I think that a such subject it isn't very known in the Microsoft world.

    I try to explain better the issue, with an example. Suppose to have two databases, the first one in a main office named Production, the second one in a sales branch office named Sales. The Production db contains the Products table, with ProductID and ProductName, and the Sales db contains the Orders table, with the OrderID, OrderDate, OrderedProductID and OrderedProductName. Now, it could be an error in OrderedProductID and/or OrderedProductName respect to the same product as registered in the Products table. So, it is necessary to align the products data of the Orders table with the corresponding values of the Products table.



    Friday, March 09, 2012 10:43 AM
  • It is like doing matching between two tables to find similar rows and showing them side by side.

    DQS only does matching on a single source data set, and compares the values within the set to look for duplicates, using the semantic knowledge in the domains and domain values. It doesn't really let you do diffing or matching between two sets like that side by side.

    There isn't a canned way to connect the two similar tables, to find the possible overlaps and make them more correct.

    I think you can make a VIEW to UNION on the selects from the two tables, and see if DQS finds the matches across the two sets, but your goal may not be to eliminate duplicates. DQS would cluster the similar results, so the Data Steward can see them and know about the potential duplicates.

    Another approach may be to use SSIS and the Fuzzy Lookup component, to compare the two sets and find potential similar results across the two tables.

    Just curious, do you use a product that does this today alignment feature?

    Thanks, Jason

    Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation -

    Tuesday, March 13, 2012 9:57 PM
  • Hi Jason,

    there are other market players capable to manage data alignment issues, fe SAS. I think that Microsoft SQL Server solutions should improve about this subject. Your suggests lead to a possible solution. Really, it is more important the data alignment process or the functional view rather than the technologies to solve it.

    Perhaps, I've an alternative way. I think it is possible to assume one of two tables as the master data container. This master table in DQS could become a knowledge base with his domains and rules. In DQS, the second table could be matched respect to the knowledge base and the results produced by the data quality operations could be viewed and approved inside Master Data Services.

    I think that when a new technology is available, it is important to understand where and how it must be used in a business process. In this way, Microsoft technologies could be winning respect to another player that owns a product and not a platform or a technology set.


    Wednesday, March 14, 2012 1:48 PM
  • psorca,

    so what are you trying to 'allign' and what is the reference point? I would assume that the master catalog of products (IDs and names) is the in the production DB and that the sales DB is the one having issues with their product IDs and/or names. For that, before deciding in the toolset you need to explain to us and to yourself what are the rules to resolve the discrepancy. eg

    • The OrderedProductID is correct and can be used to fix the OrderedProductName
    • The OrderedProductName is correct and can be used to fix the OrderedProductID
    • neither The OrderedProductID nor OrderedProductName are relaible, but other attributes can be used such as unit price, some dates, etc

    Some analytics tools like SAS could certanly run some models in the data and present to you patterns, distributions, etc, but that would yous give you  the insight. You would like want/need to analyze those results and then implement a second process to fix the data.

    If you decompose this problem, I am sure you could solve it with SS2012 tools, and you can keep posting your questions here to help you.

    I doubt you will find a silver bullet to do all this for you automatically.

    Please post back as this an interesting problem.

    Help Others! Don't forget to mark your thread as answered

    Rafael Salas - SSIS and more

    Thursday, March 15, 2012 4:57 PM
  • Try Excel Fuzzy Lookup...

    Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation -

    Friday, March 16, 2012 4:43 AM