none
How use fuzzy merge for text strings in SQL? RRS feed

  • Question

  • Hi, can anybody help me please with fuzzy merge in MS SQL 2014?

    Table tab1 has one column NAME ( i copy from my Excel)

       NAME
                                                            Incorrect actions when a malfunction is detected - "push" "on the way
                                                                                2 passes of preliminary light signaling at night
                                    The inclusion of the Saut-Ts (TsM) KIO-Saut at a pressure in the TM, other than the charging
                                                                               Reversing with a completely unused shunting route
                                   Violation of the algorithm for a single or periodic check of vigilance SAUT-Ts (TsM), KIO-SAUT
                                            Violation of driving conditions for trains with increased weight and increased length
      Incorrect input of train characteristics in efficiency, KLUB-U (UP), BLOCK at / and the driver at the beginning of the trip
                                        Stop on limiting lift with critical weight without the need for an auxiliary locomotive
                                                                                 Stops of trains on a stage with an enable signal
                         Setting the toggle switch DZ to the position WITHOUT ALSN in the coded area or untimely setting in ALS
                                                                               AT in situations not specified in the classifier
                            AT due to the lack of confirmation of vigilance when following a prohibiting signal without passage
                                                 AT due to non-confirmation by the driver of operability at the request of TSKBM
                   АТ due to excess of speed controlled by a safety device when following a prohibiting signal without passage
                                                                                                          AT in the parking lot
        AT in case of violation of the sequence of traffic lights with a prohibition indication with CLUB-U (UP), BLOCK in / and
                                                                         AT in the absence of registration of pressing RB, RBS

    Table tab2 has such format with two column(NAME, idsnar)

                                                  NAME idspnar
                                                                     Incorrect actions when a malfunction is detected - "push" "on the way -- 1
                                                                                      2 passes of preliminary light signaling at night 2
                                        The inclusion of the Saut-Ts (TsM) KIO-Saut at a pressure in the TM, other than the charging 3
                                                                                  Reversing with a completely unused shunting route 4
                                               Violation of the algorithm for a single or periodic check of vigilance SAUT-Ts (TsM), KIO-SAUT 5
                                                             of driving conditions for trains with increased weight and increased length 6
                Incorrect input of train characteristics in efficiency, KLUB-U (UP), BLOCK at / and the driver at the beginning of the trip 7
                                                      Stop on limiting lift with critical weight without the need for an auxiliary locomotive 8
                                                                        Stops of trains where  on a stage with an enable signal was error 9
                           Setting the toggle switch DZ to the position WITHOUT ALSN in the coded area or untimely setting in ALS  400-500 10
                                                                                          AT in situations not specified in the classifier 11
                                        AT due to the lack of confirmation of vigilance when following a prohibiting signal without passage 12
                                                            AT due to non-confirmation by the driver of operability at the request of TSKBM 13
                       АТ due to excess of speed controlled by a safety  1000 device when following a prohibiting signal without passage 14
                                                                                                                     AT in the parking lot 15
    AT in case of violation of the sequence of traffic lights with a prohibition indication with CLUB-U (UP), BLOCK in / and something more 16
                                                                                  AT in the absence of registration of pressing RB, RBS TGV 17

    I need no simple inner join, i need fuzzy join, cause the texts can be very slightly different, somewhere similar text but there are a couple of numbers added, or somewhere some punctuation marks. Of course, completely different texts cat and dogs should not be joined:)  

    how to join these two tables by text strings desired output (key column is NAME it is text variable)

       NAME                                               NAME idspnar
                                                            Incorrect actions when a malfunction is detected - "push" "on the way                                                                  Incorrect actions when a malfunction is detected - "push" "on the way -- 1
                                                                                2 passes of preliminary light signaling at night                                                                                   2 passes of preliminary light signaling at night 2
                                    The inclusion of the Saut-Ts (TsM) KIO-Saut at a pressure in the TM, other than the charging                                     The inclusion of the Saut-Ts (TsM) KIO-Saut at a pressure in the TM, other than the charging 3
                                                                               Reversing with a completely unused shunting route                                                                               Reversing with a completely unused shunting route 4
                                   Violation of the algorithm for a single or periodic check of vigilance SAUT-Ts (TsM), KIO-SAUT                                            Violation of the algorithm for a single or periodic check of vigilance SAUT-Ts (TsM), KIO-SAUT 5
                                            Violation of driving conditions for trains with increased weight and increased length                                                          of driving conditions for trains with increased weight and increased length 6
      Incorrect input of train characteristics in efficiency, KLUB-U (UP), BLOCK at / and the driver at the beginning of the trip             Incorrect input of train characteristics in efficiency, KLUB-U (UP), BLOCK at / and the driver at the beginning of the trip 7
                                        Stop on limiting lift with critical weight without the need for an auxiliary locomotive                                                   Stop on limiting lift with critical weight without the need for an auxiliary locomotive 8
                                                                                 Stops of trains on a stage with an enable signal                                                                     Stops of trains where  on a stage with an enable signal was error 9
                         Setting the toggle switch DZ to the position WITHOUT ALSN in the coded area or untimely setting in ALS                        Setting the toggle switch DZ to the position WITHOUT ALSN in the coded area or untimely setting in ALS  400-500 10
                                                                               AT in situations not specified in the classifier                                                                                       AT in situations not specified in the classifier 11
                            AT due to the lack of confirmation of vigilance when following a prohibiting signal without passage                                     AT due to the lack of confirmation of vigilance when following a prohibiting signal without passage 12
                                                 AT due to non-confirmation by the driver of operability at the request of TSKBM                                                         AT due to non-confirmation by the driver of operability at the request of TSKBM 13
                   АТ due to excess of speed controlled by a safety device when following a prohibiting signal without passage                    АТ due to excess of speed controlled by a safety  1000 device when following a prohibiting signal without passage 14
                                                                                                          AT in the parking lot                                                                                                                  AT in the parking lot 15
        AT in case of violation of the sequence of traffic lights with a prohibition indication with CLUB-U (UP), BLOCK in / and AT in case of violation of the sequence of traffic lights with a prohibition indication with CLUB-U (UP), BLOCK in / and something more 16
                                                                         AT in the absence of registration of pressing RB, RBS                                                                               AT in the absence of registration of pressing RB, RBS TGV 17

    How can i correct join two tables ? tables can be different size, in one can be 20 rows and in second 10000 rows. In my case it can be, that difference in the count of lines can be very big. In one dataset 2 row and in second 101010101 rows)) as example. i join these tables cause i need get idsnar column

    Friday, October 18, 2019 11:27 AM

All replies

  • Good day Zerw,

    Do you know how to create new table in SQL Server and how to INSERT data to the table?

    If not, then please inform us.
    If yes, then please provide:

    1) Queries to CREATE your table(s) including indexes
    2) Queries  to INSERT sample data.
    3) The desired result given the sample, as text or image of excel for example.
    4) A short description of the business rules, and how you got 1-2 of the results
    5) Which version of SQL Server you are using (this will help to fit the query to your version).


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Saturday, October 19, 2019 2:43 AM
    Moderator
  • Have you tried the soundex function?

    https://docs.microsoft.com/en-us/sql/t-sql/functions/soundex-transact-sql?view=sql-server-ver15

    Sunday, October 20, 2019 2:14 AM
  • Hi ,

    Thank you for your posting,

     

    Could you  please share us your table structure (CREATE TABLE …) and some sample data (INSERT INTO…) along with your expected result? So that we’ll get a right direction and make some test.

    Also , I guess your requirement. Would you like to join two tables which has the same column name ? Are you worried about performance or do you want to know how to use JOIN?

    If you would like to use join, please refer to following script.

    --drop table a 
    --drop table b
    create table A
    (id int,
    name varchar(20))
    create table B
    (id int,
    job int,
    parent_id int)
     insert into A values 
     (1,'sam'),
     (2,'lily'),
     (3,'moon')
      insert into B values 
     (1,23,1),
     (2,34,2),
     (3,34,4)
     ------INNER JOIN
    select a.*,b.* from a INNER JOIN  b on a.id=b.parent_id
    /*
    id          name                 id          job         parent_id
    ----------- -------------------- ----------- ----------- -----------
    1           sam                  1           23          1
    2           lily                 2           34          2
    */
    ------LEFT OUTER JOIN
     select a.*,b.* from a LEFT OUTER JOIN  b on a.id=b.parent_id
     /*
    id          name                 id          job         parent_id
    ----------- -------------------- ----------- ----------- -----------
    1           sam                  1           23          1
    2           lily                 2           34          2
    3           moon                 NULL        NULL        NULL
     */
     ------RIGHT OUTER JOIN
     select a.*,b.* from a RIGHT OUTER JOIN  b on a.id=b.parent_id
     /*
    id          name                 id          job         parent_id
    ----------- -------------------- ----------- ----------- -----------
    1           sam                  1           23          1
    2           lily                 2           34          2
    NULL        NULL                 3           34          4
     */
      ------FULL OUTER JOIN
     select a.*,b.* from a FULL OUTER JOIN  b on a.id=b.parent_id
     /*
    id          name                 id          job         parent_id
    ----------- -------------------- ----------- ----------- -----------
    1           sam                  1           23          1
    2           lily                 2           34          2
    3           moon                 NULL        NULL        NULL
    NULL        NULL                 3           34          4
     */
    
    
    

    If you are worried about performance , please try to use index. 

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, October 21, 2019 6:07 AM
  • Hi!

    SSIS Fuzzy Lookup transformation is your friend here. You can find the basic documentation about it here: https://docs.microsoft.com/en-us/sql/integration-services/data-flow/transformations/fuzzy-lookup-transformation?view=sql-server-ver15.

    You will quickly find many examples and tutorials how to use it on the Web.

    Best regards,
    Dejan Sarka

    Monday, October 21, 2019 7:41 AM