none
Compare Text from 2 Columns on Row Level and Find Missing Words/Characters RRS feed

  • Question

  • Hi guys,

    I have 2 columns:

    buy bananas today | buy today

    buy apples today | buy today

    I want to create 3rd column telling me:

    bananas
    apples

    => I need to find words/characters which are not in the 2nd column. 

    Any ideas how to achieve that?

    DZ

    Wednesday, March 21, 2018 1:09 PM

Answers

  • and almost the same but with my algorythm:

    let
        ListOfWords = {"am","an","is","and","as","or","ananas"},
        Source = #table(
                1, 
                {
                    {"My name is John Doe and I am an astronaut alein"},
                    {"Buy bananas today"}
                }
            ),
        SplitByList = Table.AddColumn(
                Source, 
                "Custom", 
                each Splitter.SplitTextByAnyDelimiter(
                    List.Sort(ListOfWords, {Text.Length, Order.Descending}))([Column1])
                ),
        BackwardSplit = Table.AddColumn(
            SplitByList, 
            "Words found", 
            each 
                List.Distinct(
                    List.RemoveMatchingItems(
                        Splitter.SplitTextByEachDelimiter([Custom])([Column1]), 
                        {"", null}
                    )
                )
            ),
        AddedPositions = Table.AddColumn(
                BackwardSplit, 
                "Positions", 
                each 
                    List.PositionOfAny(
                        ListOfWords, 
                        List.RemoveMatchingItems(
                            [Words found],
                            {"",null}
                        ), 
                        Occurrence.All
                    )
            ),
        ExtractedWords = Table.TransformColumns(
                AddedPositions, 
                {
                    "Words found", 
                    each 
                        Text.Combine(
                            List.Transform(_, Text.From), 
                            ","
                        ), 
                    type text
                }
            ),
        ExtractedPositions = Table.TransformColumns(
                ExtractedWords, 
                {
                    "Positions", 
                    each Text.Combine(
                        List.Transform(_, Text.From), 
                        ","
                        ), 
                    type text
                }
            ),
        RemovedColumns = Table.RemoveColumns(ExtractedPositions,{"Custom"})
    in
        RemovedColumns


    Maxim Zelensky Excel Inside

    Friday, March 23, 2018 3:09 PM

All replies

  • Hi Dan

    Try this formuls for the custom column:

    =Text.Combine(List.RemoveMatchingItems(Text.Split([Column1], " "), Text.Split([Column2], " "))," ")

    For example:

    let
        Source = #table(2, {{"buy bananas today","today bananas"},{"buy apples today","buy apples"}}),
        #"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Combine(List.RemoveMatchingItems(Text.Split([Column1], " "), Text.Split([Column2], " "))," "))
    in
        #"Added Custom"

    This is for words, separated by spaces.

    For characters - do you mean that if two strings are "buy bananas today" | "buy today" the result should be "bnns", because character "a" is in 2nd column?


    Maxim Zelensky Excel Inside


    Wednesday, March 21, 2018 3:36 PM
  • Hi Maxim, 

    What I meant is that it should not matter whether the missing string is word or just a part of word.... The List function would be too easy for unfortunately...

    Updated example:

    "buy bananas today" | "buy today" => "bananas"

    "buy bananas today" | "buy b today" =>"ananas"

    Basically, we need something like Text.RemoveMatchingItems :)

    DZ



    • Edited by Dan Zrust Wednesday, March 21, 2018 4:30 PM
    Wednesday, March 21, 2018 4:21 PM
  • Dan, could you please add details to the logic behind this transformation?

    "buy bananas today" | "buy b today" =>"ananas"

    If we remove whole matching words from both strings, then we'll get this:

    "bananas" | "b"

    then we can perform some chars removing, of course, for example,the 1st occurrence of it. Or only on the 1st place?

    But what should be the result of the two strings below?

    "buy bananas today" | "buy a today"

    I suppose that it could be "bnns" or "bnanas". Is it a desired result?



    Maxim Zelensky Excel Inside


    Wednesday, March 21, 2018 4:56 PM
  • Hmm, I see there are lots of scenarios. So I will I describe my scenario in detail.

    I have a list of text strings in 1 column. 

    In the second column, I am doing
    Text.Combine(
    Splitter.SplitTextByAnyDelimiter(list_of_500_strings)([Column1]),
    " ")

    I am using this Splitter function while trying to create something like "Text.ContainsAnyOf" which does not exist.

    If the text in the new Column2 is shorter than in the first col, I know that the a string from my list of 500 items is contained in the Column1.

    On the top of all this, I want to know which string from my list of 500 strings was found in Column1...

    So...the missing characters in the new Column2 are always "neighbors". And these missing characters can be at the beginning, in the middle, at the end or part of another word...

    Does it make sense?

    DZ



    Thursday, March 22, 2018 9:41 AM
  • Hi Daniel

    If I understood your goal correctly, it is:

    On the top of all this, I want to know which string from my list of 500 strings was found in Column1...

    Then (didn't tested) may be this is a solution:

    let
        Source = #table(2, {{"My name is John Doe and I am an astronaut", "am,an,is,and,as,or"}}),
        SplitByList = Table.AddColumn(
                Source, 
                "Custom", 
                each Splitter.SplitTextByAnyDelimiter(
                    Text.Split([Column2],",")
                    )([Column1])
                ),
        BackwardSplit = Table.AddColumn(
            SplitByList, 
            "Custom1", 
            each List.RemoveMatchingItems(
                List.Distinct(
                    Splitter.SplitTextByEachDelimiter([Custom])([Column1])
                    ),
                {"",null})
            ),
        List2Text = Table.AddColumn(BackwardSplit, "Custom2", each Text.Combine([Custom1],","))
    in
        List2Text


    Maxim Zelensky Excel Inside

    Thursday, March 22, 2018 2:23 PM
  • You are genius. Thank you. :)

    I now need to get my around it but it does what I need.

    DZ

    Thursday, March 22, 2018 8:12 PM
  • Hi Maxim :-)

    Could you tell me why "and" doesn't exist in the result column?. If i understood correctly it should be there. If so, i would like to propose a slightly different approach.

    let
    
        ListOfWords = {"am","an","is","and","as","or","ananas"},
        Source = #table(1, {
                            {"My name is John Doe and I am an astronaut alein"},
                            {"Buy bananas today"}
                           }
                       ),
        #"Added Custom" = Table.AddColumn(
                                          Source,
                                         "Words found",
                                         (x) => 
                                                Text.Combine(
                                                             List.Transform(
                                                                            ListOfWords,
                                                                            each if Text.Contains(x[Column1], _)
                                                                                 then _ 
                                                                                 else null
                                                                           ),
                                                             ","
                                                            )
                                         )
    in
        #"Added Custom"

    You can even get info how many times the item from the list appeared in the text, if you replace "then _" to  

    then _ & "(" & Text.From(List.Count(Text.PositionOf(x[Column1], _, 2))) & ")"
    Cheers ;-)

    Thursday, March 22, 2018 11:39 PM
  • One more question...

    If I wanted to return index of the found item from my list, how would I do it? 

    DZ

    Friday, March 23, 2018 12:49 PM
  • Hi Bill

    Could you tell me why "and" doesn't exist in the result column?

    This is because "an" is before "and" in the list of delimiters. I forgot about this pitfall. So, to avoid this error, we can sort the list of delimiters by descending length, for example

    List.Sort(ListOfWords, {Text.Length, Order.Descending})

    Then the code will find the "and" first, then it will seek for the "an".

    It depends on what exactly Daniel want to find in the string "Buy bananas today" with the list of delimiters {"an", "as", "ananas"} :

    • "an"
    • "as"
    • "ananas"
    • all of them

    You code is great, as always (I made almost the same about a year ago and totally forgot it ^_^ ), but I do not know, which one is more suitable for Daniel


    Maxim Zelensky Excel Inside


    Friday, March 23, 2018 2:28 PM
  • Maxim's code is good enough for me... :)

    DZ


    Friday, March 23, 2018 2:40 PM
  • Hi Daniel

    Based on the Bill's code:

    let
        ListOfWords = {"am","an","is","and","as","or","ananas"},
        Source = #table(
                1, 
                {
                    {"My name is John Doe and I am an astronaut alein"},
                    {"Buy bananas today"}
                }
            ),
        AddedWords = Table.AddColumn(
                Source,
                "Words found",
                (x) => 
                    List.Transform(
                        ListOfWords,
                        each if Text.Contains(x[Column1], _)
                            then _ 
                            else null
                    )
            ),
        AddedPositions = Table.AddColumn(
                AddedWords, 
                "Positions", 
                each List.Select(
                    List.Positions(ListOfWords), 
                    (p)=>[Words found]{p}<>null
                    )
            ),
        ExtractedWords = Table.TransformColumns(
                AddedPositions, 
                {
                    "Words found", 
                    each Text.Combine(
                        List.Transform(_, Text.From), 
                        ","), 
                    type text
                }
            ),
        ExtractedPositions = Table.TransformColumns(
                ExtractedWords, 
                {
                    "Positions", 
                    each Text.Combine(
                        List.Transform(_, Text.From), 
                        ","
                        ), 
                    type text
                }
            )
    in
        ExtractedPositions


    Maxim Zelensky Excel Inside

    Friday, March 23, 2018 2:48 PM
  • and almost the same but with my algorythm:

    let
        ListOfWords = {"am","an","is","and","as","or","ananas"},
        Source = #table(
                1, 
                {
                    {"My name is John Doe and I am an astronaut alein"},
                    {"Buy bananas today"}
                }
            ),
        SplitByList = Table.AddColumn(
                Source, 
                "Custom", 
                each Splitter.SplitTextByAnyDelimiter(
                    List.Sort(ListOfWords, {Text.Length, Order.Descending}))([Column1])
                ),
        BackwardSplit = Table.AddColumn(
            SplitByList, 
            "Words found", 
            each 
                List.Distinct(
                    List.RemoveMatchingItems(
                        Splitter.SplitTextByEachDelimiter([Custom])([Column1]), 
                        {"", null}
                    )
                )
            ),
        AddedPositions = Table.AddColumn(
                BackwardSplit, 
                "Positions", 
                each 
                    List.PositionOfAny(
                        ListOfWords, 
                        List.RemoveMatchingItems(
                            [Words found],
                            {"",null}
                        ), 
                        Occurrence.All
                    )
            ),
        ExtractedWords = Table.TransformColumns(
                AddedPositions, 
                {
                    "Words found", 
                    each 
                        Text.Combine(
                            List.Transform(_, Text.From), 
                            ","
                        ), 
                    type text
                }
            ),
        ExtractedPositions = Table.TransformColumns(
                ExtractedWords, 
                {
                    "Positions", 
                    each Text.Combine(
                        List.Transform(_, Text.From), 
                        ","
                        ), 
                    type text
                }
            ),
        RemovedColumns = Table.RemoveColumns(ExtractedPositions,{"Custom"})
    in
        RemovedColumns


    Maxim Zelensky Excel Inside

    Friday, March 23, 2018 3:09 PM