none
Fast words count function RRS feed

  • Question

  • Hello all:)

    I'm asking power query gurus to provide example of Not-so-performance-heavy function, to count the number of words in text string. 

    Now I do several transformations like:

    1. Add column with Text.Length of original text
    2. Add column with modified text - original text with replaced spaces
    3. Add column with Text.Length of modified text
    4. Subtract from Text Length of original text Text Length of modified text

    And for convenience I need a productive function which will take some text string and return the number of words in it. 

    I suppose it can be done using List.Generate function, but I don't know how exactly. 

    Could anybody help me? :)


    Sunday, May 10, 2015 6:50 PM

Answers

  • How about:

    = List.Count(Text.Split(Source, " "))

    If you end up with additional characters you want to split on, you can call SplitAny, which takes a list of characters to split on. For example:

    = List.Count(Text.SplitAny(Source, " #(tab)#(lf)"))

    Ehren


    Monday, May 11, 2015 5:49 PM
    Owner
  • Hi Maxim :-)

    if you have double, triple or more spaces, TAB, LF between words, you should use one more function

    List.Count(List.RemoveItems(Text.SplitAny([SomeText]," #(tab)#(lf)"),{""}))

    • Marked as answer by Maxim Uvarov Thursday, May 14, 2015 12:51 AM
    Wednesday, May 13, 2015 9:07 AM

All replies

  • I love to make it more complicated:)

     = 1 + Text.Length([SomeText]) - Text.Length(Text.Remove([SomeText]," ")) -

    it works well

    Sunday, May 10, 2015 7:16 PM
  • How about:

    = List.Count(Text.Split(Source, " "))

    If you end up with additional characters you want to split on, you can call SplitAny, which takes a list of characters to split on. For example:

    = List.Count(Text.SplitAny(Source, " #(tab)#(lf)"))

    Ehren


    Monday, May 11, 2015 5:49 PM
    Owner
  • How about:

    = List.Count(Text.Split(Source, " "))

    If you end up with additional characters you want to split on, you can call SplitAny, which takes a list of characters to split on. For example:

    = List.Count(Text.SplitAny(Source, " #(tab)#(lf)"))

    Ehren


    Thank you! Even faster!
    Monday, May 11, 2015 11:40 PM
  • Hi Maxim :-)

    if you have double, triple or more spaces, TAB, LF between words, you should use one more function

    List.Count(List.RemoveItems(Text.SplitAny([SomeText]," #(tab)#(lf)"),{""}))

    • Marked as answer by Maxim Uvarov Thursday, May 14, 2015 12:51 AM
    Wednesday, May 13, 2015 9:07 AM
  • Thanks a lot! It does make sense in my case!
    Thursday, May 14, 2015 12:52 AM