none
Regexp using JavaScript RRS feed

  • Question

  • Hi All!

    I need Regexp, and found several variants of solution around the same excellent Web.Page & JavaScript idea - Biccauntant, Hugoberry. I had to adopt them (the resulting code is below) due to JavaScript limits.

    The main problem is that JavaScript has it's own tuff limit on the String variables - it is not possible to use "\" in it.

    My variant tackles the problem, BUT the main question is - I suspect that the "\" problem is not the only one.

    So my question is - does anybody see some other problems with PQ & JavaScript "co-operation" while using the Regexp?

    By the way, the performance of the solution is quite good for hundreds of rows at least.

    I understand that it is not possible to use it in PBI service (for my task it's ok).

    I have some very little hope for updating of local source from the PBI Service, but cannot try that right now.

    = (text as nullable text, pattern as nullable text) as logical =>
        let 
            l = List.Transform({text, pattern}, each Text.Replace(_, "\", "\\")), 
            t = Text.Format("<script>document.write(new RegExp('#{1}').test('#{0}'))</script>", l),
            w = Web.Page(t), 
            d = w[Data]?{0}?[Children]?{0}?[Children]?{1}?[Text]?{0}?, 
            result = text <> null and (pattern = null or (if d <> null then Logical.FromText(d) else error "Regular expression or text are not supported by JavaScript."))
        in
            result
    Sunday, August 18, 2019 4:03 PM

All replies

  • I don't think this type of scenario is supported, so I'd be surprised if anyone from the Data Integrations team provide any type of guidance on how to use javascript with M.

    Out of curiosity, why not try reaching your desired solution with just pure M code?

    Sunday, August 18, 2019 5:51 PM
  • Hi Miguel!

    There is a simple reason for such a tricky solution - there is no native support for Regexp in PQ at all. If somebody would implement Regexp in M code, I would be more that happy to use it.

    JS at the same time has full support of Regexp - that's a lot of development, I'm sure.

    And possibly that among all the PQ developers there are guys who has experience in JS, whether they are from DIT or not :-).


    Sunday, August 18, 2019 6:35 PM
  • While there is no support for regular expressions in Power Query, you can usually create your own Custom M function to mimic a regex or reach your desired solution.

    Perhaps you could tell us a bit more about what you're trying to accomplish with some sample input & output data and we could help you craft something within the M language scope to reach your desired solution.

    If you absolutely want to go the Regex route and don't want to invest your time creating your own custom function, then your best approach would be to use the R integration within Power BI Desktop or even the Python integration that probably has a library for it as well. Alternatively you could have your own web service that you send your text string to and have the regex happen at that web service level and you only get the response from it.

    I wish that I could help you with those, but I am no expert in either one of those and this forum is solely dedicated to questions about Power Query and the M language.

    Here's another thread within this same forum that is closely related to yours:

    https://social.technet.microsoft.com/Forums/en-US/3dd6a0de-a242-4bcb-a9ef-4ca737d2cbcb/regular-expressions-in-python-powerbi?forum=powerquery

    As my personal opinion and advice, don't rule out Power Query's M functions completely just because there's no regular expressions. You'd be surprised at what you can do with some Custom M code and it's usually a better idea to stay within the realm of M functions because they'll be able to work in all integrations of the tool.

    I believe that one of the caveats of using the Web.Page function with your own script is that it simply won't refresh in the service and probably wouldn't pass the security checks for a dataflow either:

    https://www.thebiccountant.com/2018/04/25/regex-in-power-bi-and-power-query-in-excel-with-java-script/

    Sunday, August 18, 2019 9:02 PM
  • Miguel, I wrote "I understand that it is not possible to use it in PBI service (for my task it's ok)." in my post.

    As for R, taking into account that performance of the solution above is very good, there is no reason to use R here. For R the whole table will be flashed on disc, then R will be launched, then .csv will be read by R, then transformed, then flashed back on disc, and then read by PQ, then replace NA to nulls or whatever, and then you have to restore table typing.

    And if you have columns of not-scalar types, R is just no-go solution (at least it will be very complex).

    There is no reason to use such monster instead of the function.

    The only reason (as Imke mentioned) to use R is if you publich the dataset to Power BI service, because I didn't found the way how to force PBI Service to perform the function via Personal Gateway, and for R it happens without efforts.

    So believe me, the idea to use Web function is just amazing (it is not my idea, but Huang Caiguang), and if you really need Regexp throughout your solution, this idea is your savior.

    Sunday, August 18, 2019 9:27 PM
  • Andrey, my apologies.

    My intention by mentioning the R and Python approaches was to provide supported alternatives that might scale.

    And what I meant about the caveats that it might not work on the Power BI service or dataflows is that it sets a record that it might not also work on other deployments such as custom connectors, SSAS, Report Builder (when it gets integrated), Flow, PowerApps or others, but I guess that you're only doing this for a local work and the scope is only limited so that it works locally.

    Again, if you'd like us to tackle this with pure M code, you can count on me.

    I'll step down and let other members of the community chime in that might be able to assist you in your case.

    Wish you the best!

    Sunday, August 18, 2019 9:44 PM
  • Andrey:

    I've rejected both the JavaScript and R options, for the following reasons:

    1) Neither option is universal

    2) For all the cons you mention about using R

    3) I created a JavaScript regex custom function, and compared a regex pattern with a custom function I created to do the same job. The regex version was so slow in comparison to the custom function I created that I immediately rejected the regex solution as usable for any work. I just use the regex function to practice writing regex patterns :)

    I've developed various custom functions for solving some common regex patterns, but I can't solve everything.

    I find it totally baffling that regex is not built into PQ. What's incredible is that regex is built into so many BI applications, and in every programming language. It is a feature that should of been there since day 0, and an option in every M function where you have to match a substring. From the standpoint of validating data, this omission is one of the biggest drawbacks of a product that should be able to do that sort of stuff out of the box. Then again, lack of regex is just one of several basic features that are unsupported out of the box, and that will probably never be implemented (for lack of dedicated resources, interest, etc.) Were it not for the ability to create custom functions, PQ would be an option for less than 50% of the work I do. 

    Despite the above bout of negativity, I still love M to death. :)

    Sunday, August 25, 2019 3:16 PM
  • Hi Colin!

    Thanks for your reply.

    Yes, I totally agree that M is too wonderful to be so unfeatured. That is why I'm so complainful here - if I would indifferent to M I wouldn't spend time to write in non-native language here :-).

    And regarding the JS option - did you try to pass the whole list of text to JS? Was it very slow as well? I don't know JS, so I cannot create JS script for that.

    From my experience, the variant I placed in the post takes 0.077 sec per Regex check, i.e. per row. Just in case - buffering doesn't provide any help here.

    This is useful if you need to check some dictionaries, but of course not for any "transactional" data.

    And a thought that I have to go to R from M is painful 4 me :-).

    Monday, September 2, 2019 12:27 PM