none
Removing HTML Tags when source is SharePoint List RRS feed

  • Question

  • I'm using Power Query to pull in the data from a SharePoint list. In a few columns, because of how the people who submit the items, it's pulling into the query with a load of HTML tags. Here's an example (I've changed the words in it, but the tags are a direct copy): 

    <div>This is the text I want.</div><div>&#160;</div><div>HOWEVER THE TAGS SHOWN ARE ALSO COMING THROUGH
    FOR<br>MOST OF THE COLUMS<br>Data I want.<br>Data I want;<br>Data I want.<br>Data I want.<br>Data I want.<br>data I want.<br>Data I want.<br>THANK YOU,<br>Name<br>******************************************************************************</div>

    How do I remove those tags? I've tried extracting the data between delimiters, but that's only worked when there is only one set of tags. I looked at other posts with similar issues, but those solutions didn't appear to work for me; closest I got was this one: https://social.technet.microsoft.com/Forums/en-US/7ec64d6d-c3fc-4110-94c7-2e0087171475/how-to-remove-html-tags-from-a-column?forum=powerquery. But it either won't work for my needs, or I'm using it wrong. Any ideas? 

    Friday, October 6, 2017 5:23 PM

Answers

  • This function might work for you:

    (HTML as text) =>
    let
        Source = Text.From(HTML),
        SplitAny = Text.SplitAny(Source,"<>"),
        ListAlternate = List.Alternate(SplitAny,1,1,1),
        ListSelect = List.Select(ListAlternate, each _<>""),
        TextCombine = Text.Combine(ListSelect, "")
    in
        TextCombine


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    • Marked as answer by RobVanWinkle Monday, October 9, 2017 2:25 PM
    Saturday, October 7, 2017 12:26 PM
    Moderator

All replies

  • This function might work for you:

    (HTML as text) =>
    let
        Source = Text.From(HTML),
        SplitAny = Text.SplitAny(Source,"<>"),
        ListAlternate = List.Alternate(SplitAny,1,1,1),
        ListSelect = List.Select(ListAlternate, each _<>""),
        TextCombine = Text.Combine(ListSelect, "")
    in
        TextCombine


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    • Marked as answer by RobVanWinkle Monday, October 9, 2017 2:25 PM
    Saturday, October 7, 2017 12:26 PM
    Moderator
  • That worked perfectly! Thank you for your help. 

    Monday, October 9, 2017 2:26 PM
  • Hi,

    I have a silly question! Where do you add your function in Power BI to make it work? Do I need to add it to the formula bar at the top?

    Thanks for your help!

    Raj

    Thursday, June 14, 2018 10:36 PM
  • Very often when trying to clean a column you may have the case the column contains null

    With the following small fix you will be able to handle this case

    let
       TextFromHtml = (HTML as any) =>
    let
        Source = if HTML = null then
            ""
        else
            Text.From(HTML),
        SplitAny = Text.SplitAny(Source,"<>"),
        ListAlternate = List.Alternate(SplitAny,1,1,1),
        ListSelect = List.Select(ListAlternate, each _<>""),
        TextCombine = Text.Combine(ListSelect, "")
    in
        TextCombine
    in
        TextFromHtml

    For beginners: create a blank query an copy paste the code above; rename the query as "TextFromHtml"

    Example of calling this function on a column called Comment:

    = Table.TransformColumns(#"Previous Step",{{"Comment", TextFromHtml, type text}})

    Thank you very much for your function, it's really a good one

    Tuesday, July 24, 2018 9:35 AM
  • It is presumptuous for any function to arbitrarily convert a null (which represents the absence of a value) with an empty string (which represents the presence of a value, i.e. an empty string).

    If the table is loaded to the data model or to Excel, a null is converted to a blank value, whereas an empty string is not blank. The choice will affect some calculations.

    If the user really wants to convert a blank to an empty string, then the decision should be made outside of the function.

    Edit:

    Forgot to mention - the function would be better by using:
    Source = if HTML = null then null else ...

    Tuesday, July 24, 2018 3:14 PM
  • Very often when trying to clean a column you may have the case the column contains null

    With the following small fix you will be able to handle this case

    let
       TextFromHtml = (HTML as any) =>
    let
        Source = if HTML = null then
            ""
        else
            Text.From(HTML),
        SplitAny = Text.SplitAny(Source,"<>"),
        ListAlternate = List.Alternate(SplitAny,1,1,1),
        ListSelect = List.Select(ListAlternate, each _<>""),
        TextCombine = Text.Combine(ListSelect, "")
    in
        TextCombine
    in
        TextFromHtml

    For beginners: create a blank query an copy paste the code above; rename the query as "TextFromHtml"

    Example of calling this function on a column called Comment:

    = Table.TransformColumns(#"Previous Step",{{"Comment", TextFromHtml, type text}})

    Thank you very much for your function, it's really a good one

    Thanks You so much..

    I´m a beginner.. ;)

    Friday, September 14, 2018 4:25 PM
  • Note also that a SharePoint list has a meta-data column called "FieldValuesAsText".  If you expand this column, I believe you'll find that SharePoint has done the work for you :-)

    Dale

    Friday, September 14, 2018 7:59 PM
  • Worked great for me, thank you. Any way to remove / replace HTML encoding? For example, &aacute; into á

    Thanks again!

     

    dp

    Monday, January 7, 2019 6:37 PM
  • You mean multiple replacements from a table for example? 

    Then check out this blogpost: https://www.thebiccountant.com/2016/05/22/multiple-replacements-in-power-bi-and-power-query/


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Monday, January 7, 2019 6:41 PM
    Moderator
  • How do you implement this function in power query, please help.

    Microsoft SharePoint community.

    Thursday, June 13, 2019 1:47 PM
  • Super beginner here. Where am I to call the function?
    = Table.TransformColumns(#"Previous Step",{{"Comment", TextFromHtml, type text}})
    Sunday, August 18, 2019 1:39 AM