Remove HTML tags from text full of them

  • Question

  • Hello folks,

    I'm looking to remove all the HTML tags from a field coming from Project Server.

    I guess that everyone who has to report on Project Server text fields has the same issue.

    I tried the following code, but it only removes the first occurrence of <*> (taken from https://social.technet.microsoft.com/Forums/en-US/7ec64d6d-c3fc-4110-94c7-2e0087171475/how-to-remove-html-tags-from-a-column?forum=powerquery).

    let
        fnRemoveHTML_ = (TheNameOfYourColumn) =>
            let
                text = Text.From(TheNameOfYourColumn),
                length = Text.Length(text),
                position = Text.PositionOf(text, "<"),
                positionEnd = Text.PositionOf(text, ">"),
                range = positionEnd-position+1,
                new = if position >= 0 then Text.ReplaceRange(text, position, range, "") else TheNameOfYourColumn
            in
                new,

        Source = OData.Feed("https://xxx.sharepoint.com/sites/pwa/_api/ProjectData/Projects?$select=ProjectName,ProjectStartDate,ProjectFinishDate,Justification"),
        Last = Table.AddColumn(Source, "CleanJustif", each fnRemoveHTML_([Justification])),
        ChangedType = Table.TransformColumnTypes(Last,{{"CleanJustif", type text}}),
        #"Removed Other Columns" = Table.SelectColumns(ChangedType,{"Justification", "CleanJustif"})
    in
        #"Removed Other Columns"

    Any help removing all of them would be appreciated.

    Additionally, there are some other characters, like &#160, &amp, etc., that I would like to get rid of as well.
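
    One direction that might work is sketched below. It is untested against the Project Server feed and rests on several assumptions: fnStripHtml is a hypothetical name, the entities are assumed to appear in their full form with a trailing semicolon (&#160; and &amp;), and the text is assumed to contain "<" and ">" only as tag delimiters. Instead of locating one tag at a time, it splits the text on both delimiters, which leaves the text outside the tags at every other position, and then replaces a short, non-exhaustive list of entities.

    ```
    let
        // Hypothetical helper: drops everything between "<" and ">" and decodes a few entities.
        fnStripHtml = (value as any) as nullable text =>
            let
                text = if value = null then null else Text.From(value),
                // Splitting on "<" or ">" alternates the pieces:
                // text outside tags, tag contents, text outside tags, ...
                parts = Text.SplitAny(text, "<>"),
                // Keep the first piece, then repeatedly skip one piece and keep one.
                kept = List.Alternate(parts, 1, 1, 1),
                noTags = Text.Combine(kept),
                // {from, to} pairs; "&amp;" is last so its result is not decoded a second time.
                // Assumption: this list is illustrative, not a complete entity table.
                entities = {{"&#160;", " "}, {"&nbsp;", " "}, {"&lt;", "<"}, {"&gt;", ">"}, {"&amp;", "&"}},
                decoded = List.Accumulate(
                    entities,
                    noTags,
                    (state, pair) => Text.Replace(state, pair{0}, pair{1})
                )
            in
                // M evaluates lazily, so the steps above are skipped for null input.
                if text = null then null else decoded,

        // Small check on a literal value before pointing it at the real column.
        Example = fnStripHtml("<p>Cost &amp; schedule are <b>on track</b></p>")
    in
        Example
    ```

    If this behaves as expected, it could replace fnRemoveHTML_ in the Table.AddColumn step above, for example each fnStripHtml([Justification]). Two caveats: text on either side of a tag such as <br/> is joined with no space in between, and a stray "<" or ">" that is not part of a tag would throw off the alternation.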

    Thanks


    Jérome Charlon - Senior Project Server Architect - MCP-MCTS


    Thursday, October 8, 2015 10:20 PM

Answers

All replies