none
PowerQuery split collumn by #(cr)#(lf) problems RRS feed

  • Question

  • Hi,

    I am trying to query an online web page with entries looking like this:

    The Orange Box

    Valve Software/EA Games, 2007

    Pasting the above text in NP++ gives me this:

    The Orange Box CR LF

    • Valve Software/EA Games, 2007

    In PowerQuery I then try to split collumns by custom delimiter #(cr)#(lf)

    I have also tried only #(cr) and only #(lf), but the outcome unfortunately, is the same - all text in 1 collumn:

    The Orange BoxValve Software/EA Games, 2007

    I have browsed through various tutorials regarding this, and understand that it is/has been at least, a common problem in PowerQuery, to split corrrectly by these special delimiters. there were also some suggestions that I have tried without any luck.

    I hope it makes sense, what I am trying to accomplish. I tried to paste screen snips of each step, but I get an error that I can't use images when my account is not verified (which I believe it is). Anyway, let me know if you want the snips.

    I would appreciate any help on this, but please bear in mind, that I am a beginner at M.

    Thanks.

    Rasmus


    • Edited by Ramoten Sunday, May 29, 2016 8:10 AM spelling
    Sunday, May 29, 2016 8:02 AM

Answers

  • Thanks, looks like we have a bug :/ Thanks for reporting it to us! We'll fix it right away and it'll be available to you publicly in a few releases later.

    • Marked as answer by Ramoten Wednesday, June 1, 2016 8:33 AM
    Tuesday, May 31, 2016 7:27 PM
    Moderator
  • Here's a workaround for you :) I did a similar workaround for the last column as well in case you need that one too:

    let
        Source = Binary.Buffer(Web.Contents("http://www.gamerankings.com/browse.html")),
        #"Imported Text" = Text.Replace(Text.Replace(Text.FromBinary(Source,1252), "<br />", "#"), "<br clear=""left"" />", "#"),
        Custom1 = Web.Page(#"Imported Text"),
        Data = Custom1{0}[Data],
        #"Changed Type" = Table.TransformColumnTypes(Data,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type","Column3",Splitter.SplitTextByDelimiter("# ", QuoteStyle.Csv),{"Column3.1", "Column3.2"}),
        #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column3.1", type text}, {"Column3.2", type text}}),
        #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1"}),
        #"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns","Column4",Splitter.SplitTextByDelimiter("#", QuoteStyle.Csv),{"Column4.1", "Column4.2"}),
        #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column4.1", type text}, {"Column4.2", type text}})
    in
        #"Changed Type2"



    Tuesday, May 31, 2016 10:13 PM
    Moderator

All replies

  • Not quite sure what you want to accomplish here:

    1) Keep the items in one cell but show them with a line-feed in the Excel-output or

    2) Actually separate the cells where this linefeed has been

    If you're after 1) - have a look here: http://www.thebiccountant.com/2016/04/16/how-to-display-power-query-results-with-line-feed/


    Imke Feldmann TheBIccountant.com

    Monday, May 30, 2016 7:43 PM
    Moderator
  • Sorry for not making it clear. I want to separate into 2 collumns:

    1: The Orange Box

    2: Valve Software/EA Games, 2007

    Thanks.

    Monday, May 30, 2016 7:46 PM
  • Did you put "" around them: "#(cr)" or "#(lf)" ?

    Imke Feldmann TheBIccountant.com

    Monday, May 30, 2016 7:54 PM
    Moderator
  • Thanks. I tried both with and without "", but no luck.
    • Edited by Ramoten Tuesday, May 31, 2016 11:54 AM none
    Tuesday, May 31, 2016 11:54 AM

  • Did you try this?

    Tuesday, May 31, 2016 6:43 PM
    Moderator
  • Yes - but for some reason it does not split the collumn.

    Here is the page I am querying:

    http://www.gamerankings.com/browse.html

    So now it should be easier to replicate what I am trying to do.

    Thanks again.

    Tuesday, May 31, 2016 6:54 PM
  • Thanks, looks like we have a bug :/ Thanks for reporting it to us! We'll fix it right away and it'll be available to you publicly in a few releases later.

    • Marked as answer by Ramoten Wednesday, June 1, 2016 8:33 AM
    Tuesday, May 31, 2016 7:27 PM
    Moderator
  • Thats great to hear, I'll be looking forward to it :-)
    Tuesday, May 31, 2016 7:57 PM
  • Here's a workaround for you :) I did a similar workaround for the last column as well in case you need that one too:

    let
        Source = Binary.Buffer(Web.Contents("http://www.gamerankings.com/browse.html")),
        #"Imported Text" = Text.Replace(Text.Replace(Text.FromBinary(Source,1252), "<br />", "#"), "<br clear=""left"" />", "#"),
        Custom1 = Web.Page(#"Imported Text"),
        Data = Custom1{0}[Data],
        #"Changed Type" = Table.TransformColumnTypes(Data,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type","Column3",Splitter.SplitTextByDelimiter("# ", QuoteStyle.Csv),{"Column3.1", "Column3.2"}),
        #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column3.1", type text}, {"Column3.2", type text}}),
        #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1"}),
        #"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns","Column4",Splitter.SplitTextByDelimiter("#", QuoteStyle.Csv),{"Column4.1", "Column4.2"}),
        #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column4.1", type text}, {"Column4.2", type text}})
    in
        #"Changed Type2"



    Tuesday, May 31, 2016 10:13 PM
    Moderator
  • Nice one, thanks. I might want to wait for the fix though, since it's mostly a hobby-project of mine, so I am not in a hurry :-) furthermore, it is iterating over 100s of sub pages, so not sure how the above will work in that context - again I am very new to M.
    Wednesday, June 1, 2016 8:33 AM
  • You can turn this to a function where you pass the url as a parameter =>

    let
        ReadPage = (url as text) as table =>
        let
            Source = Binary.Buffer(Web.Contents(url)),
            #"Imported Text" = Text.Replace(Text.Replace(Text.FromBinary(Source,1252), "<br />", "#"), "<br clear=""left"" />", "#"),
            Custom1 = Web.Page(#"Imported Text"),
            Data = Custom1{0}[Data],
            #"Changed Type" = Table.TransformColumnTypes(Data,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
            #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type","Column3",Splitter.SplitTextByDelimiter("# ", QuoteStyle.Csv),{"Column3.1", "Column3.2"}),
            #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column3.1", type text}, {"Column3.2", type text}}),
            #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1"}),
            #"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns","Column4",Splitter.SplitTextByDelimiter("#", QuoteStyle.Csv),{"Column4.1", "Column4.2"}),
            #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column4.1", type text}, {"Column4.2", type text}})
        in
            #"Changed Type2"
    in
        ReadPage

    And then from other queries you can just invoke this (like, ReadPage("http://example")), it would make it easy to run it for multiple urls

    Wednesday, June 1, 2016 6:01 PM
    Moderator
  • You can turn this to a function where you pass the url as a parameter =>

    let
        ReadPage = (url as text) as table =>
        let
            Source = Binary.Buffer(Web.Contents(url)),
            #"Imported Text" = Text.Replace(Text.Replace(Text.FromBinary(Source,1252), "<br />", "#"), "<br clear=""left"" />", "#"),
            Custom1 = Web.Page(#"Imported Text"),
            Data = Custom1{0}[Data],
            #"Changed Type" = Table.TransformColumnTypes(Data,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
            #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type","Column3",Splitter.SplitTextByDelimiter("# ", QuoteStyle.Csv),{"Column3.1", "Column3.2"}),
            #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column3.1", type text}, {"Column3.2", type text}}),
            #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1"}),
            #"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns","Column4",Splitter.SplitTextByDelimiter("#", QuoteStyle.Csv),{"Column4.1", "Column4.2"}),
            #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column4.1", type text}, {"Column4.2", type text}})
        in
            #"Changed Type2"
    in
        ReadPage

    And then from other queries you can just invoke this (like, ReadPage("http://example")), it would make it easy to run it for multiple urls

    Hi Oguz,

    I have just tested this in the June 2016 release, and it now works witht the simple #cr, #lr delimter! Thanks alot for bringing this through so fast, the agility of the Power BI team is amazing..

    The May release however, raised another big problem for me:

    I have set up 2 parameterized queries in Power BI, both using functions to iterate over multiple urls to return all the pages. This has worked fine on both queries till The May 2016 update of Power BI. They give me the following errors respectively:

    1. "OLE DB or ODBC Error: [Expression.Error] The column 'Column 1' of the table wasn't found"
    2. "OLE DB or ODBC Error: [Expression.Error] The column 'Pos' of the table wasn't found"

    let me proceed with the later, for the rest of this example.

    The code for the function looks like this:

    let
        VGChartz = (page as number) as table =>
    let
        Kilde = Web.Page(Web.Contents("http://www.vgchartz.com/gamedb/?page=" & Number.ToText(page) & "&results=1000&name=&platform=&minSales=0&publisher=&genre=&sort=GL")),
        Data1 = Kilde{1}[Data],
        #"Ændret type" = Table.TransformColumnTypes(Data1,{{"Pos", Int64.Type}, {"Game", type text}, {"Platform", type text}, {"Year", type text}, {"Genre", type text}, {"Publisher", type text}, {"North America", Int64.Type}, {"Europe", Int64.Type}, {"Japan", Int64.Type}, {"Rest of World", Int64.Type}, {"Global", Int64.Type}})
    in
        #"Ændret type"
    in
        VGChartz

    The code for the main query looks like this:

    let
        Kilde = {1..10000},
        #"Konverteret til tabel" = Table.FromList(Kilde, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
        #"Omdøbte kolonner" = Table.RenameColumns(#"Konverteret til tabel",{{"Column1", "Page"}}),
        #"Tilføjet brugerdefineret" = Table.AddColumn(#"Omdøbte kolonner", "VZ", each #"GetData - VGChartz"([Page])),
        #"Udvidet VZ" = Table.ExpandTableColumn(#"Tilføjet brugerdefineret", "VZ", {"Pos", "Game", "Platform", "Year", "Genre", "Publisher", "North America", "Europe", "Japan", "Rest of World", "Global"}, {"VZ.Pos", "VZ.Game", "VZ.Platform", "VZ.Year", "VZ.Genre", "VZ.Publisher", "VZ.North America", "VZ.Europe", "VZ.Japan", "VZ.Rest of World", "VZ.Global"}),
        #"Fjernede fejl" = Table.RemoveRowsWithErrors(#"Udvidet VZ", {"VZ.Pos"}),
        #"Erstattet værdi" = Table.ReplaceValue(#"Fjernede fejl","PSV","VITA",Replacer.ReplaceText,{"VZ.Platform"}),
        #"Erstattet værdi1" = Table.ReplaceValue(#"Erstattet værdi","XB","XBOX",Replacer.ReplaceText,{"VZ.Platform"}),
        #"Replaced Value" = Table.ReplaceValue(#"Erstattet værdi1","N/A","1950",Replacer.ReplaceText,{"VZ.Year"}),
        #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Replaced Value", {"Page", "VZ.Pos", "VZ.Game", "VZ.Platform", "VZ.Year", "VZ.Genre", "VZ.Publisher"}, "Attribute", "Value"),
        #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Columns",{{"Attribute", "Region"}, {"Value", "Unit Sales"}}),
        #"Replaced Value1" = Table.ReplaceValue(#"Renamed Columns","VZ.","",Replacer.ReplaceText,{"Region"}),
        #"Filtered Rows" = Table.SelectRows(#"Replaced Value1", each ([Region] <> "Global")),
        #"Renamed Columns1" = Table.RenameColumns(#"Filtered Rows",{{"Unit Sales", "Unit Sales SW"}}),
        #"Replaced Value2" = Table.ReplaceValue(#"Renamed Columns1","Rest of World","Australia (et al)",Replacer.ReplaceText,{"Region"}),
        #"Renamed Columns2" = Table.RenameColumns(#"Replaced Value2",{{"VZ.Game", "Title"}, {"VZ.Platform", "Platform"}, {"VZ.Genre", "Genre"}, {"VZ.Publisher", "Publisher"}})
    in
        #"Renamed Columns2"

    Why would this suddenly stop working in the May 2016 release of Powerr BI (and still be broken in June 2016 release)?

    I am only interested in answers providing a quick fix to this, I am not interested in building a new solution from the bottom.

    Thanks alot!

    Sunday, July 3, 2016 8:06 PM
  • Not fixed as of today in Excel 365 Pro Plus.
    The line feed step instruction is translated
    incorrectly as "#(#)(lf)"
    Change it to "#(lf)" in the Advanced Editor.

    Thursday, July 26, 2018 10:43 PM