none
How to automate/parameterize/loop power query? RRS feed

  • Question

  • Is it possible to write a single generic powerquery command that'll loop through a list of URLs and load the data to a single table? the webpages in question have the same format & table names, just the data contained within is different. I verified this by manually defining a powerquery from web import, importing, changing the url, and hitting refresh

    Is it possible to write a generic powerquery import to pull the data in and have it loop to append the data to a target worksheet?

    I tried macro recording some of the powerquery actions of defining an import but it didn't record anything so im assuming i cant use a vba loop to do this.

    I want to avoid having to manually define each from web import explicitly. There's a few hundred i'd need to do.


    Jakub @ Adelaide, Australia

    Wednesday, January 15, 2014 5:58 AM

Answers

  • Here's a quick walkthrough for how to do something like this. I recommend reading through to the end before trying this, because there will a step later on which you'll actually want to do first to simplify things.

    Let's say I want to get presidential election results for multiple US States from Wikipedia. So to start, I use the "From Web" option inside Power Query and specify an URL of http://en.wikipedia.org/wiki/California. This lets me pick between several sections of the web page, and I choose the one called "Presidential elections results" and edit it. The results look good in the preview, so I go to the "advanced editing" view that shows me the M query:

    let
        Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/California")),
        Data5 = Source{5}[Data],
        ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
    in
        ChangedType

    The numeric index "5" worries me a little because I suspect that there's not enough regularity in the Wikipedia pages for the US states for the 6th section to always be the one that I want. So I edit the query to index by the section name instead and confirm that it still produces the results I want:

    let
        Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/California")),
        Data5 = Source{[Caption="Presidential elections results"]}[Data],
        ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
    in
        ChangedType

    Now I need to turn this into a function that can be applied to multiple pages. The only way to do that is to edit the M query directly, but the transformation is relatively straightforward. Again, I confirm that the output is still what I expected after making these changes.

    let
        Election.Results = (state) => let
            Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/" & state)),
            Data5 = Source{[Caption="Presidential elections results"]}[Data],
            ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
        in
            ChangedType
    in
        Election.Results("California")

    Now, I want to apply this function to multiple web pages. These might come from a different Excel sheet or from an external file, but I'm just going to build a table of web pages manually. I'll keep the function around, but won't use it just yet in the overall formula. Once more, I confirm that the output is what I expect, which is a single table listing three states.

    let
        Election.Results = (state) => let
            Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/" & state)),
            Data5 = Source{[Caption="Presidential elections results"]}[Data],
            ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
        in
            ChangedType,
        States = Table.FromRows({{"California"}, {"Idaho"}, {"Massachusetts"}}, {"State"})
    in
        States

    Now I can use the "Insert Custom Column" feature of the UI to apply my function to the State column. This looks like "=Election.Results([State])" in the dialog. But when I click OK, something unpleasant happens: I get prompted several times for "data privacy" settings for each of the URLs I'm trying to access. This is quite annoying, and we're looking for ways to improve the experience -- perhaps by letting you pick that the privacy settings apply to all of http://en.wikipedia.org and not just to the specific URL you're trying to access.

    Meanwhile, you can achieve the same thing manually with a kind of underhanded trick by creating a throwaway query which references http://en.wikipedia.org and one other website, and using the "data privacy" prompt from that query to define a privacy rule for the entire domain. Exit the editor and create a new "Blank Query" with this single formula: "= Binary.Length(Web.Contents("http://en.wikipedia.org")) + Binary.Length(Web.Contents("http://www.bing.com"))" This will prompt you to pick a privacy setting for those two websites (I would suggest that "Public" is the right choice). Once you've done that, you can discard this query; its work is done.

    Now if you go back and repeat the steps for the "elections" query, you'll avoid the privacy prompt.

    At this point, you'll have a table with two columns: one that has the state, and one that has the results as Table values. We want to expand out those values, so we click on the expansion indicator in the column header for Custom and pick all fields. The result is a table with four columns: State, Custom.Year, Custom.Republican and Custom.Democratic. The latter three fields can be renamed from either the UI or by editing the M code. In my case, I ended up with a program which looks like this:

    let
        Election.Results = (state) => let
            Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/" & state)),
            Data5 = Source{[Caption="Presidential elections results"]}[Data],
            ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
        in
            ChangedType,
        States = Table.FromRows({{"California"}, {"Idaho"}, {"Massachusetts"}}, {"State"}),
        InsertedCustom = Table.AddColumn(States, "Custom", each Election.Results([State])),
        #"Expand Custom" = Table.ExpandTableColumn(InsertedCustom, "Custom", {"Year", "Republican", "Democratic"}, {"Year", "Republican", "Democratic"})
    in
        #"Expand Custom"

    And the job is done!

    Actually, the hard part might just be getting started. Finding a set of web sites which is consistent enough for a single parsing function to work can be rare. As an example, when I tried to add "Washington_(U.S._state)" to my list of states, I discovered that its table was formatted differently than the others. And when I tried to add "New_york_state" to the list of states, I found that its section was named "New York State Presidential Election Results" instead of just "Presidential election results". So it's possible that you'll have to do an awful lot of tweaking.

    Nonetheless, this should cover the basic steps and help you get started.


    • Edited by Curt Hagenlocher Wednesday, January 15, 2014 8:52 PM small fix
    • Marked as answer by jakubk Wednesday, January 15, 2014 11:34 PM
    Wednesday, January 15, 2014 8:51 PM

All replies

  • Also interested in performing a loop that results in several queries being accumulated into the same table.

    I wonder if the M-language that PowerQuery uses has a "For Each ... In... Let Source (...), Do blah" function.

    Eric.

    Wednesday, January 15, 2014 5:08 PM
  • Something like this should work:

    let
        Source = Table.FromColumns({{"firstURL", "secondURL", "etc."}}, {"URLS"}),
        InsertedCustom = Table.AddColumn(Source, "Custom", each Web.Page(Web.Contents([URLS])))
    in
        InsertedCustom

    You can fiddle around with the custom formula to get the data you need from the tables in the web page.

    You will probably get a lot of prompts about the privacy level of each web page. You can disable those prompts by turning on Fast Combine, but only do this if you are certain that you trust all of the websites you are getting data from.

    As for accumulating several queries into the same table, do you just want to append the tables together? Table.Combine should be able to do that.

    Also, List.Transform might suite your for-each needs.

    -Alejandro (MSFT)

    Wednesday, January 15, 2014 8:34 PM
  • Here's a quick walkthrough for how to do something like this. I recommend reading through to the end before trying this, because there will a step later on which you'll actually want to do first to simplify things.

    Let's say I want to get presidential election results for multiple US States from Wikipedia. So to start, I use the "From Web" option inside Power Query and specify an URL of http://en.wikipedia.org/wiki/California. This lets me pick between several sections of the web page, and I choose the one called "Presidential elections results" and edit it. The results look good in the preview, so I go to the "advanced editing" view that shows me the M query:

    let
        Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/California")),
        Data5 = Source{5}[Data],
        ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
    in
        ChangedType

    The numeric index "5" worries me a little because I suspect that there's not enough regularity in the Wikipedia pages for the US states for the 6th section to always be the one that I want. So I edit the query to index by the section name instead and confirm that it still produces the results I want:

    let
        Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/California")),
        Data5 = Source{[Caption="Presidential elections results"]}[Data],
        ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
    in
        ChangedType

    Now I need to turn this into a function that can be applied to multiple pages. The only way to do that is to edit the M query directly, but the transformation is relatively straightforward. Again, I confirm that the output is still what I expected after making these changes.

    let
        Election.Results = (state) => let
            Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/" & state)),
            Data5 = Source{[Caption="Presidential elections results"]}[Data],
            ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
        in
            ChangedType
    in
        Election.Results("California")

    Now, I want to apply this function to multiple web pages. These might come from a different Excel sheet or from an external file, but I'm just going to build a table of web pages manually. I'll keep the function around, but won't use it just yet in the overall formula. Once more, I confirm that the output is what I expect, which is a single table listing three states.

    let
        Election.Results = (state) => let
            Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/" & state)),
            Data5 = Source{[Caption="Presidential elections results"]}[Data],
            ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
        in
            ChangedType,
        States = Table.FromRows({{"California"}, {"Idaho"}, {"Massachusetts"}}, {"State"})
    in
        States

    Now I can use the "Insert Custom Column" feature of the UI to apply my function to the State column. This looks like "=Election.Results([State])" in the dialog. But when I click OK, something unpleasant happens: I get prompted several times for "data privacy" settings for each of the URLs I'm trying to access. This is quite annoying, and we're looking for ways to improve the experience -- perhaps by letting you pick that the privacy settings apply to all of http://en.wikipedia.org and not just to the specific URL you're trying to access.

    Meanwhile, you can achieve the same thing manually with a kind of underhanded trick by creating a throwaway query which references http://en.wikipedia.org and one other website, and using the "data privacy" prompt from that query to define a privacy rule for the entire domain. Exit the editor and create a new "Blank Query" with this single formula: "= Binary.Length(Web.Contents("http://en.wikipedia.org")) + Binary.Length(Web.Contents("http://www.bing.com"))" This will prompt you to pick a privacy setting for those two websites (I would suggest that "Public" is the right choice). Once you've done that, you can discard this query; its work is done.

    Now if you go back and repeat the steps for the "elections" query, you'll avoid the privacy prompt.

    At this point, you'll have a table with two columns: one that has the state, and one that has the results as Table values. We want to expand out those values, so we click on the expansion indicator in the column header for Custom and pick all fields. The result is a table with four columns: State, Custom.Year, Custom.Republican and Custom.Democratic. The latter three fields can be renamed from either the UI or by editing the M code. In my case, I ended up with a program which looks like this:

    let
        Election.Results = (state) => let
            Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/" & state)),
            Data5 = Source{[Caption="Presidential elections results"]}[Data],
            ChangedType = Table.TransformColumnTypes(Data5,{{"Year", type number}, {"Republican", type text}, {"Democratic", type text}})
        in
            ChangedType,
        States = Table.FromRows({{"California"}, {"Idaho"}, {"Massachusetts"}}, {"State"}),
        InsertedCustom = Table.AddColumn(States, "Custom", each Election.Results([State])),
        #"Expand Custom" = Table.ExpandTableColumn(InsertedCustom, "Custom", {"Year", "Republican", "Democratic"}, {"Year", "Republican", "Democratic"})
    in
        #"Expand Custom"

    And the job is done!

    Actually, the hard part might just be getting started. Finding a set of web sites which is consistent enough for a single parsing function to work can be rare. As an example, when I tried to add "Washington_(U.S._state)" to my list of states, I discovered that its table was formatted differently than the others. And when I tried to add "New_york_state" to the list of states, I found that its section was named "New York State Presidential Election Results" instead of just "Presidential election results". So it's possible that you'll have to do an awful lot of tweaking.

    Nonetheless, this should cover the basic steps and help you get started.


    • Edited by Curt Hagenlocher Wednesday, January 15, 2014 8:52 PM small fix
    • Marked as answer by jakubk Wednesday, January 15, 2014 11:34 PM
    Wednesday, January 15, 2014 8:51 PM
  • Many thanks curt! That should get me what i'm after in no time


    Jakub @ Adelaide, Australia

    Wednesday, January 15, 2014 11:40 PM
  • hmm a few more questions..

    My list of urls is in an excel table of the format Identifier (ie state) | URL

    I figured out how to reference this table directly by doing

    Source= Excel.CurrentWorkbook(){[Name="TeamSource"]}[Content],

    -- where TeamSource is the name of the excel table & substituting source for where you reference your States table

    What your code is doing is appending the content of the url to the 'Source' table. Is there another way? I'm asking because my TeamSource table has a bunch of extra columns that I dont need in my powerquery output. I'm running a table.removecolumns() to remove all of them except the identifier but was wondering if it's possible to do this the other way round; load the data from the url and add the current identifier for each url. It's a lot of data and im just thinking memory footprint, it's always quicker/less resource intensive to only load what you want rather than load too much then remove. Plus im using 32bit excel so I reckon i might get outofmemoryexception errors. I know i do when playing with powerpivot and power maps.

    Obviously if "traditional" loops aren't possible in powerquery then your ingenious approach has certainly got around that by using a control table and table.column, but it means sometimes pulling in more data than necessary


    Jakub @ Adelaide, Australia

    Thursday, January 16, 2014 1:12 AM
  • M is a functional language; functional languages typically don't have loops. But M also typically does lazy evaluation -- so despite what the intermediate results look like, it's likely that the URLs are really being processed one-at-a-time, and that the entire contents of all the URLs are not in memory at once.

    Does that address your concern? If not, I think you'll need to be a little more explicit about what you perceive as the problem.

    Thursday, January 16, 2014 1:28 AM
  • ok thanks, that's all i need to know

    The other PowerBI excel pieces don't seem to be as memory friendly. PowerPivot and Powermaps are definite memory hogs which can cause issues with the 32bit 2gb limit


    Jakub @ Adelaide, Australia

    Thursday, January 16, 2014 2:56 AM
  • Dear Curt.

    I saw your threat on this issue - well written - and thought you might be at help. I have asked the question below at Kasper de Jongs blog (http://www.powerpivotblog.nl/nba-team-spending-and-their-results-with-excel-powerpivot-and-data-explorer/).

    Thanks in advance,

    Jens Ole

    Hi Kasper and other.

    Maybe this is too custom specific and you there will not spend time on it – and that is of cause ok.

    As I described earlier do like the NBA example and I am testing it against a Danish Real Estate database (http://www.boliga.dk/salg/resultater?so=1&sort=omregnings_dato-d&maxsaledate=today&type=&iPostnr=&gade=&minsaledate=2011)

    When I do invoke query 1 with any page e.g. page=10 the query return 40 lines (the website provide 40 lines per result set) in a snap – very fast!

    Query 1 look like this:
    = (page) => let
    Source = Web.Page(Web.Contents(“http://www.boliga.dk/salg/resultater?so=1http://www.boliga.dk/salg/resultater?so=1&minsaledate=2006&p=” & Number.ToText(page))),
    Data1 = Source{1}[Data]

    in
    Data1

    The problem appears in the second query. In the query editor I would like to return 5000 pages. The script would look like this:

    = let
    Source = Table.Combine(List.Transform({1..5000}, Query1))
    in
    Source

    If I run the query as:

    Source = Table.Combine(List.Transform({1}, Query1))

    then the editor will return the result fast and as expected.

    If I add another page that is –
    Table.Combine(List.Transform({1, 2}, Query1)) – then it works fine too.

    However, adding more pages like – Table.Combine(List.Transform({1..12}, Query1)) – the result set is often not returned to the Excel sheet.

    It seems to me that either 1) the webpage does not respond (I believe it does) og 2) Power Query fail combining many table sets.

    Any idea how to debug this?

    If you like I can forward the spreadsheet to those who would like to see the example.

    Hope to get some troubleshooting idea – thanks in advance.

    /Jens Ole

    Thursday, July 31, 2014 1:26 PM
  • I can't think of any reason that Power Query would fail in this case. The best way to troubleshoot this is probably with a tool called Fiddler which will show you what's happening with your HTTP requests.
    • Proposed as answer by Eric Vogelpohl Thursday, January 22, 2015 3:22 PM
    Thursday, July 31, 2014 9:02 PM
  • Perhaps you can design your load around Table.Repeat () ?
    Thursday, January 22, 2015 3:38 AM
  • I have some questions after reading your answer.

    1. What parts of HTML document are retained by the returned value of Web.Page?
    2. How can I select a page by ID or html attributes or a CSS selectors?
    3. How should be debug the return values? Is there something like a scripting console where I can try and evaluate the return values? Something like developer tools in Internet Explorer?

    You are selecting a table which has a caption element. See below:

    <table>

    <caption>Some text</caption>

    <tr>

    <td>...</td>

    ...

    </tr>

    ...

    </table>

    For example if I have something like:

    <table id="observation" class="table obs">

    ...

    </table>

    My question was how do I select this kind of table?

    And yes, I am a web developer :).


    • Edited by Raviteja L Friday, October 9, 2015 11:01 AM Adding more information
    Friday, October 9, 2015 10:57 AM
  • Hi curt , is there a way to pass the state names to this function from an existing table , other than hard coding it into the function
    • Edited by Uthpala EW Tuesday, July 12, 2016 8:58 AM
    Tuesday, July 12, 2016 8:47 AM
  • Many thanks for this solution Curt. Definitely beat the cascading set of queries I'd been using before hand!

    Because this solution only allows for static data sources, I ended up replacing it with a tweaked version of a fix posted by Excelerator BI's Matt (http://exceleratorbi.com.au/consolidate-multiple-excel-files-in-sharepoint-using-powerquery/) that employs a custom function to dynamically open the common data table in a designed folder path and file set. Worked great for me, but my source data was uniform and I didn't have to deal with things like table name variance.

    Shoulders of giants and all. Thanks again!

    Wednesday, October 12, 2016 8:36 PM
  • Thank you so much for this. There's no way I could have figured this out without your example. You've helped educate me!

    Once I understood how this works, I did a change that deals with the fact that the different states' wiki pages use different variations on "Presidential elections results" as you mentioned (and many include a footnote at the end). So instead of narrowing for that in the function, I just used the function to get the whole source and filtered for "Presidential elections" among the captions. It works well (but then there are other data issues in some of the pages!)

    let
        Election.Results = (state) => let
            Source = Web.Page(Web.Contents("http://en.wikipedia.org/wiki/" & state))
        in

        Source,

        le_table = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],

    // The above Table5 is my list of states in a worksheet table

        States = Table.TransformColumnTypes(le_table,{{"State", type text}}),

        InsertedCustom = Table.AddColumn(States, "Custom", each Election.Results([State])),

        #"Expanded Custom" = Table.ExpandTableColumn(InsertedCustom, "Custom", {"Caption", "Source", "ClassName", "Id", "Data"}, {"Custom.Caption", "Custom.Source", "Custom.ClassName", "Custom.Id", "Custom.Data"}),

        #"Added Custom" = Table.AddColumn(#"Expanded Custom", "FindPresElectionTable", each Text.Contains([Custom.Caption],"Presidential election")),

        #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([FindPresElectionTable] = true)),

        #"Expanded Custom.Data" = Table.ExpandTableColumn(#"Filtered Rows", "Custom.Data", {"Year", "Democratic", "Republican", "Democrat"}, {"Year", "Democratic", "Republican", "Democrat"}),

    // additional shaping  and transformation steps omitted on social.technet.microsoft.com - last one is #"Reordered Columns"

    in
        #"Reordered Columns"

    Monday, May 15, 2017 8:33 PM
  • What about when the URLs are within the organizational domain, on a SharePoint site as a feed, or a Project Web App OData feed... 
    Wednesday, September 27, 2017 8:15 PM
  • I disagree about the loops concept.

    I believe that M does loops almost entirely. The each part is the row of the current set it is being fed. List of URLS, rows of current table. In general, we are applying a function to each row. We even have a loop for files in a folder. That is one of the things to consider in M design, what do you have to do to which column(s) to obtain the result you are looking for. 

    Friday, September 29, 2017 11:09 AM
  • hey!

    The principle will be the same:

    1. Create a function that will be able to get your data from a single element
    2. Define a logic to "loop" through each page
    3. Apply the function inside that Loop

    Here's a video showcasing how to somewhat of a similar thing to a 'DO / WHILE' for pagination or looping scenarios.

    This is also some official documentation, by Microsoft, on handling Pagination scenarios with Power Query.

    Saturday, September 30, 2017 4:42 PM
  • I don’t know much apart from M, but it could be that M has a method that makes explicit loops and iterations (like the ones Miguel has referenced) obsolete: If you have a list or a column with things like URLs, on which you want to perform a similar operation/function, then a simple Table.AddColumn which executes the operation like Curt has described is enough: A new column will be created that hold the results for each row/URL and you simply expand it to have all results in one table.

     

    So there will be no command that hints any loop or iteration like you might expect from other languages.

     

    The only cases where you need an explicit looping/iterating command like List.Generate or List.Aggregate or a recursion (using @) is when you don’t have the “objects” on which you want to perform your operations on beforehand. So everytime when the function arguments are dynamic and don’t come in a predefined list or table you need one of these explicit loop/iterating functions.

     

    I believe this goes with what geraldartman meant.


    Imke Feldmann - MVP Data Platform (PowerBI) - http://www.TheBIccountant.com 

    Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to vote it as helpful - Thanks!

    Saturday, September 30, 2017 9:15 PM
    Moderator
  • I'm trying to do a similar function and your solution seems to be very helpful, but I'm still not able to make it work. 
    I've tried to reproduce your example but it results in a table with the name of states instead of the web data combined for the 3 states. 
    What I want to do is similar but using SQL statements getting data from a ms-access database. 
    I want to select some ms-access files (same database structure and different data) and, loop through them executing the same "SELECT XXXX", adding a column with the file name and combining all data. Is this possible?
    I could do something similar using full table, but I need to use SQL statement.  I built the string to "Formula" property, but I guess your solution is much better if I could make it work. 

    Following there is an example with two files of the code what I tried to do using vba (the loop to build the strSource string is not presented):

    strSource = "let
        Fonte = Access.Database(File.Contents("PathName\File01Name.mdb")),
        _TbEx_1 = Fonte{[Schema="",Item="TableExample"]}[Data],
        _TbEx_C1 = Table.AddColumn(_TbEx_1, "Case", each "File01Name", type text),
        Fonte_2 = Access.Database(File.Contents("PathName\File02Name.mdb")),
        _TbEx_2 = Fonte_2{[Schema="",Item="TableExample"]}[Data],
        _TbEx_C2 = Table.AddColumn(_TbEx_2, "Case", each "File02Name", type text),
        _TbExComb = Table.Combine({_TbEx_C1 , _TbEx_C2})
    in
        _TbExComb"
    
    ActiveWorkbook.Queries("PwQry01").Formula = strSource
    ThisWorkbook.Connections("Qry-PwtQry01").Refresh

    I don't want this "Item=TableExample" but a SQL statement. ...

    Thanks for your help.

    Thursday, April 25, 2019 8:29 PM