Power Query: How to Speed up Code when Downloading Data from 1000 Different Wiki APIs

  • Question

  • Hi All,

    I am looking for two suggestions for my query:

    1) How can I speed up the code's performance?

    2) What is the best way to refresh only 100 new records out of the 10,000 records that were previously imported, so that Power Query does not run against all the API links and only fetches the 100 new ones? (Details below.)

    I am importing data using Power Query, and my source is Wiki API JSON links such as https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=1, https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=2, and so on.

    The query below works, but it is very slow.

    Can anybody suggest how to speed up its performance?

    First Function (this is the query invoked as "api" in the second query)

    let GetResults = (Tag) =>
    let
        // Download and parse the JSON response for one page title. If the
        // web call fails, this step raises an error; the calling query drops
        // such rows later with Table.RemoveRowsWithErrors.
        Source = Json.Document(Web.Contents("https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=" & Tag)),
        query = Source[query],
        pages = query[pages],
        // query.pages is a record keyed by page id; turn it into a table.
        #"Converted to Table" = Record.ToTable(pages),
        #"Expanded Value" = Table.ExpandRecordColumn(#"Converted to Table", "Value", {"title", "extract"}, {"title", "extract"}),
        #"Removed Columns" = Table.RemoveColumns(#"Expanded Value", {"Name"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns", {{"extract", type text}, {"title", type text}})
    in
        #"Changed Type"

    in GetResults

    Second Query (the main query, which invokes the function above)

    let
        // Table3 holds the full list of API URLs in its "Page ID" column.
        Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source, {{"Page ID", type text}}),
        #"Renamed Columns" = Table.RenameColumns(#"Changed Type", {{"Page ID", "API_Detail"}}),
        // Extract the page title: the text after the fifth "=" in the URL (titles=...).
        #"Inserted Text After Delimiter" = Table.AddColumn(#"Renamed Columns", "Text After Delimiter", each Text.AfterDelimiter([API_Detail], "=", 4), type text),
        #"Renamed Columns1" = Table.RenameColumns(#"Inserted Text After Delimiter", {{"Text After Delimiter", "Tag"}}),
        // "api" is the query containing the GetResults function above.
        #"Invoked Custom Function" = Table.AddColumn(#"Renamed Columns1", "Data", each api([Tag])),
        // Rows whose web call failed carry an error in "Data"; drop them here.
        #"Removed Errors" = Table.RemoveRowsWithErrors(#"Invoked Custom Function", {"Data"}),
        #"Expanded Data" = Table.ExpandTableColumn(#"Removed Errors", "Data", {"extract"}, {"extract"}),
        #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Data", {{"extract", type text}}),
        #"Removed Columns" = Table.RemoveColumns(#"Changed Type1", {"Tag"})
    in
        #"Removed Columns"

    Rajender

    Saturday, May 18, 2019 4:46 PM

Answers

All replies

  • Hi Rajender,

    Welcome to the Office 2013 and Office 365 ProPlus IT Pro General Discussions forum. This IT Professional forum is for general questions, feedback, or anything else related to the RTM release versions of Office 2013 and Office 365 ProPlus.

    Your question is mainly related to Power Query. To help you better resolve the problem, I will move the thread to the Power Query forum, where you may get more helpful replies.

    Thanks for your kind understanding.

    Best Regards,

    Herb



    Monday, May 20, 2019 2:01 AM
  • Hi Rajender. To download only the new URLs, you'll need to do something similar to what Imke demonstrates here:

    https://exceleratorbi.com.au/self-referencing-tables-power-query/

    Except that instead of loading comments from the previously loaded version of the query results, you'll need to determine whether a URL needs to be pulled or not based on whether its results are already loaded to the current workbook.

    The above approach is pretty advanced. If you're looking for a simpler solution, the only thing I can think of would be to break the URLs into groups (in separate queries) so that they can be refreshed independently.
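
    A rough sketch of that first idea in M, with everything here illustrative rather than tested: it assumes the query's previous output has been loaded back into the workbook as a table named PreviousResults (with API_Detail and extract columns, per the self-referencing pattern in the article above), and that the fetch function from the question is available as "api":

    let
        // Output of the last refresh, read back from the workbook
        // (PreviousResults is a placeholder name).
        Previous = Excel.CurrentWorkbook(){[Name="PreviousResults"]}[Content],
        // The full, current list of URLs.
        Current = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
        Renamed = Table.RenameColumns(Current, {{"Page ID", "API_Detail"}}),
        // Left anti join: keep only URLs with no row in the previous results.
        NewOnly = Table.NestedJoin(Renamed, {"API_Detail"}, Previous, {"API_Detail"}, "Match", JoinKind.LeftAnti),
        NewUrls = Table.RemoveColumns(NewOnly, {"Match"}),
        // Fetch extracts just for the new URLs, reusing the existing function.
        Tagged = Table.AddColumn(NewUrls, "Tag", each Text.AfterDelimiter([API_Detail], "=", 4), type text),
        Fetched = Table.AddColumn(Tagged, "Data", each api([Tag])),
        NoErrors = Table.RemoveRowsWithErrors(Fetched, {"Data"}),
        Expanded = Table.ExpandTableColumn(NoErrors, "Data", {"extract"}, {"extract"}),
        Cleaned = Table.RemoveColumns(Expanded, {"Tag"}),
        // Append the newly fetched rows onto the previously loaded ones.
        Combined = Table.Combine({Previous, Cleaned})
    in
        Combined

    On refresh, only the anti-joined rows trigger web calls; everything fetched earlier is simply re-read from the worksheet. The caveat from the article applies: the query's load destination must be the very table it reads from, which is what makes the setup fiddly.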

    Ehren


    Tuesday, May 21, 2019 5:43 PM
    Owner