Answered by:
Engine/IO/Web/Request/GetResponse and Rest API rate limits

Question
-
( originally posted to Power Pivot by mistake : https://social.msdn.microsoft.com/Forums/en-US/c4edffc2-4c1f-4559-8838-ee635326b34a/engineiowebrequestgetresponse-and-rest-api-rate-limits?forum=sqlkjpowerpivotforexcel )
Hello,
I'm working with an API that does not expose data access methods in the dimension that I need so I need to serialize requests to the API (this is straightforward in .net or java). I'm experimenting with Power Query to see if I can avoid writing a dedicated scraper application. Unfortunately, the Microsoft mashup engine is not behaving the way I would expect.
I started by converting the api request to a function and then calling it over a data column. However, I noticed that as my domain grew the power query data refresh would inevitably crash with a request timeout error. This is fairly typical when hitting a remote server so I tried to catch the errors and handle them. I also added a timing mechanism to make sure I wasn't hitting the API ratelimit. See original code below:
let myFunct = ( param_a as text, param_b as date , retry_attempt as number ) => let y = Function.InvokeAfter( () => true , #duration(0,0,0,5)), x = Web.Contents( "endpoint URL", [Content=Text.ToBinary("<soapenv:Envelope>requestBody(including param_a & param_b)</soapenv:Envelope>"), Timeout = #duration(0,0,0,30) , IsRetry = true ] ), myTrace = Diagnostics.Trace(TraceLevel.Warning, "process_name " & param_a & "|" & Date.ToText(param_b) & "|" & Number.ToText(retry_attempt) , true ), longWait = Function.InvokeAfter( () => true , #duration(0,0,0,60)) in if (y and myTrace) then ( try Xml.Tables(x) otherwise ( if ( ( retry_attempt < 3 ) and longWait ) then @myFunct( param_a , param_b , (retry_attempt + 1) ) else null ) ) else null in myFunct
( I am aware that calling Xml.Tables on the return value in a recursive function is broken, need to rewrite )
At this point I was surprised to find out that this had very little to no impact on the power query data refresh crashes. I started digging around the mashup log files and it seems that Web.Contents() is a method that invokes .Net TryExecuteAction and has it's own error handling routines. Unfortunately, I can't find much in terms of documentation for Web.Contents() logic, nor can I find any means to configure it's behavior. My next step would be to discard Web.Contents() in favor of the more basic WebAction.Request(). However, even though MSDN documents the action function - Power Query does not seem to recognize them at all (nevermind invoking the actions once they are created).
P.S. - I have also learned that Web.Contents and Web.Page may behave differently and there may be value in persuing Web.Page ( http://stackoverflow.com/questions/37838007/power-bi-power-query-web-request-results-in-cr-must-be-followed-by-lf-error ).
Exception Thrown:
ExceptionType: System.Net.WebException, System, Version=4.0.0.0, Culture=neutral, Message: The operation has timed out StackTrace: at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context) at System.Net.HttpWebRequest.GetRequestStream() at Microsoft.Mashup.Engine1.Library.Web.Request.GetWebResponse(WebRequest webRequest) at Microsoft.Mashup.Engine1.Library.Web.Request.CreateResponse(ResourceCredentialCollection credentials) at Microsoft.Mashup.Engine1.Library.Web.Request.GetResponseCore(ResourceCredentialCollection credentials) at Microsoft.Mashup.Engine1.Library.Web.Request.<>c__DisplayClass42.<GetResponse>b__40() at Microsoft.Mashup.Engine1.Library.Common.RetryPolicy.Execute[TResult](IEngineHost host, Func`1 func) at Microsoft.Mashup.Engine1.Library.Web.Request.GetResponse(ResourceCredentialCollection credentials, RetryPolicy retryPolicy, AuthorizationThrowFunction authorizationFunction)
and Log excerpt:
Library/Common/RetryPolicy/TryExecuteAction 3 Library/Common/RetryPolicy/TryExecuteAction 2 Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Engine/User Library/Common/RetryPolicy/TryExecuteAction 3 Library/Common/RetryPolicy/TryExecuteAction 2 Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Engine/User Library/Common/RetryPolicy/TryExecuteAction 3 Library/Common/RetryPolicy/TryExecuteAction 2 Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Engine/User Library/Common/RetryPolicy/TryExecuteAction 3 Library/Common/RetryPolicy/TryExecuteAction 2 Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Engine/User Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Engine/User Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Engine/User Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Library/Common/RetryPolicy/TryExecuteAction 1 Library/Common/RetryPolicy/ExecuteAction Engine/IO/Web/Request/GetResponse Engine/User
( I am just noticing that two actions are executed for every getresponse )
Thank you for your time,
-A
Tuesday, January 17, 2017 3:17 PM
Answers
-
I agree that Power Query should be the right tool to make this work! I'll answer some of your questions about how things work today, but please understand that the internal implementation can change :)
Correct, Web.Contents() uses the .NET HTTP client. Today, Web.Page() bypasses the Web.Contents() HTTP client and instead uses the Internet Explorer HTTP client, which tends to be much more tolerant of various kinds of errors. However, I'd caution you against relying on IE for a myriad of reasons. Specifically, IE has its own HTTP timeout policies and Web.Page() doesn't attempt to increase the timeout IE uses. Web.Contents() will use the timeout you configure, and will internally do some exponential backoff for a few retries. If the web server responds with a HTTP 429 we should probably have a different retry policy, but I'm unsure if that's supported today.
Today, WebAction.Request() is not publically accessible today in the "M" library in Power Query and Power BI Desktop, so sorry that MSDN has led you astray. (If you have thoughts for how to change MSDN to make this more clear, it helps to submit feedback on the MSDN site. If you have specific recommendations or complaints, feel free to email me at carlwa at microsoft . com.)
To debug this further, I'd recommend looking deeper into when the timeouts happen. Does each request time out at random, and then work if you try them again? Or do you need to set the timeout to 2 minutes for some URLs? You could probably answer these questions using the logs you have, but I'd recommend setting up Fiddler to get a better understanding.
A few things that might help:
- Increase the Web.Contents timeout to 2 minutes or 20 minutes.
- Call Binary.Buffer over the binary returned by Web.Contents to ensure that the XML parsing doesn't cause repeated HTTP requests. Lazy binary values can cause a lot of trouble!
- Use Fiddler logging or look at product logs across all Container EXEs to understand how many HTTP requests are actually being made.
If this doesn't help, if you could send-a-frown we'd be able to see better diagnostic information, and possibly we'd have some tips for working with this specific API.
- Marked as answer by tanstaafl3487 Monday, January 23, 2017 4:30 PM
Thursday, January 19, 2017 7:24 PM
All replies
-
I agree that Power Query should be the right tool to make this work! I'll answer some of your questions about how things work today, but please understand that the internal implementation can change :)
Correct, Web.Contents() uses the .NET HTTP client. Today, Web.Page() bypasses the Web.Contents() HTTP client and instead uses the Internet Explorer HTTP client, which tends to be much more tolerant of various kinds of errors. However, I'd caution you against relying on IE for a myriad of reasons. Specifically, IE has its own HTTP timeout policies and Web.Page() doesn't attempt to increase the timeout IE uses. Web.Contents() will use the timeout you configure, and will internally do some exponential backoff for a few retries. If the web server responds with a HTTP 429 we should probably have a different retry policy, but I'm unsure if that's supported today.
Today, WebAction.Request() is not publically accessible today in the "M" library in Power Query and Power BI Desktop, so sorry that MSDN has led you astray. (If you have thoughts for how to change MSDN to make this more clear, it helps to submit feedback on the MSDN site. If you have specific recommendations or complaints, feel free to email me at carlwa at microsoft . com.)
To debug this further, I'd recommend looking deeper into when the timeouts happen. Does each request time out at random, and then work if you try them again? Or do you need to set the timeout to 2 minutes for some URLs? You could probably answer these questions using the logs you have, but I'd recommend setting up Fiddler to get a better understanding.
A few things that might help:
- Increase the Web.Contents timeout to 2 minutes or 20 minutes.
- Call Binary.Buffer over the binary returned by Web.Contents to ensure that the XML parsing doesn't cause repeated HTTP requests. Lazy binary values can cause a lot of trouble!
- Use Fiddler logging or look at product logs across all Container EXEs to understand how many HTTP requests are actually being made.
If this doesn't help, if you could send-a-frown we'd be able to see better diagnostic information, and possibly we'd have some tips for working with this specific API.
- Marked as answer by tanstaafl3487 Monday, January 23, 2017 4:30 PM
Thursday, January 19, 2017 7:24 PM -
Carl,
Wrapping Web.Contents with Binary.Buffer had a great impact - it got rid of the duplicate API requests and I have not seen any request timeouts since.
However, I did notice that Binary.Buffer strips the metadata from the return value. It seems like you can have performance or error handling but not both. Is there a way around this? I.e. is there a way I can exploit Manual Status Handling and Binary.Buffer at the same time?
Thanks again,
-A
Monday, January 23, 2017 4:34 PM -
Aha, yes, most transformation functions don't preserve metadata.
You can write a different buffer function to copy the Web.Request binary's metadata back onto the buffered binary. Be aware that accessing some of the fields in the metadata record will be lazy and cause a new HTTP request.
let BufferWithMetadata = (binary) => (try Binary.Buffer(binary) otherwise null) meta Value.Metadata(binary), Source = Value.Metadata(BufferWithMetadata(Web.Contents("httpbin.org/status/404"))) in Source
Monday, January 23, 2017 5:42 PM