none
Queries on external tables are significantly slower

    Question

  • Hi there, 

    We have performance issues when we query external tables. We created a set of external tables in one Azure SQL Database referencing tables in another Azure SQL Database. When we run a query on the external tables it's very slow (>5-10 minutes). The same query executes very fast in the referenced database (1-5 seconds). What might be causing this issue?   

    Thanks!

    Friday, June 09, 2017 6:37 AM

All replies

  • Where is the second database located, same region as the first or different?

    Sam Cogan Microsoft Azure MVP
    Blog | Twitter

    Friday, June 09, 2017 7:50 AM
  • Hello,

    Please examine the query plan and let us know if the most costly operator is the “remote query”. How many rows are involved on that operator? Please parameterized the query on the WHERE clause and specially values that are compared to columns on the remote database or external tables.This may avoid transferring millions of rows and improve query performance.



    Hope this helps.



    Regards,

    Alberto Morillo
    SQLCoffee.com

    Friday, June 09, 2017 1:28 PM
  • So the whole architecture looks something like this:

    RESOURCE GROUP 1:
    a) PRIMARY (West US)
    - SQL Server 1:
    -- DB02.Shard01 (Elastic std)
    -- DB02.Shard02 (Elastic std)
    -- DB02.Shard03 (Elastic std)

    - SQL Server 1:
    -- DB02 (S1 std)
    -- DB03 (S1 std)

    - SQL Server 3:
    -- ShardMapManagerDB (S1 std)

    a) SECONDARY (Central US)
    - SQL Server 3:
    -- DB02.Shard01 (Elastic std)
    -- DB02.Shard02 (Elastic std)
    -- DB02.Shard03 (Elastic std)

    - SQL Server 4:
    -- DB02 (S1 std)
    -- DB03 (S1 std)

    - SQL Server 5:
    -- ShardMapManagerDB (S1 std)

    RESOURCE GROUP 2:
    - SQL Server6 (West US):
    -- ReportsDB (Standard S0) 

    Production databases are in the West US region. Secondary geo-replicated databases are read-only in the Central US region. The ReportsDB database uses the read-only secondaries as external data sources.

    So is the fact that the databases used as external sources are in the Central US region whereas the database itself is in West US?

    I understand that to perform cross-database queries the majority of the workload should be performed on the data tier. But even when I run a query on the ReportsDB (using external data source) that only targets one database it's very slow compared to when I run it directly on that database (10 seconds vs. 20 minutes). This particular query is an aggregation that only returns few rows. But when I examine the execution plan, Remote query accounts for 64% and returns 17k rows and then it sorts locally (36%) and after a filter operation, it only returns 6 rows. Why is it doing that? Why doesn't it just do all the work first and then just transfer the 6 rows accross?

    Thanks for your help!







     
    Sunday, June 11, 2017 11:46 AM
  • Hello,

    Could you share with us the query plan or the query we are talking about?


    Thank you in advance.


    Regards,

    Alberto Morillo
    SQLCoffee.com


    Monday, June 12, 2017 2:28 PM
  • Hi Alberto,

    I am afraid I cannot share the query. But it seems to be a general problem. It seems like our queries run much slower when we execute them using external data source vs. directly on the server. Is this normal? Is there something wrong with the architecture? 

    And one question - I cannot find much info about Azure performance best practices, query tuning and querying sharded databases? Is there some resources online I could look at?

    Thanks!

    Monday, June 12, 2017 11:16 PM
  • Hello,

    The following article shows you how to improve performance of cross database queries:

    https://azure.microsoft.com/en-us/blog/querying-remote-databases-in-azure-sql-db/?v=17.23h



    Hope this helps.



    Regards,

    Alberto Morillo
    SQLCoffee.com

    Tuesday, June 13, 2017 4:19 AM
  • Hello,

    Getting involved memory-optimized tables may help.

    https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/cross-database-queries



    Hope this helps.



    Regards,

    Alberto Morillo
    SQLCoffee.com

    Wednesday, June 14, 2017 3:57 PM
  •  I have the same question. Is there some solutions?
    Wednesday, November 08, 2017 7:21 AM
  • I remember when it was introduced there was a difference between Standard and Premium pricing tier.

    Since Standard pricing tier has less DTU, it usually takes approx.. 1 min to initialize the elastic query for the first time and then it will run faster for the consecutive queries however I believe after some point the cache is cleared so again you may face the same issue. However in premium this doesn't happen.

    I read Microsoft is working on improving initializing latency for Standard tiers, but I'm not sure whether that is done. Make sure you run latest version of database [V14] and then see if that fixes it in Standard edition or else you might need to upgrade to get rid of initialization latency.


    Mark as ANSWER if I helped you today :-) www.sql-articles.com

    Friday, November 10, 2017 8:49 AM