none
Query optimization with multi-tenant database?

    Question

  • I am converting a database to work in a multi-tenant environment.  I have read the articles on how to set this up with the modified connection code, and the Row Level Security process, and adding TenantId to each table and the primary keys.

    The articles then give some simple samples of SELECT commands to show how the RLS is transparent to the application programs, i.e. the application does not need to add "TenantId" into existing SELECT (or update or delete) statements for the RLS to work.  That is all well and good.

    However, if we have a table with a large number of rows, belonging to multiple tenants, a simple SELECT col1 FROM tablename WHERE col2 = 'abcd' is always going to do a full table scan.  As far as I can tell.

    Thus it seems all of the application programs indeed must be updated to add TenantId as a WHERE condition into every database statement to be able to utilize any indexes.

    I am looking for confirmation of this, as I have not found any article that really pinpoints this issue.  Thanks!

    Friday, April 21, 2017 5:49 PM

All replies

  • Hello,



    Instead of scans you may find nested loops or semi-joins (where the matching for the tenant occurs) are added on the query plans.

    Hope this helps.



    Regards,

    Alberto Morillo
    SQLCoffee.com

    Friday, April 21, 2017 7:00 PM
  • OK.  But still, it seems Joins will not be optimized unless and until all queries throughout the application are changed to include TenantId in Where clauses and/or Join..On clauses.  Else the indexes and/or cluster keys with TenantId as leading column seem not likely to be utilized.
    Friday, April 21, 2017 7:57 PM
  • Hello,



    My suggestion is to create a lab and see how the query plans behave in your scenario. Take in consideration query plans change when run on the different tiers and depending of Azure features enabled also.

    Hope this helps.



    Regards,

    Alberto Morillo
    SQLCoffee.com


    Friday, April 21, 2017 8:09 PM
  • If this is the article you read:

    https://docs.microsoft.com/en-us/sql/relational-databases/security/row-level-security

    then, hmm, ...

    But I wonder if that example is the best way to go.  If you have a multi-tenant database then you probably - certainly! - have a field that determines which tenant a row belongs to.  That field should be mentioned in your function, and then it should be used properly to filter the data efficiently.

    Josh

    Monday, April 24, 2017 2:19 AM
  • Yes - that article among many.  The SELECT examples are always simplistic, and leave off any reference to TenantId.  I guess they are trying to emphasize that this RLS implementation is "transparent" to the application thus it needs no edits to any SQL statement.

    But any table with large row count is going to need TenantID added to every SQL command.  At least it looks that way.  However none of the writers seem to mention this.

    As soon as we get our multi-tenant Azure account set up I will do testing but I am pretty sure my assumption is correct.

    I was also thinking that a new primary key for each table of (originalpkeycol, TenantId) might be efficient as it puts the more discriminating column first, but not having TenantId first may cause other issues.  have to see what happens.

    Thanks for the reply.

    Jack Hummer

    Monday, April 24, 2017 7:39 PM