You can divide the task of "executing a query process around T-SQL" into two groups: (1) Processes that occur in the relational engine, and (2) Processes that occur in the storage engine.
The relational engine is responsible for several processes: (1) parsing the query text, (2) processing the information using the Query Optimizer, which is responsible to generate the Execution plan, and (3) sending the plan (in a binary format) to the storage engine.
The storage engine retrieves or updates the underlying data. The storage engine includes processes such as locking, index maintenance, and transactions.
This article discusses the execution plan, which is generated by the Query Optimizer, while executing queries with Join operations. We will focus on one specific issue: in what order our Join operations will be executed, and how we can control it. We will show you techniques that you can put in your toolkit!
We will discuss several different specific cases, from the simple query to more complex situations where the execution plan will depend on a lot of variables, such as the table's size and number of rows, the availability of statistics, indexes, key distribution, etc. In the last paragraph, we will try to summarize the golden rules that explain most of the common issues that you'll face.
For this demonstration, we will create three tables. We will populate two tables with a very low number of rows (100 and 500) and the third table with a large number of rows (1000000).
/************************************************************/
/*********
Join
Order
,
By
Ronen Ariely *********/
USE master ;
GO
CREATE
DATABASE
JoinOrderDB
ON
(
NAME
= Sales_dat,
FILENAME =
'G:\SQL\Databases\JoinOrderDB.mdf'
SIZE
= 2 GB,
MAXSIZE = 3GB,
FILEGROWTH = 250MB )
LOG
= Sales_log,
'G:\SQL\Databases\JoinOrderDB.ldf'
= 1GB,
MAXSIZE = 2GB,
FILEGROWTH = 100MB ) ;
use JoinOrderDB
/**************************************************** DDL -
Create
tables */
IF OBJECT_ID(
'T1'
'U'
)
IS
NOT
NULL
DROP
TABLE
T1;
'T2'
T2;
'T3'
T3;
T1 (
ID
INT
IDENTITY
CONSTRAINT
PK_T1
PRIMARY
KEY
MyValue
bigint
T2 (
PK_T2
T3 (
PK_T3
/**************************************************** DML - Populate Tables*/
SET
NOCOUNT
;
insert
T1 (MyValue)
select
top
100
CAST
(CHECKSUM(NEWID())
AS
from
dbo.Numbers
T2 (MyValue)
500
T3 (MyValue)
1000000
Let's examine the influence of the Join order in the query, to the Join operation order in the execution plan. First, we will check the JOIN using two small tables.
Please execute the next queries and examine the Execution plan.
SELECT
T1.ID, T2.MyValue
FROM
T2
INNER
JOIN
T1
T1.ID = T2.ID
>> You can repeat the execution of each query in this article with the setting "SET SHOWPLAN_ALL ON". This causes Microsoft SQL Server not to execute Transact-SQL statements. Instead, SQL Server returns detailed information about how the statements are executed and provides estimates of the resource requirements for the statements.
Notice that while joining the two small tables of the same size order, the join order in the execution plan fits to the join order in our query. Will it be the same if we join tables that have a substantial difference in their amount of rows?
T1.ID, T3.MyValue
T3
T1.ID = T3.ID
It's obvious that in this case the Query Optimizer chooses to use the small table first, and in both cases we get the same Execution plan, regardless of the join order in the query. Let's now join three tables together and examine the execution plan:
T1.MyValue, T2.MyValue, T3.MyValue
join
on
T2.ID = T3.ID
Notice that the Query Optimizer has changed the join order from T3->T2->T1 into T1->T2->T3. This behavior fits our preview conclusion in Demo_01, that the execution order fits to the number of rows in our tables and our join operations are ordered from the smallest to the largest SET.
The query optimizer should be intelligent enough to figure out the best order in which to execute the steps necessary to translate your logical instructions into a physical result. It uses the stored statistics that the database maintains on each table to determine what to do (what type of join logic to use for example), as well as what order to perform the operations in, in order to minimize Disk IOs and the processing costs.
In some cases, we will need to have the Query Optimizer keep our query order of operations.
The query optimizer uses different rules when exploring different plans to evaluate. For example: SelectToFilter, AppIdxToApp , JoinToIndexOnTheFly, JoinCommute, and so on. We can turn rules off using the rule name and the QUERYRULEOFF hint. The rule called JoinCommute is responsible to the join order in our case, and we can actually turn it off.
The execution plan doesn’t expose the ability to selectively enable or disable rules available to the optimizer. We can, however, use a couple of undocumented DBCC commands and the undocumented dynamic management view (sys.dm_exec_query_transformation_stats) to explore the way the optimizer uses rules. (Check the "Resources and More Information" section, Inside the Optimizer: Constructing a Plan, by Paul White). The execution plan doesn’t expose the ability to selectively enable or disable rules available to the optimizer.
It specifies that the join order indicated by the query syntax is preserved during query optimization. Using FORCE ORDER does not affect a possible role-reversal behavior of the query optimizer. We can examine the use of the "FORCE ORDER" hint on our queries.
Let's execute the same query, which uses the big table first, but in this case we will use the "FORCE ORDER" hint:
OPTION
FORCE
ORDER
Using Force Order in our three-tables join:
As expected, using the Force Order hint, both queries executed in the order that the queries were written.
In this section, we are going to examine the behavior of three tables in relation to one another. We will start with DDL + DML, and then we will move to discuss several JOIN cases related to those tables.
'Entities'
'Countries'
'EntityOrders'
Entities
EID
ENAME NVARCHAR(10)
Countries
CID
CNAME
VARCHAR
(50)
EntityOrders(
EOID
EntityID
CountyID
EOPrice
DECIMAL
(18,3)
);
-- Foreign Key Constraints
ALTER
EntityOrders
ADD
FK_Entity
FOREIGN
(EntityID)
REFERENCES
Entities(EID);
FK_Country
(CountyID)
Countries(CID);
INSERT
INTO
Entities (ENAME)
VALUES
'E-1'
),(
'E-2'
'E-3'
Countries (CNAME)
'IL'
), (
'US'
'FR'
EntityOrders(EntityID, CountyID,EOPrice)
(1, 1, 20),(2, 3, 200);
Query 1: Execute a simple Join query using the relations:
-- Here we try to use: [Entities] JOIN [EntityOrders] JOIN [Countries]
*
a
b
a.EID = b.EntityID
c
b.CountyID = c.CID;
-- but we get: [EntityOrders] JOIN [Entities] JOIN [Countries]
Query 2: Execute the same Join query with a QUERYRULEOFF JOINCOMMUTE hint:
b.CountyID = c.CID
( QUERYRULEOFF JoinCommute);
-- Now we get: [Entities] JOIN [EntityOrders] JOIN [Countries]
Query 3: Execute the same Join query with a FORCE ORDER hint:
* The execution plan is the same as in the preview query.
Query 4: Execute the same Join query with a JOIN type hint:
(MERGE
-- Now we get: [Countries] JOIN [Entities] JOIN [EntityOrders]
Query 5+6: Execute the same Join query using Brackets and Filter to control the JOIN order:
as
where
c.CID < 10
) T
a.EID = T.EntityID
-- Now we get: [EntityOrders] JOIN [Countries] JOIN [Entities]
* in this query we use a filter which actually do not filter any row out. It is only used for re-ordering the JOIN order.
Same solution can be done using this query:
Countries c
b.CountyID = T.CID
Query 7: Execute the same Join query using GROUP BY to control the JOIN order:
MIN
(c.CID) CID,CNAME
group
by
c.CNAME
We can force the optimizer to process a derived table by simply placing a GROUP BY in your derived table. The optimizer is then obligated to resolve the GROUP BY aggregate before it can consider resolving the join between the two tables.
* We can find many more ways to influence the order of operations. As long as we control the understanding of order of operations on the server, then we have a way to change this order. To start out, it would be good to look up the order in which Query Optimizer basically build the execution plan:
* This article is based on SQL Server 2014 Enterprise edition, using a small virtual machine with 2 CPUs and 4 GB of shared memory, on Windows Server 2012 R2.
* We can't always predict the join order, and you might get different results from those shown in this article!
We saw that the execution plan depends on lot of variables, such as table size and number of rows, availability of statistics, indexes, key distribution, etc. Moreover, we have a lot of options to change order of operation or even the whole Execution plan by using hints and/or writing the same query with a slightly different format.
This list summarize several options we can put in our toolkit for future use:
Since the Execution plan depended on huge amount of variables which part of them we can control or know but lot of them we have no way to control or even know about, this is impossible to write rules that will be applied always. Remember that same query on different database can result in different execution plan, and different order of joins. We can however write some General Rule of Thumb (Golden rules) which work in most cases: