What is the correct order of execution of Nested joins

Could you explain to me the correct order of execution of the “from clause” in this query.

I need to recereate it in PowerBiPowerQuery and PowerBiDax and the both PowerBi tables differ from sql server table.

Select ....
From (
    tabA full outer join tabB 
        on (tabA.column1=tabB.Column1) and (tabA.column2=tabB.column2)
) 
inner join tabC on tabB.Column1=tabC.Column1

As far as I understand,

Step1: tabB is joined to tabA by column1 and column2 using full outer join

Step2: after that tabC is joined to the above combined table with column1 and column1

Is it correct?

Here is the original code:

Select 
InvWarehouse.StockCode,
InvWarehouse.UnitCost,
InvInspect.Grn,
InvInspect.Lot,
InvMaster.Description,
InvMaster.Supplier
From (InvWarehouse InvWarehouse full outer join InvInspect InvInspect on (InvWarehouse.StockCode=InvInspect.StockCode) and (InvWarehouse.Warehouse=InvInspect.Warehouse)) inner join InvMaster InvMaster on InvInspect.StockCode=InvMaster.StockCode

  • 4

    Do not post images of your code. Post the actual code/sql using the appropriate “` tags for formatting.

    – 

  • It helps if you explain your understanding, for instance, why are you asking the question?

    – 

  • I changed my post. Thank you.

    – 

  • What’s the definition of tabC.Column1?

    – 

As far as I understand,

Step1: tabB is joined to tabA by column1 and column2 using full outer join

Step2: after that tabC is joined to the above combined table with column1 and column1

Practically, the database is free to re-order things in ways that may be more efficient, as long as it doesn’t change the results. However, this is almost correct when thinking logically/semantically; you can generally assume or act as if the database does things in that order.

The “almost” comes into play because the INNER JOIN also acts on one of the columns from the prior FULL JOIN. This has the practical effect of turning the earlier FULL JOIN into another INNER JOIN (or at least a RIGHT JOIN), because the INNER JOIN in the code will now exclude any rows from the FULL JOIN you might otherwise have kept with NULL values for the unmatched table.

Additionally, you don’t really need the parentheses around the joins, or those particular conditions, and in SQL the excess parentheses can tend to confuse things; save them for when you really need them, like with OR conditions. That is, the code in the question is equivalent to this, which is much easier to read and change:

SELECT ....
FROM tabA a
FULL JOIN tabB b ON a.column1=b.Column1 AND a.column2=b.column2
INNER JOIN tabC c ON b.Column1=c.Column1

In SQL the order of precedence of criteria is LEFT to RIGHT, TOP to BOTTOM. That doesn’t always mean that the clauses are strictly evaluated in that order, but the rules for matching will strictly adhere to that precedence. So if a row is explicitly excluded by the first join, it cannot be included by a following join statement

Step1: tabB is joined to tabA by column1 and column2 using full outer join

Full Outer join means that for all rows from tabA if there is a match in tabB on column1 AND column2 then the values from tabB will be returned, otherwise the tabB column values will be null, we then do the same for all the rows in tabB that do not have a match on both column1 and column2, except that the values for tabA will be null

If there are no matches, the number of rows will equal the number of rows in both tabA and tabB. If there are matches, then the number of rows is equal to the sum of row in both tables, minus the number of matched rows.

inner join tabC on tabB.Column1=tabC.Column1

Now the next part, the inner join constrains all of this. Because it is using the output from tabB, then final result will only be the rows from tabB that match on Column1. Those rows will only have values from tabA if tabA matched Column1 AND Column2 (to tabB)

The FULL OUTER JOIN in this case is likely to be redundant, unless you need to support all the null values for the unmatched rows or this is a deliberate performance hack.

You might consider executing the INNER JOIN first by re-arranging the order of the joins. As a general rule performance is often better by filtering by the most restrictive clause first. So if the INNER JOIN will match the least amount of rows, you might want that first to optimise the execution plan. However the FULL JOIN in this case has additional criteria and looks like a good candidate to be more restrictive (aka match less rows) than the INNER JOIN so this might be a performance hack.

The only way to know if to try it out and compare the execution timings and plans.

If you rearrange the query the result is the INNER JOIN first and a LEFT OUTER JOIN to include any matching data from tabA:

...
FROM tabB 
INNER JOIN tabC ON tabB.Column1 = tabC.Column1
LEFT OUTER JOIN tabA ON tabB.column1=tabA.Column1 AND tabB.column2=tabA.column2

The optimizer can (note, not “will”) rearrange the join order both to satisfy the logic of the query as well as make things run faster (where possible). In order to know exactly the order of operations, you can’t simply look at the logic. Instead, use the execution plan (estimated plan is perfect for this scenario) to see exactly the order of operations. No more guessing. Same thing is going to happen to any queries run directly against the database from PowerBI. The optimizer can change the join order.

A full outer join is going to take all rows that match your join criteria, all the rows from the left table that do not match, and all the rows from the right table that do not match. Simply put, it matches the rows that it CAN, but it doesn’t exclude anything.

Thus, TabB Full Outer Join TabA ON some criteria returns all rows from both tables, matching the ones that it can.

The Inner Join will include rows from TabB that have a match with TabC. This is exclusionary. Take the following tables for example…

Table A

Col1 Col2
A 1
B 2
C 3

Table B

Col1 Col2
A 1
B 2
D 4

Table C

Col1 Col2
D 4
E 5
E 6

TableA FULL OUTER JOIN TableB ON col1 AND col2 yields

A.Col1 A.Col2 B.Col1 B.Col2
A 1 A 1
B 2 B 2
C 3
D 4

Now you Inner join TableC ON tableB.Col1 = tableC.col1

A.Col1 A.Col2 B.Col1 B.Col2 C.col1 C.col2
A 1 A 1
B 2 B 2
C 3
D 4 D 4

Now take the excluded rows out and your resulting dataset is…

A.Col1 A.Col2 B.Col1 B.Col2 C.col1 C.col2
D 4 D 4

Leave a Comment