Yesterday I attended my first UK SQL Server User Group meeting.
Event web page
- SQL Server I/O – Tony Rogerson
- Understanding how Sql performs Joins with NestedLoops, Merge & Hash – Ami Levin
- DbSophic QURE – Workload Tuning – Ami Levin
- SQL Streaminsight – Allan Mitchell – twitter: @allanSQLIS
I was most interested in the first topic but missed the talk as I did not arrive till 18:30 and the evening started @ 17:30. I don’t finish work till 18:00 and had to get across town. Of the three talks that I caught the content and delivery was at a very high standard.
Understanding how Sql Server performs Joins with NestedLoops, Merge & Hash – Ami Levin
If you write a lot of SQL queries or are often called upon to tune SQL Server then you should learn how/why the query optimizer selects from three physical operators to perform joins and how sometime it can be caught out. This talk was given by Ami Levin who is a SQL MVP and works for DBSophic. It was really interesting and the first time I have had any understanding of what these terms that appear in the execution plan mean. One really interesting thing that Ami pointed out is that sometimes the method Sql uses to estimate the number of rows and actual number of rows will differ by a large amount, this can result in a sub optimal operator being selected to perform the join. If when tuning you spot this sort of discrepancy further investigation is warranted. The following is a scenario that Ami gave when this can occur:
FROM SalesLT.SalesOrderHeader SOH
INNER JOIN SalesLT.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
INNER JOIN SalesLT.Product P ON P.ProductID = SOD.ProductID
WHERE SOH.TotalDue > 1500
AND SOD.UnitPrice > 300
AND P.StandardCost > 300
So in the SQL above the optimizer will use statistics to estimate roughly how many rows it will return by looking at each table in turn and estimating how many rows each where clause will return. In this case say that each Where clause will return ~1% of the rows in a table containing 100k rows so the optimizer estimates the number of rows to be 1% of 100,000 = 1000 rows then, 1% of 1000 = 10, then 1% of 10 < 1. So the optimizer thinks 1 row or less will be in the results set. Unfortunately what the optimizer cannot know is that the where clauses are releated to the data, in this case ORDERS that cost more and have more expensive unit cost and a higher so the actual number of rows returned will be nearer to 1000, hence why sometimes the optimizer can choose a sub optimal physical operator to perform a join.
If you think the wrong operator is being chosen you can use the OPTION keyword this article discusses in more detail: