Bloggin in the UK RSS 2.0
 Saturday, July 17, 2010
Yesterday I attended my first UK SQL Server User Group meeting.
Event web page

Topics

- SQL Server I/O - Tony Rogerson
- Understanding how Sql performs Joins with NestedLoops, Merge & Hash - Ami Levin
- DbSophic QURE - Workload Tuning - Ami Levin
- SQL Streaminsight - Allan Mitchell - twitter: @allanSQLIS

I was most interested in the first topic but missed the talk as I did not arrive till 18:30 and the evening started @ 17:30. I don't finish work till 18:00 and had to get across town. Of the three talks that I caught the content and delivery was at a very high standard.

Understanding how Sql Server performs Joins with NestedLoops, Merge & Hash - Ami Levin

If you write a lot of SQL queries or are often called upon to tune SQL Server then you should learn how/why the query optimizer selects from three physical operators to perform joins and how sometime it can be caught out. This talk was given by Ami Levin who is a SQL MVP and works for DBSophic. It was really interesting and the first time I have had any understanding of what these terms that appear in the execution plan mean. One really interesting thing that Ami pointed out is that sometimes the method Sql uses to estimate the number of rows and actual number of rows will differ by a large amount, this can result in a sub optimal operator being selected to perform the join. If when tuning you spot this sort of discrepancy further investigation is warranted. The following is a scenario that Ami gave when this can occur:

SELECT *
FROM SalesLT.SalesOrderHeader SOH
INNER JOIN SalesLT.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
INNER JOIN SalesLT.Product P ON P.ProductID = SOD.ProductID
WHERE SOH.TotalDue > 1500
AND SOD.UnitPrice > 300
AND P.StandardCost > 300

So in the SQL above the optimizer will use statistics to estimate roughly how many rows it will return by looking at each table in turn and estimating how many rows each where clause will return. In this case say that each Where clause will return ~1% of the rows in a table containing 100k rows so the optimizer estimates the number of rows to be 1% of 100,000 = 1000 rows then, 1% of 1000 = 10, then 1% of 10 < 1. So the optimizer thinks 1 row or less will be in the results set. Unfortunately what the optimizer cannot know is that the where clauses are releated to the data, in this case ORDERS that cost more and have more expensive unit cost and a higher so the actual number of rows returned will be nearer to 1000, hence why sometimes the optimizer can choose a sub optimal physical operator to perform a join.

If you think the wrong operator is being chosen you can use the OPTION keyword this article discusses in more detail:

Saturday, July 17, 2010 12:01:54 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] -

Name
E-mail
Home page

Comment (HTML not allowed)  

Enter the code shown (prevents robots):

Live Comment Preview
Archive
<September 2010>
SunMonTueWedThuFriSat
2930311234
567891011
12131415161718
19202122232425
262728293012
3456789
About the author/Disclaimer

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2010
Charlie Barker
Sign In
Statistics
Total Posts: 172
This Year: 14
This Month: 1
This Week: 0
Comments: 77
Themes
Pick a theme:
All Content © 2010, Charlie Barker
DasBlog theme 'Business' created by Christoph De Baene (delarou)