Limitations of SQL Server histogram based row count estimates

Recently, I came across a situation where a query with n-joins was occasionally slow. Here’s how I tracked down the problem…

Profile Overall Application #

First, profile the application with Glimpse. This gives a baseline for expected performance. Preliminary stats show total request time around 6 seconds, the server action method was taking around 1.5 seconds to process, and database queries of 250ms.

After that I replicated the issue, which was a database timeout. Glimpse doesn’t load on error pages in our application by default so I moved on to the next level of profiling.

Profile Database Access #

SQL Server Profiler. By setting it to only capture completed SQL batches and stored procedures, I isolated the offending query.

Running the procedure in SSMS and including the actual execution plan showed a time to complete of 2.5 minutes (well over the 30 second timeout window). Problem found!

My next move was to break the query apart into smaller steps to see which parts were taking the longest. After converting just one section out into populating a temp table, the total time dropped to 5 seconds! Time to think.

Hypothesis #

Hypothesis: SQL Server uses histograms to predict the number of rows expected from each set operation. Based on the size of the result set, SQL Server will choose the optimal query plan and sequence of steps. When you populate a temp table SQL Server gets a 100% accurate estimate for number of rows. What if in the original query SQL Server was working with terrible estimates for the number of rows and thus doing expensive lookups?

Confirm or Reject Hypothesis #

Time to put it to the test! Here’s part of the execution plan from the original query:
2016-08-16 16_33_34-Settings.png
Confirmed: estimated rows: 1, actual rows: 6,982. This will definitely be enough to coax SQL Server into a plan with many expensive nested loop lookups.

Conclusion #

Always profile the application to efficiently find the problem. Test the application design under expected and unexpected load profiles. Knowing the quirks of your tech stack can help fine-tune your intuition and enable you to quickly narrow in on isolated performance problems.

More Reading #


Now read this

Why do my views take so long to render?

I’ve optimized my database queries with indexes and am caching output for 10 minutes at a time, but occasionally I have to recalculate the page, and it’s taking way longer than I like. Tools like firebug only show that the request spent... Continue →