Sql-server – Proactively Determining Bad Plans/Poor Performance at a Query Level

sql serversql server 2014

tldr; I'm looking for ways to monitor server performance to alert when individual, small, fast queries get a bad plan that will in aggregate hurt the performance of the business.

Over the past week we've suddenly had a few stored procedures grab bad query plans, and drag the server to slow downs. It hasn't been the same query, but it has involved the same tables for one group (which I imagined was a statistics problem), and when we FREEPROCCACHE'd the server another query got a bad plan, and has picked it up at random again.

Being the guy who is held responsible for the server, I'm being asked to do things like ensure no query will ever get a bad plan again, or that I react to it before the users notice the performance degradation.

I have monitoring tools, I'm monitoring long running queries – but this is a case of many queries that run inside 10ms changing to taking 5s, and that's not immediately noticeable, it barely even blips the performance monitoring CPU and memory-wise (apart from more frequent buffer cache dumps perhaps).

For any individual known query I can set up a baseline and an agent job that alerts me, or FREEPROCCACHEs its own plan when it gets a bad plan – but what is my best way to determine this for a busy server?

Should I be baselining run stats for all queries and checking the query stats for all? Can BlitzCache help me out? Will this sort of checking be too intrusive on the performance of the server?

Best Answer

Instead of developing a solution yourself you could consider upgrading to SQL Server 2017 and trying out the automatic tuning feature. Reproducing part of the documentation:

Database Engine automatically detects any potential plan choice regression including the plan that should be used instead of the wrong plan. When the Database Engine applies the last known good plan, it automatically monitors the performance of the forced plan. If the forced plan is not better than the regressed plan, the new plan will be unforced and the Database Engine will compile a new plan. If Database Engine verifies that the forced plan is better than regressed one, the forced plan will be retained until a recompile (for example, on next statistics or schema change) if it is better than the regressed plan.

Note: Any plans auto forced do not persist on a restart of the SQL Server instance.

This could make it easier to meet the "react to it before the users notice the performance degradation" option.