Mysql – Can a query that executes quickly still be a poorly written query?


This is more of a theoretical question and here is some background. Our system uses MySQL and was experiencing some slowdowns late in the day. Running SHOW FULL PROCESSLIST showed me around 10 rows where the state was "Sending data" and the time value was ranging from 400 to 800. For all of those rows, it was the same exact query. As time passed, some would finish and some would run again because it's a query that runs on a user's dashboard. If I run the query in MySQL Workbench, the query executes pretty quickly, around 0.27 seconds on average.

So I'm wondering how to troubleshoot something like this and figure out where the issue came from. Could a query execute quickly, but still be inefficient in some other area? Or could there have been some other issue aside from the query itself? Nothing was showing up in the slow-query log. What would cause a query, which seems to run quickly, have those kinds of stats in the process list? I thought it was unusual for everything to be working fine throughout the peak times, and then have issues at the end of the day when less users were active. But I'm sure there's a good explanation somewhere. IT also reported that the server experienced high CPU usage around that time.

About the query itself, I'm sure I can optimize it further. I think the biggest thing about it is that it uses a case statement on a join, which I haven't had to do before. The idea is basically a conditional join because we need to join a table based on certain values. Since MySQL doesn't seem to support recursion, this was the path we took. But again, the query itself at least executes quickly in Workbench.

If anyone has any insight to this, I'd appreciate any help.

Best Answer

Simply put: yes. Something being fast does not mean it is necessary good as there are other factors, some objective and some very subjective, to consider.

Some examples:

  • The query could be unnecessarily complex: doing more work than needed to get the same result as a more optimal version. Maybe what it is doing in ~0.27s could actually be done in ~0.07s a different way around?
  • The query could be difficult to maintain, either because of excess complexity as above or due to bad coding conventions and naming conventions.
  • The query could be fast now, with current data, but have awful complexity growth as your data grows.
  • The query could be fast in itself, but for some reason cause too many locks to be held which will make it slow when run multiple times concurrently or make other queries less unresponsive while it runs.
  • The query could be written in a way that causes spills of temporary data to disk, which due to IO contention can have similar effects by way of imposing concurrency limits as holding locks.
  • "Quickly" itself is a very subjective term. Is 0.27s really quick? If the query is run tens of times a minute it could be using hours of CPU resource or IO bandwidth each day that could otherwise be used by other processes (on fixed hardware) or is imposing a cost (on virtual/cloud hardware where you are paying everything that is used).