I'm having no end of trouble with one of our production SQL boxes: users are experiencing massive drop-offs in performance, and this is having a severe impact on the business. I've been investigating mainly from a SQL/DBA perspective, and sp_BlitzFirst has been intermittently reporting very slow read/write times (typically ~100-300 ms, but at points 1000 ms or more).
The real head-scratcher is that our infrastructure guys, now that they're involved, can't see anything amiss from their end.
The server is a virtual machine running under VMware, with storage provided by a Dell EqualLogic SAN. It runs Windows Server 2008 R2 and SQL Server 2008 R2 Standard with 12 cores and 32 GB of RAM, and it hosts the databases for a Dynamics CRM 4.0 instance and another web application that is integrated with CRM. The infrastructure guys are seeing times more in the 30-40 ms range. Now, it's entirely possible that the samples taken by BlitzFirst and VMware simply don't coincide and that this is causing the disparity, but if not, could there be any other explanation? I've used perfmon on the server, and while the averages for disk read/write are pretty reasonable, I see definite spikes that are much higher than VMware is reporting and much closer to the figures BlitzFirst is reporting. Is there anything else I should be looking at?
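To illustrate the sampling mismatch I'm wondering about, here is a minimal Python sketch with made-up numbers (not measurements from this server). It assumes one tool averages latency over short windows and the other over a much longer window. The window sizes and the `windowed_averages` helper are hypothetical and only there to show the arithmetic.

```python
def windowed_averages(latencies_ms, window):
    """Average consecutive, non-overlapping windows of per-second latency samples."""
    averages = []
    for start in range(0, len(latencies_ms), window):
        chunk = latencies_ms[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical per-second disk latencies in ms over one minute:
# a 20 ms baseline with a single 5-second stall at 300 ms.
samples = [20] * 20 + [300] * 5 + [20] * 35

fine = windowed_averages(samples, 5)     # assumed short sampling window
coarse = windowed_averages(samples, 60)  # assumed long sampling window

print("worst 5-second average:", max(fine))
print("60-second average:", round(coarse[0], 1))
```

With these invented figures, the short windows catch the stall at its full 300 ms, while the one-minute average comes out at roughly 43 ms. The same storage could therefore look fine in one tool and terrible in the other, purely because of the averaging interval.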
I'm reasonably certain that the problems BlitzFirst is reporting are real, as they always occur at the same points the users experience major problems. To be honest, the application performance of this CRM has always been pretty crap, but even with a pretty low bar it's still falling well short.