We recently separated out our tempdb files to a new SSD and have started seeing:
5348 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [T:\tempdb\tempdb4.ndf] .
We have multiple occurrences of this error. We did not see the errors when tempdb was back on its original RAID 5 home. I followed a tutorial on SQLIO and I think the SSD should be much faster, when doing 8kb random read/writes, than the previous RAID 5 disks. So why are we seeing these errors?
Also, by way of more proof that not all is well, the batch file we run overnight (which is when these errors occur) is taking 7 hours. It took 6.25 hours on the old disks.
The disks sit in a directly attached array. The RAID5 for data, RAID 10 for logs and a spare slot that we used for the SSD. The RAID 5 and SSD are formatted for 64kb block size. The log is incorrectly set to 4KB block size (I know – will fix when I get a chance).
These are the results of SQLIO:
T drive (ssd)
Ios= 8KB random writing,IOs/sec=31847.48,MBs/sec=248.8
Ios= 8KB random reading,IOs/sec=76391.66,MBs/sec=596.8
S drive (RAID 5)
Ios= 8KB random writing,IOs/sec=2601.3,MBs/sec=20.32
Ios= 8KB random reading,IOs/sec=3138.45,MBs/sec=24.51
For 64K sequential read/writes they were around the same.
Tempdb is split into 4 1.5Gb files (this is the same before and after the move).
SQL server 2012 is patched to SP3.
Do you have any idea what could be causing all these I/O errors getting reported by SQL Server?
Is it possibly an Array or HBA driver issue? Does a single disk added into a spare slot on an directly attached array needs careful configuration in terms of cache?