NVMe SSD: Why are 4K writes faster than 4K reads?

Tags: performance, ssd

I have a Samsung 960 Pro 512 GB NVMe SSD running over PCIe Gen 3 ×4, with the Samsung NVMe Driver 2.0.0.1607. The drive runs fine, but I don't understand why 4K writes are faster than 4K reads. I am testing with AS SSD Benchmark:

[screenshot: AS SSD Benchmark results]

The difference is a factor of 3! Is something wrong (with my system or with AS SSD Benchmark), or is this normal?

Best Answer

4K random reads are about the hardest workload a drive can face. 4 KB is among the smallest block sizes the drive handles, and there is no way for the drive to preload large quantities of useful data. In fact, such reads can be actively inefficient if the drive's read-ahead logic is fetching anything larger than 4 KB per request.

"Normal" drive reads are usually larger than 4 KB: very few files are that small, and even the page file is likely to be read in larger chunks, since it would be odd for a program to have only 4 KB of memory paged out. This means that any preloading the drive attempts will actually penalise its throughput in this test.

4K reads might pass through the drive's cache, but the "random" part of the test makes them entirely unpredictable: the controller cannot know when the more usual "large" reads will come again.
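To make the read-ahead penalty concrete, here is a toy cache model, not how any real controller works; the read-ahead window, cache size, and test-area size are made-up numbers purely for illustration. It shows that prefetching pays off for sequential 4K reads but is almost pure waste for random ones:

```python
import random

READAHEAD = 32          # hypothetical: drive prefetches 32 blocks on each miss
CACHE_BLOCKS = 1024     # hypothetical cache capacity, in 4K blocks
TOTAL_BLOCKS = 1 << 17  # 512 MB test area as 4K blocks

def hit_rate(pattern):
    """Fraction of 4K block reads served from the drive cache."""
    cache = set()
    hits = 0
    for lba in pattern:
        if lba in cache:
            hits += 1
        else:
            # Miss: the drive fetches the block plus a read-ahead window.
            for b in range(lba, min(lba + READAHEAD, TOTAL_BLOCKS)):
                cache.add(b)
            while len(cache) > CACHE_BLOCKS:
                cache.pop()  # crude eviction, good enough for a toy model
    return hits / len(pattern)

n = 20_000
rng = random.Random(0)
sequential = hit_rate(range(n))
random_4k = hit_rate([rng.randrange(TOTAL_BLOCKS) for _ in range(n)])
```

Under this model the sequential pattern hits the cache almost every time, while the random pattern almost never does: every prefetched block beyond the one actually requested was fetched for nothing.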

4K writes on the other hand can be buffered, queued, and written out sequentially in an efficient manner. The drive buffer can do a lot of the catch-and-write work that it was designed for, and the wear leveller might even allocate all those 4K writes to the same drive erase block, occasionally turning what is a 4K "random" write into something closer to a sequential write.
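A rough sketch of that coalescing effect, again a toy model with invented numbers (erase-block size, buffer depth), not the actual wear-levelling algorithm: compare one flash program operation per scattered 4K write against buffering writes and flushing them together into whole erase blocks:

```python
import random

BLOCK = 4096
ERASE_BLOCK = 512 * 1024              # hypothetical 512 KB flash erase block
PAGES_PER_EB = ERASE_BLOCK // BLOCK   # 128 4K pages per erase block

def programs_naive(writes):
    """Worst case: each random 4K write lands in a different erase block,
    costing one program operation per request."""
    return len(writes)

def programs_buffered(writes, buffer_pages=256):
    """Buffer incoming 4K writes and flush them together into fresh
    erase blocks, as a write cache plus wear leveller might."""
    programs = 0
    pending = 0
    for _ in writes:
        pending += 1
        if pending == buffer_pages:
            programs += pending // PAGES_PER_EB  # one big sequential flush
            pending = 0
    if pending:
        programs += -(-pending // PAGES_PER_EB)  # ceiling division
    return programs

rng = random.Random(0)
writes = [rng.randrange(1 << 20) for _ in range(10_000)]
naive = programs_naive(writes)
buffered = programs_buffered(writes)
```

The buffered strategy turns 10,000 scattered writes into a few dozen large flushes, which is why random 4K writes can approach sequential write behaviour.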

In fact, I suspect that this is what is happening in the "4K-64Thrd" writes: the 64-thread variant uses a large queue depth, which signals to the drive that a large amount of data is waiting to be read or written. That triggers heavy clustering of writes, so the result approaches the drive's sequential write speed. There is still per-write overhead, but now the buffer's potential is fully exposed. In the read version of the test, the controller, recognising that it is under constant heavy load, stops preloading data and possibly bypasses the cache, switching to a "raw" read mode that again approaches the sequential read speed.
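The queue-depth effect can be sketched with a simple latency-hiding model. This is deliberately naive, assuming a fixed per-request latency and perfectly linear overlap across flash channels, which no real drive achieves, but it shows why "4K-64Thrd" numbers dwarf plain 4K numbers:

```python
LATENCY_US = 100.0   # hypothetical 4K flash read latency, microseconds
BLOCK_KB = 4
N_REQUESTS = 10_000

def throughput_mb_s(queue_depth):
    """With queue_depth requests outstanding, the drive services them in
    parallel, so wall time shrinks roughly linearly (internal parallelism
    limits are ignored in this toy model)."""
    batches = -(-N_REQUESTS // queue_depth)   # ceiling division
    total_seconds = batches * LATENCY_US / 1e6
    return (N_REQUESTS * BLOCK_KB / 1024) / total_seconds

qd1 = throughput_mb_s(1)    # one request in flight: plain 4K test
qd64 = throughput_mb_s(64)  # AS SSD's "4K-64Thrd"-style load
```

At queue depth 1 the drive sits idle between requests, waiting for the host to issue the next one; at queue depth 64 that latency is hidden behind other in-flight requests, so throughput climbs dramatically.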

Basically, the drive controller can do something to make a 4K write more efficient, especially when a cluster of them arrives at around the same time, while it can do nothing to make a single 4K read more efficient, particularly if it is trying to optimise dataflow by preloading data into its cache.