Linux – Fault tolerant RAID6 / RAID10 design for home server – not performance critical

home-server linux mdadm raid-10

I want to balance disk space against fault tolerance. I would like the array to tolerate multiple disk failures, since I may not be able to afford replacement disks, or have the time to swap them in, for possibly weeks.

The main purpose of the home Linux server will be a place to back up other machines to and to store/share large amounts of data, so most of the data will be re-creatable. This includes media storage (i.e. backups of my DVDs, CDs, etc.).

I had a RAID10 array of 6 x 1.5TB disks, but due to operator incompetence and laziness I now have 6 empty disks 🙂 and a clean start.

One of the disks is definitely failing (smartctl reports over 55 errors, and both the short and long self-tests report errors), so it will be sent away for warranty replacement, but I would still like to include it in the final array. Let's call the bad disk /dev/sdc.

The machine has 6 SATA ports and 2 IDE ports (with 2 CD drives attached), dual quad-core Xeons and 16GB of RAM, and really only 1 user most of the time.

[NB: I may be able to remove a CD drive and add a 7th, IDE, disk just for the OS, to separate data and OS.] Otherwise, the plan is to set aside a 100GB partition and put the OS there (possibly mirrored across disks).

Option A)
RAID 6 sd[abdef], sdc as hot-spare (but gets sent for replacement soon) raid-devices=5 spare=1

Option B)
RAID 6 sd[abdef], sdc as missing (but gets sent for replacement soon)
raid-devices=6 spare=0

Option C) RAID 10 sd[abdef], sdc as hot-spare (but gets sent for replacement soon) raid-devices=5 spare=1

Option D) RAID 10 sd[abdef], sdc as missing (but gets sent for replacement soon) raid-devices=6 spare=0

Option A seems the best at the moment, because I'll get 4.5TB of space and room for 3 disk failures, if I calculate it correctly.
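To check the arithmetic, here is a small Python sketch of usable capacity for each option. It assumes 1.5TB per disk, whole-disk members (ignoring the planned 100GB OS partition) and md's default two-copy RAID 10 layout. The second number is how many simultaneous failures the array survives no matter which disks fail; a hot spare adds one more only after its rebuild has finished.

```python
DISK_TB = 1.5  # per-disk size from the question


def raid6(active: int) -> tuple[float, int]:
    # Two disks' worth of space go to parity; any 2 failures are survivable
    return (active - 2) * DISK_TB, 2


def raid10(active: int) -> tuple[float, int]:
    # Default md RAID 10 keeps 2 copies of every block, so half the raw space
    # is usable; only 1 failure is guaranteed survivable
    return active * DISK_TB / 2, 1


OPTIONS = {
    "A": raid6(5),   # 5 active + sdc as hot spare
    "B": raid6(6),   # 6 members, sdc slot "missing" until it returns
    "C": raid10(5),  # 5 active + sdc as hot spare
    "D": raid10(6),  # 6 members, sdc slot "missing"
}

for name, (usable_tb, guaranteed) in OPTIONS.items():
    print(f"Option {name}: {usable_tb:.2f} TB usable, "
          f"survives any {guaranteed} simultaneous failure(s)")
```

Under these assumptions A and D tie on space, but only A is guaranteed to survive two simultaneous failures. B has the most space with the same two-failure guarantee, but only once sdc's replacement has been added; until then it runs degraded.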

This will all be done with mdadm software RAID.
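Since everything will be built with mdadm, here is a Python sketch that only assembles (and prints) the mdadm command lines for Options A and B rather than running them. It assumes whole-disk members and the hypothetical array name /dev/md0. With the planned OS partition you would list data partitions such as /dev/sda2 instead.

```python
import shlex

# Whole-disk members as named in the question; sdc is the failing disk
GOOD = ["/dev/sda", "/dev/sdb", "/dev/sdd", "/dev/sde", "/dev/sdf"]
BAD = "/dev/sdc"
MD = "/dev/md0"  # hypothetical array name


def create_option_a() -> str:
    # 5 active members; the sixth device listed becomes the hot spare
    return shlex.join(["mdadm", "--create", MD, "--level=6",
                       "--raid-devices=5", "--spare-devices=1", *GOOD, BAD])


def create_option_b() -> str:
    # 6 slots, one left empty with the literal word "missing";
    # the array starts degraded until the replacement disk is added
    return shlex.join(["mdadm", "--create", MD, "--level=6",
                       "--raid-devices=6", *GOOD, "missing"])


def add_replacement(dev: str) -> str:
    # Once the warranty disk arrives, adding it makes md rebuild onto it
    return shlex.join(["mdadm", MD, "--add", dev])


if __name__ == "__main__":
    for cmd in (create_option_a(), create_option_b(), add_replacement(BAD)):
        print(cmd)
```

In Option A the device listed after the five active members becomes the spare. In Option B the empty slot means there is no redundancy margin beyond one failure until the replacement is added.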

Which do you recommend, or are there better layouts that I could use?

Best Answer

  • RAID 10 may not be able to handle two disks failing (it loses data if both copies of the same block are gone), so RAID 6 would be more reliable. With the same 6 disks it also gives more storage capacity (6TB versus 4.5TB). Option A can only handle a third failure if it happens after the rebuild onto the hot spare completes, and you only get 50% of the total space (4.5TB of 9TB).

    The odds of 3 out of 6 disks failing are very slim, so I would be inclined to think that the loss of space (and throughput) is not worth having a hot spare. As a compromise, you could run without a hot spare. Then, if a disk fails and you know you cannot replace it for some time, but you are worried about two more failures, you could reshape the array into a 5-disk RAID 6 and be back to being able to handle two more failures.

    This requires a filesystem that you can shrink, since reshaping the array will reduce its capacity (here from 6TB to 4.5TB).

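The answer's point that RAID 10 may not survive two failures can be checked by brute force. This Python sketch assumes md's default near=2 layout on 6 disks, where devices 0/1, 2/3 and 4/5 hold copies of the same blocks. With an odd device count, as in Option C, the copies are staggered and this simple pairing does not apply. The sketch counts how many of the possible two-disk failures each level survives.

```python
from itertools import combinations

DISKS = 6
# Assumption: md RAID 10 near=2 on 6 disks mirrors adjacent devices
MIRROR_PAIRS = [{0, 1}, {2, 3}, {4, 5}]


def raid10_survives(failed: set) -> bool:
    # Data is lost only if both members of some mirror pair have failed
    return not any(pair <= failed for pair in MIRROR_PAIRS)


def raid6_survives(failed: set) -> bool:
    # RAID 6 tolerates any two missing devices
    return len(failed) <= 2


combos = [set(c) for c in combinations(range(DISKS), 2)]
r10_ok = sum(raid10_survives(c) for c in combos)
r6_ok = sum(raid6_survives(c) for c in combos)
print(f"RAID 10: survives {r10_ok}/{len(combos)} two-disk failures")
print(f"RAID 6:  survives {r6_ok}/{len(combos)} two-disk failures")
```

Under that assumption RAID 10 loses data in 3 of the 15 cases (both disks of one mirror pair), while RAID 6 survives all of them.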
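To put "very slim" in numbers, a simple binomial model gives the chance that at least k of n disks fail within some window. It assumes failures are independent with the same probability per disk. The value 0.01 below is a hypothetical figure, not a measurement. Disks from the same batch that share age and rebuild stress tend to fail together, so treat the result as optimistic.

```python
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n disks fail), assuming independent failures
    that each happen with probability p over the window considered."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-disk failure probability over the weeks it might take
# to get a replacement; substitute your own estimate
P_FAIL = 0.01

for k in (1, 2, 3):
    print(f"P(at least {k} of 6 fail) = {p_at_least(k, 6, P_FAIL):.2e}")
```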
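The reshape fallback from the answer can be sketched as a dry-run plan. This Python sketch only builds the command strings for shrinking a 6-member RAID 6 to 5 members. It assumes an unmounted ext4 filesystem directly on the hypothetical array /dev/md0, a per-device size taken from the "Used Dev Size" line of mdadm --detail (reported in KiB), and a hypothetical backup-file path outside the array. Check every step against the mdadm and resize2fs man pages before running anything.

```python
import shlex


def shrink_plan(md: str, used_dev_kib: int, new_devices: int, backup: str) -> list[str]:
    """Return (but do not run) the steps for shrinking a RAID 6 to
    `new_devices` members. All arguments are hypothetical examples."""
    # RAID 6 capacity is (members - 2) x per-device size
    new_size_kib = (new_devices - 2) * used_dev_kib
    return [
        # 1. Shrink the filesystem first (ext4 shown; must be unmounted)
        shlex.join(["e2fsck", "-f", md]),
        shlex.join(["resize2fs", md, f"{new_size_kib}K"]),
        # 2. Truncate the md device to the new capacity (value in KiB)
        shlex.join(["mdadm", "--grow", md, f"--array-size={new_size_kib}"]),
        # 3. Reshape onto fewer members; the backup file must live
        #    outside the array being reshaped
        shlex.join(["mdadm", "--grow", md, f"--raid-devices={new_devices}",
                    f"--backup-file={backup}"]),
    ]


if __name__ == "__main__":
    # Illustrative per-device size only; use the real Used Dev Size
    for step in shrink_plan("/dev/md0", 1_464_000_000, 5, "/root/md0-grow.bak"):
        print(step)
```

The order is the point: the filesystem shrinks first, then --array-size truncates the md device, and only then is --raid-devices reduced. Doing it the other way round would cut off live data.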