
What's the advantage of using ZFS RAIDZ over mdadm? I thought that mdadm was more flexible in growing your RAID array.


I have been doing lots of research on this recently and here is the main thing that makes ZFS win every time:

When you have a RAID of any kind you need to periodically scrub it, meaning compare data on each drive byte by byte to all other drives (let's assume we are talking just about mirroring). So if you have two drives in an mdadm array and the scrubbing process finds that a block differs from drive A to drive B, and neither drive reports an error, then the scrubber simply takes the block from the highest numbered drive and makes that the correct data, copying it to the other drive. What's worse is that even if you use 3 or more drives, Linux software RAID does the same thing, despite having more info available. On the other hand, ZFS does the scrubbing by checksums, so it knows which drive has the correct copy of the block.
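To make the difference concrete, here is a rough Python sketch of the two scrub strategies for a two-way mirror. It is only an illustration, not how md or ZFS are actually implemented: `hashlib.sha256` stands in for whatever checksum the filesystem uses, and the function names are made up for this example.

```python
import hashlib


def checksum(block: bytes) -> str:
    # Stand-in checksum; the real algorithm is a filesystem setting (assumption).
    return hashlib.sha256(block).hexdigest()


def scrub_without_checksum(copies: list[bytes]) -> bool:
    """Plain mirror scrub: can only say whether the copies agree.

    If they disagree, nothing here says which copy is the good one.
    """
    return len(set(copies)) == 1


def scrub_with_checksum(copies: list[bytes], expected: str) -> list[bytes]:
    """Checksum-based scrub: find a copy matching the stored checksum
    and use it to overwrite every copy that does not match."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise ValueError("no copy matches the stored checksum")
    return [good for _ in copies]


original = b"important data"
stored = checksum(original)          # recorded when the block was written
copies = [b"important dbta", original]  # first drive silently corrupted

print(scrub_without_checksum(copies))        # copies disagree
print(scrub_with_checksum(copies, stored))   # both copies repaired
```

Without the stored checksum, the only options on a mismatch are to pick a copy by some fixed rule or to give up; with it, the good copy can be identified regardless of which drive it lives on.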

How often does this happen? According to what I have been reading, without ECC RAM and without ZFS, your machines get roughly one corrupt bit per day. In other words, that could be a few corrupt files per week.

My conclusion is that as I am building my NAS, I want ECC RAM and ZFS for things I cannot easily replicate.


Just to make it clear: raid-5/6 mdadm arrays do the right thing when repairing/checking/scrubbing data. They write the correct data if one of the drives has a corrupted block.

https://raid.wiki.kernel.org/index.php/RAID_Administration

> How often does this happen? According to what I have been reading, without ECC RAM and without ZFS, your machines get roughly one corrupt bit per day. In other words, that could be a few corrupt files per week.

This is complete nonsense without more data to back it up.


> Just to make it clear. raid-5/6 mdadm arrays does the right thing when repairing/checking/scrubbing data.

This is inherent to RAID-5/6. Doesn't really have anything to do with mdadm other than mdadm implements RAID-5/6. And now you probably have a write hole.


Here is one of the sources: http://linas.org/linux/raid.html


Just to make it clear: on raid 5/6 parity isn't checked on reads, so to get your "right thing when repairing/checking/scrubbing data" you'd have to do a full parity rebuild. This isn't anything like what ZFS does.
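A toy Python sketch of single XOR parity (the RAID-5 case) shows why a parity check by itself is weaker than a checksum. The block contents and helper names are invented for illustration and this says nothing about md internals: it only shows that a parity mismatch reveals that something in the stripe is wrong, not which block is wrong, unless a drive reports the error.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    # XOR equal-length blocks together, byte by byte.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_ok(data_blocks: list[bytes], parity: bytes) -> bool:
    # A scrub-style check: recompute parity and compare with what is stored.
    return xor_blocks(data_blocks) == parity


def rebuild(data_blocks: list[bytes], parity: bytes, missing: int) -> bytes:
    # Reconstruct one block from the others; only correct if `missing`
    # really is the bad block.
    others = [b for i, b in enumerate(data_blocks) if i != missing]
    return xor_blocks(others + [parity])


stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

damaged = [stripe[0], b"BxBB", stripe[2]]   # silent corruption in block 1
print(parity_ok(damaged, parity))           # mismatch detected
print(rebuild(damaged, parity, 1))          # right guess restores b"BBBB"
print(rebuild(damaged, parity, 0))          # wrong guess "repairs" a good block
```

With a per-block checksum stored elsewhere, the reader knows which block to rebuild; with parity alone, it has to be told by the drive or guess.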


Thank you, this was very helpful!


The documentation for raidZ is wonderful, and the commands are logical.

mdadm is a pain, and doesn't provide block-level checksums, so you can easily get silent corruption.


It's really the integrity checking. As you say, mdadm is much better suited if you need to change the geometry of your array, add disks and so on. It's much handier for the smaller business that can't afford a second set of disks to build a second array on when they want to reshape.

We ran btrfs on top of mdadm, getting both integrity checking and flexibility (although in this setup the checksums only tell you that something is wrong; they don't repair it).


There is a great advantage to combining the filesystem with the disk mapper. You don't have to use different commands to add and grow disks and the partitions upon those disks. Your filesystem knows about what it's living on and stores data accordingly. ZFS has more advanced file system properties, like sending snapshots, even of block devices. BTRFS is still working on feature parity with this. ZFS is much more stable than other FS with similar features.

The big disadvantage is the memory and CPU requirement. If your server has plenty of memory and CPU, I'd use ZFS. If you're running on an ARM NAS with 128MB RAM, I'd use something less fancy.


I think the primary advantage is that ZFS collapses all the standard filesystem abstractions. With mdadm or hardware raid you have a raid controller (which could be mdadm), a volume manager (e.g. lvm), and a filesystem (ext4, xfs, etc). ZFS combines all of that into one. It's really a different philosophy, but it means that things like creating a new filesystem are almost instant (and CoW, snapshots, replication are all easy, although perhaps that's possible with the traditional abstractions as well).


I think RAIDZ is also less susceptible to data loss because the RAID layer and the filesystem are tightly integrated.



