Facebook runs their entire stack using Btrfs [0]. I would encourage anyone who is stuck in the "oh btrfs is so buggy and loses data" mindset (not helped by articles like this [1] that play off btrfs as some half-baked contraption, when it's really btrfs raid that needs a LOT more time to bake) to look into things and realize that large companies (openSUSE, Red Hat, Facebook) have poured a lot of time into getting it to work well.
I don't know about its multi-disk story (I do use ZFS for that personally), but for single-disk use it is great. You get so many of the ZFS benefits (snapshots, rollback, easily creating and deleting volumes, etc) with MUCH lower memory usage (at least in my own experiments to try this out).
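For what it's worth, the snapshot/rollback workflow on a single disk really is just a few commands. A sketch, assuming a btrfs filesystem mounted at /mnt (the paths and subvolume names here are hypothetical):

```shell
# Create a subvolume to hold data (cheap and instant)
btrfs subvolume create /mnt/data

# Take a read-only snapshot before a risky change
btrfs subvolume snapshot -r /mnt/data /mnt/data-snap

# Roll back: move the damaged subvolume aside and promote the snapshot
mv /mnt/data /mnt/data-broken
btrfs subvolume snapshot /mnt/data-snap /mnt/data

# Clean up when you're confident the rollback worked
btrfs subvolume delete /mnt/data-broken
```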
Facebook has stacks of thousands of spare nodes ready at any moment to replace a failed node. All essential data will be replicated across many different boxes so if a box fails you just replace it with a fresh node and replicate the data there.
This is very different from the consumer use case, where computers are pets and not cattle. A failed filesystem the night before you need to turn in your thesis may have a much larger impact on your life.
Another thing to consider is that Facebook runs btrfs on enterprise hardware (including SSDs with power-loss protection), which is going to be much more reliable than some chromebook that lives in the bottom of your backpack and gets hauled onto transit every day.
Finally, I will say that the copy-on-write features of btrfs can result in some wildly different behaviour depending on how you use it. You can get into some very bad pathological cases with write amplification, and if you run btrfs on top of LUKS it can be nearly impossible to figure out why your disk is being pegged despite very little throughput at the VFS layer.
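If you do hit the write-amplification case, one common mitigation (assuming a heavy-rewrite workload like VM images or databases; this is not a universal fix) is to disable CoW for those files with the `+C` attribute. Note it only takes effect on new or empty files, so set it on the directory before creating anything in it:

```shell
# Disable copy-on-write on a directory; files created in it inherit +C
mkdir /var/lib/vm-images
chattr +C /var/lib/vm-images

# Verify the attribute was applied ('C' should appear in the flags)
lsattr -d /var/lib/vm-images
```

The trade-off is that no-CoW files lose btrfs checksumming and can't be snapshotted reflink-cheaply, so it's worth scoping it to the specific hot files.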
So much FUD in this discussion. Chris Mason has said publicly that they use the cheapest SSDs they can find (worse than what he would be willing to put in his laptop), and that they investigate every instance of btrfs corruption. You're saying the exact opposite of the main btrfs guy at Facebook. I wonder who is right...
Who is right: one guy whose reputation relies on the thing not breaking, or a bunch of end users who report that it broke for them?
I experienced issues with write amplification within the past few months on Ubuntu 22, so it isn't like all the issues are gone. I do agree that there are fewer issues now than there were before, but I will still say that btrfs breaks or behaves unexpectedly much more often than ext4 or xfs.
Meta does a lot of things that don't scale for reliable/trustworthy systems and aren't suitable for all use cases. (I used to work there too.)
ZFS is only reliable where it was battle-tested: on Solaris. ZoL has been tinkered with and reworked so much that it's nothing like running a Thumper as a NAS.
XFS + mdadm on Linux is, without a doubt, far more reliable than ZoL. Ask me how I know. I have the scars to prove it.
Not hardly, and not in the way you think. They dropped their arguably purer ZFS port and replaced it with ZoL. As such, it's nowhere near as tested and proven as existing solutions like ext4 and xfs, which Red Hat has deployed to millions of machines for decades. ZFS has too many religious fanboys who hype it without considering that boring and reliable are less risky than betting on code that hasn't had nearly the same scale of enterprise experience.
Yeah, my setup too: XFS + mdadm (+ eventually LVM2). Rock solid. It might not have HW raid performance, but in terms of stability, flexibility and recovery it's absolutely unbeatable!
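For reference, that stack only takes a handful of commands to stand up. A rough sketch assuming four spare disks /dev/sdb through /dev/sde (device names, sizes, and mount point are hypothetical; this destroys their contents):

```shell
# RAID10 across four disks with md
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde

# LVM on top of the array for flexible volume management
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 500G -n data vg0

# XFS on the logical volume
mkfs.xfs /dev/vg0/data
mount /dev/vg0/data /srv/data
```

Each layer is independently well-understood and recoverable, which is a big part of why people find this combination so dependable.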
I am stuck in the btrfs-is-buggy mindset precisely because it managed to lose my root partition on a single disk machine. It might also have raid problems, but not exclusively.
That happened recently? A few years ago they added a reserved area used for emergency purposes that should solve situations like that. Can't say I've run into these problems, although I don't tend to run btrfs very heavily because performance becomes unacceptable long before that due to CoW.
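For anyone hitting the related "no space left" symptom at well under 100% usage: it is usually data/metadata chunk allocation rather than real exhaustion, and a filtered balance typically reclaims it. A sketch (mount point hypothetical):

```shell
# Show how space is split between data, metadata, and unallocated chunks
btrfs filesystem usage /mnt

# Compact data chunks that are less than half full, returning them
# to the unallocated pool
btrfs balance start -dusage=50 /mnt
```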
> I don't know about its multi-disk story (I do use ZFS for that personally), but for single-disk use it is great.
I can reliably, across vendors and drives, break RAID10 on BTRFS where MD+LVM are totally fine. Simply pull power. Discovered this when building out my latest workstation.
I haven't tried other configurations; after finding this pattern I decided to reserve BTRFS for single-disk configurations where I want CoW.
I use btrfs a lot but I'm not sure if I'd use it for production servers. The I/O bandwidth is just a lot lower and I get weird latency problems on desktop Linux when BTRFS is very busy that I don't get on other file systems. Then again, I probably wouldn't use ZFS for anything but a NAS setup either.
The default is to coalesce trim requests into large batches and issue them once per minute or so. Most other filesystems don't use online trim. This can cause latency spikes. If you ever decide to try it again, try disabling online trim.
Yeah, and when I was there machines would run out of disk space at 50% usage and it took months to figure out why. In the meantime, they'd just reimage the machine and hope. I don't recall any issues with data loss, but it didn't have the air of reliability.
But my team was weird at FB, our uptimes of 45 days were way above the average, and we ran into all sorts of things because we operated outside the norm.
Last time they talked about it (that I know of -- when Fedora was contemplating using btrfs and asked Chris Mason et al for their opinion), FB were running databases on xfs and were looking for ways to place them on raw disks for maximum performance. So not the entire stack.
[0] https://lwn.net/Articles/824855/
[1] https://arstechnica.com/gadgets/2021/09/examining-btrfs-linu...