[rant] everytime i give it a chance, btrfs lets me down

beleza pura@lemmy.eco.br · 5 months ago

[rant] everytime i give it a chance, btrfs lets me down

dingdongitsabear@lemmy.ml · edit-2 13 days ago

deleted by creator

WalnutLum@lemmy.ml · edit-2 5 months ago

I run btrfs on every hard drive that my Linux boxes use and there’s the occasional hiccup but I’ve never run into anything “unrecoverable.”

I will say that compared to extfs, where the files will just eat shit if there’s a write corruption, because btrfs tries to baby the data I think there appear to be more “filesystem” issues.

Endymion_Mallorn@kbin.melroy.org · 5 months ago

This is just telling me that my loyalty to FAT(32) is valid.

zarkanian@sh.itjust.works · 5 months ago

Have you tested your RAM?

beleza pura@lemmy.eco.br · 5 months ago

not sure what the relation would be. my ram is fine afaik

zarkanian@sh.itjust.works · 5 months ago

Run memtest86+. I had similar issues and it was due to faulty RAM.

WalnutLum@lemmy.ml · 5 months ago

Typically when there are “can’t mount” issues with btrfs it’s cause the write log got corrupted, and memory errors are usually the cause.

BTRFS needs a clean write log to guarantee the state of the blocks to put the filesystem overlay on top of, so if it’s corrupted btrfs usually chooses to not mount until you do some manual remediations.

If the data verification stuff seems more of a pain in the ass than it’s worth you can turn most of those features off with mount options.

beleza pura@lemmy.eco.br · 4 months ago

oh wow, that’s crazy. thanks for the info, but it’s a little fucked up that btrfs can make a memory failure cause a filesystem corruption

Atemu@lemmy.ml · 4 months ago

It’s the other way around: The memory failure causes the corruption.

Btrfs is merely able to detect it while i.e. extfs is not.

ky56@aussie.zone · edit-2 4 months ago

Not really. Even TrueNAS Core (ZFS) highly recommends ECC memory to mitigate this possibility from occurring. After reading more about filesystems in general and when money allowed, I took this advice as gospel when upgrading my server from junk I found laying around to a proper Supermicro ATX server mobo.

The difference I think is that BRTFS is more vulnerable to becoming unmountable whereas other filesystems have a better chance of still being mountable but contain missing or corrupted data. The latter usually being preferable.

For desktop use some people don’t recommend ZFS as if the right memory corruption conditions are met, it can eat your data as well. It’s why Linus Torvalds goes on a rant every now and then about how bullshit it is that Intel normalized paywalling ECC memory to servers only.

I disagree and think the benefits of ZFS on a desktop without ECC outweigh a rare possibility that can be mitigated with backups.

yum13241@lemm.ee · 4 months ago

I literally daily drive btrfs. Just don’t use a crappy drive or use raid5/raid6.

FuckBigTech347@lemmygrad.ml · 4 months ago

BTRFS RAID5/6 is fine as long you don’t run into a scenario where your machine crashes and there was still unwritten data in the cache. Also write performance sucks and scrubbing takes an eternity.

yum13241@lemm.ee · edit-2 4 months ago

Just do a search on your favorite search engine for “btrfs raid5/6 write hole bug” and you’ll see. If power gets cut, any file on the set of disks could be missing, or just have bunch of garbage.

FuckBigTech347@lemmygrad.ml · edit-2 4 months ago

That’s literally what I’m saying; It’s fine as long as there wasn’t any unwritten data in the cache when the machine crashes/suddenly loses power. RAID controllers have a battery backed write cache for this reason, because traditional RAID5/6 has the same issue.

Markaos@discuss.tchncs.de · 4 months ago

My two cents: the only time I had an issue with Btrfs, it refused to mount without using a FS repair tool (and was fine afterwards, and I knew which files needed to be checked for possible corruption). When I had an issue with ext4, I didn’t know about it until I tried to access an old file and it was 0 bytes - a completely silent corruption I found out probably months after it actually happened.

Both filesystems failed, but one at least notified me about it, while the second just “pretended” everything was fine while it ate my data.

sadTruth@lemmy.hogru.ch · 4 months ago

I am running BTRFS on multiple PCs and Laptops since about 8-10 years ago, and i had 2 incidents:

Cheap SSD: BTRFS reported errors, half a year later the SSD failed and never worked again.
Unstable RAM: BTRFS reported errors, i did a memtest and found RAM was unstable.

I am using BTRFS RAID0 since about 6 years. Even there, i had 0 issues. In all those years BTRFS snapshoting has saved me countless hours when i accidentially misconfigured a program or did a accidential rm -r ~/xyz.

For me the real risk in BTRFS comes from snapper, which takes snapshots even when the disk is almost full. This has resulted in multiple systems not booting because there was no space left. That’s why i prefer Timeshift for anything but my main PC.

Jay🚩@lemmy.ml · 5 months ago

Get on FreeBSD + ZFS

Mwa@lemm.ee · 5 months ago

You can use Linux with zfs if you install open zfs.

beleza pura@lemmy.eco.br · 4 months ago

you can, but from what i heard, maybe you shouldn’t, bc openzfs is much more unreliable than true zfs

paperd@lemmy.zip · 4 months ago

OpenZFS is the zfs now. There is no difference.

data1701d (He/Him)@startrek.website · 5 months ago

Sad to hear. I don’t know if it’s luck or something else.

I’ve been running Debian on btrfs on my laptop for 3 months without issue; I still use ext4 on my desktop, as I just went with defaults when I installed the operating system.

blackstrat@lemmy.fwgx.uk · 5 months ago

You’re right to give up on btrfs. It’s been so long in development and it just isn’t ready. Ext4 or ZFS are mature and excellent file systems. There’s no need for btrfs these days. It always has and always will disappoint.

Everyone singing the praises of it are the sysadmin equivalent of the software engineer yelling ‘it works on my machine’ when a user finds an issue.

Liam Mayfair@lemmy.sdf.org · 4 months ago

I can’t comment on its server use cases or exotic workstation setups with RAID, NAS, etc. but I’ve been running Fedora on Btrfs for quite a few years now and I’ve had zero issues with it. Am I deliberately using all of its features like CoW, compression, snapshots…? No, but neither would your average Linux user who just wants something that works, like ext4.

I don’t miss ext4, Btrfs worked for me since day 1.

blackstrat@lemmy.fwgx.uk · 4 months ago

So, what you’re saying is, “it works on my machine”

kyoji@lemmy.world · 4 months ago

And everyone else’s that uses Fedora?

cool_pebble@aussie.zone · 4 months ago

Aren’t we all? Aren’t Ext4 and ZFS considered mature because so many people have said “it works on my machine”?

I agree this person’s experience may contrast to your own, but I don’t think the fact that something has worked well for some people, and perhaps not for yourself, is a reason to discount it entirely.

blackstrat@lemmy.fwgx.uk · 4 months ago

I’ve never heard anyone say ZFS broke, corrupted their data or failed in any way at all. With btrfs it’s a consistent complaint. And btrfs literally has modes of operation that are known to be broken. I could understand if it was a new file system, but it can almost drink in pubs.

WalnutLum@lemmy.ml · 5 months ago

I’ve had btrfs go into an error state because of a bad write before, but it was pretty easy to recover from

BCsven@lemmy.ca · 5 months ago

My system has been btrfs since 2017. No issues. Maybe you have random powerloss?

dwt@feddit.org · 5 months ago

You know, protecting against Powerloss was the major feature of filesystems in a time gone by…

BCsven@lemmy.ca · 4 months ago

Yep, this entry explains about btrfs zfs and powerloss reovery, but that buggy hardware could mess with that system. https://unix.stackexchange.com/questions/340947/does-btrfs-guarantee-data-consistency-on-power-outages#520063

Atemu@lemmy.ml · edit-2 4 months ago

It only works if the hardware doesn’t lie about write barriers. If it says it’s written some sectors, btrfs assumes that reading any of those sectors will return the written data rather than the data that was there before. What’s important here isn’t that the data will forever stay in-tact but ordering. Once a metadata generation has been written to disk, btrfs waits on the write barrier and only updates the superblock (the final metadata “root”) afterwards.

If the system loses power while the metadata generation is being written, all is well because the superblock still points at the old generation as the write barrier hasn’t passed yet. On the next boot, btrfs will simply continue with the previous generation referenced in the superblock which is fully committed.
If the hardware lied about the write barrier before the superblock update though (i.e. for performance reasons) and has only written e.g. half of the sectors containing the metadata generation but did write the superblock, that would be an inconsistent state which btrfs cannot trivially recover from.

If that promise is broken, there’s nothing btrfs (or ZFS for that matter) can do. Software cannot reliably protect against this failure mode.
You could mitigate it by waiting some amount of time which would reduce (but not eliminate) the risk of the data before the barrier not being written yet but that would also make every commit take that much longer which would kill performance.

It can reliably protect against power loss (bugs not withstanding) but only if the hardware doesn’t lie about some basic guarantees.

FuckBigTech347@lemmygrad.ml · 4 months ago

I had a drive where data would get silently corrupted after some time no matter what filesystem was on it. Machine’s RAM tested fine. Turned out the write cache on the drive was bad! I was able to “fix” it by disabling the cache via hdparm until I was able to replace that drive.

ReversalHatchery@beehaw.org · 4 months ago

Maybe you have random powerloss?

who doesn’t? even if rarely, but it just happens

BCsven@lemmy.ca · 4 months ago

I lost data on Windows in 2010. Since them I have had a decent UPS. Cheap insurance

bad_news@lemmy.billiam.net · 5 months ago

I’ve had some issues with btrfs. I have a Fedora 41 box where the 1TB drive just magically lost 300+GB of capacity that shows up in use but there is nothing using it, and I used to have a 39/40 box where booting would take like 3+ mins that was all btrfs being unhappy with the primary drive on initial mount. Snapshots and the duplicate stuff are pretty killer features, though…

Atemu@lemmy.ml · 4 months ago

the 1TB drive just magically lost 300+GB of capacity that shows up in use but there is nothing using it

How did you verify that “nothing” is using it? That’s not a trivial task with btrfs because any given btrfs filesystem can contain an arbitrary amount of filesystem roots and that filesystem roots can be duplicated in seconds.

If you have ever done a snapshot or enabled automatic snapshots via e.g. snapper or btrbk, data that you have since deleted may still be present in a snapshot. Use btrfs subvolume list / to list all subvolumes and snapshots.

If you ever feel lost in analysing btrfs data usage, you can use btdu to visually explore where data is located. Note that it never shows 100% accurate usage as it’s based on probabilistic sampling. It’s usually accurate enough to figure out what’s using your data after a little while though, so let it scan for a good minute.

bad_news@lemmy.billiam.net · 4 months ago

I looked at net use in whatever the KDE utility is to find current use (this was also a jarring shift, suddenly everything was broken because the drive was full overnight when it should have been 30% free and only 600-ish GB WERE per the utility). Figuring it was snapshots, I installed btrfs assistant and there were no snapshots. Btrfs subvolume just shows my root, home, and a snapshot I made just before I upgraded to Fedora 41, which was well after the memory disappeared. My theory has been it is somehow NFS caching gone wrong because this is connected to an NFS volume and maybe it’s FUSE or whatever the modern equivalent is fucking up below btrfs even? I don’t have btdu, what package provides that on Fedora?

Atemu@lemmy.ml · 4 months ago

Please show sudo btrfs filesystem usage.

https://github.com/CyberShadow/btdu?tab=readme-ov-file#installation

bad_news@lemmy.billiam.net · 4 months ago

I get not enough arguments: 0 but 1 expected, am I mis-parsing your suggestion?

Atemu@lemmy.ml · 4 months ago

You need to point it at your btrfs, so i.e. /.

bad_news@lemmy.billiam.net · 4 months ago

Well that kiiind of shows the problem, 235GB are “Device unallocated” and my volume shows up as /dev/nvmeblah as 711GB

fmstrat@lemmy.nowsci.com · 5 months ago

Won’t say it… Won’t say it… ZFS!! Oops.

beleza pura@lemmy.eco.br · 5 months ago

i’ve been meaning to try it, but i installed freebsd to an ufs partition instead of zfs because ufs was marked by default in the installer 🤦

fmstrat@lemmy.nowsci.com · edit-2 5 months ago

It’s fantastic, IMO. Still use LUKS and software raid for root, but everything else is encrypted raidz.

ReversalHatchery@beehaw.org · 4 months ago

what’s the point in using software raid? or do you mean the raidz setups of zfs?

unknowing8343@discuss.tchncs.de · 4 months ago

Been using BTRFS since I learned I could squeeze more data on my cheap-ass drive and… It’s been 3 years, no problem at all, and I have backups anyway.

Yozul@beehaw.org · 4 months ago

I mean, unless you really like one of the weird bells and whistles btrfs supports ext4 is just faster and more reliable. If you don’t have weird power user needs then anything else is a downgrade. Even ZFS really only makes a significant difference if you’re moving around gigabytes of data on a daily basis. If you’re on a BSD anyway feel free to go for it, but for most people there is no real benefit. Every other fancy new file system is just worse for typical desktop use cases. People desperately want to replace ext4 because it’s old, but there’s just really nothing to gain from it. Sometimes simple and reliable is good.

ProgrammingSocks@pawb.social · edit-2 4 months ago

Copy on write, compression, and snapshots are really good whistles, though.

Yozul@beehaw.org · 4 months ago

Copy on write is pretty overrated for most use cases. It’d be nice to have, but I don’t find it’s worth the bother. Disk compression and snapshots have had solutions for longer than btrfs has existed, so I don’t understand why I’d want to cram them into an otherwise worse file system and call it an improvement. I will admit that copy on write and snapshots do at least have a little synergy together, but storage has gotten to be one of the cheapest parts of a computer. I’d rather just have a real backup.

ProgrammingSocks@pawb.social · 4 months ago

Myself and many others have found lots of use in these features. If it’s not important to you that’s fine, but there ARE reasons many of us default to btrfs now.

ReversalHatchery@beehaw.org · edit-2 4 months ago

being able to revert a failed upgrade by restoring a snapshot is not a power user need but a very basic feature for everyday users who do not want to debug every little problem that can go wrong, but just want to use their computer.

ext4 does not allow that.

Unmapped@lemmy.ml · 4 months ago

By using NixOS I can do this on ext4. Just reboot back to the previous image before the update. Not saying everyday users should be running nixos but there are other Immutable distros that can do the same.

ReversalHatchery@beehaw.org · 4 months ago

will that also restore your data? what happens when a program updates its database structure with the update, and the old version you restore won’t understand it anymore?

Unmapped@lemmy.ml · 4 months ago

That is a good point. I’ve only had to rollback twice and nether time had any issues. But from my understanding of how it works, you are correct, the data wouldn’t rollback.

ReversalHatchery@beehaw.org · 4 months ago

I’ve learned this lesson with my Android phone a few years ago. There it was actually about sqlite databases of a system app (contacts I think?), but this can happen with other formats too. Worst is if it doesn’t just error out, but tries to work with “garbage”, because it’ll possibly take much more time to debug it, or even realize that your data is corrupt.

Yozul@beehaw.org · 4 months ago

You know file systems are not the only way to do that, right? Heck, Timeshift is explicitly designed to do that easily and automatically without ever even having to look at a command line. Backup before upgrade is a weird thing to cram into a file system.