XFS has got to go
May 25th 2009 04:57 pm
So the other day I suffered probably the third dataloss in a year due to the XFS filesystem in use on my desktop + power loss.
Update: I just realized that I failed to mention that the dataloss is my fault, not XFS’s. Had I been more diligent in my research when I set it up in the first place, I would have been aware of that particular shortcoming (shared with ext4 as configured in 2.6.29 and earlier), where metadata being updated before data, combined with power loss, can result in zero-length files. The files I lost were always ones being written to at the time, never just “random” files. Anyways…
I thought I had already set things up fairly well, given that I was using RAID 1 (i.e. two redundant hard drives to hold one hard drive of data) and had UPS protection on the computer anyways.
The failures in my setup were notable though:
- My backup strategy depended on me having enough time to actually hook up the external drive, run the backup, unhook it, etc. Scripting it so that it would take less time still required too much time to set up initially. Therefore my backups were fairly out of date. Compounding this, when I finally did hook up the external drive and try to perform a backup, I found to my displeasure that it was NTFS-formatted and therefore I couldn’t write to it. (I had the read/write ntfs-3g driver available but had forgotten about it by then, and didn’t have time to troubleshoot further.)
- UPS only works when your 2 year old son doesn’t go hard resetting the power on you.
- UPS + disabling the front power switch only saves you until your next kernel panic induced by bleeding-edge video drivers (the fact that they were open source didn’t save me here ;)
- The RAID worked — both hard drives were in the same inconsistent state, with the same lost files after the fsck… :(
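The first failure above (finding out at backup time that the external drive can't be written to) is the kind of thing a small pre-flight check can catch. What follows is only a minimal sketch, not the script actually used here: the function names `backup_target_ready` and `run_backup` are made up for illustration, and plain `cp -a` stands in for whatever backup tool you prefer.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the backup target is an existing
# directory that the current user can write to.
backup_target_ready() {
    target=$1
    [ -d "$target" ] && [ -w "$target" ]
}

# Hypothetical wrapper: copy SRC into DEST, but only after the check passes.
run_backup() {
    src=$1
    dest=$2
    if ! backup_target_ready "$dest"; then
        echo "Backup skipped: $dest is missing or not writable" >&2
        return 1
    fi
    # cp -a preserves permissions and timestamps; swap in your own tool here.
    cp -a "$src" "$dest"/
}
```

Something like `run_backup /home /mnt/backup` would then refuse to start, with a message, instead of failing halfway through.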
The corrective action for all of this is complete however. Last night I used ntfsresize on the external drive (since it had data I wanted to retain) and used the free space to make a new ext3 partition for backups. From there I booted up a Gentoo LiveCD and copied all of the hard drive’s data to the external drive. I then reformatted the hard drives with ext4 (with appropriate mount options to avoid the same issues I had with XFS) and copied all the data back.
There were some hiccups relating to me using manual mdadm commands to bring the hard drives online from the LiveCD. I apparently changed some of the md device names on the hard drives in the process, which required some more manual mdadm --assemble action to fix. But everything is working so far, although I’m sure there are a few packages that need to be reinstalled to account for the dataloss over time.
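The post doesn't spell out which ext4 mount options were used. As a rough sketch, the helper below checks whether a given mount point carries a given option, reading a file in `/proc/mounts` format. The function name `has_mount_option` is hypothetical, and `nodelalloc` in the usage note is only an assumed example of an option one might pick, not necessarily the one chosen here.

```shell
#!/bin/sh
# Hypothetical helper: has_mount_option MOUNTPOINT OPTION [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed in MOUNTS_FILE (default /proc/mounts)
# with OPTION among its comma-separated mount options.
has_mount_option() {
    mountpoint=$1
    option=$2
    mounts_file=${3:-/proc/mounts}
    awk -v mp="$mountpoint" -v opt="$option" '
        $2 == mp {
            # Field 4 holds the options, e.g. "rw,noatime,nodelalloc".
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$mounts_file"
}
```

For example, `has_mount_option / nodelalloc && echo "delayed allocation is off"`. Options that are kernel defaults may not be listed in `/proc/mounts`, so a miss does not prove the option is inactive.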
Luckily the major reason for my lack of free time looks like it will be rectified within the next week, more on that later.

I never had problems with XFS — unlike ext3.
“The RAID worked — both hard drives were in the same inconsistent state, with the same lost files after the fsck… :(”
I take it you already know this: RAID is never a backup solution. Its only protection is against hard drive hardware failures.
I know too many people who believe that RAID is a good backup.
I think KDE4 needs a good backup application that backs up automatically when a removable drive is attached: the drive would be automounted and all the selected data backed up to it. And if the user removes the drive while a backup is running, the backup would resume when the drive is attached again.
Mandriva has nice snapshot features and backup applications for this.
Yeah, I knew about that risk of RAID; basically I was just using it for exactly what you mentioned, to protect against HD failure.
@fri13
I’m working on a KDE solution to the backup problem. It’ll be some time, but I think it will fill a large gap when ready.
Yeah, I’ve lost data to that, eh, “feature”, too. It might be POSIX compliant, but it is not user-friendly, and I very much recommend avoiding any journaling filesystem that does not support an ordered journaling mode: that is, when the metadata is flushed to disk, so are any data pages pointed to by the metadata. Yes, it probably costs some performance, but the dataloss that is the alternative is unacceptable. If applications were to avoid this, they would have to do a flush, which is *also* unacceptable in many scenarios.
To recap: This happens when an application flushes a file which shares a metadata sector with a file that has been created (say renamed) but not yet flushed. What happens is that the metadata for the newly created/renamed file is flushed to disk before the data, overtaking the data on the way to the disk. If the filesystem is then improperly shut down in the next few minutes (in XFS it can be a very long time) the data is gone *and so are any previous contents of that file*.
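For illustration, the application-side flush mentioned above usually takes the form of write-to-a-temporary-file, flush, then rename. Here is a minimal shell sketch of that pattern. The function name `atomic_write` is made up, and it assumes a `sync` that accepts a file argument (GNU coreutils 8.24 or later), falling back to a global `sync` otherwise.

```shell
#!/bin/sh
# Hypothetical helper: atomic_write TARGET
# Reads new contents from stdin and replaces TARGET only after the data
# has been flushed, so a crash should leave either the old or the new file.
atomic_write() {
    target=$1
    dir=$(dirname "$target")
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final mv is a copy rather than a rename.
    tmp=$(mktemp "$dir/.tmp.XXXXXX") || return 1
    if ! cat > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Flush the data before the rename; fall back to a global sync if
    # this sync does not support a file argument.
    sync "$tmp" 2>/dev/null || sync
    mv -f "$tmp" "$target"
}
```

Usage would look like `generate_config | atomic_write ~/.examplerc` (both names hypothetical). Note that the new file gets `mktemp`'s restrictive permissions, and that the flush is exactly the performance cost being complained about here.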
So yeah, stay away from XFS and ext4 at least till 2.6.30. Unless, of course, you are sure that you do not end up in a situation like this.
One last thing: Every file system can have problems. The most common causes are bad writeback during a crash (nothing to be done here) and faulty memory. So if you have a filesystem corruption issue, let a memory tester such as memtest86+ run overnight. Faulty memory is by far the most likely culprit.
And no, I am not an expert :)
Is it unusual that I am still using ReiserFS with no probs after 5 years of using it on two machines?
Yes, RAID is not the best solution for backups. A better use of the second HD is to make nightly filesystem-level backups on it. Of course this way you may lose a day of work, but on the other hand you get protection from filesystem crashes, accidental deletes, and (optionally) incremental versions for some period of time. As for myself, I am using rdiff-backup and a simple cron script:
echo "Backup started at $(date)"
mount -o remount,rw /mnt/backup
rdiff-backup --include /etc --include /home --exclude '**' / /mnt/backup
status=$?  # save the exit code before the test below overwrites $?
if [ "$status" -eq 0 ]; then
    echo "Backup: / : successfully finished at $(date)"
else
    echo "Backup: / : error $status at $(date)"
fi
sync
rdiff-backup --remove-older-than 1W --force /mnt/backup/
mount -o remount,ro /mnt/backup
@Dave Taylor – I second that. I’ve been using ReiserFS on a couple of servers 24/7 for 6+ years and never had an issue. They’ve survived power failure (and the occasional bit of admin stupidity) and still keep right on going.
I don’t think reiserfs suffers from that particular bug^Wfeature. The only issue I have had with reiserfs was declining performance over time. XFS, ext4 and JFS are, I think, the most popular proponents of bug^Wfeature.
Reiserfs is however a dead-end… it’s pretty much unmaintained. I have no idea how much maintenance a stable filesystem actually needs.
I get rather sick of reading this kind of stuff and it’s like reading those endless bullshit ‘debates’ on Gentoo’s forums about personal anecdotes as to what happens when people have pulled their power cords. I have no idea what’s gone on here exactly, but XFS shares none of the issues that ext4 has with regard to committing metadata before real data. This was solved years ago. ext4 is merely repeating the process in order to gain a speed advantage after years of deriding XFS.
The bottom line is that if you have a power loss then you can suffer data loss with any filesystem, including various versions of ext. Really, your personal experiences with a particular filesystem don’t count at all. If you want to rest easy then use a UPS or a laptop running with a battery. If you’re allowing your two year old son to play with this then more fool you. Bugger. Kernel panicking with ‘bleeding edge’ video drivers. Why on Earth are you running them? If it’s a dev machine then accept data loss and bad things happening.
If you don’t take those steps then your data doesn’t mean much to you. Anyone who thinks that RAID will help is, frankly, a bit of an idiot. All you’re doing is committing the same data to different disks. Of course they’re in the same inconsistent bloody state. What the hell did you think would happen?
Backup hard drives: Yes, they’re formatted with NTFS or FAT unfortunately because that’s what most of the world uses. Simply back up the data on it, reformat it with ext3, install ext filesystem support on Windows and do backups if your data is important to you.
I have no clue at all why idiots lose data and then post their own inane, pointless and inaccurate anecdotes on their own usage of a particular filesystem. It’s been going on for years.
Segedunum: Based on the fact that you apparently misunderstood some of what I wrote I came close to disemvoweling your post. But anyways:
The mere possibility of suffering data loss is present anywhere and with any filesystem, sure. There is a difference however between mere data corruption or missing data (especially with textual configuration files) and files which are zero length. Moving to ext4 (with appropriate mount options) gives me shortened/incomplete files at worst instead of zero-length useless rc files. A much easier alternative would be appropriate mount options for XFS but since I haven’t seen them forthcoming there we are.
“Really, your personal experiences with a particular filesystem don’t count at all.” LOL, both my hard disks seem to disagree at this point, as both are missing XFS.
“If you’re allowing your two year old son to play with this then more fool you.” Do you even have kids? Or do you really think that they don’t like playing with push buttons (especially near shiny power lights)? Every computer I’ve ever owned will power off after holding the button in for 5 seconds. Or do you go the method of locking your kids up anytime you’re unable to physically restrain them for more than 10 seconds? Of course I could just buy parts from Radio Shack and install a padlock around the power switch. But it’s much easier just to use a desktop file system for my desktop instead of using a server file system.
“If you want to rest easy then use a UPS or a laptop running with a battery” Did you even read the post?
“Kernel panicking with ‘bleeding edge’ video drivers. Why on Earth are you running them?” Gee I don’t know, because otherwise my session slows to a crawl? Why does anyone go through the hassle of dealing with bleeding edge drivers when they already have 50 other things to do?
“If it’s a dev machine then accept data loss and bad things happening” … or I could go the route where I don’t get anywhere near as much data loss or bad things happening instead of just taking what I get.
“Backup hard drives: Yes, they’re formatted with NTFS or FAT unfortunately because that’s what most of the world uses.” Which is fine, really, except that it makes my life harder because — oh wait, you already covered it:
“Simply backup the data on it, reformat it with ext3,” Did you even read the post? I repartitioned it and added an ext3 partition.
“I have no clue at all why idiots lose data and then post their own inane, pointless and inaccurate anecdotes on their own usage of a particular filesystem. It’s been going on for years.” Well, I have no clue at all why people skim a post, find something that offends them personally, and then spew forth with a diatribe which is half useless and obviously written without actually reading the post in question.
Another one bites the dust, and does not understand tradeoffs. XFS is very safe, if you understand what filesystem barriers are or are not, and their consequences, and if you use applications written by people who understand the same things.
Also, RAID is not a protection against hard disk failure — a cold copy drive works as well. RAID with redundancy is a technique for *continuous operation* despite limited damage.
As to write barriers, some collected links:
http://sandeen.net/wordpress/?p=34
http://sandeen.net/wordpress/?p=42
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
http://lwn.net/SubscriberLink/322823/e6979f02e5a73feb/
http://loupgaroublond.blogspot.com/2009/03/anecdote-about-why-doing-wrong-thing-is.html
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
http://mjg59.livejournal.com/108257.html
Blissex, sorry for leaving your comment in the spam queue for so long; apparently the collected links you provided tripped the auto-spam filter. As far as being safe if you use applications written by people who understand filesystem barriers, that’s pretty much my point exactly. XFS is not safe for desktop systems (the software on which is predominantly written by people who “don’t understand filesystem barriers”). I don’t run Oracle, or other enterprise software, and I have no intentions to. My fault, I just won’t use XFS for desktop systems in the future.
At least in my configuration, I was not able to boot up one day with one of the drives in an inconsistent state so it’s not even really much of a means for “continuous operation”. It’s hard for me to argue with the constant mirroring it does though — I was able to manually start the RAID on the one working drive without losing data (that day).
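As an aside, a mirror that has dropped to one working drive shows up in `/proc/mdstat` with an underscore in its status brackets (for example `[U_]` instead of `[UU]`). The sketch below lists such arrays. The function name `degraded_arrays` is hypothetical, and the parsing assumes the usual mdstat layout where the status line follows the line that starts with the array name.

```shell
#!/bin/sh
# Hypothetical helper: degraded_arrays [MDSTAT_FILE]
# Prints the name of every md array whose member status (e.g. "[U_]")
# shows at least one missing device.
degraded_arrays() {
    mdstat=${1:-/proc/mdstat}
    awk '
        /^md/ { name = $1 }                  # remember the current array name
        /\[[U_]*_[U_]*\]/ { print name }     # status contains a missing member
    ' "$mdstat"
}
```

Run from cron, something like `[ -n "$(degraded_arrays)" ] && echo "RAID degraded"` gives an early warning, so that the second drive failing doesn't come as a surprise.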
Sad part is that ext4 really does seem faster here, even with the safer mount options. So far it’s been a win-win.
Hi
Which is a good file system for Linux (for desktop)?
ext3, ext4, xfs, jfs, reiserfs
thanks