XFS has got to go

May 25th 2009 04:57 pm

So the other day I suffered what is probably my third data loss in a year, thanks to the XFS filesystem on my desktop combined with a power loss.

Update: I just realized that I failed to mention that the data loss is my fault, not XFS’s. Had I been more diligent in my research when I first set it up, I would have been aware of this particular shortcoming (shared with ext4 in its configuration in 2.6.29 and earlier): metadata is updated before the data, so a power loss in between can leave zero-length files. The files I lost were always ones being written to at the time, never just “random” files. Anyways…
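On the application side, the usual defence against this failure mode is to write the new contents to a temporary file, flush it, and only then rename it over the original. What follows is a minimal sketch of that pattern, not anything from the setup described here. The function name `safe_write` and the temporary-file naming are invented for illustration. A bare `sync` flushes every mounted filesystem, which is much heavier than the per-file fsync() a C program would call.

```shell
# safe_write TARGET: replace TARGET with the contents of stdin without ever
# leaving a truncated or zero-length TARGET behind.
# (Hypothetical helper, for illustration only.)
safe_write() {
    sw_target=$1
    # Keep the temp file in the same directory so the final mv is a rename
    # within one filesystem rather than a copy.
    sw_tmp=$(mktemp "$(dirname "$sw_target")/.safe_write.XXXXXX") || return 1
    # 1. Write the new contents to the temp file.
    cat > "$sw_tmp" || { rm -f "$sw_tmp"; return 1; }
    # 2. Push the data to disk before the rename can be committed.
    sync
    # 3. Rename over the old file: TARGET is either the old or the new version.
    mv -f "$sw_tmp" "$sw_target"
}
```

Usage would look like `printf 'setting=1\n' | safe_write "$HOME/.examplerc"` (a made-up file name). Note that the replaced file ends up with the restrictive permissions mktemp gives its temp file.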

I thought I was already set up fairly well, given that I was using RAID 1 (i.e. two redundant hard drives holding one hard drive’s worth of data) and had UPS protection on the computer anyway.

The failures in my setup were notable though:

  1. My backup strategy depended on me having enough time to actually hook up the external drive, run the backup, unhook it, etc. Scripting it would have made each run quicker, but writing the script in the first place took more time than I had. Therefore my backups were fairly out of date. Compounding this, when I finally did hook up the external drive and try to perform a backup, I found to my displeasure that it was NTFS-formatted and therefore I couldn’t write to it. (I had the read/write ntfs-3g driver available but had forgotten about it by then, and didn’t have time to troubleshoot further.)
  2. UPS only works when your 2 year old son doesn’t go hard resetting the power on you.
  3. UPS + disabling the front power switch only saves you until your next kernel panic induced by bleeding-edge video drivers (the fact that they were open source didn’t save me here ;)
  4. The RAID worked — both hard drives were in the same inconsistent state, with the same lost files after the fsck… :(

The corrective action for all of this is complete, however. Last night I used ntfsresize on the external drive (since it had data I wanted to retain) and used the free space to make a new ext3 partition for backups. From there I booted up a Gentoo LiveCD and copied all of the hard drives’ data to the external drive. I then reformatted the hard drives with ext4 (with appropriate mount options to avoid the same issues I had with XFS) and copied all the data back.
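The exact mount options are not spelled out above. Purely as an illustration, an /etc/fstab entry along these lines turns off ext4's delayed allocation, the feature usually blamed for the zero-length-file behaviour. The device name and mount point are placeholders, not the ones from this machine.

```text
# /etc/fstab fragment (hypothetical device and mount point)
# nodelalloc: disable ext4 delayed allocation
/dev/md0   /home   ext4   defaults,nodelalloc   0   2
```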

There were some hiccups relating to me using manual mdadm commands to bring the hard drives online from the LiveCD. I apparently changed some of the md device names on the hard drives in the process, which required some more manual mdadm --assemble action to fix. But everything is working so far, although I’m sure there are a few packages that need to be reinstalled to account for the data loss over time.
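Since the damage shows up as zero-length files, one way to find candidates for reinstalling is to list the empty regular files under a directory tree. This is a minimal sketch with an invented function name. Plenty of files are legitimately empty, so the output is only a starting point.

```shell
# list_empty_files DIR: print every zero-length regular file under DIR,
# staying on one filesystem (-xdev) so other mounts such as /proc are skipped.
# (Hypothetical helper, for illustration only.)
list_empty_files() {
    find "$1" -xdev -type f -size 0
}
```

For example, `list_empty_files /etc` would list empty files under /etc, which could then be matched to the packages that own them.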

Luckily the major reason for my lack of free time looks like it will be rectified within the next week, more on that later.

Posted by mpyne under Computing Troubles & Personal

14 Responses to “XFS has got to go”

  1. Markus responded on 25 May 2009 at 17:29

    I never had problems with XFS — unlike ext3.

  2. Fri13 responded on 25 May 2009 at 18:27

    “The RAID worked — both hard drives were in the same inconsistent state, with the same lost files after the fsck… :(”

    I conclude that you already know this: RAID is never a backup solution. Its only protection is against hard drive hardware malfunctions.

    I know too many people who believe that RAID is a good backup.

    I think KDE4 needs a good backup application that backs up automatically when a removable drive is attached: the drive would get automounted and all the wanted data would be backed up to it automatically. And if the user removes the drive while a backup is running, the backup would continue when the drive is attached again.

    Mandriva has nice snapshot features and backup applications for this.

  3. mpyne responded on 25 May 2009 at 18:38

    Yeah, I knew about that risk of RAID. Basically I was just using it for exactly what you mentioned, to protect against HD failure.

  4. astromme responded on 25 May 2009 at 18:42

    @fri13

    I’m working on a KDE solution to the backup problem. It’ll be some time, but I think it will fill a large gap when ready.

  5. Esben Mose Hansen responded on 26 May 2009 at 02:00

    Yeah, I’ve lost data to that, eh, “feature”, too. It might be POSIX compliant, but it is not user-friendly, and I very much recommend avoiding any journaling filesystem that does not support an ordered journaling mode: that is, when the metadata is flushed to disk, so are any data pages pointed to by the metadata. Yes, it probably costs some performance, but the data loss that is the alternative is unacceptable. If applications were to avoid this, they would have to do a flush, which is *also* unacceptable in many scenarios.

    To recap: This happens when an application flushes a file which shares a metadata sector with a file that has been created (say, renamed) but not yet flushed. What happens is that the metadata for the newly created/renamed file is flushed to disk before the data, overtaking the data on the way to the disk. If the filesystem is then improperly shut down in the next few minutes (in XFS it can be a very long time) the data is gone *and so are any previous contents of that file*.

    So yeah, stay away from XFS and ext4 at least till 2.6.30. Unless, of course, you are sure that you do not end up in a situation like this.

    One last thing: Every file system can have problems. The most common causes are bad writeback during a crash (nothing to be done here) and faulty memory. So if you have a filesystem corruption issue, let memtest run overnight. Faulty memory is by far the most likely culprit.

    And no, I am not an expert :)

  6. Dave Taylor responded on 26 May 2009 at 03:05

    Is it unusual that I am still using ReiserFS, with no problems in 5 years of using it on two machines?

  7. K responded on 26 May 2009 at 07:23

    Yes, RAID is not the best solution for backups. A better use of the second HD is to make nightly filesystem-level backups on it. Of course this way you may lose a day of work, but on the other hand you get protection from filesystem crashes, accidental deletes, and (optionally) incremental versions for some period of time. As for myself, I am using rdiff-backup and a simple cron script:

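    # Nightly rdiff-backup of /etc and /home onto a normally read-only backup mount.
    echo "Backup started at $(date)"
    mount -o remount,rw /mnt/backup

    rdiff-backup --include /etc --include /home --exclude '**' / /mnt/backup
    status=$?   # save the exit code before the test below overwrites $?

    if [ "$status" -eq 0 ]; then
        echo "Backup: / : successfully finished at $(date)"
    else
        echo "Backup: / : error $status at $(date)"
    fi
    sync

    # Drop increments older than one week, then make the backup read-only again.
    rdiff-backup --remove-older-than 1W --force /mnt/backup/
    mount -o remount,ro /mnt/backup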

  8. Jammer responded on 26 May 2009 at 20:13

    @Dave Taylor – I second that. I’ve been using ReiserFS on a couple of servers 24/7 for 6+ years and never had an issue. They’ve survived power failure (and the occasional bit of admin stupidity) and still keep right on going.

  9. Esben Mose Hansen responded on 27 May 2009 at 02:53

    I don’t think reiserfs suffers from that particular bug^Wfeature. The only issue I have had with reiserfs was declining performance over time. XFS, ext4 and JFS are, I think, the most popular proponents of this bug^Wfeature.

    Reiserfs is however a dead-end… it’s pretty much unmaintained. I have no idea how much maintenance a stable filesystem actually needs.

  10. Segedunum responded on 27 May 2009 at 16:02

    I get rather sick of reading this kind of stuff and it’s like reading those endless bullshit ‘debates’ on Gentoo’s forums about personal anecdotes as to what happens when people have pulled their power cords. I have no idea what’s gone on here exactly, but XFS shares none of the issues that ext4 has with regard to committing metadata before real data. This was solved years ago. ext4 is merely repeating the process in order to gain a speed advantage after years of deriding XFS.

    The bottom line is that if you have a power loss then you can suffer data loss with any filesystem, including various versions of ext. Really, your personal experiences with a particular filesystem don’t count at all. If you want to rest easy then use a UPS or a laptop running with a battery. If you’re allowing your two year old son to play with this then more fool you. Bugger. Kernel panicking with ‘bleeding edge’ video drivers. Why on Earth are you running them? If it’s a dev machine then accept data loss and bad things happening.

    If you don’t take those steps then your data doesn’t mean much to you. Anyone who thinks that RAID will help is, frankly, a bit of an idiot. All you’re doing is committing the same data to different disks. Of course they’re in the same inconsistent bloody state. What the hell did you think would happen?

    Backup hard drives: Yes, they’re formatted with NTFS or FAT unfortunately because that’s what most of the world uses. Simply back up the data on it, reformat it with ext3, install ext filesystem support on Windows and do backups if your data is important to you.

    I have no clue at all why idiots lose data and then post their own inane, pointless and inaccurate anecdotes on their own usage of a particular filesystem. It’s been going on for years.

  11. mpyne responded on 27 May 2009 at 20:08

    Segedunum: Based on the fact that you apparently misunderstood some of what I wrote I came close to disemvoweling your post. But anyways:

    The mere possibility of suffering data loss is present anywhere and with any filesystem, sure. There is a difference however between mere data corruption or missing data (especially with textual configuration files) and files which are zero length. Moving to ext4 (with appropriate mount options) gives me shortened/incomplete files at worst instead of zero-length useless rc files. A much easier alternative would be appropriate mount options for XFS but since I haven’t seen them forthcoming there we are.

    “Really, your personal experiences with a particular filesystem don’t count at all.” LOL, both my hard disks seem to disagree at this point, as both are missing XFS.

    “If you’re allowing your two year old son to play with this then more fool you.” Do you even have kids? Or do you really think that they don’t like playing with push buttons (especially near shiny power lights)? Every computer I’ve ever owned will power off after holding the button in for 5 seconds. Or do you go the method of locking your kids up anytime you’re unable to physically restrain them for more than 10 seconds? Of course I could just buy parts from Radio Shack and install a padlock around the power switch. But it’s much easier just to use a desktop file system for my desktop instead of using a server file system.

    “If you want to rest easy then use a UPS or a laptop running with a battery” Did you even read the post?

    “Kernel panicking with ‘bleeding edge’ video drivers. Why on Earth are you running them?” Gee I don’t know, because otherwise my session slows to a crawl? Why does anyone go through the hassle of dealing with bleeding edge drivers when they already have 50 other things to do?

    “If it’s a dev machine then accept data loss and bad things happening” … or I could go the route where I don’t get anywhere near as much data loss or bad things happening instead of just taking what I get.

    “Backup hard drives: Yes, they’re formatted with NTFS or FAT unfortunately because that’s what most of the world uses.” Which is fine, really, except that it makes my life harder because — oh wait, you already covered it:

    “Simply backup the data on it, reformat it with ext3,” Did you even read the post? I repartitioned it and added an ext3 partition.

    “I have no clue at all why idiots lose data and then post their own inane, pointless and inaccurate anecdotes on their own usage of a particular filesystem. It’s been going on for years.” Well, I have no clue at all why people skim a post, find something that offends them personally, and then spew forth with a diatribe which is half useless and obviously written without actually reading the post in question.

  12. Blissex responded on 30 May 2009 at 09:26

    Another one bites the dust, and does not understand tradeoffs. XFS is very safe, if you understand what filesystem barriers are or are not, and their consequences, and if you use applications written by people who understand the same things.

    Also, RAID is not a protection against hard disk failure — a cold copy drive works as well. RAID with redundancy is a technique for *continuous operation* despite limited damage.

    As to write barriers, some collected links:

    http://sandeen.net/wordpress/?p=34
    http://sandeen.net/wordpress/?p=42
    https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
    http://lwn.net/SubscriberLink/322823/e6979f02e5a73feb/
    http://loupgaroublond.blogspot.com/2009/03/anecdote-about-why-doing-wrong-thing-is.html
    http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
    http://mjg59.livejournal.com/108257.html

  13. mpyne responded on 07 Jun 2009 at 19:27

    Blissex, sorry for leaving your comment in the spam queue for so long; apparently the collected links you provided tripped the auto-spam filter. As far as being safe if you use applications written by people who understand filesystem barriers, that’s pretty much my point exactly. XFS is not safe for desktop systems (the software on which is predominantly written by people who “don’t understand filesystem barriers”). I don’t run Oracle, or other enterprise software, and I have no intentions to. My fault, I just won’t use XFS for desktop systems in the future.

    At least in my configuration, I was not able to boot up one day with one of the drives in an inconsistent state so it’s not even really much of a means for “continuous operation”. It’s hard for me to argue with the constant mirroring it does though — I was able to manually start the RAID on the one working drive without losing data (that day).

    Sad part is that ext4 really does seem faster here, even with the safer mount options, so far it’s been a win-win.

  14. yes456 responded on 12 Sep 2009 at 00:22

    Hi
    Which is a good file system for Linux (for desktop)?
    ext3, ext4, xfs, jfs, reiserfs
    thanks
