Journaling filesystems quick FAQ

Under construction. I'm collecting facts and opinions about the various journaling filesystems for Linux (ext3 and ReiserFS in particular) and I thought I might as well make this available for everyone.

Most of this is distilled from messages on the Linux-kernel mailing list.

ext3

The ext3 filesystem finally made it into the official kernel as of version 2.4.15-pre2.
  1. What do the different data=... options do?

    By default [that is, with data=ordered], ext3 imposes stricter write ordering than the other journalling filesystems in order to improve data consistency (most other [journaling] filesystems guarantee only consistent metadata). Mounting the filesystem with

    mount -t ext3 -o data=writeback /dev/foo /mnt/bar
    
    will make it use the same level of guarantee as reiserfs does.

    mount -t ext3 -o data=journal /dev/foo /mnt/bar
    
    will do FULL data journalling and will also guarantee data integrity after a crash...
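
The three modes can be told apart from the option string alone. The following is a minimal sketch, not part of ext3 or any existing tool: the helper name ext3_data_mode is hypothetical, and it assumes the options arrive as a comma-separated string such as the one passed to mount -o. When no data= option is given it falls back to "ordered", the default described above.

```python
# Hypothetical helper (not part of any ext3 tooling): report which
# data= journaling mode a comma-separated mount option string selects.
VALID_MODES = ("journal", "ordered", "writeback")


def ext3_data_mode(options: str) -> str:
    """Return the data= mode named in a mount option string.

    Falls back to "ordered", the ext3 default, when no data= option
    is present. If data= appears more than once, the last one wins
    (an assumption made for this sketch).
    """
    mode = "ordered"
    for opt in options.split(","):
        opt = opt.strip()
        if opt.startswith("data="):
            mode = opt[len("data="):]
    if mode not in VALID_MODES:
        raise ValueError(f"unknown ext3 data mode: {mode!r}")
    return mode


if __name__ == "__main__":
    for opts in ("rw", "rw,data=writeback", "rw,noatime,data=journal"):
        print(f"{opts:28} -> {ext3_data_mode(opts)}")
```

A script that audits mount options could use such a check to flag filesystems that are not running in the intended mode.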
  2. When to use what mode?

    I would use data=journal on my CVS archive, and maybe writeback on a news server.

    Add to this that sync NFS mounts are also far better off with data=journal.

  3. What to use for a database like mysql?

    Well you used reiserfs before. data=writeback is equivalent to the protection reiserfs offers. Big databases such as Oracle do their own journalling and will make sure transactions are actually on disk before they finalize the transaction to the requestor. mysql... I'm not sure about, and it also depends on if it's a mostly-read-only database, a mostly-write database or a "mixed" one. In the first case, mounting "sync" with full journalling will ensure full data safety; the second case might just be faster with full journalling (full journalling has IO clustering benefits for lots of small, random writes) but for the mixed case it's a matter of reliability versus performance...

    Arjan van de Ven (arjan at fenrus.demon.nl)

    For a database, your application will be specifying the write ordering explicitly with fsync and/or O_SYNC. For the filesystem to try to sync its IO in addition to that is largely redundant. writeback is entirely appropriate for databases.

    Remember, the key condition that ordered mode guards against is finding stale blocks in the middle of recently-allocated files. With databases, that's not a huge concern. Except during table creation, most database writes are into existing allocated blocks; and the data in the database is normally accessed directly only by a specified database process, not by normal client processes, so any leaks that do occur if the database extends its file won't be visible to normal users.

    Stephen C. Tweedie (sct at redhat.com)
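
To illustrate what "specifying the write ordering explicitly" can look like from the application side, here is a minimal sketch of the two approaches mentioned above: calling fsync after a write, and opening the file with O_SYNC. The function names and the record format are assumptions made for this example; a real database engine is considerably more involved.

```python
import os
import tempfile


def _write_all(fd: int, data: bytes) -> None:
    """Keep calling os.write until every byte has been handed to the kernel."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def append_record_fsync(path: str, record: bytes) -> None:
    """Append a record, then call fsync so the call returns only after
    the kernel reports the file's data as flushed to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        _write_all(fd, record)
        os.fsync(fd)
    finally:
        os.close(fd)


def append_record_osync(path: str, record: bytes) -> None:
    """Append a record through a descriptor opened with O_SYNC, so each
    write itself is synchronous and no separate fsync call is made."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        _write_all(fd, record)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "journal.log")  # hypothetical file name
        append_record_fsync(log, b"BEGIN txn 1\n")
        append_record_osync(log, b"COMMIT txn 1\n")
        with open(log, "rb") as fh:
            print(fh.read().decode())
```

Because the application already waits for its own writes in this way, the extra ordering that the filesystem would add in ordered or journal mode buys little, which is the point made in the answer above.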

ReiserFS

  1. I've heard a lot of talk from all sorts of people about ReiserFS not being stable enough to use in a production environment where high uptime is essential.

    Can someone tell me if this is true?

    That all depends on what you mean by "stable". Reiser is certainly capable of high uptimes, but Reiser doesn't have a good history of working well with older UNIX tools/systems like NFS, due to Reiser's newer methods for handling inodes and such. If this isn't a problem for you, Reiser should work very well for you; it works great on my /var partition, which handles my Squid proxy. I don't use Reiser on my /home partition though; that FS has the user directories exported through NFS, as well as Samba. In fact, I use SGI's XFS on my /home partition, and that works well too. The main advantage to using XFS is that it handles NFS really well, and it has certain features Reiser doesn't, like extended attributes, and access control lists. YMMV, but Reiser seems stable for just that one specific duty . . . I'd recommend trying Reiser, JFS, XFS, and maybe even Ext3 to get a feel for how stable each is for your particular needs. HTH.

    Sean Elble (s_elble at yahoo.com)

Links