“Ext4, being the successor to ext3, may well be the filesystem many of us are using a few years from now. Things have been relatively quiet on that front – at least, outside of the relevant mailing lists – but the ext4 developers have not been idle. Some of their work has now come to the surface with Ted Ts’o’s posting of the ext4 merge plans for 2.6.25.”
Just took a look at http://en.wikipedia.org/wiki/Ext4
Not too excited about it.
Give me pooled storage, end to end data integrity, self healing, and free (as in time not storage) snapshots.
If only a FS could exist on Linux with those features
uh… we’ve had those things in Linux for awhile. What are you talking about? Weird…
> uh… we’ve had those things in Linux for awhile. What are you talking about? Weird…
No, we don’t. Name one single filesystem usable under Linux that does online self-healing, or pooled storage? The LVM2 approach is not the same, and LVM2 is actually a pretty cumbersome approach. For example, what do you do when you have three hard drives in a single LVM2 volume and one of the drives is failing and you don’t have a replacement available? With ZFS, for example, you could just issue one command which removes the drive from the pool, automatically moving the files to other spots in the pool.

And the free snapshots? Nope, LVM2 doesn’t provide those either. If you have all space allocated to your LVM2 volumes then you can’t take snapshots; you need to specifically leave space for them. Under a pooled storage model you don’t need to reserve space. If there is space in the pool you can take as many snapshots as needed without worrying about anything else, and the space they need is simply taken from the overall pool.
Actually, ZFS advertises that feature (removing a device from the pool), but they haven’t gotten around to implementing it yet.
Then what’s the point of having different filesystems? If space is one big pool shared by every filesystem, we just put everything under one big “/” and the effect is the same.
> Then what’s the point of having different filesystems? If space is one big pool shared by every filesystem, we just put everything under one big “/” and the effect is the same.
I’m not sure what you mean here, but well… A pool lets you combine all your physical storage devices into a single blob of space, which you can then use as a single mount point (for example “/”) or carve into several mount points. The more drives your machine has, the more useful the pooled model becomes.

LVM2 is a pooled model, ZFS is, and there are several others; they just differ in the details. LVM2, for example, lets you combine all physical devices into one pool (or several pools if you so desire) and then partition that as you would one really big physical device. ZFS acts mostly the same, except you don’t partition it: you _can_ reserve space for your mount points if you wish, but you don’t need to, and in the latter case all the available space in the pool is shared dynamically. These features are not that important for home use, but on file servers and the like they are very important.
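For illustration, here is a rough sketch of the LVM2 flavour of that (device, volume group and volume names are all made up):

# Turn two disks into physical volumes and pool them into one volume group
pvcreate /dev/sda1 /dev/sdb1
vgcreate pool /dev/sda1 /dev/sdb1
# Carve a logical volume out of the pool and put a filesystem on it
lvcreate -L 100G -n data pool
mkfs.ext3 /dev/pool/data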
From a security point of view, there are many reasons to use multiple file systems. Basically, it boils down to the fact that settings are done per FS. Each one can have different mount flags (like nodev/noexec/nosuid) and different quotas.
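For example, a hypothetical /etc/fstab fragment along those lines (device names and mount points invented):

# Different mount flags per filesystem
/dev/pool/tmp    /tmp    ext3    nodev,nosuid,noexec    0 2
/dev/pool/home   /home   ext3    nodev,nosuid           0 2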
I won’t say that LVM2 is better than ZFS, but you CAN issue a single command (pvmove) to migrate all extents from one physical volume to another. If you wanted to be cautious it would take two commands (one to freeze allocations on the failing physical volume, then another to move). I would like to point out, given your example, that it is NOT possible to remove a top-level vdev such as an unmirrored disk in ZFS, while it is a simple procedure to migrate the extents in LVM2. ZFS has many features (RAID-Z) that make this point sort of… pointless, but it was your example, not mine.
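To make that concrete, roughly what the cautious LVM2 migration looks like (device and volume group names are made up; the last command just drops the empty drive from the group afterwards):

# Stop LVM2 from allocating new extents on the failing drive
pvchange -x n /dev/sdb1
# Move all allocated extents off it onto the remaining physical volumes
pvmove /dev/sdb1
# Remove the now-empty drive from the volume group
vgreduce vg0 /dev/sdb1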
That is not really what self-healing is about in ZFS, though. The ability to self-heal has more to do with having a fault-tolerant setup (usually a mirror) and restoring that fault tolerance by using the internal checksums to determine which copy was corrupted.
Personally I am comfortable with what LVM2 plus my filesystem choices give me, even though they lack some very useful features that ZFS has. If ZFS ever does make it into Linux (as a non-FUSE filesystem) I am likely to migrate to it.
Btrfs will provide these features in a few years.
> If only a FS could exist on Linux with those features
Linux has device-mapper, LVM2, etc.
These work with ext3 and many other filesystems.
> Linux has device-mapper, LVM2, etc.
As I already explained in my first post on this article, LVM2 just ain’t even nearly the same as ZFS. LVM2 makes it possible to use several physical devices as one (or more) virtual one(s), but it works at the block level and thus it will never be able to do the same things. I’m not saying LVM2 is a bad design, but it just can’t beat the ZFS approach.
> Give me pooled storage, end to end data integrity, self healing, and free (as in time not storage) snapshots.
Those are nice features indeed. For admins, that is. Guess what? Pretty much all desktop users don’t care (well, except snapshots, which may be used to build a nice backup program).
Ext4 will be really fast. It doesn’t have all those admin features, but it has things like delayed allocation, just like ZFS. It’ll be a great ext3 replacement and it’ll improve Linux acceptance thanks to the improved performance.
> Those are nice features indeed. For admins, that is. Guess what? Pretty much all desktop users don’t care (well, except snapshots, which may be used to build a nice backup program).
They may not understand those features, but do you honestly believe a self-healing, self-repairing and completely foolproof filesystem is not beneficial for the average Joe? Cos it sure is! It makes sure their files will not get corrupted or lost and that their filesystem is in a healthy state at all times. They may not care what filesystem they use, but they will surely care if they lose any of their files due to corruption. Those are important features for any user, no matter if it is at work or at home.
“They may not understand those features, but do you honestly believe a self-healing, self-repairing and completely foolproof filesystem is not beneficial for the average Joe?”
And do you honestly believe an average Joe would even know what you’re blathering on about? All the bells and whistles you described would be nice indeed, but most people won’t take advantage of them, even if they’re present. How many people (especially so-called average Joes) do you know who even back up their data on a regular basis?
> And do you honestly believe an average Joe would even know what you’re blathering on about? All the bells and whistles you described would be nice indeed, but most people won’t take advantage of them, even if they’re present.
I did say that they wouldn’t even understand those features, didn’t I? And that’s the beauty of it: if the filesystem their OS uses supports self-healing and self-repair, they don’t have to understand it, they don’t need to know about it, and they don’t have to do anything at all to use it! It’s there, and it’s working for them anyway.
I’ve told average Joes that I’m planning on building a new file server. They ask what is wrong with my current one. I say that I want to run ZFS. They say, what is that? Then I ask them if they have ever had an mp3 that sounded good, then all of a sudden got a skip in it, or some oscillating clicking sound. Some will say “yes, I have”. Then I say that ZFS guards against that… actually, it guarantees you’ll never get that. Stupid bit rot.
> It makes sure their files will not get corrupted or lost and that their filesystem is in a healthy state at all times.
We’re in 2008, aren’t we? As far as I know, it has been a while since journalling was invented. Sure, the method ZFS uses to avoid those problems is better than journalling, but the result is the same: integrity (and unlike XFS, ext3 also ensures data-level integrity in its default journalling mode, not just metadata integrity). Only a software bug can corrupt your filesystem, just like in ZFS. Sometimes it’s amazing how well the Sun marketing works.
What ext3 misses here is block-level checksumming, which is mainly useful when your hard drive is breaking. And you know what normal people, i.e. non-admins or geeks, do when their hard drives break? Right, they send their computers to the repair shop to get them fixed. Sure, ZFS block checksumming will handle those errors better than ext3, but it doesn’t fix your hard disk, it doesn’t make repairing your computer unnecessary, and it won’t recover the data that has been lost in the broken sectors. In other words, while ZFS is undoubtedly cool, Joe User isn’t going to miss it with ext4. Really.
You forgot about hot spares and the self-healing capabilities in ZFS…
I didn’t forget it; again you’re assuming Joe User is a datacenter admin. How many hard disks does Joe User have? One, and if he has two he’s not going to do RAID on them.
Self-healing is not useful for desktop users, unless you want to convince them they should make their PCs more expensive by adding an extra hard disk for safety. Not that it wouldn’t be a good idea, but… good luck trying to convince millions of desktop users.
> Self-healing is not useful for desktop users, unless you want to convince them they should make their PCs more expensive by adding an extra hard disk for safety. Not that it wouldn’t be a good idea, but… good luck trying to convince millions of desktop users.
You’re confusing things here. Self-healing doesn’t need any extra hard disks. It works just fine on only one.
Just replying to myself, to clarify things for anyone interested (hopefully someone reads this and finds it informative): Redundancy is where you have a RAID array set up so that there’s one disk which simply holds all the same data as the main disk and can be used to restore data if that data becomes corrupted on the main disk. There are also parity disks, which are used to store parity information that can be used to restore data on the other disk(s). A parity disk doesn’t need to be the same size as the other disk(s), because parity information occupies less space and is a bit faster to write.
The ZFS approach is different: for every block written, a few duplicates are saved in unused space and used as a backup if the original gets corrupted. The space used by those duplicates, however, is not marked as used, so they can be overwritten by actual data if needed. Though, if there is still space in the pool, the data is saved in some other spot rather than overwriting those backup blocks. This approach is basically the same as RAID redundancy, yet it works even with one disk. Of course, if the disk suddenly just stops functioning, there’s nothing even ZFS can do about it. But it still protects you against single blocks failing, or against bit rot. And as always, if there are malfunctioning blocks on the disk and their number is increasing, one should replace the disk.
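For reference, the conventional redundancy described above is what Linux software RAID gives you; a minimal sketch with made-up device names:

# A mirror: the second disk holds an identical copy of everything on the first
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
# A parity setup: the array survives the loss of any single member disk
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sde1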
> You’re confusing things here. Self-healing doesn’t need any extra hard disks. It works just fine on only one.
Sure, by doing mirroring on the same disk. Another thing that Joe User isn’t going to do, unless you want to convince him that he can only use one half of his new disk. In order to do the “healing” part of “self-healing”, ZFS needs to have a second copy of the corrupted data somewhere, period, unless you guys claim that ZFS can use magic to know it.
Can we stop this, please?
> Sure, by doing mirroring on the same disk, simulating two disks with two partitions
Still, that is wrong. Read my other post for actual details.
> for every block written, a few duplicates are saved in unused space and used as a backup if the original gets corrupted.
Where did you get that idea? ZFS doesn’t do that… I guess I need to quote Sun’s FAQ to prove that ZFS does not do what you suggest?
Q: Can I use a single disk with ZFS?
A: Yes. With a single disk, you can do one of the following:
* Use your disk as a single device, in which case you cannot benefit from the recovery capabilities provided by a ZFS mirrored or RAID-Z configuration, but will get the greatest capacity out of your device.
* Split your disk into multiple partitions and use them to build a ZFS mirrored or RAID-Z based pool. This option allows you to benefit from all of the ZFS recovery capabilities (unless your disk suffers a total failure), but you will have a smaller capacity in your storage pool.
Check any ZFS documentation: data recovery only works in RAID-Z/mirrored configurations.
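For completeness, a rough sketch of that second option from the FAQ (Solaris-style slice names are made up):

# Build a mirrored pool out of two slices on the same physical disk
zpool create tank mirror c0t0d0s4 c0t0d0s5
# Scrubbing verifies every block against its checksum and repairs from the good copy
zpool scrub tank
zpool status tank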
For the umpteenth time… With LVM you also have pooled storage, and free snapshots (r/o and r/w).
You don’t have data checksumming, no, but if your disks silently corrupt data, chances are it will only be detected when it’s too late to do anything about it (other than replace the faulty disk, which is exactly what one does with standard RAID).
ZFS fanboys should just stop polluting every topic with this crap. In the real world, nobody cares about ZFS. Any half-decent OS supports RAID and has a volume manager that does everything needed by 99.99% of users, and then some.
Snapshots in LVM are not free. You have to leave free (unallocated) space in the volume group to later use for snapshot allocation. If you don’t plan ahead, you can’t use snapshots. Period!
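To spell that out (volume and snapshot names are made up):

# This only succeeds if the volume group still has 5 GB of unallocated extents
lvcreate -s -L 5G -n home_snap /dev/vg0/home
# If every extent was already handed out to logical volumes, the snapshot cannot be created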
Except that in ZFS, in a redundant setup (mirror/raidz), data is checked for errors on each read, and if the checksums don’t match on one disk, the data is loaded from another disk, and the correct data is then written out to the corrupt disk.
I have yet to see a software RAID1 (or even RAID5) setup that does anything similar.
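The closest thing on the Linux md side is a manual consistency check, which can count mismatches between the copies but has no checksum to tell which copy is the correct one (device and pool names invented):

# md RAID: compare the mirror halves and count mismatches
echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt
# ZFS: scrub verifies each block against its checksum and rewrites the bad copy
zpool scrub tank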
Would be nice if that got lifted. Totally artificial.
That limit is already lifted in ext4
Even though there may be nothing earth shattering in ext4 compared to ZFS or Btrfs I think it’s still more than welcome and when it’s integrated, a fairly safe improvement over ext3.
However much I like XFS as a file system, it seems as if, with the end-of-life of the IRIX/MIPS line, SGI has also abandoned all support for that processor architecture, whether by design or by neglect.
So what I’m saying is that even though XFS started out on big-endian MIPS running IRIX, it’s plain broken (corruption bugs, among other things) on my machine running little-endian Linux/MIPS. And it’s been that way from 2.6.18 up to the latest 2.6.24 kernel.
I’ve observed the same behaviour on ARM, so the Armedslack maintainer has finally, and rightly so, decided to pull XFS support from the kernels that he builds, so I won’t even try to install to an XFS partition.
I understand that SGI doesn’t have the manpower to support all architectures, but they could at least try to make XFS functional on architectures other than x86 and IA64 or clearly state that they only support it on the two aforementioned ones.
I haven’t been able to try it on SPARC and POWER yet, but if it isn’t stable there either, I’ll have to seriously consider not using it on x86 too. Maybe there’s an SGI developer reading this, but for the time being ext3 and ext4 look much safer for multi-architecture environments than XFS.
> I understand that SGI doesn’t have the manpower to support all architectures,
Well, since they have the manpower to support one, they have the manpower to support the rest of them. In Linux, filesystems are not developed for one architecture or another; they should work on all of them. If XFS doesn’t work on one architecture, it’s probably due to lack of maintenance of that architecture, not of XFS…
ext4 is just the successor of ext3, a tried and tested, stable file system. It doesn’t pretend to be anything more than that. There are dozens of file systems available on Linux, some with all the bells and whistles, some without. You just have to choose what’s best for you.
True; that was actually one of the questions that constantly came up as to why Fedora/Red Hat doesn’t include the ability to use ReiserFS/JFS/XFS without needing to pass boot-time parameters (not too sure of the situation now, though). The argument put forward by a Red Hat engineer was this (which I completely agree with, by the way): ext3/ext2 aren’t the sexiest, most feature-rich or trendiest filesystems, but they do the job they are intended to do. File systems aren’t one of those things where you can risk stability and reliability for the sake of features. They are a core foundation of a system; if that goes, everything else goes with it.
I like the approach which the ext4fs people have taken: evolve the file system gradually rather than taking radical, revolutionary steps that could unleash untold file system issues at a later date.
I like the approach the ZFS people have taken; Throw out all of those assumptions made in the earliest days of computing that are still at the heart of every other filesystem.
ext4 has both forward and backward compatibility (with ext3). This must make the code horribly complex and introduce compatibility problems.
I really don’t see the point in this, besides a possibly smoother upgrade path (which could also be solved differently).
ext3 is backwards compatible with ext2; I didn’t understand back then why they did that either. Could anyone shed some light on this?
> ext4 has both forward and backward compatibility (with ext3). This must make the code horribly complex and introduce compatibility problems.
Ext4 is not completely “backwards compatible” – ext3 will not be able to read the new structures used by ext4, in fact it won’t be able to mount ext4 filesystems.
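To illustrate the forward path as it stood around the 2.6.25 timeframe, here is a rough sketch only (the device name is made up, and the exact steps depend on your e2fsprogs and kernel versions):

# Mark the existing ext3 filesystem as mountable by the experimental ext4 code
tune2fs -E test_fs /dev/sdb1
# Mount it with the development ext4 driver; the existing ext3 structures remain readable
mount -t ext4dev /dev/sdb1 /mnt
# Once ext4-only features such as extents are used for new files,
# the ext3 driver can no longer mount the filesystem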
Yes, ext4 can read ext3 structures, and that makes it more complex. But if you look at the size:
Ext4: 1016 KB of C code (as of 2.6.25-pre)
Ext3: 732 KB
XFS: 3.4 MB
Reiser4: 2.3 MB
Reiser3: 888 KB
JFS: 952 KB
Linux in-kernel NTFS: 1.2 MB
OCFS2: 1.9 MB
ext4 is in fact quite simple and small. That is one of the reasons why people went with ext4 instead of XFS and friends.
I’d like to see how BSD’s file system, the FFS or UFS/UFS2, fits into this comparison. It has been a stable and powerful file system for UNIX. I still hope the new EXT4 will be supported in BSD, although the support for the EXT[23] file systems was not that good…
I was satisfied with ext3 and I suppose ext4 will only improve it.
Off topic, but we sorely need a good, reliable, high (particularly read) performance cross-platform filesystem with large file support and journalling. FAT32 isn’t cutting it. No fancy features or anything, just the bare minimum to have good support on the main operating systems. That’d be sweet…
Why there is still no standard built-in data compression in Ext4?
I know there are ways to do it in Ext2/3, but those are hacks and workarounds based on FUSE.
NTFS does on-the-fly data compression as easy as possible, but tends to fragment.
An Ext* filesystem, and especially Ext4 with its smart extent allocation, should never suffer from that.
Well, one can say that HDDs nowadays are very large and cheap, but lzma-like compression would effectively double their size. Power consumption and thermal conditions are also important things to consider.