While Ext4 was originally merged in 2.6.19, it was marked as a development filesystem. It has been a long time coming, but as planned, Ext4dev has been renamed to Ext4 in 2.6.28, indicating its level of maturity and paving the way for production-level deployments. Ext4 filesystem developer Ted Ts'o also endorsed Btrfs as a multi-vendor, next-generation filesystem, and with interest from Andrew Morton as well, Btrfs is planned to be merged before 2.6.29 is released. It will follow a similar development process to Ext4 and be initially marked as development only.
Is ext4 less prone to performance degradation on filesystems with little free space available for defragmentation?
I’ve read that it improves this situation somewhat but no specifics thus far.
Ext4 has some features to reduce fragmentation, delayed allocation for example. There will also eventually be an online defragmenter.
Doesn’t ext3 already have an online defragmenter? I recall that the major drawback of it is that it needs a significant amount of free space to be effective.
From wikipedia:
http://en.wikipedia.org/wiki/Ext4
“Ext4 will eventually also have an online defragmenter. Even with the various techniques used to avoid it, a long lived file system does tend to become fragmented over time. Ext4 will have a tool which can defragment individual files or entire file systems.”
No.
All defragmenters need free space, and how much depends a lot on the size of your files.
Not too sure what you mean, as ext3 had no official defragmentation tool (I’m assuming you’re comparing ext4 to something). It kept fragmentation to a minimum by allocating blocks that were closest to a file, but inevitably, this compromised other things and you still got fragmentation over time. This was alright in fairly static partitions such as /, /usr etc., but killed partitions with lots of file activity over time. ext4 has online defragmentation and also support for extents, which filesystems like XFS have provided for some time.
You can’t break the laws of physics though. If you are going to defragment then you have to have enough free space to move all your files around, and if you don’t then parts of your filesystem will simply stay fragmented with all the performance issues that entails.
However, defragmentation is merely a necessary evil with today’s storage technology, and you can only do so much. If SSDs take off over the next few years and we get the same random read and write access no matter where in a storage device a file block is allocated, then fragmentation issues will start to disappear. There are actually many ways you can defragment and order data on today’s mechanical disks with respect to performance. It really depends on the type of data and usage, and that’s what is so difficult. Any person who tells you that a modern filesystem can solve all that is well wide of the mark.
Yeah, but the question is how much space? Ext3 currently needs around 30% free space to keep the effects of fragmentation down.
It depends on how big your files are. If you have 10GB free space and a 15GB file somewhere then you’re going to struggle.
In that case what benefit does an online defragmenter provide over the current situation?
The online defragmenter gives you the ability to defrag the filesystem without taking it offline, that’s all, just a nice convenience not having to unmount or reboot to a “rescue” environment to do a defrag. Frankly I’m amazed it’s taken ’til Ext4 for this to become a possibility, though there have been projects such as Shake that try to get around the lack of an official online defrag util for many filesystems. http://vleu.net/shake/
I don’t think you get this. An online defragmenter simply allows you to defragment the filesystem while it is mounted, online and being used. That’s all.
Anyone remember the DOS defrag util that moved around individual clusters? And it displayed a nice big map of the disk showing which sectors it was currently moving. Free space requirements: one free cluster.
Yes, I remember it well, and this thread has got me stumped because of that. While it is a *VERY SLOW* way to defragment a drive, it will do it as long as there are more than two (2) sectors free.
While newer defragmentation software is faster because it moves entire files around, a follow-up pass using a sector-by-sector defrag would ensure that the filesystem can reach 100%.
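For anyone who never saw those tools, here is a toy sketch in C (not any real defragger, just an illustration of the cluster-at-a-time idea): pack occupied clusters into the next free slot, one move at a time, which is why so little free space was needed and also why it was so slow. The disk layout here is made up.

/* Toy model of cluster-at-a-time compaction: a "disk" is an array of
 * cluster owners (0 = free). Each step moves a single cluster into the
 * next hole, so only one free cluster is ever needed at a time. A real
 * defragger would also keep each file's clusters in logical order and
 * update the FAT chain after every move. */
#include <stdio.h>

int main(void)
{
    char disk[] = { 'A', 0, 'B', 'A', 0, 'C', 'B', 0, 'C', 'A' };
    int n = sizeof disk;

    for (int dst = 0; dst < n; dst++) {
        if (disk[dst] != 0)
            continue;                      /* slot already occupied */
        for (int src = dst + 1; src < n; src++) {
            if (disk[src] != 0) {          /* move one cluster into the hole */
                disk[dst] = disk[src];
                disk[src] = 0;
                break;
            }
        }
    }

    for (int i = 0; i < n; i++)
        putchar(disk[i] ? disk[i] : '.');  /* prints "ABACBCA..." */
    putchar('\n');
    return 0;
}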
That kind of defragmentation doesn’t fly today because the disks are just so large. It would take forever to go sector by sector with gigabytes worth of data, even after you’ve gone through a first-pass defragment.
I’ve been using ext4 for the last 3 months (Linux 2.6.26 + latest ext4 patch). So far, no problems that I or fsck could detect, so it looks rather stable for a file system in development.
Generally speaking, directory operations (with lots of files) are a lot faster, and so is deleting large files. However, as the disk fills up, performance drops a lot.
Anyway, it is an improvement over ext3 and once it’s done I’ll probably switch completely to it.
Seems like BTRFS is much more along the lines of a modern filesystem like ZFS or Reiser4.
If BTRFS is to be merged in 2.6.29, would it be stable enough for desktop use then? Though not suitable for critical production use.
It won’t be ready for the desktop. It is still in development.
Oracle, Red Hat, IBM and Novell want something that has even more features than ZFS (when combined with a logical volume manager).
But it will take until 2010 until you will see it in production. Maybe the next RHEL will ship it. That is just a guess though.
Hmm… I don’t think you will see btrfs in real production in 2010. The development of a totally new filesystem is a longer process. Just think about the time ext needed to get production ready. Or the fact that reiserfs is still suspected to lose files when loaded with millions of files. btrfs is not like ext[1-4], which is an evolutionary development; it’s something completely new.
BTW: The whole btrfs development looks like an “Oh, heck, we need something similar to ZFS …”. I hope they do a better job with btrfs than with systemtap. And I do not believe that ZFS development will stop … in 2010 ZFS will have more features, too.
Well, according to the topic above, it’ll be merged into 2.6.29, so it must be ready in some shape or form to be merged into the mainstream kernel. The question is whether it’ll be EXPERIMENTAL or an initial release.
I think 2.6.29 will be ready sooner than 2010.
Surely it will be experimental. There were several comments from kernel developers who thought about copying the ext4 process, with its long existence as experimental code.
Filesystems are a relatively closed realm in themselves (this is the reason for virtual filesystem layers), so you can even integrate bleeding-edge code without harming the rest of the system.
It’ll be experimental. They want to merge it to increase its visibility and attract more developers.
In fact, the disk format is not predicted to be finalized until the end of the year, so if you use it before then, you’ll end up with partitions that the next version of btrfs won’t be able to mount.
After that it’ll still not quite yet be considered stable and safe for anything more than data that you don’t mind losing.
Doubt it.
I’d venture a guess that RHEL will continue using ext3 by default, but will offer ext4 if a certain command line option is used during installation/boot.
If btrfs is indeed released within 2010, it -may- end up in RHEL 7.
– Gilboa
Is defragmentation an issue for non-Windows PCs?
Of course it is. Believe it or not, such technical challenges are OS-neutral.
Incorrect. Fragmentation and its consequences are filesystem (and workload) specific. Unix/Linux filesystems historically, with the notable exception of Reiser4, have been quite resistant to fragmentation.
For example, I just spot checked my busiest server, formatted ext3, which has a workload that consists of:
– 60 concurrent Gnome desktop sessions via XDMCP and NX (Web/Mail/Wordprocessing/Spreadsheet, etc.)
– 100 concurrent sessions of a point of sale package
– Intranet web/application server
– Database server
– Internal dhcp server
– Internal name server
– Samba file and print server for legacy Windows stations
– Other stuff
It has been operating for a little under a year and currently exhibits only 6.6% fragmentation.
That said, there may be workloads that result in more fragmentation. But low to mid single-digit percentages are what I typically see. In fact, in my 20+ years of administering various Unix/Linux systems, I have never at any time been in a situation in which I felt any need for a defragmenter. But as a friend of mine was fond of saying, “it’s better to have it and not need it than need it and not have it”.
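For the curious, here is roughly how that kind of number gets measured. This is only a sketch, assuming Linux’s FIBMAP/FIGETBSZ ioctls and root privileges, that counts how many discontiguous runs a single file occupies; the percentage fsck reports is essentially the share of files that turn out non-contiguous under this kind of check.

/* Sketch: count a file's extents by walking its logical blocks with
 * FIBMAP and noting where the physical block numbers stop being
 * consecutive. Needs root (FIBMAP requires CAP_SYS_RAWIO); sparse
 * holes are counted naively. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>          /* FIBMAP, FIGETBSZ */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    int bsz;
    struct stat st;
    if (ioctl(fd, FIGETBSZ, &bsz) < 0 || fstat(fd, &st) < 0) {
        perror("FIGETBSZ/fstat");
        return 1;
    }

    long nblocks = (st.st_size + bsz - 1) / bsz;
    long extents = 0, prev = -2;

    for (long i = 0; i < nblocks; i++) {
        int blk = (int)i;               /* logical block in, physical block out */
        if (ioctl(fd, FIBMAP, &blk) < 0) { perror("FIBMAP"); return 1; }
        if (blk != prev + 1)            /* physical run broken: new extent */
            extents++;
        prev = blk;
    }

    printf("%s: %ld blocks, %ld extent(s)\n", argv[1], nblocks, extents);
    close(fd);
    return 0;
}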
Unfortunately, considering the number of new Linux users coming from a Windows background, I expect to see lots of senseless recommendations to “defrag the hard drive” in the not-too-distant future. For “performance” reasons… and even as an attempt to fix problems. Remember that Linspire was forced, by popular user request, to add a virus checker to their distro. Because “everyone knows” that it’s dangerous to run a computer without one, since it might get a “computer virus”.
Why is that? Someone more knowledgeable than I could probably point to some specific aspects of unix filesystem design that reduce fragmentation. But it was an issue for the designers to consider when the fs was designed, and is still an issue for people working on new filesystems today. How well that issue is dealt with by particular operating systems or particular filesystems is a separate question. (FAT certainly was notoriously bad.)
I think it likely has to do with respective history. Unix started out on the server and evolved onto the desktop. DOS/Windows started out on the desktop and evolved to the server. Unix filesystems were designed in an environment where the machine was expected to run, run, run. Downtime was expensive and to a great extent unacceptable. Defragmenting the filesystem would have been downtime, and thus unacceptable. Current community culture reflects that tradition.
Windows culture tends to look more to resigning oneself to fragmentation (and viruses, for that matter) and then running a tool (defragger, antivirus) to “fix” the problem. When NTFS was designed, Windows users were already used to the routine of regular defrags, and would likely do it whether the filesystem required it or not. So why make fragmentation avoidance a high priority?
True… it’s hard to break a habit of defragging all the time; it feels like something’s “wrong” or you’re missing something after coming from DOS-based Windows to XP in my experience. Still though, I found that I had to defragment every week, *still*, to keep the performance up. It doesn’t slow down near as bad as Win9x, but it does get noticeable.
On Linux, I use XFS primarily due to its efficiency at dealing with large files and the fact that it includes an online defragmenter. Fragmentation does still happen, and I run xfs_fsr occasionally, but it only really affects performance when I’m doing something extreme like using BitTorrent to download large, several-hundred-megabyte (or larger) files.
It has to do with the fact that unix filesystems tend to allocate files on either side of the middle of the volume rather than immediately one after the other.
This means that there is space after each file for edits, so appended data gets allocated alongside the file rather than as a fragment in the next empty space.
Either way, this article explains it far better than I can:
http://geekblog.oneandoneis2.org/index.php/2006/08/17/why_doesn_t_l…
Nice. Very nice.
What distro do you use, and how much RAM does all this take?
Have you thought of splitting the load between a small number of lesser servers?
I’m kinda interested in that too. I consider my system kinda beastly and I’m positive it could not even come close to handling all of that.
Currently F8 x86_64, though if I had it all to do over I would have stuck with CentOS. Fedora was pretty rough for the first couple of months we ran it but things have stabilized nicely.
8 GB of memory. I target about 128M per desktop user. 64 bit costs some memory up front, but has more sane memory management. I was running something like 50 desktops on 4GB on x86_32 CentOS 5, but sometimes zone_normal was touch and go. I had to reserve a lot of memory for it which cut into the buffer and page caches a bit. (Linux does a wonderful job with shared memory. Single user desktop admins don’t get to see all the wonders it can perform.)
BTW, it’s a dual Xeon 3.2 GHz box. And processor usage is only moderate. (That’s why I chuckle a bit when I hear people talk as if they think multicore is likely to benefit the average user. My 60 desktop users don’t even keep 2 cores overly busy!)
With x86_64, no, I don’t feel any great need for more servers. I don’t have the luxury, for one thing. And more servers means more administrative overhead. That’s one reason that virtualization is such a buzzword today.
Talk about putting all your eggs in one basket.
Btrfs is awesome. long live competition.
Let’s wait and see
I start to trust a filesystem when someone dares to deploy the mail servers of a large organisation on it. Many files, many accesses, and a fierce lynch mob in front of the admin’s office in case of data loss.
And when the same mailserver runs on the same filesystem half a year later, you can call it “acceptable” … after a year or so you can start to call it “awesome”
That was praise for the technology, sorry to confuse.
That’s why all my contracts include a provision for a trap door and secret passage from the server room. Keeping one’s passport current is also prudent.
Have a look at this benchmark:
http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.h…
With -O uninit_groups, fsck on ext4 runs 22 times faster than ext3 on that 5TB volume. Four minutes vs an hour and a half.
Presumably that will increase as volume usage increases, but still…
I wonder if there will be GUI tools for the new pre-allocation features of Ext4.
It would be cool to demand a file such as a database be contiguous.
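Even without a GUI, the plumbing for this is already exposed to applications. A minimal sketch, with a made-up filename and size, of reserving space up front via posix_fallocate() so the filesystem can try to hand the file one contiguous run of blocks:

/* Preallocate 1 GiB for a database-style file before writing to it.
 * Where the kernel and C library support it, this maps to the fallocate
 * syscall and marks the blocks unwritten, so it is fast; otherwise
 * glibc falls back to writing zeros. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("db.dat", O_CREAT | O_RDWR, 0644);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    int err = posix_fallocate(fd, 0, 1024L * 1024 * 1024);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}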
I am not sold on the idea that a server will always fragment files faster than a desktop.
With near-instant virtual seeks, solid state drives reduce one of the effects of fragmentation, but not the other. People often forget that fragmentation also reduces reliability, which is an issue with solid state drives, and which raises a question: is it safer to move the data or to leave it in multiple fragments?
But I wonder, does ext4 protect against silent corruption and flipped bits (due to current spikes, cosmic radiation, etc.)? That is the most important issue for me. A normal hard drive always has a small percentage of randomly flipped bits. The bigger the hard drive, the more flipped bits.
If ext4 protects against random bit flips, it becomes a viable alternative to ZFS indeed. All the ZFS snapshots, etc., are just icing on the cake. It is the silent corruption I want to avoid on a file system.
The sad thing is, traditionally, all silent corruption is detected by the hardware, not the filesystem. Design principle: “who has the relevant information? The filesystem has. Then the filesystem should detect and correct”
But what kind of information does the file system have that helps to fix flipped bits? To me this rather looks like an ideal candidate for an error-correcting code at the lowest (sector) level.
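To make the detection side concrete, here is a toy sketch of per-block checksumming as a filesystem (or the block layer, or the drive firmware) might do it. Note that a plain CRC only detects a flipped bit; actually correcting it needs redundancy, a second copy or parity somewhere, which is what ZFS leans on:

/* Compute a CRC-32 over a 4 KiB block at write time, recompute and
 * compare at read time. The bitwise, table-less CRC here is slow and
 * only for illustration. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t crc32(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0u);
    }
    return ~crc;
}

int main(void)
{
    uint8_t block[4096];
    memset(block, 0xAB, sizeof block);

    uint32_t stored = crc32(block, sizeof block);   /* saved alongside the block at write time */

    block[100] ^= 0x04;                             /* simulate one silently flipped bit */

    uint32_t now = crc32(block, sizeof block);
    puts(now == stored ? "block OK" : "silent corruption detected");
    return 0;
}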
Yes, that is also a good alternative. The point is, the IO card should not manage the error correction. It should be done somewhere in the filesystem layer or thereabouts.
Again: why? Error correction is a simple but time-consuming task that can easily be done by the disk hardware (i.e. a co-processor). There’s no point in moving it to the filesystem layer; that would hurt performance because then the CPU would have to do it, and it would not bring any advantage (at least I cannot see any).
Indeed. The hardware should guarantee correctness and throw an error. In fact, I believe that it already does sector CRCs. (I distinctly remember that my Seagate ST-4096 80MB drive did this back in the late 80s. A CRC-11 IIRC.) It may well be that the CRC is too short to be effective on today’s hardware, but that can be remedied. Why are we going out of our way to let the hardware manufacturers off the hook for selling what can only reasonably be described as defective hardware? (Hi, Western Digital! You lead the pack on this.) While I respect the ZFS feature set, I can’t help but feel that some people are so caught up in the “ZFS is cool!” mindset that they fail to recognize where it is actually taking us a step backwards.
“””
The sad thing is, traditionally, all silent corruption is detected by the hardware, not the filesystem. Design principle: “who has the relevant information? The filesystem has.
“””
In what way does the hardware not have the relevant information? A CRC, of adequate length (CRC-32?), upon each sector is well within the capability of the hardware. The hardware already does one, possibly of inadequate length. What is sad is that we are seriously considering moving this thing which should be incumbent upon the hardware, back to the software, at significant processing cost.
I am not convinced that silent corruption is a real problem. Maybe it is, and maybe it isn’t. But if it is, it should be fixed at the proper layer. And that layer is the hardware, and not the filesystem. And if you *still* insist that the proper layer is somewhere in the OS, why not the block layer?
When I install a dual-boot system I like to have a partition for storing data, music, pictures, movies etc. that can be accessed by both OSes. With the driver from fs-driver.org I can use an ext2 or ext3 partition from within Windows. Will this work with ext4?
Probably not. That driver is actually for ext2 only.
The on-disk format for ext3 is basically compatible with ext2. All it really does is add a journal and other features like extended attributes. An ext3 driver can mount an ext2 volume, and an ext2 driver can mount an ext3 volume.
I believe there are filesystem options that would render an ext3 volume unreadable by an ext2 driver, but I’m not 100% sure. If there are, they’re obviously turned off by default.
The same is basically true of ext4, with one exception. Extents, which are enabled by default, change the on-disk format and render it incompatible with ext2 and ext3 drivers. If you turned those off, you should be able to read an ext4 volume with an ext2 driver. I’m not sure I’d want to risk writing to an ext4 filesystem with an ext2 driver though.