It seems like we’re really on the subject of filesystems and related technology these past few days. We had an interview with the man behind BeServed, an item on WinFS’ current status, and now we – possibly – have news on ZFS coming to Linux. Possibly, because it’s all speculation from here on out. Practical Technology brings something really interesting to our attention. Jeff Bonwick, creator of Sun’s Zettabyte File System (ZFS), posted three photos of himself chatting away with Linus Torvalds in a few comfy armchairs – while enjoying a few beers. The blog post is called “Casablanca”, and ends with the tentative “All I can say for the moment is… Stay tuned.”
One of the commenters on the entry noticed that Jim Grisanzio listed a link to it under “ZFS pics“, but Grisanzio himself immediately offered a less tinfoil-hat-like explanation: “Well, Jeff had a hand in writing ZFS, so that’s why I called it ZFS pics.”
The other comments on the entry debate the merits of possibly porting ZFS to Linux, which so far has been impossible due to the incompatibility between the GPL and CDDL licenses. One comment reads “Yeah, let’s decrease the reasons why people would want to pick up Solaris. Wonderful business strategy.” Others seem to agree: “If you let Linux have ZFS, it’s like porting Solaris to x86, the stupidest move you ever made. Now no one is forced to buy your computers, and your stock sucks.”
Others explain that allowing Linux to ‘have’ ZFS would mean a much broader user base for the advanced filesystem, which would mean more testing, more bug reports, and in the end, a better filesystem. It is also being pointed out that FreeBSD and Mac OS X already have ZFS, and that “the world hasn’t fallen apart just yet.”
Some people even think all this might indicate Linus Torvalds taking up a job working for Sun, but I personally think that’s rather unlikely. Then again, what’s unlikely in a world where OpenSolaris actually runs on x86 as a first-class citizen, and Apple uses Intel?
And there is good reason for that! I don’t know about the status of ZFS in Mac OS X, but in FreeBSD it is still an experimental feature. It has been suggested that it works best on 64-bit systems with 2 GB or more of RAM. At this point I believe few people know how to deploy it, and even fewer will see any actual benefit from it. The rest of us are simply waiting for it to become mainstream. If it gets adopted by Linux, it will not take long before everyone gets it.
Then you might want to read this comment about ZFS on BSD:
http://kerneltrap.org/FreeBSD/ZFS_Stability
>but in FreeBSD it is yet an experimental feature.
So what? Linux doesn’t care anything about quality, they would mark it stable at once.
>that it works best on 64bit systems with 2Gb or more RAM
It’s for server, not for the desktop.
>>but in FreeBSD it is yet an experimental feature.
>So what? Linux doesn’t care anything about quality, they would mark it stable at once.
Linux is a kernel, can it care?
>>that it works best on 64bit systems with 2Gb or more RAM
>It’s for server, not for the desktop.
Oh, that’s why Sun made it the default for OpenSolaris.
ZFS is the default on OpenSolaris because they’re trying to show off everything Solaris.
The only people looking at OpenSolaris right now are people like us, not the average desktop user.
A cloud storage system is useless to most people? that is so funny.
It is, if I need a 64-bit OS, 4 GB of RAM and the “experimental” tag above my head. I will stick to what works until this is ready, thanks.
I believe you mean “pooled” storage system.
You don’t *need* a 64-bit OS, nor do you *need* 4 GB of RAM. A 32-bit OS with 1 GB of RAM is plenty to run ZFS with FreeBSD 7.x.
Obviously, the higher spec’d your box, the better it will run. Putting it on an SMP server with gobs of RAM and oodles of fast SATA/SCSI disks will obviously run better than a single-Celeron system with a pair of ATA66 disks.
It is, the ZFS features are great for admins, datacenters and users who know what they are doing. For Joe user they are mostly useless…
I almost agree with you. Not that ext3 is good enough…
Do not compare ext3 with ZFS.
But ext3 + device-mapper + lvm with ZFS.
Btw, ext4 is coming.
A closer comparison would be MD + DM + LVM + XFS, but that still doesn’t really compare equally. And you don’t have nearly the flexibility as with zfs. Especially when it comes to snapshots (why do I have to leave unused space in my volume group for snapshots to work, and guess how much space the snapshot will need?).
The really nice thing about pooled storage, where you don’t have to “allocate X MB of space to filesystemX”, is that you can start to use separate filesystems for each user, or for each project, or for each special purpose. Then set the properties for each filesystem. And let the storage system manage the space. If you need to limit things, then you put a quota on the filesystem (or fs root, as sub-filesystems will inherit the properties of the root).
It’s a very liberating experience compared to LVM.
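What the comment above describes looks roughly like this on a live system. This is a hedged sketch: the pool name “tank”, the user names, and the quota value are all invented for illustration, but the subcommands are standard `zfs` ones.

```shell
# Create a filesystem per user inside an existing pool named "tank".
zfs create tank/home
zfs set compression=on tank/home      # children inherit this property
zfs create tank/home/alice
zfs create tank/home/bob

# Nothing is pre-allocated; both filesystems draw from the shared pool.
# Cap a single user only when you actually need to:
zfs set quota=20G tank/home/alice
```

Each `zfs create` is about as cheap as making a directory, which is what makes per-user or per-project filesystems practical in the first place.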
“It is, the ZFS features are great for admins, datacenters and users who know what they are doing. For Joe user they are mostly useless…”
I don’t agree. Setting up a ZFS RAID takes something like 2 simple commands. Setting up a RAID with Linux + LVM takes something like 25. In my book, Linux + LVM requires expertise. What happens if a disk fails? Then you need a certificate to fix that.
In ZFS it is 1-2 simple commands, and it is done.
Does the average Joe need a simple tool, like ZFS? Or does Joe need a complicated tool like LVM? I claim that Joe would benefit greatly from a simple tool like ZFS. Try it yourself and you will see how amazingly simple it is.
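For what it’s worth, the “2 simple commands” claim is not much of an exaggeration. A sketch with invented pool and Solaris-style disk names:

```shell
# One command builds a single-parity redundant pool from three disks:
zpool create tank raidz c0t1d0 c0t2d0 c0t3d0

# If a disk later fails, replacing it is a one-liner too:
zpool replace tank c0t2d0 c0t4d0
zpool status tank    # shows resilvering progress
```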
> An average joe needs a simple tool?
You have bash and Nautilus/Konqueror (simple tools).
For LVM, there is system-config-lvm (GUI).
The documentation for RHEL 5 :
http://www.redhat.com/docs/manuals/enterprise/RHEL-5-manual/Deploym…
Joe user still likes to have his pics intact when one of his cheap disks goes poof. Not to speak of backups and snapshots. Give him a good UI and he will find ZFS *very* useful.
You can use it just fine on 32-bit systems. You just have to tune it and the kernel correctly. Adding “vfs.zfs.arc_max” (set to between 1/3 and 1/2 of your RAM) and “vfs.zfs.prefetch_disable” (set to 1) will handle most of the known issues with running ZFS on FreeBSD 7.0 and 7-STABLE.
Other things that can be tuned are listed in the ZFS Tuning Guide on the FreeBSD wiki.
Having more than 1 GB of RAM available does make it run smoother, but several people are running it with only 512 MB of RAM.
Granted, it’s not perfect, and there are some issues. But it’s not the train wreck that people make it out to be.
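The advice above, as a /boot/loader.conf sketch for a 32-bit FreeBSD 7.x box with 1 GB of RAM. The exact values are examples, not recommendations:

```shell
# /boot/loader.conf
vfs.zfs.arc_max="384M"           # cap the ARC at roughly 1/3 of RAM
vfs.zfs.prefetch_disable="1"     # works around known prefetch issues
```

The ZFS Tuning Guide on the FreeBSD wiki lists further knobs (kernel memory sizing among them) for boxes that still see panics.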
If it was a train wreck, it wouldn’t be in FreeBSD AT ALL
I will wait until things clear up though. At this moment using ZFS is simply not justified on my systems. But it will be a nice “toy” for me on the next release.
>> works best on 64bit systems with 2Gb or more RAM.
Most 64-bit systems have more than 250MB of RAM.
I’m running FreeBSD 7 with ZFS (best… filesystem… EVER) on a dual P2 350MHz machine with 512MB of RAM.
I’ve seen 2-3 filesystem-related panics and sometimes the ZFS will deadlock, resulting in a hung filesystem until I reboot the system (a former BeOS development machine).
That said, I knew it was “experimental” still when I started using it. The benefits have outweighed the issues… it’s only a home file/print server, if it hangs it’s not a big deal.
Having 1GB or more of RAM on a server system isn’t a ridiculous requirement these days, although it has been hard for me to find some additional RAM for this box. 😛
ZFS on FUSE, using a port of the real Sun ZFS code, gives you a big helping of ZFS compatibility on Linux without the licensing issues, since it runs in userspace. In development. http://zfs-on-fuse.blogspot.com/
Other than ZFS on FUSE, the closest thing to “ZFS on Linux” would seem to be Btrfs, principally by Chris Mason at Oracle.
Homepage here:
http://btrfs.wiki.kernel.org/index.php/Main_Page
Btrfs allows lightweight writable snapshots through a copy-on-write implementation (similar to ZFS), checksums all data and metadata (similar to ZFS), and is just starting to introduce support for spreading a filesystem across multiple volumes (again inspired by ZFS). Obviously this is a development project, not a complete product, and it’s much more limited than ZFS in various ways at the moment. However, it’s looking like it could bring a lot of the benefits of ZFS.
I’d note that Btrfs isn’t a redundant effort, even if ZFS comes to Linux. The Linux developers seem to consider the ZFS code to be ill-suited to the structure of the Linux kernel; Btrfs may have an opportunity to “pave the way” in terms of determining what interfaces a filesystem of this type requires to integrate cleanly into Linux.
Finally, Matthew Dillon of DragonflyBSD is working on a new advanced filesystem for that operating system. http://kerneltrap.org/DragonFlyBSD/HAMMER_Filesystem_Design
(note, the design has changed in various ways since that post was made)
HAMMER is a very different filesystem from ZFS in lots of ways. It’s also serving a different purpose, as it’s intended to support single-system-image clustering (i.e. building a bigger computer by networking inexpensive machines), so it has some very different priorities. The design of HAMMER includes some cool advanced features like extremely lightweight snapshots built into the filesystem.
“Some people even think all this might indicate Linus Torvalds taking up a job working for Sun, but I personally think that’s rather unlikely.”
I seriously doubt anyone actually thinks that.
Stop making up facts.
If you had actually read the comments I was referring to, you wouldn’t have doubt.
But then again, I guess that’s too much to ask from a troll.
People think a lot of crazy stuff, that doesn’t mean there’s anything to it.
Well that is from the original article on practical-tech.com, so if someone is making up facts it would be them. Given the multitude of people out in the world there was probably at least one person who made that speculation upon seeing the pictures though. Do I, or the vast majority of the community think so? No, but it does not mean that there are not some who would speculate so.
No, it’s from the actual set of comments on the photo blog entry.
Direct quote from the practicaltechnology article:
I guess both the photo blog entry (which I did not read) and the article had a similar quote.
If Sun’s indeed working towards pushing ZFS to Linux, then they’re f–king stupid. On one side, they’re holding up DTrace and ZFS as the main arguments for marketing OpenSolaris (disregarding a lot of Linux people asking what the purpose of OpenSolaris is); now they appear to be working towards removing one of those two arguments.
The original blog entry is tagged with ‘ZFS’. And we have a few other clues. Could “Chocolate on my peanut butter? No peanut butter on my chocolate!” refer to a good-natured disagreement on the way ZFS is layered? Wild guess, yes. But it’s better than pointless bickering. 🙂
“Chocolate on my peanut butter? No peanut butter on my chocolate!”
http://en.wikipedia.org/wiki/Reese%27s
This is a play on words made popular during the ’80s in a Reese’s Peanut Butter Cups advertisement aired on television in the US. In the commercial one person has peanut butter, the other has chocolate, each enjoying their respective treats, until, voila, the one with the chocolate places his candy into the jar of peanut butter. They both try the result and then the announcer comes on: “two great tastes come together, chocolate and peanut butter, try Reese’s Peanut Butter Cups”…
So this reference in the photo caption is likely the opposite of a disagreement, i.e. concurrence (agreement), hinting at what it might be like if the two (ZFS and Linux?) were combined… Of course, the playfulness of this text, quoting a popular commercial of days gone by, likely means that it was chosen to *maximize* speculation about some kind of Linux/ZFS combo.
Probably 90% of all Google hits for ZFS are about ZFS not being available on Linux, or how Linux needs ZFS, or when ZFS is going to get a new license to make it work with Linux, or… Sun really knows how to stimulate product fetishes (the Marxist/Freudian manipulation of desire through unattainability: I have what you want and you can’t have it, and you want it because you can’t have it), almost as well as Apple does. The number of people who have heard the hype about ZFS outnumbers those who will ever use Solaris by at least a factor of 10.
and for German speakers a little philosopher joke: “Du denkst immer an das Eine, Du alter Metaphysiker” (“You’re always thinking about the One, you old metaphysician”)
My guess:
OpenSolaris and the Linux kernel will both go GPLv3. ZFS will be dual-licensed.
I doubt it’s that easy.
The OpenSolaris governance board already decided that GPLv3 is a no-go for the time being. Linus is also opposed to GPLv3, not to mention all the contributor agreements he’d have to get for a license change.
Maybe there’ll be binary modules with source-code glue a la NVidia drivers. We’ll see. Maybe they’re just playing a prank.
> I doubt it’s that easy.
> The OpenSolaris governance board already decided that GPLv3 is a nogo for the time being. Linus is also opposing the GPLv3, not to mention all the contributor agreements he’d have to get on a license change.
Linus has been saying that he doesn’t like GPLv3 as much as GPLv2, but he also said that if Sun makes ZFS GPLv3 he will *probably* adopt it.
I hadn’t heard this before. Is there a link somewhere that has this quote in its full context?
IIRC, it was more along the lines of ZFS being one of the few things interesting enough that if OpenSolaris went GPLv3 it might be worth *considering* going through the pain of trying to relicense the Linux kernel to GPLv2 or later.
I, personally, think that such an effort would result in FUD vulnerabilities which would come back to haunt us. The kernel has, according to Linus, perhaps as many as 4000 contributors at this point. Linus and all other relevant contributors (or their heirs) would have to be found and formally agree to the change. Code whose owner disagreed, could not be found, or whose copyright holder was unclear would have to be rewritten… tricky when it is intertwined with so many other people’s code. In short, it would be impossible to do right. What would happen would be a best-effort attempt followed by a declaration that it was done. It would be an open invitation to FUD from anyone who could benefit from casting doubt on the Linux kernel. And what is worse, their claims would be perfectly valid. Imagine if SCO had actually had a real case?
The validity of Linux kernel licensing is the only thing in this world allowing anyone, anywhere, even the developers themselves, to use the Linux kernel.
Think about that.
Re-licensing, at this late date, would be an incredibly stupid thing to do, and we do not want to even think about going there.
Edit: http://lkml.org/lkml/2007/6/12/232
Ah ok, thanks.
When looking at the public information about ZFS (I can’t read the source) from Sun blogs and other docs, I always wondered how they kept RAM requirements in check, because a log-structured filesystem has to keep a more-efficient block mapping structure in memory in order to get performance for common applications (the log by itself doesn’t give you a good searchable structure). I figured it would be a little more expensive (maybe 2X) of a traditional filesystem. But needing a 2GB machine to use ZFS?? Yikes!
I guess it makes sense… Sun is a hardware company after all. While they lack all the fancy CoW and transactional features, at least the Linux filesystems and NTFS won’t eat your machine.
All of this talk reminds me of when NT 3.1 came out. People were aghast that it wanted something like 16 MB to perform well, back when that extra 8 MB was around $300. However, keeping in mind the horrors of 16 bit Windows, it was better to see MS try to do better even if they were a little ahead of the hardware curve. The hardware curve caught up and the investment paid off.
According to the ZFS Admin Manual (p. 41), you can use ZFS on [Open]Solaris with only 0.5 GB of RAM. They recommend a minimum of 1 GB, though, for the best performance (more is always better). Not sure where you get the “need 2 GB” meme. Even on FreeBSD 7, you only “need” 1 GB of RAM.
I was being hyperbolic… sorry.
Well, all I would really care to know is this:
Is it better, performance-wise, than ReiserFS?
I use ReiserFS on several machines, because it seems more stable than any of the other ones out there, especially more so than ext2 and ext3, and Linux defaults to those for some horrible reason.
Try ReiserFS on an older PC vs. ext3 on the same PC: ReiserFS is faster and more stable.
Anyway, is it, or will it be, better than ReiserFS? That is what I would like to know.
And no, just because a guy killed someone does not mean that I will stop using his code. If it works well, then why swap?
It doesn’t use less resources, so no, it isn’t “better” in the sense that MSDOS is “better” than any of those fancy modern OSes.
It is better in the sense that it has *wicked* features and is more reliable.
Details from Pawel’s talk at BSDCan 2008, including a nice breakdown of the various layers in the ZFS storage system:
http://kerneltrap.org/FreeBSD/BSDCan_2008_ZFS_Internals
Heh, is it just me, or did you pretty much copy and paste the whole article into the “Read more” summary? Well, that’s excusable; the article itself is so short and “sparse” on real content, you could probably describe it in detail with 5 lines of text…
Thom, you might want to correct the spelling of Jeff’s name. It’s Bonwick, not Bronwick. The article you point to got it right.
Tp.
and a [email protected]. From my ZFS RAID, I get like 20 MB/sec. I am told it is slow because ZFS is 128-bit and it doesn’t like my 32-bit CPU. But everything works rock solid. I use OpenSolaris.
Another guy gets 120 MB/sec from his ZFS RAID with a 64-bit CPU – on his blog. It seems that a 64-bit CPU is beneficial for getting speed.
But the killer feature of being able to roll back /root to an earlier state, to choose which snapshot to boot from via GRUB, etc., is really useful. And the snapshots work on a bit level; it is impossible to write to a snapshot. If you destroy a snapshot, every bit gets destroyed.
And ZFS is so simple, too. I’ve heard setting up a RAID with Linux + LVM takes something like 25 command lines with difficult syntax. With ZFS it is 2 simple commands. A server enterprise filesystem like ZFS is very suitable for the average Joe, because it is so easy to administer and use. All other RAID alternatives are difficult. With ZFS you don’t have to make partitions in advance; they grow dynamically, and creating one is as lightweight as creating a directory. Every user can have a dedicated filesystem of his own.
It’s like LaTeX and MS Word: all Word supporters say they can do everything with Word that can be done with LaTeX, but once they try LaTeX they love it. Just try ZFS and you will change your mind.
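The snapshot workflow praised above is short enough to show in full. Dataset and snapshot names are invented:

```shell
# Snapshots are instant, read-only, and initially take no extra space:
zfs snapshot tank/home/alice@before-upgrade

# After something regrettable, roll the filesystem back:
zfs rollback tank/home/alice@before-upgrade

# Or browse old file versions without rolling anything back:
ls /tank/home/alice/.zfs/snapshot/before-upgrade/
```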
> And, ZFS is so simple too. Ive heard to set ut a raid with linux + lvm is like 25 command lines with difficult syntax
I have tried LVM2 and I can tell you it is a pain in the butt. I have this old 1 GHz Athlon as my file server here at home, and curious cat that I am, I once decided to try LVM2. Well, it was pretty complicated but I got it up and running, and it worked well for a while, but then one of the disks failed… I didn’t have a replacement disk for it, so I had to remove it from the system and place the files somewhere else. Now, LVM2 doesn’t make that easy. I literally had to move all the files to another system (and delete quite a bit of stuff too, cos I didn’t have sufficient storage space elsewhere) just because there is no way to remove a disk from an LVM2 setup without resizing the partitions, and resizing partitions was a complete no-go cos I couldn’t allow anything to be written to the broken disk. Since ZFS is a pooled system it allows one to just issue a command for removing one disk from the pool, and the system will just relocate the files somewhere else in the pool.
Too bad that I can’t use ZFS on my server due to memory requirements; it only has 128 MB of RAM, and due to some motherboard issues I haven’t been able to find any combination of RAM sticks that would work – only a single stick will function… :/ I am hoping btrfs will someday be good enough for me to use.
Werecatt
If you can’t use ZFS on your home server because of 128 MB of RAM, maybe you could upgrade your home server? I mean, a server is important. I, at least, prioritize the home server and would surely find an old computer; 1 GHz + 1 GB RAM will do (but you will get 20 MB/sec unless you have a 64-bit CPU). But your data will be safe. Isn’t it important enough to shell out 100 bucks on an upgrade?
As someone posted, the advantages are too great not to use it on a server.
http://kerneltrap.org/FreeBSD/BSDCan_2008_ZFS_Internals
> If you can not use ZFS on your home server, because of 128MB RAM, maybe you could upgrade your home server? I mean, a server is important. I at least, prioritize the home server and would surely find an old computer, 1GHz + 1GB RAM will do (but you will get 20MB/sec unless you have 64bit cpu). But your data will be safe. Isnt it important enough to shell out 100 bucks on an upgrade?
I use it purely to serve files to 4 computers, nothing else, so 1 GHz with 128 MB RAM is plenty. I don’t see any point in upgrading the system just to use another filesystem when it won’t benefit anything else. Of course the self-healing capabilities of ZFS are an excellent deal, but as I said, the system isn’t used for anything else, so I just can’t justify the upgrade.
After reading this blog http://breden.org.uk/2008/03/02/home-fileserver-what-do-i-need/ I wouldn’t mind giving it a try.
I am pretty sure that it works like a charm on my 2.8 GHz 32-bit P4 Xeon. I am also pretty sure that I get about 80 MB/sec out of these old hard drives with compression and 100 MB/sec without compression enabled.
And I can tell you for sure that for my workload (a file-server for a 30 workstation domain) it worked just as fast when it had only 512 MB of RAM. It now has 2GB for some containers that run other applications, but it worked with only 512 MB of RAM.
Couldn’t you just have used pvmove to move all your data off the bad disk and then done a vgreduce to remove the disk from the volume group? Also, how could you have used ZFS to remove a drive from the pool if you didn’t have the spare space for all your data??
> Couldn’t you just have used pvmove to move all your data off the bad disk and then done a vgreduce to remove the disk from the volume group? Also, how could you have used ZFS to remove a drive from the pool if you didn’t have the spare space for all your data??
The logical volume spanned all 3 disks, so there was no disk to move the data to. But there was enough space in the volume itself to hold duplicate copies of the files that were on the broken disk. So, if I could have just relocated those files onto the 2 healthy disks, I wouldn’t have had any issues. But LVM2 doesn’t provide such a utility, since it’s just another layer… With ZFS you could just instruct it to move all the files lying on the broken disk to other areas in the pool without writing anything to the broken one. With LVM2, if I resize the volume I have no way of preventing it from writing to the broken disk.
Hmm… I have to read up on LVM again. I was sure that the pvmove command basically moved all the data off a specific PV into the space available in the VG. If that’s not the case, then yeah, I guess you were kinda screwed. :/
You’re correct. I suspect the original poster made a VG with 3 disks, and an LV that took up all the space in that VG. In that situation, you’d have to:
– Use ‘resize2fs’ (assuming ext2/3) to shrink the file system to fit on the two remaining disks.
– Use ‘lvreduce’ to shrink the LV to fit on the remaining disks. Make sure the file system is smaller than the LV!
– Use ‘pvmove’ to move the extents off the PV to be removed.
– Use ‘vgreduce’ to remove the PV from the VG.
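Spelled out with invented device names and sizes (a sketch, not a recipe; the key hedge is to shrink the filesystem to well under the new LV size before touching the LV):

```shell
# 1. Shrink the filesystem first (ext2/3 must be unmounted and checked):
umount /data
e2fsck -f /dev/vg0/data
resize2fs /dev/vg0/data 400G

# 2. Shrink the LV, leaving it larger than the filesystem:
lvreduce -L 450G /dev/vg0/data

# 3. Migrate allocated extents off the failing disk:
pvmove /dev/sdc1

# 4. Drop the now-empty disk from the volume group:
vgreduce vg0 /dev/sdc1
```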
The lesson here is to only allocate space to a volume when you need it. Don’t just make a giant LV that takes up all your space. It’s a lot easier to expand a volume than it is to shrink it. You can even expand a ext3 file system while it’s on-line.
Also, ZFS is fantastic. I have a few Sun T-series boxes and it’s just so damn easy to use.
That’s the one thing I can’t stand about the current state of volume management on Linux: you have to allocate your disk space at logical volume creation time!
What’s the point of having 5 TB of disk space available if you have to determine how to partition it before you start using it?
I used to be a big proponent of LVM, and have used it a lot on multi-disk arrays (mainly for storage of Xen and KVM virtual machines), and it does have its uses. But having to figure out exactly how much space will be used by what when creating the LV is a pain. And if you get it wrong, redoing it is even more of a pain.
Add to that the whole “save space for snapshots” issue, and LVM is starting to become more of a pain than plain partitioning.
After using ZFS for the past couple of weeks, creating raidz pools, creating filesystems for /usr, /usr/local, /var, /usr/ports, /usr/src, /usr/obj, /home, /home/user1, /home/user2, basically creating a filesystem when needed, and then setting the properties on a per-filesystem basis (gotta love inheritance) without having to worry about getting the space allocations perfect at the get-go has been a godsend. Just set quotas as needed (when needed) and let zfs manage the rest.
Add to that snapshots you don’t have to save space for, and that take next to no time to create, and very little time to rollback, and you have a storage management system that makes DM+MD+LVM look archaic.
I think that file pre-allocation for minimizing fragmentation is going to be neat.
.. an ELER strip