The Infrastructure and Advances in ReiserFS 4

Submitted by Rayiner Hashem 2002-11-19 General Development 19 Comments

This is an informative article about the upcoming Reiser4 filesystem. It covers a lot of the performance improvements in Reiser4 along with a lot of ideas about a database filesystem.

About The Author

Eugenia Loli

Ex-programmer, ex-editor in chief at OSNews.com, now a visual artist/filmmaker.

19 Comments

2002-11-19 2:00 am
Anonymous
Hans Reiser says something I’d consider fairly controversial:
In B-trees objects are stored in all of the nodes of the tree. In B+trees all objects are stored in the leaves. B+trees have far greater fanout because the internal nodes are purely pointers with no space in them consumed by data. B+trees also have far fewer internal nodes, and this means that internal nodes are far more effectively cached in a typical computer configuration. B+trees are much higher performance than B-trees for that reason.
One thing to keep in mind is that any lookup within a B+tree will take the same amount of time as the worst-case lookup in a similarly structured B-tree, because every lookup has to descend all the way to a leaf. That’s not to say that the average-case lookup will generally be faster because there are fewer internal nodes.
I just don’t like his wording that “B+trees are much higher performance than B-trees.” When evaluating general-purpose data structures, one finds that their performance characteristics vary wildly depending on the dataset they store.
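To put rough numbers on the fanout argument, here is a back-of-the-envelope sketch. All sizes (node, key, pointer, record) are made-up assumptions chosen for illustration, not ReiserFS figures, and `levels_needed` is a deliberately crude model that ignores fill factors and per-node headers.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of index levels h such that fanout**h >= num_keys."""
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# Hypothetical sizes in bytes, for illustration only.
NODE_SIZE = 4096
KEY_SIZE = 16
POINTER_SIZE = 8
RECORD_SIZE = 100  # data stored next to each key in a B-tree internal node

# B+tree internal node: keys and child pointers only.
bplus_fanout = NODE_SIZE // (KEY_SIZE + POINTER_SIZE)

# B-tree internal node: every key also carries its record.
btree_fanout = NODE_SIZE // (KEY_SIZE + POINTER_SIZE + RECORD_SIZE)

if __name__ == "__main__":
    keys = 10_000_000
    print("B+tree fanout:", bplus_fanout, "levels:", levels_needed(keys, bplus_fanout))
    print("B-tree fanout:", btree_fanout, "levels:", levels_needed(keys, btree_fanout))
```

With these assumed sizes the pointer-only nodes fan out about five times wider, which saves one level for ten million keys. How much that matters in practice depends on the dataset and on how much of the tree fits in cache, which is the commenter's point.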
2002-11-19 2:03 am
Anonymous
Hans and friends better have fixed the huge bug which caused NFS exports to inexplicably not work. I’m using 2.4.19 and some partitions will export but not all. This had better be fixed in Reiser4 or I will never use it again.
JFS or XFS will woo me back.
2002-11-19 4:09 am
Anonymous
Just wondering how RFS managed to implement atomicity and attributes while preserving compatibility with the POSIX FILE semantics, though.
Also, I am not really familiar with how the RFS API lets queries be fired from the shell. Maybe somebody could shed some light on it.
2002-11-19 5:07 am
Anonymous
My understanding of it is that with a B+tree the internal structure of the file system is not diluted with file data. Since it is contiguous, you’ll be able to cache more of it in one read, and hence file lookups will be faster. Disk I/O is the limiting factor for reading files, not CPU cycles.
2002-11-19 8:27 am
Anonymous
They would do better to fix all the issues and remaining bugs in their current system instead of writing a new one.
2002-11-19 9:25 am
Anonymous
Somebody *please* pay Namesys to implement efficient file update notification. With efficient small files and efficient update notification, it would be quite easy to write BeOS style live queries and extended attributes for ReiserFS by using small files as attributes as suggested by Hans Reiser.
NTFS has efficient file update notification, but it lacks the ability to handle small files efficiently. And besides it would be pointless to implement extended attributes and live queries for Windows since Microsoft will move to a database file system anyway.
And I would rather see live queries for KDE. KDE 3.1 with extended attributes and live queries would be heaven. I would finally have found a suitable replacement for my beloved BeOS….
regards,
tuttle
2002-11-19 11:31 am
Anonymous
@tuttle: concerning updates, there is the DNOTIFY extension in Linux, which can be used to be notified about changes in directories. KDE’s KDirWatcher uses it.
What Linux can not do, right now, is notify an app of every change in the filesystem, as needed for indexing the files. Is this what you call ‘live queries’? I don’t know BeOS at all… any pointers?
2002-11-19 12:02 pm
Anonymous
I researched the file notification mechanisms in Linux since I considered implementing live queries for Linux. There are basically two mechanisms available. One is the “DNOTIFY” API that has been available since 2.4; the other is the FAM library from SGI with the imon kernel module, which is used by KDE and, I think, also GNOME.
The problem with both APIs is that they do not work recursively. When you e.g. watch your home directory, you only get notification when a file changes directly in your home directory, which is fine for a file browser but otherwise basically useless.
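As a rough illustration of what the missing recursion costs, the sketch below registers one dnotify watch per directory using Python's `fcntl` module on Linux. The helper names `directories_to_watch` and `watch_tree` are hypothetical, and the caller is assumed to have installed a SIGIO handler beforehand, since that is the signal dnotify delivers by default.

```python
import fcntl
import os


def directories_to_watch(root):
    """Every directory under root; each one needs its own watch."""
    return sorted(dirpath for dirpath, _dirs, _files in os.walk(root))


def watch_tree(root):
    """Register a dnotify watch on each directory and return the open fds.

    Linux only. Each watch pins one open file descriptor, so a large
    tree can exhaust the process's descriptor limit.
    """
    events = (fcntl.DN_MODIFY | fcntl.DN_CREATE | fcntl.DN_DELETE
              | fcntl.DN_RENAME | fcntl.DN_MULTISHOT)
    fds = []
    for path in directories_to_watch(root):
        fd = os.open(path, os.O_RDONLY)
        fcntl.fcntl(fd, fcntl.F_NOTIFY, events)  # ask for SIGIO on changes
        fds.append(fd)
    return fds
```

Even then, directories created after the walk are not covered until they are registered as well, so this only approximates a recursive watch.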
For live queries of attributes it would be necessary to recursively watch for all changes of files with a certain name. In Win2k, this is a one-liner using the FileSystemWatcher class of the .NET framework. In Linux it is impossible. Sad but true.
Live queries make most sense when combined with arbitrary attributes. For example, to see all mp3s by Tori Amos, you would just set up a query for all files where the mime-type attribute is audio/mp3 and the singer attribute is “Tori Amos”. The results were displayed using a window that looked just like a folder view. Whenever a new mp3 would be added to your system by whatever program, it would instantly appear in the window belonging to the live query.
That is a really neat way to organize data. The more mp3s I collect, the more I miss this functionality. Searching for files using find -exec grep is a PITA!
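A minimal sketch of the idea, assuming some notification layer exists that calls back on every file change. The `LiveQuery` class and its method names are hypothetical and are not the BeOS API; attributes are modelled as a plain dictionary per file.

```python
class LiveQuery:
    """Keeps a result set in sync as file events arrive."""

    def __init__(self, wanted):
        self.wanted = dict(wanted)  # attribute name -> required value
        self.results = set()        # paths currently matching the query

    def matches(self, attrs):
        return all(attrs.get(name) == value
                   for name, value in self.wanted.items())

    def file_changed(self, path, attrs):
        """Called by the (assumed) notification layer on create or update."""
        if self.matches(attrs):
            self.results.add(path)
        else:
            self.results.discard(path)

    def file_removed(self, path):
        self.results.discard(path)


query = LiveQuery({"mime-type": "audio/mp3", "singer": "Tori Amos"})
query.file_changed("/music/a.mp3", {"mime-type": "audio/mp3", "singer": "Tori Amos"})
query.file_changed("/music/b.mp3", {"mime-type": "audio/mp3", "singer": "someone else"})
```

After these two events only `/music/a.mp3` is in `query.results`. A folder-like window would simply redraw from that set whenever it changes.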
regards,
tuttle
2002-11-19 12:44 pm
Anonymous
<quote> Live queries make most sense when combined with arbitrary attributes. For example, to see all mp3s by Tori Amos, you would just set up a query for all files where the mime-type attribute is audio/mp3 and the singer attribute is “Tori Amos”. The results were displayed using a window that looked just like a folder view. Whenever a new mp3 would be added to your system by whatever program, it would instantly appear in the window belonging to the live query.</quote>
Ok just enable indexing on W2k or WinXP. From the indexing help:
Indexing Service overview
Indexing Service is a service that extracts the information from a set of documents and organizes it in a way that makes it quick and easy to access that information through the Windows XP Search function, the Indexing Service query form, or a Web browser. This information can include text from within a document, (its contents), and the characteristics and parameters of the document, (its properties), such as the author’s name. Once the index is created, you can search, or query the index for documents that contain key words, phrases, or properties. For example, you can query all documents containing the word “product” or you can query for all Microsoft Office documents written by a specific author. Indexing Service returns a list of all documents that meet your search criteria. For information on the different ways to create a query, see Using the Indexing Service query language.
2002-11-19 1:35 pm
Anonymous
@smurf975:
That is all very nice, but I want this feature for Linux.
Are the indexes updated automatically when some file changes? Or are they just faster searches?
regards,
tuttle
2002-11-19 2:58 pm
Anonymous
Ummm…if you have more than one person working on a project you can get more than one thing done at a time. Anyways, in my experience, sometimes it’s easier to step back and rearchitect a solution that takes care of many existing bugs in a measured way than just to hack at something until it works.
We Linux users should be grateful to the Reiserfs people. It’s really the fastest-developing, most innovative file system going on the Linux platform. The others (ext, jfs, xfs) are pretty much legacy systems that have been flat ported or augmented to work on Linux. Competition is a good thing.
2002-11-19 3:19 pm
Anonymous
I guess the answer is, if you really want it, dig into the kernel sources and implement it. Shouldn’t be that hard.
2002-11-19 3:26 pm
Anonymous
I think it is quite hard. You would have to intercept every single system call where files are created, modified or deleted. If you take a look at include/asm/unistd.h you will see that almost every Linux syscall has something to do with files.
Another problem would be what to do with special files like devices. If I intercept every single read to /dev/zero or every single write to /dev/null the system would probably crawl to a halt.
I think it is a big deficit that a Unix-like operating system with a philosophy of “everything is a file” does not have a simple way to be notified when a file gets changed.
Writing the live query functionality on top of a notification facility would be hard enough, but having to start from zero is just too much 🙁
regards,
tuttle
2002-11-19 4:00 pm
Anonymous
I run Reiserfs 3.6 and I couldn’t be happier with it, it’s imo the fastest around.
No doubt though that still to this day I too miss the sleekness of the BFS filesystem and BeOS: it was super fast, journaling and instantly updating (FAM works like crap on my Gentoo system, probably because I run the bleeding edge of everything).
Anxiously waiting to try out Reiser4.
2002-11-19 4:00 pm
Anonymous
You don’t want to be notified when a file is being read, and not even when it is being written, because in the latter case no one guarantees you that the file is in a sane state after a write. You only want to be notified when a file that has been opened for writing has been closed, and when somebody changes a directory (unlink, move etc).
Then write these actions into a ring buffer (e.g. just append a line “closed /path/to/file” when a file has been closed). This ring buffer can be read using a device node, and when a process reads it, delete the lines that have been read. The only problem that I can see is: what happens when the ring buffer gets too big, because the process can’t read fast enough? Probably either just put a message there (so the process knows that it has to re-index everything), or, more sophisticated, summarize with a message “from now on I ignore every change in /path/to/file because there are too many”. Then you only need to re-index a smaller directory.
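The simpler of the two overflow strategies could look roughly like the user-space sketch below. It only models the behaviour: the class name `CloseEventLog`, the capacity parameter and the wording of the overflow message are assumptions for illustration, and nothing here is an existing kernel interface.

```python
from collections import deque

OVERFLOW = "overflow: events were lost, re-index everything"


class CloseEventLog:
    """Bounded log of 'closed <path>' lines with an overflow marker."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.overflowed = False

    def record_close(self, path):
        if self.overflowed:
            return  # the reader has to re-index anyway
        if len(self.events) >= self.capacity:
            # Reader is too slow: drop the backlog and leave one marker.
            self.events.clear()
            self.overflowed = True
            return
        self.events.append("closed " + path)

    def read(self):
        """Return pending lines and delete them, as reading the device node would."""
        lines = [OVERFLOW] if self.overflowed else list(self.events)
        self.events.clear()
        self.overflowed = False
        return lines
```

An indexer would call `read()` in a loop and fall back to a full re-index whenever it sees the overflow line. The more sophisticated variant would keep one marker per directory instead of a single global one.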
2002-11-19 5:54 pm
Anonymous
Why should I have to wait for the file to be closed? Tools like “tail -f” are able to display changes of a file without it being closed, and so should the update notification. Or do you mean something more lowlevel than fclose?
The whole stdio.h approach to accessing files is hopelessly inefficient for many small files anyway. That is why reiser4 will come with a new API that does not need file descriptors for each file that is being opened.
Would implementing the close notification you propose be easier than a full update notification like in NTFS? If so, it might be sufficient for attributes.
I guess I just have to wait and take a look at the new Reiser4 API when it becomes available. Maybe it is possible to implement update notification using these plugins Hans Reiser keeps talking about.
2002-11-19 7:18 pm
Anonymous
“tail -f” is a very special case and is only useful for text. It is not realistic to parse files while they are being written. close(2) should be ok for more than 99.9% of all applications.
2002-11-19 7:25 pm
Anonymous
Or anything with reiserfs4 vs other current fs, or is it still not ready for tests?
2002-11-19 7:26 pm
Anonymous
jeeze i feel stupid