The Intelligent File Format

Thom Holwerda 2006-02-22 OS News 21 Comments

“What would happen if the beginning of file systems embedded a driver for accessing the disk? If the driver was in some sort of neutral format (similar to the X Windows drivers), then any OS could access the file system! While this concept was exciting in and of itself, it didn’t even begin to scratch the surface of what was possible. It wasn’t long before I considered the fact that a file system is nothing more than a hierarchical database. There’s nothing inherently special about it, so why can’t the file system payload be replaced with some sort of other data? As long as the embedded driver can read the format and produce some sort of usable data structure, there’s no reason why the concept couldn’t be extended to all types of data!”
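The layout the quote describes, a driver at the beginning of the data followed by the payload it knows how to read, can be sketched as a simple container. This is a minimal sketch under stated assumptions: the `IFF1` magic, the length-prefixed header and the function names are hypothetical, and the driver is treated as opaque bytes rather than code in any particular neutral format.

```python
import struct

# Hypothetical layout, assumed for illustration only:
#   4-byte magic b"IFF1"
#   4-byte big-endian unsigned length of the embedded driver
#   driver bytes (opaque here; the proposal imagines code in a neutral format)
#   payload bytes (the data the driver knows how to decode)
MAGIC = b"IFF1"
HEADER = struct.Struct(">4sI")


def pack_intelligent_file(driver: bytes, payload: bytes) -> bytes:
    """Prepend the magic, the driver length and the driver itself to the payload."""
    return HEADER.pack(MAGIC, len(driver)) + driver + payload


def unpack_intelligent_file(blob: bytes) -> tuple[bytes, bytes]:
    """Split a file into (driver, payload); a host would then run the driver on the payload."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short for header")
    magic, driver_len = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not an intelligent file")
    start = HEADER.size
    end = start + driver_len
    if end > len(blob):
        raise ValueError("truncated driver section")
    return blob[start:end], blob[end:]
```

Every file built this way carries a full copy of its driver, which is exactly the size concern several commenters raise.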

21 Comments

2006-02-22 7:52 pm
aarnott
The article seems well thought through. The one part I didn’t care too much for was his short jab at .NET security, dismissing .NET because of it and choosing Java.
.NET was designed with security in mind from the beginning. Investigate Code Access Security, App Domains and safe code, and you begin to see the great depth to which .NET has gone to be very, very secure. It has even more control than Java, I believe. Not to start a flame war. I just think that .NET should not have been dismissed so casually without justification.

2006-02-22 7:59 pm
Nathan O.
Agreed, but he mainly dismissed .NET for its lesser portability. Also, he mentioned C#, not .NET. So in the literal sense, he’s implying a comparison of C# (the language) with Java (the platform).
Either way, I imagine the implementation should be agnostic of the design. That’d be the most portable way of doing things.

2006-02-22 7:54 pm
Nathan O.
So, my understanding (taking a few liberties) is that we’re talking about putting the filesystems for all the partitions on disk in a sort of meta-partition at the beginning of the disk. All an OS needs is to be able to read whatever simple (but stable!!) filesystem is holding the drivers for the rest of the partitions (where the “real” data is) and to be able to infer from the data in this meta-partition how to mount / address / drive (via drivers) the partitions on this disk.
Then the author takes it a step further and applies this concept to individual files, in a fairly direct fashion, pointing out that there are some kinks with this that could be worked out, such as file size (when every little file includes its own bulky driver).
Maybe the metapartition at the beginning of the disk could include file drivers. Then, rather than appending whole drivers to the front of every file, you could just append a pointer to the proper file driver on the metapartition. Kind of like MIME.
Unfortunately, any way you slice this second part of the concept, it makes it a lot harder to fit existing filesystems into the primary concept (filesystem drivers in the superblock / first part of the disk).
It sounds like a *very* good idea. Sadly, it’ll likely stay nothing more than a good idea, but that doesn’t make it any less of a great idea!
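Nathan O.’s variant, where the metapartition holds the file drivers and each file carries only a pointer to one, might look roughly like this. The driver table, the `IFP1` magic and the 6-byte header are assumptions made up for illustration.

```python
import struct

# Hypothetical metapartition contents: a table of drivers stored once per disk,
# indexed by a small integer. The names are illustrative only.
DRIVER_TABLE = {
    0: "ext2-reader",
    1: "png-decoder",
    2: "mp3-decoder",
}

# Hypothetical per-file header: 4-byte magic plus a 2-byte index into
# DRIVER_TABLE, instead of a whole embedded driver.
POINTER_MAGIC = b"IFP1"
FILE_HEADER = struct.Struct(">4sH")


def tag_file(driver_index: int, payload: bytes) -> bytes:
    """Prefix the payload with a pointer to a driver in the metapartition."""
    if driver_index not in DRIVER_TABLE:
        raise KeyError(f"no driver {driver_index} in metapartition")
    return FILE_HEADER.pack(POINTER_MAGIC, driver_index) + payload


def resolve_driver(blob: bytes) -> tuple[str, bytes]:
    """Look up the file's driver in the metapartition table and return it with the payload."""
    magic, index = FILE_HEADER.unpack_from(blob, 0)
    if magic != POINTER_MAGIC:
        raise ValueError("untagged file")
    return DRIVER_TABLE[index], blob[FILE_HEADER.size:]
```

The per-file cost drops to six bytes however large the driver is; the trade-off is that a file copied to a disk whose metapartition lacks that driver index can no longer be decoded.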
2006-02-22 8:06 pm
PipoDeClown
The writer suggests a new file format, but what he actually wants is an object-oriented file system, where the data is alive instead of just static ones and zeros.
He binds it to one programming language, which is not portable at all.
I’d rather have some filesystem object watching and identifying what type of file I am opening.
Edited 2006-02-22 20:09
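PipoDeClown’s alternative, a filesystem component that identifies the type of a file when it is opened, is usually done by checking well-known leading bytes (“magic numbers”), in the spirit of the Unix `file` utility. A minimal sketch; the signature table covers only a handful of common formats, and the function names are hypothetical.

```python
# Leading-byte signatures of a few well-known formats, mapped to MIME types.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def identify(data: bytes) -> str:
    """Return a MIME type guessed from the first bytes of a file."""
    for prefix, mime in SIGNATURES:
        if data.startswith(prefix):
            return mime
    return "application/octet-stream"  # unknown: treat as raw bytes


def identify_file(path: str) -> str:
    """Read only the head of the file; no signature above is longer than 8 bytes."""
    with open(path, "rb") as f:
        return identify(f.read(16))
```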
2006-02-22 8:10 pm
stew
Similar ideas have been designed and implemented in the Monads PC and SpeedOS:
http://www.informatik.uni-ulm.de/rs/projekte/monads/SemanticFilesE….
http://www.informatik.uni-ulm.de/rs/projekte/monads/DirectoriesE.ht…
2006-02-22 8:11 pm
r2d2d3d4d5
Still waiting for one of those mini-OSes in the BIOS to come out, with a filesystem reader/writer, advanced system checks, configurable boot loaders, upgradability, etc…
2006-02-22 8:15 pm
evert
It’s impossible to describe a general “filesystem driver architecture” as long as one of the main functions of any OS is to implement its own filesystem subsystem.
2006-02-22 8:17 pm
BigZaphod
I admit to not reading the entire thing, but this seems amazingly misguided and silly. By specifying the format and how to parse the file, you’re not really gaining anything. Why? Because you still need an application program that understands the parsed output in order to do anything useful with it!
Say you have a file that has a bunch of map coordinates in it. If I understand the gist of this idea, basically the file format would have a header in it that would have some fancy Java code which would read the file contents and perhaps present an array of values to the application program to use. That array would be the map coordinates. Okay, fine. What happens if I try to open this file in a word processor? It has no clue what an array of numbers means and no matter how you try to present that info to the user, it won’t make sense in all cases. So now you are back to square one and having to add a ton of logic to the application programs to understand how to use this pre-parsed data. You might as well skip the pre-parsing since it gets you nothing and just keep things the way they are now – or move to XML which is basically the same solution but without the overhead of embedded java code, the requirement of having a Java VM just to read a data file, and the potential for nasty viruses and exploitations. And even with XML you have the same problem of needing an intelligent application program in order to make any real sense out of the data that’s presented.

2006-02-22 8:23 pm
Nathan O.
True, but I think you’re missing the trees for the forest (which is usually what you’d prefer if you had to pick that or the inverse). This idea would add a lot of flexibility to a system. It’d be kind of like having a device for each file type. That, and it’d make it easier to write FS drivers cross-platform.
No, this idea doesn’t do much for the current way OSes work, but as a concept, it opens up a few possibilities. We’ll never know what it’d let us accomplish until we implement it and poke at it some.

2006-02-22 8:46 pm
BigZaphod
I don’t think it’d make anything easier. As PipoDeClown said above, I think this person really wants an object oriented file system. Things like that exist – for instance in Smalltalk or Squeak systems where there really aren’t files – only objects within the system image. It’s a very cool concept and one I’m a big fan of myself, but assuming it’d make it easier to handle cross platform portability is a mistake. Look at it this way… if I was building my own OS or something like that, I’d have a hard time reading these files until I got things to the point where I had an entire Java VM working in my system! That’s pretty heavy stuff there. I don’t see how it makes anything any easier.
A filesystem is an agreement for how to store bytes on a disk. Putting a driver into individual files doesn’t help solve that problem at all. The OS still needs to understand the concept of a file in order to even read and store these magic file formats. Those files also need to be managed and organized (usually into folders and the like). The number of tradeoffs in filesystems is immense. Do you want fast access to tons of small files? Fast access to large files? Tons of files per directory? Unlimited volume size? Ability to resize volumes on the fly? Ability to take snapshots of volumes? Etc. All these factors (and many, many more) contribute to the design of the layout of the bytes on the disk. Those problems still need to be solved at some layer, and putting all this intelligent logic into the files themselves completely misses the point about what problems a filesystem is actually solving.

2006-02-22 9:14 pm
Ronald Vos
Indeed. Being forced to be able to run Java code is a pretty hefty requirement for OSes, and one that implies dependence on Sun.
I think he’s almost on the right track, but missing some points entirely. Instead of code to interpret filesystems, which usually is very kernel-dependent, he should go with something akin to schematics that are interpretable by different OSes.
Then, however, you get a problem with radically different filesystems like ZFS, which requires handling hard drives differently, instead of merely a slightly different file layout.
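Ronald Vos’s suggestion of a layout description (“schematics”) instead of executable code could be sketched as a table of field names and `struct` format codes, read by one small generic interpreter. The superblock fields below are invented for illustration; a description like this only covers fixed on-disk structures, which is why it does nothing for the ZFS case above.

```python
import struct

# Hypothetical on-disk superblock described as data rather than code.
# Each entry is (field name, struct format code).
SUPERBLOCK_SCHEMA = [
    ("magic", "4s"),
    ("block_size", "I"),
    ("block_count", "Q"),
    ("root_inode", "I"),
]


def parse_with_schema(schema, data: bytes, byte_order: str = "<") -> dict:
    """Generic interpreter: any OS holding this function can read any schema-described structure."""
    # An explicit byte-order prefix ("<" or ">") also disables alignment padding.
    fmt = byte_order + "".join(code for _, code in schema)
    values = struct.unpack_from(fmt, data, 0)
    return {name: value for (name, _), value in zip(schema, values)}
```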
2006-02-22 9:51 pm
Nathan O.
Oh, the part about drivers in files definitely makes it harder to write portable FS drivers. What I meant (and very unclearly said!) was that the initial idea- putting the FS driver in a generic, OS-agnostic area on disk- could make it easier.
But yeah, files with built in drivers, that’s a different story!
I think it’s understood, if not by the author, that non-Java implementations could be made, so I think any issues with being Java-exclusive are moot. Of course, his part 3 implementation explanation would then need to implement files without JARs, but that doesn’t affect the overall concept. Implementation detail.

2006-02-22 8:40 pm
transputer_guy
There must be something in the air. As an end user on several different OSes, my gut tells me there is very little that is really interesting under the hood in most filesystems in use, although I have yet to read up on the specifics of the Reiser or BFS designs. The article a few months back on the MS Labs OS where everything is a process seems much more promising to me than this Java idea, but there are some nice ideas here too.
At the file system level I want to see the entire directory structure and the file descriptions, abstracts or thumbnails of some sort held in memory, where they can be searched more or less instantaneously. The problem with filesystems is that they seem to serialize lots of extra things, so that for one thing to happen, lots of small files must be continuously fetched first. One should never open a directory and see these long pregnant pauses. I would regard my hard disk as the final repository for a hierarchical database: the indexes should be in RAM, and the raw app-specific file data on disk with some cache. The Finder/Explorer/Tracker then looks and feels more like a responsive CAD engine.
I would also like to see every directory, every GUI widget and every application described as a concurrent nested set of processes using C++ syntax; this would add ports to the class declaration. Then I can see concurrent apps being much easier to build and compose, since they just look like files and directories. Some of this is already here, but I like to generalize; I come from a Verilog background. I note that almost every app I use on Windows (but not BeOS) seems to be a database already and has to be written from scratch, reinventing things not provided by the OS. Thumbsplus and Winamp come to mind, even OpenOffice.
I know, I know, one can dream or implement.
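transputer_guy’s wish to keep the directory structure in RAM, so that searches never wait on the disk, amounts to walking the tree once and answering queries from memory. A minimal sketch using only the standard library; it indexes names and sizes only (no abstracts or thumbnails), `build_index` and `search` are hypothetical names, and a real system would also have to keep the index in sync as files change.

```python
import os
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    size: int
    is_dir: bool


def build_index(root: str) -> dict[str, Entry]:
    """Walk the tree once and keep every entry's metadata in RAM."""
    index = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            full = os.path.join(dirpath, name)
            index[full] = Entry(full, 0, True)
        for name in filenames:
            full = os.path.join(dirpath, name)
            index[full] = Entry(full, os.path.getsize(full), False)
    return index


def search(index: dict[str, Entry], term: str) -> list[Entry]:
    """Case-insensitive name search answered entirely from memory, without touching the disk."""
    term = term.lower()
    return [e for e in index.values() if term in os.path.basename(e.path).lower()]
```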
2006-02-22 8:45 pm
rajj
I suppose what he means by X Windows and “neutral drivers” is XFree86/Xorg. Their implementation isn’t portable because the drivers are in a neutral format; it is portable because XFree86/Xorg’s drivers access hardware directly and duplicate some of the operating system’s bus enumeration and interrupt functions. XFree86 has been justly accused of being an operating system within an operating system.
Filesystem drivers tend to be very highly married to kernel internals. It might be easy enough to abstract the filesystem API out on operating systems that are UNIX-like (read/write/open/close), but it certainly won’t map well on others that are not. Of course, with enough glue one can stick anything together; however, the result may be less than appealing.
The author uses the Newton as an example, but the Newton doesn’t have a filesystem. It has a data storage mechanism known as a soup [1]. It is more or less “flat”, with records contained in namespaces for each application that uses them.
The UNIX notion of a filesystem is a hierarchical namespace with objects (files) that represent a logically contiguous stream of bytes. There is no internal structure imposed on the contents of these bytes, and I believe this to be the primary strength of the filesystem (though others keep insisting it’s a weakness). Almost every operating system in use today uses this model (including Windows).
The belief that anything and everything can be solved by adding n levels of indirection is a pipe dream. Sure, we can make everything close enough alike to fit inside the grand, all-encompassing abstraction, but then what is the point? Everything is the same anyway.
[1] I’m not going to call it an object database because that’s the most nebulous term ever; whatever is an object anyway? One could easily say that the files in a filesystem are objects. As far as being a database, anything that holds data, quite frankly, is a database.
Edited 2006-02-22 20:54
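The “easy” UNIX-like part rajj mentions, abstracting a filesystem behind open/read/write/close on plain byte streams, can be expressed as a small interface plus one backend. This is a hypothetical sketch (the `FileSystem` and `MemoryFS` names are invented), and it shows only the part that maps cleanly; a record-oriented store such as the Newton’s soups would need considerably more glue to fit behind these four calls.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical minimal UNIX-style interface: files are unstructured byte streams."""

    @abstractmethod
    def open(self, path: str) -> int: ...

    @abstractmethod
    def read(self, fd: int, size: int) -> bytes: ...

    @abstractmethod
    def write(self, fd: int, data: bytes) -> int: ...

    @abstractmethod
    def close(self, fd: int) -> None: ...


class MemoryFS(FileSystem):
    """Toy backend keeping each file as a bytearray in a dict."""

    def __init__(self):
        self.files: dict[str, bytearray] = {}
        self.handles: dict[int, list] = {}  # fd -> [path, current offset]
        self.next_fd = 3  # mimic UNIX, where 0-2 are stdin/stdout/stderr

    def open(self, path):
        self.files.setdefault(path, bytearray())  # create on first open
        fd = self.next_fd
        self.next_fd += 1
        self.handles[fd] = [path, 0]
        return fd

    def read(self, fd, size):
        path, offset = self.handles[fd]
        chunk = bytes(self.files[path][offset:offset + size])
        self.handles[fd][1] += len(chunk)
        return chunk

    def write(self, fd, data):
        path, offset = self.handles[fd]
        # Overwrite in place at the current offset, extending the file if needed.
        self.files[path][offset:offset + len(data)] = data
        self.handles[fd][1] += len(data)
        return len(data)

    def close(self, fd):
        del self.handles[fd]
```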
2006-02-22 8:49 pm
transputer_guy
Okay, from the links in the previous Monads post I can see others are thinking along the same lines: files as process objects that include raw data and possibly methods, perhaps also liveness.
That’s what this place is for.
2006-02-22 9:21 pm
Troll
For those of you who remember AmigaOS, the first part of the article should sound familiar…
AmigaOS had the ability of storing file system code in the “RDB” (Rigid Disk Blocks).
As for the IFF (an unfortunate choice of name, or a blatant reference to a sadly forgotten file meta-format), this sounds like a nice concept.
Maybe separating the loader from the payload, to save the overhead…
But this sounds awfully like codecs (and datatypes before that on AmigaOS) or even XML DTDs/schemas, only this time the usefulness is restricted to Java applications.
Granted, I’d like to see an implementation of all this, but I’m not sure it would be that useful or successful.
2006-02-22 11:54 pm
snozzberry
There are some compelling arguments for multiple filesystems on a disk. Half the reason for the original Macintosh data/resource split was that the resources were stored in a more readily accessible location for fast retrieval.
What we seem to be talking around here is that OSes, applications, and documents have different needs for access. One filesystem to rule them all is a good idea, but ends up compromising heavily.
OSes need secure access, journaling/redundancy, and the ability to recognize/survive minor corruption.
Applications and temporary files need fast access, caching, secure checksums, and optimization.
Documents probably need the least complex filesystem.
Why should all three share the same partition/scheme (and its fate)?
Edited 2006-02-22 23:55
2006-02-23 2:24 am
DigitalAxis
This sounds like the self-extracting .zip file, or the XM2EXE embedded player http://www.un4seen.com/ or the emoviX distribution (burn your movie to a CD, with a bootable player mini-distro).
Edit: After reading more of the article, I noticed that he already said this. And it sounds like what he wants is probably more like a file specification that can be universally read, rather than an executable.
In the end, won’t this add massive amounts of extra code no doubt duplicated across the file system?
I can see how this would work for devices- since you’d only likely have one, or a few components of the type on your system, having their own drivers would make sense, and seems to follow the USB idea of all devices identifying themselves.
With files, you’d end up with massive amounts of duplicate driver/viewer/player code. In the case of files, the “File Handler” idea seems better: one implementation of a handler for each type of file, allowing all programs to share the file handling code and abilities. What I have in mind is the following case: a very old version of a media player playing the very newest file format simply because you’ve installed the handler for… Theora Mark 3, and that old player can use the ‘video codecs’ handlers. Or whatever.
I get the impression that most OSes have SOME degree of this ability- I know Windows has it for audio/video codecs and Linux has it for a lot of file types- upgrade libpng, and new programs should theoretically be able to use it, as long as it’s binary compatible… And RiscOS, BeOS and SkyOS have apparently used the concept all over the place.
Edited 2006-02-23 02:35
2006-02-23 4:00 am
ojh77
I don’t see what this has to do with actual file systems – all he seems to propose is putting a decoder plugin in the header section of multimedia files. That has nothing to do with the underlying file system.
He also proposes using a stack VM like the JVM or CLR for the decoder, from which I can only conclude a serious lack of research behind this proposal. Given that the multimedia formats he seems most interested in are performance-critical, an abstracted register-machine code like LLVM would seem the only logical choice.
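The stack-versus-register distinction ojh77 draws can be illustrated with two toy interpreters computing `(a + b) * c`. This is a sketch of the encoding difference only, not a performance measurement: both the JVM and the CLR JIT-compile their stack bytecode to native code, so bytecode instruction count is not the whole story.

```python
def run_stack(program, env):
    """Toy stack machine: operands are implicit (taken from the top of the stack)."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(env[args[0]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()


def run_register(program, regs):
    """Toy register machine: each instruction names its destination and source registers."""
    regs = dict(regs)
    for op, dst, x, y in program:
        if op == "add":
            regs[dst] = regs[x] + regs[y]
        elif op == "mul":
            regs[dst] = regs[x] * regs[y]
    return regs


# (a + b) * c: five stack instructions versus two register instructions.
STACK_PROG = [("push", "a"), ("push", "b"), ("add",), ("push", "c"), ("mul",)]
REG_PROG = [("add", "r3", "r0", "r1"), ("mul", "r3", "r3", "r2")]
```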
2006-02-23 8:34 am
gdiaz
Plan9 (http://plan9.bell-labs.com) uses one protocol to speak to any filesystem, so you can write a filesystem the way you need in an easy and convenient way.
This idea reminds me of the other one about having the device drivers inside the device itself (sharing a common protocol so that all devices are able to provide their driver).
A rose-tinted world view… Companies need to compete, so there is no way of having such things (look at VESA, for example) unless all of them become open source :-), or at least use standard, convenient protocols and formats rather than proprietary ones.
But today even Microsoft will open their file formats, so the need for such things is shrinking over time.
Time will tell.
2006-02-23 7:14 pm
John Nilsson
Is this supposed to be anything like content negotiation (RFC 2616)?
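For comparison, HTTP content negotiation as defined in RFC 2616 (since superseded by RFC 7231 and then RFC 9110) has the client send an `Accept` header listing media ranges with optional `q` weights, and the server pick the best representation it has. A simplified sketch with hypothetical function names: it handles `q` values and `type/*` and `*/*` wildcards, lets the most specific matching range decide, treats `q=0` as “not acceptable”, and ignores other media-type parameters.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_range, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        ranges.append((media_range.lower(), q))
    return ranges


def _specificity(media_range: str, media_type: str):
    """2 for an exact match, 1 for type/*, 0 for */*, None if the range does not apply."""
    if media_range == media_type:
        return 2
    if media_range == media_type.partition("/")[0] + "/*":
        return 1
    if media_range == "*/*":
        return 0
    return None


def negotiate(header: str, available: list[str]):
    """Pick the available type with the highest q, judged by its most specific matching range."""
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in available:
        matches = []
        for media_range, q in ranges:
            s = _specificity(media_range, media_type.lower())
            if s is not None:
                matches.append((s, q))
        if not matches:
            continue
        _, q = max(matches)  # the most specific range takes precedence
        if q > best_q:
            best, best_q = media_type, q
    return best
```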