File systems need to change. Current file systems are horribly out-of-touch with the realities of what users need to effectively find, organize, and modify their vast quantities of files. Unfortunately, no major consumer OS vendor (Microsoft, Apple, various Linux distros, etc.) has had the foresight, the will, and most of all, the cajones to implement anything more elaborate than a small departure from the standard hierarchical namespace which we all grew up on and should rightfully deplore. Worst of all, the best suggestions for changing the current entrenched standard are incredibly toothless and feeble.
Not a single proposal I’ve read has ever started by considering the most important motivator of good file system design: how will the user interact with it? The tacit assumption that everyone seems to be making is that by making it easy for programmers to code for, the program can then present whatever “user-friendly” interface it wants to the hapless user. This is a damaging notion, and is born of the misplaced belief that any user-centered file system must necessarily be difficult to interface with for the programmer. False! Programmers must translate user intent into machine code. When the file system is created to support a user’s intent, the programmer’s job becomes trivial. This, then, is our task: design a file system sympathetic to the user’s weaknesses which nonetheless fits all the needs of programmers. For this I propose the Brendan File System, or BFS.
The Lost Art of Collating
The traditional hierarchical structure was introduced to create structure where there was none, in order to aid both the user and the programmer in categorizing information. Unfortunately, it became the only way to organize structure on-disk, and so became much more restrictive than the flat file system’s disorganized chaos. We’ve come to the point in OS design where over-simplification causes nothing but headaches and provides no measurable performance advantage. BFS throws hierarchy out the window. Instead, folders become “groups”, and files can meander about and show up wherever they’re needed, even in multiple places at once. Groups function much like folders, but a file is never tied to a particular group; files have an independent existence and can belong to any number of groups (or none!).
Let’s say a user is starting a new programming project. She first creates the group for the new project, optionally names it (who said a group had to have a name?), and starts adding some of the relevant documentation to the group. Need that primer on C++ templates? Add it to the group, and never mind how many other “copies” you have floating around in other groups of yours. When you’re done, you can just remove it from the group and you needn’t worry about it disappearing or being deleted before you need it again. There is, of course, a separate delete command available if you never want to see the file again. Certainly, all this can be done with folders and symlinks and the like, but it’s just so much cleaner this way.
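To make the idea concrete, here is a minimal sketch (in Python, with every name invented for illustration) of groups as pure membership: files exist on their own, groups only hold references, and removing a file from a group never deletes it.

class FileObject:
    def __init__(self, file_id, data=b""):
        self.file_id = file_id      # identity lives with the file, not with a path
        self.data = data
        self.metadata = {}          # optional name, type, keywords, ...

class Group:
    def __init__(self, name=None):
        self.name = name            # a group need not have a name
        self.members = set()        # references to file ids, never copies

    def add(self, f):
        self.members.add(f.file_id)

    def remove(self, f):
        self.members.discard(f.file_id)   # the file itself survives

# one file, two groups, zero copies
primer = FileObject(42, b"C++ templates primer")
project = Group("new project")
reading = Group()                   # an unnamed group
project.add(primer)
reading.add(primer)
project.remove(primer)              # still reachable through 'reading'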
Finding Simplicity in Complexity
Well, if you don’t have any directory structure, how on earth do you find things? You can’t use file names because there’s no guarantee you’ll remember the name (besides, who said your file had to have a name?), and you can’t use the file’s “location” because it doesn’t really have one, so what to use? The best answer is a good search mechanism. The Be file system basically nailed this with extendable attributes and live queries, and the BFS embraces this system whole-heartedly. Issue a query for all mail messages from Bob Smith and you get a list of all of them, updated in real time as new ones come in. But where the Be file system made these queries persist at the application level, the BFS lets users and programmers create and extend groups by adding a search pattern to their metadata. The user could create a group for storing all her HTML documents which would automatically be updated as new ones are created; the user need never physically add an HTML file to the group. Of course, you can make groups which are mixtures of a live query and user-placed files. In the HTML group, you could also stick your favorite browser for easy previewing. The possibilities are limited only by the complexity of the search and the imagination of the user. This has tremendous potential for the programmer as well, and will be further elaborated on in the section on the System group.
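A rough sketch of how a group extended by a search pattern might behave (the predicate form and the dictionary of per-file metadata are assumptions for illustration, not anything the design specifies):

# files represented here as {file_id: metadata} for illustration only
files = {
    1: {"type": "text/html", "title": "index"},
    2: {"type": "image/png"},
    3: {"type": "text/html", "title": "notes"},
}

class LiveGroup:
    def __init__(self, name=None, query=None):
        self.name = name
        self.pinned = set()     # files placed by hand (say, your favorite browser)
        self.query = query      # a stored search pattern, or None for a plain group

    def members(self, files):
        matched = {fid for fid, md in files.items()
                   if self.query is not None and self.query(md)}
        return matched | self.pinned

html = LiveGroup("HTML", query=lambda md: md.get("type") == "text/html")
print(html.members(files))      # {1, 3}; new HTML files appear automatically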
Metadata: Man’s Best Friend
Metadata is worth its weight in gold. There is absolutely nothing as important to a modern file system as metadata. Without it, you simply cannot find, catalog, or classify the vast amount of data present on even the lowliest consumer PC. To that end, there is absolutely no reason why file systems should in any way limit the type or quantity of metadata associated with a file. Once again, the Be file system shines in this regard. Storing small, keyed attributes, such as the file’s type (probably as a MIME type), the default application for the file, the various dates associated with it, and other custom types, is the only sane way to organize and classify data in a modern filesystem. To this end, everything about the file, including its name (and any localized names), should be stored in the auxiliary metadata file associated with every regular file on the system.
Of utmost importance is that there be no limit on the size or type of the metadata present. The metadata file is designed to be the way to identify a file. Keywords, document titles, authors, etc. should all be stored in the file’s metadata, allowing the user to issue searches for “all documents with author ‘Brendan’ that contain the word ‘money’”. Nothing less will suffice as the gateway to the vast quantity of files a user must deal with. Metadata searching is the biggest aid to productivity since the whip, and BFS will have it.
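As a sketch of what such a search could look like, assuming metadata is just a bag of keyed attributes per file (the storage shape and function names below are made up):

files = {
    7: {"name": "budget.txt", "author": "Brendan", "content": "where did the money go"},
    8: {"name": "notes.txt",  "author": "Alice",   "content": "money money money"},
    9: {"author": "Brendan",  "content": "no finances here"},   # a nameless file
}

def query(files, **tests):
    """Return the ids of files whose metadata passes every given attribute test."""
    return [fid for fid, md in files.items()
            if all(test(md.get(key, "")) for key, test in tests.items())]

print(query(files,
            author=lambda a: a == "Brendan",
            content=lambda c: "money" in c))     # -> [7]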
Some Groups are More Equal than Others
Okay, while it is all fine and dandy to have a system that is extremely easy to navigate for the user and which does away with many of the annoying constraints of modern systems, there is a vast swath of code out there written for the usual, hierarchical structure and which is not going to change overnight. Very well, since programmers demand a highly structured disk layout, we shall give it to them in the form of a System group. Like the “blessed” folder on the Mac OS and the root directory on Unix, this is where the off-limits system files and usual hierarchical structure reside. If a program gets a directory listing of “/usr/bin”, the file system will return all the files present in the group System -> usr -> bin. If a program deletes a file, it disappears from all the groups in which it currently resides. Do note that, although the System group looks just like “/” to the programs, the hierarchical structure is only skin deep. The actual layout on disk bears no resemblance to the hierarchical structure present in System.
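A toy sketch of how a path lookup might be translated into a walk through nested groups (the group names, nesting, and file ids below are illustrative assumptions):

system = {                                # the System group as nested dicts of groups
    "usr": {
        "bin": {"_files": [101, 102]},    # member file ids
    },
}

def listdir(path):
    """Resolve a POSIX-style path against the System group and list its files."""
    node = system
    for part in path.strip("/").split("/"):
        node = node[part]                 # a KeyError here plays the role of ENOENT
    return node.get("_files", [])

print(listdir("/usr/bin"))                # -> [101, 102]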
Beyond just providing a compatibility layer for older programs, the System group can play a greater role in helping applications find the files/information they need. For instance, even if the user doesn’t have all their applications in a single group named “Applications”, the System group could contain a group called “Applications” which would have a live query associated with it that found all application files on the system. Hence, no more expensive searching to find the location of a certain application and no more popping up a dialog asking the user to locate one. The same goes for other things of interest to programs, such as all removable media, all USB devices, all connected monitors, etc. Unix started the trend of making everything available through the file system; why not finish it?
Doppelgangers
So far we’ve divorced files from their locations, from their names, and from their metadata constraints. Why not divorce them from their file type too? Why not add a plugin architecture (technically not part of the file system, but it’s related) which could allow applications to install various filters for certain types of files? Why save a file once as a PNG and once as a GIF just to have access to both types for different applications? BFS and any related software above it will let applications install a PNG -> GIF converter to allow any application which understands GIF to open any PNG document. A program need not look only for the files it can understand, but simply tell the OS which file types it can accept and let the OS decide whether a given file can be converted to the appropriate type.
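A minimal sketch of such a conversion registry, assuming applications declare the types they accept and the OS picks a converter (all names and the stand-in converter are invented):

converters = {}                              # (source type, target type) -> function

def register(src, dst, fn):
    converters[(src, dst)] = fn

def open_as(data, src_type, accepted_types):
    """Return the data in a type the application accepts, converting if necessary."""
    if src_type in accepted_types:
        return src_type, data
    for dst in accepted_types:
        if (src_type, dst) in converters:
            return dst, converters[(src_type, dst)](data)
    raise ValueError("no conversion path from " + src_type)

# a stand-in converter; a real plugin would do the actual re-encoding
register("image/png", "image/gif", lambda png_bytes: b"GIF89a...")

kind, payload = open_as(b"\x89PNG...", "image/png", ["image/gif"])
print(kind)                                  # -> image/gif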
Beyond even this, BFS will allow users to create “doppelgangers” of various files whereby any modification to the original file will automatically trigger an update to the doppelganger. Imagine having an important group full of billing information for your company. Create a doppelganger of that group which gzip’s the group’s contents into a backup archive. As changes are made to the original files, the doppelganger is replaced with the compressed, updated files. The link between the original file and its doppelganger can be broken at any time, thus making it easy to create stationary snapshots.
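A sketch of the doppelganger idea, with gzip as the derivation and the change notification simulated by an explicit call (a real implementation would hook the file system's update path):

import gzip

class Doppelganger:
    def __init__(self, transform):
        self.transform = transform      # how the copy is derived from the original
        self.copy = None
        self.linked = True

    def on_change(self, new_contents):  # would be driven by the file system
        if self.linked:
            self.copy = self.transform(new_contents)

    def break_link(self):               # freeze the current snapshot
        self.linked = False

backup = Doppelganger(gzip.compress)
backup.on_change(b"invoice #1")         # the copy tracks the original
backup.break_link()
backup.on_change(b"invoice #2")         # ignored; the snapshot stays as it was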
Implementation Details
The user interface for all this should probably be much like the spatial Finder of Mac OS 9. However, the multitude of windows spawned should be cut down to one. A zooming interface wherein groups and their contents zoom out to let you inspect them is probably better suited to navigation. The desktop metaphor works very well with the group metaphor in that the desktop is always your temporary dock where you can toss stuff and then wipe clean again. Note that tossing files onto the desktop does not remove them from their previous groups. You need to issue a “remove” command to make that happen. Groups should probably be displayed as a clump of the files they contain. This allows the user to quickly glance and see which groups are fuller than others.
Files, without a location, without a specific group to belong to, without even a name, should be identified at all times by a unique number, probably 64 or 128 bits in length. This number should be unique to the volume that they are on and should be intelligently used by the system to find a given file and to index information about a file. For instance, there might be various B+trees scattered about the disk, all of them containing a portion of the inode lookup tables (sort of like the directory structure of today). A file whose number ends in, say, sixteen one bits would be found in the B+tree associated with that suffix, and so on. The file number should be persistent so that applications can store the number in a file and then find the file again after a restart. Note that the file number does not depend at all upon which group(s) the file resides in.
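For example, routing a file number to one of several index trees by its trailing bits could look like the sketch below (sixteen buckets and plain dictionaries stand in for the B+trees; the exact split is an assumption):

TRAILING_BITS = 4
buckets = [dict() for _ in range(1 << TRAILING_BITS)]   # stand-ins for B+trees

def bucket_for(file_id):
    return buckets[file_id & ((1 << TRAILING_BITS) - 1)]

def insert(file_id, inode):
    bucket_for(file_id)[file_id] = inode

def lookup(file_id):
    return bucket_for(file_id)[file_id]

insert(0x1A2B3C4D, {"extents": [(100, 8)]})
print(lookup(0x1A2B3C4D))               # found without touching the other buckets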
Metadata is likely to take the form of a small file and can probably be efficiently packed into blocks with other small metadata files. Various techniques like placing the first couple of bytes directly after the file inode should also help. Metadata journaling should be present as well.
Finally, the elimination of a hierarchical structure should free the implementor from having to worry about unbalanced directories, strange create/delete patterns that strain block allocation policies, etc. Further, no centralized directory need exist; all information for a section of the disk could be stored at the head of the section like ext2 and could be easily recovered and indexed at boot time.
The Final Retort
As I have said previously, the current hierarchical structure of file systems is unnecessarily constraining. Creating more free-form storage allows the user or the System group to create hierarchy when necessary and refrain from doing so when unnecessary. The end result is a system more suited to a user’s organizational skills and less dependent on the fragility of keeping a location and name for every file on the system. The addition of superior metadata handling is the coup de grâce which creates order out of chaos and fully exploits the human ability to describe rather than identify.
I Want to Know More
Raskin, Jef. The Humane Interface. Addison-Wesley, 2000.
An excellent introduction and forward-looking view of human-interface design. A must for any serious programmer dealing with GUIs. Check out his website http://humane.sourceforge.net/home/index.html.
Giampaolo, Dominic. Practical File System Design. Morgan Kaufmann Publishers, Inc., 1999.
The creator of the Be file system tells you everything you ever wanted to know about creating a file system. Very interesting read. Buy it at Amazon.
Reiser, Hans. Name Spaces as Tools. http://www.namesys.com/whitepaper.html
The creator of the ReiserFS talks about his vision for the future of file systems. Rambling, at times completely incoherent, but holds a couple of jewels of insight. Be sure to check out the section: “Storage Layers Above the FS: A Sure Sign the Developer Has Failed”.
About the Author:
“I am currently a junior at the University of Wisconsin – Madison studying Mathematics. In my spare time I program for fun and profit for a small company doing graphical logging of roadways. I also manage to inflict on the world my limited theatrical talents by performing in plays, musicals, and the like.”
It’s an interesting idea. Perhaps one of the many hobbyist OS projects (e.g. BugOS, Mechanix, etc.) could pick up on this while they’re still in the early stages. I know I’d at least be willing to give it a shot.
That’s cojones, I think… cajones means boxes.
Anyway, with Dominic Giampaolo working for Apple, I’ve got a secret hope that he is making BFS++ for OSX…
In fact, the most interesting thing I’ve read on this site in months. It would be nice if something concrete comes out of this article.
Check out how directories are handled under Plan 9, acting very much like your ‘groups’ idea, except there they are called union directories… for example…
Under Plan 9, with a boot floppy and an install CD, the floppy is mounted at /n/a: and the CD at /n/dist. I can effectively create a 90% fully functional Plan 9 without installing to hard disk; I merely bind /n/dist/386/bin /bin (I believe that’s the syntax, I haven’t used P9 for a while), and the like for the library directories. So now if I look for a program in /bin, it looks at /n/dist/386/bin too automatically.
Which extends the way you can use resources off a network: simply binding them off the network to your desktop, you can seamlessly use these resources (such as devices, file stores, etc), as if they were on your own computer.
Neat, huh?
Well, IMHHHHHO, the author is a bit confused about how a modern OS works. The things discussed in his article belong in different layers of abstraction. There are things that go into the application level (Doppelgangers, for example); some things should not be built into a FS, unless you want to bloat the driver to the point where you need a 700MB CD just to make a boot disk to a command prompt.
It’s true that too much abstraction ends up bloating everything, but I don’t think that’s a reason to throw it all away and add application functionality into a filesystem driver.
Thanks for the article; it has great ideas. However, here are some suggestions and things to think about:
1) BFS is already taken as a filesystem name (the Be FS)…
2) Many of the things you talk about are already implemented in the original BFS (imagine that). Metadata is present, and like many *nixes, the files are not explicitly tied to a directory structure (they are like inodes). Which, in fact, is how filesystems (except for FAT) have been implemented for years. Unix’s files are only references to inodes … you can have a whole bunch of hard links that describe the behavior you talked about. However, I do agree with you in that there should be better ways to access, sort, and categorize files and “groups”, and so far not many OS’s have come close (Be did with its BeFS).
3) Check out what MS is trying to do with WinFS and their UI. I don’t know a whole lot about it, but it seems like they are trying to do something like this.
4) If you haven’t experienced the BeOS I would suggest you study it and its filesystem, because it is the filesystem that matches your vision the most. You could even help with the OpenBeOS project if you are interested and want to learn a whole lot more 🙂
Ok, reading this article hurt my brain because it’s pretty clear that the author has some serious misconceptions about how current file systems work.
She first creates the group for the new project, optionally names it (who said a group had to have a name?)
Well, you’re right in the respect that not everything needs naming, but files and the containers that hold them need SOME KIND of metadata to identify them. A name is the easiest piece of metadata to bring to mind. A time/date autogenerated by the system -might- be better, but if I was looking at a desktop with 5 or 6 ‘groups’ as your BFS vision implies, I’d rather identify them by a name than by any other piece of metadata.
Certainly, all this can be done with folders and symlinks and the like, but it’s just so much cleaner this way.
Warning sign #1 that this author has a serious misunderstanding of how file systems work…
The best answer is a good search mechanism… …But where the Be file system made these queries persist at the application level, the BFS lets users and programmers create and extend groups by adding a search pattern to their metadata
Ok. So now it’s easier for me to manually meta-data all of my files rather than just keep them in a hierarchy? And don’t get me started on auto meta-data generation. The current state of the art absolutely sucks. I work with a lot of digital audio. Am I supposed to meta-data each of these files, besides naming them, just so that I can find them again via a search? The computer sure as hell isn’t going to be able to describe it as well as I can. Hell, OTHER PEOPLE can’t identify very important sound files (like drum samples) that I have to keep organized to be at maximum productivity. Manually meta-data’ing these files isn’t going to help make me more productive.
Very well, since programmers demand a highly structured disk layout, we shall give it to them in the form of a System group… …The actual layout on disk bears no resemblance to the hierarchical structure present in System.
Warning sign #2 that this author has a serious misunderstanding of how file systems work…
Imagine having an important group full of billing information for your company. Create a doppelganger of that group which gzip’s the group’s contents into a backup archive.
I believe this is already possible with named pipes.
Files, without a location, without a specific group to belong to, without even a name, should be identified at all times by a unique number, probably 64 or 128 bits in length.
Warning sign #3 that this author has a serious misunderstanding of how file systems work…
Finally, the elimination of a hierarchical structure should free the implementor from having to worry about unbalanced directories, strange create/delete patterns that strain block allocation policies, etc.
*blink* *blink* You just advocated B+trees above this!!! People don’t build tree structures for the fun of it. They do it so that it will make searching WAY, WAY faster. Balancing vs. tree depth will always be a concern. Block allocation policies are more a function of wanting to have a file laid out on disk in a linear fashion than anything having to do with hierarchical file systems. Disk rotational latency is what will DESTROY a FS’s performance faster than anything, and that can’t be solved by simply organizing the logical file layout differently.
Further, no centralized directory need exist; all information for a section of the disk could be stored at the head of the section like ext2 and could be easily recovered and indexed at boot time.
How do you think it works now? That there is some monster chunk of data that describes every directory on the system and what files are in each at the beginning of the disk? No. It’s distributed the way you describe _now_.
*sigh*
Ok, here’s the problem: you are confusing two very important concepts. There is a physical layout to a file system and then there is a logical layout to a file system. They are connected and some compromises need to be made between them, but almost nothing of what you describe above is new. In fact, 90% of what you describe is EXACTLY HOW THINGS WORK RIGHT NOW. The logical hierarchy of files in folders in folders in folders has little bearing on the physical layout of a filesystem. In the rare cases it does, it’s as a performance tweak. (Having different partitions for parts of the file system is, of course, a different case.)
Think of it like this: physically, the disk is just a huge pile of files, some are fragmented, but none organized. Each has a unique number. Some of these files describe groups of files. Some of the files in this group are themselves also groups of files. This is how things currently work. The groups are directories.
Above the disk platter, things are different. The system maintains a ‘logical’ filesystem, which is the one you are familiar with. It maintains links from an imaginary root to a set of upper level groups (directories) and where and what disks they are on. These upper level groups (such as ‘etc’, ‘usr’, ‘var’, and so on) in turn are just files on the disk that are lists of other files and other groups.
Since you clearly seem to believe that the logical layout of the file system is somehow intrinsically linked to the physical layout, I want you to consider this: under unix, a user can mount a filesystem on a totally separate disk anywhere in his or her system; in the root, or 8 levels deep in a directory. If this is possible transparently and with no impact to performance, how is it possible then that the logical layout of a system is dictated by what programmers think would be easiest to program? (hint: IT ISN’T)
There’s a TON more to rip on regarding how unusable this system would be for those of us who deal with a dozen different projects on a single day, 3 dozen during a week, and how it has terrible consequences for creating simple UI’s.
“Ok, reading this article hurt my brain because it’s pretty clear that the author has some serious misconceptions about how current file systems work.”
But… he’s talking about something completely new, and different from current modern file systems! Why would he have to have any understanding at all of current ones to develop something completely new? He only needs to understand the system ‘below’ the file system that he has to interface with, i.e. the hardware interface.
Yes the author of this article demonstrates a poor understanding of how a file system functions.
But I think he makes some interesting comments about how the end user should be able to interact with his files.
I felt compelled to rant about the article for a while, but you’ve saved me the time.
“But… he’s talking about something completely new, and different from current modern file systems!”
His point was that current filesystems are already able to do most of these things. Or, more accurately: modern filesystems are already designed that way.
For the most part, I think the author is dead on. I wouldn’t say he’s come up with any incredibly revolutionary ideas, since Apple’s already implemented “groups” in their iApps/Safari, and the real BFS by Be did use metadata (although I thought it was a pain to use UI-wise). But these are good concepts that should be implemented in one way or another soon in a mainstream OS.
One thing you said which I have to laugh at is: “So now it’s easier for me to manually meta-data all of my files rather than just keep them in a hierarchy?”
Uh, hello! You have to manually keep all of your files in a hierarchy! The computer doesn’t save files and create directories for you, you know. What’s the difference between navigating a hierarchy to find the right directory, or just typing in a couple of choice keywords when you save a file? Time-wise, there’s little difference, and the keywords will make it far easier to retrieve the file later on.
Regards,
Jared
File metadata is a great idea, but I’m _more_ than comfortable with a tree; it’s just about doing some organization yourself, and things don’t get organized on their own.
bg wrote a very good response and I suggest you read it.
Let me state two things:
BG is quite correct that most of what you want is handled right now. I’d disagree with his Unix bias and point you towards something like z/OS (formerly MVS), which uses a database filesystem that includes most of the features you want (like automatic conversions, another layer of logical groupings on top of the hierarchy, metadata…).
Secondly, you have to have a hierarchical filesystem; it is forced on you by the physical disk. You may hide this from the user, but there is absolutely no choice about whether this layer exists. Here is why.
On any disk there exist only certain addressable points. To function well, the disk needs to go to an addressable point and do all reads and writes over the entire addressable area. We will call these “blocks”.
Because consecutive reading and writing of blocks is so much faster than random reading and writing of blocks, you generally want your blocks to be in sequence. So for example if you have 1k blocks and a 15k file, you want the file to have as few pieces as possible and to follow the block structure, i.e. the ideal would be 15 consecutive blocks for this file.
Because files may need to be larger than a single track, you may not be able to achieve this for all files. We will call consecutive sets of blocks “extents”. You need to be able to figure out what your extents are, what file they belong to, and in what sequence. So you must have a file_id which is unique across the disk and an extent sequence number which tells you what order the extents (which make up the file) go in.
Then at the very least you must have a simple extent table:
file_id, extent_sequence, begin_block, size_blocks
This table will be huge for any reasonable sized storage device, in particular much too large to search sequentially. So you’ll need to create an index on this table. The only logical thing you would want to search on is file_id. An index on file_id is a hierarchical filesystem!
You can see that we made no choices here and generated a hierarchical filesystem perforce. You may or may not make this visible to your users, but underneath the covers it must exist. The only remaining issue is how far you want to abstract your users from it.
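To illustrate the point, here is a toy version of that extent table with an index on file_id (Python's bisect standing in for a real B-tree; the field names follow the table above, and the entries are made up):

from bisect import bisect_left
from collections import namedtuple

Extent = namedtuple("Extent", "file_id extent_sequence begin_block size_blocks")

# the extent table, kept sorted by (file_id, extent_sequence)
table = sorted([
    Extent(7, 0, 1000, 8), Extent(7, 1, 5000, 7),
    Extent(9, 0, 2000, 4),
])
keys = [(e.file_id, e.extent_sequence) for e in table]

def extents_of(file_id):
    """All extents of a file, in order, without scanning the whole table."""
    i = bisect_left(keys, (file_id, 0))
    out = []
    while i < len(table) and table[i].file_id == file_id:
        out.append(table[i])
        i += 1
    return out

print(extents_of(7))    # cheap even with millions of extents, thanks to the index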
Uh, hello! You have to manually keep all of your files in a hierarchy!
Except that:
1) most apps remember the last place you accessed files so I don’t have to do jack when I’m working within the same project context (folder).
and
2) I can navigate the deepest places in a hierarchy MUCH faster than I can type even 5 or 6 metadata name/value pairs, and it’s not for want of typing speed.
Why would he have to have any understanding at all of current ones to develop something completely new?
I’m a huge believer in rebuilding from scratch when I think that I could apply a better design to something (often to the frustration of my managers and fellow programmers), but no serious programmer advocates rebuilding something completely without first checking out how the problem has been solved before. Anyways, that’s not why I mentioned that. I mentioned it because he’s describing exactly how modern filesystems work for the most part. The only thing he’s describing that’s new is a UI concept.
But I think he makes some interesting comments about how the end user should be able to interact with his files.
Ok, then. Here’s how to create this vision under Windows: first, never navigate your drives. Just create folders on the desktop. If you want to have files in multiple folders in there, just right-drag and ‘create shortcut’ from one folder to the other. Voilà! Except for the metadata ‘magic’ that everyone seems to think would make this concept work, that right there is exactly how his system would work. Heck, the search functionality in Win2K and WinXP allows you to search on name, date, and whether the file contains specific text, among other things. Since that covers most automatic generation of metadata possible, we’re already 97% of the way to the author’s vision.
That sounds like UI hell to me. But since others feel it’s a revolutionary idea it’s a good thing it’s already here and now.
I love the ideas presented, even if some of them are already in some form of existence; bringing them all together as described would be interesting to see and great to have.
I see many people assuming he needs to know how modern file systems work; not so. He is on a completely different paradigm than many people seem to be.
When a paradigm shifts everything goes back to square one.
Thanks, jbolden1517. But I think you’re over their heads, even.
You’re referring to having a block tree for the file so that you can seek to a specific location in the file in logarithmic time, right? Rather than having to skip->skip->skip through the file’s extents looking for the specific block in question?
The author isn’t even at that low a level. I think he’s advocating a flat logical file system rather than the physical layout. I doubt he’s even put much thought into the physical aspect besides mentioning a ‘B+tree’.
First off, I believe this is more an application-level problem than a FS problem.
As an example: newdocms is a proof of concept that does a lot of this stuff completely in userland.
In my mind, for this to work two things are needed:
– very well designed interface for apps to “find” and use files. (how to create/combine queries, how to be alerted if a file has changed, etc, etc)
– a very open and generic design of how the metadata is structured, and how relationships are defined between them.
How this gets stored on-disk is an implementation detail. I think a combination between a traditional filesystem and a mini-database could be enough. Of course if your filesystem supports more advanced features, you could always take advantage of it (attributes -> EA, transactions, etc).
One thing that is (in my mind) very important, is that metadata should be automatically collected. In my mind, some kind of helper/plugin system should be used to extract metadata from certain recognized filetypes, and keep them in sync!
As an example, an unnamed (as in filename) mp3 is put on your system. The attributes will automatically show a “title” attribute that corresponds to the title id3 tag. Change the attribute, and the id3 tag in the file gets updated, and vice versa. Or the title tag in a html file.
In the same way, I’d expose “traditional” metadata that the underlying filesystem offers, like filename, path, modification time, owner, etc. That way, views can be constructed that reflect the traditional filesystem, or something completely different.
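A toy sketch of that helper/plugin idea (the plugin interface and the simple key=value "tag format" are inventions purely for illustration; a real system would register extractors for ID3, HTML titles, and so on):

extractors = {}                 # mime type -> (read_tags, write_tags)

def register(mime, read_tags, write_tags):
    extractors[mime] = (read_tags, write_tags)

# toy "tag format": key=value lines making up the whole file
def read_toy_tags(data):
    return dict(line.split("=", 1) for line in data.decode().splitlines() if "=" in line)

def write_toy_tags(tags):
    return "\n".join(k + "=" + v for k, v in tags.items()).encode()

register("audio/toy", read_toy_tags, write_toy_tags)

data = b"title=Untitled\nartist=Unknown"
read, write = extractors["audio/toy"]
attrs = read(data)              # file -> attributes (shown to the user as metadata)
attrs["title"] = "My Song"
data = write(attrs)             # attributes -> file, keeping the two in sync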
So you have a file that has an arbitrary number of various attributes. Now, based on that, I’m trying to think of various ways in which queries could exploit relationships between attributes.
For instance if you have a picture of a cat and a dog that have as attribute: “is a mammal” and you have a file named “mammal” that has “is an animal”, could you then create a query “show me all animals” that returns both the dog and cat?
This is a “contains” relationship, but are there maybe other types of relationships possible that have completely different semantics?
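One way such a "contains"/"is a" relationship could be evaluated (the data is just the cat/dog/mammal example above; following the chain upward is one possible semantics, not the only one):

is_a = {
    "cat.jpg": "mammal",
    "dog.jpg": "mammal",
    "mammal":  "animal",
}

def matches(name, wanted):
    """Follow the is_a chain upward and see whether it reaches the wanted category."""
    seen = set()
    while name in is_a and name not in seen:
        seen.add(name)
        name = is_a[name]
        if name == wanted:
            return True
    return False

print([n for n in is_a if matches(n, "animal")])   # cat.jpg, dog.jpg, and mammal itself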
From a user point of view:
– A lot of metadata should be automatically collected, because you don’t want to place the burden on the user to keep all this information in sync.
– Users should be able to define how relationships between objects are interpreted (“when I ask for ‘animals’ does it return only the stuff in ‘animals’, or also all the stuff in ‘mammals’, ‘birds’, ‘fish’, …”), for truly flexible/limitless systems. Otherwise all you’re creating is “just a different hierarchical filesystem”.
Make sense? (it’s extremely late here, sorry for rambling)
“never mind how many other “copies” you have floating around in other groups of yours. When you’re done, you can just remove it from the group and you needn’t worry about it disappearing or being deleted before you need it again.”
We already have that in UNIX (Linux, *BSD, Solaris, AIX, etc). Those are called links. And no, not symbolic links – real, hard links.
So we already have “copies” of the same file in different places. It’s not used that much, because it’s not nearly as useful as you seem to think.
You have nice ambitions, but you don’t seem to understand how current file systems work. I think that’s something you should know before you trash them.
You’re referring to having a block tree for the file so that you can seek to a specific location in the file in logarithmic time, right? Rather than having to skip->skip->skip through the file’s extents looking for the specific block in question?
Sort of. It’s more the problem of having to search through the entire table of extents. For example, let’s say a fairly standard 20 gigs of data + programs has 300k files in it. Let’s say that makes something like 2m extents. If we assume 50 bytes per entry in the extent table and this table was stored in memory, it would consume say 100 megs of ram. If you used sequential reads here, then every time a file was accessed the system would have to read through on average 1m extents, or about 50 megs worth of data, to figure out where the next read is. To simply read an average sized file would mean processing about 350 megs of data (since we were assuming roughly 7 extents on average for the average file); reading a large file could require processing 10+ gigs of data.
So of course you don’t do this, and instead have an index. An index is a hierarchical tree, and the only possible field to index on is file_id. That was my point about how an index on the extents table is a hierarchical file system.
The physical layout itself doesn’t involve files at all. That’s all part of the 2nd logical layer:
layer 0 = blocks of data on a device
layer 1 = extents as parts of files
layer 2 = files in a hierarchy
That is, he has a hierarchical filesystem unavoidably.
What Unix does (which is quite sensible but not the only option) is to have the index be somewhat user oriented so that “file_id” becomes natural i.e. file path + file name.
Unix layer 2 1/2 = files from multiple devices in a universal index.
The other alternative which is more consistent with what he was after would be building directly on layer 2:
layer 3 = abstract files in a database
which is essentially what mainframes do. The important point is that mainframes still have layer 2; it’s just that most user apps interface with the system at layer 3, not at layer 2, the same way that most Unix apps interface at “layer 2 1/2” and not the other layers, with a few exceptions like dd which can operate on layer 0.
I still like my idea that on the user-interface end of this, we use XPath for searching the file system. This would likely only be for the command line, but maybe the finder-thingy gui would have an optional “Go To” box that lets you type in an XPath.
http://www.cogsci.ed.ac.uk/~kowey/loria-personal/xpath_filesystem.h…
I guess it’d be like using zsh’s file globbing, except with metadata
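As a rough illustration, here is the limited XPath subset in Python's standard library run over an invented XML view of file metadata (the XML layout is purely hypothetical):

import xml.etree.ElementTree as ET

volume = ET.fromstring("""
<volume>
  <file id="7"  type="text/html" author="Brendan"/>
  <file id="9"  type="image/png" author="Alice"/>
  <file id="12" type="text/html" author="Alice"/>
</volume>
""")

# the "Go To" box: all of Alice's HTML files
for f in volume.findall(".//file[@author='Alice']"):
    if f.get("type") == "text/html":
        print(f.get("id"))                     # -> 12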
Well, I read through, and my comments are as free-form as the author’s filesystem.
“The tacit assumption that everyone seems to be making is that by making it easy for programmers to code for, the program can then present whatever “user-friendly” interface it wants to the hapless user. This is a damaging notion, and is born of the misplaced belief that any user-centered file system must necessarily be difficult to interface with for the programmer. False! Programmers must translate user intent into machine code. When the file system is created to support a user’s intent, the programmer’s job becomes trivial.”
Well one problem in HCI design that people forget (on both sides) is that the reason things are complex on one side while *appearing* simple on the other is because the world is a messy place, and humans are the masters of ambiguity. A programmer has to work hard to make a box of silicon and metal deal with that.
“The traditional hierarchical structure was introduced to create structure where there was none, in order to aid both the user and the programmer in categorizing information. ”
Not quite. To forestall a long discussion, look around you and see the sea of hierarchies, from org charts to auto repair manuals. Nature’s structure is simply a bit looser than what machines can deal with.
“Finding Simplicity in Complexity”
Sounds a bit like vFolders with a live component.
Reminds me of a spreadsheet I used to use. It had live feed connections and backend database querying.
“Doppelgangers”
Basically, synced copies.
There are also the problems inherent to almost any “conversion”, usually loss. One will have to make the original complete enough to do any transformation: i.e. element paths, object properties.
In summary all it seems he’s advocating is a free-form database. It seems we’re already heading that way.
Personally, I like the idea of a filesystem with a search interface instead of a hierarchy, but there are two major problems:
1) Sometimes, I have trouble finding things on Google for two reasons: a) can’t figure out and/or remember the specific query formulation to get the information that I know is there, and b) search terms that are too generic and return loads of the wrong results.
2) I don’t care about meta-data! Occasionally I search by modification time. Because of the execrable design of Windows, I often search for filenames on that platform. But mostly, I care about the *contents* of files. For text, glimpse works okay for me.
But what about non-text data? How do you search on the contents of a picture? An OGG stream? An animation? Textual meta-data? See #1 above. And who’s going to enter it?
Oh yes, now that FreeBSD (in 5.0) supports attaching arbitrary meta-data to files, it mostly works like this hypothetical filesystem. (See comments above.)
[email protected], Have you done much development in XSL? I have. It’s … ahem … interesting. Effective, but mind-warping.
I think that’s a neat idea, but I think it’d scare off even the most hard-core of CLI junkies.
jbolden1517: I think I disagree. I need to go do a little reading before I open my mouth again, though. Specifically, I think I disagree that most file systems maintain a tree that describes the file system hierarchy separate from the logical fs layer. But again… I need to do some refreshing before I’m going to step up to the plate on that.
Interesting article (loud) and even better responses.
I will add some points.
Typically many novice users (most of my relatives) will not have that many files on their PC versus the huge number that is in the OS; i.e. they will have family pics, some mp3s, some docs, etc., plus lots of crap that the web brings along. Managing this shouldn’t be too difficult for these users since they only really understand the files they put there and ignore everything else. I just deleted 12K junk cache files off my wife’s PC! They should be protected from seeing any of the critical OS/App files, and most OSes make you see far too much. Idiot mode should have the OS/Apps look like a few untouchable files the way MacOS used to. I always liked that MacOS allowed all App resources to hide inside one App file, yet ResEdit could pop it open. None of the OS/App contents has any business being seen by the novice user. And OSes shouldn’t be spying, growing, monitoring, etc.
Now folks like us probably collect everything under the sun for future reference, so I easily have 100K files in reach, mostly not the OS, and even with the amazing BFS it is a pain to manage. I was just searching the BFS by date and it never got past 10%, grinding along at a few files per second (on a new HD as well). I don’t think even BFS had huge numbers of files in mind when it was built, even though it boasts 64 bits.
Now there are other users, like chip designers (or pick your own interest), who manage 100M or more objects, and they certainly can’t rely on the FS to provide the database. Those SW authors who rely on the FS for storing and managing these mega DBs are dooming the user to the worst experience, and those tools go away. So in these situations a special-purpose DB+FS is built on top of the OS general FS.
I think that for each type of user file we should just stick with well-optimised apps that can manage 100K+ files and leave the OS FS the chore of holding a few index DB files and the managed hierarchy of data files. For instance, I can’t conceive of, say, Windows ThumbsPlus being useful for managing anything but picture DBs; it has so many useful tools like sort by similarity.
Even as FSs get more complex to be able to manage mega file collections, the nature of file storage could well change to make the assumptions silly. You might have a few GBytes of DRAM or Flash storage; the speed difference should be amazing. I would venture that in a few years, HDs will be relegated to storing less critical, low-usage data like music and pix, and that the critical OS and FS structures could go into solid-state MRAM, Flash, TEM, etc.
I think that on MacOS with HFS you can find a file system that has some good properties.
The name and directory are only file attributes; files are indexed by numbers in a BTree. Searching for a file is pretty fast. Also, you can change where an application is installed and it will still work, because you have only changed one of its attributes, not its number (hey Windows users, look at this feature -> moving apps).
Searching for a file is much more powerful than in Windows; by default you can look for a file that contains the given words instead of “*thing*trick*.*” (a little archaic, I think). It’s fast because the filesystem is fully managed as a database instead of only a storage space.
Metadata are very powerful and can contain much more information than just a “dot three letter extension”.
On MacOS classic, you can also have a group attribute (giving a special color), but it is minimal. It’s also useful; for example, I set a particular group once my OS was installed so I can easily find any original file installed by the system versus the others. But this feature has been removed in MacOS X.
You did not write anything about links.
The Amiga has a good idea here: multilinks! Example: a multilink ‘bin’ which points to all your bin-dirs. When you type something at the commandline it will be looked up in ‘bin’. Many environment variables can be avoided this way.
Simon
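A small sketch of how such a multilink lookup might work (the directory list and helper name are made up; this is just the union-search idea in a few lines):

import os

multilinks = {
    "bin": ["/usr/bin", "/usr/local/bin", "/opt/tools/bin"],
}

def which(command, link="bin"):
    """Find a command by searching every directory behind the multilink."""
    for directory in multilinks[link]:
        candidate = os.path.join(directory, command)
        if os.path.exists(candidate):
            return candidate
    return None

print(which("ls"))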
/usr stands for Unix System Resources, unlike what many people think, unfortunately. I’m not sure if Red Hat was even aware of this, considering that they put desktops and browsers in /usr and not /opt even though they are not required for system function; unfortunately, Red Hat clones and spawns followed them off the cliff like lemmings.
On another note, this sounds too much like my apartment and how disorganized it has become!
An interesting article from the author, but I’d seriously suggest he reread the ReiserFS whitepaper.
In no way is it rambling or incoherent. It *does* use some pretty advanced concepts; I only fully understood it after reading it 4 or 5 times. I have a feeling that the parts Brendan thinks are incoherent are just parts that he’s not fully understood. I’d be happy to debate this with him, however.
A lot of these concepts are being worked on, the one that was really new to me was doppelgangers, but I have a feeling that it’d simply be smarter for the computer to transparently convert between formats on the fly, so if you try and embed a Gimp XCF into a webpage, it’s converted into the best format for the job for you, rather than having the user set up the conversions manually.
I really would suggest that you investigate set theoretic naming more fully however. That addresses a lot of concerns you have.
The area that ReiserFS has BFS beat hands down though, is that it’s addressing the issue from both ends. A lot of thought and research has been put in to the design, both from the user perspective and how it will work. You can’t just brush that off. Building a filing system that does all these things while remaining acceptably fast is a nightmare – BeFS was plagued with performance issues, and they never fully realised their dream. That’s why ReiserFS up to version 4 has been fully focussed on speed and efficiency with small files (filing system objects). Bear in mind that a 128 bit identifier is useful only if your content is significantly longer than 128 bits – by no means guaranteed.
I like the idea of having “plugins” in the file system…
PNG->GIF, GIF->PNG
BUT…
Imagine you’re going to a folder (or group) with 10k PNG files… you open one of them, and the computer decides whether or not to convert (:]). That is, a program mustn’t depend on .png or .jpg; it must open any file, and the plugin will decide whether to convert it or not…
But if you don’t have that plugin [and converters are useless with a file system which supports plugins] you will need to download it… though a program could be shipped with it…
OK, let’s see… I buy myself Photoshop 12.5+EA0… It installs one plugin…
Now I buy PhotoDeluxe 10000 and it installs another plugin…
1. Those plugins may conflict with each other.
2. There may be an error when both plugins are “converting” the same thing.
3. There’s no renaming mechanism in those plugins… because when one plugin converts a file, you won’t need it converted all over again the next time, so it looks for the file that it converted and the system gives a command not to convert [bla bla bla… etc.], so it can’t rename it [it’s really difficult to understand me here].
It will be like dependency hell on *n*x =]]
[I like the idea about converting .doc & .xls…]
As odd as it may sound, many of the ideas Brendan has proposed can be seen in JRiver Media Jukebox.
It has metadata (ID3 tags), it has flexible groups (album/artist/genre), and it doesn’t care where the file really is or what the filename is; the MP3s could just be a bunch of numbered files sitting on the root of the HD for all it matters.
The way I see it, you don’t need a new FS at all to do this; any pre-existing file system would work, even FAT32. All you really need is a different file manager UI. You put all the files on the root of C: and give all the files a unique number. The metadata would be stored in the file itself (i.e. ID3 tags), and a copy of the data would also be stored in a special database file with a file ID of 1.
When you want to work on a file you run the file manager (a special icon that is always present on the desktop) and find the file by either sorting the columns or opening a user-defined group (the groups a file belongs to are also a metatag).
When you double-click the file after you find it, the file manager figures out what program to open it with (an application’s metadata will also list what kinds of files it can open); the file manager tells the application the volume number and file number of the data file, and the application opens it.
There could also be something like the Windows start menu that contains groups pointing to all your applications. The file > run dialog in an application merely brings up a smaller version of the file manager.
The only required metadata is the file ID number and the file type. Everything else, including name and group, is optional.
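A sketch of that dispatch step: files carry only an id and a type, applications advertise what they open, and the file manager matches the two (every entry below is made up):

files = {
    1:  {"type": "database/catalog"},                  # the special catalog file
    17: {"type": "audio/mpeg", "title": "track 03"},   # an otherwise nameless MP3
}
apps = {
    40: {"type": "application/x-executable", "opens": ["audio/mpeg", "audio/ogg"]},
    41: {"type": "application/x-executable", "opens": ["text/html"]},
}

def app_for(file_id):
    """Pick the first application whose metadata says it opens this file's type."""
    wanted = files[file_id]["type"]
    for app_id, md in apps.items():
        if wanted in md.get("opens", []):
            return app_id
    return None

print(app_for(17))      # a double-click on file 17 would launch application 40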
Some people say that BeOS and MacOS already do this, but the biggest difference is that they still use named files, in named directories, not uniquely numbered files in a flat undivided space.
I have no clue about filesystems, beyond using them, but that doesn’t prevent me from posting some thoughts 😉 :
This on-the-fly conversion he talks about seems to provide very much the functionality of Amiga datatypes: every program which was able to use datatypes was able to open any given format for which a datatype was available.
These datatypes were a userspace thing and I think this is the right place for such a functionality. This shouldn’t be implemented on a system/driver level.
MfG
Sebastian
Is it just me, or is naming a file system after yourself egotistical? Anyway, BFS stands for Be file system, so maybe you should change it.
The author does seem a bit confused over how many file systems actually work.
I can pretty much do all this on any FS I choose given the right tools (NT4 + Index Server + Structured Storage wrappers, for example). Of course, building parts of this into the FS itself (as Be Inc did with BeFS) has advantages, and it looks like even MS realise this. But at the end of the day, he’s not proposing anything new or original. Just a natural evolution that’s already starting to happen.
Get it here: http://www.credit1.co.uk/charlie/xdbfs.pdf
Hi bg…
I have done some stuff with XSLT (not the other half, XSL-FO), and will admit it to have been a very unpleasant experience in general. But I thought the distastefulness came from its highly cumbersome syntax (why should many things, but not all, be in XML?) considering the application I wanted, not from the actual XPath bits.
Do you have further insight into the mind-warpiness that using XPath would produce?
To me, it’d be just like a regular CLI, except that “//” doesn’t mean the same thing as “/” anymore… and that I can search for metadata if I want to.
Most of the traditional literature assumes that “files” = text files or database files, with content that could be indexed by the computer. IMO we should adopt as the Standard Problem the handling of drum samples, as mentioned above. There is nothing in these files that can be indexed automatically, and a serious user may have thousands (maybe tens of thousands) of similar files.
The file name and date have to be the main identifiers, and the user cannot avoid having to give each file a well thought out name – ie. not just “hihat_004x”. Obviously long file names must be supported (ISO 9660 is no good), and 3-letter extensions should never be used to identify file data types.
Metadata is a Good Thing but you cannot require a user to enter extra information, and metadata modules can be a serious nuisance in a large directory. All files must be easily portable between OSes, so it is not really an option to bundle the metadata into the file itself. Anyone who has had to move font files from old Macs to another system will know what a pain the Apple forks are.
So IMO a metadata file should _optionally_ accompany the data (or program) file. It can include an icon image (with an alpha channel), a thumbnail if the file is itself an image, comments, etc etc.
As for physical versus logical, partitions and removable disks (usually CD ROMs nowadays) are certainly physical units and any file system must recognise this. All partitions should be treated as removable. Within a partition, any sorting system (such as directories) is arbitrary. But hierarchical directories are tried and tested. I think they work quite well for managing large numbers of sound samples, emails, fonts, etc.
I find the biggest nuisance is tracking down files that have been backed up onto CDs and are no longer on a local hard drive. There is no way you can keep all your audio data on a local hard drive.
Hiding system files? Maybe the user could be asked at install/unpacking time “Do you want to understand how the computer works?” If the answer is No, then hide the system files. Explain in the requester what is going to happen, in plain language.
This reminds me of datatypes on the Amiga, only slightly clunkier. Why have applications install conversion plugins individually when a universal system that understands the file type for the application would allow all programs access to all filetypes?
Granted, datatypes were read-only, but any modern take on the concept would have to allow writing too.
Is it just me, or is naming a file system after yourself egotistical?
I happen to know of a certain Finnish computer science student who named an entire OS after himself. 😉
Perhaps BrendanFS could be incorporated into Eunix? =)
This article is pretty much moot since Microsoft is already working on a new file system, very different from today’s.
If we listen to the naysayers, nothing new would EVER be implemented. Just ultra slow advances in technology.
For those naysayers, where is the hierarchical filesystem on my PDA?? Don’t tell me it’s a UI; tell me if it is BETTER for the user.
If so, then your high-and-mighty ‘well, it’s always been in the application layer, the author doesn’t know what he’s talking about’ is a worthless argument.
I don’t know how long it will take the Unix ‘community’ to see what Microsoft and Apple learned 20 years ago. You win the users first.
> 1) BFS is already taken as a filesystem name (the Be FS)…
That didn’t stop Be Inc.
BFS already stood for Boot File System.
I happen to know of a certain Finnish computer science student who named an entire OS after himself. 😉
No, as I remember, the guy responsible for the FTP site where the sources first appeared coined the name 🙂
-fooks
I’m a long time Mac user and I have used VMS, Unix, and Windows extensively. Each of their filesystems seems to have some traits which are useful, though there seems to be little innovation.
Here are some things I want to see in my ideal file system:
1. Filename case insensitivity – From a user perspective, it is really confusing to allow two files with the same name differing only in case.
2. Filename independence from file type – I hate required filename extensions. This is something that DOS and VMS had, and it forces user behavior that should be unnecessary. Extensions should be optional. When Apple decided that applications should use filename extensions in OSX I was really disappointed. This was one of the things I really liked about MacOS Classic.
3. Multiple file versions – VMS is the only commercial OS that has this feature (that I know of). Why has nobody copied this on other platforms? I don’t know how many times I have accidentally overwritten a file and was saved by having older versions of the same file.
4. MIME file types (or their equivalent) – BeOS had this one right. One of the issues of filesystems and OS’s these days is being able to successfully transport files between platforms and retain the file type. MIME is a recognized standard across all platforms.
5. File locking – In a multi-user project environment, there are times when many users might want to write to the same file. It seems that Unix relies on file permissions to handle this. This is insufficient. We should be able to write-lock.
6. File aliases – This is a MacOS feature which I really like. Symbolic links suck because once created, they always point to the same location, even when the file pointed to has changed location. In other words, they break. Hard links are pretty much indistinguishable from the original. MacOS aliases are halfway in between. You can move the target file and the alias still works. On the other hand, you can readily tell when you’re working with an alias vs. the original file.
7. File Creator – I still like the MacOS concept of a file creator which basically indicates which application is the default for opening a given file.
8. Generic Metadata – There should be a framework to store metadata with arbitrary content. MacOS HFS had resource forks and multiple file forks. I thought that was a pretty cool idea.
9. Metadata Exchange API – This would be a layer above the basic file system which would manage translation between various levels of Metadata content. The purpose here would be a standard way of importing/exporting files for use on other filesystems. Part of the problem with cross-platform compatibility is that different filesystems support different metadata(and very little at that). As a result, we default to lowest common denominator. Instead, a user should be able to define a default set of behaviors for interacting with different filesystems. These preferences would be applied any time a file is read or written in any application. For instance, preferences could indicate that all files be saved with a file extension. In addition to this, with support for generic metadata, files imported from other filesystems could have *all* of their metadata fully preserved.
Sorry if I rambled a bit, but I hope you got the ideas. I hate living with the lowest common denominator.
I don’t think that deleting a file in one ‘place’ should make it vanish from all the others. Garbage collection should automatically delete a file that has no existing references.
The stupidest article I’ve read in a long time. Folks, read bg, get a clue. A filesystem is not something the user interacts with; it is a way of storing and accessing raw data. Logical properties of files do not exist at the filesystem level except as metadata *which is non-interpretable for the filesystem itself*; data are data and that’s all there is to it from the FS POV. You know what, a tree is a vastly superior way of storing and searching data compared to a list or a graph. How information, as opposed to data, is accessed and manipulated is a higher-level decision. Just like you don’t flash the OS’s GUI into the BIOS, you don’t put information access methods in the FS. Possible, but a REALLY bad idea.
“I don’t think that deleting a file in one ‘place’ should make it vanish from all the others. Garbage collection should automatically delete a file that has no existing references.”
I generally delete stuff to make room on a full disk. Therefore I want it gone, there and then. No Trash Cans or delayed deletion, please. It’s funny how 40Gig drives seem to fill up faster than 40Meg drives used to.
I think that this author is having a hard time separating an interface (how one uses the tool) and its implementation (how the tool is actually constructed). In describing groups, he writes, “Certainly, all this can be done with folders and symlinks and the like, but it’s just so much cleaner this way.” But what is “this way”? The second magical attribute of this filesystem is robust support for a large range of queries. Later there is an implementation section, but it does not describe a fundamental architecture and only glosses over some potential issues.
I think that those critical of the hierarchical file system should begin with a better interface that utilizes existing file systems or databases. The success of these new interfaces would then drive a reevaluation of the existing filesystems/databases and lead to better ones, which in turn would be a foundation for more fantastic interface improvements.
I personally like hierarchy. The file system you describe (assuming it were possible) would be a nightmare to navigate from the command line, especially without unique names that give any clue as to what is in the file.
May I suggest the following:
-Keep hierarchy
-Allow for “public metadata”, which would be any type or size of data you want to attach to the file. This would be especially useful for file managers because they could store info on things like: what type of file it is (no more extensions, hence no more confusing file types because of their extension, take .dat files for instance), what program you prefer it to be opened with (the ability to assign programs to specific files rather than file types), and countless other uses that would be handy. Also, the metadata could be specific to file managers or programs handling the file. Just because one file manager stores metadata that specifies what icon to display for a file doesn’t mean a second file manager couldn’t store its own separate metadata for icon display.
-Allow the path to specify a tree rather than just directories. Ever tried to find files in folders that have hundreds or even thousands of files? It’s a mess. But all those files are dumped there because that’s where they have to be in order to be in the path. If trees were specified in the path, you would be able to group common files in their own folders while still remaining in the path. Take for instance path=/folderX/bin. Then all programs that need to be in the path have to be in /folderX/bin. But let’s say that you have path=/folderX/bin:2. In this case the “:2” would mean that /folderX/bin and any folder up to 2 levels down in the hierarchy would be in the path. That way files could be put in /folderX/bin/programX and still be in the path. All of a sudden /folderX/bin doesn’t have thousands of files in it, but rather is broken down into various folders.
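A rough sketch of how a resolver could honour such depth suffixes, assuming the PATH entries arrive already split into a list (since “:” is also the PATH separator on Unix, a real implementation would need a different delimiter); the “:N” parsing and the function name are hypothetical, not anything an existing shell does:

```python
import os

def resolve_command(name, path_entries):
    """Find an executable `name` given PATH entries that may carry a depth
    suffix, e.g. "/folderX/bin:2" meaning "also search up to 2 levels of
    subdirectories". Hypothetical behaviour, not that of any real shell."""
    for entry in path_entries:
        base, sep, depth = entry.rpartition(":")
        if sep and depth.isdigit():
            depth = int(depth)
        else:
            base, depth = entry, 0
        base_depth = base.rstrip(os.sep).count(os.sep)
        for root, dirs, files in os.walk(base):
            level = root.rstrip(os.sep).count(os.sep) - base_depth
            if level > depth:
                dirs[:] = []          # prune: don't descend any deeper
                continue
            candidate = os.path.join(root, name)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None

# resolve_command("programX", ["/folderX/bin:2", "/usr/bin"])
```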
The author brings up some good ideas, though I don’t think any of them are original. I would encourage everyone interested in file system design to consider both the IBM AS/400 storage system and the Novell Storage Services system in great detail, in addition to the usual Mac, Windows, Unix, and BeOS.
For example, the AS/400 will tell you how full a particular drive is, but will not tell you which files are on which drives. Unless you have a need to remove a drive, it should never matter where the file is physically stored. Although, you certainly need to know how full your system is, or at least know when you’re getting low.
Now Novell Storage Services (NSS) is a thing of beauty that I am certainly keeping my eye on. I feel that it’s still in its infancy, but is developing at a good rate.
NSS is not really a file system in the traditional sense, but an object database. Each file is stored as an object with a 32-bit or 64-bit number (NSS can be compiled with either) as an object ID. 32-bit numbers are faster for 32-bit processors. As systems move to 64-bit processors, NSS can grow with them.
The traditional file system “feel” of NSS is provided by filters between the object database and the OS. As far as the OS is concerned, it’s still just a file system, but enhancements to the OS could easily provide new features.
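As a toy illustration of that “filter over an object database” arrangement (my own sketch, in no way Novell’s actual design or code), the indirection might look something like this:

```python
import itertools

class ObjectStore:
    """Toy stand-in for the object database: every blob gets a numeric ID."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._objects = {}                 # object ID -> bytes

    def put(self, data):
        oid = next(self._ids)
        self._objects[oid] = data
        return oid

    def get(self, oid):
        return self._objects[oid]

class PathFilter:
    """The 'filter' layer: presents a traditional path namespace to the OS
    while the data actually lives in the object store under numeric IDs."""
    def __init__(self, store):
        self.store = store
        self.namespace = {}                # path -> object ID

    def write(self, path, data):
        self.namespace[path] = self.store.put(data)

    def read(self, path):
        return self.store.get(self.namespace[path])

fs = PathFilter(ObjectStore())
fs.write("/vol1/report.txt", b"quarterly numbers")
print(fs.read("/vol1/report.txt"))         # the OS never sees the ID
```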
NSS uses raw disk space as storage pools and you create virtual volumes within these pools. Most people would think of the volume as a “disk”, but it is really more like a grouping of objects that looks like a disk to the OS.
You can create many volumes in a single pool or create a volume that spans many pools (to make a huge volume).
Volumes themselves can have quotas and expand to fill the storage pool. It is even possible to “over-commit” the volume quota to a size larger than the storage pool, with the anticipation that you will add space to the pool at a later time. It is possible to add space to a pool on-the-fly.
In the most recent versions of NSS, you can even shrink a volume with too much free space, making it a smaller size. You’re then free to add that space to a new or existing volume. All of this can be done on-the-fly.
One of the new features that I expect to see in the future (possibly very soon) is the ability to replicate the object database to other servers, much the same way that Novell’s LDAP directory replicates its data to other servers.
I anticipate that you would replicate a storage pool to other servers (one or more, depending on how much redundancy you need or want).
It would be possible to have a master copy of a volume at corporate headquarters and have a replica at branch offices for performance… and off-site backup. Just like eDirectory, changes could be made at any location and replicated back to the other replicas. Also, if one copy becomes unavailable, the system might be able to automatically switch to another copy. Replicas can be added to or removed from a server at will.
Something like NSS is what I expect for future data storage systems. They need to be distributed and reliable, perhaps even scouting out new storage locations on their own and moving data from old and unreliable replicas to reliable ones automatically to ensure that your data is never lost.
Two points:
First, people do not think in terms of “opening a .jpg file” or “opening a .png file”, people open Images. I hear many people talk about “a Mozilla image” or “a Paint image”, using the standard application to open the filetype. Opening a file in another program then becomes an almost impossible task to some (most?) people. This proposal does not solve this.
Second, the first problem arises because, on Windows, the Explorer is used to open files. What is also complicating things is that two different hierarchies are maintained: the file system and the desktop menus/programs (in Unix this would mean multiple virtual desktops with their opened files). I think these two should be combined. A virtual desktop should be a specialised group (group as mentioned in the article), which can be opened. Then for a given project/filetype/whatever you have the display and the belonging files directly linked. Not sure if there should be hierarchical (or linked) virtual desktops.
Any thoughts on this?
PS I have completely ignored implementation issues here, since the article was about the user experience.
> 1. Filename case insensitivity – From a user
> perspective, it is really confusing to allow two files
> with the same name differing only in case.
It’s especially disconcerting when one is attempting to give a client/customer a series of filenames over the phone. One of the reasons I really like OS/2’s HPFS over many of the *nix filesystems available is precisely this reason — HPFS allows mixed case in filenames for aesthetic reasons, but considers filenames which differ only in case to be exactly the same insofar as commands are concerned.
> 2. Filename independance from file type – I hate
> required filename extensions.
So do I. Something similar to the creator information or associated application information stored in the resource fork by Mac apps or in the EA’s by OS/2 apps would be a better approach.
> 3. Multiple file versions – VMS is the only commercial
> OS that has this feature (that I know of).
OS2200 (the mainframe OS descended from EXEC 8), which runs on Unisys 2200-series and ClearPath IX mainframes, allows for both file (“element” in OS2200-speak) and directory (“program-file” in OS2200-speak) cycles, and all deleted past cycles of a file are kept intact until the directory is explicitly packed. This allows for keeping multiple concurrent versions with the same name on two levels.
> 6. File aliases – This is a MacOS feature which I really
> like. Symbolic links suck because once created, they
> always point to the same location, even when the file
> pointed to has changed location.
OS/2 also had the ability to dynamically track and update shadows (shortcuts) on the fly, even across physical devices.
> 8. Generic Metadata – There should be a framework to
> store metadata with arbitrary content. MacOS HFS had
> resource forks and multiple file forks. I thought that
> was a pretty cool idea.
It was. OS/2 Extended Attributes were similar, at least on HPFS filesystems (the EAs were stored in a single monolithic file on FAT filesystems, something which was a pain in the arse in many ways).
Thanks to all of you for taking an interest in my piece. Now, if I may clear up a few outstanding issues.
1) About the name, it wasn’t an ego trip, just a small tongue-in-cheek name. And the similarity in name to BeFS (or BFS if you want) was very intentional. I know I leaned on BeFS for a lot of the ideas, but I think I gave credit where due.
2) There have been a few posts about the need for hierarchy or file names. Hierarchy is available upon request, it is just not forced. I know things get disorganized in a hurry, but I work best with my desktop (Mac OS X) full of icons for what I’m currently working on. If that doesn’t suit you, you can always create new groups (like folders).
As for file names being more useful than metadata, in some cases they are, and you are certainly free to use them. However, in quite a few cases, file names are a complete hindrance. I download quite a few technical papers from online journals and they all have names like “icip00125.pdf”. That doesn’t help at all! But, if the system were nicer to me and the PDF reader simply cached the most common words or the actual title of the paper in the metadata, I could look up that same paper by its title, “Optimal Recovery”. Furthermore, the lack of free-form metadata has led many people to name all their images “1024x768BlueBlobs.jpg” or “20020124calculator.c”. Image dimensions and creation dates are two very easy pieces of metadata to store and retrieve and display to the user.
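On Linux filesystems with extended-attribute support you can already approximate this by hand; a small sketch (the “user.title” and “user.keywords” keys are arbitrary names I picked, not any standard):

```python
import os

paper = "icip00125.pdf"

# Tag the file with what a friendlier PDF reader might cache automatically.
# os.setxattr/os.getxattr are Linux-only and need a filesystem with xattrs.
os.setxattr(paper, "user.title", "Optimal Recovery".encode())
os.setxattr(paper, "user.keywords", b"image processing, recovery")

print(os.getxattr(paper, "user.title").decode())   # -> Optimal Recovery
```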
3) BG has made some good points on file system layout and the like. As I was starting from the user interface, the implementation was not high on my priority list, but I didn’t throw out terms without some thought behind them.
Current file systems are inexorably tied to the hierarchy they represent because they must parse paths into actual files. If I want to open the file “/usr/local/bin/myprogram”, and the system doesn’t have the location of that directory cached, it must descend the hierarchy on-disk, going from one directory to the next to find the inode of the file so that it can then go find the file. With a unique identifier, the system gets to decide how to lay out the hierarchy and hence can be made much more efficient. (I’m talking about the “directory” hierarchy, not the file location on disk.) Not to mention, a lot of filesystems try to keep files in one directory more or less together, which effectively increases the size of the blocks you’re trying to allocate and shuffle around to avoid fragmentation.
4) I never claimed that these were all my ideas. In fact, other than calling groups “groups”, I don’t think any of them are. They are, however, what I would like to see implemented.
5) And finally, my apologies to the Spanish-speakers out there . . . it should be “cojones”.
I have a couple of friends over at Microsoft and have heard rumors from other people that Microsoft will be using the SQL engine (in effect a database) for the next version of NTFS; maybe it will be called DBFS. This is the way of the future. DBMSs already spend a ton of time choosing where data will go, and Microsoft figures it can leverage that for use as their filesystem. I am sure there will be some overhead, but remember they do not have to have two teams working on what essentially boils down to the same problem: putting data on a disk. Additionally you can get very kewl features by using a DB – transaction logging, usage, lots of metadata, etc.
It obviously will not be easy, but M$ is working hard on it, and I am sure once they get it right after 2 versions, others will follow.
“When a paradigm shifts everything goes back to square one.”
This may well be true, but there are real physical and mathematical reasons why a lot of this is a bad idea.
First of all, the need for trees to be balanced is a real mathematical concern. Linear search SUCKS. I think everyone who has done any database building is well aware that if you store things linearly, and have to search it that way, you are in for a world of hurt as the number of entries grows. Even if you do sorting and splitting, it doesn’t help enough. I see no need to do away with trees.
As to the physical concerns: well, as has been pointed out, hard drives almost always have everything at the start of the drive anyway (barring logical partitions, of course). So his reasoning there is off base. It is already done that way.
A lot of his ideas are pointed in the right direction from the userland perspective. What he describes is very interesting as a UI, but poor as an FS.
“Unless you have a need to remove a drive, it should never matter where the file is physically stored. Although, you certainly need to know how full your system is, or at least know when you’re getting low. ”
Unless of course you are planning on reformatting, or you hear a strange screeching sound coming from one of your drives. Or your drives differ in speed and you wish to keep important/frequently used files on one and less frequently used files on the other, or a myriad of other possible reasons one might want to know which physical drive they are working with.
Well, even if I think most of this work is just BFS + ReiserFS, I think Brendan misses one important thing.
You *can’t* break compatibility, or at least you can’t until you get someone to rewrite all the rest of the apps for you.
I think that having an evolutionary model such as Reiser’s is useful.
At the moment they have plugin support and fast handling of small data; they developed a fast tree algorithm and incorporated many performance features (such as wandering logs). In the future they could even develop a db-query system, but who would use it until it finds its way into KDE/GNOME/ROX?
I second an old comment: BrendanFS could be useful for EUnix.
“3) BG has made some good points on file system layout and the like. As I was starting from the user interface, the implementation was not high on my priority list, but I didn’t throw out terms without some thought behind them.
Current file systems are inexorably tied to the hierarchy they represent because they must parse paths into actual files. If I want to open the file “/usr/local/bin/myprogram”, and the system doesn’t have the location of that directory cached, it must descend the hierarchy on-disk, going from one directory to the next to find the inode of the file so that it can then go find the file. With a unique identifier, the system gets to decide how to lay out the hierarchy and hence can be made much more efficient.”
Bull. This is specifically what I addressed in my post. The entire table of “unique identifiers” is going to be the same size as reading all the directories into memory, so if you can store one you can store the other. Scanning the “unique identifiers” sequentially is going to take way too long, so you will end up building an index on them. Once you’ve built the index, finding the file will be:
read the index block sequentially for the next correct link
get correct block of index
read index block sequentially
….
which is the same action as
read /usr directory to find location of /usr/local
get /usr/local directory
read /usr/local to find location of /usr/local/bin
Program name + path is a unique identifier.
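For what it’s worth, here is a toy sketch of that equivalence (all the names, numbers, and the two-level “index” are invented for illustration): both lookups are a fixed descent through blocks, one probe per level, whether the level is a directory or an index node.

```python
from bisect import bisect_left

# Path lookup: one directory "block" read per path component.
directories = {
    "/":              {"usr": "/usr"},
    "/usr":           {"local": "/usr/local"},
    "/usr/local":     {"bin": "/usr/local/bin"},
    "/usr/local/bin": {"myprogram": "inode 4711"},
}

def lookup_path(path):
    node = "/"
    for part in path.strip("/").split("/"):
        node = directories[node][part]     # one block read per component
    return node

# ID lookup: a toy two-level index over sorted object IDs.
root_keys = [30, 60, 90]                   # highest key in each leaf block
leaves = [
    {10: "a", 20: "b", 30: "c"},
    {40: "d", 50: "e", 60: "f"},
    {70: "g", 80: "h", 90: "inode 4711"},
]

def lookup_id(oid):
    leaf = bisect_left(root_keys, oid)     # one probe in the root block
    return leaves[leaf][oid]               # one probe in the leaf block

print(lookup_path("/usr/local/bin/myprogram"))   # inode 4711
print(lookup_id(90))                             # inode 4711
```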
I agree with all of what Steven said above, and lots of good points were made in the article as far as the UI. Those who are dismissing it because of the mechanics are ignoring the need for a better way to find and work with your info.
I think the reason a lot of people don’t mind a hierarchical FS (aka Crap FS) is for the simple reason they don’t use their PCs for actual INFORMATION storage and retrieval. Read my scenario at the end to see if your daily needs compare to mine. Everyone who says Crap FS is adequate probably has a narrow use or way of viewing the data it contains and definitely not in a collaborative environment. Bassdrum0003 can only be approached from so many contexts.
The problem with the Crap FS is CONTEXT and ASSOCIATION. A Crap FS is dead when a user is required to access the same information in a different CONTEXT. Different people require the same data for different uses. Hell, I need to access the same data for different uses.
Zoom in/ out UI – Brendan, use the Brain from thebrain.com for a few weeks. It is fairly difficult to adjust to. The Brain attempts to provide an ASSOCIATIVE FS, but it’s window dressing – you have to dip out too much to break the old paradigm. Lots of great ideas though. Use it just to test out some of what you wrote about.
Search = Bad UI? – Any critics ever used an encyclopedia like Encarta? Search is the primary UI because every topic has an infinite number of contexts. There’s simply too much data for a Crap FS. If you haven’t yet, really, really look at Encarta or some other encyclopedia and how they handle the vast quantity of info with pictures, sounds, text, etc. Search has been my file browser since W2K vastly improved NT’s metadata indexing. I need someone to build me that and in a way I can plug new content with new contexts and associations in. Use your desktop as your workspace; use search for deeper filing needs.
Existing Functionality – sort of, but not quite. As others have said, it’s in pieces in 10 different FSes.
Shortcuts – stupid. It’s easier to just add keywords to the file metadata than to drag shortcuts to every possible place it should exist. In Windows just right click the file, properties. Don’t even have to open the file. I did the shortcut thing under NT and it’s a Band-Aid, not a fix.
Manual Metadata entry – others brought up various auto-entry items. Not so bad once you get used to it. In fact, if the Save As dialogue only contained a name and optional Keywords, it would be adequate.
People are storing a great deal more stuff on their PCs and they need some way to find it. I know Crap FS is insufficient at home as well as at work.
Longhorn – read this Fortune article. Gates addresses many of your concerns including file typing/ extensions. http://www.fortune.com/fortune/ceo/articles/0,15114,371336,00.html In addition, others mentioned Reiser FS. Even if people don’t implement it, it does mean people are aware of the problem. Xdocs looks to be fairly invaluable if I can save all of my Office output in a single, fully indexed format (XML).
Scenario –
I am a “knowledge worker” in the purest sense – my job’s primary requisites are to remember things, track changes, notice long term trends, catch emerging trends, and train others. I’m a field tech, but I am the continuity for quite a few offices. To do this, I have years worth of diagrams, notes, docs, spreadsheets, photos, etc. My pst file at work is about a gig.
No one can navigate my PC. I can’t navigate my PC. Crap FS can’t handle my needs. Search is my file browser. I manually entered metadata for every work file on my PC. I have forced all of my coworkers who cycle in and out to enter keywords (metadata) for their docs too. I configured everyone’s machine to prompt for Properties on every initial save. They are used to it, they do it, and they live with it.
They can search my PC as well as each other’s for the relevant files without having to know my PERSONALIZED directory structure. Hierarchical file systems are always personal. Ignore the /, bin, lib, Program Files, Windows directory crap; I’m just speaking of relevant data you work with.
Network shares – Tried/ failed. People just keep storing stuff on their PC. Who will create the network directory structure? There are at least 20 offices that need my info and none of them think of that info in the same manner. Whoever creates the structure forces others to their paradigm and that just isn’t happening in my environment. It essentially means you have to learn someone else’s job. My Budget office sorts things by Chronology of initial purchase. Programs sorts things by Project Name. Contractors and associated tech offices need to access their little piece of the pie. My field techs need to know everything, but different portions at different times in different contexts. Various DBs have failed for one reason or another.
Part of the problem here is that Brendan started this discussion by talking about a filesystem when he really means user interface / file browser. You are doing the same thing.
People don’t use filesystems to find data, kernels do. People use applications which present the data to them to find data. DOS / Unix use a model where the presentation is a somewhat more user-friendly version of the extents index (see my comments above) and as a result people tend to confuse the two.
The solution to your problem is a database filesystem. Use clean relational modeling and then data can be one and only one place. Where it is becomes immediately logical. You don’t need to remember where things are, you just need to remember the map, and at worst you just need to remember where the map is. What you are asking for does not require a change of filesystem it requires a change of user interface.
Using a database, stuff like sorting by different properties or selecting subsections becomes trivial. The problem of how to organize vast amounts of information has been around for centuries. The only major breakthrough in this century which computers have enabled has been the relational model. It seems silly not to make that the cornerstone of a “filesystem” redesign.
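As a small illustration of that claim (the schema and the data here are made up, and a real database filesystem would of course be far richer), a relational catalog makes re-sorting and sub-selecting one-liners:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE files (
                 id      INTEGER PRIMARY KEY,
                 title   TEXT,
                 kind    TEXT,
                 author  TEXT,
                 created TEXT)""")
con.executemany(
    "INSERT INTO files(title, kind, author, created) VALUES (?, ?, ?, ?)",
    [("Optimal Recovery", "pdf", "Smith", "2002-01-24"),
     ("Budget Q1",        "xls", "Jones", "2003-02-03"),
     ("Site mockup",      "png", "Smith", "2003-01-12")])

# Re-sorting or selecting a subsection is just another query.
for (title,) in con.execute(
        "SELECT title FROM files WHERE author = ? ORDER BY created", ("Smith",)):
    print(title)
```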
Here is a previous entry on OSNews about libferris (http://www.osnews.com/comment.php?news_id=1304). It takes a different approach to “Doppelgangers” in that data is exposed from files in application-centric ways, e.g. gif, png, jpg, etc. all have rgba-32bpp metadata which is created on demand from the decoded image data, thus freeing the app from worrying about decoding/encoding. Other cloning doppels are not there at present.
Also, the EA interface supported by ext2/ext3/XFS is implemented, and EAs play a strong role in both the VFS and clients.
It may also be of interest to some to note that the “groups” based on active queries as described are similar to Formal Concept Analysis as shown in ToscanaJ (http://toscanaj.sourceforge.net/). As a side note, the ferris project will be moving to allow active views from metadata or full-text indexing based on FCA in the near future.
As this thread is already long, could ferris-related replies be directed to the mailing list: http://lists.sourceforge.net/lists/listinfo/witme-ferris
Well, my questions got answered. Thanks for the reply.
“First, people do not think in terms of “opening a .jpg file” or “opening a .png file”, people open Images. I hear many people talk about “a Mozilla image” or “a Paint image”, using the standard application to open the filetype. Opening a file in another program then becomes an almost impossible task to some (most?) people. This proposal does not solve this.”
The problem here is that image files are in most cases usable in many programs (Photoshop files being an exception). Other types of data files, like those saved by Microsoft’s office software, are deliberately designed to be impossible to use in other programs, and are often not usable in other versions of the same program.
Quark XPress files can’t be used in Pagemaker, and vice versa.
There is certainly a need for a cross-program data file format for printable documents, which could be used by all DTP and graphical word-processing programs. XML SVG is probably it. PDF would be suitable if it were an open format.
The only DTP file format from a company that is openly documented is Pagestream.
Animation files, with dozens of secret codecs, are often not portable.
Commercial companies have an interest in tying every data file to a specific application. Users do not benefit at all from this. Nearly 20 years ago, Electronic Arts introduced the Interchange File Format to try to solve this problem. Unfortunately, it was only fully adopted on one OS, although formats such as RIFF-WAV and AIFF are slightly corrupted versions of it.
IMO a database-style file system requires open data file formats.
Will this new system use up 32k to save my 100 byte file?
Brendan seems to have put off a lot of programmer types with his use of the word filesystem, because they regard the filesystem as the way the computer formats and stores the files on a storage device, regardless of the way the files are presented to the user.
However, the word filesystem is very often used in a UI context, referring to how the user finds his files laid out. When we say “UNIX file system”, we often mean “UNIX file system layout”.
And the file system, no matter where it lies, plays a part in the presentation. Some filesystems have very little metadata support, some have very broad. This is filesystem related, not UI related. Of course, UI support has to be built in as well, but without support in the former, there cannot be in the latter. Look at metadata again. UNIX file system interfaces are often considerably weaker than the underlying filesystem. Many of its features just lay dormant because of lacking support in the upper layers. The same could be said of NTFS.
Actually I wouldn’t mind being able to Google my HDs at least as well as I can Google the web, not just text but all common doc types: PDFs, JPGs, etc. Maybe Google could produce a FileManager add-on but stay inside the browser.
I always loved it when I had that indexing app from On Technology (Mitch Kapor) on my Mac for a few years; then those Apple/On idiots killed it. Never had content searching again till I went to W2K and BeOS, but it’s not the same. It’s a lot harder to do this stuff on 100K files than when the old Mac had a few thousand.
An interesting article. It gave me an idea that might be interesting (or stupid flamebait).
As a person who has observed user/computer interaction since non-techies first touched a keyboard, I have seen one common problem crop up on *nix,Dos/Win & Mac: users can’t find their files. Users create files and forget their names, or save them without knowing/understanding where the files “live” in the directory tree.
Some may feel that it is the responsibility of the user to be organized, and the same people who lose files on their PC would have had messy, disorganized desks in the pre-computer era.
However, computers are supposed to make us more productive. Couldn’t part of that role be enforcing an organizational structure on a standalone PC or network?
(Disclaimer: Odds are what I’m about to describe already exists. If so, is it working for those of you “in the know”?).
In the pre-computer era, a busy office relied on filing cabinets. Many organizational systems were used, but alphabetical was the most common. What if your favorite OS enforced a “filing cabinet” system automatically, so that files would always end up in a predictable place? The more files a given user creates, the more fine-grained the “filing cabinet” becomes. For example, a home user who creates a few files per day might have a simple alphabetical filing cabinet, with one folder/directory per letter/number. A busy office person might have a filing cabinet with one folder/directory per letter and subfolders for common file types (i.e. – Filing Cabinet->A->WordProc, Filing Cabinet->A->SpreadSheet, Filing Cabinet->B->WordProc, Filing Cabinet->B->SpreadSheet). A busier person still might have a finer-grained filing cabinet subdivided by date/time (i.e. – Filing Cabinet->Feb 03->A->WordProc). In a networked environment, the filing cabinet would be subdivided by user/group (i.e. – Filing Cabinet->Accounting->Feb 03->A->WordProc).
Nothing revolutionary here, except the idea that this process is automatic. When a user saves a file, it automatically ends up in the correct place in the filing cabinet. As the amount of files grows, the filing cabinet automatically implements a finer-grained structure and moves files as needed. Files always end up in a predictable location, a unified organizational system is imposed on all users, everything is (relatively) easy to find.
And what about users who forgot what they named a file they saved 3 hours ago? Simply include a “Show recently added files” interface (or a “filing cabinet friendly” wrapper for your OS’s search API).
Seems like this would be relatively easy to implement. Any solutions like this out there already for Mac 9.x, Windows, Linux? If not, how do some of you folks deal with the “where is my file” issue?
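Since the placement rule is the whole idea, here is a minimal sketch of how the “predictable location” might be computed (the granularity policy, names, and function are all invented for illustration, not how any existing product behaves):

```python
import datetime
from pathlib import Path

def cabinet_path(root, filename, kind, user=None, when=None):
    """Compute where a saved file would land in the automatic 'filing cabinet'.
    The granularity rules (user -> month -> first letter -> kind) are just one
    possible policy."""
    when = when or datetime.date.today()
    first = filename[0].upper() if filename[0].isalpha() else "0-9"
    parts = [root]
    if user:
        parts.append(user)
    parts += [when.strftime("%b %y"), first, kind]
    return Path(*parts) / filename

print(cabinet_path("Filing Cabinet", "annual report.doc", "WordProc",
                   user="Accounting", when=datetime.date(2003, 2, 1)))
# Filing Cabinet/Accounting/Feb 03/A/WordProc/annual report.doc
```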
Everyone here who makes the distinction between File System and file browser/UI/manager/etc. is eaten up with it. It’s like saying a Nazi is not a Fascist: yes, technically correct, but in common usage, even among the principal movers and shakers, the terms are used interchangeably.
If the following can speak of the FILE SYSTEM and FILE BROWSER interchangeably, then the rest of you can too… I don’t know your accomplishments in FILE SYSTEM design. I know theirs.
Reiser
http://namesys.com/features
” Hierarchical, relational, semantic, and hypersemantic systems all force structure on information, structure inherent in the system rather than the information represented. If a system adds structure, and the user is trying to exploit partial knowledge (such as a name embodies), then it inevitably requires the user to learn what was added before he can employ his partial knowledge. With complex systems, the amount added is beyond the capacity of users to learn, and information is lost.”
Bruce Tognazzini
http://www.asktog.com/columns/038MacUITrends.html
“The Mac has held the advantage for years with its spatially oriented file system.”
“With the Mac, you have always had the power to move around and organize applications and documents in your own virtual space, maintaining a neat or cluttered workspace, as is your habit. Other desktop systems, from Windows to Unix, have depended more on abstraction, forcing users to remember the location of objects in complex hierarchies. In theory, all of this reduced clutter, but it really only moved the clutter from the visible desktop to the back of your mind.”
John Siracusa
http://arstechnica.com/reviews/01q2/macos-x-final/macos-x-14.html
File System Layout – chapter on the FS GUI.
David Gelernter
http://www.wired.com/wired/archive/5.02/fflifestreams.html?topic=&t…
In the creation of a chronological FS, the whole article and tons of quotes from different perspectives use FILE SYSTEM when speaking of the UI – spatial, semantic, chronological, etc.
John Gruber
http://www.daringfireball.net/2002/11/that_finder_thing.html
“In the classic Finder, there is no abstraction between the actual file system and the view of the file system presented on screen. ”
Benoit Schillings & Dominic Giampaolo interview
http://www.theregister.co.uk/content/4/24648.html
They started BFS with separate userland DB and filesystem and then followed with a db-like FS we all know as BFS. In their discussion, they speak of the FILE SYSTEM and user interaction as one entity. Only in the initial design stage, which they regard as a mistake, do they speak of them as separate entities.
Which brings to mind this principle: if you try to divorce the File System Interface from the underlying structure, you end up with a failure, like thebrain. A FILE SYSTEM is accurately reflected in its visual metaphor. Just as C:/ongrat/ulati.ons
A hierarchical FS in almost any of these discussions is universally used to describe a system of storing data in directories and subdirectories. We are not speaking of HFS Hierarchical File System used in the Mac (directly anyway.)
If you want a definition: “A FILE SYSTEM provides a mechanism for storage and access to data and programs of the OS and the users of the system. It has 2 distinct parts: a collection of files and a directory structure.” Operating System Concepts 6th edition.
Norton Commander/ Explorer and the other 2 pane File Managers are excellent visual metaphors for the directory structure we are used to (and that Tog derided above) – a hierarchical FS. It accurately reflects the underlying data structure. The single pane Mac Finder was pretty accurate in visualizing the flat workspaces of HFS. Which is probably where DB comes from since his method of storing everything in desktop folders and creating crazy shortcuts everywhere reflects how you had to work in the old Mac OS.
So those who think you know something, get over yourself or show me your accomplishments in File System creation. At least debate the merits of a db-based FS.
The meaning is completely different.
CAJONES = drawers
COJONES = what you wanted to say 🙂
As for the topic I have nothing to say, sorry.
I have to agree with jbolden1517 (and others), in separating different layers and aspects of data storing.
The “BFS” mentioned tends towards a DB view of storage: it tends to select a concrete “file” (a stored object) by means of a “search” (selecting by a complex key, IIRC).
But in any case, each object has to have a unique, so-called “primary” key.
The easiest way to assign such a key is a hash function, but it is hell to assign a meaningful hash, so we create the simplest hierarchy. If you want, you can store keys this way: “animals:chord:mammal:bear:white” or “plants:flowers:roses:rose:vulgaris”. You can suggest various schemes, but in any case if you try to search something you’ll find it very s-s-s-slo-o-o-o-o-o-o-ow. If you build an index to speed it up, you’ll get the well-known tree. If you make a separate unique index, you’ll get UNIX’s tree; if you don’t, you’ll get DOS’s tree. Swapping the key order (e.g. if you want to use “white:bear:mammals:…”) is done by means of creating aliases or links. IIRC, some OSes use simple string translation tables like “/bin/app1/” -> “/opt/cat1-0/cat1-1/cat1-2/…/app1/”; it is the simplest way and I think it covers almost all the most usual needs.
But of course, some “strange men” (like Steve and me) miss “very unusual things” like the versioning feature (VMS! Where are you?). This problem could be solved by RCS (the first example that comes to mind) or in an even simpler way, but we (Steve and I) need a mechanism to build this thing between the User Interface and the Data Storing System (DSS), i.e. we need a way to insert our software between the “open/close/create/unlink” layer and the “real-create/real-unlink” layer. But we have no such way. jbolden is right that it is not the task of the DSS, but where can I insert my UI? You have all given many examples: Reiser, AmigaOS, old MacOS, new Win, classical UNIX (SVRx?), BeOS, even AS/400. But this is not sufficient, as you can see.
For UNIX I see a partial solution in more aggressive use of links, but this does not solve the versioning problem or automatic naming (e.g. by date-time-type-<anti-collision-hash>, by the “TITLE” of an HTML/TeX/other text, by the ID3 tag of an MPEG file, or by the tags of a TIFF file). Also, such a solution introduces more hidden bugs.
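As a user-level approximation of that missing layer, one could wrap “save” so it keeps VMS-style cycles next to the file; a rough sketch (the “;N” naming, the keep limit, and the function name are all my own choices, not any existing mechanism):

```python
import os
import shutil

def save_with_cycles(path, data, keep=5):
    """Write `data` to `path`, but first rotate any existing file to
    `path;1`, `path;2`, ... in the spirit of VMS version numbers. Purely a
    sketch of the layer the poster wishes existed between open/close and
    the real storage system."""
    if os.path.exists(path):
        # Shift old cycles up by one, dropping the oldest beyond `keep`.
        for n in range(keep, 0, -1):
            old = f"{path};{n}"
            if os.path.exists(old):
                if n == keep:
                    os.remove(old)
                else:
                    os.rename(old, f"{path};{n + 1}")
        shutil.copy2(path, f"{path};1")
    with open(path, "w") as f:
        f.write(data)

# save_with_cycles("notes.txt", "second draft")  # keeps notes.txt;1, ;2, ...
```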
I worked for the government for quite a few years. In the AF and other military branches, you have regs/ “instructions” on how to lay out a file plan, a file cabinet, what goes in each, how long it has to be stored, etc. The problem is, I’m a techie not an admin puke. Filing was stuck on me because I had high organizational scores.
I never learned the file plan more than was needed to do the job. Even though this was “standard” across the whole AF, no one could find anything in that file plan except me. Whenever anyone wanted to find something, they would have me retrieve the files. Too much crap, and the File Plan (TOC) is too vague to be worth anything.
Another “standardized” problem comes up in how the military tracks maintenance repair and supply. It does so in the worst possible DB system, called CAMS, with supply in a shotgun-mated system called SBSS. What do the following terms mean: 907, TM=B, WD=C, WUC=AA000, HMal=H, AT=437? Precisely. In most maintenance shops, the people who suck at fixing things are given the job of data entry – aka the “shop b***h.”
By creating a one-size-fits-all solution, you’d just create another data-entry job. Data-entry jobs exist because people can’t come up with a data storage solution the people who need the data can use.
As you pointed out, you’d have people not knowing what other people called a file. As I pointed out previously, some people think of things in different contexts. Say I buy a special bolt for equipment X and saved the purchase form because it was a pain to find this bolt. Budget would look for bolt, or maybe the purchaser’s name, or store name. The techs would look under equipment X. Program management would look under the Project we bought it to support. Boss would look under the end mission it supports. How do you file it? Alphabetical falls flat.
Also in XP, you can do your plan. Place all user files in a flat directory>sort by type>group by name.
Sorry I’ve been replying so much on this thread, but it is something near and dear on my wish list.
Ellis, I suggest you reread your own quotes. In all of them the authors are quite clear about whether they are talking about the GUI or the underlying structures. Let’s take a few examples:
Bruce Tognazzini
…
“With the Mac, you have always had the power to move around and organize applications and documents in your own virtual space, maintaining a neat or cluttered workspace, as is your habit. Other desktop systems, from Windows to Unix, have depended more on abstraction, forcing users to remember the location of objects in complex hierarchies. In theory, all of this reduced clutter, but it really only moved the clutter from the visible desktop to the back of your mind.”
John Siracusa
…File System Layout – chapter on the FS GUI.
David Gelernter
— you don’t actually quote him.
John Gruber
http://www.daringfireball.net/2002/11/that_finder_thing.html
“In the classic Finder, there is no abstraction between the actual file system and the view of the file system presented on screen. “
It is not that he just uses terms interchangeably. That’s OK. The author solves non-existent problems by *confusing* (as in being himself confused about) the two. He proposes a *worse* way to store files for want of a better way of accessing files. But the latter concern does not necessitate the former; it doesn’t even imply it. I wouldn’t mind someone writing about improving X Windows or the Linux kernel; as long as the writer clearly understands the concepts, using the wrong terminology is not a problem. In this case, the author *still* confuses things.
“Actually I wouldn’t mind being able to Google my HDs at least as well as I can Google the web, not just text but all common doc types: PDFs, JPGs, etc. Maybe Google could produce a FileManager add-on but stay inside the browser.”
But the Google image search is actually a search for the file names of image files. If an image of a cat is stored as “dog.gif” it will come up in a search for “dog”.
AFAIK software for identifying the content of images is only at a very early experimental stage. Software for identifying the source of drum samples hasn’t even been thought of. The file names are all you have.
There shouldn’t be any special difficulty in making a local search engine which would act like Google but on a database formed by regularly surveying your drives and local network.
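Such a surveying indexer is indeed straightforward for plain text; a bare-bones sketch (the extension list and the index structure are arbitrary choices of mine, and real content extraction for PDFs, images, etc. is exactly the hard part this skips):

```python
import os
import re
from collections import defaultdict

def build_index(roots, extensions={".txt", ".html", ".c"}):
    """Walk the given directories and build a tiny inverted index
    (word -> set of paths). A real 'Google for your hard disk' would also
    parse PDFs, images, and so on; this only reads plain-text files."""
    index = defaultdict(set)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if os.path.splitext(name)[1].lower() not in extensions:
                    continue
                path = os.path.join(dirpath, name)
                try:
                    text = open(path, errors="ignore").read()
                except OSError:
                    continue
                for word in re.findall(r"[a-z0-9]+", text.lower()):
                    index[word].add(path)
    return index

# index = build_index(["/home/me/docs"])
# print(index["recovery"])
```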
The author describes what one could argue is a fairly old state of the art in document management. Xerox used to make a product very similar to what the author envisions. The file system was available via anonymous HTTP; users could log in to gain permissions, etc. (Yes, I’m aware that the author appears to have suggested a kernel-mode driver, but bear with me.)
Files only existed as unique ids (integers), and groups (which were even called groups) could contain any subset of the files. The system also did automatic conversion of .doc, .ppt and .pdf files into html. In addition to what the author describes, the Xerox DMS also did version tracking (the most useful feature of the product, IMHO).
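The core of that model fits in a few lines; a toy sketch of “files are just unique IDs, groups are sets of IDs” (my own illustration, nothing to do with the actual Xerox product):

```python
import itertools

class DocumentStore:
    """Toy version of the 'files are unique ids, groups are sets of ids' model."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.files = {}              # id -> content
        self.groups = {}             # group name -> set of ids

    def add_file(self, content):
        fid = next(self._ids)
        self.files[fid] = content
        return fid

    def add_to_group(self, group, fid):
        self.groups.setdefault(group, set()).add(fid)

store = DocumentStore()
paper = store.add_file("C++ templates primer")
store.add_to_group("New Project", paper)
store.add_to_group("Reference", paper)    # same file, two groups, no copies
```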
For what it’s worth, I’ll tell you that it is a royal pain to have to give keywords and metadata to each and every file. Metadata is one of those ideas that seems great in theory, but it really lacks practicality. Users do not want to have to enter extra information about their files.
There is often a reason why people have not adopted a new paradigm. In this case, that reason is the inherent difficulty of creating suitable metadata.
..and what I’m saying is that the difference between the UI and the FS is irrelevant. It’s semantics. You’ve somehow chosen to miss that in every post.
Maybe in Linux/ Unix models with jfs, ext2/3, xfs, gfs, etc it makes a (small) difference when you can choose Nautilus, KFM and the rest. It gives the illusion that there is a difference between the FS and FM and draws a wispy line. On Linux you have only 2 real choices – the text-based FM and the 2-pane based FM. You’re just mixing and matching FMs with underlying FSes, but the difference between any of these is really negligible. Do I get the blue trim on the red Pinto or red trim with the blue Pinto? Any way it goes, every existing FM/FS combo available is another way of storing-retrieving-manipulating /path/to/mycrap.txt.
The other 98% of the world uses an OS where the UI is mated intrinsically to the FS – this includes NTFS, HFS, BFS, and any other graphically-oriented OS. To dismiss this author’s overall proposition – we need a new way to store-retrieve our stuff – as garbage because <2% of the PC-using population thinks there is a significance between “FS” and “FM” is weak. The 98% see the FM as the FS. A system to file our files. It doesn’t mean we’re stupid, it means we work WITH software, not ON software. They’re a tool, not an end.
Maybe you should read the articles – the developers view the UI as an integral part of the FS. They cannot speak of one without speaking of the other.
As far as Gelertner goes, read the whole article – it’s interesting for a lot of different reasons which is why I didn’t quote anything specific.
..and what I’m saying is that the difference between the UI and the FS is irrelevant. It’s semantics. You’ve somehow chosen to miss that in every post.
I haven’t missed it. I’ve responded time and time again why your point is incorrect. I’ve given examples of application communities (like VMS and Z-OS) where file systems are further abstracted. Others have mentioned OS/2 and HFS (which included many of the features Brendan wants). For that matter, HFS+ is a good example of presenting HFS features while using UFS as the filesystem; so even HFS+ is a pretty good example of why the interface and the underlying filesystem have very little to do with one another.
More importantly on an OS board I don’t think its unreasonable that article authors should know the basic concepts about OSes.
Maybe in Linux/ Unix models with jfs, ext2/3, xfs, gfs, etc it makes a (small) difference when you can choose Nautilus, KFM and the rest. It gives the illusion that there is a difference between the FS and FM and draws a wispy line. On Linux you have only 2 real choices – text-based FM and the 2-pane based FM. You’re just mixing and matching FMs with underlying FSes, but the difference between any of these is really negligible.
I don’t think that’s true. I’ll grant something like: but the difference between any of these is really negligible in terms of information architecture . At the lower levels (the extent levels) the differences are huge. Matching the right file system to the right application / datafile can have large performance effects. But in the Linux community the fact that at the top level they all look the same is considered a feature since it means changes of filesystems don’t require application changes (with a very few exceptions and those are the ones which want the filesystem changes in the first place).
So again, the fact that something looks similar doesn’t mean it is similar.
Do I get the blue trim on the red Pinto or red trim with the blue Pinto? Any way it goes, every existing FM/FS combo available is another way of storing-retrieving-manipulating /path/to/mycrap.txt.
And that’s false. There exist lots of 3rd tier architectures for Unix you just haven’t used them.
The other 98% of the world uses an OS where the UI is mated intrinsically to the FS – this includes NTFS, HFS, BFS, and any other graphically-oriented OS.
And again you are wrong. HFS, as I mentioned above, is supported by very different UIs, including Linux, Mac OS 1-9, Mac OS X, etc. The NT kernel runs fine without Explorer, and you can use NTFS with whatever GUI you want. People don’t do it because it’s pointless, not because they can’t.
To dismiss this author’s overall proposition – we need a new way to store-retrieve our stuff – as garbage because <2% of the PC-using population thinks there is a significance between “FS” and “FM” is weak.
I don’t remember doing anything of the sort. I dismissed the article as weak based on my belief that he did understand the distinction but had a deeper misunderstanding about how hardware drives OS choices. It was only later that I discovered that he was making the really basic mistake you are defending. I attacked his analysis of the overall point as weak based on his lack of experience with systems which offered what he wants. A clear comparison to things like VMS, OS/2, HFS, Z-OS, which use metadata and/or database filesystems, explaining what they do or don’t do, would have resulted in a meaningful article. It’s the same criticism I’d level at you. You say you consider document organization a major issue and yet you don’t seem to have looked at alternatives (and there are many for most major OSes). For example, what’s wrong with Documentum for what you want?
The 98% see the FM as the FS. A system to file our files. It doesn’t mean we’re stupid, it means we work WITH software, not ON software.
98% of the population doesn’t understand biochemistry; they indicate the health problems they want solved, they don’t talk about specific chemicals they want introduced as drugs. Once you start addressing things at a gene or a chemical level, your knowledge of biology or chemistry does become an issue; you don’t get to walk off as “well, I’m just a lay person, not a doctor”.
If you don’t like Explorer as a file manager, stop bringing up NTFS; talk about Explorer. If you use words like “NTFS”, people are going to assume you know what they mean. If you don’t know the distinction between NTFS and Explorer, then talk about NT without talking about its parts. What does using the term NTFS (when talking about properties of Explorer) add to the conversation except confusion?
Maybe you should read the articles – the developers view the UI as an integral part of the FS. They cannot speak of one without speaking of the other.
That’s simply false. As I pointed out with bolding, they most certainly did differentiate between them. They used the words properly. And they can talk about one without talking about the other; my guess is that you focused on user interface articles tied to specific OSes. Hang out in places where people are discussing abstract information architecture and you’ll see almost no discussion of filesystems in the specific. Go to something like kernel.org and you’ll see detailed discussions of filesystems with no discussion of user interfaces.
You are the one that is confused. Read the bolding.
This was implemented in Amigados/Workbench with the *.info file, which contained two icons (selected/unselected) of arbitrary size, plus file metadata (though not the detailed information that is suggested here).
(BTW, the Linux desktop could benefit from using this approach.)
As others have pointed out, nearly all the ideas are implemented in one fs or another. For example, ln can be used to create the author’s “groups”.
If the hierarchies are removed, then the only way to reach a given group or file is to search for it? Do I understand right? Seems like a loss of functionality. I like hierarchies; they mean I can store a lot of files and have a mental concept of the contents of the drive. Structuring the drive helps me structure my thoughts. I don’t think this article is particularly inspiring.
Someone mentioned pictures… would it be possible to have a picture of a dog, and a picture of a cat, with metadata labeling them as such, and search your drive for all mammals? This sort of thing is the goal of the W3C’s semantic web format OWL (except they intend it for the web). But OWL’s not finalised yet. And you’d also need a dictionary of terms, like WordNet or OpenCyc, except I think they’re not compatible at the moment.
Semantic metadata is definitely a good thing though, I think… and could be taken further… I think there may be many folders/files on a system that represent the same concepts… e.g. languages, countries, software platforms, src/bin/doc. If I am not a techie, and I only speak one language, I might not need to see any of the folders/files for another language (but someone else logging on to the same machine might need to see different ones instead).
Also, I think I kind of might like heavy use of URIs (and so really extensive hierarchies), e.g. OpenOffice stored in a directory something like local/org/openoffice/1.0.2/linux/en/bin, with documents of filetype org/openoffice/1/file, and preferences stored in directory ~myhome/.prefs/org/openoffice. In this sort of case, it would make a very clear connection between program, filetype and prefs. In other cases, if you end up with a filetype you don’t recognise (or preferences for a program you don’t remember installing), you know where to start looking on the web. And I think it would be good to avoid separate registration processes for filetypes and directories (including preferences directories). We don’t need lots of separate namespaces controlled by lots of different bodies.
Oh well, they are. But no fun to use.
I see a huge problem in the implementations of the user interface to the FS that we have seen so far, mainly due to a mistake MS Windows made and thousands of zealots copying it (KDE, GNOME, etc.).
Look at M$: C:\WINDOWS\*.* No real hierarchy, all thrown in together.
This is the worst of usability examples.
Does not count here, just wanted to mention it…
The next bad example (and my first point) is the implementation of the hierarchy browser in Windows, KDE and GNOME: a Windows Explorer-type thing with a tree on the left and a document window on the right. The only advantage is the preview/document view that we can apply in the right pane.
However, it takes a long time to aim at these little crosses and drawers in order to wade through a structure.
A tree is being displayed as a tree. This adds additional computing time (and no, displaying a tree in any explorer did not get faster over the years, especially not with “show icons” on). For a visual effect, time and comfort get lost.
Much different is AmigaOS.
You still have the hierarchy. But it is displayed flat.
When you open a file requester or a file manager, you are presented with the current directory in a list. All files and folders, plus their attributes, are visible. You just click a “line”, not a “part” of it. You also do not need to see where you are in the hierarchy, since you have a string gadget that not only allows you to manually enter the path (including TAB completion) but also shows you “where” you are. I can navigate a filesystem about 4 times faster on AmigaOS than under Windows or Linux; of course, I am talking about graphical navigation here.
But the author makes some good assumptions.
Computers are raw databases. Nothing else. We have tried to do all sorts of things with computers that try to make them look and feel as if they were something other than a computer.
But a computer IS a computer.
It is a processing unit and memory.
Memory is an indexed table. Not more, not less. Why not use it as such?
We have started having so much data that we need a database to organize it. Any OS that does not implement a database at the OS level is not ready for the future. Another example, just to illustrate my frustration, is GUI skins.
We managed to get away from the tape deck and the CD player just in order to find ourselves applying el-neato tape-deck and CD-player skins to our media players.
Where is the integration with the OS? Where are the benefits of a large storage system? All I get is mini-buttons that I can barely see.
There won’t be much of a change without the OS being adapted to new needs. Heck, the only ones who see the future are Microsoft. Really. I am dead serious.
They were the first to apply global component-based programming for the consumer (not the first in general, I know, I have used Oberon myself). I doubt applications that do not simply use the components of an OS will have such a bright future. Why should they? It is so much nicer to update a component, which affects all installed applications, than the application itself.
With the implementation of dynamics on the desktop (task-based computing: not “I want to start Photoshop”, but “I want to process an image”, which is a whole different thing, whether this command starts Photoshop or not), background layers will adapt as well. That might mean fine-tuning the filesystem-access layer (e.g. for file searches, giving higher ranking to image files and filters than to MP3 tunes, or making only certain “groups” (to stay with this example) visible to the user) or applying different shortcuts to taskbars and the desktop, and so on…
The other thing MS has seen is the requirement for a file structure that is much closer to a database table than what we have now. Compare hierarchical databases (usually known as “directories”, e.g. Active Directory and LDAP) to table-based databases: tables offer much more power and speed. Any database professional will outline this (and why).
Hierarchies should not get lost. They should be available, even if only emulated. There are uses for them.
But with decent filetype recognition built into the OS (not this stupid *.ext), some default groups set by the user, and the addition of translation filters, indexing of data should be very easy.
So, I think a computer should be to the user what it is to the technician: a database with a processor. With this fixed, a lot could be done to make it more useful.
.john
“SHE first creates the group for the new project, optionally names it (who said a group had to have a name?)” Enough without the PC baloney.
Automatic conversion is great. My app only reads divx files. I’ve a 15GB .avi on my hd. No problem, autoconvert kicks in. Several hours later, I can open the file. Oops, wrong file, I really wanted take #2. In batch processes this autoconvert becomes really interesting…
So in short, I’d like the author to expand on this and cover:
1. Time it takes to do a conversion.
2. How to deal with loss (converting audio).
And all this, without a user being involved.
I know that the author wants to erase the hierarchical tree, but how about XFS? It has extended attributes. You can tack on arbitrary name-value pairs.
Take a look:
http://oss.sgi.com/projects/xfs/features.html
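For anyone who wants to play with this today, those attributes are reachable from user space on Linux; a tiny example (the “user.project” and “user.rating” names are just examples I made up, not anything XFS defines):

```python
import os

# Extended attributes on an XFS (or ext2/ext3) filesystem; Linux-only calls.
os.setxattr("report.pdf", "user.project", b"BFS article")
os.setxattr("report.pdf", "user.rating", b"5")

for attr in os.listxattr("report.pdf"):
    print(attr, os.getxattr("report.pdf", attr))
```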