Linux boasts the widest array of filesystem support among mainstream operating systems. However, Microsoft (with Longhorn) and Apple (with Tiger) have made it clear that they consider the filesystem of the future to be a database of information to be mined, and that client PCs will be a major part of the next chapter in the “search wars.” The future of Linux may depend on whether Linux filesystems continue to innovate. My Take: Actually, it is more a matter of whether applications and frameworks get built around ReiserFS4’s or XFS’s advanced features (and filesystems that don’t have such functionality, e.g. ext3, get scrapped) than of whether filesystems “continue to innovate”. No matter how much innovation Mr Reiser brings to the table, if these features don’t get *used*, they are useless. What Linux needs is a filesystem that is so standard and so widely used that frameworks can be built around it to allow user-visible features like Spotlight or BFS or WinFS.
Anyone remember Seth Nickell’s Storage?
Of course we do, but Storage seems stalled atm and it’s written in Python. Beagle is another effort, but it is still in its 0.0.1 release and requires Mono.
Neither of the two is low-level enough for wide adoption by all DEs and apps.
Ext3 is packed full of features – many of the same ones that ReiserFS 4 and XFS have. The big difference is performance – Ext3 is slow. Reiser optimizes for small files, and XFS for huge files.
One of the few remaining differences is Reiser’s interface to extended metadata, which he considers to be the holy grail. And frankly I see where he is coming from. But if AVFS or something similar ever matures then it would be simple to create a plugin where /somefile#/ is a virtual directory with /somefile’s metadata, so Reiser isn’t needed for this.
Everybody is touting database-based filesystems as the next big thing and saying it’s the only option. I am skeptical. I feel that people may not use them because they don’t want to have to assign metadata, auto-assigned metadata might be wrong or not what they want, it could really add to the clutter, and old habits die hard.
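To make the /somefile#/ idea concrete, here is a rough sketch of how such a path might be resolved. It is not AVFS code: the `resolve` function and the two dictionaries standing in for file contents and metadata are made up for illustration.

```python
def resolve(path, contents, metadata):
    """Resolve an AVFS-style path; '#/' switches from file data to metadata.

    `contents` and `metadata` are hypothetical stand-ins for the real
    filesystem and whatever store holds the extended attributes.
    """
    base, marker, attr = path.partition("#/")
    if not marker:
        return contents[path]          # ordinary file: return its data
    if attr == "":
        return sorted(metadata[base])  # "/somefile#/" lists attribute names
    return metadata[base][attr]        # "/somefile#/owner" reads one attribute
```

With something like this sitting in a VFS layer, any application that can open a file could read metadata without knowing which filesystem is underneath.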
For all the hype, they could be huge bombs. At the same time, I’d put more stock in Apple’s version than MS’s. To me, Apple makes things easier for the user and more natural while MS is still too geeky with implementation at times.
Rather, what’s needed is a standard way of accessing database-oriented functionality. This high-level API would be filesystem-independent, and thus not require migration to a common filesystem.
Given the existing infrastructure, this probably wouldn’t be terribly hard. Incorporating such functionality into KIO wouldn’t be unduly difficult, and would make the database FS transparent to most KDE apps. GNOME apps would be trickier to handle, because most don’t use GNOME-VFS, but some level of support could be enabled by incorporating searching capability into the file dialog. In general, I’d like to see a freedesktop.org standard that subsumes gnome-vfs and KIO, and incorporates searching capability from day one. As long as KDE and GNOME follow suit, along with a few key third-party apps like Mozilla and OpenOffice, things will work out. It has become increasingly clear that any platform not associated with this “big four” will in the future be considered “legacy code” anyway, so their support of search technologies does not much matter.
Now, I think an important point to keep in mind is that there is lots of time to get such a thing implemented. The “Search wars” won’t start until Longhorn ships, and that’s looking to be two years away at a minimum. After that, you’ll have to wait for all the major apps to transition to using the Longhorn infrastructure, which will be another year, at a minimum. If previous transitions are any indication (16 -> 32 bit, Win 9x -> Win NT), the wait will probably stretch out a few more years beyond that. To make full use of the search paradigm, new UIs will have to be designed anyway, which will cause the transition to take longer.
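For what it’s worth, the filesystem-independent API I have in mind could be sketched roughly as follows. Everything here is hypothetical: `MetadataBackend` is not an existing KIO or GNOME-VFS interface, and the two backends merely stand in for “a filesystem that stores attributes natively” and “a filesystem that needs a side database” (SQLite is assumed for the latter).

```python
import sqlite3
from abc import ABC, abstractmethod


class MetadataBackend(ABC):
    """Hypothetical common interface that applications would code against."""

    @abstractmethod
    def set_attr(self, path: str, key: str, value: str) -> None: ...

    @abstractmethod
    def find(self, key: str, value: str) -> list: ...


class DictBackend(MetadataBackend):
    """Stands in for a filesystem that keeps attributes natively."""

    def __init__(self):
        self._attrs = {}  # path -> {key: value}

    def set_attr(self, path, key, value):
        self._attrs.setdefault(path, {})[key] = value

    def find(self, key, value):
        return sorted(p for p, a in self._attrs.items() if a.get(key) == value)


class SqliteBackend(MetadataBackend):
    """Stands in for a plain filesystem paired with a side database."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE attrs (path TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (path, key))"
        )

    def set_attr(self, path, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO attrs VALUES (?, ?, ?)", (path, key, value)
        )

    def find(self, key, value):
        rows = self._db.execute(
            "SELECT path FROM attrs WHERE key = ? AND value = ? ORDER BY path",
            (key, value),
        )
        return [r[0] for r in rows]


def tag_and_search(backend: MetadataBackend) -> list:
    """Application code: identical no matter which backend is plugged in."""
    backend.set_attr("/home/me/a.mp3", "artist", "Example Band")
    backend.set_attr("/home/me/b.mp3", "artist", "Someone Else")
    backend.set_attr("/home/me/c.mp3", "artist", "Example Band")
    return backend.find("artist", "Example Band")
```

The point of the sketch is only that `tag_and_search` never learns which implementation it is talking to, so each filesystem remains free to store the data however it likes.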
>Rather, what’s needed is a standard way of accessing database-oriented functionality.
I disagree. If you have a “standard” way of doing db stuff, you endanger the innovation part. If you get all 3-4 filesystems to expose the same functionality and be compatible with a specific API, then no one would want to break something in order to add functionality. You are creating a trap in the long run.
IMHO, it is just ONE filesystem that’s needed. One fs that’s good for most stuff.
This is a proposed undergraduate dissertation title for 2004 – 2005:
2) Using a DBMS to Implement a General Purpose Filesystem
The limitations of filesystems are well known. Using a DBMS to store data is claimed to overcome these limitations. It can be argued that a filesystem implemented on a DBMS would demonstrate all the advantages of an ACID compliant DBMS while overcoming the limitations of filesystems and retaining the general-purpose flexibility of filesystems.
Almost all modern operating systems support more than one type of filesystem. For example, Windows 2000 supports FAT16, FAT32, ISO9660, NTFS, and remote filesystems via FTP. Most UNIX and Unix-like operating systems support an even wider selection of replaceable filesystems. Some alternative filesystems for Linux address one or more aspects of ACID compliance (e.g., Reiserfs supports journaling), but none employ a true DBMS. Some speciality operating systems such as PICK employ DBMS-like mechanisms at a kernel level, but PICK does not implement a true relational model. Rumours suggest a future version of Microsoft Windows will employ a DBMS-based filesystem, but this remains to be seen.
Implement a filesystem for Linux on top of an ACID-compliant database such as PostgreSQL. Design and conduct experiments to quantitatively and qualitatively explore the benefits and limitations of a general purpose filesystem implemented on top of a SQL database.
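As a very rough illustration of the proposal, the sketch below stores directories and files as rows of a single SQL table. It makes several assumptions that are not part of the proposal: SQLite (from the Python standard library) stands in for PostgreSQL, the `SqlFs` class and `node` table are invented names, and the code runs purely in user space rather than being mounted as a Linux filesystem. Permissions, large files, and checks that a parent is really a directory are all left out.

```python
import sqlite3


class SqlFs:
    """Toy filesystem whose only storage is one SQL table (illustrative names)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            """CREATE TABLE node (
                   id     INTEGER PRIMARY KEY,
                   parent INTEGER REFERENCES node(id),
                   name   TEXT NOT NULL,
                   is_dir INTEGER NOT NULL,
                   data   BLOB,
                   UNIQUE (parent, name))"""
        )
        # Row 1 is the root directory "/".
        self.db.execute("INSERT INTO node VALUES (1, NULL, '', 1, NULL)")
        self.db.commit()

    def _lookup(self, path):
        """Walk the path one component at a time, starting at the root row."""
        node = 1
        for part in filter(None, path.split("/")):
            row = self.db.execute(
                "SELECT id FROM node WHERE parent = ? AND name = ?", (node, part)
            ).fetchone()
            if row is None:
                raise FileNotFoundError(path)
            node = row[0]
        return node

    def _create(self, path, is_dir, data):
        parent, _, name = path.rstrip("/").rpartition("/")
        self.db.execute(
            "INSERT INTO node (parent, name, is_dir, data) VALUES (?, ?, ?, ?)",
            (self._lookup(parent), name, is_dir, data),
        )

    def mkdir(self, path):
        with self.db:  # commit on success, roll back on error
            self._create(path, 1, None)

    def write_files(self, files):
        """Create several files atomically: all of them appear, or none do."""
        with self.db:
            for path, data in files.items():
                self._create(path, 0, data)

    def read(self, path):
        row = self.db.execute(
            "SELECT data FROM node WHERE id = ?", (self._lookup(path),)
        ).fetchone()
        return row[0]

    def listdir(self, path):
        rows = self.db.execute(
            "SELECT name FROM node WHERE parent = ? ORDER BY name",
            (self._lookup(path),),
        )
        return [r[0] for r in rows]
```

The `write_files` method is where the ACID argument shows up: if any one file in the batch cannot be created, the transaction is rolled back and none of the others are left behind, which an ordinary sequence of `open()`/`write()` calls does not give you.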
My university http://www.derby.ac.uk or http://acis.silktide.co.uk
http://www.openoffice.org/editorial/JT-2002-03-15.html
Eugenia,
I hear what you’re saying about the fear of stalling innovation for the sake of backwards compatibility, but that shouldn’t be the case. Anyone can extend the API without breaking with the past if the designers/developers think about what they are doing before they set out blindly. APIs and such are extended every day in software without breaking legacy applications.
I agree that there needs to be a standard approach to getting to the extended data, and allow the actual filesystem (Reiser and XFS) to deal with the details of where and how the data is actually stored.
I have thought for some time that a database storage solution would be great for data files. But I’m skeptical about how cluttered the systems would become. Users typically save files just wherever unless a particular program defaults to a specific location. Maybe what should happen is a partition of sorts that is a database that all data must be stored to.
so the postgres server binary would be a file in a database served by the postgres server binary which is a file in a database served by the postgres server binary which is a…
using something like postgres, would you have to have a non-DBMS disk to start the service before being able to mount a disk?
i would agree more with the common API guy than with the one master FS guy. FOSS’s bread and butter is common standards. HTTP, SQL, POSIX, X11, et al. so what’s the big deal if a common metadata/search API is written for filesystems? i’d rather have 23 different choices of file systems that all work with apps built around a specific standard than 1 filesystem that is only supported by a limited number of programs. the developers still have choice in how to implement their FS, whether it be hidden folders, central metadata repositories, DB, xml files, the possibilities are endless, but in the end everyone wins because no matter which FS they use, they can still get the added functionality of a common API.
if someone made a new API every time they wanted to do something like talk to a computer on their network (instead of using TCP), we’d go nowhere fast. i’m not saying only one API should be developed, i just think, in the end, one API should come to the forefront and be embraced by the majority.
Whoever implements it first wins. So if somebody builds a database filesystem on top of ext2 or Reiser4, then that’s the standard. But it won’t be long before that new database FS gets ported to ext3 or XFS or FAT. Besides, I don’t see that kind of FS replacing today’s FSs soon. They may live together in the same system, just as we have different FSs today.
>If you have a “standard” way of doing db stuff, you endanger the innovation part.
At a high level, interfacing to databases is essentially a solved problem. The existing paradigms are very mature, and there is no indication that they will change anytime soon. The “innovation” will be in the user interface, not in the application-level API. In the remote chance that there is such an innovation in how databases are accessed, there is no reason the standard API could not evolve with it.
>If you get all 3-4 filesystems to expose the same functionality and be compatible with a specific API, then no one would want to break something in order to add functionality.
Of course, the same thing is true when you have only a single filesystem. Say everyone standardizes on Reiser4. What happens when Reiser5 comes out? It’ll have to remain API-compatible with v4, perhaps offering new functionality with a new set of APIs. Semantically, there is no difference between this scenario and having a single standard API that supports extensions. Now, the worst case comes when the hypothetical XFS2 comes out, which totally trashes Reiser4. If there is no standard, XFS2 will have a completely different API. At that point, you’ll either have to stick with the inferior Reiser4, or endure the pain of having to move all your apps to a completely different API, instead of just a new version of the existing API.
>IMHO, it is just ONE filesystem that’s needed. One fs that’s good for most stuff.
There is no such thing. Any given implementation is a point solution in a large design space.
For a Linux system, you’ve got four major markets you have to support. You’ve got the server market, which needs high throughput for many medium-size files. You’ve got the workstation/scientific market, which needs insane throughput for a small number of absolutely enormous files. You’ve got the desktop market, which needs low latency for a moderate number of medium-sized files. You’ve got the PDA market, which needs high efficiency (low power, minimal rewriting of flash cells) for a small number of small files.
Now, Reiser3 fit the server and desktop markets pretty well, while being edged out by XFS in the workstation market. Initial results suggest Reiser4 will perform well in the workstation, server, *and* desktop markets, but at the cost of an enormously complex system that is entirely unsuited to the PDA market. Since all of these markets could potentially benefit from a search-based UI, and Linux needs to support all of these markets, any potential solution *must* enable more than one implementation to be supported.
>Neither of the two is low-level enough for wide adoption by all DEs and apps.
Only on OSnews do filesystem implementations have anything to do with a Desktop Environment.
Locate always worked for me anyway.
@Ext3 is slow:
http://www.gurulabs.com/ext3-reiserfs.html
What would be a hit is if someone made it so you could google search your hard drive. Now that would be something to look forward to!
(Just joking)
“What would be a hit is if someone made it so you could google search your hard drive. Now that would be something to look forward to!
(Just joking)”
I’d like that, I’d like that a lot…
Last time everybody thought Google was joking, I got a gig of free email storage… I hope they’re paying attention here, or at least that somebody is working hard on this kind of thing.
I got Beagle the other day. It really needs a VFS module so we can have live queries.
Choice is good, but we’ve way too many FSs…
ext3/reiser3/reiser4 (a different beast than 3), XFS, JFS…
I _like_ being able to have a choice, but it’d be great if we had just a couple of them that behave well in most scenarios, instead of being good at just one single thing.
“reiser3 is fast for small files” “XFS good for large files and lots of processes writing things at the same time” “ext3 has very good data-safety because of its data=ordered journaling mode” “ext2 is damn fast but it doesn’t have journal”.
Can’t we get a couple of them which have most of the advantages?
http://www.bug-br.org.br/openbfs/
All it needs is for a big Linux vendor to make it the default filesystem, and the unwashed masses will support it. 🙂
Anybody who says “why can’t we just have one super-duper filesystem!” has never worked on any sort of engineering project. They make this statement out of ignorance of reality, or some foolish desire for the logistics of programming to work in a way that they do not.
A simple fact of life is that the more developers you have working on a given project, the lower the productivity of each developer. Without a huge, productivity-sapping logistical infrastructure, the upper limit on the number of people that can work on a given component is actually quite small. If the number of people interested in working on a given type of component exceeds this upper limit, you’ll naturally get multiple, competing projects.
We have Ext3, XFS, ReiserFS, and JFS not because some guy said “wouldn’t it be cool to have lots of filesystems?” but because there are a lot of people looking into designing filesystems, and it’d be impossible for them all to work on the same one, especially because they all work at different companies. Even if it was a good idea to remove this competition (it’s not), you couldn’t do so anyway. Linus could say “Reiser4 is the official filesystem on Linux, nobody else can develop filesystems on Linux,” but then the developer community would respond “it’s a free world, it’s a free kernel, we don’t have to listen to you!”
the whole meta-data’d file system has always sounded like a great idea.
until you realize someone actually has to put that meta-data there.
then it all falls apart.
Not at all. Consider: Google operates on metadata. The iPod operates on metadata. Widely-used Knowledge Management systems like Tomoye operate on metadata. Computer-mining of metadata can go a very long way, to the point that manual entry of metadata becomes much less necessary for the system to work.
Alright, I can appreciate this DB/metadata approach as a user interface improvement aimed at new users. Although, I thought this had been tried before and abandoned. However, it is a mistake to believe this approach is an important technical enhancement.
Applications need to use exact strings to access required files and can’t count on automatic metadata applications to have guessed right. If we abandon the file hierarchy this exact string becomes even more confusing. As far as applications are concerned simply having the appropriate libraries to read and write XML files will be sufficient. As this DB-filesystem approach is merely a UI innovation it is a mistake to be too certain about its success. Even being better doesn’t guarantee acceptance (see NeXT).
As for the insistence that we need more applications and so forth which take advantage of the benefits of ReiserFS, XFS, etc.: this is just ridiculous. Linux and other major operating systems virtualize for a good reason, so you don’t need to reprogram the applications to gain advantage of new filesystems. Just installing these new journalling filesystems gives you all the advantages to fsck, access speed and so forth. The only two things I can see wishing for are better support for sparse files (as I understand only JFS really supports them well and it is very slow) and better kernel integration of some of the complex I/O operations in XFS.
Furthermore, I have some doubts about using a DB type interface for the filesystem. In the vast majority of cases access to the filesystem will be done by one type of key, the name/location of the file. Modern filesystems are designed to be simple and optimize access to these sorts of queries; making all access DB-like would add additional overhead. Much more reasonable seems to be the idea of keeping the same filesystem and adding a separate DB (perhaps with its own dedicated partition). Since Linux already has well developed databases why not just add (if not already done) the ability to store metadata about files in a database to GNOME and KDE?
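A minimal sketch of that side-database idea, under some assumptions of mine: SQLite is the database, the `open_index`, `tag` and `tags` helpers are invented names, and metadata is keyed on the device and inode numbers reported by `os.stat` so that it survives a rename. The files themselves stay on whatever filesystem you already use and are still opened by name as usual.

```python
import os
import sqlite3


def open_index():
    """Create the (hypothetical) side database; in-memory here for brevity."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE meta (dev TEXT, ino TEXT, key TEXT, value TEXT, "
        "PRIMARY KEY (dev, ino, key))"
    )
    return db


def _file_id(path):
    """Identify a file by device and inode rather than by its name."""
    st = os.stat(path)
    return str(st.st_dev), str(st.st_ino)


def tag(db, path, key, value):
    """Attach one key/value pair to the file currently at `path`."""
    dev, ino = _file_id(path)
    db.execute(
        "INSERT OR REPLACE INTO meta VALUES (?, ?, ?, ?)", (dev, ino, key, value)
    )


def tags(db, path):
    """Return all metadata recorded for the file currently at `path`."""
    dev, ino = _file_id(path)
    rows = db.execute(
        "SELECT key, value FROM meta WHERE dev = ? AND ino = ?", (dev, ino)
    )
    return dict(rows)
```

Tagging a file, renaming it within the same filesystem, and then calling `tags` on the new name should return the same metadata, since the inode does not change. What this sketch does not handle is a file being deleted and its inode number reused, which a real implementation would have to deal with.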
>Applications need to use exact strings to access required files and can’t count on automatic metadata applications to have guessed right.
Eh? The database will always contain exact references to the inode of each file. What’s being “guessed” is the metadata that should appear in a given search, not the name of the file. If the application knows the exact file it wants to open, it can always open it by name directly.
>If we abandon the file hierarchy this exact string becomes even more confusing.
Why? As far as an app is concerned, a filename is just a string of bytes, and it will remain just a string of bytes.
>As far as applications are concerned simply having the appropriate libraries to read and write XML files will be sufficient.
What does XML have to do with anything?
>In the vast majority of cases access to the filesystem will be done by one type of key, the name/location of the file.
Nothing stops database-oriented filesystems from optimizing for this sort of access.
>Modern filesystems are designed to be simple
Nobody has ever accused modern filesystems like XFS of being “simple.”
>and optimize access to these sorts of queries; making all access DB-like would add additional overhead.
As Reiser4 seems to be showing, using a database-like mechanism is lowering the overhead. There has been a tremendous amount of research into minimizing the disk-access costs of databases, and search-oriented filesystems can take advantage of this research.
>Much more reasonable seems to be the idea of keeping the same filesystem and adding a separate DB (perhaps with its own dedicated partition).
This isn’t good from a performance point of view. One of the things Reiser4 enables is efficient handling of small files, something that wouldn’t be possible building on top of a legacy filesystem.
Seems to be taking indexing 1 step further…if my entire system is just a stored Procedure away would that mean hackers or other uninvited guests would have access to my system?
Actually, I’m hoping that some bright spark – not me; the Property Boom fascists have made me practically homeless and my PC’s in storage – will add linguistics to the data mining thingee, so we can have our file systems read the files as they are saved to disk and have the metadata sorted automatically from the data. Face it, if they’re powerful enough to do X-Y-and-Z at xGHz, they’re powerful enough to handle their own babysitting.
I agree completely with this article
http://www.openoffice.org/editorial/JT-2002-03-15.html
“The sidebar to the article says: “Microsoft is replacing the plumbing of its Windows operating system with technology borrowed from its SQL Server database software. Currently, documents, Web pages, e-mail files, spreadsheets and other information are stored in separate, mostly incompatible software. The new technology will unify storage in a single database built into Windows that’s more easily searchable, more reliable, and accessible across corporate networks and the Internet.”
So – Microsoft wants to get rid of application files and store everything in a database. How convenient.
It is a brilliant strategic move. After all, Microsoft users are not ‘chained down’ by their loyalty to Windows – they are chained down by their loyalty to their most heavily used Office applications – principally Word and Excel.
Openoffice.org hopes to win these users over, but to do so we rely on the critical interoperability provided by our import/export filters. I personally have been writing letters to antitrust officials begging them to force Microsoft to publish the specifications of the file formats for their Office applications. Such publication would just about completely level the playing field, and allow users to use whichever office productivity applications they like. This in turn would give people much more flexibility in choosing operating systems.
But just think – what if there were no file formats to publish? ‘Sorry judge, we would like to – but the data is not stored in files. It is stored in a database that is an indivisible part of the operating system.’
The database records will of course be totally inaccessible to any program other than the application that stored them – for security reasons. Throw in some encryption, and if Microsoft is really smart, a patented API by which applications read/write to/from the datastore – and interoperability with other office applications will become a priori impossible.
People will still need to collaborate on documents of course (that is, to exchange ‘files’). But the documents will simply move (via .NET) from the datastore buried deep in the guts of a Windows OS running on one computer, to a datastore embedded in a MS OS running on another computer. Microsoft will gradually make the whole thing more and more opaque … to the point at which people will not even think of files anymore. The concept of ‘files’ may be something that is taught to our great grandchildren in history class.
”
And, of course, putting a DBMS into Windows will make all of today’s PCs obsolete (because of the minimum hardware requirements).
if you really want to grasp the concept of metadata/db driven file system, download BeOS Max edition, or BeOS Personal Edition (both are free for personal use). you’ll notice that it uses the file system for a lot more than just storing files.
– programs will take your ID3 data from your mp3s and store it as attributes to that file. now when you’re browsing your mp3 folder, you have artist, title, length, etc. right there in your file browser window, no need to open another application
– email is stored right in the file system and accessible by any program. sender, received date, email headers, all right in the file system, any program that wants to read emails just has to know how to read a file and its attributes
– more recently, buddy list in file system. you open a folder, and there is your aim buddy list, again, all the attributes displayable in your file browser (profile, away message, online time, etc), accessible to any program that wants to use them.
– instant searches, because the metadata is automagically indexed when the file is modified, searches happen just about as fast as you can hit enter, or click the search button.
all that and you still have a directory hierarchy, that doesn’t go away. my words don’t describe it well enough, fire up BeOS to get a taste of what you’ll be seeing in the coming years (and BeOS had it, what, almost 8 years ago now?).
conclusion: you don’t lose any of the current file system functionality. optimally, because indexes are only modified when the file is modified, you incur little to no overhead for a major amount of improved functionality. file systems can go beyond places that are just there for storing files, and become dynamic tools for organizing, finding, storing, sorting, accessing, and using data. it’s not that it was abandoned before, it’s that the company that brought it to market was destroyed (though luckily the developer has basically reimplemented it at Apple).
reading about winFS seemed cool, but not as cool as the effort they were putting into it. then tiger came out, and i went over to the apple site and read about spotlight. after watching the vid, i was blown away. by the time i finished reading the description, i was drooling.
imagine access to a database of every file on the os, that you can query like any other db, that returns you file objects pretty much instantaneously. that db is seamlessly updated on the fly, every time a file is created/moved/deleted/modified/etc. The API is honestly more attractive to me than the actual UI widget itself. watching what is possible with this technology is really awesome, and imho the first big significant step towards the elimination of hierarchical filesystems, at least for users.
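to show why the searches are so fast, here’s a toy sketch of an attribute index that is only touched when a file is written. this is not how BFS is actually implemented (as far as i know it uses on-disk B+trees); `AttributeIndex` and its methods are names i made up, and everything lives in memory.

```python
from collections import defaultdict


class AttributeIndex:
    """Toy attribute index, updated at write time and consulted at query time."""

    def __init__(self):
        self._attrs = {}                # path -> {attr: value}
        self._index = defaultdict(set)  # (attr, value) -> {paths}

    def write(self, path, attrs):
        """Called whenever a file is saved: drop stale entries, add new ones."""
        for item in self._attrs.get(path, {}).items():
            self._index[item].discard(path)
        self._attrs[path] = dict(attrs)
        for item in attrs.items():
            self._index[item].add(path)

    def query(self, attr, value):
        """No scan over the files: one lookup in the prebuilt index."""
        return sorted(self._index.get((attr, value), ()))
```

the cost of keeping the index current is paid once per save, so a query never has to walk the directory tree or open a single file.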
There appears to be a deep misunderstanding in this article – and many comments – on what belongs in a filesystem, and what a filesystem is. For example:
>Think, however, about Windows’s expectation that files of a certain type must have a certain extension; absent that extension applications are often at a loss for how to handle the file. Of course there are application-level workarounds for these problems, but they point to a clear tension in application design: how much should the filesystem be doing to facilitate application execution, and how much should the application be compensating for functionality not in the filesystem?
File typing is not a filesystem function, it is a shell or, at most, an I/O library function. A filesystem, in and of itself, should have nothing to do with determining filetypes. At most, it should allow metadata that facilitates higher level things identifying files.
>Longhorn, Microsoft’s next generation operating system, expected in 2006, will include WinFS, a filesystem built on an object relational database structure.
WinFS is not a filesystem. NTFS is the filesystem. WinFS sits above the filesystem.
A filesystem’s function is (safely) storing and retrieving data to and from the “physical” disk. That’s it. Not identifying that data. Not categorising that data. Not searching through that data. All these nice little things that add functionality and usability sit in layers *above* the filesystem.
That’s not quite so clear. By knowing the relationships between data items, the filesystem can much-better optimize the higher-level behaviors like searching for data. At some point, it pays to have a “smart” filesystem that does these sorts of things.
>That’s not quite so clear. By knowing the relationships between data items, the filesystem can much-better optimize the higher-level behaviors like searching for data. At some point, it pays to have a “smart” filesystem that does these sorts of things.
I’m not quite sure I understand what you’re alluding to here, can you expand ?
Certainly, filesystems should allow the storage of metadata for things like filetypes and keywords for content searching, but they shouldn’t be trying to figure out whether a file is an mp3 or a word document – that’s the job of applications (or perhaps I/O libraries).
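To illustrate what I mean by leaving file typing to an application or I/O library, here is a small sketch that identifies a file from its leading bytes, ignoring both the name and the filesystem. The `sniff` function and the short `MAGIC` table are my own illustration and cover only a handful of well-known signatures; real tools such as file(1) use a far larger database.

```python
# Leading-byte signatures for a few common formats (deliberately incomplete).
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"ID3", "audio/mpeg"),  # an MP3 that begins with an ID3v2 tag
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document"),  # e.g. legacy Word .doc
]


def sniff(path):
    """Guess a file's type from its content; the filesystem is not consulted."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, kind in MAGIC:
        if head.startswith(magic):
            return kind
    return "application/octet-stream"  # unknown content
```

A PDF saved as notes.txt still comes back as application/pdf, which is the whole point: the extension, and the filesystem for that matter, never enter into it.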
What about layering a database on top of the existing filesystems in such a way that “the meta-data sql files” represent “real files” in the filesystem browsers and GUI dialogs?
Then we can continue to use whatever filesystem we want on the backend, the users will get easy file system searching and management, but the entire system doesn’t need to be reworked to handle it.
we need, for every place where we don’t already have common standards, something to unify things and guarantee backwards compatibility.
y-windows would be a good start on the gui side, but as of now they’re either stalled or just well behind schedule.
when it comes down to file systems the situation is a bit different. we don’t really see the same problems between ext2/3 and reiserfs/xfs as we do with the gui and ui parts, gtk/qt and arts/esd.
a common design goal for filesystems, and in my opinion the adoption of binary xml/tags on files to let the system run wild with all its meta-data ideas, storing the public key of encrypted files in the tag and so on, would be a goal which has no reason to be years away.
when it comes down to database adoption in file systems I don’t see the point. ReiserFS has already walked that path and as I see it they’re now walking forward, away from old ideas with dbs?
to me an FS needs only to do well what is the purpose of an FS: store data efficiently on a block device and give the system simple means of accessing, modifying and labeling this data for future access. that is, meta-data and custom tags as binary xml (or the like) would be the labeling part, and as for the rest reiser has it all.
we already see juk/rhythmbox and other music managers use the metadata on our music collection to do wonderful things with it. why not extend this and give the system and its applications the freedom of setting custom tags at a file and folder level?
I should point out, too, that I’m just a common user with a user’s perspective, so if I’m all wrong or missing something important, that’s the reason.
Reiser4 has the right idea with plugins. A DBMS is overkill for a complete file system; just tag some directories as DBMS dirs and Reiser4 will treat them that way. No point slowing down normal system folders for crap DBMS support. What I would like to see isn’t a DBMS but a source-code-control type of filesystem, so at any point you can go back to a previous version of your file, even if you delete it. This will be possible with Reiser4 if somebody writes the plugin! HD space is so cheap, and product files sooo expensive to fix if somebody does something stupid, that it’s worth the space it takes.
But Reiser4 really is the future of flexible Linux filesystems.
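Here is the kind of behaviour I mean, as a toy in-memory sketch. It has nothing to do with the real Reiser4 plugin interface; `VersionedStore` and its methods are made-up names, and a real implementation would store deltas on disk rather than whole copies in a dictionary.

```python
class VersionedStore:
    """Toy store that never forgets: every write and delete is appended."""

    def __init__(self):
        self._history = {}  # path -> list of versions; None marks a deletion

    def write(self, path, data):
        self._history.setdefault(path, []).append(data)

    def delete(self, path):
        # Record a tombstone instead of discarding the older versions.
        self._history.setdefault(path, []).append(None)

    def read(self, path, version=-1):
        """Read the latest version by default, or any earlier one by index."""
        if path not in self._history:
            raise FileNotFoundError(path)
        data = self._history[path][version]
        if data is None:
            raise FileNotFoundError(path)
        return data

    def versions(self, path):
        """Number of recorded states (writes plus deletions) for a path."""
        return len(self._history.get(path, []))
```

After a write, a second write and a delete, reading the file normally fails as you would expect, but `read(path, version=0)` and `read(path, version=1)` still return the two earlier contents.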
As I understood it, the WinFS shenanigans would be purely a database to manage actual files on the NTFS filesystem of a given directory. As an example, under Longhorn you would have your nice C: formatted with NTFS, and when you try to access files in C:\My Documents you would be presented with the database “front-end” that allows you to perform a database query on the files stored within the C:\My Documents directory. The actual files would still be there, and should you try to access them in a standard manner (rather than using the DB query function), such as with “legacy” applications or (God forbid) a CLI, you would still be presented with regular-looking, regular-functioning files. My guess is that there would be a large database held somewhere like C:\Windows\System\MyDoc.database that actually contains the database entries (and therefore the metadata) of all the files (and “folders”) held within C:\My Documents. This would skirt around the issue of the database file being an item within the database, which is an item in the database, ad nauseam. It does, however, show that it is not a filesystem function at all since it sits “on top” of the filesystem.
Personally I find the idea abhorrent, but then I generally tend to keep files stored in logical places, and name them logically too. As an example, I really don’t like it when people drop *all* their MP3 files into one place and let their music software do all the sorting for itself.
I guess I am used to manual and practical storage methodologies. Think: in an actual office with paper documents, would you just dump your files randomly and have a database programme tell you where you dumped them? Or would you have a filing cabinet with separate drawers, and separate “folders” within the drawers, all given sensible and appropriate names to aid in your finding?
Now, implement the two together, and that’s an idea! Sensible file management together with database query functionality so that files can be accessed and maintained efficiently – what a combination!
>Now, implement the two together, and that’s an idea! Sensible file management together with database query functionality so that files can be accessed and maintained efficiently – what a combination!
Which (again) is what BeOS has had for years.
The only way to see how great BeOS made this concept work is to try one of the free BeOS downloads from http://www.bebits.com
Reiser uses *a lot* of CPU cycles, too much!
The most robust filesystem is ext3; next comes XFS, which has nice features too.
Which FS you use is up to you. My comment is only meant to help a bit!
Are you talking about Reiser 3 or 4 here? Will Reiser4 be better or worse than its predecessor in this regard?
I believe the new Reiser looks rather promising. The only problem is that once you rely on its plugin facility to implement some feature, you’re locked into one file system. Relying on a single file system is not what Linux is about, and any tool that does this will have a hard time getting acceptance. Features that are common between file systems and have standard interfaces, on the other hand, are welcome to be implemented for Reiser using the plugin architecture. This makes it easy to turn features on and off, and should also make sure that Reiser4 will never be the file system to lack some widespread feature.