Lately there has been much fuss about a new way of browsing, with files organizing themselves with the help of metadata. Maybe you ask yourself: “What does this have to do with spatial browsing in GNOME, and how can it improve browsing?” That’s what I did. As I see it, the GNOME people have introduced spatial browsing so we are used to it when this new kind of browsing comes to town. This is a very intelligent move by the GNOME people and will help us adapt to it faster. This is when spatial browsing really makes sense. I hope you will see this when you’ve read this article. Recently, Mac OS X 10.4 introduced a lightweight version of this called Spotlight. It will also most probably be implemented in Microsoft’s new operating system in the Windows family, called Longhorn, in 2006 if nothing changes their minds.
This new way of browsing your files is closely related to a database search, where each file has a number of fields describing its content. What those fields would be, I can only guess. But something like this: not just the name, dates (created, last modified) and permissions (as it is now), but more information such as MIME type, source (which application created it, used it, etc.), content (a summary for txt, pdf, sxw, doc files and so on) and more. You get the picture.
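As a sketch, a record in such a metadata database might look something like this. The field names here are my own guesses at what such a system might store, not any real specification:

```python
# A hypothetical metadata record for one file. The field names are my
# own guesses at what such a database might hold, not any real spec.
file_record = {
    "name": "report-2004.sxw",
    "created": "2004-05-01T10:23:00",
    "modified": "2004-05-03T14:02:00",
    "permissions": "rw-r--r--",
    "mime_type": "application/vnd.sun.xml.writer",
    "source_app": "OpenOffice.org Writer",
    "summary": "Quarterly sales report for Q1 2004",
}

def matches(record, **criteria):
    """True if every requested field contains the wanted substring."""
    return all(want in record.get(field, "")
               for field, want in criteria.items())
```

Finding a file then becomes a field match, like `matches(file_record, summary="report")`, instead of remembering a path.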
Best would be to integrate this data into the filesystem, like WinFS (Windows’ new filesystem, working closely with MS SQL), but as far as I know Linux lacks this kind of filesystem for the moment (correct me if I’m wrong). One solution would be to add this in a GNOME-specific (or better, freedesktop.org-specific) file, the same way Nautilus saves where each window/folder is located on the screen. Or even better: a new filesystem using MySQL lying beside the real filesystem, with the kernel taking care of all this.
/dev/hda1 = ext2/ext3/ReiserFS etc
/dev/hda2 = Metadata FS / MySQL-FS
Well this is very interesting, but nothing I will discuss now…
My vision – the user view
Well, that’s the “technical” part; let’s get to what the user should do to get this working. Remember, this should be as simple as possible. Even your old grandmother should be able to do it in just a few minutes (I think seconds is too optimistic, but preferable). I suppose there’s a lot of discussion going on in the GNOME development team right now, but this is my proposal.
I propose a new “Home” icon on the desktop called (something like) “My organized Home” (or maybe do the opposite: Home = organized, and have another icon called “Disorganized Home”). However, I think you get my point. This folder would include the earlier created shortcuts like the ones above. This way there is no need to try to maintain an organized hierarchical filesystem, which takes time. You just create the shortcuts (read: “virtual folders”) you would like to have, and then you don’t need to bother getting your files organized; instead you get more time for yourself and your girlfriend ;-).
Anyway, this way we’ll have folders like “bring me last week’s downloaded (from Firefox and Firebird) pdf, sxw, ps and doc files that include the word ‘report’” as a link on the desktop (or wherever you’d like to have them) called “Downloaded Reports”.
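To make the idea concrete, here is a rough sketch of such a “virtual folder” as a saved SQL query. The table layout, field names and sample rows are invented for illustration only:

```python
import sqlite3

# Sketch of a "virtual folder" as a saved SQL query. The table layout,
# field names and sample rows are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE files
              (path TEXT, ext TEXT, source TEXT, downloaded TEXT, summary TEXT)""")
db.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", [
    ("/home/me/a.pdf", "pdf", "Firefox",   "2004-06-01", "annual report draft"),
    ("/home/me/b.ogg", "ogg", "Rhythmbox", "2004-06-01", "a song"),
    ("/home/me/c.doc", "doc", "Firebird",  "2004-06-02", "status report"),
])

# "Downloaded Reports": last week's pdf/sxw/ps/doc downloads from
# Firefox or Firebird whose summary mentions 'report'.
downloaded_reports = db.execute("""
    SELECT path FROM files
    WHERE ext IN ('pdf', 'sxw', 'ps', 'doc')
      AND source IN ('Firefox', 'Firebird')
      AND downloaded >= '2004-05-27'
      AND summary LIKE '%report%'
    ORDER BY path
""").fetchall()
```

The “Downloaded Reports” icon on the desktop would simply re-run this query every time you open it.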
And for all the geeks who must have the tree structure (such as me): it’s still there! Sure, it will be confusing to have two “homes”, so you have to choose which one to use.
With this there’s no need to create folders like “Documents”, “Movies”, “Music” etc. (as proposed by freedesktop.org) that every program would have to adopt. Sure, that is a good idea, but until all programs use the appropriate folder, this is a really good solution, and even after that, because it is far more flexible.
And this will make even more sense together with GNOME’s new save dialog: you don’t have to bother about where the file ends up. Just save it and it’s already in the right place!
Of course it could take a while to get there, but I think this is the only way to get the average user to find their stuff and at the same time keep it well organized.
Conclusion
A lot of questions still remain, but I think it is important to really get the discussion going. I know this is not only the GNOME team’s problem, but they are heavily involved in it as I see it, because this is what we see in the end. I think it’s important to get the whole Linux community behind this to make it as good as possible. Because this is the way we will browse in the near future.
My tips
Until this comes true, my tip is to improve spatial browsing even more. Here are the simple changes that have made spatial browsing easier for me to use.
Change Nautilus’ click behaviour to single click. Why double click when you can single click?! Together with the middle click, you won’t get a lot of windows open and you don’t have to click a lot to get to the folder you want.
I don’t know how many of you out there make use of the emblems in Nautilus, but they are a very good way to find your files and folders even faster in a window: you don’t have to read each name, just look at the icons and click. My belief is that it goes faster. But that could be just me.
About the author:
I recently got into the Linux community and had only been using GNOME 2.4 for a couple of months when 2.6 was released. Like everybody else, I was skeptical of spatial browsing at first. But now I just love it!
One of the biggest issues is who will input this metadata. Let’s take a very simple case today: mp3s. Mp3s have tags inside the file which include useful information like artist, genre, title…
Unfortunately, a lot of mp3s do not have those appropriately filled in. Winamp 5 does a nice job of trying to guess those fields from the file name. For example, if the file name is “Joe Jackson – My good song.mp3”, Winamp can extract the appropriate artist and song name.
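That kind of guessing is easy to sketch. Here is a toy version of the heuristic; real taggers like Winamp handle far more naming patterns than this:

```python
def guess_tags(filename):
    """Guess artist and title from an 'Artist - Title.ext' file name.
    A toy version of the heuristic; real taggers handle many more
    naming patterns than this."""
    stem = filename.rsplit(".", 1)[0]      # drop the extension
    for sep in (" - ", " – "):             # plain hyphen or en-dash
        if sep in stem:
            artist, title = stem.split(sep, 1)
            return {"artist": artist.strip(), "title": title.strip()}
    return {"artist": None, "title": stem} # no separator: give up on artist
```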
I’ve recently switched completely to the media library concept. I don’t do any kind of file renaming or organizing files in directories; I just let Winamp handle it all. Sadly, it really lacks in terms of user input. I have to select a file, choose to edit the file info, and then type in any needed fields. It would be nice to have a right-click menu option (change artist, change genre), or even to edit it right in the media library list.
All the backend stuff really amounts to the same thing: a big database. It’s the GUI aspect that will make or break its success. For now, I’m just putting up with Winamp’s.
I’m a big fan of metadata and the power it could give users in organizing themselves. But I’m always put off by having to enter a lot of it manually. I know that a lot of people wouldn’t mind adding metadata to their photo collections and such, but as a graphic artist, I keep so many stock images, textures etc. around that it simply isn’t feasible to do this manually for each file.
In essence there needs to be a way to easily apply metadata sets, or individual metadata components, to multiple files in one go. Adobe tries something like this in their new album software, but the interface is too clunky and slow to make it work nicely.
I’m thinking of perhaps a special side pane that holds metadata attributes in a navigable interface, in the form of icons or some such, that you could drag a file, or a selection of files, onto to give them the desired metadata attribute or set of attributes.
Any thoughts?
XFS has always had extended attributes, and I believe JFS and ext3 have added them. I’m not sure they’re quite as robust as what you’re thinking about, and I don’t think they’re easily searched.
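To illustrate why plain extended attributes aren’t easily searched: with no index, a query has to visit every file. A small sketch, with the attributes simulated in memory rather than read with real getfattr/setfattr calls, though the cost is the same either way:

```python
# Why xattr-style metadata is hard to search: there is no index, so any
# query must visit every file. Attributes are simulated in a dict here
# instead of real getfattr/setfattr calls, but the cost is the same.
xattrs = {
    "/home/me/a.pdf": {"user.project": "thesis", "user.keyword": "report"},
    "/home/me/b.ogg": {"user.artist": "Joe Jackson"},
    "/home/me/c.doc": {"user.project": "thesis"},
}

def find_by_attr(store, name, value):
    """Linear scan: O(number of files) for every single query."""
    return sorted(path for path, attrs in store.items()
                  if attrs.get(name) == value)
```

An indexed store (like the database approaches discussed above) avoids exactly this scan.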
Let me see if I remember some things
Metadata…
Extended Attributes as part of the filesystem…
Pseudo directories as the results of a databaselike query…
Why does this all sound so familiar?
Oh yes, because I was a BeOS user for years.
Seriously, I would love to see this sort of thing in Linux, including the kernel-level support needed for things like extended attributes and all the fun things you can do with them. It has always been one of the biggest things I have missed since I stopped using Be.
True, this was in BeOS back in 1997. Nice that metadata concepts are coming back.
Check out this:
http://www.nat.org/dashboard/ it’s for gnome. Pretty promising.
I’m kind of in favor of metadata because I just sorted out 5 gigabytes of my own files. I wouldn’t mind having to enter some metadata myself, as long as it’s somehow searchable.
How about presets that you can save and apply to a file? Then you can apply metadata to all files in one project by creating a project metadata file, and when you save a file, select that metadata file.
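A rough sketch of how such presets might work. The preset names and fields here are invented for illustration:

```python
# Hypothetical metadata presets: define a named set of attributes once,
# then stamp it onto every file in a project. Names and fields invented.
presets = {
    "thesis-project": {"project": "thesis", "author": "me", "year": "2004"},
}

def apply_preset(metadata_store, paths, preset_name):
    """Merge the preset's attributes into each file's metadata record."""
    for path in paths:
        metadata_store.setdefault(path, {}).update(presets[preset_name])

store = {}
apply_preset(store, ["/home/me/ch1.sxw", "/home/me/ch2.sxw"], "thesis-project")
```

One drag-and-drop of a preset onto a selection of files would replace dozens of manual edit dialogs.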
Suggestion: look at Beagle, Dashboard (the parent/predecessor of Beagle) and Storage in GNOME CVS and then come back and write another article; they’re basically what you’re talking about. Storage is exactly it (using PostgreSQL instead of MySQL); Beagle is similar, if only lacking a natural language search option and a gnomevfs conduit.
WinFS is nothing more than NTFS with an MS SQL database on top of it. Apple’s new Spotlight is closer to BFS, but not quite its power. Apple uses a metadata approach. MSFT is going to use a full database on top of a file system. You will still have to defrag it, and you will still have to sort out and set up your data.
It won’t be the computer that picks where data goes; WinFS will store data where it wants you to store data. So if you know the hard-wired location, you can bypass WinFS, thus infecting the system without a way to find the infected files. With luck MSFT will make this hard to do, but MSFT and security don’t go well together.
Apple’s system does hide files from users, but they are still there; you just have to know how to look for them.
As for MSFT’s system, I am waiting to see it, as the speed most likely won’t be there.
I want some hard facts on the matter. It’s too bad that two groups are doing different things.
Also, hasn’t Storage been worked on forever, while Beagle hasn’t been around long at all?
I’ve also tried the Winamp media library, but it’s not good enough.
I can suggest iTunes. It allows “in place editing” of metadata, and it also supports multiple file editing in a single dialog. (Additional benefits, like mp3 ripping and any-speed writing, are a bonus!)
But iTunes lacks a “metadata guessing” function. Use another mp3 tag tool for that (there are plenty of OSS/freeware/shareware utilities).
Hi
Come on, didn’t you read the detailed blog about Storage by Seth Nickell recently?
I still don’t think he answered his own question: “What does this have to do with spatial browsing in GNOME?” And besides that, what does it have to do with GNOME at all? I mean, shouldn’t metadata be desktop-independent, so that you can also use your GNOME files from KDE and vice versa? Shouldn’t you be able to use the metadata-fs from the command line?
These flat filesystem/metadata ideas are application-domain specific, and it should remain that way. Trying to stretch them to fit all problem domains just doesn’t make sense.
The computer is never going to be able to categorize data for you. It’s a dumb adding machine; no amount of logical sugar coating is going to change that.
“The computer is never going to be able to categorize data for you. It’s a dumb adding machine; no amount of logical sugar coating is going to change that.”
Yes and no. They are very stupid, that’s true. However, there has to be some method of statistical analysis, combined with user feedback, that could implement machine learning.
Also, I would agree with you that it doesn’t make sense to metatag everything, as there isn’t a complete solution integrating the interface with applications.
You should read about storage:
Overview:
http://www.gnome.org/~seth/storage/features.html
Bit more technical:
http://www.gnome.org/~seth/storage/technical.html
Human-Computer Interaction Discourse:
http://www.gnome.org/~seth/storage/associative-interfaces.pdf
“The computer is never going to be able to categorize data for you. It’s a dumb adding machine; no amount of logical sugar coating is going to change that.”
There is another usage of metadata in BeOS besides music collections, e-mail handling, bookmark handling and other sorts of data that are metadata-ready by nature.
Namely, filetype-ing.
The metadata contains the MIME type, the preferred individual app/handler signature (if there is no individual handler, the system-wide one for the given MIME type will be used), etc.
This also allows unlimited flexibility in access permission controls. It seems that SkyOS is starting to use that in its fresh multiuser implementation.
A little OT, but I’m sure some of you will find this info useful. If you want to get automatic metadata into your mp3 files (no entering it, no guessing by file name), try MusicBrainz Tagger (sorry, Windows only; they have an OS X version too): http://www.musicbrainz.org/tagger/download.html. It creates an audio signature (for mp3, ogg and aac files) and then compares it with its database.
Out of the 550 mp3s I’ve ripped and accumulated over the years, it correctly identified something like 520 of them; the other 30 were entries that had a low similarity yield, or were close to two different songs. It’s super useful.
It’s really good to see people actively discussing this. Before Microsoft announced WinFS, I was checking out documentation on BeOS and found out about its storage system. Instantly it was a solution to all my problems of wanting to store files in more than one directory. I’m working slowly on my own solution for this, and hopefully I can get it to work in Linux and Mac because I’m using C# and .NET. Regardless, I’m glad people are looking for a solution to the problem of managing files/data on desktops.
” A little OT, but I’m sure some of you will find this info useful. If you want to get automatic meta data info into your mp3 files (no entering it, no guessing by file name) try MusicBrainz Tagger (sorry Windows only, they have an OSX version too) http://www.musicbrainz.org/tagger/download.html“
Both Rhythmbox and JuK already use libmusicbrainz to extract the info. Very useful.
Sorry, I don’t get it!
Just hours after yet another post about how to fix nautilus… Don’t get me wrong, spatial could be useful to some people but:
1) I am not convinced it should be the default – debatable
2) There should be an option to switch it off. I know it is in cvs/2.8 or whatever; what scares me is the fact that it wasn’t there in the first place! Anyone doing proper usability studies would have thought: yeah, maybe the user who digs in the menu should be able to present things any way he likes.
Metadata is great but it has *nothing* to do with spatial navigation.
That thing belongs in the vfs layer, how you get to it is another matter entirely!
If you’re interested, the project has been on track for some years now:
http://members.cox.net/sinzui/medusa/
AFAIK WinFS started after Medusa; it would be fun to have a win32 implementation of it before Longhorn.
I see a lot of talk about this subject in the open source community, but less action. When will we get to see these projects (Storage, Beagle etc.) coming out of CVS and into a stable GNOME?
Not in my lifetime, I think…
Check out http://www.musicbrainz.org
It will automatically tag mp3s with correct metadata using music recognition software. It doesn’t always get it right, and sometimes you have to manually select an mp3 from their database. However, I find that in probably >90% of cases it will find the correct tags.
“Check out http://www.musicbrainz.org ”
Both Rhythmbox and JuK on Linux already use this.
He didn’t really answer, but I understand where he’s going.
It’s spatial in looks only. I’d counter that just because it is usually a single-pane view of your files doesn’t make it spatial. Just single-pane.
In WinFS and Storage, the many screenshots I’ve seen are spatial in appearance (but not practice).
Here’s a white paper with screenshots of WinFS and Aero.
http://msdn.microsoft.com/Longhorn/understanding/ux/default.aspx?pu…
It uses the Outlook 97-style semi-spatial UI with favorites pane on the left and files on the right that Apple also uses in OS X’s Finder.
If you go up a level in the top breadcrumbs bar, there are plenty of articles with screenshots that reinforce the “spatial-like” view.
You can’t really browse dynamic folders/stacks/saved queries. They aren’t hierarchical and don’t really lend themselves to that structure. The way you’d work with it would be in a single-pane window: shallow directories with many files in each. Think iTunes with one massive root library, where the playlists are just abstracted folders. If you delete a file from such a folder, you disassociate it from that keyword or sort query, but you don’t physically delete it. Folders are like “smart playlists” in iTunes, WMP9, MC10 and other jukeboxes. I can see working in multiple windows à la spatial: if you drag a file from one window to another, you’d be associating that keyword/search attribute with the file.
On the other hand, they definitely aren’t spatial. Spatial reinforces the view that “this icon” is “this file.” A folder has a location and a size, and it opens the same way that you left it. There’s no reason why it can’t open these saved searches in a similar manner, but since everything is abstracted, you’ve lost the whole reasoning behind spatial.
As for the general trend in this convo toward a metadata system:
I’ve been ripping CDs for a while now, not to mention my old cassette and vinyl collections I haven’t started. I have to keep myself disciplined to keep the metadata up to speed.
On the other hand, I can’t see the average non-anal non-music freak doing this.
I do think there are plans to categorize docs & pdfs based on content. Photos may be categorized based on date, size, aperture, etc. Songs and contacts have inherent properties that can be metadata. The push will be for applications to provide as much automatically generated metadata as possible. Integrated face scanning software to group photos, for instance: you keyword one picture “dad” and it will recognize most occurrences.
In any case, it will always be a lot of work. I have a few thousand CDs, records and tapes. I have a large library of books. My bills need organizing. Pictures are scattered all over the house. It takes work to keep everything organized. If you look at someone’s house or garage, it’s probably as organized as their PC is and vice versa.
I read the article, and the solution *could* be this: a small root using a traditional file system just to get the system up and running; then from there we mount a file system on which the whole system resides.
Use something like Firebird SQL, whereby the whole database resides on a raw partition; that is, it is one big file located on a partition without any filesystem. You can do similar with Oracle, meaning you get the raw speed without the file system overhead, and better yet, Linux already supports raw partitions.
You will get all the perks of a database without the nasty overhead of having layers of crap building up.
“Best would be to integrate this data in the filesystem, like WinFS (Windows new filesystem working closely with MS SQL), but as far as I know Linux lacks this kind of filesystem for the moment (correct me if I’m wrong).”
Reiser4 ( http://namesys.com/ ) handles metadata. Specifically, every file can also be treated as a folder, with files inside this “folder” containing metadata. That is, you can open a file /home/user/blah normally, but then /home/user/blah/author will contain the author, /home/user/blah/project will say what project it’s associated with, etc.
Of course, none of this can be extensively used unless Reiser4 is adopted universally, which is unlikely to ever happen.
And I wonder why when I visit Paris and ask, “Parlez-vous l’anglais?” they always shoot me a dirty look. It’s because assholes online constantly yell at article writers for not speaking English. And so when I go to Paris, they yell at me for not speaking French.
Give the guy a break.
Besides, it’s not his job to edit his article. It’s uhm, the editor’s job. By definition, editors… edit…
I don’t mind reading articles written by people who don’t speak/write English too well as long as the content is good.
Not that the content of this article is so inspiring. This work is being done by the Storage/Beagle/Dashboard projects, as everyone has been pointing out. I only wish I had the time and knowledge to contribute to those projects, to get the work done faster. But I’m only a young computer hobbyist, not one of those “professional” 17-year-old Gnome hackers (did you know the writer of Muine is only 17? Sheesh. I feel old at 19, already past my prime… )
Bottom line, respond to the article content, not the way it was written.
Learn Manners
Look, this is a tech community. We deal in hard facts and strict language so that we’re clear on what we’re talking about. It’s either the fault of the author for not at least having someone proofread, or it’s the editor’s fault for not doing it themselves, but in any case, the article sounds very silly, and it’s hard not to laugh while reading it.
when foreigners adapt English to suit their deficiencies in it. It is always good for a laugh. Having gone through the language acquisition process once, and now for a second time, I have to say that anyone who puts forth the effort to learn another language is entitled to their mistakes, and those mistakes are best played for a communal “you said something weird, and didn’t even know it” guffaw.
That said, one thing I took away from this “press release” is that I finally am beginning to understand why people cite GNOME for bloat. Let’s hope that for GNOME 3 (where breaking binary and source compatibility is acceptable), the GNOME devs move to a more unified programming base. Let’s not have the *NIX equivalent of Windows’ million programming environments. Keep it simple, stupid.
that there are too many solutions coming? It appears there are three specific to gnome if I’m reading this right. One with reiserfs and I’m guessing at least one for KDE.
Someone could write a good article on all the Linux WinFS equivalents: status, technology, pros, cons, etc. That would be an interesting read.
As for the author and English, I’m disappointed to see people bashing. Obviously the author felt the topic was important enough that he went through the effort of writing this story in what is obviously not his native tongue, which makes it much more difficult.
It seems a significant number of people feel that it’s better to have the metadata in the filesystem itself rather than in an index built on top of the filesystem as Longhorn and Tiger will do.
Can someone who believes this please explain to me why this is so? What advantage is there in having the metadata being in the filesystem versus in a layer on top of the filesystem which itself provides a file access API?
It’s a good start. However I believe we could collect more metadata without much active user interaction.
My ‘vision’ would be that applications and a metadata-aware filesystem are able to store a common set of metadata.
Imagine you create a new picture called a.jpg. Now for some reason you need that same picture, only rotated 90 degrees, and you save it as b.jpg.
The application could now record the changes you made from file a to file b. Applications already keep such records to provide some kind of ‘undo’/‘redo’ feature, so why not save the changes made to that picture along with the file?
Not only could a versioning system be provided (like CVS or Subversion), because the filesystem would record changes between files and also relate files (as in this example, where file a and file b are related/siblings), but you could also store the action you took (in this example: rotate 90 degrees, no other changes).
If you would like to know which files have the same content, you could find the files a.jpg and b.jpg. The same query would not be possible today without huge overhead.
Similar methods could be applied to other applications, like word processors etc.
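For comparison, the best a content-blind system can do today is exact-duplicate detection: hash every file in full, which finds byte-identical copies but can never relate a.jpg to its rotated sibling b.jpg. A sketch of that brute-force approach:

```python
import hashlib

def content_groups(files):
    """Group names by a hash of their bytes: every file must be read in
    full, and only byte-identical copies are ever matched."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(hashlib.sha1(data).hexdigest(), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

files = {
    "a.jpg": b"\x89JPGDATA",
    "b.jpg": b"\x89JPGDATA",   # byte-identical copy of a.jpg
    "c.jpg": b"\x89OTHER",     # a rotated sibling would hash differently
}
```

A filesystem that recorded the lineage between files, as described above, would answer the same question without reading a single byte of content.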
I think (and this is just my opinion) that it is a matter of both elegance and technical merit.
The “cram it into whatever-vfs” approach seems to me to be a hack. People can’t, or don’t want to, put that work into the operating system (the GNOME and KDE acronyms are missing an L-for-Linux for a reason), so they put it in their very own VFS layer.
If some data describe an object, they should be attached to that object, not written on sticky notes and tacked on the side of a monitor or scribbled on a desk blotter.
Additionally, by storing the metadata in the filesystem, approaches can be taken to ensure that performance is not affected. All UNIX(-like) file systems (that I am aware of) store the metadata for an object (its permissions, owner, group, [acm]times, etc.) near the data (in the inode) and try to allocate blocks for the data as near to the inode as possible. By storing metadata in an external store, you leave the system open to the possibility of having to seek between ends of the disc very often. This might not be much of a problem (even without caching) due to the fairly advanced disc scheduling algorithms in use today, but it may be. It also introduces a single point of failure: one file gets deleted or corrupted, and there go the tags on my umpteen MP3 files and the titles and authors of my n-thousand-document PDF archive.
Putting it in the application VFS layer is a hack for those who have no alternative. As a prototype it can be useful; as a workaround for difficult situations (like portability) it can be helpful; as an elegant feature, it lacks sorely.
Again, this is just my opinion.
Sorry for the double post…
“If you would like to know which files have the same content, you could find the files a.jpg and b.jpg. The same query would not be possible today without huge overhead.”
Assuming, of course, that you used only those applications that were aware of these special features in the operating system. And that you didn’t need to access your data from another environment (such as over a network file system).
You would not always be able to determine that files were similar. Take the example of a peer-to-peer application:
1) You start the application.
2) Somebody requests a download from you (for a.jpg).
3) The application opens a.jpg, and starts sending it.
4) You begin to download b.jpg from another user.
5) The application creates and opens b.jpg and writes the download to it.
The situation of data production and consumption in a pipe line poses similar problems.
Any scheme that doesn’t have the huge overhead you mention would make mistakes, unless it relied on the applications stating specifically that b.jpg was derived from a.jpg (which they would have to if saving the steps along the way), and then it would be a lot of effort to modify existing applications to take advantage of it.
In addition, this would take a LOT of hard disc space. We would need to vacuum our discs to reclaim old change data instead of (or as well as) defragmenting, etc.
This feature could be done (in a dodgy-hack manner) using the alternate streams mechanism (or whatever it is called) in NTFS: just save the “how we got here” and “original” data in alternate streams of b.jpg.
Other problems include cases with multiple sources, as in a compilation of images (a collage, etc.). What about temporary sources? An image off the web is as likely as not copied from the browser cache, not the Internet.
This sort of scheme always sounds nice but I don’t think that people will use it. As Seth describes in some of the storage documents, people don’t even use directories properly, why would they use more advanced schemes?
I personally believe that search will (hopefully) start to address this. By combining an explicit query interface (like Storage) with an unintrusive implicit interface (like Dashboard) and a search system sophisticated enough to do automated clustering, and maybe a little learning, we can almost remove the necessity for manual organisation, though I readily admit that this would suffer from almost all of the things I said about your vision. Consistency is the sign of a small mind. ;-)
“Assuming, of course, that you used only those applications that were aware of these special features in the operating system. And that you didn’t need to access your data from another environment (such as over a network file system).”
Yes, it wouldn’t be possible without the interaction between applications and filesystem.
In theory it would be possible to transfer the associated stored metadata of a file over a network, e.g. transformed into an ‘XML attachment’. But of course the counterpart would need a filesystem and applications that are also capable of using this information. If you think this is far-fetched, then think of something similar to an MP3 ID tag, which is also a form of standardized metadata that can be transferred over different kinds of networks or computer architectures (x86, PowerPC, SPARC etc.).
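A sketch of what such an ‘XML attachment’ might look like. The element names here are invented for illustration, not an existing standard:

```python
import xml.etree.ElementTree as ET

def metadata_to_xml(attrs):
    """Serialize a file's metadata into a small XML document that could
    travel alongside the file over any network. The element names are
    invented for illustration, not an existing standard."""
    root = ET.Element("metadata")
    for name, value in sorted(attrs.items()):
        ET.SubElement(root, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")

xml_doc = metadata_to_xml({"artist": "Joe Jackson", "title": "My good song"})
```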
Of course applications would have to be modified to use these features.
The additional space needed is IMHO not that big a concern. Disks keep getting bigger and bigger, and that trend isn’t about to change or reverse. You could save much space by doing diffs between file versions (like CVS or Subversion) and deleting the oldest files, keeping only the latest version as a full copy.
However, I agree with you that it will need more resources (disk space, CPU time etc.) in total than today’s filesystems. So storing temp files on such a filesystem wouldn’t be a good idea, but after all it’s just a dumb vision of mine (for now).
I just wrote a smaller article on my blog with the same concerns. It’s been a while since I started thinking about it, and I hope someday things change in the directions I lay out. After all, there aren’t so many new ideas; it’s a matter of hard work.
My comment about networks was with respect to existing protocols that don’t support (or whose usual configurations don’t support) the type of metadata you are describing: for example NFS, SMB, CIFS, AFS, etc. I’m sure that the Samba people (and MS, Apple, and everyone else) would not take kindly to being asked to rewrite things :-).
Protocols like WebDAV and FTP (more the way they are used than the protocols themselves) would stop working properly (in some ways).
The other point that can be made is that standardised metadata is, IMHO, a bit of an oxymoron. Using something like the Dublin Core definitions would be overkill (in full capitals, using the blink, underline, bold AND italic tags) for almost all users. Similarly, the semantic web stuff would probably not be suitable (due to the dependence on XML).
All one really needs is an extended attributes facility (which all real/useful/non-toy file systems have) and some useful list of standard attribute names and meanings. This would then be an i18n nightmare and still not solve the searchability problem.
Where these things (metadata or a DB-backed FS) would be most useful is in corporate environments, where the number of documents is just massive. But as someone pointed out, the problem is to get good metadata. It doesn’t help to know what application created what document. It doesn’t tell you whether a Word or OO.org text document is a letter, an invoice, a specification, a marketing report and so on.
Any technical solution will have to do something hard: instill discipline among users. Making it impossible to save an undescribed document sounds harsh, but I don’t see how else the mess I see where I work would be cleared. Otherwise it wouldn’t take long before you end up with thousands of spreadsheet, presentation and text files with “miscellaneous” as the description. Imagine that in a flat directory structure!
The best approach is probably to offer configurable schemes, providing coherent description frameworks appropriate for various contexts (Corporate, Individual, Consulting etc.), each containing a minimum set of mandatory categories.
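A minimal sketch of such a scheme, with invented context names and mandatory fields, reporting which mandatory categories are still missing before a document may be saved:

```python
# A configurable description scheme: each context defines a minimum set
# of mandatory fields that must be filled in before a document may be
# saved. The context names and fields are invented for illustration.
schemes = {
    "Corporate": {"title", "department", "doc_type"},
    "Individual": {"title"},
}

def missing_fields(scheme_name, metadata):
    """Return the mandatory fields still missing or left blank."""
    return {f for f in schemes[scheme_name]
            if not metadata.get(f, "").strip()}
```

The save dialog would simply refuse to proceed while this set is non-empty.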
I’ve not seen network transparency mentioned anywhere, but that’s an essential requirement.
That being said, I don’t believe much in the whole concept. Solutions to this problem exist (documentation databases and so on) and the only context where I have seen them used successfully is for critical documentation like contracts, invoices, commercial proposals, and generally speaking, externally binding documents. And it only works if dedicated staff is in charge of the process.
As an individual user, I’ve never had any problem keeping track of anything, although I can understand that people producing material (media or other) could have this problem.
Like so many have said before, BeFS did this years ago. Now Reiser4 is doing it. And if you have a Gmail account, you will see it there too.
Gmail doesn’t allow you to create folders for grouping or categorizing your mails. That is the old way, and you sometimes get mail that could go in several folders. What do you do then? So Gmail lets you create labels, like attributes. You can select a number of emails and apply the same labels to them, or apply a label while reading a mail.
Your labels appear like folder names. You select a label and it shows all the mails with that label, or you can select the “all mail” view, which shows everything. The best part is that you can apply several labels to the same mail. Hence why I use my Gmail account.
I have labels like FYI, Account Info, Personal, Business, OS Dev., Tutorials, and Reference. Just about every email I get is tagged with one or more of those labels. If those were folders, as in other email programs or on my computer, how would I classify the things which fall into more than one? I would have to duplicate one or more of the sub-categories, and that’s nonsense.
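The many-to-many relationship that makes labels better than folders is easy to sketch. This is a minimal illustration of the idea, with invented mail ids and label names, not Gmail’s actual implementation.

```python
from collections import defaultdict

# Gmail-style labels: each mail can carry any number of labels, so a
# message that fits several categories is never duplicated.
label_to_mails = defaultdict(set)   # label -> set of mail ids
mail_to_labels = defaultdict(set)   # mail id -> set of labels

def apply_label(mail_id: str, label: str) -> None:
    label_to_mails[label].add(mail_id)
    mail_to_labels[mail_id].add(label)

apply_label("msg1", "Business")
apply_label("msg1", "Reference")   # same mail, second label
apply_label("msg2", "Personal")

# Selecting a label shows every mail carrying it, like opening a folder:
print(sorted(label_to_mails["Business"]))   # -> ['msg1']
print(sorted(mail_to_labels["msg1"]))       # -> ['Business', 'Reference']
```

With folders, `msg1` would have to live in either Business or Reference (or be copied into both); with labels, it simply carries both attributes.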
Where these things (metadata or a DB-backed FS) would be most useful is in corporate environments where the number of documents is just massive. But as someone pointed out, the problem is to get good metadata. It doesn’t help to know which application created which document. It doesn’t tell whether a Word or OO.org text document is a letter, an invoice, a specification, a marketing report and so on.
That’s true, but what the other poster said is that you could generate this sort of metadata without bothering the user with it. And even just the data one could gather from the system itself would be very beneficial for local search queries.
We all know that users are lazy (well, at least I am :) ) and that they don’t want to bother with creating metadata. Surely they don’t want to be forced to do so. A good example that I always bring up is an older version of MS Word. MS forced the user, on every save, to enter more data about the document: very basic stuff like “author”, “topic”, etc. They got hugely negative responses from their users, so they removed that in the next fixup or whatever it was and made it optional again.
To get metadata from casual users you have to collect it in a very low-profile way. Because in my experience, and I’m working in that area, ordinary users don’t see the benefits of this overhead. It’s very, very hard to get through to them. A totally different case is “educated” users, e.g. librarians. They know how crucial good metadata is and are therefore willing to invest more effort in a document in order to benefit from it later on.
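"Low-profile" collection means taking whatever the system already knows without asking the user anything. A sketch of that idea in Python, using only stat information and MIME-type guessing; the field names are invented for illustration.

```python
import mimetypes
import os
import tempfile
import time

def auto_metadata(path: str) -> dict:
    """Collect metadata from the system itself, with no user input."""
    st = os.stat(path)
    mime, _ = mimetypes.guess_type(path)
    return {
        "name":     os.path.basename(path),
        "size":     st.st_size,
        "modified": time.ctime(st.st_mtime),
        "mime":     mime or "application/octet-stream",
    }

# Demo on a throwaway file:
path = os.path.join(tempfile.mkdtemp(), "report.txt")
with open(path, "w") as f:
    f.write("quarterly invoice for ACME")

md = auto_metadata(path)
print(md["name"], md["mime"], md["size"])   # report.txt text/plain 26
```

None of this tells you the document is an invoice, which is exactly the commenter’s point, but it is free and already useful for local search.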
The best is probably to offer configurable schemes, each offering a coherent description framework appropriate for various contexts like Corporate, Individual, Consulting etc., and each containing a minimum set of mandatory categories.
A really nice approach of that kind is X2U (I spotted it a while ago on freshmeat).
Are we going to reinvent the wheel again, just because one small OS which still happily exists has had this for years?
Well, at least we can say that when someone copies you, you’re probably havin’ good stuff copied : )
Without conceptual processing, no system of metadata can be sufficient to handle human knowledge processing needs.
And very few people in AI research are working on conceptual processing these days. It’s the core issue in AI and it’s the hardest issue in AI, so everyone is dodging it and working on “easier” things like machine vision, machine learning (which is also hopeless without conceptual processing) and the like.
Metadata of any kind is just a poor substitute until a good conceptual processing simulation is available. The same applies to OOP in programming – it’s just a poor substitute for REAL system engineering tools.
As for Longhorn, you look at the hardware requirements for this thing – it’s just not going to fly with most people because it will require an expensive hardware upgrade. They’re talking a GB of RAM and a 3GHz CPU just to run the OS – with no applications running.
Adoption of this OS will be even slower than Windows XP was. A significant percentage (a third or more) of corporations are still running 98 and 2000 and have never upgraded to XP. Are they going to spend another couple grand per PC to upgrade to Longhorn? I don’t think so.
The only way metadata can be useful today is if the system is able to read and interpret data being entered while a file is being created and automatically include that data in the metadata store. If I write a document, the system should be able to read that document and be able to find it based on any data in that document.
If it is an image, it’s hopeless – no computer can read an image, so I have to enter the data.
Also, the content of the metadata must be variable – I should be able to specify the meaning of every field, not just have a few standard fields to fill in.
Basically, the best way would be to have one huge comment field which the system reads and then can find the file based on any content in that comment field. At least for music and image files, this would work. I could pop up a dialog and enter several sentences about the image or music which would include names, dates, places, genres, anything at all that has meaning to me. The system would automatically index that image based on ALL the content in the field (leaving out “noise” words like “the”).
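The "one huge comment field" idea is essentially a tiny inverted index with a stopword filter. A minimal sketch, with invented file names and an invented (far too small) noise-word list:

```python
import re
from collections import defaultdict

# Index every word of a free-text description, skipping noise words,
# so the file can be found by any term that has meaning to the user.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "at"}
index = defaultdict(set)   # word -> set of file names

def describe(filename: str, comment: str) -> None:
    """Attach a free-text comment to a file and index its words."""
    for word in re.findall(r"[a-z0-9]+", comment.lower()):
        if word not in STOPWORDS:
            index[word].add(filename)

def find(word: str) -> set:
    """Return every file whose comment mentioned the word."""
    return index.get(word.lower(), set())

describe("beach.jpg", "Holiday at the beach in Nice, July 2004")
describe("band.mp3", "Live recording of the band in Nice")

print(sorted(find("nice")))   # -> ['band.mp3', 'beach.jpg']
```

Names, dates, places and genres all become searchable automatically, exactly because the system indexes everything in the field rather than a few fixed slots.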
But it would be better if the system could retrieve that info from wherever I GOT the file from. If I download an MP3 from a Web site, that Web site has already identified the MP3 – the artist, where it was made, when, etc. The system should be able to retrieve that information automatically as I download the MP3 – then all I need to do is add any additional metadata that is important to me that the original metadata omits.
If it is an image, it’s hopeless – no computer can read an image, so I have to enter the data.
This is addressed by imgSeek: http://imgseek.sourceforge.net
Basically you draw a rough sketch of an image (e.g. the Eiffel Tower) and get a list of pictures matching that pattern.
It’s not yet at a 1.0 release, but it performs well enough for basic searches.
http://channel9.msdn.com/ShowPost.aspx?PostID=14275#14275
Watch the video demo. Eugenia linked up a shorter video on C|Net a couple of weeks ago. This is much more in depth and a lot of fascinating stuff is in there.
The software recognizes all videos and pics with faces in them (toward the last minute). Right now casinos here in vegas run facial recognition software to detect crooks, cheats and counters. This stuff filters down quickly. High end 2 years ago is today’s budget.
imgSeek looks like quite a few steps in the right direction too. You can pick a pic and say, basically, “find all similar,” then keyword them or assign them to a temporary “group.”
What makes all this sensible is that hard drives are getting very large.
A 1-terabyte disk costs nearly $1000 now. We can expect that on the standard desktop by mid-2006 or so. It will be hard to keep track of all the files stored on a terabyte disk.
That’s why Microsoft is trying to anticipate this with WinFS. And we, *nix users, can expect to face the same problems for our desktop usability.
Btw, MySQL is a good database for servers, but it needs administration, and it isn’t license-compatible with every free solution. I think SQLite (which is pure public domain) is a better choice: no server, no administration, and a license absolutely compatible with BSD, GPL, LGPL, MIT and so on. That is why I think it’s a better candidate for freedesktop.org adoption.
SQLite is also very, very fast. The only drawback is that it lacks a TCP/IP server technology, but on the desktop that’s rather an advantage, since it reduces the risk of component failure.