This article offers feature suggestions to budding OS developers looking for that neat edge. Wouldn't it be real nice if

FILE *f = fopen("ftp://www.microsoft.com/mission/statement.html", "r");

just worked?
People have been dreaming of ‘mounting’ remote filesystems on demand for a long time. It seems to be a popular pastime for architecture astronauts. Despite Joel’s warnings to run from network transparency, I vote that you don’t!
Client-side libraries allow a program to access remote resources in the same way as local ones, for example the excellent libferris.
A new operating system that integrated such handling into the platform level (rather than an additional, optional library) would have the advantage that each and every application could access the same resources. The ‘ls’ in the ported bash prompt would be able to list the contents of an FTP directory, and the notepad clone would load your text files whether they were local, on some window server, or the other side of the internet.
Users work with and are familiar with URIs, so URIs are the natural way of expressing a file name and location. The filesystem ought to work with URIs.
Imagine the following snippet of fictitious command line:
> pwd
file:///home/me/
> cd http://www.microsoft.com
> cd mission
> ls
.  ..  statement.html
> rm mission.html
access denied.
The use of URIs introduces what I term the 'Multi-Root Filesystem'. The protocol, e.g. "file", "smb" or "webdav", is a root. All protocols are peers of each other, and you can't navigate between them with relative paths (i.e. no "file://home/me/../../../http://remote").
Protocol handlers are global to the user or system. The file structure might be built on demand, with the contents of "http://" reading like recent internet history, whereas "file://" might contain the local UNIX-style root "/", which in turn contains "home", "bin" and so on.
Provide your system with a new protocol handler and suddenly all applications can use files available via that mechanism.
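A minimal sketch of how such dispatch might look (all names here are hypothetical, not an existing API): the scheme in front of "://" selects a handler from a registered table, and everything after it is handed on untouched.

/* Hypothetical multi-root dispatch: the scheme picks the protocol handler. */
#include <stdio.h>
#include <string.h>

typedef void *(*open_fn)(const char *path, const char *mode);

struct proto_handler {
    const char *scheme;   /* the "root", e.g. "file", "http", "smb" */
    open_fn     open;
};

static void *file_open(const char *p, const char *m) { return fopen(p, m); }
static void *http_open(const char *p, const char *m) { (void)p; (void)m; return NULL; /* would speak HTTP */ }

/* A real system would let users or the admin register these dynamically. */
static struct proto_handler handlers[] = {
    { "file", file_open },
    { "http", http_open },
};

void *vfs_open(const char *uri, const char *mode)
{
    const char *sep = strstr(uri, "://");
    if (!sep)
        return NULL;                       /* not a URI: no root to resolve */
    size_t len = (size_t)(sep - uri);
    for (size_t i = 0; i < sizeof handlers / sizeof handlers[0]; i++)
        if (strlen(handlers[i].scheme) == len &&
            strncmp(uri, handlers[i].scheme, len) == 0)
            return handlers[i].open(sep + 3, mode);
    return NULL;                            /* unknown protocol root */
}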
Obviously, opening a file over FTP can be a million miles more complicated than opening a local file. When the same API is used to access both local and remote file-like resources, things that rarely go wrong in local operations (taking a long time, failing outright) happen a lot more often. The average programmer never checks whether operations succeed and puts IO on the UI thread, which is just plain bad for both local and remote resources. It would have to be thought about; massively multi-threaded, message-passing operating systems might have the edge in this respect.
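As a rough sketch of what calling code would look like under such a scheme (this assumes the platform resolves URIs inside fopen(), which stock libc does not do), the error paths stop being rare corner cases and have to be checked on every call:

/* Sketch only: assumes the OS resolves URIs in fopen() as proposed above. */
#include <stdio.h>
#include <errno.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("http://www.example.com/notes.txt", "r");
    if (!f) {
        /* With remote resources, "could not open" now also covers DNS
         * failures, timeouts and auth errors, so this branch is not rare. */
        fprintf(stderr, "open failed: %s\n", strerror(errno));
        return 1;
    }
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        fwrite(buf, 1, n, stdout);
    if (ferror(f))
        fprintf(stderr, "read failed mid-stream: %s\n", strerror(errno));
    fclose(f);
    return 0;
}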
Unified, transparent network access for many protocols does not remove the need for dedicated protocol-handling libraries in specialist programs. But it does make the average program suddenly much more powerful and useful to the average user!
It is worth mentioning an additional feature for the interested OS developer to research:
auto-mounting archives and encrypted files transparently, e.g. “sftp://www.mycom.org/mail/archives/2004-07.zip.pgp/get rich quick.msg”.
This is already possible with various virtual file system layers.
I think the way Plan 9 or Inferno does this is better. I mount Bell Labs FTP server using the ftpfs command to /mnt/ftp. All the files of the FTP server show up as local files. Reading them pulls them from the FTP file server. If I were to write to or create a new file in the /mnt/ftp directory and I had permission to do so, it would appear on the FTP server.
No additional API is needed to do this if you have userland file servers. You don’t need to worry whether a protocol is supported or not. If the file server exists and the user mounts it, it can be used just like real files.
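A tiny sketch of that point, written against plain POSIX calls rather than Plan 9's own library (the mount point /mnt/ftp and the file name below are just examples): once the file server is mounted, ordinary open() and read() are all a program needs.

/* After something like `ftpfs ... /mnt/ftp`, no special API is required. */
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fd = open("/mnt/ftp/pub/README", O_RDONLY);   /* example path */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char buf[1024];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)        /* reads pull data from the FTP server */
        if (write(STDOUT_FILENO, buf, (size_t)n) < 0)
            break;
    close(fd);
    return 0;
}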
This is already available in KDE, through KIO slaves.
fish://user@host/home/user/doc.txt can easily be used, and it works for the user too, when he wants to open/save a file
He’s talking about making it a feature on the OS-level (or, perhaps, API-level), and thus making it an option for all developers, not just those working in KDE (or GNOME).
Linux Userland File System is one such layer that already has support for FTP, SFTP, gnome vfs, and even a filesharing program.
http://lufs.sourceforge.net/lufs/usage.html
$f = fopen("ftp://www.microsoft.com/mission/statement.html", "r");
works in PHP.
Filesystem in Userspace:
http://sourceforge.net/projects/avf
plus
KIO Fuse Gateway:
http://wiki.kde.org/tiki-index.php?page=KIO+Fuse+Gateway
It mounts KDE's ioslaves and in this way makes them available to all Linux apps.
Yes, you can do it with user space libraries, but a lot of programs won't use them or will use different ones, and it is another layer on top of the OS, which already has a VFS system.
I think the article is too simplistic. With the file utils we have:
(nicolas@Thor2, 1) ~ $ mkdir http:
(nicolas@Thor2, 2) ~ $ mkdir http://www.google.com
(nicolas@Thor2, 3) ~ $ cd http://www.google.com
(nicolas@Thor2, 4) ~/http:/www.google.com $
So if you want to support this kind of name, you have to change basically everything.
LUFS is some kind of user space / kernel space mix which does not work very well and can introduce several bugs and security holes. I don't like it; I think it's not the right way to solve the problem, which is deeper than something we can solve with a kernel module.
I agree with the comment which says that Plan9 does it very well. But I think Plan9 has other drawbacks. Basically, Plan9 does not seem to be source compatible with GNU/Linux.
So there is another OS which does it well: GNU/Hurd (for the user it works basically as explained for Plan9, if we stay in the VFS part), and one of its implicit goals is to be source compatible with GNU/Linux (the distribution is Debian, the libc is GNU's, currently even the drivers are Linux's). So every program which runs on GNU/Linux could take advantage of this kind of virtual file system.
VMS did this years ago over DECnet. VMS always had a separate ‘root’ for each file system. It had its advantages and disadvantages, certainly. To get to the system stuff you just typed
$set def sys$system
To get back to your own directory, you typed
$set def sys$login
The symbols, roughly equivalent to / and ~ (though there were hundreds of others), held the file system together for convenience.
The actual command to go to a specific device (be it disk or network directory) was
$set def filesystem:[directory.subdirectory]
or if you wanted to see the root of that disk, it was
$set def filesystem:[000000]
The syntax was sometimes cumbersome, but it allowed things like this:
$set def remotemachine::[directory]
To copy a file from a remote machine *without having to mount anything* you could type
$copy remotemachine::filesystem:[directory]filename.type;version *.*;*
This would copy the remote file to whatever directory you were in, and preserve the name, type, and version number. The remote machine's DECnet server would decide whether you had authorization to touch that filesystem, and would usually be configured with a default file system, so you could omit the filesystem part of the path.
Just some observations. It’s been done.
cat /lib/HTML/Form/new/(
forum_name: “kig”,
forum_header: “lazyfs”,
forum_text: “please make something that mounts the URIs on demand, instead of requiring explicit mounting”
)/submit > /net/http/com/osnews/comment.php/news_id/7907/forms/osnewsform
The Hurd does exactly what you are describing.
Well, I had this idea too.
A cool add-on would be piping of protocols:
http:tar://www.host.org/file.tar.gz
Or to use PF_UNIX/PF_LOCAL for local webservers/local webinterfaces:
local:http://path/to/special/file
(hmm, maybe this should work with file:http://…?)
However, piping would be good.
…is that no sane admin would let you list the content of directories on his web server, so no ls http://www.microsoft.com for you, pal!
The problem with anything that looks like a local file is that it will be treated like a local file.
Almost all software currently in use assumes that it can read from a local file right away.
Any remote file implementation would need to hide the asynchrony from applications, effectively blocking the IO calls until they can be fulfilled properly.
Virtual file system layers allow applications to handle the remote connection differently, for example by providing a progress indicator and still being available for user input during the transfer.
> …is that no sane admin would let you list the content of directories on his web server, so no ls http://www.microsoft.com for you, pal!
Some while ago I made an lswww command which worked fine on http://www.microsoft.com; it just needs to find local hrefs within the index.html, and it worked for most JavaScript-constructed URIs as well.
> Some while ago I made an lswww command which worked fine on http://www.microsoft.com; it just needs to find local hrefs within the index.html, and it worked for most JavaScript-constructed URIs as well.
Do you really want the kernel to interpret HTML?
But seriously, most common internet file transfer protocols don't provide enough basic filesystem primitives to make them reliable and/or fast: they provide no locking, no reliable retrieval of data, and in the case of HTTP not even a persistent connection. Besides that, it would really be better to make things mountable (instead of the kinda weird access methods the author suggests). As some others already said, patches like LUFS already provide things like FTP mounting. So, if you really want it you can get it.
* This is already possible in PHP (fopen())
* This is already possible in all KDE apps via KIO slaves (all file dialogs and mounting via KIO Fuse)
* This is already possible in Gnome apps via Gnome-VFS (all file dialogs in Gnome apps)
* This is already possible with some kernel modules like LUFS (this supports mounting remote file systems)
* This is already possible by this HTTP (LD_PRELOAD) hack for GLIBC: http://www.hping.org/netbrake/
The last one comes the closest to your ideas: you can just do a "cat http://google.com/search?q=hello" from your command line after preloading the netbrake lib.
It’s appealing to be able to mount remote resources (DAV, FTP etc) so that programs that were not designed to access these are suddenly empowered to do so, with ZERO code change.
A flip side of this is that with zero code change, your app cannot cope with a communication problem. A solution to this is the NFS-style hard mount, in which the application is put in IO wait by the kernel until the resource comes back to life.
Which, of course, could take forever. If the mount is not INTR enabled, you cannot even kill the application!
Apple’s finder is such an application. It will display a cute beach ball (or spinning wheel?) while you wait. And wait. And wait!
Unfortunately, Finder is very central to Mac OS…
Mind you, other OSes are no better.
All mount types should have a configurable timeout on error (NFS does) and applications should be ready to handle errors. Not as in catch(IOException e) {}
NFS is mostly okay because it lives in campus/enterprise LANs with sysadmins and where latency is not too high.
This breaks down on the internet.
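As a sketch of the error-handling point above (the wrapper name is invented for illustration), an application-side read helper can retry the harmless case and surface real failures instead of swallowing them; an error such as EIO is roughly what a soft-mounted NFS read returns once the timeout expires.

/* Report and handle the failure rather than swallowing it. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

ssize_t read_or_report(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;                 /* interrupted: retrying is harmless */
        fprintf(stderr, "read: %s\n", strerror(errno));
        return -1;                    /* let the caller decide what to do next */
    }
}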
I've heard of Hurd but there ain't a herd of Hurd out there being used. Go Microkernels! Go away Monolithic kernels. Why should my kernel have a video driver in it? Yes, the file-access layer running on top of my kernel (is this what you mean by OS, Tonto?) should abstract the details of file systems. A more classic example of this is the CD-ROM. Remember when programs had to be "multimedia aware" to access files on a CD-ROM drive? Now, let's all get to work on that Google filesystem so I can use my 1GB on my lowly laptop.
Network transparency is good, WHEN it’s done in the OS (transparent), and why? Because it makes applications a) more powerful/flexible, and b) simpler.
Why should an MP3 player have code to open a file, play it, and then some more, different code for the case that the data comes in over a network? Just get the data and play it, all right? So stuff like FTP servers mounted on directories, where all common file access works the same as for local files, is nice. The user will know that it's coming from overseas, if he just mounted that FTP directory.
The OS SHOULD worry about local vs. remote, and prefer local operations. In a distributed system/cluster, one approach could be to migrate workloads to other CPUs in the network as much as possible. Wrong: instead of one CPU doing the thing locally (efficient), it splits the task into parts and creates a lot of network traffic moving data and code around. A better approach: try to use local resources FIRST, and when those are in full use, THEN try to use remote resources to help speed things up. The same effect, but it minimises use of the network (usually the most precious resource/bottleneck).
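A toy sketch of that "local first" policy (all names and thresholds are invented, not from any existing scheduler): only ship work to a remote node once local capacity is exhausted.

/* Prefer a local node under its load limit; otherwise the least-loaded remote. */
struct node { int is_local; double load; };

struct node *pick_node(struct node *nodes, int n, double local_limit)
{
    struct node *remote = 0;
    for (int i = 0; i < n; i++) {
        if (nodes[i].is_local && nodes[i].load < local_limit)
            return &nodes[i];          /* cheap: no network traffic at all */
        if (!nodes[i].is_local && (!remote || nodes[i].load < remote->load))
            remote = &nodes[i];        /* remember the least-loaded remote */
    }
    return remote;                     /* only used once local CPUs are full */
}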
Looks like a cool idea when you first see it. But really, it is quite impractical. Networks are quite different from local devices; there are all sorts of latency, connection and authentication issues that, in the end, if you want decent usability, must be dealt with in the application. I guess an asynchronous API extended for dealing with networks could work.
In the kernel? No way. Stick it in a library. (But this is coming from a person who thinks that the local FS and disc drivers should be in libraries too!) If you’re writing an OS, make these libraries your API, not what the kernel exposes.
Has anyone tried typing an HTTP or FTP link into a Windows open dialog? Works a treat; don't think it can save, though. Only tried this with XP.
Make the local filesystem work like a networked one, problem solved. Your local HD is, after all, connected with unreliable wires that may catch fire at any given moment. Even ramfs or cpufs may fail. Yet these unfortunate happenings shouldn’t take the whole system down.
Things fail. Deal with it.
http://www.bebits.com/app/3511
Of course, a total "Plan 9" style of hardware sharing over the network would be the best.
Yes, latency issues arise, but remember that an HD also has them, just at smaller latencies, and it's manageable.
lftp http://www.microsoft.com
lftp http://www.microsoft.com:/> ls
lftp http://www.microsoft.com:/> rm index.html
rm: Access failed: 404 Not Found (index.html)
Haha… default.asp!
@Alwin Henseler
Why should an MP3 player deal with files at all? An MP3 "player" should do nothing more than decode an MP3 stream into raw audio data.
stdin, stdout and stderr are all a program should ever think about. (I could imagine adding a new interface for configuration, though.)
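A sketch of that filter shape (decode() below is just a stand-in, not a real decoder): read the stream on stdin, write the result to stdout, and let the shell or OS worry about where the bytes came from. Usage might then look like cat /mnt/ftp/song.mp3 | player > /dev/audio, with those paths purely illustrative.

/* A filter: stdin in, stdout out; the source of the bytes is irrelevant. */
#include <stdio.h>
#include <unistd.h>

static size_t decode(const char *in, size_t n, char *out)
{
    /* stand-in: a real player would turn MP3 frames into PCM here */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}

int main(void)
{
    char in[4096], out[4096];
    ssize_t n;
    while ((n = read(STDIN_FILENO, in, sizeof in)) > 0) {
        size_t m = decode(in, (size_t)n, out);
        if (write(STDOUT_FILENO, out, m) < 0)
            return 1;
    }
    return n < 0 ? 1 : 0;
}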
@Daniel de Kok
HTTP/1.1 supports persistent connections.
There are other problems with HTTP, though. HTTP URIs are opaque, so there is no concept of directories (WebDAV has collections) and files; each URI identifies a resource. If you can read (GET) http://www.example.com/foo/bar/ you can't assume that http://www.example.com/foo/ exists, for example.
> Looks like a cool idea when you first see it. But really, it is quite impractical. Networks are quite different from local devices; there are all sorts of latency, connection and authentication issues that, in the end, if you want decent usability, must be dealt with in the application. I guess an asynchronous API extended for dealing with networks could work.
What you don’t realize is that all the problems you can have with a networked filesystem can happen in some local ones, too.
Think about the spin-up of CD/DVD drives, not to mention scratches on the disk. And what if the user decides to eject the CD while something still accesses it? I know, Linux locks the CD tray by default, but that’s a horrible idea from a usability POV. And anyway, there are removable devices that simply *cannot* be locked, such as USB sticks.
To summarize, network transparency only makes *already existing* problems more apparent. Once those *already existing* problems are fixed, network transparency is no longer problematic.
> In the kernel? No way. Stick it in a library. (But this is coming from a person who thinks that the local FS and disc drivers should be in libraries too!) If you're writing an OS, make these libraries your API, not what the kernel exposes.
You’re obviously right. That’s what FUSE and the KIO-Bridge are all about, as has been mentioned before.
Integrating local and remote namespaces is an active area of work, but one problem with your suggestion is that adding url keywords and syntax to the local file system breaks the unix model of file names. If you’re really interested in this, please go install Plan9, read all the papers, and look at the ftpfs linux vfs program as well as the links that other people have put here.
When you understand why these remote-file solutions work the way they do instead of the way you think they should, you might want to revise your paper to take into account what you have learned. What it really comes down to is that remote filesystems won't obey POSIX semantics, and that would break most of the programs which used your interface.
Also, if you’re interested in naming, I recommend you read the “future vision” paper at http://www.namesys.com.
LD_PRELOAD= would do the trick, with the additional benefit that you could enable it only for apps where it made sense.
It's probably easy to convert libferris or something alike into a libc wrapper that you simply load with `export LD_PRELOAD=/lib/libhttpfopen.so` whenever needed. This was done before with "zlibc" or "libtrash" and is far easier than recompiling libc or single apps. Not a big thing: write that, not the feature request article!
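For what it's worth, a minimal sketch of such a wrapper might look like the following (remote_fetch() is a hypothetical helper, not an existing function); the dlsym(RTLD_NEXT, ...) trick is the same one zlibc-style preloads use. Build with something like: gcc -shared -fPIC -o libhttpfopen.so wrapper.c -ldl

/* Intercept fopen(): URIs go to a hypothetical remote_fetch(), everything
 * else falls through to the real libc fopen. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

extern FILE *remote_fetch(const char *uri, const char *mode); /* hypothetical: fetch to a temporary FILE* */

FILE *fopen(const char *path, const char *mode)
{
    static FILE *(*real_fopen)(const char *, const char *);
    if (!real_fopen)
        real_fopen = (FILE *(*)(const char *, const char *))dlsym(RTLD_NEXT, "fopen");
    if (strstr(path, "://"))            /* looks like a URI: handle remotely */
        return remote_fetch(path, mode);
    return real_fopen(path, mode);      /* local file: unchanged behaviour */
}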