Files and Programmatic Streams

This article offers feature suggestions to budding OS developers looking for that neat edge.

Conventionally, each file on a computer system has a type, and different operating systems have different ways of working out the logical type of a file. UNIX uses a permission flag to mark a file as ‘executable’; anything else is ‘data’. Applications on UNIX therefore use a variety of strategies to determine the file type. Some programs use the file extension as a hint (.jpeg is probably an image, for example), whereas others use more advanced heuristics, sometimes called ‘magic’ (e.g. the UNIX ‘file’ command). MS-DOS and MS-Windows use the file extension to determine the type (.exe for executables, .txt for text files etc).

More recent operating systems have embraced newer classification systems such as MIME. In BeOS, for example, a server scans files and records the MIME type of each in an attribute (basically caching the guess from UNIX-style ‘file’ heuristics) so that it can easily be looked up later. Almost all Internet content (web pages, files received via email etc) is marked with a MIME type, which clearly tells the receiving computer the intended type of a file.
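To make the ‘magic’ approach concrete, here is a minimal sketch of sniffing a MIME type from the first bytes of a file. The function name sniff_mime and the rule table are my own illustration, not how the ‘file’ command or BeOS is actually implemented; only a few well-known signatures are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rule: if the data starts with `magic`, guess `mime`. */
typedef struct {
    const char *magic;
    size_t      len;
    const char *mime;
} magic_rule;

/* A tiny illustrative table; real magic databases are far larger. */
static const magic_rule rules[] = {
    { "\xFF\xD8\xFF",      3, "image/jpeg"      },
    { "\x89PNG\r\n\x1A\n", 8, "image/png"       },
    { "GIF8",              4, "image/gif"       },
    { "%PDF-",             5, "application/pdf" },
};

/* Hypothetical helper: guess a MIME type from the leading bytes of a file. */
const char *sniff_mime(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (n >= rules[i].len &&
            memcmp(buf, rules[i].magic, rules[i].len) == 0)
            return rules[i].mime;
    }
    return "application/octet-stream";   /* unknown: treat as opaque data */
}
```

A BeOS-style server would run something like this once and cache the answer in a file attribute.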

Now let me introduce something slightly novel! Imagine that our new hobby operating system supports multi-stream files. Each stream in a file has a certain MIME type. Some of these streams might be generated programmatically on the fly, in effect encapsulating a conversion. These programmatic streams allow plug-in filters and converters to be integrated into the operating system and used by programs. Let us explore what this might mean and how programs might use it:

The operating system knows the actual MIME type of a file. It also has a list of programmatic stream plug-ins associated with each MIME type, each of which presents a stream of one type as a stream of another. An obvious plug-in would filter text/html into text/plain, stripping out the markup. Another plug-in might convert image/jpeg to image/gif. Another might extract the soundtrack of a movie file and present it as audio/wav.
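One way to picture that plug-in list is as a table keyed by source and target type. The sketch below is only an assumption about how such a registry might look; the converter struct and find_converter are made-up names, and video/mpeg is my stand-in for "a movie file".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plug-in record: presents a `from` stream as a `to` stream. */
typedef struct {
    const char *from;   /* actual MIME type of the stored data */
    const char *to;     /* MIME type of the programmatic stream */
} converter;

/* Illustrative table; a real system would load plug-ins dynamically. */
static const converter registry[] = {
    { "text/html",  "text/plain" },   /* strip the markup */
    { "image/jpeg", "image/gif"  },
    { "video/mpeg", "audio/wav"  },   /* extract the soundtrack (assumed type) */
};

/* Return the plug-in that turns `from` into `to`, or NULL if none exists. */
const converter *find_converter(const char *from, const char *to)
{
    assert(from != NULL && to != NULL);
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (strcmp(registry[i].from, from) == 0 &&
            strcmp(registry[i].to, to) == 0)
            return &registry[i];
    }
    return NULL;
}
```

Note that the table is directional: registering text/html to text/plain says nothing about the reverse conversion.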

An application/x-idx file might have a programmatic stream of type text/gnu-makefile, while another plug-in might convert text/gnu-makefile streams into application/bash shell scripts. The operating system can therefore string these streams together transparently, making an application/x-idx file executable at a bash prompt.
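Stringing streams together amounts to finding a path through a graph of conversions. Here is a rough sketch using a breadth-first search, which finds the chain with the fewest steps; the edge table, find_chain and the MAX_TYPES limit are all hypothetical, chosen only to mirror the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 16   /* arbitrary cap on types visited in one search */

typedef struct { const char *from, *to; } edge;

/* Illustrative conversions, one per registered plug-in. */
static const edge edges[] = {
    { "application/x-idx", "text/gnu-makefile" },
    { "text/gnu-makefile", "application/bash"  },
    { "text/html",         "text/plain"        },
};
static const size_t n_edges = sizeof edges / sizeof edges[0];

/*
 * Breadth-first search from `src` to `dst`. On success, writes the chain of
 * MIME types into path[] (src first, dst last) and returns its length.
 * Returns 0 if no chain exists or path[] is too small.
 */
size_t find_chain(const char *src, const char *dst,
                  const char *path[], size_t cap)
{
    const char *seen[MAX_TYPES];   /* doubles as the BFS queue */
    int parent[MAX_TYPES];
    size_t n = 0, head = 0;

    assert(src != NULL && dst != NULL && path != NULL);
    seen[n] = src; parent[n] = -1; n++;

    while (head < n) {
        const char *cur = seen[head];
        if (strcmp(cur, dst) == 0) {
            size_t len = 0, k;
            for (int i = (int)head; i >= 0; i = parent[i]) len++;
            if (len > cap) return 0;
            k = len;                /* fill path back to front */
            for (int i = (int)head; i >= 0; i = parent[i]) path[--k] = seen[i];
            return len;
        }
        for (size_t e = 0; e < n_edges; e++) {
            int known = 0;
            if (strcmp(edges[e].from, cur) != 0) continue;
            for (size_t j = 0; j < n; j++)
                if (strcmp(seen[j], edges[e].to) == 0) { known = 1; break; }
            if (!known && n < MAX_TYPES) {
                seen[n] = edges[e].to; parent[n] = (int)head; n++;
            }
        }
        head++;
    }
    return 0;
}
```

Since most conversions lose information, preferring the shortest chain seems a sensible default, though a real system might weight edges by quality instead.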

Encodings and character sets might also be incorporated into the same system, for on-the-fly conversion between character sets. A code snippet:

FILE *f = fopen_ex("ftp://stop.blogging.com/varfar/myos.zip/documents/filesystem.pdf", "rb", "text/html;charset=iso-8859-1");
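The third argument in that call packs a media type and a charset into one string, so the system would have to pull them apart before picking a plug-in. A simplified sketch follows; split_mime is a made-up helper, and it assumes lowercase parameter names with no whitespace or quoted values, which real MIME parsing has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "type/subtype;charset=name" into its media type and charset.
 * `charset` is set to "" when no charset parameter is present.
 * Returns 0 on success, -1 if either output buffer is too small.
 */
int split_mime(const char *spec, char *type, size_t tcap,
               char *charset, size_t ccap)
{
    static const char key[] = "charset=";
    size_t tlen;
    const char *p;

    assert(spec != NULL && type != NULL && charset != NULL);
    assert(tcap > 0 && ccap > 0);

    tlen = strcspn(spec, ";");          /* media type ends at first ';' */
    if (tlen >= tcap) return -1;
    memcpy(type, spec, tlen);
    type[tlen] = '\0';

    charset[0] = '\0';
    p = strstr(spec + tlen, key);       /* look only in the parameter part */
    if (p != NULL) {
        size_t clen;
        p += sizeof key - 1;
        clen = strcspn(p, ";");         /* value ends at next ';' or end */
        if (clen >= ccap) return -1;
        memcpy(charset, p, clen);
        charset[clen] = '\0';
    }
    return 0;
}
```

With the two halves separated, the charset conversion can be treated as just one more stage after the type conversion.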

An application might tell a file selector dialog which MIME types it understands, and the dialog would then display every document that can be presented as one of those types, without the application having to understand the conversion process.

Most conversions are lossy: they remove information and meta-data, downgrading the quality of a source. Therefore, it is anticipated that almost all programmatic streams will be read-only. Many programmatic plug-ins might be easiest to implement if they support sequential access only. For this reason, it makes sense for the file selector dialog to be able to filter out streams with inappropriate access (e.g. only display application/ogg streams that are writeable and support random access). However, the opportunity is there for super plug-ins that allow content stored as one type to be edited fully as another, e.g. reading and writing an application/ms-word file with an application that only understands application/x-multipart-html.
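Filtering by access could be as simple as giving each stream a set of capability flags. The following is a sketch under that assumption; the STREAM_* flags, stream_info and filter_streams are names I have invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability flags a stream might advertise. */
enum {
    STREAM_READ  = 1 << 0,
    STREAM_WRITE = 1 << 1,
    STREAM_SEEK  = 1 << 2    /* random access, not just sequential */
};

typedef struct {
    const char *mime;
    unsigned    caps;        /* bitwise OR of STREAM_* flags */
} stream_info;

/*
 * Copy into out[] pointers to the streams of type `mime` that offer every
 * capability in `required`. Returns how many were kept (at most `cap`).
 */
size_t filter_streams(const stream_info *in, size_t n,
                      const char *mime, unsigned required,
                      const stream_info *out[], size_t cap)
{
    size_t kept = 0;
    assert(in != NULL || n == 0);
    assert(mime != NULL && out != NULL);
    for (size_t i = 0; i < n && kept < cap; i++) {
        if (strcmp(in[i].mime, mime) == 0 &&
            (in[i].caps & required) == required)
            out[kept++] = &in[i];
    }
    return kept;
}
```

A file selector asking for writeable, seekable application/ogg would pass STREAM_WRITE | STREAM_SEEK as `required`, and read-only programmatic streams would simply drop out of the list.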

Let me extend the idea even further. Not only might a file be multi-streamed, it might also be multi-part! Many file formats encapsulate multiple content parts: a movie has both image frames and a soundtrack; a rich document has text, drawings and images embedded; a mail message might contain a plain-text version, a rich text version or two and many attachments. An archive contains multiple files inside.

Such files could be presented to programs as directories. You could now use your favorite drawing program to edit the individual frames of a movie!

Directories themselves can have streams too. In the case of a virtual directory generated from a single file, the directory’s stream would be the actual physical file (plus the programmatic streams that convert it). A conventional directory might have a text/html stream that serves the index.html file it contains; another directory stream might provide a tar file of the directory and its children.

Conversion libraries abound on most operating systems. Typically these libraries are fragmented, often unwieldy, prone to exposing technical details, and not pervasively used.

A new hobby operating system has the opportunity to provide a uniform interface between applications and these conversion filters, in a way that takes minimal effort for application programmers and that is extensible (by providing more conversion filter plug-ins) without requiring extra code, recompiling or re-linking in any application. Building this conversion into the way that files are provided to applications is, in my opinion, an excellent way to achieve this.

Programmatic streams do not expose many parameters for tweaking a conversion, and therefore do not completely replace the need for detailed, dedicated conversion libraries in specialist programs. But they do further the usability of the average program in the normal operating scenario.


