Operating systems and file systems have traditionally been developed hand in hand. They impose mutual constraints on each other. Today we have two major leaders in file system semantics: Windows and POSIX. They are very close to each other when compared to the full set of possibilities. Interesting things happened before POSIX monopolized file system semantics.
When you use a file system through a library instead of going through the operating system there are some extra possibilities. You are no longer required to obey the host operating system’s semantics for filenames. You get to decide if you use / or \ to separate directory components (or something else altogether). Maybe you don’t even use strings for filenames. The fs-fatfs library uses a list of strings, so it’s up to the caller to define a directory separator for themselves. While working on that library, I was driven to write down some ideas that I’ve previously run across and found inspirational.
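The list-of-strings approach is easy to picture with a small sketch. These are hypothetical helpers (not the actual fs-fatfs API) in Python, showing paths as lists of components with the separator left to the caller:

```python
# Hypothetical helpers (not the actual fs-fatfs API) illustrating the idea:
# store paths as lists of components, and let the caller pick the separator.
def split_path(path, sep="/"):
    """Split a path string into its non-empty components."""
    return [c for c in path.split(sep) if c]

def join_path(components, sep="/"):
    """Join components back into a path using the caller's separator."""
    return sep.join(components)

print(split_path("usr/local/bin"))               # ['usr', 'local', 'bin']
print(join_path(["usr", "local", "bin"], "\\"))  # usr\local\bin
```

The same component list renders equally well with `/`, `\`, or even Multics-style `>` — the separator becomes a presentation detail rather than a property of the filesystem.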
A deep dive into file system hierarchies before the major platforms we use today – POSIX and Windows – became the two de facto standards. Excellent article, and a joy to read.
I’ve always suspected that the reason Microsoft uses the backslash as the directory separator is b/c the first version of MS-DOS didn’t support directories, it was a flat namespace targeting floppy disks. So MS “improved” on the Unix command line syntax by using the more intuitive forward slash instead of a hyphen for command line options, e.g. “copy /a this.txt that.txt”.
Then they added support for hierarchical file systems, and they discovered that using the forward slash for directory paths would result in ambiguities with existing command lines. D’oh! C uses backslash as the escape character, so Windows directory separators have to be double backslashes when they appear in literal strings in C and many other languages.
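The escaping problem is easy to demonstrate. Python string literals escape backslashes the same way C’s do, so this small sketch shows both the doubled form and the raw-string workaround:

```python
# Backslash is the escape character in C-style string literals, so a
# Windows path needs doubled backslashes -- or a raw string in Python.
escaped = "C:\\Program Files"
raw = r"C:\Program Files"

print(escaped == raw)  # True: both hold a single backslash
print(len(escaped))    # 16 characters, not 17
```

C has no raw-string equivalent, so C programmers are stuck writing `"C:\\Program Files"` forever.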
Ten years later, some MS VP decided to show off Windows’ support for long file names (including embedded spaces), so they decided that the root for application programs would be C:\Program Files. To address the parsing issue, I think they added heuristics to the command shell to guess the intention of the user or script. D’oh!
And those long Windows filenames supported up to 260 characters, the infamous MAX_PATH limit. Unfortunately, that applied to the entire path, not to each component of the path. While that was plenty for 1991, it soon became a severe limitation that power users had to work around, until finally MS came out with a registry modification that would do the trick (but it still wasn’t the default behavior of the OS).
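To see how quickly the whole-path limit bites, here is a quick Python illustration using a made-up path of ten nested folders with 30-character names (the numbers are hypothetical, chosen just to show the arithmetic):

```python
# Illustration: nested long folder names blow past the legacy Windows
# MAX_PATH limit of 260 characters, which covers the entire path --
# drive letter, separators, and terminating NUL included.
MAX_PATH = 260

# A hypothetical path: 10 nested folders, each with a 30-character name.
components = ["a" * 30] * 10
path = "C:\\" + "\\".join(components)

print(len(path) + 1)             # 313, already past the limit
print(len(path) + 1 > MAX_PATH)  # True
```

Ten levels of descriptive folder names is nothing unusual for a power user, which is why the limit became so notorious.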
No, it was IBM meddling. MS wanted more Unix compatibility, but IBM insisted on \ for paths and / for switches instead of / and -.
There is an argument that CP/M had used / for command options so DOS adopted that as well.
/ was seldom used on CP/M. However, / was the standard command line switch over on DEC systems and MS staff had done a lot of work using DEC hardware.
PIP on CP/M used brackets [] to surround command options so
PIP A:=B:*.*[V] copies files to drive A: and then verifies the copy
The RT-11 counterpart would be
PIP DK:*.*=DL:*.*/H (/V does something very different)
/A and /B carried over to the MS-DOS COPY command.
PIP was quite powerful but easy to get wrong, so even DEC substituted commands like COPY and DELETE in DCL, and those were the commands that showed up in the various incarnations of DOS.
IBM didn’t want to break any existing code that used / for the switches and chose backslash since that was one of the few characters not already used and, even better, didn’t require using the shift key. IBM at the time was fairly anti-Unix so it shouldn’t be much of a surprise that IBM was not willing to follow Unix’s lead. IX/370 and its evolution into AIX was considerably later and a reaction to losing contracts to Amdahl which had a Unix variant.
tidux,
I’m not sure who made the decision, but astro’s right about the ordering. Originally DOS didn’t have directories, and when those were added / was already in use and would have been ambiguous. They either needed to break backwards compatibility or use a different character; they went with the latter.
Seattle Computer Products, the company that Microsoft bought DOS from, might have been responsible for / in DOS commands?
https://www.windowscentral.com/microsoft-bought-ms-dos-os-early-ibm-pcs-july-27-1981
I am not sure if it’s even possible to get a copy of Seattle Computer Products’s DOS to check what it used, but obviously that was before the IBM/MS deal.
“And those long Windows filenames supported up to 260 characters, the infamous MAX_PATH limit. Unfortunately, that applied to the entire path, not to each component of the path. While that was plenty for 1991, it soon became a severe limitation that power users had to work around, until finally MS came out with a registry modification that would do the trick (but it still wasn’t the default behavior of the OS).”
I came across this recently, trying to back up some files from a Windows 10 machine to a (slightly) deeper directory on a Windows 7 machine (I forgot to upgrade it on time, so shoot me, I didn’t want to mess with the software).
So my question is, when did that 260 character limit go away? I looked it up on Wikipedia and came to the half-arsed conclusion that 256 characters for the full path was still a thing across most file systems and OSes, including Linux. But I didn’t have time to check that for sure.
M.Onty,
On linux, there’s a 255 limit on filename length, but that limit doesn’t apply to the path. The POSIX API reports 4096 bytes on linux, however there isn’t actually any official path length limit on linux.
The top answers are actually wrong, but the lower ranked answers are informative.
https://serverfault.com/questions/9546/filename-length-limits-on-linux
Just now I created a path 400 levels deep with 255 byte names (~100k byte path) and it worked fine. There may be some APIs that can’t handle this, but it’s not an OS or filesystem limit.
There’s technically no reason to enforce a limit in linux. It would be more work to climb up the inode tree in order to report back a PATH_TOO_LARGE error, and then you’d have a conundrum about what to do if someone renames a parent directory from 5 to 6 characters, placing one of its children over the path limit. It’s easier to just return success regardless of how long the path gets. Clearly at some point you’d run out of inodes, but that’s handled separately.
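The deep-tree experiment above is easy to reproduce. Here is a small Python sketch (scaled down from the 400-level version, and assuming a Linux-style system) that builds a tree whose absolute path is far longer than PATH_MAX by chdir-ing down one short component at a time, so each individual syscall only ever sees a short relative path:

```python
# Sketch: PATH_MAX (4096 on Linux) limits single path arguments to
# syscalls, not the actual depth of the tree. By chdir-ing into each
# level and using short relative names, we can build a tree whose
# absolute path is far longer than PATH_MAX.
import os
import tempfile

def build_deep_tree(levels=100, name_len=100):
    """Create `levels` nested dirs with `name_len`-char names, then clean up.

    Returns the length the absolute path of the deepest directory had."""
    start = os.getcwd()
    base = tempfile.mkdtemp()
    os.chdir(base)
    name = "d" * name_len
    total = len(base)
    for _ in range(levels):
        os.mkdir(name)   # relative path: well under PATH_MAX
        os.chdir(name)
        total += 1 + name_len
    # Tear down by walking back up one level at a time, for the same
    # reason: a single rmtree on the full path would exceed PATH_MAX.
    for _ in range(levels):
        os.chdir("..")
        os.rmdir(name)
    os.chdir(start)
    os.rmdir(base)
    return total

print(build_deep_tree())  # comfortably exceeds 4096
```

Note that a tool which builds full absolute paths (as many userspace utilities do) will choke on such a tree even though the kernel and filesystem are perfectly happy with it.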
Wikipedia says windows has a 32k limit with NTFS, I think the 260 or so path limit was only for VFAT partitions with long file names?
I’ve run into this problem with a CIFS mount of a deeply nested source code file tree stored on a Linux server, from Windows 7. I think I was able to see the entire tree using Windows Explorer, but I couldn’t compile the solutions using Visual Studio because the filepaths were too long.
astro,
Good observation, I am able to reproduce it as well. Using a CIFS mount from linux to linux, I cannot access anything whose path is over 4K. Regardless of the operation that I use, I get the error “File name too long”. Interestingly, wireshark shows that the full path is actually sent across the network, but the operation fails. I wasn’t able to find a solution. It would seem that a 4K limit is enforced in the samba code; I’m curious if windows has this limit for network shares too.
Alfman,
My problem was with a 260+ character path, not 4K. I just reproduced it on Windows 10 (w/o registry fix), by creating a cascade of folders with long names in Windows explorer. After I reached the filepath limit, clicking “New Folder” results in the message box “Destination Path Too Long: The file name(s) would be too long for the destination folder… “.
In the Unixes/linuxes I’ve programmed, the PATH_MAX constant was usually defined to 1024 (those were various versions of SunOS/Solaris, HP-UX and Linux).
Specifically, in Linux the #define of PATH_MAX in /usr/include/linux/limits.h changed from 1024 to 4096 not so long ago; it seems it happened somewhere between Debian 1.3.1 and Debian 2.2, but I don’t have installations of other systems here to check exactly where. Well, those are old OS installations, but not that ancient ;).
Antartica_,
That limit is defined for the sake of some POSIX APIs that can only handle fixed length buffers, see realpath and getwd for example:
https://linux.die.net/man/3/getcwd
Note that the getcwd API has a size parameter, allowing the caller to pass larger buffers if the OS permits it.
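These reported limits can be queried per-filesystem through pathconf, which wraps the same POSIX machinery. A quick Python sketch (the 255/4096 values are what a typical Linux system reports, not guarantees):

```python
import os

# Query the limits the POSIX API reports for the filesystem holding "/".
# On a typical Linux system these print 255 and 4096 respectively, but
# neither is a hard kernel limit on how deep a tree can actually get.
name_max = os.pathconf("/", "PC_NAME_MAX")  # longest single component
path_max = os.pathconf("/", "PC_PATH_MAX")  # longest relative path argument

print(name_max, path_max)
```

POSIX only promises minimum values (_POSIX_NAME_MAX is 14, _POSIX_PATH_MAX is 256); the actual figures vary by filesystem, which is why pathconf takes a path argument rather than being a global constant.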
In isolation there isn’t really much difference between using the “-a” style of command option and the “/a” style; the GNU “--option” makes more sense (and Multics used the style “-option”, which Unix abandoned in favour of the above). Arguably, the Multics pathname style “>path>to>filename” makes more sense than either “/path/to/filename” or “\path\to\filename”, though both make more sense than “C:\path\to\filename” or “DH0:path/to/filename” like on an Amiga. The “>path>to>filename” style was only possible because Multics used a more primitive form of redirection (with temporary files) than Unix, and no pipes.
A new model adds automation on any File > Open or double-click-to-open: it can start a ‘make.exe’ run to edit the file being loaded, as a side effect of providing ‘inter & intra document aliasing’ via the new FUSE FS at http://www.atom-o.com
The “Secondary Storage” references are interesting. Sounds like the kind of thing that Microsoft had with the Windows Remote Storage System in 2000 & 2003. Used rules or manual action to move files out from fileserver shares to tape (or stuff that looked like tape) and left reparse points/links in place of the original file. Clients would access the file and trigger the server to drag the actual data in from tape.
It wasn’t terribly popular and they removed it in Server 2008. I imagine the same unpopularity is why it hasn’t been widely adopted elsewhere!
The backup/restore semantics that the secondary storage in the article enabled have mostly been replaced by differential and reverse-incremental backups, so that’s not much of a selling point these days either.
The notion of hot (cached), warm (online), and cold (archive) storage has resurfaced for the cloud, though. More tiers are probably on the way with edge and on premises (but externally managed) caches.
It’s also similar to how onedrive files worked in Windows 8 – you got a placeholder in the filesystem that was intended to work like a regular local file, but it was fetched on-demand from onedrive.
Good point, I hadn’t thought about the various cloud-based file storage systems
Hm, I was hoping there might be some details on BeOS’s file system, which essentially was a database and file system in one, allowing freely extendable metadata and instant search capabilities, IIRC.
There’s really not terribly much exciting to describe there. The early BFS in pre-release DR versions was indeed closer to a database, but what ultimately ended up as BFS in R3 and up is not that far off from a traditional POSIX filesystem. The metadata is mostly the same as NTFS named streams or POSIX xattrs, except for not having the length limits of the latter (most *nix filesystems only allow as much xattr data as will fit in the inode).
The main distinguishing factor is that each attribute is given a type (int, string, etc.), and can optionally have an index added to the filesystem for it, which is where the search capabilities come from. However, this is not automatic; apart from a few built-in attributes such as name, size, last modified and such, most attributes were not indexed by default, and consequently weren’t searchable in BeOS.
Also note that maintaining the indexes comes at a cost, since any modification to an attribute or file also needs to update any corresponding index. That is one of the reasons deletes on BFS are quite slow: removing a file means removing it from several different indices in addition to deallocating its used space and removing its inode from its parent directory.
I never understood why everything should be POSIX. In fact, there are several things in POSIX which are downright un-intuitive and lame. The forward slash is one example, as it creates confusion with the forward slashes in URLs. Having to mount optical discs under the main hard drive is another, as it’s a gross hack intended to work around the fact that the original Unix was never meant to support more than one filesystem root (all it was meant to do was run Space Travel, after all).
And it’s not like POSIX is an actual platform you can target (like Java SE for example). Nobody programs “POSIX” apps. And programming languages should NOT be tied to POSIX but be reasonably abstract. Not everything has to be POSIX.
kurkosdr,
Considering that the path delimiter came first, that’s kind of a funny statement. But actually I think using the same delimiter is more consistent, no?
That’s always felt weird to me as well. Should root be a separate namespace for mounting volumes? I can see how people might find this aspect of unix awkward.
I’m not sure why you say this, this is exactly what POSIX is supposed to be. Unfortunately POSIX APIs can lag in development progress compared to other platform specific APIs, which often encourages platform developers to do their own thing. Also there’s a lot of legacy baggage.
[quote] it’s a gross hack intended to work around the fact the original Unix was never meant to have support for more than one filesystem root (all it was meant to do was run Space Travel after all).[/quote]
Comparing the version of Unics [sic] that ran Space Travel to Unix 4th Edition, Solaris, HP-UX or Linux is like comparing Windows 1.0 to Windows 10: in particular, the original PDP-7 version of Unics didn’t even have pathnames, and employed a complicated system of links which, fortunately, was ultimately ditched; also, the “mount” command existed early on, because it was recognised that people would want to mount not only more than one permanent filesystem (e.g. /usr, which originally performed the function of /home in Linux, but started accruing binaries as the number of binaries in Unix grew), but also temporary filesystems such as DECtapes (which unusually for tapes, also had a hierarchical filesystem), which could be mounted on e.g. /mnt.
At worst, it’s a matter of taste whether a path such as /path/to/filename is an improvement on DH0:path/to/filename, like on an Amiga (where DH0, by accident or design, replicates the Spanish order of noun->adjective: Disk Hard Zero; “Amiga” being the Spanish word for “a friend who is a woman,” not “girlfriend” as is sometimes stated). The DEC systems the Amiga would appear to take inspiration from used root volume designations like DUA0: and DQA0: with no apparent rhyme or reason. Either way, in an environment where nobody has floppy disk drives (labelled A: and B:) anymore, using C: as the root of the filesystem is a curious convention, at best. Arguably it was a mistake that internet addresses aren’t e.g. “file:/path/to/filename” or “/org/openbsd/INSTALL.html”, but using the forward slash is the least of their problems, designwise.
Windows NT family OSes also internally have a single namespace (though not just for filesystems – for *everything*) in the NT Object Manager namespace – C: is just a user-visible alias for something like \Device\HarddiskVolumeX.
So even Windows ended up implementing effectively a single-root filesystem and just hid it for backwards compatibility. It’s a handy model.
The1stImmortal,
Windows does use a root namespace, and under the hood drive letters like “C:” are actually aliases into that namespace, as you pointed out. However it isn’t quite the same model as unix. You mount disks into a root namespace, but not on top of directories inside another file system the way unix does it.
I mount stuff all the time under linux but to this day it still feels awkward to have to mount one file system into another. It doesn’t feel like it belongs there. Sure, I’ve gotten used to it, but I’d rather that things like cdroms, usb drives, loopback images, network shares, etc could be mounted without having to place them in another file system on the host. It sucks having to modify a root FS just to mount another file system; sometimes the root FS is readonly and you want to be able to mount other file systems anyway, which is a completely reasonable thing to do but problematic under the unix model. There are workarounds, like mounting a tmpfs on top of a directory on the host and then mounting new media within this tmpfs, but ugh, this is hacky. Another thing I really dislike is that it’s hard to do maintenance on the root filesystem once you’ve mounted other filesystems into it. Oh well, that’s the way they did it on unix; IMHO windows had a better, more intuitive design in this regard.
Well you can, and it’s quite common in the environments I work in. You can mount filesystems to folders within other filesystems on NT family OSes (though some filesystems don’t let you do that, if memory serves)
You can do that with both normal volumes and VHDX disk images (for example, User Profile Disks and the acquired FSLogix technology mount VHD disk images containing user profiles into folders within c:\users). Arguably, things like the OneDrive filesystem filter driver allow operation of cloud file systems as mounted filesystems too.
The1stImmortal,
Yes, I realize that you can do it under NT; even MS-DOS supported this, if you recall the JOIN command.
https://en.wikipedia.org/wiki/List_of_DOS_commands
My point was more about linux not having a root namespace external to the primary filesystem. In some scenarios having to mount filesystems into another mounted file system is undesirable. Windows (and DOS) don’t require this and it can simplify things.