File system paths on Windows are stranger than you might think. On any Unix-derived system, a path is an admirably simple thing: if it starts with a /, it’s a path. Not so on Windows, which serves up a bewildering variety of schemes for composing a path. When I implemented the path autocompletion feature in Fileside 1.7, I needed to take a closer look at this to make sure I had all bases covered. This blog post shares my findings.
You think you know everything about file system paths on Windows?
Trust me – you don’t. What on earth.
I was close – I think I knew about everything in here apart from \\.\ (which it turns out I possibly should have used where I’d used \\?\ in the past…)
They’re just scratching the surface of Windows file system strangeness there. You will see a lot of those if you just open up port 80 to the internet and watch what comes in as bots try various path traversal tricks to get files they shouldn’t. I recently unlocked the rare achievement of creating two files in the same directory with the same name. In the past, I made an infinitely large file system on a floppy. NTFS also supports odd things like alternate data streams (its answer to resource forks) and junction points that can lead to more fun situations. There are also many quirks in how different versions of Windows handle such paths, or which APIs you use to access a file. My ancient app was killed by a Windows update that changed how Windows interpreted file paths, such that an erroneous path that somehow worked in older versions of Windows stopped working. There is a whole host of mysteries on one of my computers whose root cause I haven’t diagnosed; it’s just haunted. Absolutely haunted. Is it file system corruption? Windows corruption? A hardware issue? Who knows. I switched off the computer for other reasons, and haven’t had time to diagnose what the hell happened there.
Bill Shooter of Bul,
Strange indeed, though if true and reproducible I’d consider that a bug. Adding Unicode opens up a whole other can of worms. Two file names may have characters that render identically but have different binary representations. Unicode also supports switching to right-to-left text direction in the middle of a name, which can really mess with your head. The file name and file extension can then appear to be swapped, so that you may think something is a text file or PDF when it is in fact an executable.
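Both effects are easy to reproduce. The Python sketch below uses made-up file names to illustrate them; the helper names (same_after_normalization, has_bidi_control) are hypothetical, and the set of control characters it checks is an assumption about what is worth flagging, not an exhaustive list.

```python
import unicodedata

# Two spellings of the same visible name (hypothetical example names).
composed = "caf\u00e9.txt"     # "é" as a single code point
decomposed = "cafe\u0301.txt"  # "e" followed by a combining acute accent


def same_after_normalization(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Explicit bidirectional formatting characters (embeddings, overrides, isolates).
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def has_bidi_control(name: str) -> bool:
    """Return True if the name contains a direction-changing control character."""
    return any(ch in BIDI_CONTROLS for ch in name)


# U+202E (right-to-left override) makes the tail render reversed, so this
# executable may be displayed as something like "invoiceexe.pdf".
spoofed = "invoice\u202efdp.exe"

print(composed == decomposed)                          # raw comparison
print(same_after_normalization(composed, decomposed))  # normalized comparison
print(has_bidi_control(spoofed))                       # spoofing check
```

Normalizing before comparing only addresses the look-alike problem at the string level; it says nothing about how a particular file system stores or matches the names.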
Yep. Also, mounting Windows file systems on Linux or accessing network shares across operating systems can expose inconsistent semantics. Sometimes I’ve been unable to delete or rename files from Linux through Samba. Technology is complicated, haha.
Maximum path length is also a fun one (as mentioned in the article). But: Which exists for compatibility reasons and can be turned off, but even when it’s enabled, paths on network shares are allowed to exceed it anyway.
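For reference, the limit in question is the classic Win32 MAX_PATH of 260 characters, and the \\?\ prefix is the usual way to opt a single path out of it. A rough Python sketch of adding that prefix follows. The function names to_extended_length and needs_prefix are made up for illustration, and the code assumes its input is already an absolute, backslash-separated path, since \\?\ paths skip normalization.

```python
MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL


def to_extended_length(path: str) -> str:
    """Prefix an absolute Windows path so it bypasses the MAX_PATH limit.

    Assumes the path is absolute and uses backslashes only.
    """
    if path.startswith(("\\\\?\\", "\\\\.\\")):
        return path  # already a device path, leave it untouched
    if path.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path


def needs_prefix(path: str) -> bool:
    """True if the path would not fit in a MAX_PATH-sized buffer."""
    return len(path) >= MAX_PATH


# A deliberately deep, hypothetical path.
long_path = "C:\\" + "\\".join(["folder"] * 50) + "\\file.txt"

if needs_prefix(long_path):
    print(to_extended_length(long_path)[:40] + "...")
```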
I think this article has missed a few interesting cases:
1. You can create files with trailing periods or spaces – they just need to be prefixed with \\?\ to avoid normalization.
2. In addition to alternate data streams, NTFS has aliases for the default stream. “foo” and “foo::$DATA” are the same file.
3. To the earlier comment about two files with the same name, NTFS now has per-directory case sensitivity. That wouldn’t be a big deal except for such a long history of applications performing case insensitive matches.
4. In a strict sense case insensitivity was always impossible to handle correctly, because each volume has its own case conversion table. That’s because directories are sorted insensitively, so the case conversion rules cannot change for the life of the directory. Unfortunately, this means any application performing insensitive comparisons using the system’s case conversion table is not guaranteed to match the volume table.
TL;DR – if you try to determine file equivalence via string comparisons, your code can be broken. Don’t rely on it for security.
https://en.wikipedia.org/wiki/Canonicalization
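One alternative to comparing path strings is to ask the operating system whether two paths resolve to the same file. The sketch below does this with Python’s os.path.samefile; the wrapper name same_file and the file name report.txt are made up for illustration, and it only works for paths that actually exist.

```python
import os
import tempfile


def same_file(path_a: str, path_b: str) -> bool:
    """Return True if both paths resolve to the same file on disk."""
    try:
        return os.path.samefile(path_a, path_b)
    except OSError:
        # At least one path does not exist or cannot be accessed.
        return False


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "report.txt")
    with open(original, "w") as f:
        f.write("hello")

    # A differently spelled path to the same file ("." path component).
    alias = os.path.join(tmp, ".", "report.txt")

    strings_equal = original == alias
    files_equal = same_file(original, alias)

print("equal as strings:", strings_equal)
print("same file on disk:", files_equal)
```

This sidesteps case tables, stream aliases and trailing-character rules entirely, at the cost of touching the file system for every comparison.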
I just tried this in bash…
The terminal treats these filenames as Unicode sequences and maps them to the same character. However, the Linux kernel treats file names as raw bytes, without interpreting them as Unicode or UTF-8, hence the two files with identical-looking names in the same directory.
Can someone confirm the same behavior on Windows and macOS?
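For anyone who wants to try this on another platform, here is a small Python sketch of the experiment. It is not the original bash session; the file name is made up, and it works with byte strings so that nothing in between gets a chance to normalize the names. On a typical Linux file system it should list two entries; what other platforms do is exactly the open question.

```python
import os
import tempfile
import unicodedata

# The same visible name, encoded from two different Unicode spellings.
composed = "caf\u00e9.txt".encode("utf-8")
decomposed = unicodedata.normalize("NFD", "caf\u00e9.txt").encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)
    for name in (composed, decomposed):
        # Create (or overwrite) a file under each byte-level name.
        with open(os.path.join(tmp_bytes, name), "wb") as f:
            f.write(name)
    # Passing a bytes path makes listdir return raw bytes names.
    entries = sorted(os.listdir(tmp_bytes))

for raw in entries:
    print(raw)
print(len(entries), "directory entries")
```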
Not sure whether it was a simplification in the article or not, but the UNC form \\server\C$ (for example) only works because Windows happens to configure a share for each drive letter, which is hidden (the trailing $) by default in most situations (though this can be turned off, and in some configurations is already off).
That share is restricted to Administrators only, that’s why it only works for administrators.
You can make any share with a trailing $ to hide it, and you can turn off the automatic drive sharing and create new shares like C$ to replace them, pointing to any folder, with any permissions you like.
It shouldn’t be considered a reliable alias for a drive letter.
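To make the distinction concrete, the Python sketch below splits a UNC path into server and share and flags names that are hidden or merely look like an administrative drive share. The names UncShare and parse_unc are hypothetical, and the check is purely on the share name: it cannot tell where the share really points.

```python
from pathlib import PureWindowsPath
from typing import NamedTuple, Optional


class UncShare(NamedTuple):
    server: str
    share: str
    hidden: bool      # share name ends with "$"
    drive_like: bool  # name looks like C$, D$, ... (name only, not a guarantee)


def parse_unc(path: str) -> Optional[UncShare]:
    """Return server/share details for a \\\\server\\share path, else None."""
    drive = PureWindowsPath(path).drive  # "\\\\server\\share" for UNC paths
    if not drive.startswith("\\\\"):
        return None
    parts = drive[2:].split("\\")
    # Skip device paths such as \\?\ and \\.\ and anything without a share.
    if len(parts) != 2 or parts[0] in ("?", ".") or not all(parts):
        return None
    server, share = parts
    hidden = share.endswith("$")
    drive_like = hidden and len(share) == 2 and share[0].isalpha()
    return UncShare(server, share, hidden, drive_like)


print(parse_unc("\\\\server\\C$\\Users\\me"))
print(parse_unc("\\\\server\\public\\docs"))
print(parse_unc("C:\\Windows"))
```

Even when drive_like is true, the share may have been redefined to point at any folder, so the result is a hint about naming rather than proof of what is behind it.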