Understanding surrogate pairs: why some Windows filenames can’t be read

Windows was an early adopter of Unicode, and its file APIs have used UTF-16 internally since Windows 2000 (earlier Windows NT releases used UCS-2, which predates surrogate pairs, but that's another topic). Using UTF-16 means that filenames, text strings, and other data are stored as sequences of 16-bit code units. Characters above U+FFFF need two such units, a high surrogate followed by a low surrogate, and Windows handles a properly formed surrogate pair without trouble. However, issues arise when string manipulation produces isolated or malformed surrogates. Such errors can lead to unreadable filenames and display glitches, even though the operating system itself can still execute the files correctly. These broken surrogates can also be created deliberately, as the rest of the article demonstrates.

↫ Zafer Balkan
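The surrogate mechanism described in the quote is simple arithmetic, and so is the failure mode. The Python sketch below is our own illustration, not code from Balkan's article. It shows how a code point above U+FFFF is split into a high and a low surrogate, and how a naive cut in the middle of a pair leaves a lone surrogate behind. The helper names (`to_surrogate_pair`, `from_surrogate_pair`, `utf16_units`, `find_lone_surrogates`) are hypothetical; only `str.encode`, the `utf-16-le` codec and the `surrogatepass` error handler are real Python features.

```python
# Sketch only: illustrates UTF-16 surrogate arithmetic and lone-surrogate detection.
# Helper names are hypothetical, not part of any Windows or Python API.

def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    v = cp - 0x10000                  # leaves a 20-bit value
    high = 0xD800 + (v >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units(s: str) -> list[int]:
    """Return the 16-bit code units of s; 'surrogatepass' lets lone surrogates through."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def find_lone_surrogates(units: list[int]) -> list[int]:
    """Return indexes of surrogate units that lack a valid partner."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be immediately followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate reached here has no preceding high surrogate.
            bad.append(i)
        i += 1
    return bad


if __name__ == "__main__":
    units = utf16_units("a\U0001F600")        # 'a' followed by an emoji above U+FFFF
    print([hex(u) for u in units])            # ['0x61', '0xd83d', '0xde00']
    truncated = units[:2]                     # naive cut in the middle of the pair
    print(find_lone_surrogates(truncated))    # [1]

    broken = "a\ud83d"                        # the same truncation as a Python str
    try:
        broken.encode("utf-8")                # strict UTF-8 rejects lone surrogates
    except UnicodeEncodeError as exc:
        print("cannot convert to UTF-8:", exc.reason)
```

The strict UTF-8 conversion at the end hints at why such names are hard to read: any tool that converts the 16-bit units to another encoding without a lenient error handler has no valid character to map the orphaned surrogate to.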

What a wild ride, and what an odd corner case. I wonder what kind of fun shenanigans this could be used for.