Some more light reading:
While it was already established that the open source supply chain was often the target of malicious actors, what is stunning is the amount of energy invested by Jia Tan to gain the trust of the maintainer of the xz project, acquire push access to the repository and then among other perfectly legitimate contributions insert – piece by piece – the code for a very sophisticated and obfuscated backdoor. This should be a wake up call for the OSS community. We should consider the open source supply chain a high value target for powerful threat actors, and to collectively find countermeasures against such attacks. In this article, I’ll discuss the inner workings of the xz backdoor and how I think we could have mechanically detected it thanks to build reproducibility.
↫ Julien Malka
It’s a very detailed look at the situation and what Nix could do to prevent it in the future.
It looks like the XZ malware is only enabled on systemd Linux distributions. No?
picamanic,
I think that happens to be the vector they used, but whether this specific vector works on a distro or not, we all should be very concerned, because the attacker could have exploited the xz software in a myriad of ways. Still, I agree it gives some credence to the idea that systemd has become too complicated. It gets criticized for abandoning the principle that unix tools should do one thing and do it well. Systemd has increased tight coupling to a degree that no other init system does, which is generally not a good thing. Even after several years of getting used to systemd, I still think this is a legitimate criticism. IMHO init systems shouldn’t be attempting to take over other specialized tools and daemons. Alas, this article isn’t about systemd.
I take issue with nearly everything mentioned in the section “Building software from trusted sources”
* the tarball workflow predates the existence of git and was used in the earliest Linux distributions;
Who cares, do what’s right, follow best practices.
* tarballs are self-contained archives that encapsulate the exact state of the source code intended for release while git repositories can be altered, creating the need for a snapshot of the code;
Everything in Git is a snapshot except branches and tags. Just use a combination tag/commit SHA.
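A minimal sketch of what pinning to a tag plus commit SHA could look like, run inside a clone of the repository (the tag name and SHA below are placeholders, not real xz values):

import subprocess

TAG = "v1.2.3"                                            # hypothetical release tag
PINNED_SHA = "0123456789abcdef0123456789abcdef01234567"  # expected commit (placeholder)

# Resolve what the tag currently points at; a tag can be moved, the commit SHA cannot.
resolved = subprocess.run(
    ["git", "rev-parse", f"{TAG}^{{commit}}"],
    capture_output=True, text=True, check=True,
).stdout.strip()

if resolved != PINNED_SHA:
    raise SystemExit(f"tag {TAG} points at {resolved}, expected {PINNED_SHA}")
print(f"{TAG} verified at {PINNED_SHA}")

Anything downstream (a distro package recipe, a Nix derivation) can then reference that exact commit instead of trusting whatever the tag happens to point at today.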
* tarballs can contain intermediary artifacts (for example manpages) used to lighten the build process, or configure scripts to target specific hardware, etc;
These are surely automated by scripts contained within the repository and can be run locally. If there are dependencies needed to run those scripts, use Docker (a rough sketch follows below).
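A hedged sketch of that idea, regenerating the usual autotools outputs (the configure script, man pages) in a throwaway container; the image and package list are assumptions, not the xz project’s actual requirements:

import os
import subprocess

# Run autoreconf inside a disposable Debian container so the build-time
# dependencies never need to be installed on the host.
subprocess.run(
    ["docker", "run", "--rm",
     "-v", f"{os.getcwd()}:/src", "-w", "/src",
     "debian:bookworm-slim",
     "bash", "-c",
     "apt-get update && "
     "apt-get install -y --no-install-recommends autoconf automake libtool gettext && "
     "autoreconf -fi"],
    check=True,
)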
* tarballs allow the source code to be compressed which is useful for space efficiency.
So does Git. It’s only decompressed to create your working directory. A shallow clone of a repo results in a .git directory with a .pack file which would be roughly the same size as the corresponding .tar.gz file.
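A quick way to eyeball that claim (the repo URL and tag are placeholders):

import pathlib
import subprocess

URL = "https://github.com/example/project.git"   # hypothetical upstream
TAG = "v1.2.3"                                   # hypothetical release tag

# A depth-1 clone only downloads the objects reachable from that single tag.
subprocess.run(
    ["git", "clone", "--depth", "1", "--branch", TAG, URL, "shallow-clone"],
    check=True,
)

# Sum the on-disk size of what the shallow clone actually pulled down.
git_dir = pathlib.Path("shallow-clone/.git")
pack_bytes = sum(p.stat().st_size for p in git_dir.rglob("*") if p.is_file())
print(f".git size of shallow clone: {pack_bytes / 1024:.0f} KiB")

Compare that number with the size of the corresponding .tar.gz release to see how close they really are for a given project.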
The author then goes on to say it’d be better to rely on release archives generated by GitHub, because tampering with those would require a compromise of GitHub itself.
This is a terrible argument and introduces another vector for malicious actors. It’s one more needless link in the supply chain.
I’m not really convinced by the author’s overarching point that separate tarballs were the main problem. However, I would generally agree with him that the tarballs should be produced from the same source code and that the process should be completely automated (a sketch of such a check follows below). No reason for the tarball to be different.
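A sketch of the kind of check that automation would enable: regenerate the source archive straight from the release tag and compare it with the tarball that was actually published. The file names and tag are placeholders, and a byte-for-byte comparison only makes sense if both archives come from the same deterministic process; otherwise you would diff the extracted trees instead.

import hashlib
import subprocess
import sys

TAG = "v1.2.3"  # hypothetical release tag

def sha256(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Regenerate the archive purely from the tagged tree (run inside a clone).
subprocess.run(
    ["git", "archive", "--format=tar.gz", f"--prefix=project-{TAG}/",
     "-o", "regenerated.tar.gz", TAG],
    check=True,
)

published = sha256("published.tar.gz")      # the tarball that was shipped
regenerated = sha256("regenerated.tar.gz")  # the one rebuilt from the tag
print("published:  ", published)
print("regenerated:", regenerated)
sys.exit(0 if published == regenerated else 1)

In the xz case it was precisely the gap between the git tree and the shipped tarball that hid part of the backdoor, so any mismatch in a check like this would be a red flag worth investigating.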
The more fundamental problem though is the fact that a “legitimate” maintainer of a trusted code base was compromised.
I’ve always been skeptical of the many eyes argument for FOSS. In theory that can happen but in reality we tend to trust a small set of responsible devs who can be overworked (as was the case with xz). It’s eye opening to see these cases that have come to light. Obviously there is a trust issue with proprietary software too. FOSS has the benefit of providing a public record of events. Proprietary software publishers have the opportunity to hide a lot more, we have to trust them not to. FOSS is better in this regard, but how do we solve the problem of malicious actors especially when teams are under-resourced and we can’t count on redundancy?
Food for thought: How do you know this “Jia Tan” person hasn’t found employment in the companies that design the EFI of your computer’s motherboard? EFI has SMM (aka ring -2) access and can even change how an OS boots (CathodeRayDude has done a video about it: https://www.youtube.com/watch?v=ssob-7sGVWs&t=1938s ). And how do you know this “Jia Tan” person isn’t also working in the team within Intel responsible for Intel’s Management Engine too?
If anything, this is an argument for going back to computers powered by discrete manually-soldered transistors and manually-woven ferrite core memory.
kurkosdr,
I don’t know if you remember, but IME computers had a backdoor/vulnerability a few years back. It was such a major flaw, and yet it didn’t come to light until many years after my affected hardware reached EOL. IME can be a genuinely useful feature, but it’s extremely off-putting that it runs proprietary firmware that I can neither verify nor fix. If it were FOSS it could be a genuine game changer, as I would love to be able to program the IME myself instead of having it run Intel’s proprietary code.
The best you can do is firewall everything, so that remote access is truly mitigated, but most routers do not firewall outbound access by default, and consumers may have a lot of hardware running sketchy firmware: smart TVs, IoT lights, smart speakers, security cameras, network printers, etc. Any one of these could be vulnerable over the internet and may even have known vulnerabilities.
I wouldn’t go that far, but it would be nice if genuinely open hardware were available and accessible.
And you don’t seem to understand that the Intel Management Engine is just one of the firmwares that have “negative ring” access: then there is the SMM and the microcode, and then there is the baseband firmware (for example the WiFi and cellular baseband firmware).
Literally, unless your computer is FOSS across the entire “negative ring” stack (IME, SMM, microcode, wireless and wired baseband firmware), the whole “FOSS and reproducible builds” thing only secures the ring 0 level, which is worthless with so much powerful firmware below having “negative ring” access.
And how do you know the chips you got from TSMC or SMIC or whoever were manufactured to the design you submitted? Let’s call that a “ring minus infinite” exploit, since hardware overrides any firmware or software running on top of it.
That’s why I said the only way to be 100% sure your computer was made according to the designs is to use a computer with discrete manually-soldered transistors and manually-woven ferrite core memory.
kurkosdr,
That’s really not the selling point for proprietary code that you think it is. Proposing FOSS for these privileged processors is no more ridiculous to a FOSS user than proposing it must run proprietary software. An owner may want to control the IME running on the CPU for the same reason they want control over router firmware, etc. Usually desktops don’t even have a baseband processor, which normally requires a different chip anyway.
That’s your opinion, which is fine, but I value FOSS much more than you and your reasoning about what’s best or most trustworthy for you has no merit for me and what I want to run on my own hardware.
Sure, there’s always the possibility that the hardware could be backdoored, although to me this is an excellent reason to promote diversification and avoid monoculture instead of everyone running the same thing. It also doesn’t follow that proprietary CPUs are free of the same risks, especially when we know that backdoors have existed in production x86 CPUs.