Some more light reading:
While it was already established that the open source supply chain was often the target of malicious actors, what is stunning is the amount of energy invested by Jia Tan to gain the trust of the maintainer of the xz project, acquire push access to the repository, and then, among other perfectly legitimate contributions, insert, piece by piece, the code for a very sophisticated and obfuscated backdoor. This should be a wake-up call for the OSS community. We should consider the open source supply chain a high-value target for powerful threat actors, and collectively find countermeasures against such attacks. In this article, I’ll discuss the inner workings of the xz backdoor and how I think we could have mechanically detected it thanks to build reproducibility.
↫ Julien Malka
It’s a very detailed look at the situation and what Nix could do to prevent it in the future.
It looks like the XZ malware is only enabled on systemd Linux distributions. No?
picamanic,
I think that happens to be the vector they used, but whether this specific vector works on a given distro or not, we should all be very concerned, because the attacker could have exploited the xz software in a myriad of ways. Still, I agree it gives some credence to the idea that systemd has become too complicated. It gets criticized for abandoning the principle that Unix tools should do one thing and do it well. systemd has increased tight coupling to a degree that no other init system does, which is generally not a good thing. Even after several years of getting used to systemd, I still think this is a legitimate criticism. IMHO, init systems shouldn’t be attempting to take over other specialized tools and daemons. Alas, this article isn’t about systemd.
I take issue with nearly everything mentioned in the section “Building software from trusted sources”
* the tarball workflow predates the existence of git and was used in the earliest Linux distributions;
Who cares? Do what’s right and follow best practices.
* tarballs are self-contained archives that encapsulate the exact state of the source code intended for release while git repositories can be altered, creating the need for a snapshot of the code;
Every commit in Git is an immutable snapshot; only branches and tags are movable pointers. Just pin a release to a specific tag and commit SHA.
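The pinning the commenter describes can be sketched with `git archive` (the repo contents, file, and tag names below are invented for the demo; the point is that archiving a fixed commit is deterministic and repeatable):

```shell
# Sketch: pin a release to a tag/commit and produce the archive from it.
# `git archive` of a fixed commit is deterministic, so anyone with the
# repository can reproduce the exact release archive.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'hello\n' > README
git add README
git -c user.name=demo -c user.email=demo@example.com commit -q -m "release v1.0"
git tag v1.0

# Two archives of the same commit are byte-identical:
git archive --format=tar --prefix=project-1.0/ v1.0 > a.tar
git archive --format=tar --prefix=project-1.0/ v1.0 > b.tar
h1=$(sha256sum a.tar | cut -d' ' -f1)
h2=$(sha256sum b.tar | cut -d' ' -f1)
[ "$h1" = "$h2" ] && echo "archives are byte-identical"
```

Because the tar contents are derived entirely from the pinned commit, a distributor can verify a release tarball against the repository instead of trusting whatever file the maintainer uploaded.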
* tarballs can contain intermediary artifacts (for example manpages) used to lighten the build process, or configure scripts to target specific hardware, etc;
These are surely automated by scripts contained within the repository and can be run locally. If there are dependencies needed to run those scripts, use Docker.
* tarballs allow the source code to be compressed which is useful for space efficiency.
So does Git. Objects are only decompressed to create your working directory. A shallow clone of a repo results in a .git directory containing a .pack file, which is roughly the same size as the corresponding .tar.gz file.
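This size comparison can be checked locally; the sketch below builds a throwaway repo, makes a depth-1 clone of it, and compares the pack data against a gzipped archive of the same tree (names are made up, and exact numbers will vary with content and compression settings, so this only illustrates the ballpark claim):

```shell
# Sketch: compare shallow-clone pack size vs. a .tar.gz of the same tree.
set -eu
src=$(mktemp -d); work=$(mktemp -d)
cd "$src"
git init -q .
seq 1 5000 > data.txt
git add data.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "v1"

# file:// forces the real transport, so --depth 1 produces a shallow
# clone whose objects arrive as a single pack file.
git clone -q --depth 1 "file://$src" "$work/shallow"
pack_bytes=$(du -sb "$work/shallow/.git/objects" | cut -f1)

git archive --format=tar.gz HEAD > "$work/release.tar.gz"
tar_bytes=$(stat -c%s "$work/release.tar.gz")
echo "pack: $pack_bytes bytes, tar.gz: $tar_bytes bytes"
```

Both numbers come from zlib-family compression of the same content, which is why they tend to land in the same range.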
The author then goes on to say it’d be better to rely on release archives generated by GitHub, because tampering with those would require a compromise of GitHub itself.
This is a terrible argument and introduces another vector for malicious actors. It’s one more needless link in the supply chain.
I’m not really convinced by the author’s overarching point that separate tarballs were the main problem. However, I would generally agree with him that the tarballs should be produced from the same source code and that the process should be completely automated. There is no reason for the tarball to differ from the repository.
The more fundamental problem though is the fact that a “legitimate” maintainer of a trusted code base was compromised.
I’ve always been skeptical of the “many eyes” argument for FOSS. In theory that scrutiny can happen, but in reality we tend to trust a small set of responsible devs who can be overworked (as was the case with xz). It’s eye-opening to see these cases come to light. Obviously there is a trust issue with proprietary software too. FOSS has the benefit of providing a public record of events; proprietary software publishers have the opportunity to hide a lot more, and we have to trust them not to. FOSS is better in this regard, but how do we solve the problem of malicious actors, especially when teams are under-resourced and we can’t count on redundancy?