FOSS infrastructure is under attack by AI companies
What do SourceHut, GNOME’s GitLab, and KDE’s GitLab have in common, other than all three of them being forges? Well, it turns out all three of them have been dealing with immense amounts of traffic from “AI” scrapers, who are effectively performing DDoS attacks with such ferocity it’s bringing down the infrastructures of these major open source projects. Being open source, and thus publicly accessible, means these scrapers have unlimited access, unlike with proprietary projects.
These “AI” scrapers do not respect robots.txt, and hit so many expensive endpoints it’s putting insane amounts of pressure on infrastructure. Of course, they use random user agents from an effectively infinite number of IP addresses. Blocking is a game of whack-a-mole you can’t win, and so the GNOME project is using a rather nuclear option called Anubis now, which aims to block “AI” scrapers with a heavy-handed approach that sometimes blocks real, genuine users as well.
The numbers are insane, as Niccolò Venerandi at Libre News details.
Over Mastodon, one GNOME sysadmin, Bart Piotrowski, kindly shared some numbers to let people fully understand the scope of the problem. According to him, in around two hours and a half they received 81k total requests, and out of those only 3% passed Anubis’s proof of work, hinting at 97% of the traffic being bots – an insane number!
↫ Niccolò Venerandi at Libre News
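For a rough sense of scale, 3% of 81k works out to about 2,400 requests that solved the challenge. For readers wondering what a “proof of work” check involves, here is a minimal hashcash-style sketch in Python. To be clear, this is not Anubis’s actual code: the function names, the SHA-256 leading-zeros scheme, and the difficulty value are all assumptions made purely for illustration. The general idea is that the visitor has to burn some CPU time finding a number the server can then verify with a single hash.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: hand out a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce whose hash starts with
    `difficulty` zero hex digits. Cost grows about 16x per extra digit."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking a submitted nonce costs a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    difficulty = 4  # hypothetical setting, not taken from any real deployment
    challenge = make_challenge()
    nonce = solve(challenge, difficulty)
    print(nonce, verify(challenge, nonce, difficulty))
```

The asymmetry is the point: a single human visitor pays the cost once and barely notices, while a scraper hammering thousands of pages has to pay it over and over. It also hints at why genuine users sometimes get caught, since anything that can’t or won’t run the challenge simply doesn’t get in.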
Fedora is another project dealing with these attacks, with infrastructure sometimes being down for weeks as a result. Inkscape, LWN, Frama Software, Diaspora, and many more – they’re all dealing with the same problem: the vast majority of the traffic to their websites and infrastructure now comes from attacks by “AI” scrapers. Sadly, there doesn’t seem to be a reliable way to defend against these attacks just yet, so sysadmins and webmasters are wasting a ton of time, money, and resources fending off the hungry “AI” hordes.
These “AI” companies are raking in billions and billions of dollars from investors and governments the world over, trying to build dead-end text generators while sucking up huge amounts of data and wasting massive amounts of resources from, in this case, open source projects. If no other solutions can be found, the end game here could be that open source projects will start to make their bug reporting tools and code repositories much harder and potentially even impossible to access without jumping through a massive number of hoops.
Everything about this “AI” bubble is gross, and I can’t wait for this bubble to pop so a semblance of sanity can return to the technology world. Until the next hype train rolls into the station, of course.
As is tradition.