Google now displays convenient artificial intelligence-based answers at the top of its search pages — meaning users may never click through to the websites whose data is being used to power those results. But many site owners say they can’t afford to block Google’s AI from summarizing their content.
That’s because the Google tool that sifts through web content to come up with its AI answers is the same one that keeps track of web pages for search results, according to publishers. Blocking Alphabet Inc.’s Google the way sites have blocked some of its AI competitors would also hamper a site’s ability to be discovered online.
↫ Julia Love and Davey Alba
OSNews still relies partially on advertising right now, and thus Google continues to play a role in our survival. You can help reduce our dependency on Google by supporting us through Patreon, making donations using Ko-Fi, or buying our merch. The more of you who support us, the closer the dream of an ad-free OSNews not dependent on Google becomes to reality. OSNews is my sole source of income, and if that stops working out and I'm forced to find another job, OSNews will cease to exist.
Due to Google’s utter dominance on the internet, websites and publishers have no choice but to accept whatever Google decides to do. Not being indexed by the most popular search engine on the web with like 90% market share is a death sentence, but feeding Google’s machine learning algorithms will be a slow death by a thousand cuts, too, for many publishers. The more content is fed to Google’s AI tools, the better they’ll get at simply copying your style to a T, and the better they’ll get at showing just the little paragraph or line that matters as a Google result, meaning you won’t have to visit the site in question.
It’s not great for Google in the long term, either. Google Search relies on humans making content for people to find; if there’s no more quality content for people to find, people aren’t going to be using Google as much anymore. In what is typical of the search giant, it seems they’re not really looking ahead very far into the future, chasing short-term profits by riding the AI hype train while long-term profits take a back seat. Maybe I’m just too stupid to understand the Silicon Valley galaxy brain business boys, but to a simple man like me it seems rather stupid to starve the very websites, publishers, authors, and so on that your main product relies on to be useful in the first place.
I honestly don’t even know how much of OSNews’ traffic comes from Google, so I don’t know how much it would even affect us were we to tell Google’s crawlers to get bent. My guess is that search traffic is still a sizable portion of our traffic, so I’m definitely not going to gamble the future of OSNews. Luckily we’re quite small and I doubt many people are interested in AI generating my writing style and the topics I cover anyway, so I don’t think I have to worry as much as some of the larger tech websites do.
You’d think they’d want to take it easy now that they’ve been found to be a monopoly and are facing remedies as extreme as a break-up. I mean, if leveraging their dominance in search and browsing in order to build up their AI business isn’t an abuse of that dominance, then I don’t know what would qualify!
I get what you’re saying, but Chrome currently makes up 65% and Google Search is at 91% (per StatCounter) of their respective markets. While some of us actively choose to use a different search engine (DDG, in my case), the majority of the market clearly does not. They’re in a “there’s no other game in town” kind of position. You either succumb, or risk losing up to 91% of your traffic.
I wouldn’t say that Google is chasing short-term profits here. Google has an ad business that is also fueled by websites. If a user doesn’t visit an ad-funded website then they won’t be able to serve them ads. This might or might not yield a positive result in terms of revenue versus showing ads on Google itself. There’s another aspect to this: we know that those summarized results are very expensive to produce for Google, so again it’s unclear if this is really more profitable for them, or it’s just tech hype FOMO.
Either they chase it or Perplexity / OpenAI will eat their lunch. They don’t have much choice.
Having said that, websites that ban Google’s AI will likely ban its competitors too, so what Google should concentrate on is making sure such bans are respected by all the other companies, for example by (secretly) supporting class actions against AI competitors.
I’d probably advise adding some sort of analytics to the site if you don’t currently know where your traffic is coming from. That’s the data that helps you attract advertisers.
There are plenty of free/open-source options for WP like koko-analytics that are quick, easy, and free to plug in. You might find very few visitors come via Google. I, for example, always type the site in directly!
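If full analytics feels heavyweight, even a rough tally of referrers from the web server’s access log answers the “how much of our traffic comes from Google” question. A minimal sketch, assuming a combined-format access log; the regex and the “(direct)” label are my own illustration, and the pattern would need adjusting for other log formats:

```python
import re
from collections import Counter

# Combined log format ends with: "request" status bytes "referer" "user-agent".
# This pattern captures the referer field; it assumes that exact layout.
LINE_RE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"')

def referer_hosts(lines):
    """Count visits per referring host; '-' or empty means a direct visit."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # line doesn't match the assumed format; skip it
        ref = m.group("referer")
        if ref in ("", "-"):
            counts["(direct)"] += 1
        else:
            # Pull the hostname out of a URL like https://host/path
            host = ref.split("/")[2] if "//" in ref else ref
            counts[host] += 1
    return counts
```

Summing the counts for `google.*` hosts against the total then gives a crude estimate of search dependence, without shipping any visitor data to a third party.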
Adurbe,
It helps to know your audience. OSNews may serve a rather unique audience in terms of technical preferences. So while, for example, FF might be 2% of the entire browser market, it might be much higher here. Same deal with DuckDuckGo, uBlock, and so on. That said, bringing advertisers on independently still seems very hard to me, and I don’t have high hopes for smaller sites attracting advertisers on their own, even though I’d prefer it that way. Even long-established tech and news sites are feeling the struggle.
But that doesn’t give you the full picture. Most of the sites I currently browse I don’t reach via Google, but I discovered them through Google. So the analytics won’t show that the traffic comes from Google, yet without Google it wouldn’t exist.
Serious question though: Why does Google even respect robots.txt files? I expected them to be smarter than that. Why should a little text file prevent anyone from indexing or making transient copies of anything?
Simply put, robots.txt files are voluntary DRM, and I am surprised anyone pays attention to them.
PS: If you haven’t noticed, I am against any expansion of copyright and reduction in fair-use rights, no matter how “noble” the cause supposedly is
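For context: robots.txt is indeed purely advisory. A compliant crawler fetches the file and applies the rules to itself; nothing enforces them. A minimal sketch using Python’s standard-library parser (the crawler name “ExampleAIBot” is an illustrative placeholder, not a real crawler token):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one crawler entirely, allow everyone else.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Compliance is entirely on the crawler's side; a scraper that never
# calls can_fetch() simply ignores these rules.
print(rp.can_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

The whole mechanism is an honor system: the file only matters if the crawler’s operator chooses to check it.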
Some sites can’t deal with the crawler load on their services, or don’t want their site being indexed, so Google should not index them. That’s why Google abides by /robots.txt.
You have to remember, there is no use in Google indexing a site which does not want to be indexed, assuming the site owners know their site should not be indexed. If it really is content which should never show up in Google search results, then why should Google index it?
Problem is, it’s not Google or Google Search’s algorithm making that decision (that a given site’s content should never show up in Google search results), it’s the site owner. Which up until now was mutually beneficial, but now with AI scraping, some website owners will start treating the robots.txt file as a voluntary DRM system. That’s why I am perplexed as to why Google’s AI scraping even cares about the robots.txt file. It shouldn’t.
I wouldn’t be so sure about voluntary. AI web scraping and content reuse are still fluid issues in legal terms, while robots.txt is a clear declaration of will. Google knows that, and they don’t want to be sued out of existence in a couple of years when a legal precedent gets set.
This is 100% DOJ / EC investigation material.
The EC has managed to speed up its reaction time (down from 10 years), so I think it would make sense to notify the closest known EP member about the issue. After the last fine, I think a warning should suffice.