On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site’s content, according to people familiar with the matter. The move comes as the social media platform nears the introduction of its initial public offering (IPO), which could happen as soon as next month.
Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors of an anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.
↫ Benj Edwards at Ars Technica
Properly paying for the content you’re feeding into your “AI” model is a huge improvement over just taking it without users’ consent, but it does add yet another area of concern for users of all kinds of platforms. Whatever you write, create, or post might be fed into “AI” models without you ever realising it, and while the platform you use gets paid for that, you aren’t.
In any event, OSNews is not selling your comments to an “AI” company, but with how old we are, there’s no doubt both your comments and our stories have already found their way into countless “AI” black holes.
Well…
This is not good. Not good at all…
I am all for using machine learning for improving your services. And I would advocate for “respectful” use of the “user generated content”. After all, these websites are not the owners of the data (even if they claim to be), but rather custodians.
So, improve your algorithms, give better recommendations, supplement user satisfaction.
But never, ever, sell my data.
The moment you sign up to reddit/facebook/platform xyz you hand over ownership of all content to said platform as per the agreement you didn’t read and ticked the little box for.
While I doubt OSNews has something like that, I honestly couldn’t tell you as I signed up Sooooo long ago but 99.9% do and Reddit most certainly does.
If you don’t want to hand over your content, don’t post to online platforms you don’t control.
Adurbe,
A lot of those terms and conditions are so sleazy that you can read them fully and still have no idea what they’re doing with your data. Such legalese enables companies to evade lawsuits, but it’s not “true consent” by any meaningful standard. At least the EU’s Digital Markets Act requires providers to be somewhat more explicit and honest.
We also have the weird situation when they own it, and can monetise it, but are in no way responsible for the content itself, which you are still liable for as an individual.
This gets even more confusing when they can move that same content into different jurisdictions and my legal (if maybe unwise) comment can become illegal.
sukru,
I apologize in advance if this is too personal, but I can’t help but ask how someone can agree to work for google if this is your opinion? While google buys the data to mine it and doesn’t sell it directly, it would be odd to scold the selling side of a transaction and not scold the buying side of the same transaction. Google, likely more so than any other company in existence, vacuums up private data from hidden trackers across the internet, and has used its platforms to illegally track user locations without consent. Worse still, many of those users explicitly opted out of being tracked. Google pays other companies to acquire private data, etc.
https://www.technologyreview.com/2017/05/25/242717/google-now-tracks-your-credit-card-purchases-and-connects-them-to-its-online-profile-of-you/
I think it fell through but they tried to get into the game of peddling medical data. The amount of private 3rd party data they amass to build user profiles, often without consumer knowledge, is staggering. Do you condone google doing that? If so, I’m really confused by your position.
Alfman,
And that is the point. Google does not sell user data. (Or at least did not the last time I was there). But rather uses that data to improve products, including targeted ads.
They go out of their way to implement things like “private join and compute” to improve privacy when joining marketing data with financials without leaking any individual information:
https://github.com/google/private-join-and-compute
And if talking personal, I was refusing to use some Google services before I joined with them and saw the code and practices that protect my data.
Since 2009, when the Chinese state hacked Google, there has been only one security incident that was actually caused by Google rather than a third party: https://firewalltimes.com/google-data-breach-timeline/
(All others were user issues, like sharing their passwords, or giving malicious apps access).
sukru,
So you are criticizing the establishment of user data markets to improve AI products, but google has been doing the same thing and has your blessing? Do you understand why I’d see this as a bit two sided?
Data breaches are one thing, but honestly I’m more concerned with google’s normal operating practices regarding data that they don’t obtain explicit user consent to use. I know not everyone cares about it, but I find the mobile market especially troubling because the duopoly has effectively 100% of the market and avoiding these giants’ products is sometimes futile.
Alfman,
(Barring some extreme events)
One thing I can be sure of today is that Google is not selling my data. They could even be the mystery buyer in this transaction, on the other hand, but they will never willingly part with user data.
They will of course monetize the heck out of it, optimize YouTube recommendation algorithms, or show very desirable ads. They will also do other shenanigans, like shut down perfectly good products, lay people off for Wall Street bonuses, or even reduce the quality of Search.
But still, one thing they (currently) don’t do is share the data willingly.
(Compare that to say mobile operators like T-Mobile who would give up your location to anyone who pays $300, including bounty hunters: https://www.theverge.com/2019/1/8/18174024/att-sprint-t-mobile-scandal-phone-location-tracking-black-market-bounty-hunters-privacy-securus)
sukru,
You voiced your opposition to selling your data so strongly that I don’t understand why you have a soft spot for google buying it. It takes two parties to participate in these data transactions and the buyers are obviously complicit in helping to fund these private data collection practices. I think we need to criticize both the sellers and the buyers on the basis that markets for our data shouldn’t exist without our explicit consent. Google have clearly been guilty of this.
What a funny AI that will be. Using as a source of knowledge things like am I the asshole?; toxic relationships or EntitledPeople.
Reddit seems the worst learning platform for AI. Full of inaccurate information that’s upvoted (user validated) and almost completely unmoderated (reddit mods don’t actually moderate very much at all).
How so? It’s an expansion of copyright rights for the copyright holder over the absurd 95-year term they already enjoy. IMO any expansion of copyright rights should come at the rationalization of the copyright term to 25 years (as is the recommended minimum by UNESCO). If we want to rewrite copyright for the computer era, let’s do it fairly: the days when copyrighted works had to be slowly reproduced by slow machines and copyright holders needed 50 or 70 years of copyright to break even are behind us. Otherwise, the existing copyright law stands as-is.
Anyway, my guess is the company figured out that it’s cheaper to buy the data than scrape it and they may even be getting proprietary data (such as engagement data or whatever) not exposed publicly. That’s something I can get behind because it’s an actual service.
kurkosdr,
The copyright situation is all screwed up beyond the original constitutional goals. This is what happens when we allow corporations to influence laws. If anything, the corporate grip on our laws has increased and they will only agree to make changes that increase their control even more. Here in the US we’re entering a new phase of autocracy where it’s no longer about right and wrong, but who pulls the strings. 🙁
The bright side is that the copyright term didn’t get bumped by another 20 years and Steamboat Willie did enter the public domain, so, for once in their 100-year corporate existence, Disney heard the word “no”. I was surprised too.
kurkosdr,
Three cheers for congressional gridlock.
(sarcasm)
https://www.axios.com/2023/12/19/118-congress-bills-least-unproductive-chart
Thom Holwerda
+1
I for one would like to see this OSNews AI and hear what it has to say 🙂
Funnily enough I asked Gemini to show me a comment from OSNews and it didn’t do it “over privacy concerns”. However if I asked in Spanish or in another language it did it.
And yet the authors of all the posts and comments aren’t getting a cut of this sale, so no, the content is not being properly paid for.
The1stImmortal,
You are right, but at the same time from a practical point of view I think making all comments copyrightable could become an administrative nuisance with a bad precedent for public discourse and exchange of ideas. Once comment copyrights can be owned, ownership can be transferred through one sided terms and conditions. In a worst case scenario the actual author of a comment could technically infringe by posting their own comments elsewhere. I just think the cost/benefit here is not conducive to real social benefits.
They are copyrightable already. Copyright in anything substantial enough is automatic and goes to the author (or the author’s employer) but the ToS licenses it to the site with extremely broad rights to that copyrighted content.
I’d make the case that such broad sublicensing for profit shouldn’t be able to be done with just a click wrap license but sadly it currently can be.
I’d be curious as to whether this arrangement is legal under all older versions of the Reddit ToS under which posts and comments were made and where those users never accepted later versions. Think of a person who posted years ago and then never came back to the site: did the terms under which they posted cover selling the content to an AI firm?
The1stImmortal,
In theory maybe, in practice, nobody’s really monetizing comment ownership yet. AFAIK there aren’t any social media websites claiming to own your words in their TOS, merely the right to publish them. It would be awful if businesses start treating our comments as their property (regardless of whether the law allows it).
I’ve noticed a lot of terms have an update clause and your continued use of the service automatically signifies agreement. Most of these one sided TOS are morally bankrupt but it’s up to the courts to decide whether they stand legally.
Indeed, that’s why I was curious about historical posts from inactive users. I doubt Reddit tracks ToS rights against posts at time of making, so might be a lawsuit there if old ToS were insufficient.
Update on the report.
And yes, it is Google that is paying for our content:
https://www.thedailybeast.com/google-will-pay-reddit-dollar60m-a-year-to-use-its-content-for-ai-report?via=twitter_page
(Might be worth using the correct language. It is not Reddit’s content, it is our content as the users).
Not good, not good at all.
What did we think would happen? 15 years of user generated data for free.
I overwrote all of my comments, but I’m sure there are a couple dozen accounts still out there that I have forgotten.
Lemmy has been great, but a little sparse.
OSNews would be great as an OS-focused Lemmy instance.