In August, word leaked out that The New York Times was considering joining the growing legion of creators that are suing AI companies for misappropriating their content. The Times had reportedly been negotiating with OpenAI regarding the potential to license its material, but those talks had not gone smoothly. So, eight months after the company was reportedly considering suing, the suit has now been filed.
The Times is targeting various companies under the OpenAI umbrella, as well as Microsoft, an OpenAI partner that both uses it to power its Copilot service and helped provide the infrastructure for training the GPT Large Language Model. But the suit goes well beyond the use of copyrighted material in training, alleging that OpenAI-powered software will happily circumvent the Times’ paywall and ascribe hallucinated misinformation to the Times.
↫ John Timmer at Ars Technica
OpenAI and similar companies are giant copyright infringement machines, and tools like GitHub Copilot are open source license violations at an industrial scale never before seen. They need to face a reckoning for their illegal behaviour, and need to start asking creators – of journalism, of art, of code – for permission to use their works, just like anybody else needs to do.
“AI” needs to play by the rules, or get steamrolled by the justice system.
That’s a very simplistic take, Thom. To me it sounds like you are saying that AIs should not be allowed to read, because they might remember what they read.
Sure, you may put the current crop of AIs in quotes for some reason, but surely, even for you, there must be a level of capability and sophistication at which “AI” becomes just AI. If not the current ones, then future AIs will become persons, and how do you propose copyright law should work so as to allow them to see, listen and read, given that they will be able to reproduce all of it?
We need something more sophisticated than to pretend the technology is not improving exponentially.
drstorm,
I agree, this is a very nuanced topic. I feel there’s a lot of bias against computers doing what humans have been doing forever. If a human reads an article, remembers the details, takes notes on it, etc., then writes a new article based in whole or in part on the original article, this is allowed by copyright law. Yet now that AI has reached a level of sophistication that enables the process to be mechanized by computer, people are up in arms. The thing is, I understand their frustration: it takes very little work for AI to take everything you’ve done and reword it as a new article. The original authors are going to be pissed off, and to the extent they didn’t also rip off someone else’s work, that makes sense. But banning AI from doing the same thing that humans have done forever is hard to justify without implicitly accepting legal bias that creates one set of rules for AI and another set for humans.
I’m the first to admit we’re not objectively unbiased. But if we wanted to be, we’d have to get much stricter about human reporters and creators as well. Almost the entire Disney catalogue would be guilty of misappropriation, for example. Should the fact this was done by humans instead of computers make a legal difference? I would say that it should not.
Indeed, people are trying to fight it through regulation. But I don’t think regulators will be able to put the cat back in the bag. At best, it may influence where AI facilities get built. It will be virtually impossible to control access though without drastic investments in national firewalls, and I don’t think AI’s critics would be ready to “embrace” that.
And then there’s another nuance as well… NYTimes is suing under the US constitution, which imposes a very clear conditional for intellectual property. Here’s Article I Section 8 | Clause 8:
(emphasis added).
What happens if NYT’s intellectual property rights hinder, rather than promote the progress of science and useful arts?
Well said, sir. Of course, GPT would have read what you said and written it five times better in 5 seconds. 😛
I think Thom has the same approach to “AI” as I do. At the moment they are fancy statistical models mimicking human text. They lack awareness of spatiotemporal reality and are incapable of original creation. I also think calling these statistical dictionaries intelligent is marketing bullshit, useful to lure investors into a fad that will not help humanity much. And as energy gets scarcer, we will find better uses for our precious electricity and computing power on genuinely useful tasks. My 2¢.
Replying to every point you made would take a wall of text, so I’m not gonna try. Would be nice to talk it over drinks or something, but here, what I can say is that I doubt you spent much time talking to GPT-4.
Essentially, these things are intelligent in the sense that in order to correctly predict the next word, which is what LLMs do, they need to build an internal model of the world. One way to think about it is that they compress the internet into a compact format, and fitting all of that data into so compact a format forces them to extract knowledge from it.
This is not unlike how humans absorb information. Ilya Sutskever put it well by saying that these are digital brains inside large computers.
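For a concrete feel of what “predicting the next word” means in the most stripped-down statistical sense, here is a minimal sketch of a word-level trigram predictor in Python. It is only a toy counting model, nothing like a transformer-based LLM, and the corpus, function names and example sentence are invented for illustration:

```python
from collections import Counter, defaultdict

def train_trigrams(text):
    """Count which word tends to follow each pair of preceding words."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for a, b, c in zip(words, words[1:], words[2:]):
        counts[(a, b)][c] += 1
    return counts

def predict_next(counts, a, b):
    """Return the most frequently observed continuation of (a, b), if any."""
    following = counts.get((a.lower(), b.lower()))
    return following.most_common(1)[0][0] if following else None

# Toy "training data"; a real LLM learns far richer statistics than raw counts.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_trigrams(corpus)
print(predict_next(model, "sat", "on"))  # -> "the"
```

A real LLM replaces these raw counts with billions of learned parameters, which is where the argument about “internal world models” comes from.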
Unfortunately we do not have a complete model of the human brain. I don’t remember the name of the researcher, but there are tests that can be performed on AI models to verify whether they are very advanced parrots or aware of the world. According to those PhD researchers, current systems are sophisticated parrots not capable of true creativity. This is where I draw the line for calling something intelligent.
I am very much interested in that paper.
As for what constitutes true intelligence, while interesting from a philosophical point of view, it is practically irrelevant. In fact, I suggest talking about it to GPT-4. It will give you a very reasonable and nuanced perspective on the matter. It is also likely to be aware of the paper you are referring to.
It has been a while since I watched this lecture; from memory it was a psychological assessment of GPT-4, but it could have been from a philosophy department too. Fundamentally, it consisted of questions built specifically to test the edge cases of the AI models. Sorry, my memory failed me here. Will post a link as soon as I find it, very interesting.
I believe this is it, I hope you understand French.
https://www.youtube.com/watch?v=yuDBSbng_8o&t=248s
Sorry, it is late here… going to bed I promise!
YouTube AI can handle the French:
People also ask:
Is there a way to translate YouTube Subtitles?
Changing the language of subtitles on a YouTube video is very simple: Step 1: On the video you’re watching, click the ‘gear icon’ on the toolbar at the bottom of the video. Step 2: Click ‘Subtitles/CC’ and select the language you want to see.
Hopefully it won’t get sued.
gagol2,
In principal, I don’t take issue with such tests, which I’ll call Turing tests. But the thing is people keep wanting to change the tests every time a computer passes them. essentially moving the goalposts. These static models are not yet “general AI”, but they clearly are unambiguously “intelligent” by earlier standards. Today’s AI has already passed the bar exam, for example. While these AI models can make mistakes, they still score better than average humans can even though people aren’t comfortable letting those results speak for themselves.
This is just a statistical language model; once somebody combines one of these models with computational programs like Mathematica, it’s going to be better than most of the human race at extremely sophisticated problems. Their main deficiency might prove to be that AI models are trained on artificial static datasets instead of experiencing life as it exists in the real world, but this is an obvious next step and will likely change in the medium term; it’s just a matter of time.
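To make that “language model plus a computation engine” idea concrete, here is a hedged sketch of the general tool-use pattern in Python. The ask_llm function is a made-up placeholder rather than any real API, the CALC: convention is invented for the example, and SymPy stands in for Mathematica as the exact-math backend:

```python
import re
import sympy  # exact-math engine standing in for "Mathematica" here

def ask_llm(prompt):
    """Made-up placeholder for a real LLM call; returns a canned reply for the demo."""
    return "Rather than guess the digits, I will delegate this. CALC: 2**127 - 1"

def answer_with_tools(question):
    """Let the (stubbed) model hand exact computation off to a real math engine."""
    reply = ask_llm(question)
    match = re.search(r"CALC:\s*(.+)", reply)
    if not match:
        return reply
    result = sympy.sympify(match.group(1).strip())  # exact evaluation, no hallucinated digits
    return f"{reply}\n=> computed: {result}"

print(answer_with_tools("What is 2**127 - 1?"))
```

The point is only the division of labour: the language model decides what to compute, and a deterministic engine does the actual computing.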
For the record, these are very sophisticated systems and I am very impressed by them. They are capable of summarizing human knowledge from their model, and that is impressive. But intelligence, in my book, requires original ideation and creativity from outside its internal knowledge.
AlphaFold folds proteins, Zero invents its own Go moves, and I don’t remember which AI improved a state-of-the-art sort implementation. GPTs can write poetry and work out original problems, e.g. working out orbital velocities underground, which is what I asked. Being so ridiculous, it is unlikely that GPT saw it before, hence the generalization.
No one is saying these things are conscious or anything like that, but I would make a strong case for intelligence.
Defining intelligence is splitting hairs at this point. As much as I am intrigued by the future it opens, I am very worried about calling these systems intelligent, as people tend to believe they can solve complex problems we cannot, and this is dangerous. Thank you for the mental sparring ☺️
gagol2,
My pleasure. 🙂
drstorm,
+1 Exactly!
This differentiation is very important. It is fair to say that AI that is able to train itself to beat humans at chess just by knowing the rules is “intelligent”, but this is not to say it is conscious or has free will.
These topics, while interesting to ponder, are a completely different beast. The concept of “life” itself is also interesting, can an artificial replicator be considered alive? Lots of biological replicators are merely following their programming and not objectively intelligent.
gagol2,
The thing you should ask yourself is whether we, as humans, are that original and creative. Originality isn’t hard, mathematically speaking. You could output random shapes & patterns and be 100% original. Random noise isn’t that interesting to the human brain. What we value in art isn’t simply “originality”, but originality combined with recognizability, the latter being very important in triggering our brain’s pattern recognition neurons! Yet this very concept requires the “creativity” that we care about to be derivative in nature, which almost seems like an oxymoron. Nevertheless what makes creative art special is that it takes something recognizable and paints it in a novel way. This is something that artistic AI tools are already extremely proficient at IMHO.
They don’t really build models of the world, they build models of language, and because humans use language to describe and relate to the world, it happens that a statistical model of language also often is a decent approximation of a model of the world. The LLM itself isn’t aware of anything, it just has a database of how words are often strung together in text that in some ways shares characteristics with the context provided.
To use examples of these things passing various exams etc. is disingenuous at best, given their poor performance on zero-shot learning tasks in general, extra prompting in order to pass a lot of said papers, and the fact that a lot of the exam materials in many cases already exist in their training set.
Marvelous things can be done with a large statistical model of tokens, including responses that appear to be intelligent. I keep hearing the ‘just use GPT-4’ argument though, and the ‘internal world model’ argument, amongst others. I still see zero proof that these things are actively reasoning and generating intelligent output.
There are alternative points of view out there, in terms of what these models are actually capable of, and they seem to be silenced or ridiculed by the mainstream ‘AGI-is-imminent’ crowd a lot of the time. Maybe it is, maybe it isn’t, but more importantly, it is entirely appropriate to limit what these things are allowed to slurp up short to medium term, given the impact on the labour markets and human quality of life – given how our capitalist society is currently structured, there is no way people don’t get exploited by the usage of these tools if this keeps going.
The ‘how can you stop it doing what humans are allowed to do’ argument is also disingenuous in my opinion, primarily for this reason.
Finally, proponents of this technology (and believe it or not, I try to keep an open mind) need to face up to the fact that the legal system may be a serious obstacle. It is not a foregone conclusion that it will be adapted to allow this current ‘hoovering up’ approach. It may well go the other way, given the powerful vested interests on the side of existing copyright/IP protections.
PhilPotter,
“Inadequate” might have been a better word here. I don’t find it “disingenuous” at all to test AI using the same tests we use to test ourselves.
Well, to be fair, ChatGPT is nothing more than a statistical model. But still, that it is a decent approximation of human proficiency in so many domains says a lot about our own intelligence. This is one of the points that sometimes bugs me about the “computers aren’t intelligent” group: there’s a tendency to judge machines more strictly while giving humans the benefit of the doubt. We have to go back to the basic principles of the Turing test. The criterion for passing an intelligence test should not be what we know about a black box, but rather the intelligent qualities of its output when we don’t know whether the entity on the other end is a computer or not.
You’re not wrong about the disruption to human labor, although I do think “limits” will be futile.
So to be clear about this, you are advocating for the same action to be both legal and illegal based on whether the action is performed by a computer? That’s certainly something to think about, but it introduces new legal challenges. Like what if the court isn’t able to make a clear determination of whether a job was mechanized versus done by a human? Those who outsource their jobs to machines will start lying about it, especially if the legal system disincentivizes honesty. Furthermore, it may be problematic that unequal access to AI could actually end up harming the competitiveness of the workers covered by these laws.
I believe the “AI laws” will prove irrelevant and unenforceable. AI services will be launched outside their reach while users tunnel in remotely using a VPN.
Yes, I think some sort of limit on their usage, certainly in the creative fields, is not necessarily a bad idea. You raise interesting points about how it would be enforced etc., but at an industrial level, something along these lines (i.e. an enforced code of practice) is not only possible but quite likely. Would it kill AI dead? No, likely not. But it would limit its usefulness for sure.
I respect your opinion of course, and appreciate how nuanced and well-argued your position is. That said, I disagree that legal interventions will not make a difference. It remains to be seen whether they will have a positive effect or not. Ultimately, we are talking about two ‘groups’ for want of a better phrase, both with large pockets. Disney, the NYT etc. winning this isn’t necessarily a good thing either. It would certainly make a difference to the usage of these tools though.
PhilPotter,
Short of some kind of unprecedented world-wide enforcement, I don’t think it would make much of a dent even at industrial levels. Laws like this would simply move the industry into new unregulated jurisdictions where it would be accessible over the internet. There are going to be countries willing to embrace the AI that is banned elsewhere and get rich doing so (similar to how crypto datacenters migrate to regions with low costs of operation).
Yes, I’m glad we can stay respectful even if we disagree 🙂
Yeah, we’ll see what happens. I have very mixed feelings about the role that technology should play, especially as it displaces more intellectual jobs. I’m not sure if this qualifies me as a sort of “Luddite”. I think every generation faces some resistance towards new technology and has been fearful of what it means for their jobs. My concerns might just be an echo of this, but this somehow feels different to me. All the previous automation of physical jobs still left a door open to intellectual/creative jobs, but now even these could be at risk of mass automation. This disruption of jobs and the need for significantly less labor may be hugely consequential to the middle class under capitalism.
I don’t think you’re a ‘Luddite’. You have legitimate concerns, and frankly, I share a lot of them. I guess where we differ is in our view of how far this ultimately goes. I’m still erring on the side of ‘perhaps it won’t be so bad’. The mass unemployment envisioned still requires new tech which, in my view, is decades away at best.
The current approach, whilst it is undoubtedly leading to abuse and job losses in certain sectors, can only go so far. If anything – the companies/tech bros taking the piss in the way they have done, and hyping this tech too far, will lead to damage in and of itself, when the market ultimately readjusts. Another AI winter is even possible, although I imagine something less extreme is likely.
I might go in reverse order
I would say the genie is already out of the bottle, and barring a massive international crackdown, including rival superpowers like China, stopping it is almost impossible. There are already very good open source models you can download to your machine and run locally (Mixtral 8x7B comes to mind). They would need to go door to door, delete the model from individual users’ machines, and ban all use of modern video cards.
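As a rough illustration of what “run locally” can look like in practice, here is a sketch using the Hugging Face transformers library. The model ID, prompt and generation settings are just plausible examples, and a model of this size realistically needs either a lot of GPU memory or a heavily quantized build:

```python
# Rough sketch: running an open-weights model locally with Hugging Face transformers.
# Assumes transformers, torch and accelerate are installed and the weights fit in memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # example open-weights model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Explain in two sentences why locally run models are hard to regulate."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```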
Have no idea what you mean here.
Are you hinting on AI potentially replacing unproductive labor?
“Emergent” properties of LLMs are still a research topic. Depending on how you look at it, there is reasoning capability, especially with one shot prompting, however it is still below, significantly below, human capabilities:
https://arxiv.org/abs/2311.09247
So, is there reasoning?
Yes
Is it adequate?
No
One shot is not something foreign to us. When we do a test, say a math one, English, or an IQ test, we usually also get instructions or sometimes examples.
“Here is a short paragraph, the author will discuss a story about their recent vacation. Use it to answer the next three questions” would be a very common pattern for us, and using that for LLMs is probably not “cheating”.
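As a small illustration of that pattern, here is what a one-shot prompt might look like when assembled programmatically; the example paragraph and question are invented for the sketch:

```python
# Illustrative one-shot prompt: one worked example, then the actual question.
# The example content is invented; real evaluations use task-specific exemplars.
one_shot_prompt = """Answer the reading-comprehension question.

Example:
Paragraph: The author spent a week hiking in Norway last summer.
Question: Where did the author go on vacation?
Answer: Norway

Now your turn:
Paragraph: {paragraph}
Question: {question}
Answer:"""

print(one_shot_prompt.format(
    paragraph="Maria described her recent trip to Lisbon in glowing terms.",
    question="Which city did Maria visit?",
))
```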
Bottom line, LLMs are not “AGI”, at least not yet. However, they also possess significant mental ability, especially for a relatively tiny model (compared to human brain capacity).
So I don’t disagree in that sense – getting rid of this is not likely, nor potentially even desirable. The point of open source locally running LLMs you bring up is indeed a counter balance to huge proprietary gate keepers like Google/OpenAI. The question of whether the worst excesses/impacts on the labour markets (or indeed certain segments thereof) can be limited with legislation is still very much an open one though. It is not outside the realms of possibility that something can be done here. I guess the next few years, or business realities, will end up showing us what happens one way or the other.
On the point you weren’t sure about what I was saying, I was trying to point out it is (in my view) totally OK to conceptually say computers and people should have different restrictions/limitations, even if both are capable of a given task or set of tasks. Saying that limiting an LLM from doing what a human can do is ‘unfair’ is both wrong (given it’s a computer system, not a person) and missing the point potentially, in that the scale involved is what makes this an entirely different problem.
Finally, I guess it comes down to what we define as ‘reasoning’. People often get accused of moving the goalposts in relation to this sort of discussion. I promise you I’m not trying to do that. What I will say though is what we generally consider as abstract reasoning skills are not being demonstrated by these models, as of now. For example: https://arxiv.org/pdf/2305.19555.pdf
There are plenty of arxiv papers that go either way. I would argue they possess no mental ability whatsoever in their current form though. We are simply impressed by what large-scale statistical correlations over such a huge set of tokens in such a complex mathematical model can produce. Is this what reasoning ultimately is? I would argue not, but as you say, the debate is ongoing.
Another one on the issue of emergent properties (or potential lack thereof): https://arxiv.org/abs/2304.15004
Anyway, thank you for taking the time to reply, I appreciate it.
Claiming AI training is a copyright infringement is just plain daft.
What’s the difference between a person who reads a lot of books and websites on a topic and uses that knowledge to write an article, and a computer system doing the same?
Copyright is about copying something created by someone else and then trying to make money selling the copy. AI systems don’t simply reproduce existing material; they are way more complex and clever than that.
The NY Times claims that AI simply copies but can only get the AI to actually do that by clever prompt engineering specifically done in order to win legal damages. It’s just grift.
This legal case is like the legal actions by newspapers demanding payments from Google because it offers links to their articles in search results. This legal action against AI, like those Google search legal actions, is just an attempt by old media, whose earnings have been squeezed by technical changes, to be given subsidies by companies who are making money by being technical innovators.
Imagine Gutenberg being sued because all he was doing was reproducing stuff that was previously handwritten in manuscripts.