Torvalds thinks “AI” is 90% marketing, and Google claims 25% of its code is “AI”-generated

Thom Holwerda 2024-11-01 Google, Linux 29 Comments

Torvalds said that the current state of AI technology is 90 percent marketing and 10 percent factual reality. The developer, who won Finland’s Millennium Technology Prize for the creation of the Linux kernel, was interviewed during the Open Source Summit held in Vienna, where he had the chance to talk about both the open-source world and the latest technology trends.
↫ Alfonso Maruccia at Techspot

Well, he’s not wrong. “AI” definitely feels like a bubble at the moment, and while there’s probably eventually going to be useful implementations people might actually want to actively use to produce quality content, most “AI” features today produce a stream of obviously fake diarrhea full of malformed hands, lies, and misinformation. Maybe we’ll eventually work out these serious kinks, but for now, it’s mostly just a gimmick providing us with an endless source of memes. Which is fun, but not exactly what we’re being sold, and not something worth destroying the planet for even faster.

Meanwhile, Google is going utterly bananas with its use of “AI” inside the company, with Sundar Pichai claiming 25% of code inside Google is now “AI”-generated.
↫ Sundar Pichai

We’re also using AI internally to improve our coding processes, which is boosting productivity and efficiency. Today, more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers. This helps our engineers do more and move faster.

So much here feels wrong. First, who wants to bet those engineers care a whole lot less about the generated code than they do about code they write themselves? Second, who wants to bet that generated code is entirely undocumented? Third, who wants to bet what the additional costs will be a few years from now when the next batch of engineers tries to make sense of that undocumented generated code? Sure, Google might save a bit on engineers’ salaries now, but how much extra will they have to spend to unspaghettify that diarrhea code in the future?

It will be very interesting to keep an eye on this, and check back in, say, five years, and hear from the Google engineers of the future how much of their time is spent fixing undocumented “AI”-generated code. I can’t wait.

About The Author

Thom Holwerda

Follow me on Mastodon @thomholwerda@exquisite.social

29 Comments

2024-11-01 5:54 am
r_a_trip
**It will be very interesting to keep an eye on this, and check back in, say, five years, and hear from the Google engineers of the future how much of their time is spent fixing undocumented “AI”-generated code. I can’t wait.**
AI is like violence, if it doesn’t solve your problem, you’re not using enough of it. Google will just put the AI generated code through the AI to get documented. If that becomes human readable remains to be seen, but if not, we can always let the AI handle the code AND the documentation. As long as it runs right?

2024-11-05 12:46 am
cdxx
The AI code is the simplest code and doesn’t really need much documentation. The harder 75% is human written and human documented

2024-11-01 6:37 am
sphinxios
An issue with AI is that it assumes everything that is not explicitly asked for and even then you can’t be sure that you got was the one you that thought asked.
So correcting a script/programpart that wasnt what one asked for is going to be very fun to fix/rewrite.
I have tried AI to assist me with simple linux scripting and at none of those occasions did i get something close to what i asked for, even though i tried to be extremely clear on what I was asking.
Since then i stopped using AI realizing it is very stupid/bad since it doesn’t answer without assuming way too much.
2024-11-01 7:38 am
Enturbulated
Eh …. there have been great results with neural nets trained to perform specific tasks, depending on the tasks, but generative AI is currently rather problematic. We’re seeing incremental improvements in image gen and LLMs, but there’s a ways to go to properly take care of the biggest issues (hallucinations, being steerable to make statements in direct contradiction to their own knowledge, poorly implemented alignment training that hurts capabilities more than it helps with anything else, etc etc etc) and I’ve not yet seen any fundamentally different ideas being looked at. Instead, it’s been more of the same approaches that have lead to the current flawed outcomes. “Let’s go bigger! Train with more data! Multimodal!” Sure, all that may help, but we’re long past seeing corps scraping the bottom of the barrel for find more training data, not clear to me how multimodal helps in and of itself, there’s diminishing returns on going to larger nets, there’s tradeoffs with MoE models, and all of the above balloon resource requirements. Does seem like like some fundamental change in what’s being done will be needed for the next real jump in progress.
Would welcome anyone able to tell me if my impressions are off.

2024-11-01 7:54 pm
mbq
Literally all generative model output is hallucination — that was a name given to image recognition models run backwards, before it was biased towards usefulness and applied for different media. This is not only semantics because no contemporary ML has internal concept of truth/reality or approaching it, the correctness is always exclusively external.

2024-11-01 7:40 am
Lennie
I think it would be a good idea to have a better insight in what they are generating.
What could be for example they use it to generate stuff people find tedious, like unit test code, etc.

2024-11-01 9:50 am
cdxx
Yes exactly. i use AI assist in my coding, and the 25% is the easiest 25%. It’s like power steering in a car or spell check in a doc. It lets you focus on the important/hard parts of the task

2024-11-01 10:19 am
Alfman verbose=1
cdxx,
Yes exactly. i use AI assist in my coding, and the 25% is the easiest 25%. It’s like power steering in a car or spell check in a doc. It lets you focus on the important/hard parts of the task
I think many are critical and disappointed with AI for not having general intelligence, but that stems from a misunderstanding of what AI is in computer science terms. We can train models that are good at problem solving: image recognition, handwriting recognition, speech recognition, and now even generative models, etc. These are all classic AI problems, but it doesn’t rise to the level of “general intelligence”. This was never controversial before the public got involved. The public’s perception of AI seems to be skewed by Hollywood misinterpreting AI as AGI. AI applications, including generative AI are real & legitimate, the mistake is conflating it with AGI.

2024-11-01 10:52 am
CaptainN-
Part of the problem even with AGI is that when we get real machine intelligence, it won’t be artificial. The term “Artificial Intelligence” was meant to apply the word “artificial” to “intelligence” – as in, it’s artificial, not intelligent. In Hollywood though, AI is a set of highly intelligent robots that are going to steal your liver. The term has been completely debased in pop culture, but also, with the manager class.
LLMs are either barely AI, or the thing the term originally meant – which is just some algorithm that looks superficially like intelligence (the ghosts in Pac-Man) – let’s call it an artifice. An LLM is just a probabilistic derivative art token generator. It can’t reason, will never be able to reason, and absolutely cannot generate anything novel – ever. It simply takes some input, and then simulates what a comment might look like on Stack Overflow using a statistical model of all the answers that came before. That’s it. A very basic understanding of the algorithm would have stopped the hype cycle in its track, but then Sam Altman couldn’t be a paper billionaire. That’s not to say LLMs have no use – they are kind of okay at processing natural language…

2024-11-01 12:39 pm
Alfman verbose=1
CaptainN-
Part of the problem even with AGI is that when we get real machine intelligence, it won’t be artificial.
AGI merely means the intelligence was created artificially as opposed to intelligence that evolved in nature.
LLMs are either barely AI, or the thing the term originally meant – which is just some algorithm that looks superficially like intelligence
AI is intelligent, deep blue was intelligent, watson was intelligent, alpha go was intelligent….however these are all specialized within confined domains, which is the opposite of AGI. Obviously we can agree they have no will or self awareness, they can’t learn and grow themselves.
It can’t reason, will never be able to reason, and absolutely cannot generate anything novel – ever.
I agree that current LLMs don’t reason but rather infer. However I strongly disagree with you about not being able to generate novel output. AI can absolutely do this and I’d suggest the difficulty is not generating novel output but instead generating meaningful output. But this is exactly what makes reinforcement learning so cool.
Take for example an AI that learns to play a game. It can absolutely uncover and perfect new strategies that it was never exposed to.
https://www.youtube.com/watch?v=CI3FRsSAa_U
I would put forward the notion that a random NN can generate novel output (though obviously unskilled). I also put forward the notion that through reinforcement learning, an unskilled NN can become more proficient. The fact of whether or not a human has ever done something before is irrelevant to the AI’s capability to generating meaningful output because meaningful output is an emergent byproduct of the reinforcement learning algorithm. The ability for AI to create novel output doesn’t just apply to video game strategies, but art, writing, music also.
Just as in the gameplay training scenario, the random NN is already blessed with novelty by default, the purpose of training is to make output less random and more meaningful. In order to trigger the pattern receptors in our brains, a generative AI has to be trained on the same types of patterns than trained our own brains. Sometimes NN can have a problem with over fitting the data (in which case the output is a virtual copy), but when it’s done well the NN becomes generalized. These LLMs do generate novel output within the constraints of the fitness function. The dilemma is that novelty and recognizably stand at opposite sides of the spectrum. When we deliberately train AI to promote familiar patterns and kill off unrecognizable ones, that’s the output we get, but that’s not an intrinsic limitation of the AI.
2024-11-01 4:06 pm
CaptainN-
I said LLMs can’t reason or produce anything novel – other kinds of “AI” can and do all the time.
I disagree with your definition of AI – though I accept that the common understanding of the term is closer to that. To me, a distinction between intelligence running on biological machines like brains, and running on silicon and created by man does not matter.The machine doesn’t matter. The creator (evolution or man or god) doesn’t matter. What matters is whether the intelligence is real. Machine intelligence is a better term, IMHO. it’s not artificially intelligent, it’s actually intelligent.
2024-11-01 5:22 pm
Alfman verbose=1
CaptainN-,
I said LLMs can’t reason or produce anything novel – other kinds of “AI” can and do all the time.
The justification for novelty shouldn’t be self fulfilling and what you are describing lies dangerously close to this line. In order to make a scientifically fair determination of what qualifies as novel, the only thing that should matter is the work itself. Judges should blindly asses novelty without knowledge of “who” created a work since that would taint the results (ie “oh look it’s an LLM, which can’t create anything novel”).
For example, if an employee at a copyright office has to ask whether an LLM was used in order to determine whether a work qualifies for copyright as a novel work, then I’d assert it as a double standard since LLM works are subjected to a higher bar for novelty than human works, which is unfair. If we use a fair test for novelty, then I believe that LLMs could indeed pass such a test in the real world.
The machine doesn’t matter. The creator (evolution or man or god) doesn’t matter. What matters is whether the intelligence is real. Machine intelligence is a better term, IMHO. it’s not artificially intelligent, it’s actually intelligent.
Whether it matters that it was created artificially or not is a fair point. Just to be clear though most of us use the term AI to broadly mean intelligence that was created artificially with no insinuation about the quality of said intelligence. I understand that you’d like to interpret it differently, but I don’t think that interpretation is going to catch on. People are going to continue to refer to artificially created intelligence as AI.
2024-11-03 9:16 am
CaptainN-
I’m fine with making the blanket statement that an LLM can only produce derivative art, and banning it it outright from new copyright claims. Yes please!
2024-11-03 11:46 am
Alfman verbose=1
CaptainN-,
I’m fine with making the blanket statement that an LLM can only produce derivative art, and banning it it outright from new copyright claims. Yes please!
Yes, but so is 100% of art in all of history because that’s just how our neurons work. From a young age our brains are formed from the world around us and our art derives from all that input. We share this property with LLMs.
What you propose is a double standard. Besides being hypocritical, the problem is that if you tried to pass a law banning LLMs, they won’t actually go away. Data centers will just get built in other countries. And if you try to criminalize LLM users, this won’t stop LLMs either, they’ll be pushed underground and pass generated work as their own.
There’s a reason “lady justice is blind” and I don’t believe in having one law for AI and another for humans.
https://unveilthehidden.blogspot.com/2019/01/reason-for-blind-folded-of-lady-of.html
2024-11-04 10:13 am
CaptainN-
It’s not remotely the same. Human neurons, and neuronets reference the whole of their experience, and the limits of their biologically derived imperatives to produce thought. LLMs don’t think. They don’t even mimic neurological structures (especially not on the output side). They just predict the next token based on heuristics, and a vast library of prior art. They simulate an existing answer – exclusively. Human brains do much more than that. That you keep comparing LLMs to neuronets means that you either don’t know how the technology works, or you have an agenda. NOTHING they produce should be copyrightable. Period.
2024-11-04 12:21 pm
Alfman verbose=1
CaptainN-,
It’s not remotely the same. Human neurons, and neuronets reference the whole of their experience, and the limits of their biologically derived imperatives to produce thought. LLMs don’t think.
Actually, the similarities are far closer than you might initially realize. To this end, it’s absolutely fascinating to study split brain patients. Experiments reveal how our brain’s neurons create mechanized responses to stimuli that don’t necessarily reflect reality. They’re just extrapolating from limited data to make convincing assertions that are nevertheless fake.
https://youtu.be/_TYuTid9a6k?t=911
So in one experiment a patient was shown a chicken claw to his left hemisphere and a snow scene to his right. He was then asked to choose a picture that related to what he was shown so his right hand pointed to a picture of a chicken, which made sense because the guy saw a chicken foot. But the left hand pointed to a shovel, which makes sense because his original picture was a snow scene, but he didn’t consciously see the snow scene so when they asked him why he selected that shovel he said “oh, that’s simple. The chicken claw goes with the chicken and you need a shovel to clean out the chicken shed.” The left hemisphere, without hesitation came up with a justification for what the right hemisphere chose. It didn’t say “I don’t know”, which would have been the correct answer, it didn’t say “oh maybe it’s this I’m just guessing”, no it confidently declared chicken poop. And this was a result that repeated itself over and over and over again.
…
In every one of these examples the interpreter module was wrong. Often, confidently wrong. It got limited amount of information and assumed it knew everything about it. Might as well call it the Dunning Kruger module. And yet, this module is kind of running the brain, or at least running our thoughts. Further testing revealed that the interpreter module is just one of multiple modules in the brain, all of them kind of black boxes. Like we don’t really know how they make their choices, but they make choices, and then the interpreter module justifies it.
These split brain patients reveal just how strikingly similar the “interpreter module” in the human brain is to an LLM.
They just predict the next token based on heuristics, and a vast library of prior art. They simulate an existing answer – exclusively. Human brains do much more than that.
I don’t understand why people have such a problem with algorithms “predicting the next token”. That just seems biased. An algorithm that successfully predicts the output of another algorithm can be said to be mathematically equivalent to the other algorithm.
I’m not claiming that LLMs today are as capable as humans, but I do wish more people would understand the principal of mathematical equivalence. Any algorithm, including a human brain, could be translated into a predictive model that is mathematically identical. Ergo, the use of a predictive model is not in and of itself a legitimate reason to be dismissive of LLM.
That you keep comparing LLMs to neuronets means that you either don’t know how the technology works, or you have an agenda. NOTHING they produce should be copyrightable. Period.
You’re entitled to your view, but it’s just hypocritical is all I’m saying since human art is clearly ALSO derivative. It does sound like you are ok with a double standard, one for machine and another for humans, but…honestly that doesn’t seem to align with something else you said earlier: “To me, a distinction between intelligence running on biological machines like brains, and running on silicon and created by man does not matter.” I understand that you don’t feel machines are intelligent, which may be how you justify this discrepancy, However as someone who believes intelligence is emergent, I still don’t think I can agree with you.
2024-11-04 2:05 pm
CaptainN-
That’s another cute trick – you keep broadening the strawman you are arguing against to include all “AI” or anything generated by a machine – I’m talking about LLMs, not machines in general. LLMs are a specific thing. LLM is not a pseudonym for AI in any general way… Does your paycheck rely on pounding this LLM bible?
2024-11-04 3:12 pm
Alfman verbose=1
CaptainN-,
That’s another cute trick – you keep broadening the strawman you are arguing against to include all “AI” or anything generated by a machine – I’m talking about LLMs, not machines in general. LLMs are a specific thing.
I believe the only way to compare the technology fairly is to treat all AI as a black box. Yes this absolutely includes LLM too.
LLM is not a pseudonym for AI in any general way…
I never said or implied it was, however it doesn’t (or at least shouldn’t) matter if the model is an LLM or not when the technology is treated as a black box like I am saying. The failure to treat it as a black box elevates pre-conceptual opinions at the expense of fairness. This is the whole point behind lady justice being blind.
Does your paycheck rely on pounding this LLM bible?
Assuming a genuine question: the answer is no. Quite the opposite actually, I worry many of us could become redundant in the coming years 🙁

2024-11-01 8:45 am
Shiunbird
It is a well-known fact that the “human in the loop” tends to see their abilities atrophy and be more likely to approve (or just rubberstamp) pre-generated data rather than thoroughly review it. There’s also the fatigue factor.
It’s the case with medical work, systems monitoring and engineering; and I see no reason why software engineers would fail where others have not.
2024-11-01 9:44 am
cdxx
If you text with autocorrect then 25% of your texts are AI generated too. It’s not that sinister
2024-11-01 10:43 am
CaptainN-
On one hand, it’s almost impossible to believe that Google number, 25% of code – yeah, okay. I’ll take 2 of those bridges too. I suspect they asked their managers for that figure, not the people who actually write code, or they were like, did you have co-pilot enabled? All your code counts!
On the other hand, it would explain the rapid enshitification at Google….

2024-11-05 12:50 am
cdxx
What fraction of your text autocorrect comes from AI vs what fraction do you literally type yourself? I’d wager it’s something not far from 25%/75%. AI coding isn’t that different. It autocompletes function names and arguments, and suggests log messages. I think it’s genuinely helpful and time saving, giving more time for concentrating on the actual important code

2024-11-01 7:41 pm
dsmogor
I think AI will change role of code and its lifecycle. Code re-use will no longer be a priority and a lot of code and subsystems will be constantly rewritten instead of modified. The key to make it work that is even more important than code generation is test generation, automated requirements analysis and validation.
I share Thom sentiment, it’s not only readability of code that will be a problem but the sheer amount of it, because it’s so cheap to generate. At some point it will no be accessible to human beings.
Providing a clear and concise specifications is a challenge for most intelligent human beings .Words have frequently very specific and context dependent meanings that are rarely explicitly stated. I’m not sure LLMs in their current generation would be capable of capturing those nuances, especially that there’s not nearly enough text data from specific project (and broader company) context and the data that exists is often obsolete and self contradictory. Making sense of it is often interactive, iterative and error prone process. Maybe future AI agents will be able to capture entire flow of information exchange (emails, documents, meetings) correctly tell what is relevant from noise and confusion (hard for even seasoned BAs) and reason without hallucinations but we’re not there yet.
2024-11-02 12:08 am
dark2
The biggest problem with AI is even if the studies and math are already around saying it’s less efficient than not using it, it will be like full self driving cars. Always supposedly on the horizon and getting millions spent on it for over a decade, while those of us that understand the underlying tech just know it isn’t possible.

2024-11-02 1:14 am
Alfman verbose=1
dark2,
The biggest problem with AI is even if the studies and math are already around saying it’s less efficient than not using it, it will be like full self driving cars. Always supposedly on the horizon and getting millions spent on it for over a decade, while those of us that understand the underlying tech just know it isn’t possible.
There’s not doubt a gold rush bubble of sorts, but to be fair reinforcement learning is making great inroads at training machines to solve problems that are traditionally very difficult to solve using conventional programming.

2024-11-02 5:48 am
NaGERST
At this point in time AI can not even write proper documentation on what it’s code does. I am not letting AI near my production code base.

2024-11-02 1:14 pm
Alfman verbose=1
NaGERST,
At this point in time AI can not even write proper documentation on what it’s code does. I am not letting AI near my production code base.
Most coding LLMs are being trained to do the very opposite of that: code from descriptions. However now that you mention it I actually think documentation is a fantastic and appropriate use of LLM. Instead of just writing code, we could have LLMs that are specially trained and specialized on documenting code. A lot of human developers do this job poorly. Auto documentation would be an awesome IDE feature and a good time saver.
2024-11-03 11:17 am
Iapx432
I had an LLM write some code for fun. Its description of what it intended to wrote was correct The actual code it wrote was not.
2024-11-04 10:16 am
CaptainN-
It makes me so happy that so much of my competition is betting their entire business, and their personal futures on LLMs. Good luck! LOLOL