Did GitHub Copilot really increase my productivity?

Thom Holwerda 2024-05-08 General Development 41 Comments

Yuxuan Shui, the developer behind the X11 compositor picom (a fork of Compton) published a blog post detailing their experiences with using GitHub Copilot for a year.

I had free access to GitHub Copilot for about a year, I used it, got used to it, and slowly started to take it for granted, until one day it was taken away. I had to re-adapt to a life without Copilot, but it also gave me a chance to look back at how I used Copilot, and reflect – had Copilot actually been helpful to me?
Copilot definitely feels a little bit magical when it works. It’s like it plucked code straight from my brain and put it on the screen for me to accept. Without it, I find myself getting grumpy a lot more often when I need to write boilerplate code – “Ugh, Copilot would have done it for me!”, and now I have to type it all out myself. That being said, the answer to my question above is a very definite “no, I am more productive without it”. Let me explain.
↫ Yuxuan Shui

The two main reasons why Shui eventually realised Copilot was slowing them down were its unpredictability, and its slowness. It’s very difficult to understand when, exactly, Copilot will get things right, which is not a great thing to have to deal with when you’re writing code. They also found Copilot incredibly slow, with its suggestions often taking 2-3 seconds or longer to appear – much slower than the suggestions from the clangd language server they use.

Of course, everybody’s situation will be different, and I have a suspicion that if you’re writing code in incredibly popular languages, say, Python or JavaScript, you’re going to get more accurate and possibly faster suggestions from Copilot. As Shui notes, it probably also doesn’t help that they’re writing an independent X11 compositor, something very few people are doing, meaning Copilot hasn’t been trained on it, which in turn means the tool probably has no clue what’s going on when Shui is writing their code.

As an aside, my opinion on GitHub Copilot is clear – it’s quite possibly the largest case of copyright infringement in human history, and in its current incarnation it should not be allowed to continue to operate. As I wrote over a year ago:

If Microsoft or whoever else wants to train a coding “AI” or whatever, they should either be using code they own the copyright to, get explicit permission from the rightsholders for “AI” training use (difficult for code from larger projects), or properly comply with the terms of the licenses and automatically add the terms and copyright notices during autocomplete and/or properly apply copyleft to the newly generated code. Anything else is a massive copyright violation and a direct assault on open source.
Let me put it this way – the code to various versions of Windows has leaked numerous times. What if we train an “AI” on that leaked code and let everyone use it? Do you honestly think Microsoft would not sue you into the stone age?
↫ Thom Holwerda

It’s curious that as far as I know, Copilot has not been trained on Microsoft’s own closed-source code, say, to Windows or Office, while at the same time the company claims Copilot is not copyright infringement or a massive open source license violation machine. If what Copilot does is truly fair use, as Microsoft claims, why won’t Microsoft use its own closed-source code for training?

We all know the answer.

Deeply questionable legality aside, do any of you use Copilot? Has it had any material impact on your programming work? Is its use allowed by your employer, or do you only use it for personal projects at home?

About The Author

Thom Holwerda

Follow me on Mastodon @thomholwerda@exquisite.social

41 Comments

2024-05-08 8:22 pm
ssokolow (Hey, OSNews, U2F/WebAuthn is broken on Firefox!)
The only “A.I.” tool I use is the copy of Stable Diffusion I pootle around with privately as a brainstorming aid. (Much as talking to a friend about writing ideas helps to break you out of being fixated on a single solution, seeing Stable Diffusion interpret what you wrote differently can also help that way… sort of like pair coders and linters are both useful, but neither is a substitute for the other.)
…and the reason it’s the only one I can use is that I can run it offline, where I don’t have to worry about someone deciding the trial period has gone on long enough and I need to pay for it now.

2024-05-08 9:05 pm
Alfman verbose=1
ssokolow,
…and the reason it’s the only one I can use is that I can run it offline, where I don’t have to worry about someone deciding the trial period has gone on long enough and I need to pay for it now.
+1.
AI will continue to become more useful, but being locked into proprietary services is a huge set back. Open AI models are clearly important but it does bring up the question of whether open community AI software will be able to compete against proprietary AI services if large AI models are destined to be vendor locked by corporations that have significantly more resources. to train AI models.

2024-05-08 9:27 pm
Alfman verbose=1
Edit: should have proof read this before submitting, haha.

2024-05-08 10:02 pm
Andreas Reichel
Well, a ChatGPT plugin on this site, proof reading and correcting automatically would be really nice and helpful 🙂
I wrote a fierce response to a requested yesterday. ChatGPT rephrased it into “make it friendly” for me and nobody got offended yet. I love it!

2024-05-08 9:26 pm
Alfman verbose=1
Thom Holwerda,
As an aside, my opinion on GitHub Copilot is clear – it’s quite possibly the largest case of copyright infringement in human history, and in its current incarnation it should not be allowed to continue to operate. As I wrote over a year ago:
My objection to this logic is that it vilifies AI learning from copyrighted works while overlooking the fact that humans have been doing it forever. It leaves me with the impression that the copyright argument against AI but not humans may be a form of prejudice. If so, can we justify this prejudice against AI and should it be codified into law?

2024-05-09 3:44 am
Mote
That presupposes that ai “learning” is similar to human learning. A highly doubtful assumption, since humans also take into account semantics, whilst ai is purely based on the statistics of syntax.

2024-05-09 5:13 am
Andreas Reichel
I don’t see exactly how “semantics” is different from a very very deep and broad network of probabilities.
A human baby is born “blank” (plus some basic instinct, call it the ROM) and it learns from observing only. It takes quite a while until anything like semantics develops.
The only real difference in learning I can see is the absolute will to survive (and so keeping trying), while a Neural Network does not care yet (thank God 🙂 ). We do not understand this will (it does not even make any sense) and we can’t ignite it.
The thrilling question in my book was: would this will ignite itself when the network reaches a critical complexity or was there something else?

2024-05-09 11:26 am
Alfman verbose=1
Andreas Reichel,
A human baby is born “blank” (plus some basic instinct, call it the ROM) and it learns from observing only. It takes quite a while until anything like semantics develops.
I don’t take “learns from observing only” to be self-evident. Humans are said to be born before their brains are fully developed, but in the animal kingdom it’s not unusual to be able to walk at birth, which is a fairly advanced skill to know how to do. Obviously I agree we learn from observation too.
We should distinguish between learning things oneself versus learning by copying others. Most of what kids learn is likely to be this sort of “monkey see monkey do”, which IMHO NN technology can already replicate., albeit not very practically given the limitations of hardware form factors. The capacity to learn by oneself would be another level of intelligence on top of this, and I imagine we can all agree that AI isn’t there yet.
The only real difference in learning I can see is the absolute will to survive (and so keeping trying), while a Neural Network does not care yet (thank God ). We do not understand this will (it does not even make any sense) and we can’t ignite it.
I think the Darwinian theory of evolution solves the philosophical problem of the “will to survive”. It’s more of an emergent property. A bacteria survives because it’s a replicator that has evolved to be well suited to the environment. Replicators that cease replicating or become ill suited to the environment naturally take themselves out of the gene pool. Those that remain alive are naturally good at adapting to their environment.
I think it’s conceptually very easy to achieve these conditions artificially but the amount of compute resources we need today to train just one LLM instance today already stretch the bank. To do this for trillions of instances in parallel will not be feasible until artificial platforms become far more efficient than today. However we may be able to “cheat” our way there by initializing AI with our own intelligence rather than evolving ti front scratch.
The thrilling question in my book was: would this will ignite itself when the network reaches a critical complexity or was there something else?
Maybe I misread you before…by “will” do you mean self awareness/consciousness? If so, I find that to be a philosophically complex issue that I don’t fully understand even in terms of what it means for humans, much less AI.
As a thought experiment, we could theoretically emulate a human brain using an artificial physics simulator. This physics simulator may be fully understood with no notion of consciousness at all. Yet would we consider the emulated beings to be conscious? If so, that’s remarkable. It would imply consciousness is really an emergent property of the programming I find these concepts really bizarre.

2024-05-14 4:17 am
dsmogor
I would say the key difference is the ability to replicate learned knowledge. LLMS “knowledge” can be copied and distributed with 0 cost. Human learning process does not scale and thus we have a new factor to consider. In a sense the LLM weights could be considered a transformation of code they use during training and as such could be e.g. contested in court as GPL derivative work when distributed (similarly to complets binaries who are also result of lossy transformation). As GPL 3 didn’t really catch on OpenAI and the likes are mostly immune to that danger though.
2024-05-14 8:39 am
Alfman verbose=1
dsmogor,
I would say the key difference is the ability to replicate learned knowledge. LLMS “knowledge” can be copied and distributed with 0 cost.
I would say that (at least today) training an LLM is much more expensive than training a human. But an already trained LLM is cheaper to replicate than replicating a human (ie we basically have to train a new human).
In a sense the LLM weights could be considered a transformation of code they use during training and as such could be e.g. contested in court as GPL derivative work when distributed (similarly to complets binaries who are also result of lossy transformation). As GPL 3 didn’t really catch on OpenAI and the likes are mostly immune to that danger though.
I understand why you’d want to argue this, however in principal I think it’s hypocritical of us to argue against artificial neurons but not natural ones.
And lets say the law did make this kind of distinction…we’re already starting to train biological neurons artificially (and the process might ultimately prove to be even more efficient and scalable than a mathematically simulated NN on vector computers). Personally I don’t think it should matter, but would it change your opinion in any way if artificial NN used brain tissue? That is coming and having laws that discriminate against computer NN could conceivably actually accelerate it.

2024-05-09 11:55 am
Alfman verbose=1
Mote,
That presupposes that ai “learning” is similar to human learning. A highly doubtful assumption, since humans also take into account semantics, whilst ai is purely based on the statistics of syntax.
I agree with Andreas Reichel’s point. our brains neurons are essentially giant statistical models. We’re not equivalent by any means, but I do think we are giving humans the benefit of doubt because they are human. Our nature is to treat ourselves differently.
2024-05-11 7:03 pm
sukru
Agreed,
But humans and machines are put to different standards.
For instance, when a self driving vehicle causes a major accident, the entire fleet it taken offline. When a human does drunk driving, they get a few points on their drivers license and a fee. There are many other examples like this expecting 100% accuracy from AI, whereas humans at best could do 80% or 90%.
Same here on learning. If we look at how ML and neuroscience co-developed, it is easy to see the patterns are shared both in AI and also “wetware”. However it is easier, and also more lucrative to sue billion dollar companies than enthusiastic kids hogging a library.

2024-05-08 9:52 pm
Barty
I used copilot for work for about two years, starting pretty soon after it was introduced. My experience is that it’s quality is highly dependent on the language you are using.
Python is often startlingly good, as long as you can describe your plan in comments or there is sufficient context for what you are doing. The first time I noticed this was when I was trying to make a bunch of charts from some complex data structures using matplotlib. A two line comment and it popped out roughly thirty lines that very nicely graphed what I needed. I ended up slightly tweaking the legend, but that was it. It does tend to use slightly older Python styles, so I don’t see many of the features added after 3.5 or so.
It’s C and C++ are often laughably terrible. I regularly saw hallucinated libraries and function calls, nonsensical use of pointers and arithmetic, and in the case of C++ absolutely no understanding of the STL. It was good at generating or extending basic boilerplate and ok at generating test cases. The one thing it was consistently good at was generating textbook algorithms when I would otherwise look them up.
For writing documentation, it was useless.

2024-05-08 9:58 pm
Andreas Reichel
Absolutely same experience here!
2024-05-09 12:13 am
Alfman verbose=1
Barty,
Python is often startlingly good, as long as you can describe your plan in comments or there is sufficient context for what you are doing.
…
It’s C and C++ are often laughably terrible. I regularly saw hallucinated libraries and function calls…
This is just a theory, but is it possible that python libraries are better defined/documented?
There are times when using C libraries that I found the documentation missing/out of date/wrong and I want to pound my head on the wall. Projects like openssl and ffmpeg are notorious for inadequate documentation and breaking changes such that I’ve had to resort to debugging the library source code myself just to figure out how to use it correctly. A NN solution is probably going to be at a major loss here since even if it follows the documentation the code can still be wrong.
It makes me wonder if languages that have good documentation are easier for AI to get right? For all my complaints about PHP, at least it’s well documented. If somebody tries AI generation for PHP let me know how well it works 🙂

2024-05-09 2:04 am
Andreas Reichel
> This is just a theory, but is it possible that python libraries are better defined/documented?
I don’t think so because I have observed hallucinated libraries or methods in java too — which really scores high on documentation and samples.

2024-05-09 11:30 am
Alfman verbose=1
Andreas Reichel,
I don’t think so because I have observed hallucinated libraries or methods in java too — which really scores high on documentation and samples.
I’d like to see examples, but there must be a reason even if we don’t know what the reason is.

2024-05-09 11:54 am
Andreas Reichel
Just ask to write you an ARIMA process in Java(!) or ask how to use PROPHET in Java.
It will point on all kind of R and Python libs and import phantom classes and packages that don’t exist anywhere.
I can only assume that those algorithms are explained on Websites which mix a lot of R and Python code with Java Snippets (for different illustrations and purposes) and so the correlation spikes even when there is zero causality.

2024-05-09 10:32 am
MarkHughes
I also found C and C++ support poor, However I recently wrote an app using WPF and C# and it helped me a lot, however it did make a few things up and I had to have it clarify. Still though, It allowed me to proceed much faster than I could have by trawling though the documentation.
2024-05-14 4:21 am
dsmogor
I wonder if this fact will influence languages popularity making those that are better supported by these tools win in the market.

2024-05-08 9:57 pm
Andreas Reichel
Spot on! And it would be so much fun because what exactly is “A.I”? Which algorithms to ban exactly? Just anything with pattern matching and probabilities? How about Markov Chains and Monte Carlo simulations? How are those less “A.I” then Language Models?
And please can we stop calling it “A.I”? There is nothing intelligent about it (yet), just pattern recognition (an excellent though!). Only when it knows that its lying (and it does that a lot) and when it can take “no” for an answer, we can talk about intelligence traits.
Beside that: I love those tools! Best API quick search ever especially when writing boilerplate code, text for the auditors or management or code in an un-familar syntax (like R or Julja).
Btw, I am actively publishing code on Github and I have zero concern about AI although I do earn my money with it: anything what AI can do with it is trivial anyway and any smart Indian will figure himself when you give him a budget. Writing the new code and understanding the customers requirements is where the frog has its curls.
Example: SQL is a standard, any RDBMS publishes its API and SQL implementation online. Should be the most comfortable field for an “A.I” because so many patterns to learn. Yet, give any “A.I” a simple task: to write you a SQL parser or formatter. Simple technical task, not deeper knowledge or experience needed, no room for interpretation. Go and see what happens and you will be chilled about all those tools.

2024-05-08 10:52 pm
Alfman verbose=1
Andreas Reichel,
And please can we stop calling it “A.I”? There is nothing intelligent about it (yet), just pattern recognition (an excellent though!).
People have a tendency to read too much into “AI”. In computer science we’ve been using the term for rather mundane tasks that are intelligent in the context of the task but not for anything else. AI merely means it contains artificial intelligence, not that it approaches human intelligence. Rather than redefining AI to exclude basic intelligence, I’m more in favor of introducing a new term for higher level intelligence, which we’ve dubbed “artificial general intelligence”.
https://en.wikipedia.org/wiki/Artificial_general_intelligence
Only when it knows that its lying (and it does that a lot) and when it can take “no” for an answer, we can talk about intelligence traits.
By that definition, I seriously think we’d have to exclude humans.
Example: SQL is a standard, any RDBMS publishes its API and SQL implementation online. Should be the most comfortable field for an “A.I” because so many patterns to learn. Yet, give any “A.I” a simple task: to write you a SQL parser or formatter. Simple technical task, not deeper knowledge or experience needed, no room for interpretation. Go and see what happens and you will be chilled about all those tools.
We’re still in the early phases of AI, It’s likely to improve with time. The static models are limited by training data. What we are asking AI to do today is akin to having a human read a book about an unfamiliar language and then write flawless programs without ever having run the compiler or tested the software. I’m impressed that it works as well as it does today. Future evolution will start giving AI the ability to compile and test it’s own code, this should help back-fill gaps in training data the same way human coders do. This will prove to be important for AI to be able to gain even more knowledge than was available in the training set.

2024-05-08 10:59 pm
Andreas Reichel
> What we are asking AI to do today is akin to having a human read a book about an unfamiliar language and then write flawless programs without ever having run the compiler or tested the software. I’m impressed that it works as well as it does today.
100% d’accord.
> Future evolution will start giving AI the ability to compile and test it’s own code, this should help back-fill gaps in training data the same way human coders do.
This one is interesting though! I agree that it should be able to compile and test during a learning phase. But once it has learned, should it not be able to write flawless code — at least within the syntax and API of a language?
I find that a very fascinating question because indeed I have no good answer how to differentiate between “learning” and “executing”. First instinct response was: its different, but when thinking about it indeed its the same because every execution is learning at the same time when hitting new challenges.

2024-05-08 11:49 pm
Alfman verbose=1
Andreas Reichel,
This one is interesting though! I agree that it should be able to compile and test during a learning phase. But once it has learned, should it not be able to write flawless code — at least within the syntax and API of a language?
I agree that in principal a learned pattern should be more or less flawless. In practice though I’m not really sure if “100% perfect” is possible using strait up static NNs. Consider that the neurons in machine learning are often represented by half floats, which is only a mathematical approximation.
https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
https://en.wikipedia.org/wiki/Half-precision_floating-point_format
These approximations, compounded over deeply nested neural pathways will likely introduce some mathematical errors in the model. I’m tempted to call this “the kraken”, as in kerbal space program – the lack of simulation precision results in unexpected chaotic behaviors that defy the simulation’s programming. Obviously this tradeoff is done for performance and memory reasons, but in theory the lack of precision could cause a “perfect” NN to diverge somewhat.
I find that a very fascinating question because indeed I have no good answer how to differentiate between “learning” and “executing”. First instinct response was: its different, but when thinking about it indeed its the same because every execution is learning at the same time when hitting new challenges.
IMHO, in order to work towards the “general” in Automated General Intelligence, it’s not enough to train a NN and call it a day, that NN has to be able to perform it’s own proper testing/debugging/research.

2024-05-09 12:47 am
Andreas Reichel
Insightful!
> that NN has to be able to perform it’s own proper testing/debugging/research.
And here it gets interesting: Why exactly should it do that? It has no purpose and no fear to die. You can beat it into the pattern matching, but you can’t enforce curiosity or creativity. Without our intrinsic hunger for life, there was no curiosity and no will to develop or to conquer. As long as this magic spark — the equivalence of the will to live — does not happen, we won’t see any intelligence, just mechanic skill.
2024-05-09 11:40 am
Alfman verbose=1
Andreas Reichel,
And here it gets interesting: Why exactly should it do that? It has no purpose and no fear to die. You can beat it into the pattern matching, but you can’t enforce curiosity or creativity. Without our intrinsic hunger for life, there was no curiosity and no will to develop or to conquer. As long as this magic spark — the equivalence of the will to live — does not happen, we won’t see any intelligence, just mechanic skill.
I’d argue this is already a solved problem. When you boil nature down, it is essentially a giant complex fitness function that decides who/what lives & dies. With AI we can exploit the same mechanism in virtual space. This turns out to be very effective in practice too. This essentially incentivizes problem solving without explicitly telling the AI how to do it.
https://www.youtube.com/watch?v=kojH8a7BW04

2024-05-08 10:55 pm
Xanady Asem
Copilot is a great productivity tool for us. Specially once it has been trained enough in our own patterns from our repositories.
There is definitely no going back for us.
2024-05-09 12:20 am
cmdrlinux
It’s genuinely useless. If you use a tool and it works fine two times then doesn’t work, how many times will you keep using that tool before you give up? And even worse, when it doesn’t work, it creates MORE work than not using a tool at all! Until it can consistently write functional code that runs the first time damn near 100%, it’s completely useless.
And my experience with Copilot is that it far too frequently creates gibberish that looks seemingly accurate. I’ve seen it create repeated formatting errors and call modules in ansible that didn’t even exist. It was wrong more often than it was write and then it was up to me to try and debug the mess.
No thanks, it’s got a long way to go before it’s worth hassling with. And it’s entirely possible that changes and I will be following it closely but right now it’s basically useless for me.

2024-05-09 12:31 am
Alfman verbose=1
cmdrlinux,
Until it can consistently write functional code that runs the first time damn near 100%, it’s completely useless.
Going by this standard, you’d have to concede that human programmers are completely useless as well, haha.
No thanks, it’s got a long way to go before it’s worth hassling with. And it’s entirely possible that changes and I will be following it closely but right now it’s basically useless for me.
Fair enough. We can probably agree that it will be important for AI to gain the ability to automatically test and improve the code on a real tool chain. This feature is likely to come to AI solutions in the future.

2024-05-09 2:09 am
Andreas Reichel
Question please: If it was not capable to write a) technically correct or b) function correct code at the first time and finds it out using the tool chain, then what exactly do you expect it to do?
I mean, entering a missing closing bracket or a semicolon may be possible and often correct but even such a “fix” can be wrong or even more wrong (depending on the exact problem).
So what will it do? Throwing syntax dices or brute force? If it would know what to do on error, then why did it not do at first (given, that it won’t just “forget” a bracket or mistype the semicolon?

2024-05-09 2:11 am
Andreas Reichel
Furthermore, how will know if the code is functional correct? How would it figure out unit tests when it did not figure out to write the code?
See how fascinating humans are and how far away we are from “A.I”?
2024-05-09 2:19 am
Andreas Reichel
One more: if you connect two Language Models to interact with each other and let the chat, what will happen? How far will the chat go? Forever? If so it does not serve any purpose. Die out immediately? Again, no aim or purpose.
If there was really any intelligence both sides would try to drain each other until there is no new information to gain and (only) then stop talking. Like a curious child interacting with its parents until it gets bored.
My main point is: We have great mechanical pattern matching and very very impressive. But no perpetual learning and understanding (yet), just more patterns and probabilities.

2024-05-09 12:10 pm
Alfman verbose=1
Andreas Reichel,
One more: if you connect two Language Models to interact with each other and let the chat, what will happen? How far will the chat go? Forever? If so it does not serve any purpose. Die out immediately? Again, no aim or purpose.
If there was really any intelligence both sides would try to drain each other until there is no new information to gain and (only) then stop talking. Like a curious child interacting with its parents until it gets bored.
Is it fair for me to summarize your points as being about a conscious entity making it’s own conscious decisions? I don’t have any answers for consciousness, but then I don’t really agree that we need to solve the consciousness problem to have effective AI. Even taking AI out of the equation, it’s hard to definitively prove that other humans are conscious, we may be mindless automata following our biological programming with no free will.

2024-05-09 12:03 pm
Alfman verbose=1
Andreas Reichel,
Question please: If it was not capable to write a) technically correct or b) function correct code at the first time and finds it out using the tool chain, then what exactly do you expect it to do?
I mean, entering a missing closing bracket or a semicolon may be possible and often correct but even such a “fix” can be wrong or even more wrong (depending on the exact problem).
So what will it do? Throwing syntax dices or brute force? If it would know what to do on error, then why did it not do at first (given, that it won’t just “forget” a bracket or mistype the semicolon?
We need to train an LLM to catch and correct mistakes. I don’t see a reason for this not to be possible but debugging is a skill that needs to be learned and it’s very likely we lack the training data to train an LLM to do it. This make it harder to train an LLM to debug, but I don’t think it’s impossible, it will just take time.

2024-05-09 12:23 pm
Andreas Reichel
I don’t agree with you on this one: the LLM gave it its best shot on the first attempt. Even if it finds out, that it has failed: what should it do from here and what would be pushing it to proceed?
Right now I am pushing it with my endless complaints. And it still can fail when it has exceeded all the less likelier patterns.
But since the LLM has no curiosity, why would it keep “digging” and start to investigate? It has no incentives.
At the same time, imagine we manage to get it “digging”! Then why would ever stop?
2024-05-09 1:15 pm
Alfman verbose=1
Andreas Reichel,
I don’t agree with you on this one: the LLM gave it its best shot on the first attempt Even if it finds out, that it has failed: what should it do from here and what would be pushing it to proceed?
First identify errors then correct them.
Why do you exclude the possibility of training an LLM to debug? There is a difference between “it hasn’t been done yet” and “it cannot be done at all”.
Right now I am pushing it with my endless complaints. And it still can fail when it has exceeded all the less likelier patterns.
It’s evolving, training is key to that. It’s no surprise that some skills are lacking when they haven’t been part of the training.
But since the LLM has no curiosity, why would it keep “digging” and start to investigate? It has no incentives.
You keep coming back to this, but I don’t understand why. AI does not need to be conscious. That can be another evolutionary step down the line, but it’s not necessary for AI to have a consciousness for it provide us with problem solving tools today under the goals that we set out for it.

2024-05-09 3:17 pm
Book Squirrel
I’m very much aligned with Thom on this and I always appreciate tech writers cutting through the hype to the actual issues.
So long as these models are not something we would assign personhood, rights and responsibilities to, they remain basically tools that transform data. The training corpus for a model can be seen both as its input and its source code, and in both cases the licensing of that data very much matters, for the model itself and its output. Or at the very least it *should*. If we lived in a more equitable society and culture where most people’s livelihoods were not directly dependent on work and ownership, then perhaps we could afford to be a bit more loose about this, but we do not.
In any case, I have somewhat unwillingly been kept painstakingly up to date on AI news, examples and developments. But I still haven’t prompted a single line or image of anything from any of the models, and there are actually more reasons for that than just ethics and copyright issues (though I find those to be perfectly sufficient):
As a developer and system administrator I need to understand as much as possible of what’s going on in the systems i develop/manage. Even if I could trust a generative model to do parts of that job correctly most of the time, it would just dull the skillset I need to maintain for when things are on fire and the models can’t help. If I *can’t* trust it to do things for me (which is what it very much looks like, with no sign of it getting much better in spite of an entire industry dumping ridiculous volumes of capital into the problem for more than a year), then I have to doublecheck everything it barfs out on the screen, and then I might as well just do it myself.
Crucially, what it comes down to is that I can’t actually assign any responsibility to a generative model as I would a coworker, and good luck making any of the companies behind them willingly take responsibility for any of their output.
If I use them, any issues are inevitably going to fall back on me, and if I screw something up in my work the error should be *mine* so that I can actually learn from it.
I’ve actually been there before in a way, which may be why I react to generative models in this particular way. I’ve worked with mechanical theorem provers in the past, and they come with some very helpful model checkers to automate parts of the process. But they’re similar to generative models in that they are fickle and unpredictable. They’ll help you just enough that you can make a lot of progress, but not enough to solve anywhere near everything you need.
I found that the more I used them, the less I learned from the experience, and the more I would actually get stuck because I spent less time trying to work out proofs manually, and more time attempting to appease the model checkers.
And I see way too many similarities in how generative models would function for programming or problem solving in general.

2024-05-09 5:43 pm
Alfman verbose=1
Book Squirrel,
I get the moral quandaries that you share with Thom. Everyone is going to have their own personal feeling about it and that is fair…but no sign of getting better? That seems so far off base when it’s gotten so much better in such a short period…These models are learning the collective works of humanity so quickly…everything we throw at them they absorb like a sponge and are already passing exams at a college level. LLMs are not perfect nor are they artificial general intelligence, but sometimes I get the feeling people are being dismissive of it because they don’t want it to succeed even though the technology has been making outstanding progress. They’re learning bottlenecks may be limited by our inability to train them fast enough, but it seems likely that we are going to see more specialized LLMs in the future.
Is it fair to say that you don’t actually want the technology to improve?
I see both good and bad potential outcomes, and those bad outcomes are still very much on the table. We’re probably in agreement on this, but I don’t think that denying AI’s potential is a good strategy to shape the future towards better outcomes.
The long term value for employers is hugely compelling because their most expensive operating cost is labor by far. If we’ve learned anything about corporations it should be this: corporations will always cave into financial pressure to put profits first above moral objections. Some people go so far as to say maximizing corporate profits is the board’s only job. We may not like it but we need to recognize how significant corporations will be for AI’s future. The money and demand will be there! This dynamic is not a typical “bubble” or fad driven by an increasing supply with a lack of long term demand.

2024-05-10 8:12 am
Book Squirrel
Whether I want the technology to improve or not is somewhat irrelevant, I don’t have any power over that.
In fact, if we could replace my work with some AI-model there’s a big part of me that would be very happy to no longer have to deal with the anxiety of being partly responsible for a fairly important server-environment.
And I have *some* hope that if it were to become possible to replace most intellectual labor with some form of AI-model, it might cause enough of a disaster to force us to finally reconsider the whole capitalism thing. But who knows?
With regards to the models improving, the phrase was “…no sign of it getting much better”, “much” being an important word, because there are pretty much always ways that you can improve a technology. But as with every other technology, the return on investment tends to shrink significantly over time. You could in fact say that most technological advances tend to follow a logarithmic curve of improvement from their inception, with a seemingly exponential advance at the beginning (which very often tricks us into thinking that’s going to continue).
The point being, GPT-4 was released a little over a year ago. Since then the industry has thrown absolutely insane mountains of cash at trying to utilise or improve it because, as you note, corporations would just love to save money on all the expensive labor (that makes them function at all).
And useful applications have certainly been found. As language models they have become practically perfect at that particular task: generating language that makes sense in terms of syntax and semantics. This apparently makes them pretty good at many translation tasks, for instance. But what they haven’t become much better at is fact checking, reasoning, logic, operating with some sort of consistent model of the world, etc. Because that’s not really what they do fundamentally.
Through RLHF (reinforcement learning from human feedback) GPT-4 can be said to have improved somewhat over time in its responses to specific tasks. But as with every other type of neural network, reinforcement learning will improve specific responses, but in return degrades the quality of the rest of the network, because every parameter in the network affects a multitude of responses. And it’s difficult to predict the exact consequences whenever you’re swinging the reinforcement learning hammer. There has been multiple pretty clear examples of this through GPT-4’s lifetime.
Since GPT-4 we’ve seen several competing model releases (like Gemini and Claude 3), but when you cut through the hype it turns out they’re not much better at actually solving problems than GPT-4.
And specialised models already exists for programming tasks, but somewhat predictably they’re also not doing much better than GPT-4.
Now, what they *can* do in terms of solving problems is solving the kind of problems they’ve seen a bunch of solutions to already. It’s just that these are not generally problems we need to solve again.
But that’s the reason they can actually be pretty good at handling say, college level exams: Because they tend to be problems of limited scope that the models have seen a multitude of solutions to already, and so they can often synthesise passable answers by statistical inference.
This kind of makes a mockery of the purpose of such exams though. As you observe, no human can absorb the sheer volume of information that a generative model has been trained on. So the purpose of me solving exam questions is to show that I’ve *understood* the theory needed to produce the answer without having seen an answer already.
I’ve never actually read the source code of an implementation of the game Snake, but I can produce an implementation regardless. An LLM may also be able to produce an implementation, but it will have been synthesised from the some 50-odd implementations that were included in its training corpus.
I also don’t feel the need to write an implementation of Snake, because I don’t feel like I have much new to contribute with in that space. But it’s in the nature of the education system to have to retread a lot of ground to allow students to gain a fundamental understanding of the problem domains. We let students study and write sorting algorithms. I’ve never had to do this at work, I just hook into a library.
And here we come back to the issue I experienced a long time ago with my overreliance on model checkers in the world of theorem provers. I’ve already heard reports from old friends in the industry of some interns being inflicted with “GPT-4 brain”. They overrely on using GPT-4 to solve problems and then they get stuck and become dismayed when it turns out that it’s actually very bad at solving the kinds of problems we spend most of our time on in a real-life setting.
This was actually something I predicted might happen, and it’s concerning to me that we’re seeing people like this a mere year after the release of GPT-4. I thought the lead-time would be a bit longer.
But I am somewhat optimistic that those interns (like me) will learn from the experience, and given a little time, hopefully the education sector will find ways to adapt so we don’t lose too many students to this.

2024-05-10 9:50 am
Alfman verbose=1
Book Squirrel
Whether I want the technology to improve or not is somewhat irrelevant, I don’t have any power over that.
I agree, the market will move independently of what we want.
With regards to the models improving, the phrase was “…no sign of it getting much better”, “much” being an important word, because there are pretty much always ways that you can improve a technology. But as with every other technology, the return on investment tends to shrink significantly over time.
History doesn’t really back this up though. Those with first mover advantages can be extremely lucrative over the long term. Not everyone will wins, markets have looser too, but the winners can end up turning their investments into perpetual cash cows that pay for themselves over and over and over again. We’ve seen this many times and it’s likely to happen here as well.
And useful applications have certainly been found. As language models they have become practically perfect at that particular task: generating language that makes sense in terms of syntax and semantics. This apparently makes them pretty good at many translation tasks, for instance. But what they haven’t become much better at is fact checking, reasoning, logic, operating with some sort of consistent model of the world, etc. Because that’s not really what they do fundamentally.
I think you are overlooking something: specialization. Businesses don’t care so much about these “know it all” LLMs that we are playing with today. To replace employees they want AI that specializes in the actual operational work they have. This specialized training going to improve quality & consistency of the models, outperforming regular employees at the same tasks. While these business models will be much less interesting to the public, with their much improved job training it is very likely they will be responsible for a lot of future job displacement.
Since GPT-4 we’ve seen several competing model releases (like Gemini and Claude 3), but when you cut through the hype it turns out they’re not much better at actually solving problems than GPT-4.
And specialised models already exists for programming tasks, but somewhat predictably they’re also not doing much better than GPT-4.
Just like human, they’re deficient at tasks they haven’t been trained on, but these shortcomings can be tackled over time. People have always been dismissive of new technology, but in the end I think that AI’s evolution is far from concluded!
But I am somewhat optimistic that those interns (like me) will learn from the experience, and given a little time, hopefully the education sector will find ways to adapt so we don’t lose too many students to this.
I understand why so many feel reprehension towards AI given their livelihoods stand to be upended if AI succeeds. But I think people should be taking the looming treat of major job losses more seriously than we have been. IMHO too much attention gets focused on extremely short term AI issues on today’s models while missing the Forrest through the trees.

2024-05-11 7:08 pm
sukru
For those who can use these tools effectively, they are real “10x productivity enablers”. But obviously it does not work for everyone.
There were prior smart tools, like ReShaper for C# from JetBrains (that also does IntelliJ among others), which provide very successful plugins, that “read your mind”. But they had to implement common patterns manually. Here the AI has the advantage of recognizing those patterns, even some unique to your codebase (depending on the models used).
And they also have the advantage to replace multiple tools. You’d use another tool for generating documentation, automatically writing (basic) unit tests, or understanding a piece of code. Now the same model can do all.
Again, the downside is: (1) not all languages are supported at the same level, (2) not all models can be “fine tuned” to your own codebase, and more importantly (3) like any other tool, it requires certain skillsets to use effectively.
However after using similar tools for a while, I could see it would be really counter-productive to even try to go back.