What if you want to find out more about the PS/2 Model 280? You head out to Google, type it in as a query, and realise the little “AI” summary that’s above the fold is clearly wrong. Then you run the same query again, multiple times, and notice that each time, the “AI” overview gives a different wrong answer, with made-up details it’s pulling out of its metaphorical ass. Eventually, after endless tries, Google does stumble upon the right answer: there never was a PS/2 Model 280, and every time the “AI” pretended that there was, it made up the whole thing.
Google’s “AI” is making up a different type of computer out of thin air every time you ask it about the PS/2 Model 280, including entirely bonkers claims that it had a 286 with memory expandable up to 128MB of RAM (the 286 can’t have more than 16). Only about 1 in 10 times does the query yield the correct answer that there is no Model 280 at all.
An expert will immediately notice discrepancies in the hallucinated answers, and will, for example, follow the List of IBM PS/2 Models article on Wikipedia, which will very quickly establish that there is no Model 280.
The (non-expert) users who would most benefit from an AI search summary will be the ones most likely misled by it.
How much would you value a research assistant who gives you a different answer every time you ask, and although sometimes the answer may be correct, the incorrect answers look, if anything, more “real” than the correct ones?
↫ Michal Necasek at the OS/2 Museum
This is only about a non-existent model of PS/2, which doesn’t matter much in the grand scheme of things. However, what if someone is trying to find information about how to use a dangerous power tool? What if someone asks the Google “AI” about how to perform a certain home improvement procedure involving electricity? What if you try to repair your car following the instructions provided by “AI”? What if your mother follows the instructions listed in the leaflet that came with her new medication, which was “translated” using “AI”, and contains dangerous errors?
My father is currently undertaking a long diagnostic process to figure out what kind of age-related condition he has, which happens to involve a ton of tests and interviews by specialists. Since my parents are Dutch and moved to Sweden a few years ago, language is an issue, and as such, they rely on interpreters and my Swedish wife’s presence to overcome that barrier. A few months ago, though, they received the Swedish readout of an interview with a specialist, and pasted it into Google Translate to translate it to Dutch, since my wife and I were not available to translate it properly.
Reading through the translation, it all seemed perfectly fine; exactly the kind of fact-based, point-by-point readout doctors and medical specialists make to be shared with the patient, other involved specialists, and for future reference. However, somewhere halfway through, the translation suddenly said, completely out of nowhere: “The patient was combative and non-cooperative” (translated into English).
My parents, who can’t read Swedish and couldn’t double-check this, were obviously taken aback and very upset, since this weird interjection had absolutely no basis in reality. This readout covered a basic question-and-answer interview about symptoms, and at no point during the conversation with the friendly and kind doctor was there any strife or even a modicum of disagreement. Still, being in their ’70s and going through a complex and stressful diagnostic process in a foreign healthcare system, it’s not surprising my parents got upset.
When they shared this with the rest of our family, I immediately thought there must’ve been some sort of translation error introduced by Google Translate, because not only does the sentence in question completely fail to match my parents or this particular doctor, it would also be incredibly unprofessional. Even if the sentence were an accurate description of the patient-doctor interaction, it would never be shared with the patient in such a manner.
So, trying to calm everyone down by suggesting it was most likely a Google Translate error, I asked my parents to send me the source text so my wife and I could pore over it to discover where Google Translate went wrong, and if, perhaps, there was a spelling error in the source, or maybe some Swedish turn of phrase that could easily be misinterpreted even by a human translator. After poring over the documents for a while, we came to a startling conclusion that was so, so much worse.
Google Translate made up the sentence out of thin air.
This wasn’t Google Translate taking a sentence and mangling it into something that didn’t make any sense. This wasn’t a spelling error that tripped up the numbskull “AI”. This wasn’t a case of a weird Swedish expression that requires a human translator to properly interpret and localise into Dutch. None of the usual Google Translate limitations were at play here. It just made up a very confrontational sentence out of thin air, and dumped it in between two other sentences that were properly present in the source text.
Now, I can only guess at what happened here, but my guess is that the preceding sentence in the source readout was very similar to a ton of other sentences in medical texts ingested by Google’s “AI”, and in some of the training material, that sentence was followed by some variation of “patient was combative and non-cooperative”. Since “AI” here is really just glorified autocomplete, it did exactly what autocomplete does: it made shit up that wasn’t there, thereby almost causing a major disagreement between a licensed medical professional and a patient.
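To make the “glorified autocomplete” point concrete, here is a deliberately tiny Python sketch of a next-sentence predictor. This is not how Google Translate works internally – the training sentences and the whole setup are invented for illustration – but it shows the mechanism: the continuation comes from what usually followed similar sentences in the training text, not from the source document.

```python
from collections import Counter

# Toy "training corpus": pairs of consecutive sentences from imaginary medical
# reports. In this invented data, a symptom summary is usually followed by a
# behavioural note.
training_pairs = [
    ("Patient reports memory lapses.", "The patient was combative and non-cooperative."),
    ("Patient reports memory lapses.", "The patient was combative and non-cooperative."),
    ("Patient reports memory lapses.", "Further tests were scheduled."),
]

# Count how often each continuation followed each sentence.
follow_ups = Counter(training_pairs)

def autocomplete(prev_sentence: str) -> str:
    """Return the continuation most often seen after prev_sentence in training."""
    candidates = {nxt: n for (prev, nxt), n in follow_ups.items() if prev == prev_sentence}
    return max(candidates, key=candidates.get) if candidates else ""

# The source document only contains the symptom summary...
source = "Patient reports memory lapses."
# ...but the "model" happily appends a sentence that was never in the source.
print(autocomplete(source))  # -> The patient was combative and non-cooperative.
```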
Luckily for the medical professional and the patient in question, we caught it in time, and my family had a good laugh about it, but the next person this happens to might not be so lucky. Someone visiting a foreign country and getting medicine prescribed there after an incident might run instructions through Google Translate, only for Google to add a bunch of nonsense to the translation that causes the patient to misuse the medication – with potentially lethal consequences.
And you don’t even need to add “AI” translation into the mix, as the IBM PS/2 Model 280 queries show – Google’s “AI” is entirely capable of making shit up even without having to overcome a language barrier. People are going to trust what Google’s “AI” tells them above the fold, and it’s unquestionably going to lead to injury and most likely death.
And who will be held responsible?
There is no Peugeot 606, yet everyone I know claims to have seen one. The 605 Phase 2 was discontinued (designed by Pininfarina) and had 237 hp in the basic model, and it was awesome.
Those who live in Sweden know this predicament: the villagers put up roadblocks so large long-haulers will not use that road (this is very common in lumber areas). And that a fourth of all the trees in the world are on the second-smallest continent (Europe) says something.
Wow, this is like a drug but legal 🙂
Ask your AI to speak Bothnian of any kind, I bet your wife will laugh her head off. It CLAIMS it knows Bothnian (a Nordic language descended from Old Norse, like Swedish, Norwegian and Danish). Very few still speak it fluently (around 20,000+) and it has a lot of variations: pijt and lulj are understandable dialects (lulj would be your wife’s), whilst kölish is a tiny bit harder, and överkölish is very hard, but the letters still stay the same whilst the pronunciation changes. Jakobstad, for instance, sounds very pijt; Uleborg sounds very lulj. Bothnian is the hardest language of all recorded to learn, since it can have dual diphthongs and quadruple meanings depending on the vowel.
“The duck” is something AI gets wrong in Swedish and Bothnian. In Swedish it basically just has two meanings, and you are just supposed to know whether you mean a duck or a spirit. In Bothnian you have anden, andén, andèn, andën, aender, and andeer* (plural). And yes, they ALL have plural and dual meanings.
“Hardest language” is very subjective, as it depends on personal factors. Even the claim of “hardest language for most people” requires people to have read/heard, if not tried learning, the language to assess its difficulty.
Please look into the isolated South East Asian languages (Thai, Khmer) before guessing wildly.
Those South African languages with click sounds appear pretty difficult from my Scandinavian viewpoint too. However, it seems the above text was AI-generated, or at least written to appear that way 🙂
You are certainly correct, though I would also factor in the number of native speakers for consideration. Thai is spoken by 72 million people, which is an amazingly high number for an isolated language.
Wild how super important, even life-critical software being nondeterministic is now cool and okay, isn’t it.
Wild how super important, even life-critical humans being nondeterministic is now cool and okay, isn’t it.
Nah, just kidding, set the temperature to 0 my man, you’re welcome 😉
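For anyone wondering what “set the temperature to 0” actually does, here is a minimal sketch of temperature-scaled sampling over a toy score list – an illustration of the general idea, not any vendor’s real API.

```python
import math
import random

def sample(logits, temperature):
    """Pick a token index from raw scores; temperature 0 means greedy (argmax) decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])   # fully deterministic
    scaled = [x / temperature for x in logits]
    m = max(scaled)                                               # subtract max for numeric stability
    weights = [math.exp(x - m) for x in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.5, 0.1]       # toy scores for three candidate next tokens
print(sample(logits, 0))       # always 0: identical output on every run
print(sample(logits, 1.0))     # varies between runs: usually 0, sometimes 1 or 2
```

Determinism only removes the run-to-run variation, though; a deterministic continuation can still be a fabricated one.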
Automated translation now goes through English. So not only did the sentence get made up, most likely the text went Swedish -> English -> Dutch.
This results in really stupid translations.
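Whether Google Translate really pivots Swedish → Dutch through English is the commenter’s assumption and I can’t verify it, but the failure mode of chaining two lossy steps is easy to sketch with toy dictionary “translators” (the word lists below are invented for illustration):

```python
# Toy dictionary "translators"; real systems are neural models, but the
# compounding-error problem when pivoting through a third language is the same.
sv_to_en = {"anden": "the spirit"}   # Swedish "anden" can mean "the duck" OR "the spirit";
                                     # the pivot step has to commit to one English reading.
en_to_nl = {"the spirit": "de geest", "the duck": "de eend"}

def translate_sv_nl(phrase: str) -> str:
    english = sv_to_en[phrase]       # step 1: Swedish -> English (ambiguity resolved, possibly wrongly)
    return en_to_nl[english]         # step 2: English -> Dutch (inherits whatever step 1 decided)

print(translate_sv_nl("anden"))      # -> "de geest" (the spirit), even if the source meant the duck
```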
Answer: the person who used AI without running the results through a human validator. Your parents shouldn’t be using automatic translation (AI or otherwise) when the result matters; it’s entirely their fault.
This is btw why AI’s biggest successes are in the arts, where mistakes don’t matter. Funny how that turned out.
For other tasks, AI will boost productivity (much like old autocomplete does) and will reduce the need for human labor but won’t completely eliminate it.
Thom Holwerda,
Maybe, but I think a lot of people could be in the same boat. I agree with Thom that there are problems, and I think it’s useful to shed light on these kinds of cases, although the coverage is very one-sided.
Quite right.
I see it this way as well. Its primary use today is as an aid to make tasks faster and more efficient. It doesn’t have to do 100% of the job at expert level to bring real cost savings to corporations. I won’t contest the fact that these technologies aren’t perfect, but judging AI as useless because it can’t do 100% of the job seems arbitrary. It doesn’t have to displace a human entirely to make employees more productive.
Weren’t we there before, when people were stranded in the desert or were trying to drive through solid walls because Google Maps had a mistake?
At some point you have to accept the fact that if you blindly rely on technology that you don’t understand then you only have yourself to blame.
zde,
I don’t know the incident you refer to, but:
1) The driver still has to drive responsibly (which is kurkosdr’s point)
2) Whatever the map error was, it could have just as easily been a human error.
In other words, acknowledging AI has faults is not the same as ruling it out, because it’s not competing against something that was ever perfect to begin with.
Let’s take all these air traffic problems happening in the US today. The idea of putting it all in the hands of automated computers is terrifying to some. However, the more you learn about the current system, the more you realize that the human-run system we have today is terrifying and has contributed to real accidents in recent years.
For all their faults, computers could probably automate 95% of the job and allow humans to focus on exceptions to the rule. Everyone’s going to fight the automation, but in the end it may be inevitable.
I would even go so far as to say that the combination of both AI and human supervision will likely give the best results.
In my understanding, AI is great at recognition of regular patterns and correlations, while humans are way better at handling exceptions and anomalies. So anything like cancer detection screening, traffic control, livestock, etc. probably benefits from AI, allowing humans to take over when exceptions arise.
Andreas Reichel,
I agree when it comes to the types of AI that strictly rely on input training data: I expect they should handle well-represented cases with great consistency, but they’re just not going to do well in situations for which there is too little known data. Such a situation should prompt an escalation to a supervisor.
We’re not limited to “monkey see, monkey do”. Adversarial & reinforcement learning are extremely powerful tools to let AI learn basically from scratch. It’s easiest to apply this to games, because the simulations are already readily accessible and therefore the AI can be set up to learn from itself instead of just mimicking humans.
“Training an unbeatable AI in Trackmania”
https://www.youtube.com/watch?v=Dw3BZ6O_8LY
I believe this will be harnessed to solve problems in the real world too. It’s a numbers game: AI can simulate billions of scenarios, including “unrealistic” ones, and humans, even at expert levels, may struggle to match AI’s level of depth. Of course this is all contingent on having a quality simulator that matches reality, which may be a limiting factor. A simulator that lacks authentic failure modes and responses isn’t ideal. Naturally the simulator should have all known (as well as unknown) failure modes, and let the AI solve them all in the sim.
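As a concrete, deliberately tiny illustration of learning purely from a simulator rather than from human examples, here is a minimal tabular Q-learning sketch. The line-world environment, reward, and hyperparameters are all invented for the example – it shows the general trial-and-error idea, not the Trackmania setup from the video.

```python
import random

N_STATES, GOAL = 6, 5            # states 0..5 on a line; reaching state 5 ends the episode
ACTIONS = [+1, -1]               # move right or left
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def step(state, action):
    """The 'simulator': returns (next_state, reward, done). No human demonstrations involved."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for episode in range(200):
    s = 0
    for _ in range(50):          # cap episode length
        # epsilon-greedy: mostly exploit what has been learned so far, occasionally explore
        a = random.choice(ACTIONS) if random.random() < epsilon else max(ACTIONS, key=lambda x: Q[(s, x)])
        nxt, reward, done = step(s, a)
        # standard Q-learning update: nudge Q toward reward + discounted best future value
        Q[(s, a)] += alpha * (reward + gamma * max(Q[(nxt, b)] for b in ACTIONS) - Q[(s, a)])
        s = nxt
        if done:
            break

# The learned policy heads straight for the goal from every non-terminal state.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)])   # -> [1, 1, 1, 1, 1]
```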
That’s a sensible approach, especially in the near term, although my gut feeling is that adversarially trained AI is going to outperform humans by a large margin. These are the methods that led to beating the best humans at chess/go/etc., and now we have exponentially more compute power. I thought I read somewhere that AI classifiers were already superhuman at cancer screening, picking up clues sooner and more effectively than human experts can.
People need to be educated that automatic translation is inherently unreliable and not use it without a human validator where it matters.
kurkosdr,
Even more so for the types of translations likely to be lacking in the training dataset. I suspect there may be disproportionate representation of literary works and bibles, which are readily available in several languages for training. It probably didn’t get many (or even any) examples of professionally translated private health records. This could impact the quality of translation, because it’s easy to see how and why sparse training data leads to false generalizations.
Google’s AI (and other AI products) are however not marketed as flawed hunks of junk held together with sticks and tape. They are supposedly awesome and not designed to make shit up. If they had a big red label saying “don’t use this for anything important, you’ll be sorry”, things would be better.
Google’s AI has an “AI responses may include mistakes.” warning right under the results. But Google Translate doesn’t have one, because Google assumes “everyone” knows that automatic translations that haven’t been reviewed by a human validator are inherently unreliable. Turns out, not “everyone” knows.
These AI results will be a boon for those vaccine skeptics and conspiracy theory nutters that like to bandy around the phrase “do your research!”.
Not a whole lot of good can come of it.
Unneeded overkill, since we have TikTok already. They overthrew whole countries with TikTok only (Burkina Faso, Ibrahim Traore).
Nah, I’m sure Google already told its AI which kind of wrongthink is not allowed. They’re fine if it hallucinates stuff, as long as it’s politically correct.
j0scher,
If you study downloadable LLMs like LLAMA looking for bias, you will find that they do include a hardcoded list of moral rules. I think it takes the form of “if X comes up, then say Y”. In this case X and Y are not literal matches and substitutions; instead, the LLM abruptly pivots the discussion around Y. This adds a practical ability to censor the LLM without having to retrain the whole thing.
Theoretically these rules might contain political misinformation. I don’t know of any case of this, but an AI company could be ordered to do it. I consider this quite different from other hallucinations, though, because one is plainly deliberate whereas the other is the LLM creating a conversation around wrongly interpolated data points. I also consider these hallucinations technically different from bad training data, which creates data points that are present, but wrong.
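I can’t verify that downloadable models literally ship with such a hardcoded rule list, so treat that as the commenter’s description; but the general mechanism described – a check that pivots to a canned answer instead of free generation – can be sketched like this. Every topic, canned response, and function name below is hypothetical.

```python
# Hypothetical guardrail layer of the kind described above; no claim is made
# about how any particular model or vendor actually implements this.
GUARDRAIL_RULES = {
    "topic_a": "I'd rather not go into that, but here is some general guidance...",
    "topic_b": "Let's talk about something else.",
}

def classify_topic(prompt: str):
    """Toy topic detector; a real system would use a classifier, not substring matching."""
    for topic in GUARDRAIL_RULES:
        if topic in prompt.lower():
            return topic
    return None

def respond(prompt: str, generate) -> str:
    topic = classify_topic(prompt)
    if topic is not None:
        return GUARDRAIL_RULES[topic]   # rule fires: pivot to the canned response
    return generate(prompt)             # otherwise fall through to normal generation

# Usage with a stand-in "model":
print(respond("tell me about topic_a", generate=lambda p: f"model output for: {p}"))
```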
Of course, the person who took your money under an SLA with hard constraints, or forced you to use this technology at gunpoint.
So tell me, who did this heinous crime?
It’s a valid question though. If you hire a human translator and they insert a sentence into the translation that didn’t exist in the source material which then leads to physical, emotional, or financial harm, the translator will be the one facing a lawsuit, along with their employer if they are not an independent contractor. In the case of “AI”, logic dictates that the company that owns the translation “AI” software is now the target of the lawsuit, or at least should be, as well as any third party involved in making it consumer-facing when it clearly isn’t ready for public consumption.
Of course, the very idea that a competent human translator would (accidentally or purposely) insert malicious information into a translation is so absurd as to be nearly beyond belief; they certainly won’t be employed after a few mistakes like that. However, “AI” does it all the time with search results, translations, and other fact-based responses. Hallucinations and fabricated nonsense are par for the course for today’s glorified autocomplete bots. There is no intelligence present in these systems as they stand today, they can only regurgitate what they have taken in as source material; they can’t make sense of it, they can’t reason about it, they can only repeat it like a parrot.
We can all agree the translation AIs don’t understand anything they are translating. It’s merely a statistical model. It should never be considered a primary source because it isn’t one.
I looked at Google Translate, and there are no published disclaimers or terms of service. I’m actually really surprised; it seems like a glaring oversight by Google’s legal team. On the one hand that may be an invitation for someone to sue Google over bad translations, but on the other hand I’m wondering just how accountable Google can be for a free service that never once makes any guarantee about translation accuracy. It would be interesting to hear from a real lawyer on this.
Not that I ever claimed to be a lawyer, but ouch. That felt personal.
Morgan,
Oh, that wasn’t intentional.
If it were one of us putting up a free service, I wonder if the courts would hold us liable in a lawsuit against the free service without a contract? My gut feeling is that it shouldn’t hold weight without a contract/agreement, but then I’m not a lawyer and we’ve all seen counterintuitive rulings.
@Alfman:
Yeah sorry I took it the wrong way, no worries.
I feel like if there is material harm, it doesn’t matter if the service was free or paid, under contract or no contract. Harm allegedly resulted due to use of the service, and it’s the court’s job to decide if the service provider is liable and if so, what recompense is warranted.
It’s not quite the same thing, but it makes me think of the laws here in the US about someone getting hurt on your property. Even if you didn’t cause it, even if they didn’t have permission to be on your property, if someone gets hurt while in your yard or home then you can be sued to cover any expenses related to making them whole, and this is usually where homeowner’s insurance comes into play.
This article is an excellent example of what I found in the analysis I did of Google Translate across 130 languages. I call it “MUSA”, the Make Up Stuff Algorithm. It is the imperative to provide an answer, even if the data does not justify it. It factors throughout my book, “Teach You Backwards”, with a specific section here: https://www.teachyoubackwards.com/introduction/#musa .
In your case, you are quite right that the offending sentence probably came from the training data. I don’t know if it still does it, but for a while DeepL would translate something like “Yours sincerely” to French with the equivalent of something like, “With deep and thoughtful sentiments, Mr. President” – clearly referencing some parallel it had found between two documents, with no human intervention to stop the bleeding.
Pix or it didn’t happen
I just asked Google Gemini to tell me about the IBM PS/2 Model 280 and it immediately corrected me, telling me that there was no such model on record. ChatGPT also informed me that there was no such thing as a PS/2 Model 280. So it seems they have corrected this particular error.