At Google I/O, Google demonstrated Google Duplex, an AI-generated voice assistant that can make phone calls for you to perform tasks like making a restaurant reservation or booking a hair salon appointment. After the event, a whole Google Duplex truther movement sprang up, made up of people who simply couldn’t believe technology could do anything even remotely like this, and who accused Google and its CEO Sundar Pichai of lying on stage.
Today, a whole slew of media outlets have published articles about how they were invited to an event at a real restaurant, where the journalists themselves got to talk to Google Duplex. The journalists took on the role of restaurant workers taking reservations requested by Google Duplex. The results? It works exactly as advertised – better, even. Here’s Ars Technica’s Ron Amadeo:
Duplex patiently waited for me to awkwardly stumble through my first ever table reservation while I sloppily wrote down the time and fumbled through a basic back and forth about Google’s reservation for four people at 7pm on Thursday. Today’s Google Assistant requires authoritative, direct, perfect speech in order to process a command. But Duplex handled my clumsy, distracted communication with the casual disinterest of a real person. It waited for me to write down its reservation requirements, and when I asked Duplex to repeat things I didn’t catch the first time (“A reservation at what time?”), it did so without incident. When I told this robocaller the initial time it wanted wasn’t available, it started negotiating times; it offered an acceptable time range and asked for a reservation somewhere in that time slot. I offered seven o’clock and Google accepted.
From the human end, Duplex’s voice is absolutely stunning over the phone. It sounds real most of the time, nailing most of the prosodic features of human speech during normal talking. The bot “ums” and “uhs” when it has to recall something a human might have to think about for a minute. It gives affirmative “mmhmms” if you tell it to hold on a minute. Everything flows together smoothly, making it sound like something a generation better than the current Google Assistant voice.
One of the strangest (and most impressive) parts of Duplex is that there isn’t a single “Duplex voice.” For every call, Duplex would put on a new, distinct personality. Sometimes Duplex came across as male; sometimes female. Some voices were higher and younger sounding; some were nasally, and some even sounded cute.
Duplex conveyed politeness in the demos we saw. It paused with a little “mmhmm” when the called human asked it to wait, a pragmatic tactic Huffman called “conversational acknowledgement”. It showed that Duplex was still on the line and listening, but would wait for the human to continue speaking.
It handled a bunch of interruptions, out of order questions, and even weird discursive statements pretty well. When a human sounded confused or flustered, Duplex took a tone that was almost apologetic. It really seems to be designed to be a super considerate and non-confrontational customer on the phone.
All calls started with Duplex identifying itself as an automated service that would also record the calls, giving the person on the receiving end of the line the opportunity to object. Such objections are handled gracefully, with the call being handed over to a human operator at Google on an unrecorded line. The human fallback is a crucial element of the system, according to Google, because regardless of permission, not every call will go smoothly.
Google Duplex will roll out in limited testing over the coming weeks and months.
Why don’t they optionally employ this technology on the back end of Google Voice? Surely, if it’s smart enough to make dinner reservations, it would be smart enough to detect telemarketers and robocallers.
Hell, I would pay a monthly subscription fee for this, assuming it actually worked. I might even pay $50 a month, just so I never had to get interrupted by a robocall again.
Heh, who do you think will be the first to employ this tech once it hits mainstream?
Heh, robots talking to each other…
I don’t care, as long as they’re not talking to me
Now it’s just waiting until automated robotic reservation making systems like this are going to have a conversation with automated robotic reservation taking systems. Oh, the convolution…
Surely for efficiency robots would quickly detect the other is a robot and switch to beep-beep-drr-beep style communication.
It’s not so much the idea of booking a table automatically in the absence of a table booking API, but the idea that Google has just extended its information crawling engine into the human space.
As well as going through your emails and parsing web pages it can now just phone up somebody and ask!
Imagine a security agency wants to know more about you – it could pull all possible contacts from its computer records, enter the questions it wants to ask, and let a bot phone up 100s or 1000s of people – using information it knows about them as a cover story.
This has huge security implications – last week I had a simple automated call telling me that my internet had been hacked and I should press 1 if… etc – clearly a scam – but rather than being called by ‘Bob’ from India it’s been automated ( even scammers are losing their jobs… 🙂 )
Imagine if you had Google Duplex-like quality – you could really scale identity theft scams, or even commercial or national espionage – the weak points in the system are people – it takes phishing to a whole new level.
Scary.
The key thing to remember is scale through automation makes new things economic. That includes nefarious as much as legitimate activities.
I want a ‘report’ button on my phone.
Just as bad… Political robo calls from campaigns could argue back with you.
There is a lot more ‘fun’ to be had, people have no idea what is coming:
https://techcrunch.com/2017/04/25/lyrebird-is-a-voice-mimic-for-the-…
Excellent point – security agencies would have no problem gathering the requisite voice samples for a high quality mimic.
Imagine the chaos the CIA or FSB could cause ( are causing? ) with accurate voice impersonation technology.
In 1992 the film Sneakers showed how biometrics are a bad idea – “My voice is my passport” – with a simple cassette recorder.
Out-of-date politicians are pushing biometric passports and ID cards – without realising that *once* ( not if ) your biometrics become compromised you can’t simply change them – unlike a password!
Maybe I’m wrong, but I think the reason they want my face for my passport is so they can recognize my face through CCTV. I’m not so sure it’s really used as biometric data.
Security is one thing, but more importantly, I don’t think people realize what this means for the workforce.
Remember all those entry level jobs for support desks, sales, etc.
Remember how you might hear when you first call them: “this may be used for training purposes”
In a few years it will be used for training purposes, specifically machine learning. And a good bunch of those jobs will be gone.
Any task which is repetitive will be automated. Even if one company does it just once per year, if lots of companies do the same thing, someone can start a company which does it for all of them.
At law firms it used to be that the entry level job was reading all the documents, getting all the facts out of those documents, and finding discrepancies. Now they use a computer to search through all the documents. A huge time saver.