I am not a tin-foil-hat kind of person. But last week, I replaced my voicemail greeting (recorded in my own voice) with a synthetic voice saying to leave a message. I will explain the reasoning behind this, and you can decide whether I should now accessorize my future outfits with the hat.
Last month, TechCrunch ran a story about the perils of audio deepfakes and mentioned how the CEO of Wiz, an Israeli cybersecurity firm that I have both visited and covered in the past, had to deal with a deepfake phone call that was sent out to many of its employees. It sounded like the CEO's voice on the call. Almost. Fortunately, enough people at Wiz were paying attention and realized it was a scam. The call was assembled from snippets of a recorded conference session, and even the most judicious audio editing still can't be perfect: people at Wiz caught the unnaturalness of the assemblage. The tell had nothing to do with AI and everything to do with human nature. The CEO is uncomfortable in front of an audience, so his conference-stage voice is somewhat strained, and that isn't his conversational voice.
But it is just a matter of time before the AI overlords figure this stuff out.
AI-based voice impersonations — or deepfakes, or whatever you want to call them — have been around for some time. I wrote about this technology for Avast's blog in 2022 here. The piece mentioned impersonated phone calls from the mayor of Kyiv to several other European politicians. This deepfake timeline begins in 2017 but only goes up to the summer of 2021. Since then, there have been numerous advances in the tech. For example, a team of Microsoft researchers developed a text-to-speech program called VALL-E that can take a three-second audio sample of your voice and use it to speak as you in an interactive conversation.
And another research report, written earlier this summer, "involves the automation of phone scams using LLM-powered voice-to-text and text-to-voice systems. Attackers can now craft sophisticated, personalized scams with minimal human oversight, posing an unprecedented challenge to traditional defenses." One of the paper's authors, Yisroel Mirsky, wrote that to me recently when I asked about the topic. He posits a "scam automation loop" that makes this possible, and his paper shows several ways the guardrails of conversational AI can be easily circumvented, as shown here. I visited his Ben-Gurion University lab in Israel back in 2022, where I got to witness a real-time deepfake audio generator. It needed just a few seconds of my voice, and then I was having a conversation with a synthetic replica of myself. Eerie and creepy, to be sure.
So now you see my paranoia about my voicemail greeting, which is a bit longer than a few seconds. It might be time to do an overall "audio audit," for lack of a better term, as just another preventative step, especially for your corporate officers.
Still, you might argue that there is quite a lot of recorded audio of my voice available online, given that I am a professional speaker and podcaster. Anyone with even poor searching skills — let alone AI — can find copious samples where I drone on about something to do with technology for hours. So why get all hot and bothered about my voicemail greeting?
Mirsky said to me, "I don't believe any vendors are leading the pack in terms of robust defenses against these kinds of LLM-driven threats. Many are still focusing on detecting deepfake or audio defects, which, in my opinion, is increasingly a losing battle as the generative AI models improve." So maybe changing my voicemail greeting is like putting one's finger in a very leaky dike. Or perhaps it is a reminder that we need alternative strategies (dare I say a better CAPTCHA? Mirsky has one such proposal in another paper here). So maybe a change of headgear is called for after all.
This is a real concern. My Fidelity account, for example, uses my voice as a login. I remember when that seemed like a good step to improve my account security. Not anymore. I haven't had to phone them in a while, but I should run a test just to find out whether they back that up by sending a code to my phone or email address. Which, of course, only protects me if my email isn't hacked or some bad actor hasn't swapped my phone's SIM card. It's a much bigger problem for you, David, because you are a speaker and your recorded presentations are not hard to find. Yikes!
And then there is this, from a UK mobile operator, an attempt to waste phone scammers' time.