The Terrifying A.I. Scam That Uses Your Loved One’s Voice – The New Yorker

On a recent night, a woman named Robin was asleep next to her husband, Steve, in their Brooklyn home, when her phone buzzed on the bedside table. Robin is in her mid-thirties with long, dirty-blond hair. She works as an interior designer, specializing in luxury homes. The couple had gone out to a natural-wine bar in Cobble Hill that evening, and had come home a few hours earlier and gone to bed. Their two young children were asleep in bedrooms down the hall. "I'm always, like, kind of one ear awake," Robin told me, recently. When her phone rang, she opened her eyes and looked at the caller I.D. It was her mother-in-law, Mona, who never called after midnight. "I'm, like, maybe it's a butt-dial," Robin said. "So I ignore it, and I try to roll over and go back to bed. But then I see it pop up again."

She picked up the phone, and, on the other end, she heard Mona's voice wailing and repeating the words "I can't do it, I can't do it." "I thought she was trying to tell me that some horrible tragic thing had happened," Robin told me. Mona and her husband, Bob, are in their seventies. She's a retired party planner, and he's a dentist. They spend the warm months in Bethesda, Maryland, and winters in Boca Raton, where they play pickleball and canasta. Robin's first thought was that there had been an accident. Robin's parents also winter in Florida, and she pictured the four of them in a car wreck. "Your brain does weird things in the middle of the night," she said. Robin then heard what sounded like Bob's voice on the phone. (The family members requested that their names be changed to protect their privacy.) "Mona, pass me the phone," Bob's voice said, then, "Get Steve. Get Steve." Robin took this, that they didn't want to tell her while she was alone, as another sign of their seriousness. She shook Steve awake. "I think it's your mom," she told him. "I think she's telling me something terrible happened."

Steve, who has close-cropped hair and an athletic build, works in law enforcement. When he opened his eyes, he found Robin in a state of panic. "She was screaming," he recalled. "I thought her whole family was dead." When he took the phone, he heard a relaxed male voice, possibly Southern, on the other end of the line. "You're not gonna call the police," the man said. "You're not gonna tell anybody. I've got a gun to your mom's head, and I'm gonna blow her brains out if you don't do exactly what I say."

Steve used his own phone to call a colleague with experience in hostage negotiations. The colleague was muted, so that he could hear the call but wouldn't be heard. "You hear this???" Steve texted him. "What should I do?" The colleague wrote back, "Taking notes. Keep talking." The idea, Steve said, was to continue the conversation, delaying violence and trying to learn any useful information.

"I want to hear her voice," Steve said to the man on the phone.

The man refused. "If you ask me that again, I'm gonna kill her," he said. "Are you fucking crazy?"

"O.K.," Steve said. "What do you want?"

The man demanded money for travel; he wanted five hundred dollars, sent through Venmo. "It was such an insanely small amount of money for a human being," Steve recalled. "But also: I'm obviously gonna pay this." Robin, listening in, reasoned that someone had broken into Steve's parents' home to hold them up for a little cash. On the phone, the man gave Steve a Venmo account to send the money to. It didn't work, so he tried a few more, and eventually found one that did. The app asked what the transaction was for.

"Put in a pizza emoji," the man said.

After Steve sent the five hundred dollars, the man patched in a female voice (a girlfriend, it seemed) who said that the money had come through, but that it wasn't enough. Steve asked if his mother would be released, and the man got upset that he was bringing this up with the woman listening. "Whoa, whoa, whoa," he said. "Baby, I'll call you later." The implication, to Steve, was that the woman didn't know about the hostage situation. "That made it even more real," Steve told me. The man then asked for an additional two hundred and fifty dollars to get a ticket for his girlfriend. "I've gotta get my baby mama down here to me," he said. Steve sent the additional sum, and, when it processed, the man hung up.

By this time, about twenty-five minutes had elapsed. Robin cried and Steve spoke to his colleague. "You guys did great," the colleague said. He told them to call Bob, since Mona's phone was clearly compromised, to make sure that he and Mona were now safe. After a few tries, Bob picked up the phone and handed it to Mona. "Are you at home?" Steve and Robin asked her. "Are you O.K.?"

Mona sounded fine, but she was unsure of what they were talking about. "Yeah, I'm in bed," she replied. "Why?"

Artificial intelligence is revolutionizing seemingly every aspect of our lives: medical diagnosis, weather forecasting, space exploration, and even mundane tasks like writing e-mails and searching the Internet. But with increased efficiencies and computational accuracy has come a Pandora's box of trouble. Deepfake video content is proliferating across the Internet. The month after Russia invaded Ukraine, a video surfaced on social media in which Ukraine's President, Volodymyr Zelensky, appeared to tell his troops to surrender. (He had not done so.) In early February of this year, Hong Kong police announced that a finance worker had been tricked into paying out twenty-five million dollars after taking part in a video conference with who he thought were members of his firm's senior staff. (They were not.) Thanks to large language models like ChatGPT, phishing e-mails have grown increasingly sophisticated, too. Steve and Robin, meanwhile, fell victim to another new scam, which uses A.I. to replicate a loved one's voice. "We've now passed through the uncanny valley," Hany Farid, who studies generative A.I. and manipulated media at the University of California, Berkeley, told me. "I can now clone the voice of just about anybody and get them to say just about anything. And what you think would happen is exactly what's happening."

Robots aping human voices are not new, of course. In 1984, an Apple computer became one of the first that could read a text file in a tinny robotic voice of its own. "Hello, I'm Macintosh," a squat machine announced to a live audience, at an unveiling with Steve Jobs. "It sure is great to get out of that bag." The computer took potshots at Apple's main competitor at the time, saying, "I'd like to share with you a maxim I thought of the first time I met an I.B.M. mainframe: never trust a computer you can't lift." In 2011, Apple released Siri; inspired by Star Trek's talking computers, the program could interpret precise commands ("Play Steely Dan," say, or "Call Mom") and respond with a limited vocabulary. Three years later, Amazon released Alexa. Synthesized voices were cohabiting with us.

Still, until a few years ago, advances in synthetic voices had plateaued. They weren't entirely convincing. "If I'm trying to create a better version of Siri or G.P.S., what I care about is naturalness," Farid explained. "Does this sound like a human being and not like this creepy half-human, half-robot thing?" Replicating a specific voice is even harder. "Not only do I have to sound human," Farid went on. "I have to sound like you." In recent years, though, the problem began to benefit from more money, more data (importantly, troves of voice recordings online), and breakthroughs in the underlying software used for generating speech. In 2019, this bore fruit: a Toronto-based A.I. company called Dessa cloned the podcaster Joe Rogan's voice. (Rogan responded with "awe" and acceptance on Instagram, at the time, adding, "The future is gonna be really fucking weird, kids.") But Dessa needed a lot of money and hundreds of hours of Rogan's very available voice to make their product. Their success was a one-off.

In 2022, though, a New York-based company called ElevenLabs unveiled a service that produced impressive clones of virtually any voice quickly; breathing sounds had been incorporated, and more than two dozen languages could be cloned. ElevenLabs's technology is now widely available. "You can just navigate to an app, pay five dollars a month, feed it forty-five seconds of someone's voice, and then clone that voice," Farid told me. The company is now valued at more than a billion dollars, and the rest of Big Tech is chasing closely behind. The designers of Microsoft's Vall-E cloning program, which débuted last year, used sixty thousand hours of English-language audiobook narration from more than seven thousand speakers. Vall-E, which is not available to the public, can reportedly replicate the voice and "acoustic environment" of a speaker with just a three-second sample.

Voice-cloning technology has undoubtedly improved some lives. The Voice Keeper is among a handful of companies that are now "banking" the voices of those suffering from voice-depriving diseases like A.L.S., Parkinson's, and throat cancer, so that, later, they can continue speaking with their own voice through text-to-speech software. A South Korean company recently launched what it describes as the first "AI memorial service," which allows people to "live in the cloud" after their deaths and "speak to future generations." The company suggests that this can "alleviate the pain of the death of your loved ones." The technology has other legal, if less altruistic, applications. Celebrities can use voice-cloning programs to "loan" their voices to record advertisements and other content: the College Football Hall of Famer Keith Byars, for example, recently let a chicken chain in Ohio use a clone of his voice to take orders. The film industry has also benefitted. Actors in films can now "speak" other languages (English, say, when a foreign movie is released in the U.S.). "That means no more subtitles, and no more dubbing," Farid said. "Everybody can speak whatever language you want." Multiple publications, including The New Yorker, use ElevenLabs to offer audio narrations of stories. Last year, New York's mayor, Eric Adams, sent out A.I.-enabled robocalls in Mandarin and Yiddish, languages he does not speak. (Privacy advocates called this a "creepy vanity project.")

But, more often, the technology seems to be used for nefarious purposes, like fraud. This has become easier now that TikTok, YouTube, and Instagram store endless videos of regular people talking. "It's simple," Farid explained. "You take thirty or sixty seconds of a kid's voice and log in to ElevenLabs, and pretty soon Grandma's getting a call in Grandson's voice saying, 'Grandma, I'm in trouble, I've been in an accident.'" A financial request is almost always the end game. Farid went on, "And here's the thing: the bad guy can fail ninety-nine per cent of the time, and they will still become very, very rich. It's a numbers game." The prevalence of these illegal efforts is difficult to measure, but, anecdotally, they've been on the rise for a few years. In 2020, a corporate attorney in Philadelphia took a call from what he thought was his son, who said he had been injured in a car wreck involving a pregnant woman and needed nine thousand dollars to post bail. (He found out it was a scam when his daughter-in-law called his son's office, where he was safely at work.) In January, voters in New Hampshire received a robocall from Joe Biden's voice telling them not to vote in the primary. (The man who admitted to generating the call said that he had used ElevenLabs software.) "I didn't think about it at the time that it wasn't his real voice," an elderly Democrat in New Hampshire told the Associated Press. "That's how convincing it was."
