This startup wants to deepfake clone your voice and sell it to the highest bidder – Digital Trends

There's a video that pops up periodically on my YouTube feed. It's a conversation between rappers Snoop Dogg and 50 Cent bemoaning the fact that, compared to their generation, all modern hip-hop artists apparently sound the same. "When a person decides to be themselves, they offer something no-one else can be," says 50 Cent. "Yeah, 'cos once you be you who can be you but you?" Snoop responds.

When the video was uploaded in October 2014, that may have broadly been true. But just a few years later it certainly isn't. In a world of audio deepfakes, it's possible to train an A.I. to sound eerily similar to another person by feeding it an audio corpus consisting of hours of their spoken data. The results are unnervingly accurate.

Public figures like the rapper Jay-Z and the psychologist Jordan Peterson have already complained about people misappropriating their voices by creating audio deepfakes and then making them say silly things on the internet. "Wake up," wrote Peterson. "The sanctity of your voice, and your image, is at serious risk." Those are just the mischievous cases. In others, the results can tip over into un-nuanced criminality. In one 2019 incident, criminals used an audio deepfake to impersonate the voice of the CEO of an energy company and persuade an underling over the phone to urgently transfer $243,000 to a bank account.

Veritone, an A.I. company that creates smart tools for labeling media for the entertainment industry, is putting the audio deepfake power back in the hands (or, err, the throats) of those to whom it rightly belongs. This month, the company announced Marvel.ai, what company president Ryan Steelberg described to Digital Trends as a "complete voice-as-a-service solution." For a fee, Veritone will build an A.I. model that sounds just like you (or, more likely, a famous person with an immediately recognizable voice), which can then be licensed out on loan like a high-tech version of Ariel's voice-as-collateral bargain from The Little Mermaid.

"Your voice is just as valuable as any other content or brand attribute that you have," said Steelberg. "[It's on a level with] your name and likeness, your face, your signature, or a song you've written or piece of content you've created."

Certain individuals have, of course, long sold their voices in the form of recording commercials or voiceovers, singing songs, and countless other forms of monetization. But these endeavors all required the person to actually say the words. What Veritone's solution promises to do is to make this individually scalable.

What if, for instance, it was possible for Kevin Hart to license his voice out to a luxury brand that could then use it to create personalized ads featuring the name of the viewer, the location of their nearest brick-and-mortar sales outlet, and the particular product they would be most likely to buy? Rather than spending literally days in the recording booth, A.I. could allow this to be done with little more (on Hart's part, at least) than signing on the dotted line to agree for his voice likeness to be harnessed by said third party. While he was off shooting a movie, or doing a comedy tour, or taking a vacation, or even sleeping, his digital voice could be raking in the cash.

"We can repurpose a lot," Steelberg explained, regarding the training process. "People who are already speaking a ton, if they're producing a podcast or in the media, there's a lot of data out there. We probably have a ton of it already if they happen to be a customer of ours."

Steelberg said that the voice-as-a-service idea occurred to Veritone several years ago. However, at the time he was unconvinced that machine learning models were able to create the hyper-realistic synthetic voices he was looking for. This is especially important when it comes to voices we know intimately, even if we've never actually met the speaker in question. The results could be some kind of audible uncanny valley, with every wrong sound alerting listeners to the fact that they're listening to a fake. But here in 2021 he is convinced that things have advanced to the point where this is now possible. Hence Marvel.ai.

Steelberg speaks in excited buzzwords about the massive potential of the technology, talking up its possible plethora of modalities of execution. Veritone can create models for text-to-speech. It can also build models for speech-to-speech, whereby a voice actor can drive a vocal performance by reading the words with suitable inflection and then having the finished voice overlaid at the end like a Snapchat filter. The company can also fingerprint each voice so it can tell if a piece of apparently real audio that pops up someplace was created using its technology.

"The more you think about it you'll literally come up with 50 more [possible use-cases]," he said. "What we find so fascinating about this new category of A.I. is the extensibility and the variability."

Consider some others. A famous athlete might be a god on the basketball court, but a devil when it comes to reading lines in a script in a way that sounds natural. Using Veritone's technology, their part in video game cutscenes or reading an audio book of their memoir (which they may also not have written) could be performed by a voice actor, whose performance is then digitally tweaked to sound like the athlete. As another possibility, a movie could be translated for other countries with the same actor's voice now reading the lines in French, Mandarin, or any other one of a number of languages, even if the actor doesn't actually speak them.

A big question hanging over all of this, of course, is how members of the public are going to respond to it all. This is the tricky, unpredictable bit. Celebrities today must play a complex role: Both larger-than-life figures worthy of having their face plastered on billboards, and also relatable individuals who have relationship problems, tweet about watching TV in their pajamas, and make silly faces when they eat hot sauce.

What happens, then, when ads appear that feature a celebrity reading lines, but we know that said performer never actually spoke them, and instead had their voice programmatically utilized to bring us a targeted ad? Steelberg said that it is little different from a celebrity handing over control of their social media to a third-party account manager. If we see Taylor Swift tweet, we know that it's quite possibly not Taylor herself tapping out the message, especially if it's an endorsement or piece of promotional content.

But voice is, in a very real way, different, precisely because it's more personal. Especially if it's accompanied by a degree of personalization, which is one of the use-cases that makes the most sense. The truth is that, to quote the screenwriter William Goldman, nobody knows what the public response will be precisely because nobody has done exactly this before.

"It's going to run the spectrum, right?" Steelberg said. "[Some] people are going to say, 'I'm going to use this tool a little bit to augment my day to help me save time.' Others are going to say, full-blown, 'I want my voice everywhere to extend my brand, and I'm going to license it out.'"

His best guess is that acceptance will be on a case-by-case basis. "You need to be in tune with the reaction of your audience, and if you see things are working or not working," he said. "They may love it. They may say, 'You know what? I love the fact that you're putting out 10 times more content or more personal content to me, even though I know you used synthetic content to augment it. Thank you. Thank you.'"

As for the future? "We want to work with all the major talent agencies," Steelberg said. "We think anybody who is in the business of making money around a scarce brand should be thinking about their voice strategy."

And don't expect it to remain purely about audio, either. "We've always been fascinated by the potential of using synthetic content to either extend, augment, or potentially completely replace some of the legacy forms of content production," he continued. "Be that in an audio sense or, ultimately in the future, a video sense."

That's right: Once it has cornered the market in the world of audio deepfakes, Veritone plans to go one step further and enter the world of fully realized virtual avatars that both sound and look indistinguishable from their source.

Suddenly those personalized ads from Minority Report sound a whole lot less like science fiction.
