After Sora in February, OpenAI presents Voice Engine: after the virtual image, voice cloning! The US firm is further broadening the scope of applications of artificial intelligence (AI), and reviving the debate on how to regulate worrying uses.
It's confirmed: OpenAI moves fast, and OpenAI knows how to communicate! Since the end of 2023, the Californian firm seems to have been exploring new areas and accelerating the pace of its announcements, unveiling a new, accomplished and impressive tool every month. However, some tools, such as those dedicated to creating video images or cloning voices, are already within reach, developed by more discreet but no less innovative start-ups, which widens the field of possibilities and... of abuses even further.
Since the beginning of this year, OpenAI, creator of ChatGPT, has been tackling the generation of virtual images from text and the cloning of human voices. Recognizing without hesitation the "potential for misuse of synthetic voices, particularly significant in this election year", the American standard-bearer of artificial intelligence has rushed to announce a series of measures to prevent and detect dishonest or even criminal uses of its tools. How effective will they be? Explanations:
Sora: or how to easily create videos from prompts
In February, the San Francisco-based former nonprofit unveiled its stunning app for generating videos from simple text.
"Sora can generate videos up to one minute long. It is capable of generating complex scenes with multiple characters, specific types of movements, and precise details of the subject and background."
OpenAI
“The model understands not only what the user asked for in the prompt, but also how those things exist in the physical world.”
Voice Engine: A 15-second sample is enough to create a voice clone
Tested “in vitro” since 2022 in the OpenAI labs, Voice Engine was publicly unveiled at the end of March. This human voice cloner, which relies on AI and its deep learning algorithms, was designed from the speech recognition tool Whisper. Well known to experts, Whisper is the product of training on… 680,000 hours of multilingual audio recordings!
“Today we are sharing insights and preliminary results from a small-scale preview of a model called Voice Engine.”
OpenAI
Voice Engine uses text input and a single 15-second audio sample to generate natural speech that closely resembles that of the original speaker. "It is notable," the US company was keen to point out, "that a small model with a single 15-second sample can create emotive and realistic voices."
Already many cloned personalities
That is enough to create vocal clones indistinguishable from the originals, especially those of famous personalities. Examples created with other tools have already made the rounds of social networks. Sometimes the purpose is virtuous, like the cloning carried out by Synthesis, which reproduced Barack Obama's voice to raise awareness of deepfakes. Sometimes it is harmful, like the Joe Biden voice clone used in phone calls to voters during the Democratic primary.
Fake conversations featuring artists like Billie Eilish and Freddie Mercury have also been produced with tools like Looks like AI. Many platforms also offer selections of famous voices from politics, entertainment, sport, etc.
The possibilities now seem endless. And they have multiplied since the advent, last year, of tools such as ElevenLabs, which emphasizes its ethics, Voice.ai, Description, PlayHT, the Chinese StreamVoice, which can clone a voice in real time, and even Vidnoz, free and easy to use thanks to its attractive bank of celebrity voices… Not to mention Apple's iPhone, which since iOS 17 can clone its owner's voice through its Personal Voice feature.
On the proper use of freedoms
In short, everything is becoming possible and the tools are multiplying. They are becoming ever more accessible through the hundreds of tutorials available on the web and through very affordable, even free, pricing, which is stimulating the new cloning business. The world is therefore at a crucial turning point in the use of freedoms. The American Robert Weissman, president of the consumer advocacy group Public Citizen, recently called for urgent action: "Parliamentarians must act quickly to erect protections or we are heading towards political chaos." Nothing less.
An ethical turning point, but what next?
The proliferation of deepfakes has highlighted the risks posed by the intrusion of AI into our daily lives and even, let's risk the term, into our democracies. OpenAI, well aware of this, has announced that Voice Engine is not currently available to the general public. It is reserved for the time being for developers, media, and experts in fake news, AI and ethics, all hand-picked and tasked with providing feedback on the extent of the risks of abuse and with proposing the outlines of a sort of charter for the proper use of cloners.
The San Francisco firm has nevertheless announced that efforts will be deployed quickly to develop countermeasures that detect audio manipulation and to raise public awareness of these issues. It also said it is working closely with several governments, particularly those holding elections that could suffer from malicious uses of these technologies.
First leads
For his part, President Biden, himself the target of a voice clone, signed an executive order last October intended, on the one hand, to punish abuses more severely and, on the other hand, to encourage research and the fight against these abuses. It is not certain that this will be enough, because cloning platforms are multiplying and becoming more specialized at high speed.
Some, like Vijay Balasubramaniyan, co-founder and CEO of cybersecurity firm Pindrop, recommend rapidly and systematically building audio watermarks or digital signatures into cloning tools so that fakes can be easily identified. An interesting lead, but one that we imagine will quickly be circumvented.