Meta Unveils Voicebox, Text-to-Speech Generative AI Tool

by Agne Cimerman

Published: June 19, 2023 at 12:00 pm | Updated: June 19, 2023 at 11:32 am

by William Savage

Edited and fact-checked: June 19, 2023 at 12:00 pm

In Brief

Voicebox, Meta’s latest innovation, is a revolutionary text-to-speech generative AI tool that transforms written text into realistic speech.

Like generative models for text and images such as ChatGPT and DALL-E, Voicebox can handle a range of tasks: content editing, sampling, style conversion, noise removal, text-to-speech synthesis, and cross-lingual style transfer.

Voicebox isn’t publicly available yet.

Voicebox is Meta’s breakthrough in generative speech AI, a model that transforms text into realistic and expressive speech. Like ChatGPT or DALL-E, it relies on in-context learning, which lets it perform speech generation tasks such as content editing, sampling, and style conversion without being specifically trained for them.


It sets itself apart from other text-to-speech models by handling tasks such as noise removal, text-to-speech synthesis, and cross-lingual style transfer. Meta also says Voicebox runs 20 times faster than current models.

Voicebox was trained on a dataset of more than 50,000 hours of unfiltered audio. Meta trained the model with its “Flow Matching” technique, an alternative to the diffusion-based learning methods used by other generative models.

Meta’s training dataset includes recorded speech and transcripts from public-domain audiobooks in multiple languages, such as English, French, Spanish, German, Polish, and Portuguese.

According to Mark Zuckerberg, Voicebox is “the first ever generative AI speech model that can do tasks it wasn’t specifically trained on.”

Source: Mark Zuckerberg

In the future, Voicebox and similar AI models could provide natural-sounding voices for virtual assistants and non-player characters in the metaverse. They could also let visually impaired people hear written messages read in familiar voices and give creators easy tools for editing audio tracks in videos.

Voicebox and the Dangers of Deepfakes

However, Voicebox might pose ethical and social challenges, especially in the context of deepfakes. Deepfakes are AI-generated synthetic media that imitate a person’s likeness or voice, often with malicious intent. Voicebox could be used to create convincing deepfakes that impersonate someone’s voice or make them appear to say things they never said. This could have serious implications for privacy, security, and trust.

Microsoft’s president Brad Smith raised concerns last month about the harm caused by deepfakes. He emphasized the need for mechanisms to differentiate between genuine and AI-generated material, particularly in cases of malicious intent. He called for accountability and safety measures to maintain human control over critical infrastructure governed by AI systems. He also proposed a system in which developers monitor usage and provide transparency so that manipulated videos can be identified, similar to a know-your-customer (KYC) approach.

Meta says it is aware of the potential harm Voicebox could cause and is working on an effective way to distinguish authentic speech from audio generated by Voicebox. The model is still in development and is not currently available to the public.


