May 30, 2023

SoundStorm: Google Unveils Terrifying AI Tool Capable of Real-Time Voice Replication

Google has introduced its latest breakthrough in artificial intelligence technology with SoundStorm, a cutting-edge model for efficient and non-autoregressive audio generation. With the ability to synthesize dialogues with different voices, SoundStorm opens up new possibilities for applications such as generating audio content from written text and creating realistic podcasts.

Unlike its predecessor AudioLM, which generates audio tokens one at a time, SoundStorm is non-autoregressive: it fills in the tokens of a chunk of up to 30 seconds in parallel. By combining bidirectional attention with confidence-based parallel decoding, the model produces high-quality audio while significantly reducing generation time. On Google’s TPU-v4 hardware, SoundStorm can generate 30 seconds of audio in just 0.5 seconds, 60 times faster than real time.
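To make the idea of confidence-based parallel decoding more concrete, here is a minimal sketch in Python. It is not Google's implementation: the `toy_model` function is a hypothetical stand-in that returns random predictions, and the cosine schedule is an assumption borrowed from masked-token decoding schemes in general. The sketch only illustrates the loop: predict every undecided token at once, commit the most confident predictions, and repeat for a small, fixed number of steps.

```python
import math
import random

MASK = -1  # placeholder for a token that has not been decided yet


def toy_model(tokens, vocab_size, rng):
    """Hypothetical stand-in for the real network (an assumption).

    Returns one (predicted_token, confidence) pair per position.
    A real model would condition on the already decided tokens.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(length, vocab_size, num_steps, seed=0):
    """Fill `length` token slots in at most `num_steps` parallel passes."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(1, num_steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        predictions = toy_model(tokens, vocab_size, rng)
        # Cosine schedule (assumed): how many slots may stay masked
        # once this step is done. The last step leaves none.
        if step == num_steps:
            keep_masked = 0
        else:
            ratio = math.cos(math.pi / 2 * step / num_steps)
            keep_masked = math.floor(length * ratio)
        num_to_fix = max(1, len(masked) - keep_masked)
        # Commit the most confident predictions among the masked slots.
        by_confidence = sorted(
            masked, key=lambda i: predictions[i][1], reverse=True
        )
        for i in by_confidence[:num_to_fix]:
            tokens[i] = predictions[i][0]
    return tokens


if __name__ == "__main__":
    # 32 tokens are decided in 8 passes instead of 32 sequential steps.
    print(parallel_decode(length=32, vocab_size=1024, num_steps=8))
```

The point of the design is that the number of model passes depends on `num_steps`, not on the length of the sequence, which is where the speedup over token-by-token generation comes from.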

SoundStorm was trained on a dataset of 100,000 hours of dialogue, giving it broad exposure to spoken language patterns. The model matches the audio quality of AudioLM while keeping voice and acoustic conditions more consistent. It is also two orders of magnitude faster than its predecessor, demonstrating its potential for scalable audio generation.

One of the key capabilities of SoundStorm is its ability to synthesize natural dialogues by leveraging the text-to-semantic modeling stage of SPEAR-TTS. By providing transcripts with speaker turns and short voice prompts, users can control the spoken content and the voices of the speakers. During testing, SoundStorm demonstrated the ability to synthesize 30-second dialogue segments in just 2 seconds on a single TPU-v4, showcasing its efficiency and versatility.
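The speed figures quoted in this article can be compared as real-time factors, meaning seconds of audio produced per second of compute. The short sketch below does that arithmetic; the function name `real_time_factor` is ours, chosen for illustration.

```python
def real_time_factor(audio_seconds, compute_seconds):
    """Seconds of audio produced per second of compute."""
    return audio_seconds / compute_seconds


# Figures as reported in the article, both on TPU-v4.
print(real_time_factor(30, 0.5))  # audio generation alone -> 60.0
print(real_time_factor(30, 2))    # full dialogue synthesis -> 15.0
```

On these numbers, both settings run well ahead of real time, with dialogue synthesis about four times slower than audio generation alone.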

Compared with standard baselines, the audio generated by SoundStorm matches the quality of AudioLM and shows superior consistency and acoustic integrity. Notably, when prompted with a short speech sample, the model preserves the speaker’s voice closely, which greatly improves its capacity to generate lifelike dialogue.

While SoundStorm’s capabilities are impressive, it is critical to recognize and address the ethical concerns they raise. The model’s training data may introduce biases relating to accents and voice characteristics. The capacity to imitate voices could be abused for impersonation or to circumvent biometric identification. Google underlines the importance of putting safeguards in place to prevent such abuse and of ensuring that generated audio remains detectable through dedicated classifiers.

Google’s ethical AI principles guide its continuing efforts to address potential risks and limitations. The company recognizes the need for a thorough analysis of the training data and its implications for the model’s outputs. It also plans to investigate additional approaches for detecting synthesized speech, such as audio watermarking, to support ethical use of this technology.

  • SoundStorm is a big step forward in AI-powered audio production, efficiently generating high-quality audio from the representations of a neural audio codec. Google expects that SoundStorm’s lower memory and processing needs will make audio generation research accessible to a wider community. Google says it remains committed to responsible AI practices and to the safe use of SoundStorm and comparable breakthroughs as the technology evolves.
  • VALL-E, Microsoft’s latest text-to-speech (TTS) model, is a major step forward in how these systems generate voice. VALL-E is a transformer-based TTS model that can generate speech in any voice after hearing only a three-second sample of that voice. This is a big advance over earlier models, which required a significantly longer training period to reproduce a new voice.

About The Author

Damir Yalalov is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract an audience of over a million users every month. He is an expert with 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet.
