May 30, 2023

SoundStorm: Google Unveils Terrifying AI Tool Capable of Real-Time Voice Replication

Google has introduced SoundStorm, a model for efficient, non-autoregressive audio generation. Because it can synthesize dialogues in different voices, SoundStorm opens up applications such as generating audio content from written text and creating realistic podcasts.

Unlike its autoregressive predecessor AudioLM, SoundStorm generates audio in segments of up to 30 seconds using bidirectional attention and confidence-based parallel decoding. This lets the model produce high-quality audio while significantly reducing generation time. On Google’s TPU-v4 hardware, SoundStorm can generate 30 seconds of audio in just 0.5 seconds.
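To give a feel for what confidence-based parallel decoding means, here is a minimal toy sketch in Python. It is not Google's implementation: the function names, the random stand-in for the network (toy_predict), and the cosine schedule are all assumptions made for illustration, and the sketch ignores details such as the codec's multiple quantizer levels. The idea it shows is that every masked position is guessed at once, only the most confident guesses are kept, and the rest are re-masked for the next pass.

```python
import math
import random

MASK = -1  # sentinel for a position that has not been decoded yet


def toy_predict(tokens, vocab_size, rng):
    """Hypothetical stand-in for the network: returns a (token, confidence)
    guess for every masked position. A real model would attend to the whole
    sequence bidirectionally; here the guesses are simply random."""
    return {
        i: (rng.randrange(vocab_size), rng.random())
        for i, t in enumerate(tokens)
        if t == MASK
    }


def parallel_decode(length, vocab_size, steps, seed=0):
    """Fill `length` masked positions in at most `steps` parallel passes."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        guesses = toy_predict(tokens, vocab_size, rng)
        if not guesses:
            break
        # Assumed cosine schedule: how many positions stay masked after this pass.
        still_masked = math.floor(length * math.cos(math.pi / 2 * step / steps))
        keep = max(1, len(guesses) - still_masked)
        # Commit the most confident guesses; the rest stay masked for the next pass.
        ranked = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:keep]:
            tokens[pos] = tok
    return tokens


if __name__ == "__main__":
    decoded = parallel_decode(length=16, vocab_size=1024, steps=4)
    print(len(decoded), "tokens decoded in 4 passes")
```

With these toy settings, 16 positions are filled in 4 passes rather than 16 sequential steps, which is the source of the speed-up over token-by-token generation.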

For dialogue synthesis, SoundStorm was trained on a dataset of 100,000 hours of dialogue. The model matches the audio quality of AudioLM while producing more consistent voices and acoustic conditions. It is also two orders of magnitude faster than its predecessor, which demonstrates its potential for scalable audio generation.

One of the key capabilities of SoundStorm is its ability to synthesize natural dialogues by leveraging the text-to-semantic modeling stage of SPEAR-TTS. By providing transcripts with speaker turns and short voice prompts, users can control the spoken content and the voices of the speakers. During testing, SoundStorm demonstrated the ability to synthesize 30-second dialogue segments in just 2 seconds on a single TPU-v4, showcasing its efficiency and versatility.
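The two timing figures quoted in this article can be turned into a rough "faster than real time" ratio. The short Python sketch below only does that arithmetic; the function name real_time_factor is an illustrative choice, and the inputs are the numbers reported above, not new measurements.

```python
def real_time_factor(audio_seconds, compute_seconds):
    """Seconds of audio produced per second of compute."""
    return audio_seconds / compute_seconds


# Figures quoted in the article, both for TPU-v4 hardware.
audio_generation = real_time_factor(30, 0.5)  # 30 s of audio in 0.5 s
dialogue_synthesis = real_time_factor(30, 2)  # 30 s dialogue segment in 2 s

print(f"Audio generation: {audio_generation:.0f}x real time")
print(f"Dialogue synthesis: {dialogue_synthesis:.0f}x real time")
```

On these figures, audio generation runs at about 60 times real time and dialogue synthesis at about 15 times real time.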

Compared with standard baselines, the audio generated by SoundStorm matches the quality of AudioLM while showing better consistency of voice and acoustic conditions. Notably, when prompted with a short speech sample, the model preserves the speaker’s voice with high fidelity, which greatly improves its ability to generate lifelike dialogue.

While SoundStorm’s capabilities are impressive, it is critical to recognize and address the ethical concerns they raise. The model’s training data may introduce biases related to accents and voice characteristics. The ability to imitate voices could be abused for impersonation or to circumvent biometric identification. Google stresses the importance of safeguards against such misuse and of ensuring that generated audio remains detectable by dedicated classifiers.

Google’s ethical AI principles guide its continuing efforts to address potential risks and limitations. The company recognizes the need for a thorough analysis of the training data and its effect on model outputs. It also plans to investigate additional approaches for detecting synthesized speech, such as audio watermarking, to support ethical use of the technology.

  • SoundStorm is a big step forward in AI-powered audio generation, efficiently producing high-quality audio in the form of neural audio codec tokens. Google expects that SoundStorm’s lower memory and compute requirements will make audio generation research accessible to a wider community. Google says it remains committed to responsible AI practices and to the safe and responsible use of SoundStorm and comparable breakthroughs as the technology evolves.
  • VALL-E, Microsoft’s text-to-speech (TTS) model, is a major step forward in how these systems generate voices. VALL-E is a transformer-based TTS model that can generate speech in a target voice after hearing only a three-second sample of it. This is a big advance over earlier models, which required significantly more training to reproduce a new voice.

Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract an audience of over a million users every month. He has 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to succeed in the ever-changing landscape of the internet.

Damir Yalalov