Google AI Announced the First-ever Text-to-Music Generator AudioLM

by Damir Yalalov

Published: October 21, 2022 at 12:09 pm Updated: October 21, 2022 at 12:10 pm

In Brief

AudioLM can produce music just by listening to sounds

AudioLM is trained to continue human speech and piano music

With GPT-3 and similar models, generative AI has a good chance of moving forward. Image models have also introduced inpainting and outpainting, in which AI skillfully completes an image while keeping its theme and style. What about music?

Here it comes again. Since all of this is based on AI language models that retain meaning, it was only a matter of time before the technology was applied to music. Now that time has come.

Google AI announced first-ever text-to-music generator AudioLM

According to recent Google research, a new audio generation framework called AudioLM can learn to generate realistic speech and piano music simply by listening to audio. Thanks to its long-term consistency and high fidelity, AudioLM surpasses earlier systems and advances audio generation, with applications in speech synthesis and computer-assisted music.

The researchers say they have also developed a system to recognize synthetic sounds produced by AudioLM, using the same AI concepts that underpinned the creation of their previous models.

AudioLM from Google AI can extend an audio passage while keeping its “intent.” So far, it has been trained to continue human speech and piano music from a short sample of input audio.

The criterion for speech was straightforward: listeners were asked to assess whether the continuation sounded like human speech. For music, the continuation of a supplied input passage was found to be far superior in quality to the output of current music generators that start from scratch, such as Jukebox. Given a prompt at the input, the AI continues the music considerably better.

Human raters listened to audio samples to confirm the results. They judged whether they were hearing a real recorded continuation of a human voice or an artificial one produced by AudioLM. Their data indicate a 51.2% success rate, barely above the 50% expected from random guessing. As a result, it will be challenging for the average listener to distinguish speech produced by AudioLM from actual human speech.
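To see how close 51.2% is to pure guessing, here is a minimal sketch of a significance check against the 50% chance level. The article does not say how many ratings were collected, so `N_RATINGS` below is a hypothetical figure chosen only for illustration, and the normal approximation is a simplification rather than the method the researchers used.

```python
import math

def z_test_vs_chance(success_rate, n_ratings, chance=0.5):
    """Two-sided z-test (normal approximation) of a proportion against chance."""
    standard_error = math.sqrt(chance * (1 - chance) / n_ratings)
    z = (success_rate - chance) / standard_error
    # Two-sided p-value for a standard normal variable
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical number of ratings: the article does not report it.
N_RATINGS = 1000

z, p_value = z_test_vs_chance(0.512, N_RATINGS)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

Under that assumed sample size, the p-value is far above the usual 0.05 threshold, which is consistent with listeners doing no better than a coin flip.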

Does text-to-music technology alter the music business?

Another AI music service, Mubert, recently announced a text-to-music generator based on the Mubert API. Mubert creates a different set of sounds for each request you send, and the likelihood of a repeat is very slim. Music is created when a request is made; it is not pulled from a database of finished tunes. A common question is how truly generative this music is.

The sounds are selected rather than synthesized. Both the input prompt and the Mubert API tags are encoded into latent-space vectors by a transformer neural network. The tag vector closest to each prompt is then chosen, and the corresponding tags are sent to the Mubert API to create music. No neural network was used to construct any of the sounds (separate loops for bass, leads, etc.); all of them were produced by musicians and sound designers.
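The tag-matching step can be sketched roughly as a nearest-neighbor search. This is an illustration under stated assumptions, not Mubert's implementation: the three-dimensional vectors and the tag names are made up, a real system would obtain its vectors from a transformer encoder, and cosine similarity is only one plausible choice of distance measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest_tag(prompt_vector, tag_vectors):
    """Return the tag whose vector is most similar to the prompt vector."""
    return max(
        tag_vectors,
        key=lambda tag: cosine_similarity(prompt_vector, tag_vectors[tag]),
    )

# Hypothetical tag embeddings; a real encoder would produce these.
TAG_VECTORS = {
    "ambient": [0.9, 0.1, 0.0],
    "techno": [0.1, 0.9, 0.2],
    "piano": [0.2, 0.0, 0.9],
}

# Hypothetical embedding of a text prompt such as "calm evening by the sea".
prompt_vector = [0.8, 0.2, 0.1]

best_tag = closest_tag(prompt_vector, TAG_VECTORS)
print(best_tag)
```

In this toy setup the prompt lands nearest to the "ambient" tag, which would then be the tag passed on to the music service.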

Mubert’s next significant step is to take items from the real world, such as photos, movies, scenarios, and presentations, and create the music of the world around you.

This is the first stage in building a more sophisticated and precise generative algorithm, which will take time and money.

However, text-to-music technology is already available, so you can generate albums in bulk by swapping the “input prompt” for a script that writes random prompts. It seems artists are no longer required.

About The Author

Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract a massive audience of over a million users every month. He appears to be an expert with 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet.
