Meta has developed AI-powered speech technology that can identify more than 4,000 spoken languages.
The project’s aim is to preserve languages.
The company is using the Bible and other religious texts to train its Massively Multilingual Speech models.
Tech giant Meta has announced a new set of AI-powered speech models. According to the announcement, they can identify more than 4,000 spoken languages. The initiative aims to help preserve languages, and the company is notably using the Bible and other religious texts as training data.
“Collecting audio data for thousands of languages was our first challenge because the largest existing speech datasets cover 100 languages at most. To overcome this, we turned to religious texts, such as the Bible, that have been translated into many different languages and whose translations have been widely studied for text-based language translation research,” writes Meta in a blog post.
According to the company, the source data comes from the Bible. In addition, the Meta AI team obtained audio recordings and the corresponding text from FaithComesByHearing.com, GoTo.Bible, and Bible.com.
Meta says the project covers recordings in more than 6,255 languages and dialects, including Bible stories, evangelistic messages, scripture readings, and songs. It also states that its models work equally well for female voices, even though the readings are mostly by male speakers.
Notably, the New Testament readings provide approximately 32 hours of audio per language. Overall, the dataset covers more than 1,100 languages. According to Christian ethicists who advised Meta AI on this project, most Christians do not consider the New Testament and its translations too sacred to be used in machine learning. The same applies to other religious texts.
“While the content of the audio recordings is religious, our analysis shows that this doesn’t bias the model to produce more religious language,” states the blog post.
In other words, Meta says the religious training data does not bias the systems toward a particular point of view, nor does it lead them to produce religious-style text.