Meta AI Introduces Omnilingual ASR, Advancing Automatic Speech Recognition Across More Than 1,600 Languages
In Brief
Meta AI has launched Omnilingual ASR, a speech recognition system covering more than 1,600 languages, and has released open-source models along with a corpus spanning 350 underserved languages.
Meta AI, the research division of Meta specializing in artificial intelligence and augmented reality, announced the release of the Meta Omnilingual Automatic Speech Recognition (ASR) system.
This suite of models delivers automatic speech recognition for over 1,600 languages, achieving high-quality performance at an unprecedented scale. In addition, Meta AI is open-sourcing Omnilingual wav2vec 2.0, a self-supervised, massively multilingual speech representation model with 7 billion parameters, designed to support a variety of downstream speech tasks.
Alongside these tools, the organization is also releasing the Omnilingual ASR Corpus, a curated collection of transcribed speech from 350 underserved languages, developed in partnership with global collaborators.
Automatic speech recognition has advanced in recent years, achieving near-perfect accuracy for many widely spoken languages. Expanding coverage to less-resourced languages, however, has remained challenging due to the high data and computational demands of existing AI architectures. The Omnilingual ASR system addresses this limitation by scaling the wav2vec 2.0 speech encoder to 7 billion parameters, creating rich multilingual representations from raw, untranscribed speech. Two decoder variants map these representations into character tokens: one using connectionist temporal classification (CTC) and another using a transformer-based approach similar to those in large language models.
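The division of labor between the shared encoder and the two decoders can be sketched in PyTorch. The following is an illustrative sketch, not Meta's released code; all module names and layer sizes are assumptions. A shared encoder turns raw audio into frame-level representations, a CTC head emits one character distribution per frame, and an LLM-style transformer decoder generates characters autoregressively while attending to the same representations.

```python
import torch
import torch.nn as nn

class SharedSpeechEncoder(nn.Module):
    """Stand-in for a wav2vec 2.0-style encoder (sizes are illustrative, not the 7B release)."""
    def __init__(self, dim=256):
        super().__init__()
        self.feature_extractor = nn.Conv1d(1, dim, kernel_size=10, stride=5)
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )

    def forward(self, waveform):                                # (batch, samples)
        frames = self.feature_extractor(waveform.unsqueeze(1))  # (batch, dim, frames)
        return self.transformer(frames.transpose(1, 2))         # (batch, frames, dim)

class CTCHead(nn.Module):
    """CTC variant: an independent character distribution per encoder frame."""
    def __init__(self, dim=256, vocab=256):
        super().__init__()
        self.proj = nn.Linear(dim, vocab)

    def forward(self, enc):
        return self.proj(enc).log_softmax(dim=-1)               # pair with nn.CTCLoss

class LLMStyleDecoder(nn.Module):
    """Transformer variant: autoregressive character generation attending to the encoder."""
    def __init__(self, dim=256, vocab=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.proj = nn.Linear(dim, vocab)

    def forward(self, prev_chars, enc):                         # (batch, steps), encoder output
        return self.proj(self.decoder(self.embed(prev_chars), enc))

encoder = SharedSpeechEncoder()
reps = encoder(torch.randn(2, 16000))                           # one second of 16 kHz audio
ctc_logits = CTCHead()(reps)                                    # (2, frames, vocab)
ar_logits = LLMStyleDecoder()(torch.zeros(2, 8, dtype=torch.long), reps)  # (2, 8, vocab)
```

The CTC path scores every frame in a single pass, while the autoregressive path is what gives the system its LLM-like flexibility.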
This LLM-inspired ASR approach achieves state-of-the-art performance across more than 1,600 languages, with character error rates below 10% for 78% of them, and introduces a more flexible way to add new languages.
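For context, character error rate is the character-level edit distance between the system's output and the reference transcript, divided by the reference length, so a CER below 10% means fewer than one character error in ten. A minimal self-contained implementation:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance over reference length."""
    # Dynamic-programming edit distance over characters.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),    # substitution (or match)
            ))
        prev = curr
    return prev[-1] / max(len(reference), 1)

print(cer("omnilingual", "omnilinguol"))   # 1 substitution / 11 chars ≈ 0.09
```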
Unlike traditional systems that require expert fine-tuning, Omnilingual ASR can incorporate a previously unsupported language using only a few paired audio-text examples, enabling transcription without extensive data, specialized expertise, or high-end compute. While zero-shot results do not yet match fully trained systems, this method provides a scalable way to bring underserved languages into the digital ecosystem.
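The mechanism resembles in-context learning in text LLMs: paired audio-text demonstrations are placed in the decoder's context ahead of the target utterance, and the transcript follows from conditioned generation. The sketch below illustrates only the context assembly; the function, token layout, and encoders are hypothetical stand-ins, not the released interface.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PairedExample:
    audio: bytes    # raw waveform bytes for one utterance
    text: str       # its transcription in the new language

def build_fewshot_context(
    examples: List[PairedExample],
    target_audio: bytes,
    encode_audio: Callable[[bytes], List[int]],  # hypothetical: audio -> encoder token ids
    encode_text: Callable[[str], List[int]],     # hypothetical: text -> character token ids
    sep_id: int = 0,
) -> List[int]:
    """Interleave (audio, text) demonstrations, then append the target audio.
    The decoder is expected to continue the sequence with the target transcript."""
    ctx: List[int] = []
    for ex in examples:
        ctx += encode_audio(ex.audio) + encode_text(ex.text) + [sep_id]
    ctx += encode_audio(target_audio)            # generation starts from here
    return ctx

# Toy stand-ins so the sketch runs end to end:
demo = [PairedExample(audio=b"\x00" * 4, text="hola")]
ctx = build_fewshot_context(
    demo, b"\x01" * 4,
    encode_audio=lambda a: [100 + b for b in a],
    encode_text=lambda t: [ord(c) for c in t],
)
print(len(ctx))   # combined context length fed to the decoder
```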
Meta AI To Advance Speech Recognition With Omnilingual ASR Suite And Corpus
The research division has released a comprehensive suite of models and a dataset designed to advance speech technology for any language. Building on FAIR’s prior research, Omnilingual ASR ships both decoder variants in sizes ranging from lightweight 300M models for low-power devices to 7B models offering high accuracy across diverse applications. The general-purpose wav2vec 2.0 speech foundation model is also available in multiple sizes, enabling a wide range of speech-related tasks beyond ASR. All models are provided under an Apache 2.0 license, and the dataset is available under CC-BY, allowing researchers, developers, and language advocates to adapt and expand speech solutions using FAIR’s open-source fairseq2 framework in the PyTorch ecosystem.
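In practice, usage is expected to look like a standard transcription pipeline: pick a model card suited to the hardware budget and pass audio files through it. The class, card name, and method signature below are assumptions for illustration; the released fairseq2-based package defines the real entry points.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ASRPipeline:
    """Hypothetical wrapper; the real package exposes its own pipeline classes."""
    model_card: str  # e.g. a 300M card for edge devices or a 7B card for accuracy

    def transcribe(self, audio_paths: List[str], lang: str) -> List[str]:
        # A real implementation would run the encoder and chosen decoder here.
        return [f"<transcript of {path} in {lang}>" for path in audio_paths]

pipeline = ASRPipeline(model_card="omniASR-CTC-300M")   # card name is assumed
print(pipeline.transcribe(["greeting.wav"], lang="swh_Latn"))
```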
Omnilingual ASR is trained on one of the largest and most linguistically diverse ASR corpora ever assembled, combining publicly available datasets with community-sourced recordings. To support languages with limited digital presence, Meta AI partnered with local organizations to recruit and compensate native speakers in remote or under-documented regions, creating the Omnilingual ASR Corpus, the largest ultra-low-resource spontaneous ASR dataset to date. Additional collaborations through the Language Technology Partner Program brought together linguists, researchers, and language communities worldwide, including partnerships with Mozilla Foundation’s Common Voice and Lanfrica/NaijaVoices. These efforts provided deep linguistic insight and cultural context, ensuring the technology meets local needs while empowering diverse language communities globally.
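Since the corpus is released under CC-BY, loading it should follow familiar dataset tooling. The repository id, config name, and schema below are assumptions; assuming a Hugging Face hosting, the standard `datasets` API would look like this:

```python
from datasets import load_dataset  # pip install datasets

# Repository id, config, and column names are assumptions for illustration.
corpus = load_dataset("facebook/omnilingual-asr-corpus", "lij_Latn", split="train")
for row in corpus.select(range(3)):
    print(row)  # inspect corpus.features for the actual audio/transcript schema
```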