News Report Technology
January 29, 2026

Qwen Open-Sources Advanced ASR And Forced Alignment Models With Multi-Language Capabilities

In Brief

Alibaba Cloud has open-sourced its Qwen3-ASR and Qwen3-ForcedAligner AI models, delivering state-of-the-art speech recognition and forced alignment performance across multiple languages and challenging acoustic conditions.

Qwen Open-Sources Advanced ASR And Forced Alignment Models With Multi-Language Capabilities

Alibaba Cloud announced that it has made its Qwen3-ASR and Qwen3-ForcedAligner AI models open-source, offering advanced tools for speech recognition and forced alignment. 

The Qwen3-ASR family includes two all-in-one models, Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and transcription across 52 languages and accents, leveraging large-scale speech data and the Qwen3-Omni foundation model. 

Internal testing indicates that the 1.7B model delivers state-of-the-art accuracy among open-source ASR systems, while the 0.6B version balances performance and efficiency, capable of transcribing 2,000 seconds of speech in one second with high concurrency. 

The Qwen3-ForcedAligner-0.6B model uses a non-autoregressive LLM approach to align text and speech in 11 languages, outperforming leading force-alignment solutions in both speed and accuracy. 

Alibaba Cloud has also released a comprehensive inference framework under the Apache 2.0 license, supporting streaming, batch processing, timestamp prediction, and fine-tuning, aimed at accelerating research and practical applications in audio understanding.

Qwen3-ASR And Qwen3-ForcedAligner Models Demonstrate Leading Accuracy And Efficiency

Alibaba Cloud has released performance results for its Qwen3-ASR and Qwen3-ForcedAligner models, demonstrating leading accuracy and efficiency across diverse speech recognition tasks. 

The Qwen3-ASR-1.7B model achieves state-of-the-art results among open-source systems, outperforming commercial APIs and other open-source models in English, multilingual, and Chinese dialect recognition, including Cantonese and 22 regional variants. 

It maintains reliable accuracy in challenging acoustic conditions, such as low signal-to-noise environments, child or elderly speech, and even singing voice transcription, achieving average word error rates of 13.91% in Chinese and 14.60% in English with background music.

The smaller Qwen3-ASR-0.6B balances accuracy and efficiency, delivering high throughput and low latency under high concurrency, capable of transcribing up to five hours of speech in online asynchronous mode at a concurrency of 128. 

Meanwhile, the Qwen3-ForcedAligner-0.6B outperforms leading end-to-end forced alignment models including Nemo-Forced-Aligner, WhisperX, and Monotonic-Aligner, offering superior language coverage, timestamp accuracy, and support for varied speech and audio lengths.

Disclaimer

In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.

About The Author

Alisa, a dedicated journalist at the MPost, specializes in crypto, AI, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.

More articles
Alisa Davidson
Alisa Davidson

Alisa, a dedicated journalist at the MPost, specializes in crypto, AI, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.

Hot Stories
Join Our Newsletter.
Latest News

How Minmax Is Building The Professional AI Trading Terminal Prediction Markets Still Lack In 2026

Minmax processed roughly $100,000 in volume in the first three days of June, most of it through ...

Know More

The Calm Before The Solana Storm: What Charts, Whales, And On-Chain Signals Are Saying Now

Solana has demonstrated strong performance, driven by increasing adoption, institutional interest, and key partnerships, while facing potential ...

Know More
Read More
Read more
Gate Update: From Commodity Futures To World Cup Predictions — Gate Reports Growth Across All Fronts
Digest News Report Technology
Gate Update: From Commodity Futures To World Cup Predictions — Gate Reports Growth Across All Fronts
June 12, 2026
Glassnode: Bitcoin Options Market Shows Initial Selloff Shock Has Been Absorbed
Markets News Report Technology
Glassnode: Bitcoin Options Market Shows Initial Selloff Shock Has Been Absorbed
June 12, 2026
The Sponsorship Is The Deployment: Sport And The New Logic Of AI Integration
Opinion Lifestyle Technology
The Sponsorship Is The Deployment: Sport And The New Logic Of AI Integration
June 12, 2026
Morgan Stanley, Visa & Flutterwave: Crypto Partnerships From June’s 2nd Week
Business News Report Technology
Morgan Stanley, Visa & Flutterwave: Crypto Partnerships From June’s 2nd Week
June 12, 2026