February 16, 2026

Qwen Rolls Out New Vision‑Language Model To Advance Coding, Reasoning, And Multimodal AI Performance

In Brief

The Qwen team has launched the open‑weight Qwen3.5‑397B‑A17B model, introducing major advances in multimodal performance, reinforcement learning, and training efficiency as part of a broader push toward more capable, general‑purpose AI agents.


Alibaba Cloud’s Qwen team has unveiled Qwen3.5‑397B‑A17B, the first open‑weight model in its new Qwen3.5 series.

Positioned as a native vision‑language system, the model delivers strong performance across reasoning, coding, agent tasks, and multimodal understanding, reflecting a significant advance in the company’s large‑scale AI development efforts. 

The model is built on a hybrid architecture that combines linear attention through Gated Delta Networks with a sparse mixture‑of‑experts design, enabling high efficiency during inference. Although the full system contains 397 billion parameters, only 17 billion are activated for each forward pass, allowing it to maintain high capability while reducing computational cost. The release also expands language and dialect coverage from 119 to 201, broadening accessibility for users and developers worldwide.
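
The announcement does not spell out the routing mechanics, but the 17‑of‑397‑billion figure follows directly from how sparse mixture‑of‑experts layers behave: a learned router selects a handful of experts per token, so most parameters stay idle on any given forward pass. The minimal PyTorch sketch below illustrates top‑k expert routing; the expert count, dimensions, and top_k value are illustrative assumptions, not Qwen3.5's actual configuration.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Toy sparse mixture-of-experts layer: only the top-k experts chosen
    by the router run for each token, so the active parameter count is a
    small fraction of the total (as in 17B active out of 397B)."""

    def __init__(self, d_model=256, d_ff=1024, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # route each token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():    # run only the chosen experts
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
print(layer(torch.randn(8, 256)).shape)  # torch.Size([8, 256])
```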

Qwen3.5 Marks A Major Leap In Reinforcement Learning And Pretraining Efficiency

The Qwen3.5 series introduces substantial gains over Qwen3, driven largely by extensive reinforcement learning scaling across a wide range of environments. Rather than optimizing for narrow benchmarks, the team focused on increasing task difficulty and generalizability, resulting in improved agent performance across evaluations such as BFCL‑V4, VITA‑Bench, DeepPlanning, Tool‑Decathlon, and MCP‑Mark. Additional results will be detailed in an upcoming technical report.

Pretraining improvements span capability, efficiency, and versatility. Qwen3.5 is trained on a significantly larger volume of visual‑text data with strengthened multilingual, STEM, and reasoning content, enabling it to match the performance of earlier trillion‑parameter models. Architectural upgrades, including higher‑sparsity MoE, hybrid attention, stability refinements, and multi‑token prediction, deliver major throughput gains, particularly at extended context lengths of 32k and 256k tokens. The model’s multimodal capabilities are strengthened through early text‑vision fusion and expanded datasets covering images, STEM materials, and video, while a larger 250k vocabulary improves encoding and decoding efficiency across most languages.
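
Of those upgrades, multi‑token prediction is the easiest to show in isolation: alongside the standard next‑token head, auxiliary heads predict tokens two, three, or more positions ahead from the same hidden state, densifying the training signal per forward pass. The sketch below shows the general technique only; the head count and shapes are assumptions, as the report has not yet detailed Qwen3.5's exact variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictionHeads(nn.Module):
    """Illustrative multi-token prediction: head k predicts the token k
    positions ahead, so each forward pass yields several training losses."""

    def __init__(self, d_model, vocab_size, n_future=3):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, hidden, targets):
        # hidden: (batch, seq, d_model); targets: (batch, seq) token ids
        loss = 0.0
        for k, head in enumerate(self.heads, start=1):
            logits = head(hidden[:, :-k])               # predict tokens at offset k
            loss = loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets[:, k:].reshape(-1)
            )
        return loss / len(self.heads)

heads = MultiTokenPredictionHeads(d_model=64, vocab_size=1000)
print(heads(torch.randn(2, 16, 64), torch.randint(0, 1000, (2, 16))))  # scalar loss
```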

The infrastructure behind Qwen3.5 is designed for efficient multimodal training. A heterogeneous parallelism strategy separates vision and language components to avoid bottlenecks, while sparse activation enables near‑full throughput even on mixed text‑image‑video workloads. A native FP8 pipeline reduces activation memory by roughly half and increases training speed by more than 10 percent, maintaining stability at massive token scales. 
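
The "roughly half" memory figure is straightforward bytes‑per‑value arithmetic: FP8 stores one byte per activation where BF16 stores two. A back‑of‑envelope check, with every shape below invented purely for illustration:

```python
# Rough activation-memory comparison; all dimensions are made up.
batch, seq, d_model, n_layers = 8, 32_768, 8_192, 60
values = batch * seq * d_model * n_layers   # activations cached for backward
bf16_gib = values * 2 / 2**30               # BF16: 2 bytes per value
fp8_gib = values * 1 / 2**30                # FP8: 1 byte per value
print(f"BF16 ~{bf16_gib:,.0f} GiB vs FP8 ~{fp8_gib:,.0f} GiB "
      f"({fp8_gib / bf16_gib:.0%} of BF16)")
```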

Reinforcement learning is supported by a fully asynchronous framework capable of handling models of all sizes, improving hardware utilization, load balancing, and fault recovery. Techniques such as FP8 end‑to‑end training, speculative decoding, rollout router replay, and multi‑turn rollout locking help maintain consistency and reduce gradient staleness. The system is built to support large‑scale agent workflows, enabling seamless multi‑turn interactions and broad generalization across environments.
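
The announcement names these techniques without implementation detail, but the core of a fully asynchronous design is that rollout generation and gradient updates are decoupled: workers stream trajectories into a queue while the learner consumes them, discarding samples from policies that have grown too stale rather than blocking on synchronization. A toy asyncio sketch of that pattern follows; the queue size, staleness bound, and trajectory format are assumptions, not Qwen's framework.

```python
import asyncio
import random

async def rollout_worker(queue: asyncio.Queue, policy: dict) -> None:
    # Generate trajectories continuously; never block on the learner.
    while True:
        await queue.put({"version": policy["v"], "reward": random.random()})
        await asyncio.sleep(0.01)   # stand-in for multi-turn environment steps

async def learner(queue: asyncio.Queue, policy: dict, max_staleness: int = 2) -> None:
    for _ in range(50):
        traj = await queue.get()
        if policy["v"] - traj["version"] > max_staleness:
            continue                # drop stale rollouts to bound gradient staleness
        policy["v"] += 1            # stand-in for one gradient update

async def main() -> None:
    queue, policy = asyncio.Queue(maxsize=8), {"v": 0}
    workers = [asyncio.create_task(rollout_worker(queue, policy)) for _ in range(4)]
    await learner(queue, policy)
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    print(f"final policy version: {policy['v']}")

asyncio.run(main())
```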

Users can interact with Qwen3.5 through Qwen Chat, which offers Auto, Thinking, and Fast modes depending on the task. The model is also available through Alibaba Cloud’s ModelStudio, where advanced features such as reasoning, web search, and code execution can be enabled through simple parameters. Integration with third‑party coding tools allows developers to adopt Qwen3.5 into existing workflows with minimal friction.
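
For developers, access through Model Studio is described as parameter‑driven; a hypothetical request using an OpenAI‑compatible client is sketched below. The endpoint URL, model identifier, and the enable_thinking / enable_search flags are assumptions based on the article's description and Alibaba Cloud's existing conventions, not confirmed parameters.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MODELSTUDIO_API_KEY",
    # Endpoint follows Alibaba Cloud's OpenAI-compatible convention;
    # the exact URL may differ by region and account.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3.5-397b-a17b",   # assumed model identifier
    messages=[{"role": "user", "content": "Explain what this model can do."}],
    # Assumed feature flags for reasoning and web search, per the article.
    extra_body={"enable_thinking": True, "enable_search": True},
)
print(response.choices[0].message.content)
```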

According to the Qwen team, Qwen3.5 establishes a foundation for universal digital agents through its hybrid architecture and native multimodal reasoning. Future development will focus on system‑level integration, including persistent memory for cross‑session learning, embodied interfaces for real‑world interaction, self‑directed improvement mechanisms, and economic awareness for long‑term autonomous operation. The objective is to move beyond task‑specific assistants toward coherent, persistent agents capable of managing complex, multi‑day objectives with reliable, human‑aligned judgment.


About The Author

Alisa Davidson, a dedicated journalist at MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.

