Inception Labs Launches Mercury 2, Diffusion-Based Reasoning Model Achieving Over 1,000 Tokens Per Second
In Brief
Inception Labs has launched Mercury 2, a diffusion-based reasoning model capable of generating over 1,000 tokens per second, three times faster than comparable models.
Inception Labs, an AI startup, has launched Mercury 2, a diffusion-based Large Language Model (LLM) designed to significantly accelerate reasoning tasks in production AI applications.
Unlike traditional autoregressive models, which generate text one token at a time, Mercury 2 uses a parallel refinement process: it produces multiple tokens simultaneously and converges on the final output over a small number of steps. This enables speeds of over 1,000 tokens per second on NVIDIA Blackwell GPUs, roughly three times faster than competing models in the same price range.
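The decoding difference can be illustrated with a toy sketch. The code below is not Inception Labs' algorithm; it is a minimal simulation of the general idea behind diffusion-style decoding, where all positions start masked and batches of them are committed in parallel over a few refinement passes, rather than one token per forward pass.

```python
import random

def toy_parallel_refinement(vocab, length, steps):
    """Toy illustration of diffusion-style decoding: every position starts
    masked, and a batch of positions is filled in parallel on each
    refinement pass. An autoregressive model would need `length` passes;
    this sketch finishes in `steps` passes."""
    MASK = None
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        # A real model would score all masked positions at once and keep
        # the most confident predictions; random numbers stand in here.
        scored = sorted(((random.random(), i) for i in masked), reverse=True)
        # Commit roughly length/steps tokens per pass, so the sequence is
        # complete after the final refinement step.
        batch = max(1, len(masked) // (steps - step))
        for _, i in scored[:batch]:
            seq[i] = random.choice(vocab)
    return seq
```

With `length=1024` and `steps=8`, this scheme would touch the "model" only 8 times instead of 1024, which is the intuition behind the throughput gains claimed for parallel refinement.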
The model is optimized for real-time responsiveness in complex AI workflows, where latency compounds across multiple inference calls, retrieval pipelines, and agentic loops. Mercury 2 maintains high reasoning quality while reducing latency, letting developer tools, voice AI systems, search engines, and other interactive applications operate at reasoning-grade performance without the delays of sequential generation. It supports tunable reasoning, a 128K-token context window, schema-aligned JSON output, and native tool integration, giving developers flexibility across a range of production deployments.
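As a sketch of what schema-aligned JSON output could look like in practice, the request body below follows the OpenAI-style `response_format` convention. The model identifier and the assumption that Mercury 2 accepts these exact field names are hypothetical; the announcement confirms only that schema-aligned JSON output is supported.

```python
import json

# Hypothetical OpenAI-style request body constraining the model's output
# to a JSON schema. The model name "mercury-2" and the exact field names
# are assumptions for illustration, not documented values.
body = {
    "model": "mercury-2",
    "messages": [
        {"role": "user", "content": "Extract the invoice total from this text."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "schema": {
                "type": "object",
                "properties": {"total": {"type": "number"}},
                "required": ["total"],
            },
        },
    },
}

# The body serializes cleanly for an HTTP POST to a chat-completions endpoint.
payload = json.dumps(body)
```

Schema-constrained output of this kind is what lets downstream pipelines parse model responses directly instead of scraping free-form text.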
Mercury 2 Enables Low-Latency AI Across Coding, Voice, And Search Workflows
The announcement highlights several use cases where low-latency reasoning is critical. In coding and editing workflows, Mercury 2 delivers rapid autocomplete and next-edit suggestions that keep pace with developers' thought processes. In agentic workflows, the model allows more inference steps without exceeding latency budgets, improving the quality and depth of automated decision-making. Voice-based AI and interactive applications benefit from its ability to generate reasoning-quality responses within natural speech cadences, enhancing user experience in real-time conversation. Mercury 2 also supports multi-hop search and retrieval pipelines, enabling rapid summarization, reranking, and reasoning without compromising response times.
Early adopters have noted significant improvements in throughput and user experience. Mercury 2 has been described as at least twice as fast as GPT-5.2 while maintaining competitive quality, with applications spanning real-time transcript cleanup, interactive human-computer interfaces, autonomous advertising optimization, and voice-enabled AI avatars.
The model is compatible with the OpenAI API, allowing integration into existing stacks without extensive modification, and Inception Labs offers support for enterprise evaluations, performance validation, and workload-specific deployment guidance. Mercury 2 represents a step forward in diffusion-based LLMs, redefining the balance between reasoning quality and latency in production AI environments.
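Because the model exposes an OpenAI-compatible API, an existing stack would typically only need to point its client at a different base URL. The stdlib-only sketch below builds a standard chat-completions request; the endpoint URL is a placeholder and the model identifier is an assumption, since neither is specified in the announcement.

```python
import json
from urllib import request

# Placeholder base URL -- substitute the provider's actual endpoint.
BASE_URL = "https://api.example-inference-host.com/v1"

def build_chat_request(prompt, model="mercury-2"):
    """Build an OpenAI-style chat-completions POST request.

    The model name "mercury-2" is a hypothetical identifier used for
    illustration. The request shape itself follows the standard
    OpenAI chat-completions format.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Rerank these search results by relevance.")
# urllib.request.urlopen(req) would send it; omitted here, since the
# endpoint above is a placeholder.
```

Because the request shape matches the OpenAI format, the official OpenAI SDKs should also work by overriding the client's base URL, which is what "integration without extensive modification" implies in practice.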
Disclaimer
In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.
About The Author
Alisa, a dedicated journalist at the MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.