O.XYZ’s Next Leap: From Wafer-Scale Chips To Routing Intelligence
In Brief
After discovering the limits of single-chip performance, Ahmad Shadid is redefining what “fast” means in AI, transforming O.XYZ’s Cerebras-powered OCEAN engine into an intelligent routing platform serving 100,000 models.
Independent AI developer O.XYZ introduced OCEAN, a next-generation decentralized AI search engine powered by Cerebras CS-3 wafer-scale processors, earlier this year. Designed to deliver performance up to ten times faster than ChatGPT, OCEAN aimed to redefine both consumer and enterprise AI experiences. With fast response times, integrated voice interaction, and a decentralized framework, the platform marked an advancement in global AI accessibility and performance.
OCEAN’s defining features were speed and real-time responsiveness, both of which stemmed from its hardware design.
Ahmad Shadid, founder of O.XYZ and io.net, noted that the use of Cerebras’s advanced computing architecture played a key role in achieving such high performance. The Cerebras CS-3 system is built around the Wafer Scale Engine (WSE-3), which integrates 900,000 AI-optimized cores and four trillion transistors onto a single wafer-scale chip, enabling scalable performance without the complex distributed programming typical of GPU-based systems. This architecture allowed models ranging from one billion to 24 trillion parameters to run seamlessly without code modification, reducing latency and improving overall efficiency.
With a memory bandwidth of 21 PB/s, Cerebras-based computation provided fast and consistent processing that surpassed conventional GPU configurations. However, as development progressed, the O.XYZ team identified a key limitation: while Cerebras hardware excelled in memory capacity and single-model performance, the company’s vision required an architecture capable of supporting up to 100,000 models in parallel.
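To put that figure in perspective, here is a back-of-the-envelope comparison; the H100 number is an assumption based on NVIDIA’s published ~3.35 TB/s HBM3 bandwidth for the SXM variant, not anything cited in this article:

```python
# Rough comparison of WSE-3 on-wafer bandwidth vs. a single GPU's HBM.
# The H100 figure is an assumed public spec, not from O.XYZ or Cerebras statements here.
wse3_pb_per_s = 21.0    # Cerebras WSE-3 on-wafer memory bandwidth, petabytes/s
h100_tb_per_s = 3.35    # NVIDIA H100 SXM HBM3 bandwidth, terabytes/s (assumed)

ratio = (wse3_pb_per_s * 1_000) / h100_tb_per_s  # 1 PB = 1,000 TB
print(f"~{ratio:,.0f}x the memory bandwidth of one H100")  # roughly 6,300x
```

That headroom explains the single-model speed, but as the quotes below make clear, it does not translate into hosting many models at once.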
“Initially, we explored leveraging Cerebras’ massive wafer-scale compute for ultra-fast, memory-intensive inference—ideal for a few high-demand models. However, after thorough technical assessments and due diligence conducted by our team on-site at Cerebras’ offices in Palo Alto, we quickly realized the limitation: Cerebras excels at depth, not breadth,” Ahmad Shadid told Mpost.
“While it can run a single large model with extraordinary speed and memory bandwidth, it doesn’t scale economically to host more than one model on a single WSE-3 chip. Although the team initially indicated we would have access to the kernel to customize each of the 900,000 cores to host our desired models, we later discovered that handling models with unique dependencies, quantization schemes, and memory footprints is not feasible on the current Cerebras infrastructure,” he added.
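Shadid’s point about memory footprints can be made concrete with simple arithmetic. A minimal sketch, assuming the WSE-3’s published 44 GB of on-wafer SRAM and fp16 weights; the 1-billion-parameter model size is illustrative:

```python
# Why one wafer favors depth over breadth: on-wafer SRAM is finite.
# 44 GB is the published WSE-3 on-chip SRAM; the model size below is illustrative.
WSE3_SRAM_GB = 44
BYTES_PER_PARAM = 2  # fp16/bf16 weights

def weights_gb(params_billions: float) -> float:
    """Approximate weight footprint in GB for a dense model."""
    return params_billions * 1e9 * BYTES_PER_PARAM / 1e9

per_model = weights_gb(1.0)                # ~2 GB for a 1B-parameter model
resident = int(WSE3_SRAM_GB // per_model)  # ~22 such models, at best
print(f"At most ~{resident} small models fit in SRAM at once, far from 100,000")
```

Even before accounting for activations, KV caches, or the per-model dependency and quantization mismatches Shadid describes, the arithmetic rules out breadth on a single wafer.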
In order to enable unified access to over 100,000 open models through a single API endpoint, O.XYZ’s architecture was fundamentally redesigned into a hybrid inference infrastructure.
The system transitioned from relying solely on a monolithic Cerebras setup to a tiered, multi-cloud inference network. High-demand models (roughly the top 500 by usage) and the long-tail models comprising the remaining 99,500+ are dynamically deployed on io.net clusters equipped with H100 and H200 GPUs during peak demand periods. Requests are also federated across more than 200 external inference providers, including Together AI, Fireworks, and Anyscale, allowing ORI (O Routing Intelligence) to function as a universal gateway for model access, as sketched below.
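The article does not disclose ORI’s internals, so the following is only a minimal sketch of how such a tiered gateway might dispatch requests; the tier thresholds, example model IDs, and provider-selection logic are all hypothetical:

```python
# Hypothetical sketch of a tiered inference gateway in the spirit of ORI.
# Tier names, example model IDs, and the provider pick are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Route:
    tier: str    # which tier serves the request
    target: str  # cluster or provider handling it

HOT_MODELS = {"llama-3-70b", "mixtral-8x22b"}  # stand-ins for the "top 500 by usage"
PROVIDERS = ["together-ai", "fireworks", "anyscale"]  # three of the 200+ providers

def route(model_id: str, peak_demand: bool) -> Route:
    if model_id in HOT_MODELS:
        return Route("hot", "io.net H100/H200 cluster")
    if peak_demand:
        # Long-tail model spun up dynamically on GPU clusters during peaks.
        return Route("long-tail", "io.net H100/H200 cluster (dynamic deploy)")
    # Otherwise federate the request out to an external inference provider.
    return Route("federated", PROVIDERS[len(model_id) % len(PROVIDERS)])

print(route("llama-3-70b", peak_demand=False))          # served from the hot tier
print(route("obscure-7b-finetune", peak_demand=False))  # federated externally
```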
“User behavior forced a fundamental redefinition of what ‘performance’ actually means in AI search,” said Ahmad Shadid.
“Early in development, the OCEAN engine was optimized for raw speed—leveraging Cerebras’ massive on-wafer memory to run a small set of high-performance models with ultra-low latency. On paper, it was impressive: sub-100ms responses, deterministic throughput, and minimal cold-start overhead. But during our public beta, we observed a critical disconnect: users consistently preferred slower, more accurate answers from specialized models over fast, generic responses,” he explained.
O.XYZ’s Plans For Advanced Routing Intelligence
Outlining future plans, O.XYZ noted that it aims to evolve OCEAN into a fully integrated AI platform powered by advanced routing intelligence. The company’s proprietary system, known as O Routing Intelligence (ORI) and developed by its AI research lab, is designed to distribute computational tasks across the most appropriate models, whether open-source or specialized, depending on the complexity of each request. This approach is intended to optimize operational efficiency and cost while maintaining high standards of speed and accuracy; a rough sketch of what complexity-based routing can look like follows.
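O.XYZ has not published how ORI scores a request’s complexity, so the sketch below is purely illustrative; the heuristic, thresholds, and model names are assumptions, not ORI’s actual logic:

```python
# Illustrative only: ORI's real scoring and model pool are not public.
# A toy heuristic that routes queries to cheaper or stronger models.
def complexity(query: str) -> float:
    """Crude proxy: longer, more technical queries score higher (0..1)."""
    technical_terms = {"prove", "derive", "optimize", "debug", "architecture"}
    hits = sum(word in technical_terms for word in query.lower().split())
    return min(1.0, len(query) / 500 + 0.2 * hits)

def select_model(query: str) -> str:
    score = complexity(query)
    if score < 0.3:
        return "small-fast-model"       # cheap, low-latency tier
    if score < 0.7:
        return "mid-general-model"      # balanced tier
    return "large-specialist-model"     # accuracy over speed

print(select_model("What time is it in Tokyo?"))  # small-fast-model
print(select_model(
    "Derive and optimize the attention architecture for a sparse "
    "mixture-of-experts model, then debug the backward pass."
))  # large-specialist-model
```

In production, a real router would also weigh cost, latency budgets, and per-model quality signals, which is exactly the kind of trade-off Shadid describes.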
ORI represents a foundational step toward building an extensive AI library supporting over 100,000 models. Comparable in concept to the unified intelligence systems introduced by major AI developers, it will select among these open-source models and route tasks in real time. The evolution of OCEAN into ORI positions it as the central component of O.XYZ’s vision for multi-model intelligence, where users can access and interact with a wide range of AI capabilities through a single, cohesive environment.
“From the outset, our vision for OCEAN, which has now evolved into ORI, was ambitious: to build the most capable, accurate, and responsive AI search engine on the market. But as a bootstrapped, self-funded startup competing against well-resourced giants like OpenAI, Anthropic, and Perplexity, we knew we couldn’t win on data, scale, or brand alone. Instead, we bet on intelligence over brute force: a routing-first architecture that could dynamically select the best model for every query from a vast universe of open-source AI,” said Ahmad Shadid.
“This multi-model philosophy fundamentally shaped our hardware strategy and taught us hard-won lessons about the trade-offs between compute, memory, and flexibility,” he added.
Disclaimer
In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.
About The Author
Alisa, a dedicated journalist at MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.