Roblox Unveils Cube 3D: An Open-Source AI For Generating 3D Objects And Scenes From Text Prompts

Date: 2025-03-18

Roblox, an online gaming platform and game creation system, has announced the open-source release of Cube 3D, an AI model designed to generate 3D objects and environments from text prompts.

Cube 3D will serve as the foundation for many of the AI tools Roblox plans to develop, including advanced scene-generation tools. Today the model generates 3D models and environments directly from text descriptions. Over time, Roblox intends it to become a multimodal model that accepts text, images, video, and other forms of input, and to integrate it with Roblox’s existing AI creation tools.

A truly immersive 3D world needs fully functional structures, such as garages to drive into, stands to sit in, and podiums for victory lanes. To achieve this, Roblox drew inspiration from language models, which are trained on text tokens to predict the next token and so build up a sentence. Cube 3D applies the same principle to geometry: Roblox tokenizes 3D objects so that shapes are represented as tokens, then trains Cube 3D to predict the next shape token until a complete 3D object is built. When extended to full scene generation, Cube 3D predicts the layout and then recursively predicts the shapes that complete that layout. Because the model is open source, users can fine-tune it, develop plugins for it, or train it on their own data to meet their specific needs.
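The next-token idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the token names, the probability table, and the function name are invented, and the table only looks at the previous token, whereas a transformer such as Cube 3D would condition on the whole sequence and on the text prompt. It is not Cube 3D's actual code or API.

```python
# Toy illustration of next-token prediction over shape tokens.
# The vocabulary and probabilities are invented for this sketch;
# a real model would compute them with a trained transformer.
END = "<end>"
START = "<start>"

# Hypothetical table: P(next token | previous token).
NEXT_TOKEN_PROBS = {
    START: {"base": 0.9, "wall": 0.1},
    "base": {"wall": 0.7, "roof": 0.2, END: 0.1},
    "wall": {"roof": 0.8, "wall": 0.1, END: 0.1},
    "roof": {END: 0.9, "wall": 0.1},
}


def generate_shape_tokens(max_tokens=16):
    """Greedy decoding: keep appending the most likely next token."""
    sequence = [START]
    while len(sequence) < max_tokens:
        probs = NEXT_TOKEN_PROBS[sequence[-1]]
        next_token = max(probs, key=probs.get)
        if next_token == END:
            break  # the model signals that the object is complete
        sequence.append(next_token)
    return sequence[1:]  # drop the start marker


if __name__ == "__main__":
    print(generate_shape_tokens())
```

Each pass through the loop feeds the sequence so far back in and extends it by one token, which is the same pattern a language model uses to build a sentence word by word.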

Roblox Innovates Object Creation With 3D Tokenization

The primary technical challenge was linking text and images with 3D shapes. The key innovation is 3D tokenization, which allows the platform to represent 3D objects as tokens, similar to how text is represented as tokens. This enables Roblox to predict the next shape in the same way language models predict the next word in a sentence.
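As a rough illustration of what representing shapes as tokens can mean, the sketch below maps continuous feature vectors to the index of the nearest entry in a small codebook, a common quantization technique. The article does not describe how Cube 3D's tokenizer works, so the codebook, the two-dimensional features, and the function names here are all hypothetical.

```python
import math

# Hypothetical codebook: each index acts as one "shape token" and each
# entry is a tiny feature vector. A learned codebook would be far larger.
CODEBOOK = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
]


def tokenize(features):
    """Map each feature vector to the index of its nearest codebook entry."""
    tokens = []
    for vec in features:
        distances = [math.dist(vec, entry) for entry in CODEBOOK]
        tokens.append(distances.index(min(distances)))
    return tokens


def detokenize(tokens):
    """Recover the (approximate) feature vectors from token indices."""
    return [CODEBOOK[t] for t in tokens]


if __name__ == "__main__":
    shape_features = [(0.1, -0.2), (0.9, 0.1), (0.2, 0.8), (1.1, 0.9)]
    tokens = tokenize(shape_features)
    print(tokens)
    print(detokenize(tokens))
```

Once a shape is a sequence of integers like this, it can be fed to the same kind of sequence model that handles text tokens.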

To achieve 3D generation, Roblox has developed a unified architecture for autoregressive generation that covers generating single objects, completing shapes, and designing multi-object or scene layouts. Autoregressive transformers are neural networks that use the preceding elements of a sequence to predict the next one. This architecture supports both scalability and multimodal compatibility, allowing the model to handle various types of input (text, visuals, audio, and 3D). Roblox is open-sourcing this model, and in this initial phase, creators will be able to generate 3D objects from text prompts. In the future, Roblox aims to let creators generate entire scenes using multiple input types.
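The multi-object case can be sketched as a two-stage process: predict a layout first, then fill each slot of that layout with a shape, descending into nested slots. The sketch below uses hand-written stand-ins for both models, and the Slot class, function names, and coordinates are assumptions for illustration only, not part of the released Cube 3D code.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One placeholder in a scene layout (hypothetical structure)."""
    label: str
    position: tuple
    children: list = field(default_factory=list)
    shape_tokens: list = field(default_factory=list)


def predict_layout(prompt):
    """Stand-in for a layout model: returns a fixed, hand-written layout."""
    return Slot("race track", (0, 0, 0), children=[
        Slot("garage", (10, 0, 0)),
        Slot("stands", (0, 0, 20), children=[Slot("seat", (0, 1, 20))]),
        Slot("podium", (-10, 0, 0)),
    ])


def predict_shape(label):
    """Stand-in for a shape model: derives placeholder tokens from the label."""
    return [f"{label}:{i}" for i in range(3)]


def fill_layout(slot):
    """Recursively attach shape tokens to a slot and all of its children."""
    slot.shape_tokens = predict_shape(slot.label)
    for child in slot.children:
        fill_layout(child)
    return slot


def count_slots(slot):
    """Count a slot plus every slot nested beneath it."""
    return 1 + sum(count_slots(child) for child in slot.children)


if __name__ == "__main__":
    scene = fill_layout(predict_layout("a race track with a garage, stands, and a podium"))
    print(count_slots(scene), "slots filled")
```

Keeping layout prediction separate from shape prediction means the same shape generator can serve both single-object requests and every slot of a larger scene.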

To train the generative pretrained transformer (GPT) for shape creation, Roblox uses discrete 3D shape tokens and aligns them with text prompts. Roblox says this approach positions it to create fully playable 3D scenes in the future.

Roblox is an online gaming platform and game creation system that allows users to design, develop, and play games created by other users. It provides a vast virtual environment where individuals can create and share interactive 3D experiences, ranging from simple games to complex virtual worlds.

