Standard Intelligence Launches FDM-1, AI System Capable Of Learning Complex Computer Tasks From Video

by Alisa Davidson by Victor Dey

In Brief: Standard Intelligence unveiled FDM-1, an AI model that learns computer tasks from video, demonstrating capabilities from CAD design to software testing and real-world driving.





Standard Intelligence, an AI research lab, announced the release of FDM-1, a new computer-action model designed to learn how to operate digital interfaces by observing video recordings of real user activity.

The company said in its release statement that the system was trained on more than 11 million hours of screen recordings, a corpus it describes as larger than any publicly available dataset previously used for computer-use modeling. To generate training signals at this scale, the firm applied an automated technique that reconstructs likely user actions, such as keystrokes and cursor movements, directly from visual changes on the screen. This approach allows the model to infer how interactions unfold without relying primarily on manually annotated data.
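The idea of recovering actions from visual changes can be illustrated with a toy sketch. This is not Standard Intelligence's technique, which the announcement does not detail; it assumes a simplified setup in which frames are small grayscale grids and the cursor is a single pixel with a known value. The names `CURSOR`, `make_frame`, `find_cursor`, `infer_cursor_move`, and `label_recording` are all hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # toy grayscale frame: rows of pixel values

CURSOR = 255  # assumption: the cursor is the only pixel with this value


def make_frame(cursor_at: Tuple[int, int], height: int = 4, width: int = 4) -> Frame:
    """Build a blank toy frame with the cursor at (row, col)."""
    frame = [[0] * width for _ in range(height)]
    row, col = cursor_at
    frame[row][col] = CURSOR
    return frame


def find_cursor(frame: Frame) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first cursor pixel, or None if there is none."""
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value == CURSOR:
                return (r, c)
    return None


def infer_cursor_move(before: Frame, after: Frame) -> Optional[Tuple[int, int]]:
    """Infer the cursor displacement (d_row, d_col) between two frames."""
    start, end = find_cursor(before), find_cursor(after)
    if start is None or end is None:
        return None
    return (end[0] - start[0], end[1] - start[1])


def label_recording(frames: List[Frame]) -> List[Optional[Tuple[int, int]]]:
    """Produce one inferred action label per consecutive pair of frames."""
    return [infer_cursor_move(a, b) for a, b in zip(frames, frames[1:])]


# Usage: three frames yield two inferred cursor movements.
recording = [make_frame((0, 0)), make_frame((1, 2)), make_frame((1, 2))]
labels = label_recording(recording)
```

A production system would need a learned model rather than a pixel lookup, since real screens include keystrokes, scrolling, and cursors that change shape, but the output has the same form: an action label attached to each step of an unlabeled recording.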

FDM-1 Demonstrates Long-Horizon Video Understanding And Real-World Computer Control Across Complex Workflows

FDM-1 is built to process long, continuous video streams, enabling it to follow nearly two hours of uninterrupted screen activity in a single session. The extended context window allows the model to capture complex workflows that unfold over longer time horizons, such as those in engineering, design, and financial operations. The company said this capability enables the system to reason over more visual context than earlier computer-use agents, which are typically limited to short sequences or static screenshots.

In demonstrations released alongside the announcement, the model was shown performing a range of tasks, including building mechanical components in computer-aided design software, identifying software bugs through automated interface exploration, and controlling a real vehicle using live visual feeds and keyboard inputs on public streets in San Francisco. According to the company, the driving demonstration required less than one hour of task-specific fine-tuning.

The firm stated that FDM-1 is designed to operate directly on raw video rather than simplified visual snapshots, enabling the model to learn continuous actions such as scrolling, dragging, and three-dimensional manipulation. By predicting the next user action based on both visual frames and prior interaction history, the system aims to generalize across a wide range of software environments without the need for task-specific reinforcement learning setups.
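Next-action prediction of this kind can be pictured as turning a recording into training examples, each pairing recent frames and prior actions with the action that followed. The sketch below is only an illustration of that framing, not FDM-1's actual data pipeline; the `Step` type, the `next_action_examples` function, and the `history` window parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    frame_id: int  # stand-in for a video frame
    action: str    # action the user took while this frame was shown


# (frames in context, prior actions in context, action to predict)
Example = Tuple[Tuple[int, ...], Tuple[str, ...], str]


def next_action_examples(steps: Sequence[Step], history: int) -> List[Example]:
    """Build one example per step, using up to `history` earlier steps as context."""
    examples: List[Example] = []
    for t in range(len(steps)):
        start = max(0, t - history)
        # Frames up to and including the current one are visible to the model.
        frames = tuple(s.frame_id for s in steps[start:t + 1])
        # Only actions strictly before the current step are visible.
        past_actions = tuple(s.action for s in steps[start:t])
        examples.append((frames, past_actions, steps[t].action))
    return examples


# Usage: a three-step recording with a context of two earlier steps.
recording = [Step(0, "move"), Step(1, "click"), Step(2, "scroll")]
examples = next_action_examples(recording, history=2)
```

The current action is deliberately excluded from the context so that the target is never leaked to the model, which is the same constraint that next-token prediction places on text models.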

The company said the broader objective behind the launch is to move computer-use agents from a data-constrained development model to a compute-constrained one, allowing far larger volumes of publicly available instructional and workflow video to be used for training. Executives described the release as a step toward enabling AI systems to learn how people work with digital tools in practice, much as LLMs learned patterns of writing and communication from internet text.



