Sakana AI Introduces Self-Improving Agent That Raises Its SWE-Bench Score From 20% To 50%

by Alisa Davidson

Published: June 03, 2025 at 6:00 am

Updated: June 02, 2025 at 8:39 am

by Anastasiia O

Edited and fact-checked: June 03, 2025 at 6:00 am

In Brief

Sakana AI launched the Darwin Gödel Machine, a self-improving agent that raised its own performance from 20.0% to 50.0% on SWE-bench and from 14.2% to 30.7% on Polyglot.

Japanese AI company Sakana AI introduced the Darwin Gödel Machine (DGM), a self-modifying agent that can rewrite its own code. Drawing on evolutionary principles, the system maintains a growing lineage of agent variants, which lets it keep exploring the large space of possible self-improving agent designs.

Most current agent systems are fixed once deployed. The DGM instead treats continuous self-improvement as a key ingredient for advancing AI capabilities. It is designed to support AI systems that learn and extend their abilities over time, much as people do.

Sakana AI (@SakanaAILabs) wrote in a post dated May 30, 2025: "Our experiments demonstrate that the Darwin Gödel Machine can continuously self-improve by modifying its own codebase. On SWE-bench, DGM automatically improved its performance from 20% to 50%."

The DGM represents a notable advancement toward AI systems capable of autonomously identifying and building upon their own learning milestones to continually innovate. The system expands its archive by selecting an agent from its existing collection and employing a foundation model to generate a new, improved variant of that agent. This process of open-ended exploration creates a growing tree of diverse, high-quality agents, enabling simultaneous exploration of multiple pathways within the search space.
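
The archive-expansion loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sakana AI's implementation: the `Agent` fields, the uniform parent choice, and the `propose_variant` and `evaluate` callables (stand-ins for the foundation model and a benchmark run) are all hypothetical names introduced here.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Agent:
    """One archived agent variant (hypothetical structure)."""
    agent_id: int
    parent_id: Optional[int]  # None for the initial agent
    code: str                 # stand-in for the agent's own source code
    score: float              # stand-in for a benchmark score


def run_archive_loop(
    initial_code: str,
    propose_variant: Callable[[str], str],  # stand-in for the foundation model
    evaluate: Callable[[str], float],       # stand-in for a benchmark run
    iterations: int,
    rng: random.Random,
) -> List[Agent]:
    """Grow an archive by repeatedly branching from an already archived agent."""
    archive = [Agent(0, None, initial_code, evaluate(initial_code))]
    for _ in range(iterations):
        # Assumption: parents are drawn uniformly; the article does not say how.
        parent = rng.choice(archive)
        child_code = propose_variant(parent.code)
        child = Agent(len(archive), parent.agent_id, child_code, evaluate(child_code))
        # Simplification: every child is archived, even one scoring below its parent.
        archive.append(child)
    return archive


def lineage(archive: List[Agent], agent_id: int) -> List[int]:
    """Return the chain of agent ids from the initial agent down to agent_id."""
    chain = []
    current: Optional[int] = agent_id
    while current is not None:
        chain.append(current)
        current = archive[current].parent_id
    return chain[::-1]
```

Because each child records its `parent_id`, the archive forms a tree, and `lineage` recovers the path that led to any agent, including the best one found.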

Empirical results show that the DGM improves its coding abilities over time. It upgraded its own tools for code editing, long-context management, and peer review, and its scores rose on SWE-bench (from 20.0% to 50.0%) and Polyglot (from 14.2% to 30.7%). The system consistently outperforms baselines that lack self-improvement or open-ended exploration.
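
The reported scores can be read as a percentage-point change, a ratio, or a relative gain, and the three are easy to confuse. The short calculation below works them out from the four figures quoted above; the function name `gain_summary` and its output keys are illustrative choices, not terms from Sakana AI.

```python
def gain_summary(before_pct: float, after_pct: float) -> dict:
    """Express a benchmark change in points, as a ratio, and as a relative gain."""
    return {
        "points": round(after_pct - before_pct, 1),         # percentage points
        "ratio": round(after_pct / before_pct, 2),           # after / before
        "relative_pct": round((after_pct - before_pct) / before_pct * 100, 1),
    }


swe_bench = gain_summary(20.0, 50.0)  # figures quoted in the article
polyglot = gain_summary(14.2, 30.7)   # figures quoted in the article
print("SWE-bench:", swe_bench)
print("Polyglot:", polyglot)
```

On these figures, SWE-bench rises by 30.0 percentage points (2.5 times the starting score, a 150.0% relative gain) and Polyglot by 16.5 percentage points (about 2.16 times, a 116.2% relative gain).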

Notably, the evolution toward the most effective agent sometimes involved intermediate agents that performed worse than their predecessors but were retained in the lineage, illustrating the advantages of an open-ended search strategy. This approach preserves a diverse archive of useful intermediate agents rather than exclusively focusing on branching from the highest-performing agent, demonstrating that progress does not always follow a linear path.
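
To make the contrast concrete, the sketch below compares two hypothetical parent-selection rules: one that always branches from the top scorer, and one that samples from the whole archive. The score-plus-floor weighting is an assumption chosen for illustration; the article does not describe the rule the DGM actually uses, and the scores are made up.

```python
import random
from typing import List


def pick_greedy(scores: List[float]) -> int:
    """Always branch from the current best agent."""
    return max(range(len(scores)), key=lambda i: scores[i])


def pick_from_archive(scores: List[float], rng: random.Random, floor: float = 0.05) -> int:
    """Sample a parent with probability proportional to its score plus a small floor.

    The floor keeps low-scoring agents selectable (an illustrative assumption).
    """
    weights = [s + floor for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


scores = [0.20, 0.15, 0.35]  # made-up scores for three archived agents
rng = random.Random(0)

greedy_picks = {pick_greedy(scores) for _ in range(1000)}
archive_picks = {pick_from_archive(scores, rng) for _ in range(1000)}

print("greedy parents: ", sorted(greedy_picks))
print("archive parents:", sorted(archive_picks))
```

The greedy rule only ever expands agent 2, so a weaker agent can never become the ancestor of a later, stronger one. Sampling from the archive still favours high scorers but leaves agents 0 and 1 available as stepping stones.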

The research further indicates that the gains of agents discovered by the DGM generalize across foundation models, for example when transferring from Claude to o3-mini, and across programming languages and task domains, including Python, Rust, C++, Go, and others.

Sakana AI: Developing AI Systems Inspired By Nature And Collective Intelligence

Sakana AI is an AI research company based in Tokyo that focuses on developing AI systems inspired by natural processes. The company’s approach involves integrating multiple smaller, autonomous models to form a collective intelligence, similar to how a school of fish operates. This method differs from traditional large-scale AI models by prioritizing adaptability, resource efficiency, and long-term sustainability.

Among Sakana AI’s research projects is the “Evolutionary Model Merge” technique, which applies evolutionary algorithms to combine existing AI models. This process generates new models with targeted capabilities while minimizing the need for extensive computational power. Additionally, Sakana AI has developed the “AI Scientist,” a system designed to automate scientific research by allowing foundation models to independently carry out investigations and discovery processes.
