In Brief

Amazon AGI Labs has unveiled Nova Act, an AI model designed to perform tasks within a web browser, and has released a research preview of its SDK so that developers can experiment with an early version of the model.

Amazon AGI Labs, the company’s division dedicated to advancing Artificial General Intelligence (AGI), has unveiled Amazon Nova Act, a new AI model designed to perform tasks within a web browser.

Alongside the model, Amazon AGI Labs has released a research preview of the Amazon Nova Act software development kit (SDK), which lets developers experiment with an early version of the model. With the SDK, developers can build agents that complete a variety of tasks in a web browser, such as submitting an out-of-office request in an internal system, setting calendar holds, or sending “away from office” email notifications.

The Nova Act SDK lets developers break complex workflows down into smaller, manageable commands, such as searching, checking out, or answering questions based on what appears on the screen. Developers can add detailed instructions to these commands (e.g., “do not accept the insurance upsell”), call APIs, and even use Playwright to manipulate the browser directly, which improves reliability in tasks like entering passwords. The SDK also allows Python code to be interleaved with these commands, so developers can add tests, breakpoints, assertions, or thread pools for parallelization. Parallelization matters because even the fastest agents are limited by web page load times.
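The workflow style described above can be sketched in plain Python. This is only an illustration under stated assumptions: `StubAgent`, `check_listing`, `run_all`, and the example URLs are hypothetical stand-ins invented for this sketch, not the Nova Act SDK's API. The stub records each command instead of driving a browser, so the example shows only the structure: small explicit commands, an ordinary assertion, and a thread pool running independent sessions side by side.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Hypothetical stand-in for a browser agent; NOT the Nova Act API."""

    start_url: str
    log: list = field(default_factory=list)

    def act(self, command: str) -> str:
        # A real agent would drive a browser here; the stub only records the step.
        self.log.append(command)
        return f"done: {command}"


def check_listing(url: str) -> list:
    """Run one workflow as a sequence of small, explicit commands."""
    agent = StubAgent(start_url=url)
    steps = [
        "search for a rental car",
        "select the first result",
        "proceed to checkout; do not accept the insurance upsell",
    ]
    results = [agent.act(step) for step in steps]
    # An ordinary Python assertion guards the workflow.
    assert len(agent.log) == len(steps)
    return results


def run_all(urls: list) -> dict:
    """Fan independent sessions out over a thread pool to overlap page-load waits."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the input URLs.
        return dict(zip(urls, pool.map(check_listing, urls)))


if __name__ == "__main__":
    outcome = run_all(["https://example.com/a", "https://example.com/b"])
    for url, results in outcome.items():
        print(url, results[-1])
```

Each session gets its own agent object, so the threads share no state. With a real browser agent, the time spent waiting on page loads in one session would overlap with work in the others.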

Nova Act: A Reliable AI Model Aimed At Over 90% Accuracy For Complex Web Interactions

Nova Act is designed to provide reliable building blocks that can be combined into more complex workflows. Many agent benchmarks measure high-level tasks, on which state-of-the-art models typically complete only 30% to 60% of tasks in web browsers. Nova Act instead concentrates on the reliability of individual steps: Amazon AGI Labs targets over 90% accuracy in internal evaluations of capabilities that often trip up other models, such as date picking, dropdown menus, and popups. The model also scores well on benchmarks such as ScreenSpot and GroundUI Web, which assess an AI’s ability to interact with the web. For example, it scores 0.939 for interacting with textual elements on screenshots, 0.879 for interacting with visual elements, and 0.805 for understanding and engaging with various UI elements on web pages.
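A rough calculation shows why per-step reliability matters when building blocks are chained. The sketch below assumes that steps succeed or fail independently, which is a simplification made here for illustration and not a claim from Amazon; the function name `workflow_success` and the five-step workflow are likewise hypothetical.

```python
def workflow_success(step_accuracy: float, steps: int) -> float:
    """Chance that every step succeeds, assuming the steps are independent."""
    return step_accuracy ** steps


if __name__ == "__main__":
    # Compare a few per-step accuracies over a hypothetical five-step workflow.
    for accuracy in (0.60, 0.90, 0.99):
        chance = workflow_success(accuracy, 5)
        print(f"{accuracy:.2f} per step, 5 steps: {chance:.3f}")
```

Under this assumption, a five-step workflow built from steps that each succeed 90% of the time completes about 59% of the time, while steps at 60% leave it under 8%.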

This reliability also reduces the need for supervision. Once a user has configured the model, it does not require constant oversight. Users can enable headless mode, turn the agent into an API that integrates with other systems, or set it to run asynchronously on a specified schedule.

Although the model is still in its early stages, Amazon AGI Labs is optimistic about Nova Act’s ability to transfer its understanding of user interfaces across different environments. Notably, early checkpoints suggest that Nova Act performs well in novel settings such as web games, even without prior experience in video games.

With its combination of reliable building blocks and flexibility, Nova Act is already being integrated into Alexa+, where it autonomously navigates the web and completes tasks when the services involved lack the necessary APIs.

Nova Act represents the first step in Amazon AGI Labs’ vision to develop the key capabilities needed for scalable, effective agents. This initial checkpoint is part of a larger training curriculum that aims to improve the model. To make agents truly intelligent and reliable for complex, multi-step tasks, Amazon AGI Labs believes that agents must be trained using reinforcement learning in a diverse set of real-world environments, rather than relying solely on supervised fine-tuning with simple demonstrations. The team is eager to share further research and progress as the model evolves.

