News Report Technology

December 04, 2025

Perplexity AI Open-Sources BrowseSafe To Combat Prompt Injection In AI Browsing

by Alisa Davidson

Published: December 04, 2025 at 8:29 am Updated: December 04, 2025 at 8:29 am

by Anastasiia O

Edited and fact-checked: December 04, 2025 at 8:29 am

In Brief

Perplexity open-sourced BrowseSafe, a security tool designed to protect AI browser assistants from malicious instructions hidden in web pages.

Perplexity AI Open-Sources BrowseSafe To Combat Prompt Injection In AI Browsing

Perplexity AI, the company behind the AI-driven Perplexity search engine, announced the release of BrowseSafe, an open research benchmark and content-detection model designed to enhance user safety as AI agents begin operating directly within the browser environment.

As AI assistants move beyond traditional search interfaces and begin performing tasks inside web browsers, the structure of the internet is expected to shift from static pages to agent-driven interactions. In this model, the browser becomes a workspace where an assistant can take action rather than simply provide answers, creating a need for systems that ensure the assistant consistently acts in the user’s interest.

BrowseSafe is a specialized detection model trained to evaluate a single core question: whether a webpage’s HTML contains harmful instructions intended to manipulate an AI agent. While large, general-purpose models can assess these risks accurately, they are typically too resource-intensive for continuous real-time scanning. BrowseSafe is designed to analyze complete webpages quickly without affecting browser performance. Alongside the model, the company is releasing BrowseSafe-Bench, a testing suite intended to support ongoing evaluation and improvement of defense mechanisms.

The rise of AI-based browsing also introduces new cybersecurity challenges that require updated protective strategies. The company previously outlined how its Comet system applies multiple layers of defense to keep agents aligned with user intent, even in cases where websites attempt to alter agent behavior through prompt injection. The latest explanation focuses on how these threats are defined, tested using real-world attack scenarios, and incorporated into models trained to identify and block harmful instructions quickly enough for safe deployment inside the browser.

Prompt injection refers to malicious language inserted into text that an AI system processes, with the goal of redirecting the system’s behavior. In a browser setting, agents read entire pages, allowing such attacks to be embedded in areas like comments, templates, or extended footers. These hidden instructions can influence agent actions if not properly detected. They may also be written in subtle or multilingual formats, or concealed in HTML elements that do not appear visually on the page—such as data attributes or unrendered form fields—which users do not see but AI systems still interpret.

BrowseSafe-Bench: Advancing Agent Security In Real-World Web Environments

In order to analyze prompt-injection threats in an environment similar to real-world browsing, the company developed BrowseSafe, a detection model that has been trained and released as open source, along with BrowseSafe-Bench, a public benchmark containing 14,719 examples modeled after production webpages. The dataset incorporates complex HTML structures, mixed-quality content, and a wide range of both malicious and benign samples that differ by attacker intent, placement of the injected instruction within the page, and linguistic style. It covers 11 attack categories, nine injection methods ranging from hidden elements to visible text blocks, and three styles of language, from direct commands to more subtle, indirect phrasing.

Under the defined threat model, the assistant operates in a trusted environment, while all external web content is treated as untrusted. Malicious actors may control entire sites or insert harmful text—such as descriptions, comments, or posts—into otherwise legitimate pages that the agent accesses. To mitigate these risks, any tool capable of returning untrusted data, including webpages, emails, or files, is flagged, and its raw output is processed by BrowseSafe before the agent can interpret or act on it. BrowseSafe functions as one component of a broader security strategy that includes scanning incoming content, limiting tool permissions by default, and requiring user approval for certain sensitive operations, supplemented by standard browser protections. This layered approach is intended to support the use of capable browser-based assistants without compromising safety.

Testing results on BrowseSafe-Bench highlight several trends. Direct forms of attack, such as attempts to extract system prompts or redirect information via URL paths, are among the simplest for models to detect. Multilingual attacks, along with versions written in indirect or hypothetical phrasing, tend to be more difficult because they avoid lexical cues that many detection systems rely on. The location of the injected text also plays a role. Instances hidden in HTML comments are detected relatively effectively, whereas those placed in visible sections like footers, table cells, or paragraphs are more challenging, revealing a structural weakness in the handling of non-hidden injections. Improved training with well-designed examples can raise detection performance across these cases.

BrowseSafe and BrowseSafe-Bench are available as open-source resources. Developers working on autonomous agents can use them to reinforce defenses against prompt injection without needing to build protection systems independently. The detection model can run locally and flag harmful instructions before they reach an agent’s core decision-making layer, with performance optimized for scanning full pages in real time. BrowseSafe-Bench’s large set of realistic attack scenarios offers a means to stress-test models against the complex HTML patterns that typically compromise standard language models, while chunking and parallel scanning techniques help agents process large, untrusted pages efficiently without exposing users to elevated risk.

Tags:

Disclaimer

In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.

About The Author

Alisa, a dedicated journalist at the MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.

Alisa Davidson

Hot Stories

News Report

Rhino.fi Launches Stablecoin 1:1, Enabling Neobanks And Fintech Firms To Settle Stablecoins Like Dollars

by Alisa Davidson

March 19, 2026

Interview Business Technology

COO Of MEXC On Why AI Agents, RWAs, And Hybrid Models Will Reshape The CEX Landscape

by Alisa Davidson

March 19, 2026

News Report Technology

Inflectiv Introduces AVP To Standardize Secure Credential Management For AI Agents

by Alisa Davidson

March 19, 2026

News Report Technology

What Saeed Al Fahim From Tharwa Sees In Web3 That Many Institutions Are Still Evaluating

by Alisa Davidson

March 19, 2026

Perplexity AI Open-Sources BrowseSafe To Combat Prompt Injection In AI Browsing

BrowseSafe-Bench: Advancing Agent Security In Real-World Web Environments

Disclaimer

About The Author

COO Of MEXC On Why AI Agents, RWAs, And Hybrid Models Will Reshape The CEX Landscape

Inflectiv Introduces AVP To Standardize Secure Credential Management For AI Agents

What Saeed Al Fahim From Tharwa Sees In Web3 That Many Institutions Are Still Evaluating

Google Transforms Stitch Into AI-Driven Design Canvas For Fast UI Creation And Collaborative Prototyping

Rhino.fi Launches Stablecoin 1:1, Enabling Neobanks And Fintech Firms To Settle Stablecoins Like Dollars

Inflectiv Introduces AVP To Standardize Secure Credential Management For AI Agents

What Saeed Al Fahim From Tharwa Sees In Web3 That Many Institutions Are Still Evaluating

Google Transforms Stitch Into AI-Driven Design Canvas For Fast UI Creation And Collaborative Prototyping

The Calm Before The Solana Storm: What Charts, Whales, And On-Chain Signals Are Saying Now

Crypto In April 2025: Key Trends, Shifts, And What Comes Next