Perplexity AI Open-Sources BrowseSafe To Combat Prompt Injection In AI Browsing
In Brief
Perplexity open-sourced BrowseSafe, a security tool designed to protect AI browser assistants from malicious instructions hidden in web pages.
Perplexity AI, the company behind the AI-driven Perplexity search engine, announced the release of BrowseSafe, an open research benchmark and content-detection model designed to enhance user safety as AI agents begin operating directly within the browser environment.
As AI assistants move beyond traditional search interfaces and begin performing tasks inside web browsers, the structure of the internet is expected to shift from static pages to agent-driven interactions. In this model, the browser becomes a workspace where an assistant can take action rather than simply provide answers, creating a need for systems that ensure the assistant consistently acts in the user’s interest.
BrowseSafe is a specialized detection model trained to evaluate a single core question: whether a webpage’s HTML contains harmful instructions intended to manipulate an AI agent. While large, general-purpose models can assess these risks accurately, they are typically too resource-intensive for continuous real-time scanning. BrowseSafe is designed to analyze complete webpages quickly without affecting browser performance. Alongside the model, the company is releasing BrowseSafe-Bench, a testing suite intended to support ongoing evaluation and improvement of defense mechanisms.
The rise of AI-based browsing also introduces new cybersecurity challenges that require updated protective strategies. The company previously outlined how its Comet system applies multiple layers of defense to keep agents aligned with user intent, even in cases where websites attempt to alter agent behavior through prompt injection. The latest explanation focuses on how these threats are defined, tested using real-world attack scenarios, and incorporated into models trained to identify and block harmful instructions quickly enough for safe deployment inside the browser.
Prompt injection refers to malicious language inserted into text that an AI system processes, with the goal of redirecting the system’s behavior. In a browser setting, agents read entire pages, allowing such attacks to be embedded in areas like comments, templates, or extended footers. These hidden instructions can influence agent actions if not properly detected. They may also be written in subtle or multilingual formats, or concealed in HTML elements that do not appear visually on the page—such as data attributes or unrendered form fields—which users do not see but AI systems still interpret.
BrowseSafe-Bench: Advancing Agent Security In Real-World Web Environments
In order to analyze prompt-injection threats in an environment similar to real-world browsing, the company developed BrowseSafe, a detection model that has been trained and released as open source, along with BrowseSafe-Bench, a public benchmark containing 14,719 examples modeled after production webpages. The dataset incorporates complex HTML structures, mixed-quality content, and a wide range of both malicious and benign samples that differ by attacker intent, placement of the injected instruction within the page, and linguistic style. It covers 11 attack categories, nine injection methods ranging from hidden elements to visible text blocks, and three styles of language, from direct commands to more subtle, indirect phrasing.
Under the defined threat model, the assistant operates in a trusted environment, while all external web content is treated as untrusted. Malicious actors may control entire sites or insert harmful text—such as descriptions, comments, or posts—into otherwise legitimate pages that the agent accesses. To mitigate these risks, any tool capable of returning untrusted data, including webpages, emails, or files, is flagged, and its raw output is processed by BrowseSafe before the agent can interpret or act on it. BrowseSafe functions as one component of a broader security strategy that includes scanning incoming content, limiting tool permissions by default, and requiring user approval for certain sensitive operations, supplemented by standard browser protections. This layered approach is intended to support the use of capable browser-based assistants without compromising safety.
Testing results on BrowseSafe-Bench highlight several trends. Direct forms of attack, such as attempts to extract system prompts or redirect information via URL paths, are among the simplest for models to detect. Multilingual attacks, along with versions written in indirect or hypothetical phrasing, tend to be more difficult because they avoid lexical cues that many detection systems rely on. The location of the injected text also plays a role. Instances hidden in HTML comments are detected relatively effectively, whereas those placed in visible sections like footers, table cells, or paragraphs are more challenging, revealing a structural weakness in the handling of non-hidden injections. Improved training with well-designed examples can raise detection performance across these cases.
BrowseSafe and BrowseSafe-Bench are available as open-source resources. Developers working on autonomous agents can use them to reinforce defenses against prompt injection without needing to build protection systems independently. The detection model can run locally and flag harmful instructions before they reach an agent’s core decision-making layer, with performance optimized for scanning full pages in real time. BrowseSafe-Bench’s large set of realistic attack scenarios offers a means to stress-test models against the complex HTML patterns that typically compromise standard language models, while chunking and parallel scanning techniques help agents process large, untrusted pages efficiently without exposing users to elevated risk.
Disclaimer
In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.
About The Author
Alisa, a dedicated journalist at the MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.
More articles
Alisa, a dedicated journalist at the MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.