Google’s Gemini AI Exposed: Researchers Uncover Susceptibility to Malicious Prompts and Data Leaks

by Victoria d'Este

Published: May 16, 2024 at 11:02 am Updated: May 16, 2024 at 11:02 am

by Ana

Edited and fact-checked: May 16, 2024 at 11:02 am

In Brief

Google’s Gemini LLM is vulnerable to security flaws, potentially allowing malicious actors to manipulate it for offensive content, private information disclosure, and indirect hacking attacks.

It has been discovered that Google’s recently released Gemini large language model (LLM) is susceptible to a number of security flaws that might enable malevolent actors to manipulate it into producing offensive material, disclosing private information, and executing indirect hacking attacks. The cybersecurity company HiddenLayer released the results, which point out flaws that affect both businesses using the Gemini API and consumers utilising the Gemini Advanced.

The Gemini Suite and Its Potential

Google’s most recent line of massive language models, called Gemini, is intended to be a multimodal AI system that can consume and produce code, pictures, audio, video, and text. There are now three primary models in the suite:

Gemini Nano – Designed for light apps and on-device computing.

Gemini Pro – Designed to scale effectively over a broad range of applications and workloads.

Gemini Ultra – The largest and most powerful model, built to tackle complex queries and reason with advanced logic.

While Gemini has drawn comparisons to OpenAI’s GPT-4 and positioned itself as a rival, the multimodal nature of the models sets them apart, having been trained on a diverse array of data formats beyond just text. This versatility positions Gemini as a potential powerhouse for industries looking to integrate AI capabilities across various media types and workflows.

Vulnerability 1 – System Prompt Leakage

Researchers at HiddenLayer have found that one of the main weaknesses is the capacity to derive system cues from Gemini models. The first instructions given to large language models to set parameters for behaviour, persona, and output constraints are system prompts. In order to preserve the integrity and security of the models, it is imperative that these fundamental instructions be better protected, as this extraction capacity draws attention to possible security threats.

Instead of asking for the system prompt directly, researchers might ask Gemini to develop its instructions in order to skip security protections and trick the model into revealing this crucial information. The problem stems from the fact that, despite modifications made to prevent direct exposure to system prompts, Gemini is susceptible to synonym attacks that might circumvent these safeguards.

This vulnerability has grave consequences as it may enable attackers to reverse-engineer the fundamental limitations and rules of the model by granting them access to system prompts. This data might then be utilised to create more effective assaults or make it possible to extract further private information that is available to the LLM.

Vulnerability 2 – Prompted Jailbreak

Another group of weaknesses relates to the capacity to “jailbreak” Gemini models, so getting over the intended limitations and causing them to produce material that might be damaging or unlawful. This was illustrated by researchers who, in spite of the model’s defences against producing such fake material, managed to fool Gemini into writing an article with misleading information on the 2024 U.S. presidential election. This emphasises how much stronger security measures are required to guarantee that AI systems can properly guard against abuse and maintain the integrity of the material.

The technique used here involves instructing Gemini to enter a “fictional state” where it is allowed to generate untrue content under the guise of writing a fictional story. Once in this state, the researchers could then prompt the model to create detailed articles or guides on topics it would typically be prohibited from discussing, such as how to hotwire a car.

This vulnerability highlights a concerning ability for bad actors to manipulate Gemini into spreading misinformation, particularly around sensitive topics like elections or political events. It also raises alarms about the potential for Gemini to be misused in generating dangerous or illegal instructional content, circumventing the ethical guardrails put in place by its developers.

Vulnerability 3 – Reset Simulation

A third flaw found by HiddenLayer is the potential to cause Gemini to leak data by entering a series of unusual or illogical tokens in response to system prompts. Researchers discovered that they could confuse the model and get it to produce confirmation messages that contained information from its basic instructions by continually entering specific characters or word strings.

The technique used by huge language models to distinguish between system prompts and user input is what makes this assault possible. The researchers could effectively fool Gemini into thinking it was being asked to answer by providing it with a series of unusual tokens, leading it to unintentionally expose private data from its core code.

While this vulnerability may seem less severe than the others, it still represents a potential avenue for attackers to extract internal data from Gemini models, which could then be used to inform more sophisticated attacks or identify weaknesses to exploit.

Indirect Injection Attacks via Google Workspace

A more alarming discovery made by HiddenLayer pertains to the potential for indirect injection threats on Gemini through the Google Workspace interface. Researchers simulated how an attacker may manipulate a user’s interactions with the model and override Gemini’s intended behaviour by creating a specially prepared Google Document with harmful instructions.

The implications of this vulnerability are far-reaching, as it opens the door to potential phishing attacks or other social engineering tactics leveraging the trust and integration of Google’s ecosystem.

Addressing the Vulnerabilities and Broader Implications

These threats are indeed concerning, but it’s crucial to recognise that they are not exclusive to Google’s LLM suite. Many of these weaknesses, including susceptibility to prompt injection attacks and content manipulation, have been identified in other large language models throughout the industry. This highlights a broader challenge in the field, underscoring the need for comprehensive security measures and continuous improvement across all AI platforms.

Recognising the results, Google said it frequently does red-teaming drills and training initiatives to ward off aggressive attempts of this kind. Additionally, the corporation has put in place safeguards—which it claims are always being improved—to avoid damaging or deceptive answers.

However, the findings made by HiddenLayer are a clear warning of the possible dangers that come with adopting and deploying massive language models, especially in sensitive or business use cases. It is essential that developers and organisations give careful testing and security measures top priority in order to reduce vulnerabilities as these potent AI systems proliferate.

Beyond just Google, the findings also underscore the broader need for the AI industry to collectively address the challenge of prompt injection attacks, model manipulation, and content generation risks. As large language models advance and become more capable, the potential for misuse and abuse will only grow more significant.

Efforts to establish industry-wide best practices, security frameworks, and responsible development guidelines will be crucial to ensuring the safe and ethical deployment of these powerful AI technologies.

In the meantime, organisations considering the use of Gemini or other large language models would be wise to be cautious and implement robust security controls. This may include policies around the handling of sensitive data, rigorous vetting of model prompts and inputs, and continuous monitoring for potential vulnerabilities or misuse.

The journey towards safe and trustworthy AI is ongoing, and the vulnerabilities uncovered in Gemini serve as a reminder that vigilance and proactive security measures are paramount as these technologies continue to advance and proliferate.

Tags:

Disclaimer

In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.

About The Author

Victoria is a writer on a variety of technology topics including Web3.0, AI and cryptocurrencies. Her extensive experience allows her to write insightful articles for the wider audience.

Victoria d'Este

Victoria is a writer on a variety of technology topics including Web3.0, AI and cryptocurrencies. Her extensive experience allows her to write insightful articles for the wider audience.

Hot Stories