Anthropic Analyzes AI Safety through Biorisk Assessment

by Damir Yalalov

Published: July 27, 2023 at 10:39 am Updated: July 27, 2023 at 10:40 am

by Danil Myakin

Edited and fact-checked: July 27, 2023 at 10:39 am

In Brief

Anthropic, founded by ex-OpenAI employees, has conducted a project to assess the potential biorisk posed by AI models.

The project involved experts spending over 150 hours working with Anthropic’s advanced models, speculated to be “Claude 2,” to gain a deeper understanding of their proficiency.

The research found that advanced models, including Claude 2 and GPT-4, can provide detailed, expert-grade knowledge, but how often they do so varies across subjects.

Anthropic emphasizes the urgency of addressing safety concerns, stating that risks could become pronounced in as little as two to three years.

Anthropic shared insights from their project aimed at assessing the potential risks associated with AI models in the realm of biorisk. The main focus was to understand the model’s capabilities concerning harmful biological information, such as specifics related to bioweapons.

Over six months, experts spent more than 150 hours working with Anthropic’s advanced models, speculated to be “Claude 2”, to gain a deeper understanding of these models’ proficiency. The process involved devising special prompts, termed “jailbreaks”, designed to evaluate the accuracy of the model’s responses. Quantitative methods were also used to measure the model’s capabilities.

While the in-depth results and specific details of the research remain undisclosed, the post offers an overview of the project’s key findings and takeaways. Advanced models, including Claude 2 and GPT-4, can furnish detailed, expert-grade knowledge, though how often they produce such precise information varies across subjects. Another significant observation is that these models become more capable as they grow in size.

One of the main concerns stemming from this research is the potential misuse of these models in biology. Anthropic’s research suggests that Large Language Models (LLMs), if deployed without rigorous supervision, could inadvertently facilitate and expedite malicious attempts in the biological domain. Such threats, though currently deemed minor, are projected to grow as LLMs continue to evolve.

Anthropic emphasizes the urgency of addressing these safety concerns, highlighting that the risks could become pronounced in a time frame as short as two to three years, rather than an extended five-year period or longer. The insights gleaned from the study have prompted the team to recalibrate their research direction, placing an enhanced emphasis on models that interface with tangible, real-world tools.

For a more detailed perspective, especially concerning GPT-4’s capabilities in mixing chemicals and conducting experiments, readers are encouraged to refer to supplementary sources that look more closely at how language models could carry out physical experiments.

Recently, we shared an article discussing the creation of a system that combines multiple large language models for the autonomous design, planning, and execution of scientific experiments. The system demonstrates the research capabilities of the Agent in three different cases, the most challenging being the successful implementation of catalyzed reactions. It includes a library that allows Python code to be written and transferred to a special apparatus for conducting experiments. The system is connected to GPT-4, which acts as a top-level scheduler that analyzes the original request and draws up a research plan.

The model has been tested with simple non-chemical tasks, such as creating shapes on a chemical board and filling cells correctly with substances. However, real experiments were not carried out; instead, the model repeatedly wrote out chemical equations to work out the amount of substance needed for a reaction. The model has also been asked to synthesize dangerous substances such as drugs and poisons.

For some requests, such as heroin or the chemical warfare agent mustard gas, the model refuses to work. In these cases the alignment performed by the OpenAI team takes effect: the model recognizes that it is being asked to do something wrong and refuses. The effect of the alignment procedure is noticeable, and it should encourage large companies developing LLMs to prioritize the safety of their models.

MPost’s Opinion: Anthropic has shown a proactive approach to understanding potential risks associated with their models. Investing over 150 hours in evaluating the model’s ability to provide harmful biological information demonstrates their commitment to understanding the potential negative consequences of their technology. Engaging experts to evaluate the model suggests a thorough and rigorous approach. External experts can provide a fresh perspective, unbiased by the development process, which helps make the assessment comprehensive. Anthropic has adapted its future research plan based on the findings from this study. Adjusting research directions in response to identified risks shows a willingness to act on potential threats to human safety. Anthropic has been open in sharing broad trends and conclusions from their research, but they purposefully haven’t published specifics. Given that disclosing this information might encourage misuse, this can be seen as a responsible choice. However, it also makes it challenging for outside parties to independently verify their claims. Their suggestion that particular threats may intensify in two to three years demonstrates forward thinking: anticipating future challenges allows for early intervention and the creation of safety measures. Given their focus on models that use real-world tools, they appear to be aware of the implications and risks of AI models interacting with physical systems.

About The Author

Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract an audience of over a million users every month. He has 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet.