In Brief
Researchers created a system that combines large language models for autonomous design, planning, and execution of scientific experiments, demonstrating its research capabilities in three different cases.
The model wrote code to solve chemical equations and calculate how much of each substance a reaction requires.
The article “Emergent autonomous scientific research capabilities of large language models” explores a system that combines several large language models for the autonomous design, planning, and execution of scientific experiments. It demonstrates the agent’s research capabilities in three different cases, the most difficult of which is the successful planning and execution of catalyzed reactions.

The main points of the article are:
- The researchers used a library that lets them write Python code and send commands to a laboratory apparatus that carries out experiments (mixing and transferring substances); a sketch of this kind of API follows this section.
- They used GPT-4 to search the Internet and the library’s documentation, and gave it the ability to run Python code (to execute experiments);
- A top-level planner (also GPT-4) analyzes the original request and draws up a “research plan” (see the planner-loop sketch below).
- GPT-4 does a good job at simple non-chemical tasks, such as drawing shapes on a well plate by filling the correct cells with substances.
- They then tried a more complex, applied task of actually conducting a reaction; the model coped well and acted logically.
- Then they gave the model several experiment-design tasks; however, no real experiments were run to validate what the model produced.
- Along the way, the model repeatedly wrote code to solve chemical equations and estimate how much of each substance the reaction needs (see the stoichiometry sketch below).
- It was also asked to create a cure for cancer. The model approached the analysis logically and methodically: first, it “looked” online for current trends in anticancer drug discovery; next, it chose a molecule to serve as the template for the drug and wrote code for its synthesis. Nobody ran that code (and I didn’t see an analysis of its adequacy).
- Finally, it was asked to synthesize several dangerous substances, such as drugs and poisons.
Here is the most interesting part: for some requests, the model refused outright (for example, heroin or mustard gas, an extremely dangerous chemical-warfare agent). For others, it started to Google how to make the substance but realized it could be used for illicit purposes and stopped. For the rest, it wrote a research plan and code for the synthesis.
This “refusal” is likely a result of GPT-4’s safety training: it analyzes the request, and if it is asked to do something illegal or dangerous, it declines to carry it out. It’s really cool that the effect of the alignment procedure is so noticeable here.
And at the end of the article, the authors urge all large companies developing LLMs to prioritize the safety of models.
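To make the setup above more concrete: the article doesn’t name the Python library, but liquid-handling robots of the kind described expose APIs in roughly this style. The sketch below follows the Opentrons Protocol API; the specific labware and pipette names are my assumptions, not details from the paper.

```python
# Illustrative sketch of a liquid-handler Python API (Opentrons-style).
# Labware and pipette names are assumptions; the paper's exact setup may differ.
from opentrons import protocol_api

metadata = {"apiLevel": "2.13"}

def run(protocol: protocol_api.ProtocolContext):
    plate = protocol.load_labware("corning_96_wellplate_360ul_flat", 1)
    reservoir = protocol.load_labware("nest_12_reservoir_15ml", 2)
    tips = protocol.load_labware("opentrons_96_tiprack_300ul", 3)
    pipette = protocol.load_instrument("p300_single_gen2", "right",
                                       tip_racks=[tips])

    # Fill the main diagonal of the 96-well plate -- the kind of
    # "draw a shape on the plate" task GPT-4 handled well.
    for i in range(8):
        target = plate.rows()[i][i]  # A1, B2, ..., H8
        pipette.transfer(100, reservoir["A1"], target)  # 100 uL per well
```

Code like this is exactly what the system needed GPT-4 to generate from the library’s documentation.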
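The planner side can be sketched as a loop that asks GPT-4 for code and then executes it. This is a minimal illustration assuming the OpenAI Python SDK; the prompt is invented, and the paper’s real system also had separate web-search and documentation-search components.

```python
# Minimal sketch of a "planner + code execution" loop, assuming the OpenAI
# Python SDK. Real systems sandbox the generated code; exec() is shown only
# to illustrate the idea.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def plan_and_run(request: str) -> None:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a research planner. Reply only with Python "
                        "code that carries out the requested experiment."},
            {"role": "user", "content": request},
        ],
    )
    generated_code = response.choices[0].message.content
    exec(generated_code)  # in practice: review, sandbox, then run on the robot

plan_and_run("Fill the main diagonal of the 96-well plate with dye.")
```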
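Finally, the stoichiometry code the model kept writing boils down to a simple calculation: given a balanced equation and molar masses, compute how many grams of each reactant a target amount of product requires. The reaction and quantities below are my own example, not the paper’s.

```python
# Stoichiometry sketch: 2 NaOH + H2SO4 -> Na2SO4 + 2 H2O
# Standard molar masses in g/mol; the target amount is an arbitrary example.
MOLAR_MASS = {"NaOH": 40.00, "H2SO4": 98.08}
COEFF = {"NaOH": 2, "H2SO4": 1}  # moles of reactant per mole of Na2SO4

def reagent_masses(product_moles: float) -> dict[str, float]:
    """Grams of each reactant needed for `product_moles` mol of Na2SO4."""
    return {s: product_moles * COEFF[s] * MOLAR_MASS[s] for s in MOLAR_MASS}

print(reagent_masses(0.5))
# {'NaOH': 40.0, 'H2SO4': 49.04}: 0.5 mol product needs 1 mol NaOH, 0.5 mol acid
```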
- Researchers at the University of California created the Machiavelli benchmark to measure how competent and how harmful AI agents are in broad environments of long-horizon language interactions. The benchmark gives agents realistic goals and frames choices as high-level decisions, abstracting away low-level interactions.
- The intellectual revolution marked by ChatGPT is really three intertwined revolutions: technological, techno-humanitarian, and socio-political. For a comprehensive view of what is happening, it is worth hearing three fresh perspectives from intellectuals in philosophy, history, and innovation.
- The story of the petition to pause the development of AI systems more advanced than GPT-4 has polarized society. One article gives examples of processes going in unexpected directions and notes that the petition leaves out the risks of malicious use and misuse of AI, arguing that we should be afraid of people, not of AI itself.