July 12, 2023

AI Inaccuracy Strikes Again: ChatGPT Competitor Claude 2 Flunks Scientific Accuracy Test Like Other LLMs

In Brief

Anthropic released ChatGPT rival Claude 2 on Tuesday.

Unlike ChatGPT, Claude 2 lets users upload files such as PDF and TXT, and it can summarize the content of web links.

However, Claude 2 flunked a scientific accuracy test that other LLMs like Bard, GPT-4 and StableVicuna have also failed.

On Tuesday, Anthropic released Claude 2, the latest update to its Claude large language model/chatbot, just five months after launching Claude.

Claude 2 is widely regarded as a formidable competitor to OpenAI’s ChatGPT. Its beta chat experience is free to use and comes with improvements in coding, mathematics, and reasoning.

It can also generate longer responses and can be accessed via API. According to Anthropic, the chatbot scores 76% on the multiple-choice section of the bar exam, ranks in the 90th percentile on the GRE writing exam, and can produce documents running to thousands of tokens. Currently, Claude 2 is only available to users in the US and UK.

Claude 2 vs ChatGPT

Unlike ChatGPT, which only generates responses to text prompts, Claude 2 has a native Files Load feature that allows users to upload files such as PDF, TXT and CSV, extract and summarize text from PDF files, and present the information in a table format. Users can also feed the chatbot a web link, and Claude 2 will summarize the content behind the link.

With Claude 2, users can input up to 100,000 tokens (about 75,000 words) per prompt, a significant increase from its previous 9,000-token limit. This means that the chatbot can now process vast volumes of technical documentation, and even entire books. In contrast, OpenAI’s GPT-4 model provides a context limit of 8,000 tokens, with a separate extended model accommodating up to 32,000 tokens for specific use cases.
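To put those limits in perspective, the short Python sketch below converts each quoted token limit into an approximate word count. It is a rough illustration only: the 0.75 words-per-token ratio is an assumption derived from the article's own figures (100,000 tokens for 75,000 words), real tokenizers vary by model and by text, and the names `approx_words` and `fits` are hypothetical helpers, not part of any vendor's API.

```python
# Words-per-token ratio implied by the figures quoted in the article
# (100,000 tokens ~ 75,000 words). An approximation, not a real tokenizer.
WORDS_PER_TOKEN = 75_000 / 100_000

# Context limits in tokens, exactly as quoted in the article.
CONTEXT_LIMITS = {
    "Claude 2": 100_000,
    "Claude (previous limit)": 9_000,
    "GPT-4": 8_000,
    "GPT-4 (extended)": 32_000,
}


def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)


def fits(word_count: int, model: str) -> bool:
    """Check whether a text of word_count words fits the model's quoted limit."""
    return word_count <= approx_words(CONTEXT_LIMITS[model])


if __name__ == "__main__":
    for model, tokens in CONTEXT_LIMITS.items():
        print(f"{model}: {tokens:,} tokens ~ {approx_words(tokens):,} words")
```

Under this estimate, a 60,000-word manuscript would fit within Claude 2's limit in a single prompt but would exceed the 32,000-token GPT-4 model, which holds roughly 24,000 words.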

Sully Omar, the co-founder of the AI agent Cognosys.ai, said that Claude 2 is “cheaper and quicker than GPT4,” albeit with a slight lag in output performance.

However, Claude 2 only supports the most widely spoken languages, including English, Spanish, Portuguese, French, Mandarin, and German, while ChatGPT supports over 80 languages.

Claude 2 fails scientific accuracy test

With all the improvements made to Claude 2, expectations for better accuracy in the chatbot were high. Alexandros Marinos, the founder of the container-based tech platform Balena, took it upon himself to put Claude 2 to the test.

Marinos asked Claude 2 a standard question he devised specifically for evaluating the accuracy of large language models (LLMs). The question was: “Does natural immunity to Covid-19 from a previous infection provide better protection compared to vaccination for someone who has not been infected?”

To Marinos’ disappointment, Claude 2 generated talking points and information dating back to 2021 that were “knowably false,” and even included debunked content from 2020.

Claude 2’s performance echoed that of other LLMs that Marinos had evaluated before, such as Bard, ChatGPT4, GPT4 (API) and StableVicuna. When a Twitter user questioned the tendency of LLMs to “simply regurgitate the talking points they are fed with,” Marinos responded, “With more recent data the answers tend to be better in general.”

However, the test demonstrated that Claude 2, like other LLMs, is not consistently supplied with the latest information, highlighting the persistent problem of accuracy in LLMs as a whole.

About The Author

Cindy is a journalist at Metaverse Post, covering topics related to web3, NFT, metaverse and AI, with a focus on interviews with Web3 industry players. She has spoken to over 30 C-level execs and counting, bringing their valuable insights to readers. Originally from Singapore, Cindy is now based in Tbilisi, Georgia. She holds a Bachelor's degree in Communications & Media Studies from the University of South Australia and has a decade of experience in journalism and writing. Get in touch with her via [email protected] with press pitches, announcements and interview opportunities.

Cindy Tan