May 30, 2023

GPT-4’s Performance on U.S. Bar Exam Contradicts OpenAI’s Claims

A recent examination of GPT-4’s performance on the Uniform Bar Exam (UBE) casts doubt on OpenAI’s claims about the model’s results. Contrary to the initial assertion that GPT-4 outperforms 90% of test takers, the findings point to a significant gap between the estimated and the actual performance of the model. The result underscores the importance of transparent evaluation procedures and accessible data for validating such claims.

The examination looked at several factors to gauge GPT-4’s true capabilities. First, an analysis of the February exams in Illinois showed that GPT-4’s scores approached the 90th percentile. However, the February pool is heavily weighted toward retakers who had previously failed the July exam and who tend to score below the overall average, which inflates the percentile.

Furthermore, the results of the July exam contradicted OpenAI’s claims: measured against July test takers, GPT-4 would outperform only 68% of people overall and 48% on the essays. Measured against first-time takers only (excluding retakers), using official data from several exam administrations, GPT-4 placed at the 63rd percentile overall, with its essays considerably lower at the 41st percentile.

A further comparison considered only those who passed the exam, that is, licensed individuals and those awaiting licensing. Against this group, GPT-4’s overall performance ranked at the 48th percentile, with its essays faring even worse at the 15th percentile.
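To make the point about reference groups concrete, here is a minimal Python sketch of how one fixed score can land at very different percentiles depending on who it is compared against. Every score list below is invented for illustration and is not real bar exam data; the pool names, the `percentile_rank` helper, and the 270-point passing cutoff are likewise assumptions. Only the score of 298 comes from this report.

```python
from bisect import bisect_left

def percentile_rank(score, population):
    """Return the percentage of scores in `population` strictly below `score`."""
    ordered = sorted(population)
    return 100.0 * bisect_left(ordered, score) / len(ordered)

# Hypothetical scores, invented for illustration only (NOT real exam data).
retaker_heavy_pool = [240, 250, 255, 260, 262, 268, 270, 275, 280, 300]
first_time_takers = [265, 272, 280, 285, 290, 295, 299, 305, 310, 320]

# Assumed passing cutoff, used here only to build a "passers" group.
ASSUMED_PASSING_SCORE = 270
passers = [s for s in first_time_takers if s >= ASSUMED_PASSING_SCORE]

model_score = 298  # the score reported for GPT-4

pools = {
    "retaker-heavy pool": retaker_heavy_pool,
    "first-time takers": first_time_takers,
    "passers only": passers,
}

# Same score, three different comparison groups.
ranks = {name: percentile_rank(model_score, scores) for name, scores in pools.items()}

for name, rank in ranks.items():
    print(f"{name}: {rank:.1f}th percentile")
```

With these made-up numbers the same score of 298 falls at the 90th percentile in the retaker-heavy pool, the 60th among first-time takers, and roughly the 56th among passers. The actual figures depend entirely on the real score distributions, which is why access to official aggregated data matters.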

While these findings are troubling, it is important to allow for the possibility of human error in the review process. The author of the article stresses the need to understand which sample the researchers used to evaluate GPT-4’s performance. The lack of official data, especially in aggregated form, makes it difficult to compare percentiles fairly. Clear, accessible evaluation methods that all stakeholders can scrutinize are essential.

In response to these concerns, OpenAI is urged to address the discrepancies and provide further insights into the evaluation process. Transparency and openness are essential for gaining trust and ensuring the credibility of AI models in high-stakes domains such as law.

It should be noted that the article does not discuss the specific score achieved by GPT-4, which is reported to be 298. Evaluating the significance of this score requires a contextual understanding of the grading system used. Just as a child coming home from school with a B could be either a cause for celebration or disappointment, the interpretation of GPT-4’s score depends on the scale employed.

The assessment of GPT-4’s performance on the bar exam raises serious concerns about the veracity of OpenAI’s initial assertions. The gap between estimated and actual performance emphasizes the importance of clear evaluation systems and easily accessible data. OpenAI is encouraged to address these challenges and develop a more inclusive and reliable approach to AI model evaluation.

Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract a massive audience of over a million users every month. He appears to be an expert with 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet. 

Damir Yalalov