
September 06, 2023

LLaVA vs. GPT-4: An Open-Source AI Showdown Highlighting Multimodal Potential and Mathematical Limitations

by Damir Yalalov

Published: September 06, 2023 at 4:27 pm Updated: September 06, 2023 at 4:28 pm

by Danil Myakin

Edited and fact-checked: September 06, 2023 at 4:27 pm

At the recent presentation of GPT-4, one of the standout features was its ability to hold conversations that include images. However, this capability has yet to be integrated into OpenAI’s offering. While we previously highlighted Bing’s competence in this area, an open-source solution has now emerged in the form of the “Large Language and Vision Assistant” (LLaVA).


LLaVA’s Multimodal Potential

LLaVA is an open-source multimodal AI that combines language and vision processing. A demo of LLaVA is available online.

LLaVA’s description of the test image: “The odd thing about this picture is that a man is holding a clothesline with a white shirt on it as he leans out of the window of a yellow taxi. This situation is unusual because it is uncommon to see people leaning out of car windows while holding clothing or undergarments. The man is probably attempting to dry his shirt in an unsafe and inefficient manner while the taxi is moving.”

We ran a simple test by giving LLaVA a picture of a man and a taxi, and it returned a descriptive analysis. However, our attempts to challenge it with mathematical problems, similar to those tackled by Bing, proved futile. LLaVA appears to struggle with mathematics despite its proficiency in image recognition.

Challenges with Mathematical Tasks

We presented LLaVA with a geometry problem based on the Pythagorean theorem, akin to those successfully solved by Bing. Unfortunately, LLaVA did not provide the correct solution: it gave the square root of 169 as approximately 13.2, when the exact value is 13.

LLaVA’s answer to the query ran as follows:

The task at hand involves determining the length of the diagonal of a triangle with sides measuring 5 cm and 12 cm. To accomplish this, we can employ the Pythagorean theorem, which states that in a right-angled triangle, the square of the hypotenuse’s length (the side opposite the right angle) equals the sum of the squares of the other two sides.

In this scenario, the hypotenuse corresponds to the diagonal of the triangle, which we can ascertain by taking the square root of the sum of the squares of the two given sides.

To begin, we must find the length of the diagonal, which is the side connecting the two non-adjacent vertices of the triangle. In this instance, those two vertices are associated with the sides measuring 5 cm and 12 cm.

Now, let’s apply the Pythagorean theorem:

(5 cm)^2 + (12 cm)^2 = diagonal^2

5^2 + 12^2 = diagonal^2

25 + 144 = diagonal^2

169 = diagonal^2

Finally, we can determine the length of the diagonal by taking the square root of both sides:

√169 = diagonal

The length of the diagonal is approximately 13.2 cm.
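For reference, the arithmetic in the steps above can be checked directly. The following is a minimal Python sketch, not part of the original test; the helper name `hypotenuse` is our own illustrative choice. It suggests that the steps up to 169 are sound and that only the final square root is wrong: the result is exactly 13, not 13.2.

```python
import math


def hypotenuse(a: float, b: float) -> float:
    """Return the hypotenuse of a right triangle with legs a and b."""
    return math.hypot(a, b)


# Legs from the problem above, in centimetres.
legs_squared = 5**2 + 12**2        # 25 + 144
print(legs_squared)                # 169
print(math.isqrt(legs_squared))    # 13 (exact integer square root)
print(hypotenuse(5, 12))           # 13.0
```

Since 13 × 13 = 169, the hypotenuse is a whole number and no approximation is needed.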

An Unusual Perspective on Images

LLaVA excels at conversing about images, although challenges persist, particularly in mathematical problem-solving. For example, when presented with an image of a man leaning out of a yellow taxi window while holding a clothesline with a white shirt on it, LLaVA correctly identified what made the scene odd. It noted that the scene is atypical, as it is uncommon to see people leaning out of car windows while holding clothing. It also suggested that the man may be attempting an unconventional and potentially unsafe method of drying his shirt while the taxi is in motion.

While LLaVA offers promising multimodal capabilities, particularly in conversing about images, it is limited in mathematical problem-solving. Google’s capabilities in this regard surpass LLaVA’s, as demonstrated by a more accurate solution to a similar mathematical problem.

The development of AI with multimodal capabilities is undoubtedly an exciting advancement, and LLaVA is a commendable open-source effort in this direction. However, improvements are needed to enhance its mathematical reasoning capabilities to match its proficiency in image analysis.

For a more accurate mathematical solution, Google’s capabilities are currently superior: Google’s Mathematical Problem Solver.



About The Author

Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract an audience of over a million users every month. He has 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to succeed in the ever-changing landscape of the internet.

