September 06, 2023

LLaVA vs. GPT-4: An Open-Source AI Showdown Highlighting Multimodal Potential and Mathematical Limitations

At the recent presentation of GPT-4, one of the standout features was its ability to engage in conversations enriched with images. However, this capability is yet to be integrated into OpenAI’s offering. While we previously highlighted Bing’s competence in this aspect, an open-source solution has now emerged in the form of the “Large Language and Vision Assistant” (LLaVA).

LLaVA’s Multimodal Potential

LLaVA is an open-source multimodal AI that combines language and vision processing. A public demo of LLaVA is available online.

The odd thing about this picture is that a man is holding a clothesline with a white shirt on it as he leans out of the window of a yellow taxi. This situation is unusual because it is uncommon to see people leaning out of car windows while holding clothing or undergarments. The man is probably attempting to dry his shirt in an unsafe and inefficient manner while the taxi is moving.

We ran a simple test by giving LLaVA a picture of a man leaning out of a taxi, and it returned the descriptive analysis above. However, our attempts to challenge it with mathematical problems, similar to those tackled by Bing, proved futile. LLaVA appears to struggle with mathematics, despite its proficiency in image recognition.

Challenges with Mathematical Tasks

We presented LLaVA with a mathematical problem involving a right triangle, akin to those successfully solved by Bing. Unfortunately, LLaVA did not arrive at the correct solution: it reported the square root of 169 as approximately 13.2, when the exact value is 13. Its response is reproduced below.

Let’s solve this query

The task at hand involves determining the length of the diagonal of a triangle with sides measuring 5 cm and 12 cm. To accomplish this, we can employ the Pythagorean theorem, which states that in a right-angled triangle, the square of the hypotenuse’s length (the side opposite the right angle) equals the sum of the squares of the other two sides.

In this scenario, the hypotenuse corresponds to the diagonal of the triangle, which we can ascertain by taking the square root of the sum of the squares of the two given sides.

To begin, we must find the length of the diagonal, which is the side connecting the two non-adjacent vertices of the triangle. In this instance, those two vertices are associated with the sides measuring 5 cm and 12 cm.

Now, let’s apply the Pythagorean theorem:

(5 cm)^2 + (12 cm)^2 = diagonal^2

5^2 + 12^2 = diagonal^2

25 + 144 = diagonal^2

169 = diagonal^2

Finally, we can determine the length of the diagonal by taking the square root of both sides:

√169 = diagonal

The length of the diagonal is approximately 13.2 cm.
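
The steps in LLaVA's answer hold up until the last line; only the final square root is wrong. As a quick sanity check (our own sketch, not part of LLaVA's output, with the helper name `hypotenuse` chosen purely for illustration), the arithmetic can be reproduced in Python:

```python
import math


def hypotenuse(a: float, b: float) -> float:
    """Return the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a ** 2 + b ** 2)


# Legs of the triangle from the problem, in centimetres.
leg_a, leg_b = 5, 12

# 5^2 + 12^2 = 25 + 144 = 169
sum_of_squares = leg_a ** 2 + leg_b ** 2

# Square root of the sum of squares gives the hypotenuse.
diagonal = hypotenuse(leg_a, leg_b)

print(sum_of_squares)  # 169
print(diagonal)        # 13.0
```

Because 169 is a perfect square, the result is exactly 13 cm rather than approximately 13.2 cm.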

An Unusual Perspective on Images

LLaVA excels in its ability to converse with images, although challenges persist, particularly in mathematical problem-solving. For example, when presented with an image of a man leaning out of a yellow taxi window, holding a clothesline with a white shirt, LLaVA provided an unusual perspective. It suggested that such a scene is atypical, as it is not common to witness individuals leaning out of car windows while holding clothing. The analysis indicated that the man may be attempting an unconventional and potentially unsafe method of drying his shirt while the taxi is in motion.

While LLaVA offers promising multimodal capabilities, particularly in conversing with images, it faces limitations in mathematical problem-solving. It’s worth noting that Google’s capabilities in this regard surpass LLaVA’s, as demonstrated by a more accurate solution to a similar mathematical problem.

The development of AI with multimodal capabilities is undoubtedly an exciting advancement, and LLaVA is a commendable open-source effort in this direction. However, improvements are needed to enhance its mathematical reasoning capabilities to match its proficiency in image analysis.

For a more accurate mathematical solution, Google’s mathematical problem solver is currently the better option.

Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract a massive audience of over a million users every month. He appears to be an expert with 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet. 

Damir Yalalov