September 06, 2023

LLaVA vs. GPT-4: An Open-Source AI Showdown Highlighting Multimodal Potential and Mathematical Limitations

At the recent presentation of GPT-4, one of the standout features was its ability to engage in conversations enriched with images. However, this capability is yet to be integrated into OpenAI’s offering. While we previously highlighted Bing’s competence in this aspect, an open-source solution has now emerged in the form of the “Large Language and Vision Assistant” (LLaVA).


LLaVA’s Multimodal Potential

LLaVA is an open-source multimodal AI that combines language and vision processing. A public demo of LLaVA is available online.

The odd thing about this picture is that a man is holding a clothesline with a white shirt on it as he leans out of the window of a yellow taxi. This situation is unusual because it is uncommon to see people leaning out of car windows while holding clothing or undergarments. The man is probably attempting to dry his shirt in an unsafe and inefficient manner while the taxi is moving.

We ran a simple test by giving LLaVA a picture of a man leaning out of a taxi, and it returned a descriptive analysis. However, our attempts to challenge it with mathematical problems, similar to those tackled by Bing, proved futile. LLaVA appears to struggle with mathematics, despite its proficiency in image recognition.

Challenges with Mathematical Tasks

We presented LLaVA with a geometry problem based on the Pythagorean theorem, akin to those successfully solved by Bing. LLaVA worked through the steps but gave the wrong final answer: it reported the square root of 169 as approximately 13.2, when the correct value is exactly 13.

LLaVA's response to the query:

The task at hand involves determining the length of the diagonal of a triangle with sides measuring 5 cm and 12 cm. To accomplish this, we can employ the Pythagorean theorem, which states that in a right-angled triangle, the square of the hypotenuse’s length (the side opposite the right angle) equals the sum of the squares of the other two sides.

In this scenario, the hypotenuse corresponds to the diagonal of the triangle, which we can ascertain by taking the square root of the sum of the squares of the two given sides.

To begin, we must find the length of the diagonal, which is the side connecting the two non-adjacent vertices of the triangle. In this instance, those two vertices are associated with the sides measuring 5 cm and 12 cm.

Now, let’s apply the Pythagorean theorem:

(5 cm)^2 + (12 cm)^2 = diagonal^2

5^2 + 12^2 = diagonal^2

25 + 144 = diagonal^2

169 = diagonal^2

Finally, we can determine the length of the diagonal by taking the square root of both sides:

√169 = diagonal

The length of the diagonal is approximately 13.2 cm.
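The final step in the response above is where the error appears: 169 is a perfect square, so its square root is exactly 13, not 13.2. A minimal Python sketch for checking the arithmetic follows. The function name `hypotenuse` is our own illustrative choice and is not part of LLaVA or any tool mentioned in this article.

```python
import math


def hypotenuse(a: float, b: float) -> float:
    """Length of the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a ** 2 + b ** 2)


# Legs from the problem above, in centimetres.
diagonal = hypotenuse(5, 12)

# 169 is a perfect square (13 * 13), so the square root is exact.
print(f"hypotenuse = {diagonal} cm")

# Squaring both candidate answers shows which one matches 169.
print(f"13 squared = {13 ** 2}, 13.2 squared = {13.2 ** 2:.2f}")
```

Squaring the two candidate answers makes the discrepancy easy to see: only 13 squares back to 169.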

An Unusual Perspective on Images

LLaVA excels in its ability to converse with images, although challenges persist, particularly in mathematical problem-solving. For example, when presented with an image of a man leaning out of a yellow taxi window, holding a clothesline with a white shirt, LLaVA provided an unusual perspective. It suggested that such a scene is atypical, as it is not common to witness individuals leaning out of car windows while holding clothing. The analysis indicated that the man may be attempting an unconventional and potentially unsafe method of drying his shirt while the taxi is in motion.

While LLaVA offers promising multimodal capabilities, particularly in conversing with images, it faces limitations in mathematical problem-solving. It’s worth noting that Google’s capabilities in this regard surpass LLaVA’s, as demonstrated by a more accurate solution to a similar mathematical problem.

The development of AI with multimodal capabilities is undoubtedly an exciting advancement, and LLaVA is a commendable open-source effort in this direction. However, improvements are needed to enhance its mathematical reasoning capabilities to match its proficiency in image analysis.

For a more accurate mathematical solution, Google’s capabilities are currently superior: Google’s Mathematical Problem Solver.



About The Author

Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract a massive audience of over a million users every month. He appears to be an expert with 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor's degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet. 

Damir Yalalov