March 15, 2023

GPT-4 Can Handle Your Requests for Images, Documents, Diagrams, and Screenshots


OpenAI’s latest milestone, the new model GPT-4, can accept requests that include images, documents with text, diagrams, or screenshots as inputs. This is a significant improvement over earlier models such as GPT-3 and GPT-3.5, which could only accept and output text. With this new capability, GPT-4 generates text outputs from inputs that mix text and images.
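To make "interspersed text and images" concrete, here is a minimal sketch of how such a mixed prompt might be represented in code. It does not call any real API: the `TextPart` and `ImagePart` classes, the `describe_prompt` helper, and the file name are all hypothetical and serve only to illustrate the idea of an ordered list of text and image parts.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextPart:
    # A piece of plain text in the prompt.
    text: str


@dataclass
class ImagePart:
    # Hypothetical: a path or URL pointing at the image to include.
    source: str


Part = Union[TextPart, ImagePart]


def describe_prompt(parts: List[Part]) -> str:
    """Return a one-line summary of the part types, in prompt order."""
    labels = []
    for part in parts:
        if isinstance(part, TextPart):
            labels.append("text")
        else:
            labels.append("image")
    return " -> ".join(labels)


# An interleaved prompt: a question, an image, then a follow-up instruction.
prompt = [
    TextPart("What is unusual about this picture?"),
    ImagePart("extreme_ironing.jpg"),  # hypothetical file name
    TextPart("Answer in one sentence."),
]

print(describe_prompt(prompt))
```

The point of the sketch is only that order matters: the model receives text and images as one sequence rather than as separate uploads.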

“Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs,”

OpenAI wrote.

GPT-4 is believed to be larger than its predecessors, which would mean it was trained on more data and contains more weights, making it more expensive to run. OpenAI has not disclosed the model's size. Like earlier models, GPT-4 generates human-like text using deep learning after being pre-trained on a large dataset.

GPT-4 has outperformed other AI language models on a variety of exams and tests, due in part to its ability to draw on information in images that may not be available in written form.

The new GPT-4 model can tell you exactly what is depicted in an illustration, analyze it, and even explain its meaning. In the demo, GPT-4 explained a visual joke in which a VGA cable is plugged into an iPhone. It could also explain what is unusual in a picture of “extreme ironing,” which you can check out below.

[Image: GPT-4 explains what is unusual in the “extreme ironing” picture. Source: OpenAI]

However, GPT-4’s newfound ability also has more practical uses. The presentation showed that GPT-4 could tell what could be cooked from the ingredients shown in a picture. This means the model can help you cook if you have food at home and no idea what to do with it. Take a snapshot of what you have, and GPT-4 can suggest what you can prepare from those ingredients.

This ability to understand and interpret visual information makes GPT-4 a powerful tool for tasks such as image captioning, visual question answering, and even content creation. By combining text and visual understanding, GPT-4 has the potential to change industries such as advertising, design, and e-commerce, and to take boring, mundane tasks off people’s hands.

The advanced language model also ‘understands’ screenshots and documents with text, tables, diagrams, or other visual representations. For instance, if you upload a three-page research paper and need it summarized and explained, GPT-4 is capable of doing so. 

Bloomberg anchor Jon Erlichman showed how a hand-sketched design could be turned into a functional website.

The new technology can also serve as an accessibility aid by describing the environment for visually impaired people. To this end, OpenAI has already partnered with Be My Eyes, an application designed to give blind people a helping hand when they need to look at something, for instance, while grocery shopping. The app lets “sighted volunteers and professionals lend their eyes to solve tasks big and small to assist blind and low-vision people lead more independent lives.” Now, it also offers a virtual volunteer tool powered by OpenAI’s GPT-4.

Although OpenAI’s GPT-4 currently offers the ability to process text and images as inputs, the model is not yet equipped to handle audio and video inputs. Nevertheless, there are indications that these modalities might be included in the next iteration of the technology.


Agne is a journalist who covers the latest trends and developments in the metaverse, AI, and Web3 industries for the Metaverse Post. Her passion for storytelling has led her to conduct numerous interviews with experts in these fields, always seeking to uncover exciting and engaging stories. Agne holds a Bachelor’s degree in Literary Studies from the University of Amsterdam and has an extensive background in writing about a wide range of topics including cybersecurity, travel, art, and culture. She has also volunteered as an editor for the animal rights organization, “Open Cages,” where she helped raise awareness about animal welfare issues. Currently, Agne splits her time between Barcelona, Spain, and Vilnius, Lithuania, where she continues to pursue her passion for journalism. Contact her on [email protected].

Agne Cimermanaite
