GPT-4 Can Handle Your Requests for Images, Documents, Diagrams, and Screenshots


In Brief

GPT-4 can handle requests for images, documents, diagrams, and screenshots. It’s an improvement over GPT-3, which only handled text.

GPT-4 has superior performance in various exams and tests and can access additional information and details through images that may not be available in written form.



OpenAI’s latest milestone, the new model GPT-4, can accept requests that include images, documents with text, diagrams, or screenshots as inputs. This represents a significant improvement over the previous version, GPT-3, which could only understand and output text. With this new feature, GPT-4 generates text outputs given inputs consisting of interspersed text and images.
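To illustrate what "interspersed text and images" looks like in practice, here is a minimal sketch of a request payload in the shape OpenAI's chat API uses for vision inputs, where a single user message mixes text parts and image parts. The model name and image URL below are placeholders for illustration, not details taken from OpenAI's announcement.

```python
def build_multimodal_message(prompt: str, image_url: str) -> dict:
    """Build one user message that interleaves a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Hypothetical request body; model name and URL are placeholders.
request = {
    "model": "gpt-4-vision-preview",
    "messages": [
        build_multimodal_message(
            "What is unusual about this picture?",
            "https://example.com/extreme-ironing.jpg",
        )
    ],
}
```

The same `content` list can carry several text and image parts in sequence, which is what lets a single prompt refer to a diagram or screenshot inline.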

“Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs,” OpenAI wrote.

GPT-4 is larger than its predecessors: it was trained on more data and contains more weights in its model file, which also makes it more expensive to run. Like earlier GPT models, it generates human-like text using deep learning, having been pre-trained on a large dataset.

GPT-4 has demonstrated superior performance over other language models on a variety of exams and tests, due in part to its ability to access information and details through images that may not be available in written form.

The new GPT-4 model can tell you exactly what is depicted in an illustration, analyze it, and even explain its meaning. In the demo, GPT-4 explained the visual joke of a VGA cable plugged into an iPhone. It could also explain what is unusual in a picture of “extreme ironing,” which you can check out below.

Source: OpenAI

However, GPT-4’s newfound abilities also have more practical implications. In the presentation, GPT-4 was shown identifying what could be cooked from the ingredients in a picture. The model can therefore help you cook when you have food products and no clue what to do with them: take a snapshot of the food you have, and GPT-4 can tell you what you can prepare from the ingredients at home.

This ability to understand and interpret visual information makes GPT-4 a powerful tool for tasks such as image captioning, visual question answering, and even content creation. By integrating text and visual understanding, GPT-4 has the potential to revolutionize industries such as advertising, design, and e-commerce, and to take boring, mundane tasks off people’s hands.

The advanced language model also ‘understands’ screenshots and documents containing text, tables, diagrams, or other visual representations. For instance, if you upload a three-page research paper, GPT-4 is capable of summarizing and explaining it.

Bloomberg anchor Jon Erlichman shared a demonstration of a hand-sketched design being transformed into a functional website.

The new technology can also serve as a mobility aid, describing the environment for visually impaired people. To this end, OpenAI has already partnered with Be My Eyes, an application designed to give blind people a helping hand when they need someone to look at something, for instance while grocery shopping. The app lets “sighted volunteers and professionals lend their eyes to solve tasks big and small to assist blind and low-vision people lead more independent lives.” It now also offers a virtual volunteer tool powered by OpenAI’s GPT-4.

Although OpenAI’s GPT-4 currently offers the ability to process text and images as inputs, the model is not yet equipped to handle audio and video inputs. Nevertheless, there are indications that these modalities might be included in the next iteration of the technology.


Agne Cimermanaite

Agne is a journalist and writer with a background in literature, culture, and arts. She entered the Web3 space in 2021 and began writing about cryptocurrency and NFTs. Agne is passionate about technology and storytelling and is always on the lookout for exciting stories.

