Build an AI Image Captioning App With GPT-4 Vision API in 3 Min

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

Hi everyone, in this video we're going to create an AI image captioning app that will generate social media descriptions and tags for your posts using the newly released GPT 4 Vision API. Let s get started. Alright. The first step is to install the packages and download a few sample images that will use as examples later on. Once that's done, import the Open AI package along with the getPass package, enter your open API key and run the following sale, now, before we can pass the images to Open AI, we need to encode them, for which we will use a special function named ENCODE image. Keep in mind that with this new update, the chat completions endpoint is now located under the chat property, which means that it might break your previous code. Also, with the GPT 4 Vision Preview model, you can now pass an array of objects into the content field in the messages with different types for image and text. So for the text type will'll ask a question like Can you please tell me what is displayed in this image? And for the image URL type we will insert our encoding with the encoded image itself. Now let's also run the display function from IPython to showcase our image and then print the response from OpenAI Just take a look at how accurately the model interprets the image. It goes beyond just identifying the lion in front, but also discerns the lioness in the background and points out that it's looking away from the camera. It's truly remarkable how this language model can detect such fine details that even few specialized image captioninging models can do. This is why I believe this API has a huge potential for all AI developers and founders. Okay, let's now proceed with the AI captioning app that we will build with Gradio. For the main function, which is called Caption Image will pass in image path parameter that will encode the image and then pass the encoded image to OpenAI just like we did in example before. Here you can get really creative just by switching the prompt. For example, you can make the model create. SEO tags for ecommerce or you can pass multiple images and make it generate a full script or even diagnose medical images as I explained in one of my previous videos. All Right. For this example, we're just going to use a simple prompt. Like provide a descriptive caption for the image along with 10 to 15 relevant tags. Now to build our UI with Gradio, we'll just use a simple Gradio interface that takes our image captioning function, the Gradio image component as an input and text as an output. For the examples, let's include our downloaded images. Now we just have to add a demo. launch command at the end and run this cell. You should see your Gradio interface appear below this cell, which you can also open using the following link. So now let's try to use it with some example starting with an image of the Eiffel Tower. As you can see, the app provides incredibly accurate captions and tags. It even describes the silhouette of the tree branches which are not easily noticeable in the image, and it also includes tags that indicate the time of the day. You can try it yourself with a few images downloaded from the internet and you can even share this link with your friends and others. So that's it for this video. Thank you for watching. I will do a lot more practical tutorials on all the crazy stuff that openai has just released on the Dev Day so make sure to subscribe and stay tuned for that.

Info

Channel: VRSEN

Views: 4,938

Rating: undefined out of 5

Keywords: GPT-4 Vision API, image captioning tutorial, AI image analysis, social media automation, OpenAI tutorial, Gradio app development, AI for social media, image tagging, AI developer guide, machine learning, visual content description, AI API integration, GPT-4 tutorial, AI image captioning, chatgpt tutorial, chatgpt, ai, artificial intelligence, openai, openai dev day, Gradio, AI Development, gpt 4 vision, openai assistant api, chatgpt 4 vision, openai vision api

Id: -ETwGfjt5ow

Channel Id: undefined

Length: 3min 9sec (189 seconds)

Published: Wed Nov 08 2023