Unleash OpenAI Assistant API & GPT-4o With CT Scan Image | Python Tutorial πŸ‘¨β€πŸ’»

Video Statistics and Information

Captions
In this video we will explore how we can use the OpenAI Assistant API to leverage multimodal capabilities by taking text and an image as input and producing an output. We're also going to gauge the quality of these responses: we're going to put Omni to the test by assigning it the role of a medical diagnosis professional and testing whether it's able to detect a brain hemorrhage in an image. Before I dive into the conceptual framework as well as the code, make sure you like, subscribe, and hit the notification bell so that you get notified every time I drop a new video about Python coding, OpenAI assistants, and AI in general. If you are a business and you want an expert team to develop an AI application for you on top of GPT-4 Omni, make sure you book a 45-minute discovery call and we'll draw up a game plan for you.

OpenAI just recently launched their fastest and most affordable flagship model, GPT-4 Omni (GPT-4o), which is multimodal and is currently dominating the LLM leaderboards, surpassing the commonly used benchmarks for evaluating a model. We're going to test the GPT-4o model's vision capabilities. The first place I want you to look is the API reference; if you haven't bookmarked it, I highly recommend you do, because it's the easiest place to find all the functions and models, so you can pick and choose exactly what you need for your use case. Go to the API reference, scroll down to Assistants, and click the section that says "create message". You'll notice that you can assign a role in your message as well as content; the roles can be user and assistant. When you're sending an image to the assistant, you want to set the role to user, and the content will include the image. We also want to send text, so we're going to send the content as a list: it will contain not only the text content but also the image file object. Under image_file you can specify the ID of that image, which means we first have to create a file for that image. I'm going to test out JPEG and PNG, but essentially you create the file first and then reference it when you send the message to the assistant.

You might already be familiar with the assistant framework; we're going to use the same framework as before: create the assistant, create the thread, and then, in the message body, send off the image. Everything else is the same, and we can use the streaming capabilities as well; that happens in the run function. Keep in mind that image input only works on OpenAI's vision-capable models, which are GPT-4 Omni and GPT-4 Turbo; both have vision capabilities, and you can use either one when creating the assistant.

We're going to write the code in a little bit, but I wanted to show you conceptually how this works. We have the assistant, and this assistant can be anything; it could be a medical diagnosis expert that takes CT scans through vision and produces a response. We assign the prompt and the knowledge base to the assistant, then we create a thread, and then we create the file. You can create the file ahead of time if you want to reuse it for other purposes, but if you're only using it for a specific message, you create the file right before you send the message.
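Here's a minimal sketch of that conceptual flow using the beta Assistants API in the OpenAI Python SDK; the assistant name, instructions, file path, and question are placeholders made up for illustration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Create the assistant with a role prompt and a vision-capable model.
assistant = client.beta.assistants.create(
    name="Medical Diagnosis Expert",  # placeholder name
    instructions="You are an expert medical diagnosis professional.",
    model="gpt-4o",
)

# 2. Create an empty thread to hold the conversation.
thread = client.beta.threads.create()

# 3. Upload the image so the assistant can reference it by file ID.
image = client.files.create(file=open("ct_scan.jpg", "rb"), purpose="vision")

# 4. Send a message whose content is a list: the text plus the image file.
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=[
        {"type": "text", "text": "What do you see in this scan?"},
        {"type": "image_file", "image_file": {"file_id": image.id}},
    ],
)

# 5. Run the thread against the assistant and print the latest reply
#    (create_and_poll is available in recent versions of the SDK).
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)  # newest message first
```

The streaming variant of step 5 is shown further down.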
Then, when you're sending the message, you basically insert that file as an image alongside your text, and you get the assistant's response. It's really simple, but it's really important how you implement this in practice, because there are a lot of avenues you can take it.

We're going to go ahead and install the OpenAI package in our Google Colaboratory notebook. I'm also going to link the notebook in the description so you can play around with it, adapt it to your needs, use it as boilerplate code, or even use LLMs to build something on top of it. We're going to import all the required functions; also make sure you have your OpenAI API key available and plugged in under the Secrets section. We're now going to use the OpenAI class provided by the openai package to authenticate our client object; with that client object we have access to all the functions and models that OpenAI provides under the API reference. We're also going to import the event handler provided by OpenAI to handle the streaming capabilities.

If you go to the API reference and click on models, you'll notice you can also list the models available under your account. OpenAI is still rolling out API access to GPT-4o, so you might not have it yet; it's worth checking first whether you have access to the model. We use client.models.list(); this function lists the models available under your account. Let's display them in a readable way by looping through all the models. These are all the models available in my account; let me see if I have GPT-4 Omni. Yes, there's gpt-4o right here.

Once you've confirmed you have access to the Omni model, we're going to start creating our assistant. We'll start with a simple example: I have a couple of image files in my local directory, including one of somebody wearing sunglasses, and I basically want Omni to recognize what kind of sunglasses they are and where I can buy them in the US. Notice that when I'm creating the assistant, I set the model to gpt-4o. Let's run this. Following the assistant framework, I'm then going to create the thread; it's going to be an empty thread in this case, and we'll append messages on top of it.

In step four, we're going to upload our files. I'm going to use Google Colaboratory's upload feature to upload the file from my local directory, and then I'm going to use the files.create function to create the file for the assistant to use. If you go back to the API reference under files, you'll notice the files.create function takes the file object as a parameter, plus something called purpose. For the purpose you can use "assistants" for assistant files (any document, such as a PDF) and "vision" for assistant image inputs, so we're going to use "vision". As you can see, we set the purpose to vision, choose our file, upload the image of the man wearing sunglasses, and we now have a file ID once the file has been created on the OpenAI servers.
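Here's roughly what that setup looks like in a Colab notebook, as a sketch; the secret name OPENAI_API_KEY is whatever you saved under Colab's Secrets tab, and the uploaded file name is an assumption:

```python
# Run once in the notebook: !pip install openai
from google.colab import files, userdata
from openai import OpenAI

# Authenticate the client with the key stored in Colab's Secrets section.
client = OpenAI(api_key=userdata.get("OPENAI_API_KEY"))

# Check whether your account has been granted access to GPT-4o yet.
model_ids = [m.id for m in client.models.list()]
for model_id in sorted(model_ids):
    print(model_id)
print("gpt-4o available:", "gpt-4o" in model_ids)

# Upload an image from your local machine into the Colab runtime...
uploaded = files.upload()            # e.g. {"sunglasses.jpg": b"..."}
local_name = next(iter(uploaded))

# ...then create it on OpenAI's servers with purpose="vision" so it
# can be attached to an assistant message as an image input.
image_file = client.files.create(file=open(local_name, "rb"), purpose="vision")
print("file id:", image_file.id)
```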
In step five, we're going to send our message with the image to the assistant. We use the threads.messages.create function to create that message, and under the content section we send off not just the text but also the image. As you can see, we're sending two items: the first has type text, and the actual text is "Where can I buy these glasses?"; the second has type image_file, and for the image file I plug in the file ID I got from uploading the image to the OpenAI servers. Then we run this.

The last step is streaming the assistant's response; we use the event handler to stream it. Let's see what it says. It's working, and as you can see, it understood the image: the sunglasses in the image appear to be a classic style similar to Ray-Ban Wayfarers, and here are some stores in the USA where you can buy them. This is great; it shows that the model has multimodal capabilities. It can process images, and it's supposed to be able to process videos as well, as individual frames. OpenAI hasn't provided functions yet to process audio, but if you still want voice capabilities, you can use the TTS and Whisper models to build layers around the GPT-4o model; that should still keep the latency of the responses low, and in this case it was very fast.

We're going to try a couple of other images. GPT-4o understands emotions, so I have a picture where it looks like Obama is crying; it's a PNG file. We're going to upload this image and see if Omni can recognize the emotion in it. Let's upload the file; it uploaded successfully, so we take that file ID and paste it here. Of course, when you build out an end-to-end solution, you're going to wrap all of this into a function so that you don't have to manually input the file names and so on (see the sketch at the end of this walkthrough). And if you want to learn more about how to create assistants, make sure you join my Skool community, where I teach Python, we run agent-building workshops, and we share cheat sheets and deployment frameworks you can use for your needs.

All right, let's go ahead and ask: explain this image and tell me what emotion the character is showing. That ran successfully, and the next step is streaming the assistant's response. Let's see what the assistant says: the image depicts Barack Obama, the 44th President of the United States, wiping his face with a handkerchief; it appears to be an emotionally charged moment, possibly one where he's feeling moved or reflective; the emotion conveyed seems to be one of sadness, nostalgia, or deep emotional impact, which can be interpreted from visual cues such as tears, facial expression, and contextual clues. The details are absolutely amazing; this model could train you to be the next Sherlock Holmes. It's just remarkable.

Now, when you're trying to build applications on top of this, think about the implications: you could do data analysis, you could do medical diagnosis. I have an image of a brain hemorrhage; I'm going to plug it in and see if Omni can recognize that it shows a hemorrhage. The assistant's instructions say "You are an expert medical diagnosis professional", and we make sure we're using the same file. I also want to point out that when you're attaching the image file to the message, you can specify the detail level, which can be low or high: with low you use fewer tokens, and if you want high resolution you use high, so this is something to consider if you want to manage your cost.
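Here's a sketch of that message-plus-streaming step, continuing from the client, assistant, thread, and image_file objects created in the sketches above; the question text and the detail choice are examples:

```python
from typing_extensions import override
from openai import AssistantEventHandler

class StreamPrinter(AssistantEventHandler):
    """Print each text delta as the assistant streams it back."""
    @override
    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)

# Attach the image with detail="low" to spend fewer tokens; use
# "high" when the model needs the full-resolution image.
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=[
        {"type": "text", "text": "Explain this image and the emotion shown."},
        {"type": "image_file",
         "image_file": {"file_id": image_file.id, "detail": "low"}},
    ],
)

# Stream the run through the event handler instead of polling.
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    event_handler=StreamPrinter(),
) as stream:
    stream.until_done()
```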
This ran successfully; let's see if the assistant is able to detect the brain hemorrhage. As an expert in medical diagnosis, it analyzed the provided CT scan image: the scans show transverse sections of the brain at the same level, typically used to diagnose various intracranial conditions. Well, I'm not an expert, but it sounds just as good as any medical expert. The CT scan shows a distinct hyperdense lesion appearing white; conclusion: this CT scan shows an intracranial hyperdense lesion, likely representing an intracerebral hemorrhage. I can't even pronounce these words, but check this out: it successfully detected the hemorrhage, and I did not provide any message other than the image itself. This is great.

The cool thing with the agent framework is that you can now expand this by attaching a knowledge base from which it can pull expert medical information. It can also leverage threads, which are collections of messages. Think about this: if you want to make a product out of it, you can create a seamless user experience where the user comes in, puts in their information, and all of their conversation is stored under the thread, so they can come back at a later time and the assistant or agent can retrieve all their previous messages to give them tailored recommendations. The agent could work alongside medical doctors to keep track of the user's profile, progress, diagnosis, and so on.

With the Assistants v2 release, OpenAI has also introduced capabilities to attach vector stores and databases to your assistant, as well as expiration policies and auto-truncation of thread messages. In non-technical terms, that means OpenAI does the context management behind the scenes to keep the context efficient, so you're able to manage your cost effectively; at the end of the day, the more context (the more tokens) you use, the more cost you incur. OpenAI has introduced a lot of these automatic cost-management techniques under the hood so that you don't have to worry about it.
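As a sketch of the wrap-it-in-a-function idea mentioned earlier, here's one hypothetical helper that bundles the upload, message, and streamed run; the function name, parameters, and defaults are my own, and it reuses the StreamPrinter handler defined above:

```python
def ask_with_image(client, assistant_id, thread_id, image_path, question,
                   detail="low"):
    """Hypothetical helper: upload an image, attach it to a message on an
    existing thread, and stream the assistant's reply to stdout."""
    image = client.files.create(file=open(image_path, "rb"), purpose="vision")
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=[
            {"type": "text", "text": question},
            {"type": "image_file",
             "image_file": {"file_id": image.id, "detail": detail}},
        ],
    )
    with client.beta.threads.runs.stream(
        thread_id=thread_id,
        assistant_id=assistant_id,
        event_handler=StreamPrinter(),  # fresh handler per run
    ) as stream:
        stream.until_done()

# Every call appends to the same thread, so a returning user's assistant
# still has the full conversation history for follow-up questions.
ask_with_image(client, assistant.id, thread.id, "ct_scan.jpg",
               "Do you see signs of a hemorrhage in this CT scan?")
```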
I'm going to leave a link to this notebook in my Skool community, Python for Assistant Agents; you'll find it under the Resource Hub. I also include other resources there, like deployment frameworks and OpenAI integration guides, so make sure you check it out. You'll also have a chance to interact directly with the community, earn points, and get access to Python accelerator workshops and additional bonuses. Until next time, this is MT, your partner to build, deploy, and sell AI agents.

Info
Channel: CustomGPT AI Agent Academy
Views: 688
Keywords: add gpts to website, ai agency, ai automation, ai automation agency, ai business ideas, ai tools, assistant api, assistants api, build gpt, business ai, chatgpt, chatgpt on custom data, chatgpt update, create a openai gpt, custom chatgpt, custom gpt tutorial, gpt assistant, gpt crawler, gpts, how to make gpts, liam ottley, machine learning, openai, openai assistants, openai devday, openai function calling, openai gpts, virtual assistant, virtual assistants
Id: nwb0K0fKNZw
Length: 11min 48sec (708 seconds)
Published: Wed May 15 2024