Ask Anything Tool: Chat with Your Video using ChatGPT, MiniGPT4, and StableLM

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

hello everyone welcome to AI anytime channel so in today's video we are going to have a look at a new tool called ask anything so ask anything is an interesting tool that has released this week which combines multiple AI models like vikuna and Chad GPT and couple of others as well so they have combined all these models to create this tool and the tool is available on hugging face spaces through a graduate application so anybody on the internet can test this tool out so in this video we are going to test this tool because it says tool for chatting about you know video with chat GPT mini GPT 4 which has vikuna plus vision and then recently released stable LM stable language models so you upload a video file on this tool and you can chat with that video as well so basically you know it all and it also provides you a state recognition the action recognition it gives you a caption or the summary of that video and then you can interact with the tool so you can currently see I am on the GitHub repository which says ask anything by opengv lab okay they have online demo available through gradu as I said they have deployed this in hugging face spaces the best way to deploy these AI models nowadays open source research community and they have some details related to their ongoing activities with this tool now if you come back to their grade your application which is through this GitHub repository here it says online demo open any spaces so I've opened that here it says ask anything with GPT it says a multi-functional video q a tool that combines the functions of action recognition visual captioning and chair GPT so they have vit models they have llm model and they have action recognition model all combined to create this pretty interesting tool and it says our solution generates dense descriptive captions for any object and action in a video offering a range of language styles to suit different user preferences it supports users to have conversation in different lens emotions authenticity of language now if you come if you see here it they interface they have a drop video here or click to upload where you can upload your file basically the video file or you can also load some examples because they have given you some examples with you you can watch it and chat with that video the examples video okay and we'll try to see this so the first thing if I click on this you know one of the example is this uh this video from the examples let's say it you can currently see the gradual loader over there it says in the queue and it's taking time so guys they are face faces they might have been using the GPU there and it might takes little time to you know influence this model OKAY on gradu and you know several people might have been using at the same time there's also called some like NC soon now if you see this is a video which is a 10 seconds video frames this video that guy you know hitting some baseballs swing always doing some Swings with the baseball okay so this is what he's doing now they have some option you can see it says watch it once I click on watch it the gradual loader again starts loading and this one says loading videos and hitting baseballs you got the action here guys so this is the action recognition in the video which is hitting baseball and I upload a video so it in the this right hand side that you see the interface section it says I upload a video so okay the status message you upload a video about a man playing basket baseball sorry baseball swings a bat at a ball while other people play baseball oh okay and now since click chat button so now what I will do if I click on let's chat guys so if you see it says please paste your key here and surprisingly uh if I paste my key here it will be visible for all of you so of course that I will blur that and I will delete that key after this video but uh and I think they would have handled it in a better way you know it's uh you should have coded in in the back end that it should be like a password hashing or something right it should not be visible here on the interface but anyway and once I click on this again I'll click on let's chat and once now I click let's chat what it does it kind of uh gave me an option where I can chat with this so let me ask the question is the guy uh wearing a cap in the video frame or something in the video frames now this is my question is a guy wearing a cap in the video when I click on the Run what it will do guys it will try to you know retrieve the answer retrieve the information for me because it says uh multi-functional video question answering tool that combines action recognition visual captioning and chair GPT now if you see action recognition is uh the very uh famous task that we have seen when it comes to deep learning we used to train lstm plus CNN combined model that we used to call lrcnn to you know perform this action recognition models okay so they have used action recognition and visual captioning and chat GPT let's see I've asked this question taking a lot of time guys okay because and it makes sense because they are sucking face spaces I don't know what kind of infrastructure they might have been using in the back end that they have deployed this model on the space and how many people are using this but it might take little time so let me do one thing let me pause the video and then once the answer appears I will come back so you can see I have got my answer which says yes the guy in the video is wearing a cap perfect now let's ask one more question related to this video and then we'll upload a sample video from our site let me ask this question what is the color you know of the uh of the mat on the floor or the grass mat of the grass mat on this load let me ask this question and once I ask this question let's see if it takes time and now the first question that I ask is the guy wearing a cap in the video frame I got yes the guy in the video is very kept maybe I can ask the color of that cap as well to see what kind of you know uh visual model that they have been using how accurate is it but you can play around this tool guys it's available on graduates it's free but you need an open AI API key to do that and once you uh once you complete this testing please delete your keys from open AI dashboard okay it says the grass mat on the floor is green and this is fine so this is working for this sample video guys and just to give you information you cannot upload a video you know uh linear than 60 seconds in length Okay you have to upload a video shorter than 60 seconds now let's click on some you know uh video here so I will delete this or I will just cut that and I will upload a video where let's upload this video guys so I am uploading this video I click on watch it and I uploaded this video and you can see it will appear here it's in the queue right now loading videos let's see what we are getting you know in the action so in the action you see we got second hands okay that the video that we have uploaded and we also have got uh our caption here which says you upload a video that will be uploaded but you uploaded a video about a man in a suit and Ty is standing in an airplane with other people in seats let's play this video and see I think this is correct so at least this is giving you the action this is giving you the caption and you can also chat with it on the chat part this might get improved a bit uh in the back end you see they're asking for your open AI API Keys the one thing that can happen right it kind of has the frames might have been sending to open AI models in back end uh and they would have been generating these answers you know from that now it depends on you how you want to use this tool guys the tool is available through the GitHub repository ask anything on opengv lab it says uh you know video with chat GPT mini GPT and stable LM over here and that's a a high level view about this app guys okay this does not look like an open source at this moment okay now if you see their updates you see video chat demo is available explicit communication with chair GPT sensitive with time and that that's looks correct the statement doesn't work that good guys to be honest okay I it's an interesting tool the uh the I really appreciate of course the hard work put by the lab here opengb lab and the team but really are not that impressive you know at the first look but it might get impressive with you know mini GPT for video like we have vikuna over here when you click on that you know and it's taking you to the same there okay you see this how you can you know instruction to uh mini GPT if a simple extensor of mini GPT for video this one that you see you can do it open source it's completely you can deploy it locally how you can use vikuna Etc you know to run this uh and set it up in your local machine if you have a GPU available with you but this one that currently you see with Chad GPT one of course not uh an open source model over there because it's taking you the keys Etc and it is you just GPT uh models in the back end now if you come over here and you see uh with vikuna if you want to use mini GPT 4 for video you can set it off here that's the way that they have given in the documentation how you can use this prepare the environment if you go back you know you have with moss we have with stable LM which has been released I was uh you see this stable LM looks good okay you can test this out and how you can set it up locally as well so you can create an environment in your anaconda machine and install the dependencies ETC and you can run this out guys okay so this looks good so far you can test it out let's do one thing I'll also create another video for covering up with stable LM and you know mini gpt4 you know that will try to run it locally in a uh in our local machine through Anaconda as they have given the uses documentation over here so this is on the chat GPT backend that you see ask anything with GPT where using open AI apis to you know chat with it so I hope you like the video guys you know and you can go ahead and try this as anything from opengb Labs okay if you have any thoughts or feedback or any question please let me know in the comment box okay and that's all for today's video guys thank you so much for watching see you in the next video

Info

Channel: AI Anytime

Views: 614

Rating: undefined out of 5

Keywords: gradio, ask anything, llm, chatgpt

Id: nT5--SZuIe0

Channel Id: undefined

Length: 11min 17sec (677 seconds)

Published: Mon Apr 24 2023