How To Install LLaVA 👀 Open-Source and FREE "ChatGPT Vision"

Video Statistics and Information

Captions
ChatGPT Vision is one of the coolest advancements in AI that I've seen in a while, and I made a video about a week ago going over a bunch of use cases for it: it can read what's in diagrams, it can look at street signs and figure out whether you can park there, it can take a drawing of a website and convert it into code. It's pretty incredible. But I don't actually have access to it yet; I have ChatGPT Plus, and I'm still waiting, with no idea when I'll get it. What if I told you that you could have ChatGPT Vision, but completely open source, free, and available right now? That's what you get with the project LLaVA. It is absolutely comparable to ChatGPT Vision, it is extremely good, and today I'm going to show you how to install it and then how to use it. Let's go.

First, here's the research paper that kicked it all off, "Visual Instruction Tuning," by four authors out of the University of Wisconsin-Madison, Microsoft Research, and Columbia University. This video isn't about the research paper, but I do encourage you to check it out; I'll drop a link in the description below. And here's the actual project on GitHub: LLaVA, the Large Language and Vision Assistant. It uses a fine-tuned version of the Llama 2 model, so all of this is open source and free, and it works really well. Just in case you don't believe me, here's an example of it working: I uploaded a diagram of a cell, asked what it is, and then asked it to define all of the different terms in that image, and it did so perfectly.

Now, I spent the last few hours getting this to work. I first tried it on my Windows machine and realized that some of the dependencies require Linux. Then I tried to get it working on my Mac and realized I needed CUDA, which requires an NVIDIA GPU. So you need a Linux machine to get this working, and that's why I'm going to show you how to use it on RunPod, since they run Linux machines with beefy graphics cards. You
can definitely get this working completely locally and completely for free if you have a Linux machine, and if you have WSL on Windows you could probably get this working too. If you want to see me create a tutorial using WSL, let me know in the comments below.

So, if you don't already have a RunPod account, go ahead and sign up. Once you log in, this is the screen you're going to be met with. Click over to Secure Cloud, and we're going to choose the RTX A6000 as our GPU; it's 79 cents an hour. You could probably get away with an even smaller GPU, because honestly these models run blazingly fast, and again, it's based on the Llama 2 13B model, which is fairly small. Click Deploy, and before deploying, click the Customize Deployment button. Change the temporary container disk from 20 GB to 100 GB, because we do need room to download the models, and change the exposed HTTP port to 3000. Once you've done that, click Set Overrides, then Continue, then Deploy. It will take just a couple of minutes to deploy; after a few seconds it drops you on this screen, and you can click the dropdown to watch the server deploy.

We're going to be doing everything through the command line, so it's a little bit technical, but I'll walk you through everything step by step. I'm also going to drop a gist of all of the instructions in the description below, so you can just copy and paste from there. Once it's done, click Connect, then Start Web Terminal, and then Connect to Web Terminal. From here, we download the project with git clone followed by the repository URL; switching back to the LLaVA GitHub page, you get that URL by clicking the green Code button and then the copy icon. Now, back in the RunPod command line, we're just going
to hit enter. Once that's done, cd into the new folder with cd LLaVA. Then make sure you have the latest version of pip with pip install --upgrade pip, install torch with pip install torch, and install all the other requirements with pip install -e . (that's -e followed by a period). Okay, now that that's done, the next thing to do is spin up the controller: python3 -m llava.serve.controller --host 0.0.0.0 --port 10000, then hit enter. Now we see that it's up and running. Perfect.

Next we need another instance of the Linux command line, so switch back to RunPod and click Connect to Web Terminal again. cd back into LLaVA and run the model worker command. Basically, what this does is spin up a model worker, which also downloads the model, so the first time you run it, it will take a little while; I think it needs about 20+ GB of storage for the model. I'm not going to read the whole command off; I'll include it in the gist in the description below. Hit enter, and there it goes: now we can see that it's downloading the model. You'll see errors printed on every single line; those can just be ignored. It's possible they simply used the wrong log level, I'm not sure. Once the download is done, this is also the server that serves the model, so you do need to leave it open; in total, we're going to have the controller, the model worker, and the Gradio server running. Okay, now it's done, and it's loading the checkpoint shards, which just means it's loading the model. Now it's downloading something else. Okay, there it is, it's working, and we have Uvicorn running on port 40000. So the controller is up and the model is up.

Switching back to RunPod, we need one more terminal. Open that up, cd into LLaVA again, then run two export commands, starting with the Gradio
server name, and then the Gradio server port as 3000 (the two commands export GRADIO_SERVER_NAME and GRADIO_SERVER_PORT; the exact values are in the gist), then hit enter. Now Gradio will listen on that port. The last thing we need to do is spin up the Gradio web server: python3 -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload, and hit enter. Okay, here it is; it looks good to go.

Now let's switch back to RunPod and click the port 3000 button right here, and there we go, we have it all loaded up. The model name is right here in this dropdown, and let's use an example image. If we click it, here's the image. "What is unusual about this image?" Send. "The unusual aspect of this image is that a man is ironing clothes while standing on the back of a moving car. This is not a typical scene, as ironing clothes is usually done indoors, in a more controlled environment." So there you go: now you have a completely free, completely open-source ChatGPT Vision alternative. Of course, I'm paying for RunPod, but you can definitely get this installed on your local machine if you have Linux, and I believe they're working on Windows and Mac versions. There's a Mac fork out there; I tried to use it but couldn't get it to work. They also mention Apple Silicon in the documentation, though I couldn't get that to work either.

Let's try another image. I've uploaded this image that I had on my computer; it's a human cell. We'll ask: "What is this image about? Define all of the terms listed in this image." Send. "The image is a detailed illustration of a cell, showcasing its various components and functions," and there it is: all of the different terms and parts of the cell listed out and defined. If we look at the original image, it doesn't say what any of these terms are, so it really is reading the image and defining the terms for us. Now, one mistake that I see is that it got in an
infinite loop and continued to list all of them over and over again, so it's not quite perfect, but I'm sure it's going to get better very quickly. And that's all I'm going to show you in terms of examples. You can see that it works, and it works really well. I've tested it with a few examples, but this video isn't about testing the limits of this model; if you want to see that video, where I push the limits of the LLaVA model, let me know in the comments below. Now you know how to do it. This is so cool. And if you don't want to go through the process of setting it up on your local machine, and you don't want to set it up on RunPod, there's a demo you can use, absolutely free; I'll drop a link to the demo in the description below.

I really want to thank Ashley K, who helped me take this across the finish line and get the Gradio piece up and running; thank you so much for spending a few minutes with me and getting this working. Right after I finished recording this video, Ashley K created a one-click template for RunPod; I'll drop that link in the description below, and it includes his affiliate link, so thanks again to Ashley K. If you liked this video, please consider giving it a like and subscribing, and I'll see you in the next one.
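The full command sequence from the walkthrough above can be sketched as a short runbook, to run inside the RunPod web terminals. The repository URL, the --model-path checkpoint name, and the 0.0.0.0 value for GRADIO_SERVER_NAME are assumptions (the video defers to the GitHub page and the gist for the exact values), so double-check them before running.

```shell
# Sketch of the setup from the video. The repo URL and --model-path are
# assumptions -- copy the exact values from the LLaVA GitHub page and the
# gist linked in the description.

# Terminal 1: clone, install dependencies, start the controller
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install --upgrade pip
pip install torch
pip install -e .
python3 -m llava.serve.controller --host 0.0.0.0 --port 10000

# Terminal 2: the model worker. This downloads 20+ GB of weights on the
# first run and then keeps serving the model, so leave it open.
cd LLaVA
python3 -m llava.serve.model_worker --host 0.0.0.0 --port 40000 \
    --controller http://localhost:10000 \
    --worker http://localhost:40000 \
    --model-path liuhaotian/llava-llama-2-13b-chat-lightning-preview

# Terminal 3: point Gradio at the pod's exposed port 3000, then start the
# web UI. 0.0.0.0 is an assumption so the port is reachable from outside
# the pod; the gist has the exact values used in the video.
cd LLaVA
export GRADIO_SERVER_NAME="0.0.0.0"
export GRADIO_SERVER_PORT="3000"
python3 -m llava.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --model-list-mode reload
```

Each of the three long-running processes needs its own terminal; once all three are up, the RunPod port 3000 button opens the Gradio UI.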
Info
Channel: Matthew Berman
Views: 69,893
Keywords: llava, llava ai, llava llm, chatgpt, openai, chatgpt vision, gpt vision, ai, artificial intelligence, image-to-text, image to text, visual instruction tuning
Id: kx1VpI6JzsY
Length: 8min 26sec (506 seconds)
Published: Thu Oct 12 2023