Run Your Own AI Chat GPT-3 On Your Computer

Video Statistics and Information

Captions
All right, so I've got a really cool project to show you guys, which is building your own ChatGPT-style AI chatbot, so let's get started. A couple of weeks ago Facebook (Meta) released the source code for their LLaMA model, which let us see how something like ChatGPT actually works. But the problem was there were no weights released with it, which means we only had the code and none of the model data needed to actually run it. Fortunately for us, somebody leaked the weights in March, and every day since the leak there has been progress on making this very functional.

Now, jumping into it — I'm going to leave this article down in the description below if you want to read it yourself, but it's very interesting. On February 24th, a couple of weeks ago, Meta announced LLaMA. On March 2nd, someone leaked the LLaMA model via BitTorrent — I'm not going to leave a link for that, but you can probably Google it and find the torrent for the LLaMA model leak. Then Georgi Gerganov, whose code we're going to be using, created llama.cpp, a C/C++ program that allows us to run the models. And look at the time frame from March 2nd: on March 10th the project appeared, and on March 11th someone actually ran the model on a Raspberry Pi 4 with 4 gigs of RAM, which is very surprising, because I needed a lot of RAM just to run this on my main PC — but with four gigs he was able to get it to go. On March 12th somebody got it running under Node.js, and by March 13th you could even get it on an Android phone. Within days — that's how fast this thing moved. And here's a little demo of what it looks like.

Now, there are a couple of models out right now: the 7B, 13B, 30B, and 65B, which is the number of billions of parameters you get. Seven billion parameters is more like GPT-2, and the 13 billion would be like GPT-3. I would say 30 billion parameters is like GPT-3.5, and 65 billion parameters is more like GPT-4 — kind of, anyway — because the more parameters you have, the more accurate the responses are going to be. We're going to be playing around with the 7-billion-parameter one, just because everything else takes much longer to process and needs more memory, more CPU, tons of other stuff.

Going through what Georgi has in the repo, I'll show you what I was talking about. If you take a look at this, he also made a tool that shrinks the original size from 13 gigs down to about 4 gigs, which makes it a lot smaller. This shrinking process actually took about 16 gigs of RAM for me to run on this computer, so I'm assuming the guy who put this on a Raspberry Pi 4 probably did the conversion on another computer and just transferred over the shrunken version of the model. You do have the 7B, 13B, 30B, and 65B, all with their different file sizes and parameter counts, and we're going to be following these instructions on how to get this going.

I'm also going to leave a link to Simon Willison's article, because he gives a lot of good information about what this is, how it works, and actual instructions for installing it on a Mac M1 as well as a Linux PC. One thing he mentioned which I thought was really interesting is the difference between what we're doing right now and ChatGPT: ChatGPT is tuned, meaning our inputs are getting absorbed and recorded, and the models are tuned to give a better response every time somebody uses them. The models we're using are stagnant — they're not being tuned — so you're going to get different kinds of responses. You're still getting responses from an AI; they're just not going to be as complete, or you'll have to ask in a different way many times to get the result that you want.
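As a sanity check on those file sizes, the arithmetic is just parameters × bits per weight. Here's a quick sketch (decimal GB; the real ggml files also carry norms and quantization scale factors, so actual sizes run a bit higher than the 4-bit estimate):

```shell
# size_in_GB = parameters * bits_per_weight / 8 / 1e9
model_size_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 / 1e9 }'
}

model_size_gb 7000000000 16   # 7B stored as fp16 -> roughly the "13 gigs" on disk
model_size_gb 7000000000 4    # 7B at 4-bit      -> close to the ~3.9 GB quantized file
```

The same math explains why the bigger models need proportionally more RAM to convert: the 13B at fp16 is about twice the 7B, and so on up to the 65B.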
Unlike ChatGPT, which is constantly being updated and tuned — we just don't have that type of infrastructure. Now, being able to run an AI on a machine like this is pretty interesting in itself, but you do want capable hardware, something like an NVIDIA GPU with CUDA or at least a decent CPU, because when I ran this test on a VM it was very slow — not even worth running — since that VM had no GPU.

All right, let's jump into it. First, we're going to git clone this project — let me make this bigger so you guys can all see; is that big enough? I'm going to grab the URL, go to Downloads, and run git clone. I did a dry run on this computer before, so I know it works; I originally installed this on Ubuntu, but we're using Arch Linux on this machine and we'll be fine. Next, go into the directory and run make. This is pretty quick — I could throw more CPU cores at the build, but it's not really needed; it takes a few seconds. Keep an eye on the clock and you'll see how fast this compiles.

Next, we need to make sure we have the models in place. If I list the models directory — ls models — there's nothing there, but I do have the 7B downloaded, so I'm just going to cut it, go into the models folder, and paste it there. That's all you really need to do: paste the models into that folder.

Once that's done, we need to install the Python dependencies with pip: torch, numpy, and sentencepiece. I've already done this, and it actually took longer than anything else — installing the resources to get Python running with torch, mainly. You can see the packages it pulls in.
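The setup steps so far — cloning, building, staging the weights, and installing the Python dependencies — can be sketched as the following recipe. Paths follow the llama.cpp layout as of March 2023; the weights themselves you have to source separately, and the exact layout may have changed in later versions:

```shell
# Clone and build llama.cpp (a plain `make` was enough at the time)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Stage the LLaMA weights; the 7B layout looks like:
#   models/tokenizer.model
#   models/7B/consolidated.00.pth
#   models/7B/params.json
ls models/7B

# Python dependencies for the conversion script
pip install torch numpy sentencepiece
```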
The install pulls in NVIDIA CUDA, NVIDIA NCCL, and a bunch of other libraries, and this part takes a while, which I have pre-done so you don't have to sit through it. Now we're going to run python3 convert-pth-to-ggml.py on the models/7B directory, following their instructions, with a 1 at the end — I've actually seen people use 1 or 2 there; I don't know what the difference is, but I'm sticking with the instructions and using 1. This takes maybe three to four minutes, and what it does is convert the .pth (PyTorch) file to the ggml format. It's not shrinking anything; it's just converting from one format to another. You're also going to notice this is the part that takes 16 gigs of RAM or even more — I know it's hitting my swap, because the 16 gigs of RAM in this machine are maxed out, and if I showed swap on here, it would probably be eating a couple of gigs of that too. Fortunately, this is the only step that uses a lot of memory; actually running the chatbot doesn't eat that many resources — it's far more CPU-bound than memory-bound. This conversion needs the whole model loaded into memory at once, so converting the 13B needs more memory, the 65B more still, and so forth. It's one of the main steps: getting the model into a state the program understands.

Now that we're done with that, all we have to do is run the next command — quantize; I didn't know how to pronounce that word — on the 7B, and it will shrink the model down to the 3.9 gigs he was talking about. This process takes a bit longer: the first step took about two minutes, and this one takes about five. You know what, I'm just going to move this window to the opposite side so it looks a little better, because we no longer need to follow this prompt — the next step is just running it.
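The two-step conversion described above looked like this in the llama.cpp instructions of that era — file names and the quantize arguments changed in later versions, so check the repo's current README before copying these:

```shell
# 1) Convert the PyTorch checkpoint to ggml fp16 (~13 GB for 7B).
#    The trailing 1 selects fp16 output; this step loads the whole
#    model into RAM, which is why it needed 16+ GB here.
python3 convert-pth-to-ggml.py models/7B/ 1

# 2) Quantize the fp16 file down to 4-bit (~3.9 GB for 7B).
#    Argument order per the March 2023 README; the final 2 selects q4_0.
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
```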
So we're going to see over here that it doesn't use that many gigs of RAM — it's only heavy that first time. Okay, there you have it; this took about two minutes, but otherwise we're all set now. Because we're using the 7B rather than the 13B or anything else, we can just run the little chat.sh script they include. I'm going to hit Enter on that, and you're going to see the CPU get pinned at 100% — and remember, it's going to use around nine gigs, more or less. And there we have it: we have our prompt, and we can basically ask it anything we want at this point, AI-generated.

So: "Who was the first man on the moon?" Let's do something simple like that. You can see it run — the answer is Neil Armstrong, and obviously that's correct, so we know it's pulling information and getting it right. "When did he arrive on the moon?" Let's ask that, where it has to refer back to its previous answer. There you have it: it gives you the year he landed on the moon, and it knows we're talking about Neil Armstrong, because that's who the first question was about. So we're technically talking to an AI right now and it's responding correctly — I didn't have to write "Neil Armstrong" again. That's really cool.

But what Simon Willison was talking about is that it's just not tuned for certain things. If I ask it, "Can you write an email telling my boss I need a two-week vacation?" — I don't think it's going to know how to respond. "Certainly." That's all it gives me! It doesn't even write the email; it just says yes, it could do it, and that's it. That's usually the extent of the response. You could probably fine-tune what you ask over here and force it to give you a fuller answer, but in general this is the model we have, which is the 7B. I would love to give the 13B a try, which I don't have right now, but that might give a better response.
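The two ways of running it shown here — the interactive chat and the direct one-shot prompt — can be sketched as follows. Flags are llama.cpp's main options at the time (-m model file, -t threads, -n tokens to generate, -p prompt); the chat script's name and location have moved between versions:

```shell
# Interactive chat via the helper script shipped with the repo
# (in later versions it lives under examples/):
./chat.sh

# Or call the main binary directly with a one-shot prompt:
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128 \
  -p "Who was the first man on the moon?"
```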
But for now, it is very simple to get the AI up and running — it took me maybe less than ten minutes to do all this while filming a video at the same time. Now, you can also do what he shows: instead of the interactive prompt, you can run the command directly, specifying which model to use, how many threads, and the prompt you want (the -n flag sets how many tokens to generate). He's actually running his example against the 13B, which I do not have, so asking the same question might not even get an answer. I'm going to paste it in — "What are some good pun names for a coffee shop run by beavers?" — and hit Enter to see if it responds; it might not even answer me. ..."There's the Northwest Loggers coffee shop, or perhaps Beaver Valley Coffee." Okay, that's it. Very interesting — it gave me two answers, which kind of makes sense, but when you're using the 13B you get much better coffee-shop names.

Anyway, that is it. I did have a lot of fun with this. You can ask it a lot of stuff, but you do have to kind of steer it yourself, like how I asked who the first man on the moon was and then followed up with more questions so it would go into detail. As far as it goes right now, it's still not as good as ChatGPT, because they're using a lot of people's data to build their models out, but it is something we can play around with without the restrictions of the API or paying more money. Anyway, let me know what your thoughts are. If you have any questions about this, hit me up down in the comments below or on my Discord, and if you're new to this channel, consider subscribing and hitting that bell notification icon so you know when the next video is going to be out.
Info
Channel: Novaspirit Tech
Views: 30,201
Keywords: novaspirit, tech, llama, gpt-3, llm, large language model, leaked, meta llama, chat gpt, chatgpt-3, llama.cpp, georgi gerganov, simon willison, leak, llama models, gpt, how to, tutorial, github, git, ai
Id: EgoHtsOgZhY
Length: 12min 19sec (739 seconds)
Published: Sun Mar 26 2023