100% Offline ChatGPT Alternative?

Captions
The chatbot you see here is running 100% on my local machine, without any connection to the internet. Not only that, I can link it to my local files so the GPT can use them to help formulate its responses to my questions. And the best part is that it's 100% open source: the code that was used to train it, the data it was trained on, and even the model weights are free for you to download and to use for commercial applications. In this video I'm going to show you how I got this set up to work on my own machine so that you hopefully can do the same, and I'm also going to talk about why open source, specifically for large language models, is such a big deal. We're going to be using h2oGPT, which is the open source Python library to run these models. Now, I'm a little biased because I work at H2O, but this, in my opinion, is one of the best open source chatbot models out there. As always, if you enjoy this video, give it a like, subscribe, and leave a comment below; that increases the video's reach and really helps me out a lot.

Before I show you how I got this set up locally, I'm going to link a few things in the description that will let you test out the chatbot before you actually download it and try to get it working on your machine, so you can be sure it's something you want to install. The first link is similar to the UI that we're going to be running locally with the models; you can also turn on a dark mode. You can see here that I asked it for some suggestions of activities to do in New York, and it provided the answers. The second link is just a different user interface, and you can see it looks very similar to other chatbots you might be used to working with. The back end of this version uses the chat-ui interface provided by Hugging Face, and I'll link the repo in the description. I'm just going to paste the same question into this user interface, and you can see that the answer comes across like this. You also have, on the left side, a history of all the conversations that you've had.

All right, so we're ready to get started loading this code onto our machine so we can get it running locally. We're going to start with the h2ogpt GitHub repo. You can see here it says it's 100% private chat, no data leaks, and it's released under Apache 2.0. There's also a link to the h2oGPT paper that was released with this code; you can see that it links to h2oGPT, which we're going to install, and it also talks about LLM Studio, which is a code base used for fine-tuning models. In fact, it was used to fine-tune the models that we are going to be running today.

In addition to the GitHub page, there's also an H2O Hugging Face page. The Hugging Face website is almost equivalent to GitHub when it comes to hosting models and model weights. If we scroll down on H2O's Hugging Face page, we can see there are a bunch of different GPT models available for us to run. It might be a little confusing at first to decide which one to actually run on your machine, but the name of each model actually gives you a lot of information about what the model is and why you might want to use it. So I've clicked here on the model that I'm going to run on my local machine, just to break down what the naming convention means. This is the h2oGPT model; it's the GM version, which means it was trained by the Kaggle Grandmasters at H2O, and it was fine-tuned on the OpenAssistant dataset. You can see that dataset listed down to the right, where we can actually look at the data it was fine-tuned on and read a little bit about the dataset itself. This dataset was designed to be an assistant-style conversation corpus. "en" means it was trained on the English subset of the OpenAssistant dataset, and 2048 is the number of tokens in the context size of this model, which essentially means how much information the model can take in at one time before giving an answer. By comparison, the GPT-3 model has about 4,000 tokens of context length and the GPT-4 model about 8,000. And then finally, we're running the Falcon 7 billion parameter model.
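For reference, here is how I'd decode that model name. This breakdown is my reading of the naming convention as described in the video, not an official spec:

    # h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
    # h2ogpt    -> the h2oGPT family of fine-tuned chat models
    # gm        -> fine-tuned by the Kaggle Grandmasters team at H2O
    # oasst1    -> fine-tuned on the OpenAssistant dataset
    # en        -> English subset of that dataset
    # 2048      -> context length in tokens
    # falcon-7b -> based on the Falcon 7 billion parameter foundation model
    # v3        -> third released version of this fine-tune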
We should probably talk a little bit about what a foundation model is. The model we'll be running is based on the Falcon models, and its creators provide two different versions: one with 40 billion parameters and one with 7 billion parameters. The really nice thing about these models is that they're completely open source, and they're transparent about the data the model was trained on. Going to the Hugging Face page of this model, you can see they say that this is a raw pre-trained model which should be further fine-tuned for most use cases, and that, in fact, is why we're using the h2oGPT version of this model: it was fine-tuned to be a conversation bot. When I showed you the examples earlier we were using the 40 billion parameter model, but we'll be running the 7 billion parameter model on my local machine, because the size of the model requires a larger GPU and my GPU just isn't big enough to fit the 40 billion parameter model. If you don't have a GPU, you could always spin up an instance using a cloud provider, and maybe I'll make a video about that in the future; just let me know in the comments if that's something that interests you. Another thing I'd say about these foundation models is that things are moving really fast, and new models are coming out seemingly every month. The nice thing about using these H2O models is that the team that works on them is dedicated to testing out all these new open source models and fine-tuning them for specific tasks.

Going back over to the h2ogpt GitHub repo, if we scroll down we can see the readme document has instructions for how to install this on a bunch of different operating systems. You can install it on Mac or Windows, and we're going to be installing it on my machine running Ubuntu. I already mentioned this, but if you want to run these bigger Falcon models you're going to need a GPU, and specifically one that can run CUDA. If you just want to give it a try, there are some models that you can run in CPU mode, like GPT4All and llama.cpp; you would just have to follow the CPU instructions to run it that way.

Jumping over to my terminal, the first thing we're going to want to do is git clone the h2ogpt repo. For me, it's already cloned, so there's a directory called h2ogpt, but for you it would just download the code into whatever directory you're in. Now we can cd into that directory, and I'm actually going to run a git pull just to make sure I have the most recent version of this repo. This code base is in active development, so a lot is changing each day; if you run a git pull, you'll pull the latest version of the code base.
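A minimal sketch of that clone-and-update step, assuming the repo URL hasn't changed since recording:

    # grab the h2ogpt source code
    git clone https://github.com/h2oai/h2ogpt.git
    cd h2ogpt
    # the repo changes daily, so pull the latest before installing
    git pull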
The next thing you'll need to do is install the necessary packages so that h2oGPT can run. Whenever I'm working on a new Python project that involves installing specific packages, I like to create a brand new environment to install them into. There are different package managers out there, but I like to use conda; you can install Miniconda or Anaconda on your computer if you don't have it already, and that will let you create a new environment. I do that by running conda create, then providing the name, which is going to be h2ogpt, and then telling it the Python version I want, which is 3.10. Now it's telling me the conda environment already exists, so I'm just going to hit "n" here, but for you that would create a new conda environment. We can then activate that environment by running conda activate h2ogpt. To install all the necessary packages, we use pip by running pip install -r requirements.txt. To be clear about what that's installing, we can look in this requirements.txt file and see all the dependencies required to run h2oGPT. Following the instructions from the readme document, you can see that they recommend also installing with an extra index URL, so we're just going to paste that whole line and run it in this conda environment. Now that that's done, all the required packages have been installed. Another thing you want to check is that you have CUDA installed; I can check that by running nvidia-smi, which shows all of the GPUs on my machine and the installed CUDA version. (Both the environment setup and the run command are sketched at the end of this section.)

Now we can test running this model with python generate.py, and there are a few arguments we need to provide so that h2oGPT knows which model to use and how we want it to run. We're going to give it a base model, which is this version 3 of the Falcon 7B model, so I'll paste that in here. We're also going to set the score model to none, tell it the prompt type, which is going to be human_bot, and for our first test we're just going to run the command line interface version. Before I hit enter on this: if this is your first time running this command, it's going to download the model weights to your computer, which can take some time. Just to show you, if I cd into my cache directory under huggingface/hub, we can see that all the models I've tested out have been downloaded; I can enter the directory of the model we're about to run, and by running du -sh we can see that this model has a total size of about 14 gigabytes. So, back to the command that we're running: I'm just going to hit enter, and you can see that it's loading the model weights into my GPU... and we actually get an out-of-memory error. That's because this model is too large to load, as is, into the GPU on this computer. Luckily, there are some tricks we can use to fit large models into a GPU with limited memory, and there's a whole section in the documentation about how to do this. Essentially, we're going to add load 8-bit equals true to this generate call, which loads the model into GPU memory in a more efficient way. If I create a new tab here, I can show you that it's now loaded into my first GPU's memory, using about 9.2 gigabytes. Note that there may be an impact on the quality of the model's responses when you run this 8-bit or 4-bit quantization on your models. Since this is a conversation bot, we can ask a question like "what is the purpose of having large GPU memory?", and you can see it provides a nice answer for why you would want large GPU memory; it did a pretty good job at answering that question. Now I'm going to hit Ctrl-C to exit out of this.
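Here's a sketch of the environment setup. The exact extra-index line should be copied from the repo's readme; I've left it as a comment rather than guess the URL:

    # create and activate an isolated environment
    conda create -n h2ogpt python=3.10
    conda activate h2ogpt
    # install h2ogpt's dependencies (run from inside the h2ogpt directory);
    # the readme pairs this with an --extra-index-url for CUDA-enabled wheels
    pip install -r requirements.txt
    # confirm the GPUs and CUDA driver are visible
    nvidia-smi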
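And a sketch of the command-line test itself, matching the flags described above. The argument spellings follow the h2ogpt readme as of recording and may have changed since:

    # optional: check how much disk the downloaded weights use
    du -sh ~/.cache/huggingface/hub/models--h2oai--h2ogpt-gm-oasst1-en-2048-falcon-7b-v3

    # run the chat in command-line mode; load_8bit quantizes the weights
    # so the ~14 GB model fits in a smaller GPU (about 9 GB in my case)
    python generate.py \
        --base_model=h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 \
        --score_model=None \
        --prompt_type=human_bot \
        --cli=True \
        --load_8bit=True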
Next, we're going to run the graphical interface version of this model. One thing I like to do, since this line of code has gotten pretty long and there are a lot of arguments we're passing, is create a script that holds all of these commands. I've turned sharing off and set the offline level to one, which ensures the model is running offline. You can see that the model and the other settings are the same; I've just removed the argument that disabled the UI. (A sketch of that script, and of the LangChain relaunch described below, follows at the end of this section.) So if I go back here and run this script, we can see that it's running on our local machine on port 7860. If I open this URL in a browser, I'm taken to a page that looks very similar to the h2oGPT site online, except it's running completely on my local machine. We can see that it also tells us which model we're running, the 7 billion parameter Falcon model. I can go ahead and enable dark mode, and let's ask it a question: why are open source LLM models powerful and why are they important? It starts answering: open source models are powerful because they are freely available to anyone to use, which allows for rapid development. It looks like it's running fine; it's a little slower than the online version because my GPU isn't the fastest, and now the answer is done.

Like I said before, this user interface is undergoing rapid development, so it may change, but one of the features I like about it is its integrated LangChain support, which allows you to import data sets that you might want the model to use in order to provide a better answer. A nice example I tried before was asking the model: what is the fastest roller coaster in Pennsylvania? You can see it gets confused here; it says the fastest roller coaster in Pennsylvania is Steel Vengeance, but then it states that it's in Ohio, so we know this isn't correct. To be fair, if I ask the same question of ChatGPT 3.5, it also gives a wrong answer: it says it's Skyrush, which is in Pennsylvania but isn't the fastest roller coaster. So I found a recent article that mentions the fastest roller coaster in Pennsylvania is a coaster called Phantom's Revenge. I'm going to take this article and save it as a PDF on my computer, then restart h2oGPT with the LangChain mode set to my user data. Now that it's restarted, I can go to the data source tab, click to upload a file, and add the file to the user data collection. What this is doing on the back end is creating a vectorized database entry from this coasters PDF, so this data source can now be used by the LLM to help answer my question. If I ask the same question again, we can see that the answer is correct now; it gives the correct roller coaster, and it also links to the file where it found the result. This is still an experimental feature, and its ability to search really large sets of data is limited, but at least for this example you can see how it might work for a future application.
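A minimal sketch of the launch script, with the offline flags I mentioned. The flag names follow the h2ogpt readme at the time of recording:

    #!/bin/bash
    # run.sh - launch the Gradio UI fully offline on port 7860
    python generate.py \
        --base_model=h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 \
        --score_model=None \
        --prompt_type=human_bot \
        --load_8bit=True \
        --share=False \
        --gradio_offline_level=1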
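And the relaunch with LangChain pointed at my own documents. "UserData" is the collection name the UI uses for uploaded files, per the readme:

    # same launch, but enable retrieval over locally uploaded files;
    # uploaded PDFs get embedded into a local vector database
    python generate.py \
        --base_model=h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 \
        --score_model=None \
        --prompt_type=human_bot \
        --load_8bit=True \
        --share=False \
        --gradio_offline_level=1 \
        --langchain_mode='UserData'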
I just want to address a question you might be thinking: why would anyone even want their own open source large language model? I don't think there's one answer to that question, but there are a few things you should keep in mind. The first is privacy. Whether you like it or not, any time you type a question into a chatbot, that data is being sent somewhere, and potentially stored and used; there have been reports of information leakage from people using ChatGPT, and by using a private open source model you can ensure that your data stays with you. The second reason is what really excites me, and that's the ability to customize and control your model. Right now you have a bunch of options for models that are good at answering almost any question. The great part about open source models is that you're able to take those model weights and fine-tune them on any particular task you're interested in; you may have already seen this where models have been trained specifically for health care or for financial advice. Using an open source model, and specifically one that allows commercial use, opens the door to developing a wide range of these custom models, and really, I think this is where we're going to see the most advancements in large language models in the years to come. The third reason is transparency. Large language models, like most models, may have some biases; they can also be overconfident and just give wrong information. Open source models still have these problems, but at least you know exactly what data the models were trained on and how the training was done. Thanks for watching this video; let me know what you thought of it, and I'll be back again soon with another one.
Info
Channel: Rob Mulla
Views: 9,542
Keywords: chat gpt, chatgpt 4, chatgpt alternatives, free chatgpt alternatives, chat gpt free, free ai tools, privategpt setup, h2ogpt tutorial, h2ogpt install, h2ogpt falcon 40b, h2o gpt vs private gpt, h2o gpt langchain, local chatgpt, offline gpt, open source chatgpt, open source llm
Id: Coj72EzmX20
Length: 16min 1sec (961 seconds)
Published: Fri Jun 30 2023