Using Ollama To Build a FULLY LOCAL "ChatGPT Clone"

Captions
I'm going to show you how to build ChatGPT from scratch using any open-source model that you want. Ollama is the easiest way to run large language models on your computer and build incredible applications on top of them. Ollama even lets you run multiple models in parallel; it absolutely blew me away when I first saw it, so I'm going to show you that too. So let's go.

This is the Ollama homepage, ollama.ai, and all you need to do is click download. Right now it's only for macOS and Linux, but a Windows version is coming soon, and in the meantime you could probably use WSL to get it working on Windows. So just click download, and once it's done, open it up. That's it, you're done. Once you open it, you'll see a little icon in your taskbar, and that's it; that's how lightweight it is. Everything else is done through the command line or in code itself. If you click over to the models link, you can see the models that are available, and they have all the most popular open-source models: here's Code Llama, here's Llama 2, Mistral, and they have a ton more. Go ahead and look through it: here's Zephyr, here's Falcon, and they even have Dolphin 2.2 Mistral. They really do have a ton of great models you can use, and they're adding more all the time. Now I'm going to show you how to run a model through the command line, then I'm going to show you multiple models up, running, and ready to go at the same time, and then we're going to actually build something with it.

Okay, so now that we have Ollama running in our taskbar, all we have to type is ollama run and then the name of the model we want, so we're going to run Mistral. I already have it downloaded, but if you don't, it will download it for you. I hit enter and that's it, we have it up and running. Let's give it a test: "tell me a joke", and look how fast that is. "Why was the math book sad? Because it had too many problems." Perfect, and it is blazing fast; that's a function of both Ollama and Mistral.

But let me blow your mind now. I'm going to open up a second window and put the two windows side by side. I still have Mistral running, and now I'm going to use ollama run llama2, so I'll have Llama 2 running at the same time. Now, I have a pretty high-end Mac, but the way it handles this is absolutely blazing fast. We have Mistral on the left and Llama 2 on the right, and I'm going to give both of them, at the same time, a prompt that requires a long response. On the left I'm writing "write a thousand-word essay about AI", and on the right, with Llama 2, "write a thousand-word essay about AI". I trigger Mistral first, and at the same time I trigger Llama 2. The left side goes first, and it is blazing fast, writing that essay about AI; on the right side Llama 2 waits, and as soon as Mistral is done, it starts writing. How incredible is that? It swapped out the models in maybe a second and a half; it's absolutely mind-blowing how they were able to do that. So I had Mistral running on the left and Llama 2 on the right, and they simply ran sequentially. You can have four, eight, ten, as many models as you want running at the same time; they'll queue up and run one after another, and the swapping between models is lightning fast.
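To recap, the whole demo amounts to passing two different model names to ollama run, one per terminal window:

```
ollama run mistral
ollama run llama2
```

Each command drops you into an interactive chat with that model, pulling the weights first if they aren't already downloaded.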
Now, you're probably asking yourself: okay, that's really cool, but when would this be useful? Well, I can think of two use cases. One, just being able to have the right model for the right task is incredible. This lets us have a centralized model that acts almost as a dispatch model, routing different tasks to whichever model is most appropriate for each one. And what does that remind us of? AutoGen. We can have a bunch of different models running with AutoGen, all on the same computer, powered by Ollama, and since AutoGen runs its agents sequentially, it's actually a perfect fit for this kind of setup.

Now that you've seen you can have as many models open as you want, I'm going to close Llama 2. Let's say we want to adjust the system message; we can easily do that, so let me show you how. I switched over to Visual Studio Code, and what we need to do is create what's called a Modelfile. To start the Modelfile we write FROM and then a model name; the example uses llama2, and we're going to change that to mistral, because that's the model we're using right now. I click save, and it recognizes the file as Python, which is why you're seeing all those underlines, but it's not Python, so I'm going to leave it as plain text for now. Then we can set the temperature, so let's set it to 0.5, and then we can set the system prompt. The one in the example is "You are Mario from Super Mario Brothers. Answer as Mario, the assistant, only." Let's use that and see if it works.

Back in the terminal, we create a model from that Modelfile; what this does is create a profile of a model using the Modelfile. We type ollama create mario -f ./Modelfile and hit enter, and there we go: it parses the Modelfile and looks up the base model, so it did everything correctly. Then we do ollama run mario, hit enter, and there it is, up and running. "Who are you?" "I am Mario, the assistant. It's great to meet you. How can I help you today?" "Tell me about where you live." And now it answers as Mario. That's it; we can give it complex system prompts if we want, and do all the other customizations we want, in that Modelfile.
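For reference, a Modelfile matching the steps above would look something like this; this is Ollama's documented Modelfile syntax, with the Mario prompt from the example in the video:

```
FROM mistral
PARAMETER temperature 0.5
SYSTEM """You are Mario from Super Mario Brothers. Answer as Mario, the assistant, only."""
```

Then, as in the video, ollama create mario -f ./Modelfile builds the custom model and ollama run mario starts a chat with it.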
Another nice thing is that Ollama has a ton of integrations. There are web and desktop integrations: an HTML UI, a chatbot UI, all these different UIs. There are terminal integrations, libraries including LangChain and LlamaIndex, and then a bunch of extensions and plugins, like the Discord AI bot, for example. All of these are really easy to use, but I want to do it all myself, so let's build on top of Ollama now.

The first thing I'm going to do is create a new folder for this project: right-click, create a new folder, and I'll call it "open chat", because we're making a ChatGPT clone that uses open-source models. Next, I opened Visual Studio Code to the open chat folder. There's nothing in it yet, so let's create a new Python file, save it, and call it main.py.

Let's do something really simple first: we're just going to generate a completion, which means getting a single response. Since we're doing this in Python, we're going to need two libraries, so we import requests and import json. Then we have the URL: it's localhost, because this is all running on my local computer, on port 11434, hitting the API's generate endpoint. We have our headers, and then our data: we're not going to use llama2, we're actually going to use Mistral 7B, and I think that's the right syntax; we'll try it. The prompt will be "why is the sky blue", just as a test. Then we ask requests to do a POST to the URL with the headers and the data, collect the response, and if we get a 200 we print it; otherwise we print the error. Let's see if this works: I save, click play, and run it. "mistral 7B not found", so maybe if I just delete that part and try again... okay, interesting: it looks like it streamed the response, because we got a ton of little pieces of it. Let's see how we can put that all together.

Looking at the documentation, it says a stream of JSON objects is returned, and the final response in the stream includes additional data about the generation. If we don't want streaming, we just set stream to false, so let's add that and try again. Oh, false is not a string... fixed it, and let's run again; it looks like False needs to be capitalized. Push play, and it worked that time: "The sky appears blue because of a phenomenon called Rayleigh scattering. This occurs when..." There we go, we got it, absolutely perfect. But I don't really want all of this additional information; what I really want is just the answer, so let's make that adjustment. I made a few changes: first we get the response text, then we load and parse the JSON, then we pull the actual model response out of that JSON, and then we print it. Let's try one more time. There it is, perfect; now we have just the response.
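Here's a minimal sketch of that first script. The endpoint, payload fields, stream flag, and the "response" key all come from the video and Ollama's generate API; the variable names are my own:

```python
import requests
import json

url = "http://localhost:11434/api/generate"
headers = {"Content-Type": "application/json"}
data = {
    "model": "mistral",
    "prompt": "why is the sky blue",
    "stream": False,  # a capitalized Python bool, as noted above
}

response = requests.post(url, headers=headers, data=json.dumps(data))

if response.status_code == 200:
    # With streaming off, the reply is a single JSON object;
    # the model's answer lives in its "response" field.
    print(json.loads(response.text)["response"])
else:
    print("Error:", response.status_code, response.text)
```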
Now that we have the basics working, let's add a Gradio front end so we can actually use it in the browser, and then let's make sure the user can go back and forth and have a real conversation. Funny enough, I'm actually going to use the Mistral model to help me write this code: I pasted in the code I had and said, let's add Gradio, and let's also allow for a back and forth between the user and the model, and it generated this generate_response function. I moved a bunch of things into generate_response, including the data object, and the response comes through here too, so everything runs through this function from now on. Then we actually open up Gradio: we have gr.Interface, with generate_response as the function, the prompt somebody enters as the input, and the function's response as the output, and then we launch it. Let's run it and see what happens. There's the local URL; let's click on it and open it up, and here it is: a working Gradio interface. Let's make sure it works: "tell me a joke". There it is: "why was the math book sad? Because it had too many problems." In just a few minutes we were able to build our own ChatGPT powered by Mistral. This is absolutely incredible, but let's not stop there; let's take it a little further, because I don't think it has any memory of the previous conversation. Let's say "tell me another one" and see if it actually works... it's giving me something completely different now, so let's make sure it has the history of the previous messages, as many as can fit.

To do that, we're going to store the conversation history and try to fit as much of it as we can into the model. I'm sure there are better ways to do this, but we're going to keep it simple and just assume we can store as much of the history as we want; obviously it will get cut off when we hit the token limit. Let's add conversation_history as an array. The first thing we do when we enter generate_response is append the prompt to it: conversation_history.append(prompt). Next, we join the conversation history with newlines and put it into a full prompt, so the full prompt carries the entire conversation, and we pass in the full prompt instead. The last thing we need to do is add the model's reply to the history as well, so down where we get the response, right before we return it, we call conversation_history.append with the actual response. I save, quit out of Gradio, clear, and hit play. There we are; let's open it up. Now: "tell me a joke". "Why don't scientists trust atoms? Because they make up everything." Very funny. "Another one", and let's see if it knows what I'm talking about now. "What do you get when you mix hot water with salt? A boiling solution." There it is: now it has the history of the previous messages, powered by an open-source model, written completely from scratch by myself, or yourself.

So now you know how to build with Ollama. If you want me to do an even deeper dive and continue building something more sophisticated, let me know in the comments. If you liked this video, please consider giving a like and subscribe, and I'll see you in the next one.
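For completeness, here is a rough reconstruction of the finished script, with the Gradio UI and conversation history wired in. It follows the steps described above, but the function and variable names are my guesses rather than the exact code on screen:

```python
import requests
import json
import gradio as gr

url = "http://localhost:11434/api/generate"
headers = {"Content-Type": "application/json"}

# Naive memory: the full back-and-forth is re-sent on every turn,
# so it will eventually be truncated at the model's context limit.
conversation_history = []

def generate_response(prompt):
    conversation_history.append(prompt)

    # Join the whole conversation into a single prompt string.
    full_prompt = "\n".join(conversation_history)

    data = {
        "model": "mistral",
        "prompt": full_prompt,
        "stream": False,
    }
    response = requests.post(url, headers=headers, data=json.dumps(data))

    if response.status_code == 200:
        answer = json.loads(response.text)["response"]
        conversation_history.append(answer)  # remember the model's reply too
        return answer
    return f"Error: {response.status_code} {response.text}"

# A simple single-textbox interface, as in the video.
iface = gr.Interface(fn=generate_response, inputs="text", outputs="text")
iface.launch()
```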
Info
Channel: Matthew Berman
Views: 187,557
Keywords: ollama, chatgpt, open-source llm, llm, openai, llama, mistral, llama 2, chat gpt
Id: rIRkxZSn-A8
Length: 11min 17sec (677 seconds)
Published: Fri Nov 10 2023