LLM Chat App in Python w/ Ollama-py and Streamlit

Captions
Ollama recently announced Python and JavaScript libraries that will make LLMs more accessible to more people and expand what's possible with just a couple of lines of code. To explore that, today I'll be using the Ollama Python library along with Streamlit to build an LLM-powered chat app. I'm going to start by briefly walking through what each of these libraries is before moving on to the app itself. If you're more comfortable with JavaScript, don't worry: I'll be doing a similar video using the Ollama JavaScript library soon. In the meantime, let's go.

First, I think we should understand what's wrong with using the command-line interface. Ollama gives us a really nice interface that does everything we need with regard to interacting with our LLMs, so why use this library instead? Bash is really nice for simple interfaces, but by and large it's not how most people are going to want to interact with LLMs. You could use Bash to write your tools, and there are certainly plenty of people who do shell scripting professionally, but it's pretty inconvenient for the average programmer. Some great UIs for Ollama's API exist, like Open WebUI, which I covered in my last video; however, even these, as good as they are, still don't give programmatic access to the underlying API. The Ollama API itself is great, especially if we're just doing web requests, but I don't really want to manage the state of my fetch requests or manually add types to the responses after I get them back.

And that brings us to the Ollama Python library. It's a Python interface that essentially just wraps the Ollama API under the hood and provides types for us. I have its GitHub page pulled up, and if we look at the actual code, we'll see that it provides types, which essentially tell us what the response we're getting looks like: which fields are integers, booleans, or strings. The client itself is only a few hundred lines of code, and if we look at one of the methods we're familiar with, for example the generate method (give a prompt, get back a single response), all it's doing under the hood is making a request: it posts to the API's generate endpoint and passes all the same parameters we're already familiar with. The same is true of chat, of embeddings (which we haven't seen yet), and we can even pull models through this library, still just using the API under the hood. The benefit is that it gives us access to the entire Python ecosystem: we can really easily do I/O and web requests, and we have access to all kinds of tools and libraries that we wouldn't otherwise have if we were just using the command line.
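To make that concrete, here's a minimal sketch of the two approaches, assuming Ollama is serving on its default port 11434. The requests version only illustrates the kind of HTTP call the library makes for you; it isn't the library's actual source:

```python
import ollama
import requests

# With the library: one call, and the response comes back as typed Python data
response = ollama.generate(model="mistral", prompt="Why is the sky blue?")
print(response["response"])

# Roughly the equivalent raw HTTP request that the library wraps for us
raw = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Why is the sky blue?", "stream": False},
)
print(raw.json()["response"])
```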
So in order to begin, let's start by setting up our environment. The first thing we want to do is confirm that our Python version is 3.8 or above; for that we can just run python --version, and we see Python 3.8.8, so that's fine. If you've worked in Python before, you'll know that dependency management can be a bit of a pain, so we're going to use a tool called venv to help us manage our dependencies and scope them to our single project. We run python -m (the -m stands for module) with the venv command, which manages virtual environments, and the last parameter is just the location where it's going to store those files; in this case I'm storing it in a directory called .venv. That creates our virtual environment, and then we activate it by running source followed by the path to the activate executable in the environment's bin folder (source .venv/bin/activate). We can tell it's active because the environment name is now displayed right there in our terminal. Finally, we install the library with python -m pip install ollama (pip is the Python package manager), and now we're ready to go.

Now that we have the Ollama Python library installed, let's walk through a couple of the major functions that we'll be using in our app today. Before we begin, note that you should have Ollama running on your system. If you're new to Ollama, I'd recommend watching my beginner video, which I'll link above; otherwise this should be review for you. To begin, let's go into our terminal, open up a Python session, and import ollama.

The first thing I want to cover is listing the models. For that we just call ollama.list(), and it gives us literally just the raw API response, which is some really ugly JSON. I wrote a little code to clean this up using a list comprehension: for each model in the models attribute of the ollama.list() response, get me that model's name attribute. That gives us a list of the names of every model returned.

Next up is the show method, which is how we view the model file for any model. We just call ollama.show() with the name of our model; I'm going to use Mistral, but you can use any model you have on your system. It gives us back a lot of stuff, and the majority of it is the license terms and conditions, so I wrote a little dictionary comprehension to filter that out: give me the key and the value for each key and value in ollama.show("mistral").items() (calling .items() is how you turn a dictionary into key-value pairs), omitting the license key specifically. That gives us a much shorter response: the model file, the template, and some other meta-level details about the model itself.
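As a quick sketch, here are both calls and comprehensions, assuming the response shapes shown in the video (a dictionary with a models list whose entries have a name key, and a show response that includes a license key):

```python
import ollama

# List comprehension: just the name of each locally installed model
model_names = [model["name"] for model in ollama.list()["models"]]
print(model_names)

# Dict comprehension: the model's details with the lengthy license omitted
details = {k: v for k, v in ollama.show("mistral").items() if k != "license"}
print(details)
```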
Finally, I wanted to show off the chat method, which is obviously the primary way we're going to interface with any of our models. We call ollama.chat() with all the same parameters we would pass through the API: we specify the model (in my case Mistral), and because this is chat, it expects an array of messages, where the shape of a message is a role (either user, system, or assistant) and content, which is whatever the prompt happens to be. I'm also passing stream=False, so we're essentially generating one single response rather than receiving it one word at a time. I'm asking it why the sky is blue, so let's see what it says. The response looks exactly like what we'd get back from the API. What we actually want is in the message object, and within that, the content, so we can destructure that as response, then message, then content. And that's all we need to know for now in order to get started with our app.
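As a sketch, that call and the destructuring look like this (the prompt text is from the video):

```python
import ollama

response = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=False,  # wait for one complete response
)

# The text we care about is nested under message -> content
print(response["message"]["content"])
```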
So now that we know how to use the Ollama Python library, it's time to put it to work. But before we dive into the app, I'd like to invite you to like and subscribe. To power our chat app, you obviously know we're going to be using Ollama, but I also mentioned a tool called Streamlit. I'm here on their website: Streamlit is essentially a Python library for creating user interfaces. It does a lot of magic for us, so as we proceed, just expect that some of the stuff isn't going to be explicitly written out; Streamlit really does make it easy. It runs a server for us, it manages the UI, it manages state updates, and it even provides a set of components that we can just drop in to build a really great-looking app. In fact, it even has components specifically for chatting, and we're going to be using some of those in the app we're about to build. While I will be explaining some of the Streamlit methods we use, I won't be doing a deep dive on the whole library, so I encourage you to come back and look at the documentation yourself.

The first thing we'll do is install Streamlit; again, that's just python -m pip install streamlit. It should take a minute to download, and then we're ready to go. In the meantime, I've created a new file in my IDE called chat_app.py, but of course you can name it whatever you want. In our file, we start with import ollama as well as import streamlit as st; it's common to alias streamlit to something shorter (you don't really need to, but it makes us type less code). Just to get a feel for how Streamlit works, I'll type st.title("Ollama Python Chatbot"). Streamlit gives us a whole bunch of methods, which again you can read through the documentation for, but this one just sets the title of the page and gives us something to display. Now we can come down to our terminal and run streamlit run chat_app.py, which starts a server and gives us a page we can actually view. Here we are on localhost, port 8501, and there's the title we just set; it's already working.

Now that we have the basic app running, the next thing we want is a user prompt. The way I'm imagining this, we'll have our title at the top, and then we're basically going to try to replicate the ChatGPT interface: the ability to select the model we want to talk with (since Ollama provides us with a lot of different models), then the chat history (the user prompts and the assistant responses), and at the very bottom, the user input. To do that, we use a Streamlit method called chat_input. I'm assigning a variable, prompt, using the walrus operator, which essentially allows us to assign a variable at the same time that we use it in an expression: if the output of chat_input is not falsy, then continue with the block.

Once we have our user input, we want to render it back into the UI. We say with st.chat_message("user"), declaring that the person generating this chat message is the user. Because Streamlit knows it's going to be used a lot for chat applications, it natively provides automatic default styles based on whether it's a user role or an assistant (AI) role delivering the message; you can also pass in arbitrary strings, but we get styling for free if we just say user or ai. What we render inside this chat-message block is just markdown with whatever the user's prompt is. We can save this and go back to our app; Streamlit even picks up when the file changes, so we can just rerun it. Here's our prompt bar, already styled as an input. We can type "nothing much", hit enter, and there you go: it's rendered back into our interface, and we even see the little user icon because we said the chat message was from a user.

So now that we're able to collect user input and render messages, the next thing we need is a message history. For that I'm going to use Streamlit session state, which is essentially just memory that Streamlit knows it needs to manage in between renders and state updates. First, if "messages" is not already a key within the session state, we add it and set it to an empty array; as we go along, we'll push new messages into that array. Next, we use this history to render the messages back to our screen: for each message in the history, we again use the with chat_message block, except instead of passing in user or ai directly, we read that message's role, and then we render the message's content. As you can infer, the shape of each message is just an object with both a role and content, which we read from here. Finally, every time we get a new input from the user, we append it to our list of messages: st.session_state["messages"].append(...), constructing the message I told you about earlier, with the role user and the content set to whatever prompt we got from our input. Now that we have a message history, let's save our file and try it out. We rerun, type "hello, how are you", and there we go: our history is working.
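Put together, the app at this stage looks roughly like this; it's a sketch of what's been described so far (the placeholder text is my own), not a verbatim copy of the video's file:

```python
# chat_app.py -- user input and message history, no model responses yet
import streamlit as st

st.title("Ollama Python Chatbot")

# Session state persists across Streamlit's reruns
if "messages" not in st.session_state:
    st.session_state["messages"] = []

# Replay the conversation so far, styled by each message's role
for message in st.session_state["messages"]:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# The walrus operator assigns prompt and tests it in one expression
if prompt := st.chat_input("Enter prompt here.."):
    st.session_state["messages"].append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
```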
Now you might be saying: this is great, we have an interface and all, but this isn't very interactive; I'm just chatting with myself. So now I think it's time to add the Ollama chat using the Ollama Python library. Essentially, we want the model to respond to us whenever we provide it with a message. We're going to be in this if-prompt block: after we display the user prompt, we come in and start a new UI block with the chat message from the assistant. Next we use the library's chat method, just like we tried out before, and get a response from it. The response is going to be in the same format as the API response we saw, so we'll need to destructure it and clean it up a little, but let's go through the arguments one more time: our model is mistral, our messages are the same ones we're keeping in our session state, and finally I'm saying stream=False, which means we wait for the entire response to be generated by the model before returning anything. Here's that destructuring I was talking about: from the response body we take the message attribute, and then the content from that. Next we render it to the screen with st.markdown again, and finally we add it to our list of messages, appending to our state's messages, this time with the role of assistant and the content set to whatever the message was. Let's save this and try it out. Here we see a little robot icon, because again Streamlit is aware that conversations happen between humans (users) and AI (assistants). We asked it to tell us a joke, and boom, here we have the whole response. That looks pretty good, right?
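As a sketch, here's how the if-block from the earlier skeleton grows to include the assistant's reply (imports and session-state setup as before; the model stays hardcoded until the dropdown below):

```python
if prompt := st.chat_input("Enter prompt here.."):
    st.session_state["messages"].append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Generate and display the assistant's reply
    with st.chat_message("assistant"):
        response = ollama.chat(
            model="mistral",  # hardcoded for now; a model picker comes next
            messages=st.session_state["messages"],
            stream=False,  # wait for the full response before rendering
        )
        message = response["message"]["content"]
        st.markdown(message)
    st.session_state["messages"].append({"role": "assistant", "content": message})
```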
Something that I think would take this to the next level is if we could use the power of Ollama and choose which of the myriad models on our system we want to interact with. To do that, I'm going to create a little dropdown we can use to select the model. First I init the model key; this looks pretty similar to the messages above: if "model" is not in st.session_state, we add it, and for now we just leave it empty. Next I create the list of models to display to the user, borrowing the code from our earlier example; it's just a simple list comprehension: the models variable is the model name for each model in the response of ollama.list() (which again lists all the models), taking whatever lives in its models key, so this is just a list of all the model-name strings. Finally, I set st.session_state["model"] equal to st.selectbox, a UI element that returns the selected value to us: we give the user a prompt, "Choose your model", and the options we give them are everything in our models list. The last thing we need to do is go through the rest of the app, and wherever we were hardcoding our model, replace it with this session-state value. Let's save and rerun. Great, so now we have this element right here that lets us choose from all of our models (we can tell these are the models I have on my system); we can choose phi, and that works just fine.

The final piece I'd like to add is to make the UI a little bit better: instead of having to wait for the model to generate its entire response before displaying anything to the user, I'd like us to stream each word as it's generated by our LLM. To do that, I'm going to introduce a concept that's potentially new to you: generators. Generators are a Python concept I can demonstrate right now; essentially, they allow you to make a function into an iterable, meaning every time it's called it yields something new to the output. By way of example, if you're familiar with the range command, which just returns a range of numbers, printing x for x in range(5) gives the numbers 0 through 4. Let's see if we can use generators to create our own range function. I define my_range(n), where n is the number we count up to; I initialize a variable x = 0, and then while x is less than n, I yield x. Whatever we yield is essentially what gets returned for that pass, and eventually this generator stops yielding, which is how the calling code knows its output is done. To complete our logic, I just need to increment x by one each pass. Now, printing x for x in my_range(5) gives the exact same output. Voila; that's just a toy introduction to generator functions and the yield keyword.

Back in our codebase, I create our own function, def model_response_generator(), which takes no arguments, and we start by using the Ollama chat interface again, with the same model from our state; the only difference is that we pass stream=True here. What that means is that instead of getting one big message, we get a series of smaller messages that all have the same shape. While this stream is itself a generator and we could use it as such, I want to parse the output so that each yield statement produces one single string instead of a bigger object we'd then have to destructure, so I say: for chunk in stream, yield chunk["message"]["content"]. For each chunk, this destructures just the human-readable text, which is what we really care about. From here, we take our generator and come down to where we're generating the assistant response: I get rid of the original ollama.chat call, because that's now being done by our generator method, and replace it with st.write_stream, because Streamlit anticipated the use case where we'd want to stream a response back to the UI; we pass in our generator method.
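Here's a sketch of both the toy generator and the streaming response generator, assuming stream chunks carry the same message/content shape as the non-streaming response (st.write_stream needs a recent Streamlit release, 1.31 or later):

```python
import ollama
import streamlit as st

# Toy example: a hand-rolled version of range() using yield
def my_range(n):
    x = 0
    while x < n:
        yield x  # hand one value back to the caller per pass
        x += 1

print([x for x in my_range(5)])  # [0, 1, 2, 3, 4]

# The app's generator: stream chat chunks and yield just the text of each
def model_response_generator():
    stream = ollama.chat(
        model=st.session_state["model"],
        messages=st.session_state["messages"],
        stream=True,
    )
    for chunk in stream:
        yield chunk["message"]["content"]

# In the assistant block, st.write_stream renders the stream as it arrives
# and returns the full concatenated text:
#   message = st.write_stream(model_response_generator())
```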
We no longer need to render the markdown ourselves, because write_stream is already doing that for us. Furthermore, the message variable is now the concatenated response from the streamed output, so we can still pass it into our list of messages as before. If I save this, go back to our UI, and rerun, I'll choose Mistral and say "tell me one fun fact about llamas." There's always going to be a base amount of time it takes for our computer to load the model into the GPU, but we can see that our response is now being streamed one word at a time instead of waiting for the entire block to be generated.

And with that, that just about wraps up our app. Let's take a look: in only about 40 lines of code, we were able to create a really good-looking UI, we can stream model responses so our users don't have to wait, we have interactivity, we have message history, and we even have the ability to choose which model the user is interacting with. I think that really shows the power of bringing Ollama into the Python ecosystem. It's also worth noting that now that we're in Python, we have access to all kinds of other tools: if you want to chat with documents, load files, make web requests, or save things to a database, Python gives you the ability to do almost anything. In future videos I'll be exploring almost all of these possibilities, with a specific focus on RAG applications, which I know a lot of people have been asking for.

Before I go, I wanted to share that all of the code for all of my videos is now going to be on my website, decoder.sh. Each video's web page will be linked in the video description, and all the code will be available for you to copy and paste, with syntax highlighting; I think it looks and works really well, so please enjoy. Finally, I wanted to share that I'm celebrating having just passed 2,000 subscribers. I'm brand new to this and really didn't expect anything like this to happen, so thank you all for joining me on this journey. I also really appreciate everyone who has been thumbs-upping and commenting on my posts; I read every single comment, and it's those comments that make each subsequent video a lot better. Thank you again.
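For reference, here's the complete app reconstructed from the steps above. It's a sketch based on this walkthrough rather than a verbatim copy of the video's code, but it matches the behavior described:

```python
# chat_app.py -- an Ollama-powered chat app with Streamlit
import ollama
import streamlit as st

st.title("Ollama Python Chatbot")

# Initialize session state for the model choice and the message history
if "model" not in st.session_state:
    st.session_state["model"] = ""
if "messages" not in st.session_state:
    st.session_state["messages"] = []

# Let the user pick any locally installed model
models = [model["name"] for model in ollama.list()["models"]]
st.session_state["model"] = st.selectbox("Choose your model", models)

def model_response_generator():
    """Stream the model's reply, yielding the text of each chunk."""
    stream = ollama.chat(
        model=st.session_state["model"],
        messages=st.session_state["messages"],
        stream=True,
    )
    for chunk in stream:
        yield chunk["message"]["content"]

# Replay the conversation so far
for message in st.session_state["messages"]:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Enter prompt here.."):
    st.session_state["messages"].append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        # write_stream renders as chunks arrive and returns the full text
        message = st.write_stream(model_response_generator())
    st.session_state["messages"].append({"role": "assistant", "content": message})
```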
Info
Channel: Decoder
Views: 5,795
Keywords: ollama, machine learning, large language models, streamlit, python, chat app, LLM
Id: ZHZKPmzlBUY
Length: 22min 28sec (1348 seconds)
Published: Tue Feb 27 2024