How to Build a Fake OpenAI Server (so you can automate finance stuff)

Video Statistics and Information

Captions
This is how to build a fake OpenAI server with llama.cpp so you can automate finance tasks with AI on your desktop computer. OpenAI is obviously hugely popular, but what if you want to run your own AI locally without being tied to their API? llama.cpp lets you do exactly that: you can run some of the most powerful large language models, like Mixtral and Llama, and the kicker is that you can run them on just about any old computer without a GPU. It gets better. Deep inside the library there's a function that spins up a server that mimics OpenAI, which means you can swap OpenAI out for llama.cpp and run LLMs in your apps for free. I'm going to show you how to do it with a little finance flair. But will it stack up to the speed and performance of GPT-4, and what about more advanced techniques like function calling and multimodal models? I've been working on this for weeks, building prototypes while traveling, integrating with the API, running different LLMs, and coding over 831 lines of experiments. I'm going to break it down in just five steps, and it begins with starting the server.

So today we're building our fake OpenAI server, and that's going to let us do a ton of amazing large language model things. Don't be afraid of the code on screen: these are the steps we'll use to get it running, and everything will be on the GitHub link below. The first thing we need to do is set up llama.cpp, because it's the backbone of everything we're doing. It's really well supported and a lot of people are using it right now, so if you get stuck there's plenty of information out there. Inside the folder we want to work in, we clone it down:

git clone https://github.com/ggerganov/llama.cpp

Next we need to build the package, because it's written in C++ (hence the CPP in llama.cpp). The installation instructions are in the repo: on a Mac you just run make inside the folder; on Windows it's a bit more involved, and I highly recommend Windows Subsystem for Linux if you're going that route. I'm running it on my Mac, so:

cd llama.cpp
make

That builds the library; it took about five minutes for me. Once that's done, step one is complete.
Step two: install the Python libraries we need, because we're going to interact with our fake OpenAI server using Python. We'll use the openai library, but we're really going to be faking it, so it's free: no tokens to pay for, and you still get to use amazing LLMs ridiculously quickly. We also need llama-cpp-python, pydantic (for a little special project later on), instructor, and streamlit (that one comes in the next part):

pip install openai llama-cpp-python pydantic instructor streamlit

I've already got them installed, so they go quickly for me; if they run a little slow for you, that's perfectly okay.

Next step: start the server. This literally spins up our fake OpenAI server. The base command is python -m llama_cpp.server, and then we specify which model to load by passing the --model flag with the path to a model file. I've already downloaded some models, and I'll include links to all of them in the markdown file in the repo. We'll start with Mistral, specifically a GGUF version: GGUF files let us load these models on a regular old computer (I'm using my Mac, but a CUDA-powered machine works too). There are a whole bunch of quantized versions available and any of them will work; I'm using the 4-bit quantized one. Using a quantized model basically means we can load it onto a regular machine and it runs a whole lot faster. I keep all of these quantized models in a folder called models on my desktop.
So the full command is python -m llama_cpp.server, followed by the path to the model: models/ plus the Mistral GGUF filename. If you want to use other models you can just drop them into that folder and pass their paths instead, and a little later on we'll use a config file that lets us load multiple models at once. That's our baseline. Let me run it without a GPU first:

python -m llama_cpp.server --model models/<mistral-gguf-file>

You'll see a little progress bar, and then the server is up and running at http://localhost:8000. But look at the logs: we haven't offloaded any layers to the GPU, which is no bueno, it's going to be slower than we want. So rerun the exact same command and add one flag, --n_gpu_layers -1, which offloads as many of the transformer layers as possible to the GPU. Scroll up in the logs and it's now using my Apple M1 Max on Metal, and it says offloaded 33/33 layers to GPU. That means it's going to run way faster.

So far I haven't actually shown you how to use this, so let's do a little coding and get some actual productivity out of it. Create a new file called app.py; keep it simple. We're going to build a really simple script that runs a prompt against the server we just started, the one sitting at localhost:8000. Remember we installed the openai Python library. We're not using OpenAI's service in any way, shape or form, and you don't need an API key, which is perfect if you have your own data and don't want to send it across the net. We write from openai import OpenAI and create a client, which is what lets us interact with the API we just spun up. You do have to pass an api_key argument, and you're thinking, Nick, you said no API key: you can literally type gibberish in there, it doesn't matter, because we're using a local instance. The way we point at it is the base_url argument, aimed at the localhost environment we just started.
We're literally just using the Python library, which is completely open source, and talking to our own local server. I keep harping on about it, but it's so important when it comes to data security. For the base URL we type http://localhost:8000 (remember, when we started the server it was running on localhost port 8000) and then a really important /v1 on the end. That means when we use this client we're pointing at the little server we just spun up, not at OpenAI at all. Kind of cool, right? It's not that hard to get up and running.

That's our client, but now we need to run something against it. We'll perform a chat completion: create a variable called response and set it equal to client.chat.completions.create(...), which lets us send a bunch of messages to the fake OpenAI server we spun up with llama.cpp. We need to specify a model, and I'm going to get you into a good habit here: always specify which model you want. For now we'll say "mistral", even though it doesn't actually point to anything yet because we haven't set up aliases (we will eventually). We also pass our prompt through the messages keyword argument, which is an array because you can send multiple messages. Each message is a dictionary with two keys: role, which says who is sending the message or what type of prompt it is (in our case "user", a regular old user sending a prompt to our fake OpenAI server), and content, which holds the prompt itself. The prompt could be just about anything; let's pick a finance metric, return on investment, and ask "What is ROI in reference to finance?". Everything here is going to have a finance theme, in case you missed that. Finally we print out the raw response, and then I'll show you how to parse it so you can see all the different fields in it.
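Here's roughly what app.py looks like at this point. It's a minimal sketch: the api_key can be any placeholder string since nothing ever goes to OpenAI, and "mistral" is just a label until the aliases are set up later.

```python
from openai import OpenAI

# Point the client at the local llama.cpp server instead of api.openai.com.
# The api_key is required by the library but never checked by our local server.
client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:8000/v1",
)

response = client.chat.completions.create(
    model="mistral",  # just a label for now; aliases come later
    messages=[
        {"role": "user", "content": "What is ROI in reference to finance?"},
    ],
)

print(response)
```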
Cool. To run this, keep the server terminal open (don't close it), open another terminal, and run python app.py. All things holding equal, we get a response back, and it's coming from our fake OpenAI server. How amazing is that? You're probably looking at it thinking, Nick, what on earth are you printing out, there's a lot of information here. That's the raw response object, but look a little closer and there's a content field with a genuinely relevant answer: "ROI stands for return on investment. In the context of finance, it is a measure of the profitability of an investment. It represents the ratio of net profit generated..." Right now it looks a bit messy and you're squinting to read it, so let's make it clearer by traversing the response. Inside the response there's a choices attribute; we grab the first element (it's an array, so index zero), then its message, and then the content stored on that message. So it's response.choices[0].message.content. Sounds complicated; it's not that bad.
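For reference, the fields he walks through next can all be pulled off the same response object like this (standard attributes of the openai-python v1 chat completion object):

```python
answer = response.choices[0].message.content   # the generated text
role = response.choices[0].message.role        # "assistant"

# Metadata about the call
print(response.model)                # which model produced it
print(response.created)              # creation timestamp
print(response.system_fingerprint)

# Token accounting
print(response.usage.prompt_tokens)      # tokens in our prompt
print(response.usage.completion_tokens)  # tokens in the reply
print(response.usage.total_tokens)       # prompt + completion
```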
I'm showing you how to traverse the response because if you want to pull out any of the other stuff, you'll be able to. There's plenty of useful information in there: the role of the assistant coming back, whether any tool calls happened (I don't believe that's working yet, full disclosure), when it was created, the model that was used, the system fingerprint, and the token counts: completion tokens (effectively the response), prompt tokens, and the total. For now, rerun python app.py and hopefully we get just the answer instead of the whole object. And take a look: just like that, you've built your own fake OpenAI server and got a response back. Give yourselves a clap, that's absolutely brilliant if you've got this far. Reading it out: "ROI stands for return on investment. It's a financial metric used to evaluate the profitability of an investment by calculating the net return generated from it, expressed as a percentage of the initial cost or investment amount. In simple terms, it measures how much profit you make on your investment compared to how much you spent on it. A high ROI indicates that an investment is generating significant returns relative to its cost, while a low ROI suggests that the investment may not be profitable." Not too bad.

This is great, but we can make it better, and one of my favorite ways to do that is streaming. We're going to keep building on this, step by step, and I'll give you everything you need to know. In the create call we set stream=True, and rather than printing the whole response at the end, we use the generator to print each token as it arrives, which shows you just how fast this runs. We loop through each chunk we get back, for chunk in response, and instead of the message content, each chunk's new text lives at choices[0].delta.content.
To make it look like a proper full-blown stream on the command line, which I think looks really nice, we print each piece with end="" and flush=True so the tokens appear one after another on the same line. Let's test it: python app.py again... and we've got an error, "Stream object has no attribute choices". You thought there would be no errors? I like showing you the errors; this is what happens. Printing the raw chunk shows it's definitely streaming and going super quick, so I've clearly just typed one thing wrong: it should be chunk.choices, not response.choices. Grab the first choice, grab the delta, grab the content. Rerun it, clear the terminal, and take a look, how awesome is that: it's streaming out and it's ridiculously fast. (There are other models too; Mixtral obviously takes a bit more time to generate output, but it still performs extremely well.) One last niggle: it prints None at the start and end of the stream, so we only print the value if it is not None. Run it again and it's working perfectly. Just like that, we've spun up our fake OpenAI server and implemented streaming.
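Pulled together, the streaming version of the script looks roughly like this, under the same assumptions as before (local server on port 8000, placeholder API key):

```python
from openai import OpenAI

client = OpenAI(api_key="not-needed", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "What is ROI in reference to finance?"}],
    stream=True,  # return a generator of chunks instead of one response
)

# Print each token as it arrives for a live stream in the terminal.
for chunk in response:
    token = chunk.choices[0].delta.content
    if token is not None:          # the first/last chunks carry no text
        print(token, end="", flush=True)
print()
```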
Not too bad, but we can make it better, and the way we're going to do that is by converting this into an application. Rather than editing the prompt in the script each time, which is a bit of a pain, particularly if you've got users, we're going to wrap it in an app you can run on your own desktop. So we've set up the fake server and we've got an LLM streaming pretty fast, but how do we hook it up to an application? That brings us to part two: building an app for the fake server. Time to throw this bad boy into an app.

Back in app.py, the first thing we do is import streamlit as st; Streamlit is the app framework we're going to use, and you've probably seen me build a bunch of Streamlit applications before, so this one will feel familiar. We create a title with st.title, and you can name your app whatever you want; mine is a rocket emoji plus "Fake OpenAI Server App". Then we create somewhere to hold a prompt: prompt = st.chat_input("Pass your prompt here"). There's a ton of information in the Streamlit docs if you search for its chat elements; I tend to use st.chat_input and st.chat_message the most, but there are a bunch of others. Then, if somebody types a prompt and hits enter, in other words if prompt:, we do everything we've been doing so far. First we render the user's message to the screen, which is easy: st.chat_message("user").markdown(prompt). The first argument to chat_message dictates the chat formatting and the little avatar you see on the side. Next we send that prompt to our fake OpenAI server: inside the messages content we swap the hard-coded ROI prompt for the raw prompt the user typed into the chat input box, and off it goes to client.chat.completions.create.

Once the response comes back we want to render it to the screen, and my personal favorite way is, again, streaming, because it shows just how fast it is and reaffirms to the user that a live language model is answering rather than something pre-printed to the screen. To stream inside Streamlit we keep the majority of what we've got and add a little extra magic, starting with a with st.chat_message("ai") block.
Inside that with st.chat_message("ai"): block we create an empty placeholder, message = st.empty(), which is eventually what renders our output, plus a variable called completed_message set to an empty string. Then we tab the chunk loop from before in underneath: for each chunk in the response, we take the new text and append it to completed_message with +=, so it builds up the entire reply as it streams, and each time through we update the placeholder with message.markdown(completed_message). (Part of my code went grey in the editor here, which was killing me; it turned out to be an error we hit in a second.)

To start the app we run streamlit run app.py. With the application running, let's ask it: "write me a python function to get stock prices using yfinance". We hit an error first (there's that greyed-out code explained), fix it, and try again, and now it's sending the prompt to the fake OpenAI web server, writing a Python function, and streaming it out. Remember how I mentioned chat_message dictates the avatar and formatting: because we specified "user" for the prompt we get the user avatar, and because we specified "ai" for the response we get the AI avatar. And it really is writing a function to pull ticker prices with yfinance, which we'll be using ourselves in a second. So we've taken our baseline fake OpenAI server, our llama.cpp OpenAI-compatible web server, and wrapped it inside an application we can pass any prompt to. The app is done and dusted, and we can even use open source LLMs to write Python inside a Streamlit app.
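Here's a sketch of the Streamlit version of app.py as described; the title text and the "user"/"ai" labels are just the ones used in the video, and the client setup is unchanged from before:

```python
import streamlit as st
from openai import OpenAI

client = OpenAI(api_key="not-needed", base_url="http://localhost:8000/v1")

st.title("🚀 Fake OpenAI Server App")

prompt = st.chat_input("Pass your prompt here")

if prompt:
    # Echo the user's message back into the chat window.
    st.chat_message("user").markdown(prompt)

    response = client.chat.completions.create(
        model="mistral",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    # Stream the reply into an empty placeholder as it arrives.
    with st.chat_message("ai"):
        message = st.empty()
        completed_message = ""
        for chunk in response:
            token = chunk.choices[0].delta.content
            if token is not None:
                completed_message += token
                message.markdown(completed_message)
```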
But so far this isn't anything all that revolutionary. What if we could use the LLM server to call out to functions, maybe even get it to do some stock analysis for us? That brings us to part three: function calling, my personal favorite bit. We've done a lot of chat, but what happens if we want to integrate this with other applications and do some genuinely cool stuff? We can do that too, and I'm going to show you how.

To do this we bring in another library and build up a little function-calling framework to extract stock prices over a set period. First we import instructor, which lets us define a response model and effectively extract the values we need from a prompt. Then we patch our client so it can use that response model: client = instructor.patch(client=client), where the keyword argument is the client we already created. That's our patched client. Now we need to define what we want extracted. I've got a function (it'll be in the GitHub repository) that takes two values: a ticker as a string, for example AAPL for Apple, and an integer number of days, representing how many days of data we want. So we need to pull a ticker string and a days integer out of the user's prompt, and we define a response model that integrates with our web server to do exactly that, which means the LLM will eventually be able to call this function.

We create a class for the response model, and to structure it we need one class from the Pydantic library: from pydantic import BaseModel. Our class inherits from BaseModel and simply declares the fields we want the LLM to extract and the data types we want them extracted as: ticker as a str and days as an int. This is one of my favorite things; we just describe the structured output we want back. (I'd love to do a detailed video on building a hardcore function-calling project; let me know in the comments if you want that.) Then we make one tweak to our chat.completions.create call: we're no longer streaming, at least for now, and we pass response_model= our response model class.
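A sketch of the extraction piece. The class name ResponseModel isn't spelled out in the video, so treat it as illustrative; instructor.patch and the response_model keyword are the parts the video relies on:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Patch the client so chat.completions.create accepts a response_model.
client = instructor.patch(
    client=OpenAI(api_key="not-needed", base_url="http://localhost:8000/v1")
)

class ResponseModel(BaseModel):
    """Structured values we want the LLM to pull out of the prompt."""
    ticker: str   # e.g. "AAPL"
    days: int     # how many days of price history

response = client.chat.completions.create(
    model="mistral",
    messages=[{
        "role": "user",
        "content": "Summarize the stock price movements for AAPL for the last 7 days",
    }],
    response_model=ResponseModel,
)

print(response.ticker, response.days)   # -> AAPL 7
```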
There's one more change: we need to respin the web server with a different chat format, because the API it exposes has to support this response-model style of calling, and the way that happens is with the functionary chat format. Search for functionary in the llama-cpp-python OpenAI-compatible server documentation and you'll see there's plenty on this, including how to specify the chat format. So go back to the server terminal, stop the server with Ctrl+C, and spin it up again with one additional flag, --chat_format functionary (the default is chatml, I believe; we're choosing functionary this time). Once we get to the next part of this journey and run multiple models, this becomes a whole lot more valuable, but for now it's the same command as before plus that flag. Hit enter and it's up and running.

Back in the app, since streaming is disabled for now, we just respond with the output directly: st.chat_message("ai").markdown(response). All things holding equal, that response should come back with two values, ticker and days, which we can then pass to our function. Let's test it; we'll keep building on this progressively, so eventually you'll be absolute wizards with llama.cpp and your fake OpenAI web server. Rerun the Streamlit app and change the prompt to "summarize the stock price movements for AAPL for the last 7 days". If this works we'll get back AAPL and 7 inside the two fields of our response model... and take a look, how awesome is that: we've extracted the ticker and the days. All that's left is to actually send them to our function, and then we can do things like summarizing those stock price movements and using different models for different analytics.

Quick look at the code so far: we've got all of our imports and the response model defined. Now we take the output of the response model and actually run our function. The function lives in a file called stock_data.py,
so we want to import the get_stock_prices function from that module into app.py: at the top, from stock_data import get_stock_prices. (You could build a ton of functions here, loop over them or chain them sequentially; what you can do with this is pretty much limitless. Again, maybe we'll do a bigger video on it.) Then, right after rendering the extracted values back to the application (I'll leave that in just to double-check the extraction keeps working), we wrap the function call in a try/except. In the except block, if anything goes wrong we render "Something went wrong" with a sad emoji to the screen. In the try block we call get_stock_prices(response.ticker, response.days), because those are exactly the two values our response model extracted, store the result in a variable called prices, and render that back to the user instead of the error message. Kind of cool, right?

Let's run it: copy the same prompt, refresh, pass it through... and there are our prices. We get back a JSON object keyed by date, and for each date we've got the open, high, low, close, adjusted close and volume. So we've now successfully used our LLM to call a function.
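His stock_data.py is in the GitHub repo; as a stand-in, here's a minimal sketch of what a get_stock_prices function like the one described (ticker string in, days integer in, date-keyed prices out) could look like, assuming the yfinance package:

```python
# stock_data.py (illustrative sketch; the video's actual implementation is in its repo)
import yfinance as yf

def get_stock_prices(ticker: str, days: int) -> str:
    """Fetch the last `days` days of daily price data for `ticker` as JSON keyed by date."""
    history = yf.Ticker(ticker).history(period=f"{days}d")
    # Orient by index so each date maps to its open/high/low/close/volume columns,
    # roughly matching the JSON object shown in the video.
    return history.to_json(orient="index")
```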
What would be really nice now is to use that data to actually summarize the stock price movements. We've got the server calling out to our function and bringing back stock prices, but that's only halfway there; we haven't actually summarized anything yet, so we kind of haven't hit the brief. To do it we have to send the data back to another LLM, which brings us to part four: using multiple models simultaneously.

We want a different model to perform the summarization. So far the server is running the functionary chat format for function calling, but for summarization we want chatml, the format we started with, and we want both running at the same time: one model extracting tickers, the other doing the summarization. How do we do that when our server only handles one model at the moment? This is where multiple models come in. Stop the server and I'll show you.

In the GitHub repo I'll give you a config file you can use. You can spin up the fake server with multiple models at the same time just by defining a JSON object. In it you specify a list of models, each with its own chat format, whether or not to use the GPU, the number of threads, the batch size, and the context window (important if you need a bigger context), and you can also run on a different port or host if you want; mine still runs on port 8000. I've set this config up to run three different models at once, and the beautiful thing is that each one gets a model alias. The first is Mistral with the alias "mistral". The second is the big chungus, Mixtral, the 8x7 billion parameter model, with the alias "mixtral"; both of those use the chatml chat format. The third is Mistral again, but with the chat format set to functionary and the alias "mistral-function-calling", so we can keep using the function-calling approach. You could define a bunch more models, aliases and chat formats, but this is what we need today.
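To make the shape of that config concrete, here's an illustrative way to generate it from Python. The field names mirror the server's command-line flags, but treat them as assumptions and check them against the llama-cpp-python server docs (or the config.json shipped in the video's repo); the model paths are placeholders:

```python
import json

# Three entries: Mistral (chatml), Mixtral (chatml), and Mistral again with the
# functionary chat format for structured extraction. Aliases are the names the
# app refers to in its `model=` argument.
config = {
    "host": "0.0.0.0",
    "port": 8000,
    "models": [
        {"model": "models/<mistral-gguf-file>", "model_alias": "mistral",
         "chat_format": "chatml", "n_gpu_layers": -1},
        {"model": "models/<mixtral-gguf-file>", "model_alias": "mixtral",
         "chat_format": "chatml", "n_gpu_layers": -1},
        {"model": "models/<mistral-gguf-file>", "model_alias": "mistral-function-calling",
         "chat_format": "functionary", "n_gpu_layers": -1},
    ],
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```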
Starting the server this way is super easy: instead of the model flag and the GPU flag, we pass a single flag, --config_file, and point it at the file:

python -m llama_cpp.server --config_file config.json

My config.json sits in the same root folder I'm calling this from, so the path is just config.json; if it lived in a folder called, say, source_files, you'd pass source_files/config.json, and so on. You get the idea. Hit enter and take a look at that bad boy, it's up and running and we can point at all of those different models. Kind of cool, right?

Back in the app we need a couple of tweaks. The model name we pass now actually matters, because it has to match the right alias. For the function-calling call we want the model whose chat format is functionary, and in the config that alias is "mistral-function-calling", so I copy that and set the model value in our first chat.completions.create call to "mistral-function-calling". We also need another call further down to actually do the summarization, because right now we're not following through on what the prompt asked for. So we do another LLM call. I copy the chat completion block and paste it inside the try block (and I do wonder why Python didn't call it try/catch; they went with except. Let me know if you know why). This second call's result goes into a variable called full_response rather than the function-calling response. The flow is: run the first LLM call to extract the ticker and the number of days, run the function, then take the function's data and send it to the LLM in the next prompt (we could do a bunch of prompt formatting here; maybe in the potential full-blown function-calling course, if you want that, again let me know in the comments). For this second call we point at a different alias. Back in config.json that's "mixtral" (aliases are really just nicknames, you could name them whatever you want; the important thing is that the call points at the right one), because we want Mixtral to do some amazing summarization. And on this second call we disable the response model, since we're not extracting a ticker and days this time; that happened in the previous LLM call.
The only other thing we do is append the data we got back to the prompt. This is going to be a little bit janky, but you get the idea: we add a line break, "\n", and then append the prices variable (you could also do some proper prompt formatting and structure it more nicely). So we take the user's original prompt, "summarize the stock price movements for AAPL over the last 7 days", add a break, and append the prices, so now the model actually has enough data to do the summarization, which previously it didn't. We re-enable streaming on this second call, paste our big streaming chunk back in underneath, and uncomment it (all the code will be on GitHub as per usual). So we've now created another LLM call: we're calling Mixtral with the prices appended and streaming out the response. The one change in the streaming loop is that we iterate over full_response, the summarization output, rather than the function-calling response. To make it clear: the second block is the summary output (prompt plus prices), the first is just the function-calling LLM call.

All things holding equal, we should now achieve what the user asked: summarize the stock price movements for AAPL. (If you're thinking about what's possible with this: you could take the ticker and the number of days and create an entire profile for a company, or even plug it into the trading bot from the previous video.) Let's refresh and hope this works... "Something went wrong." Oh no, what has gone wrong? When something breaks like this, I like to comment out the try/except so we can see the actual error instead of the friendly message. And I remember this one, I made the same mistake while prototyping: the prices need to be a string, so we wrap them in str() before appending them to the prompt. Try again: it's running, it's being sent to Mixtral, and if you open the server you can see Mixtral loading. It takes a little while, which is perfectly normal since it's quite a large model, but all things holding equal we'll get a summary. A few minutes later, Mixtral is loaded into memory and the response is generating.
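Putting the two calls together, the inside of the if prompt: block ends up looking roughly like this. It assumes the patched client, ResponseModel, get_stock_prices and prompt defined earlier in the file, uses the variable names the video mentions (response, full_response, prices), and omits the intermediate render of the extracted values for brevity:

```python
# First call: extract structured arguments with the functionary-format model.
response = client.chat.completions.create(
    model="mistral-function-calling",
    messages=[{"role": "user", "content": prompt}],
    response_model=ResponseModel,
)

try:
    prices = get_stock_prices(response.ticker, response.days)

    # Second call: hand the raw prices to Mixtral and stream back a summary.
    full_response = client.chat.completions.create(
        model="mixtral",
        messages=[{"role": "user", "content": prompt + "\n" + str(prices)}],
        stream=True,
    )

    with st.chat_message("ai"):
        message = st.empty()
        completed_message = ""
        for chunk in full_response:
            token = chunk.choices[0].delta.content
            if token is not None:
                completed_message += token
                message.markdown(completed_message)
except Exception:
    st.chat_message("ai").markdown("Something went wrong 😞")
```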
Take a look, how awesome is that: it's generated a text-based summary, so we've successfully implemented function calling end to end. We could fine-tune the prompt to get different responses, but is the data right? On the first day, the 4th of March, the opening price was 176.14, the highest around 176.8, the lowest 173.78, the close 175.1, and the volume checks out too. Pretty cool. In a nutshell, we've spun up a second model, taken the raw output from our function call, and actually done the summarization.

Home stretch now. We've got the function-calling application up and running, summarizing stock price movements, but what if we wanted to use more than just text data? What if we wanted LLMs to analyze images? This brings us to part five: using multimodal models. We're in the end game now, and we're going to deviate a little and use a multimodal model. Multimodal basically lets us take in an image and blend it with a text-based prompt; if you've ever uploaded an image to ChatGPT and asked it to analyze it, that's effectively the ability we're getting. We could extract drug labels, or maybe look at candlestick patterns (I don't know if candlestick patterns will work, but we can definitely try); there are some really interesting use cases in this space.

One of the most popular multimodal models is called LLaVA. Inside my models folder there are two parts to it: the LLaVA model itself and the CLIP model, and you need both to run it on your fake OpenAI web server. I've included links to where you can get these models in the readme file, along with all the commands to run the server, so if you need a hand it's all in there. This is the fifth and final piece, and I think it's the one that really brings llama.cpp and the OpenAI-compatible web server together. So stop the current server and start it again pointing at the LLaVA files: python -m llama_cpp.server, then --model pointing at the 4-bit quantized LLaVA file in the models folder (you could also put all of this in the config file, but I want to show you how to do it directly on the command line).
Then we specify the CLIP model path with --clip_model_path, pointing at the file with mmproj in its name, also in the models folder, and we pass --n_gpu_layers -1 again (I typed the flag wrong the first time, rookie error, so double-check it). I also realized I'd missed something: we have to specify the chat format as well, and for this model it's llava-1-5. So let's clear the terminal and rerun the full command: python -m llama_cpp.server with the --model flag pointing at the quantized LLaVA file, --clip_model_path pointing at the mmproj file, --n_gpu_layers -1 for GPU offloading, and --chat_format llava-1-5. I know it's a little bit of a pain, but spin that up and, all things holding equal, it's running. Beautiful.

Back in the app. We've done a lot of work on app.py and I kind of don't want to ruin it, so I'm going to make a copy, paste it, and rename it lava_app.py. Now strip out everything we don't need for LLaVA: the instructor import and the BaseModel import, the function import, the instructor patching, the response model, and the whole function-calling chunk inside the try block. We're going back to bare bones, pretty much the baseline app from when we first enabled streaming, with streaming re-enabled, but this time using our LLaVA model (the model alias doesn't matter much anymore because the server is already loading the specific model we need).

The biggest difference with LLaVA is how you pass the prompt. Previously content was just a string; we replace it with an array containing two dictionaries.
Each dictionary needs a type. The first has type "image_url", and alongside it we specify the actual image URL, which will eventually be the URL of the image we want analyzed (say a candlestick chart). The second is similar: type "text", and that's where the baseline prompt goes, the text we want to pass through, which is just the user's prompt from over in the chat input. So the first part hands over the image for processing, the second carries the question. We also need somewhere to collect that URL, so up the top I create another input as a variable: image_url = st.text_input("Put your image URL here"). Now the app has two inputs: a text box where the user pastes an image URL, and the chat prompt down the bottom for whatever they want to ask.

To quickly summarize: content used to be a string, and it's now an array of two dictionaries, one holding our image URL and one holding our text, and assuming that works we can stream the answer out as before. Because we renamed the file, we have to start Streamlit again: stop the old app, clear the terminal, and run streamlit run lava_app.py (after a couple of typos on my part, hilarious). Now the app shows the image URL box and the prompt box. Let's find a candlestick pattern image, in keeping with our finance theme, open it in a new tab so we've got a direct link to the PNG without any resize parameters on the end, copy that URL (I haven't tested this image before, so this will be interesting), and paste it into the image URL box.
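The request itself ends up shaped roughly like this. It assumes the client, image_url and prompt defined earlier in lava_app.py; the nested "image_url": {"url": ...} form follows the OpenAI-style vision payload, which is my assumption about what the llava-1-5 chat handler expects, so verify it against the llama-cpp-python docs:

```python
response = client.chat.completions.create(
    model="llava",   # label only; the server is already running the LLaVA model
    messages=[
        {
            "role": "user",
            "content": [
                # The image to analyze, passed by URL from the st.text_input box.
                {"type": "image_url", "image_url": {"url": image_url}},
                # The actual question, taken from st.chat_input.
                {"type": "text", "text": prompt},
            ],
        }
    ],
    stream=True,
)
```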
Then, in the prompt box, I say "describe the image provided". Take a look, we've got a description: "The image features two graphs, each displaying a different hammer time pattern." Is that a thing? Apparently a hammer pattern is a thing; I did not know that. "One of them is an inverted hammer pattern, while the other has a hammer pattern." Not quite the greatest description, but it continues: "Both charts are green and red in color, making it easy to distinguish between these patterns. The graphs also have multiple arrows pointing outwards from the center, indicating important information or insights about the hammer time patterns." So this gives us a ton of information about images, and it also lets us blend things together: we could take that text, pass it to another model, and combine it, say stock prices plus candlestick patterns. And that, in a nutshell, wraps it up: we've used a multimodal model, and we've been through all five steps of using the llama.cpp OpenAI server, the fake OpenAI server. Catch you.
Info
Channel: Nicholas Renotte
Views: 18,236
Keywords: llms finance, finance llms, llm function calling, llama cpp, llama, llama.cpp, ai, python
Id: voHTS9Nk5VY
Length: 61min 18sec (3678 seconds)
Published: Fri Mar 22 2024