Building a RAG application using open-source models (Asking questions from a PDF using Llama2)

Captions
Hey — today I'm going to show you how to run a local LLM on your computer and how to build an entire RAG system (a retrieval-augmented generation system) on top of those locally running LLMs, using open-source models.

Here is why this is important. Even though the GPT models are, so far, the best at solving most of the problems we want solved, knowing how to use a local LLM matters. First, these open-source models are getting really, really good. Second, they're cheaper: you don't need all of GPT's power for certain use cases, so if you know how to use a local model you can do the same task at the same level of quality for much less money. Third, privacy: many companies do not want to use the GPT models — they don't want to connect to an external API, they want everything in-house — and that's what an open-source model gives you. Also, if you plan to deploy one of these models somewhere without connectivity — robotics, say, or an edge device — there is no way to just call an API; you will have to use an open-source model. And there's one more reason, my favorite: even if you're using OpenAI as your primary model, you can keep one of these open-source models as a backup. If the OpenAI API goes down — the other day they had downtime for a day, and I think the model was just returning nonsense — you can immediately flip to an open-source model and keep your workflow running with no disruption.

The goal of today's video is to show you how to do that on your computer, starting from nothing. I'm going to start with an open browser, get a model running, and build a very simple RAG system: we're going to answer questions from a PDF. I'll take a website, download it as a PDF, and answer questions from it. The most important thing here is that the code I'm going to write matters the least — anyone can write the code. What matters most is the reasoning behind that code: why we need to do this or that. That's what I want to convey in this video, and I hope that's what you get out of it — an understanding of what we're building and the reasoning behind it.

With that introduction out of the way, let me get to my browser. We're going to start very simple, with this website: Ollama. Ollama is the project that is going to let us run an open-source model on our computer. Here's the thing I want you to keep in mind: when you think about a model, think of it as a gigantic mathematical formula, because that's what it boils down to — a bunch of values, weights and biases, which we call parameters, put together in one huge formula. When we talk about Llama 2, the 7B model, that "7B" means 7 billion parameters: the number of weights and biases we need to store and execute in order to make any prediction. Llama 2 70B means 70 billion parameters. So when we download one of these models, what we are downloading is all of those parameter values — we store them on our hard drive, along with
some instructions for how to put those parameters together and run them as one big mathematical formula. That's basically what we're storing. Ollama serves as a common wrapper around all of these different models: we can download Llama 2, but we can also download Mixtral, and run them all through the same interface.

Installing Ollama is very simple: go to ollama.com — they have versions for Mac and Linux, and they just released a preview for Windows, so even if you're on Windows you can run Ollama on your computer. I already downloaded it; it's very small. Once you run it, you'll see a little llama in the status bar that tells you Ollama is running. By the way, the first time you run it, it asks you to install the command-line tools, or to download the Llama 2 model — I don't remember exactly what the first instruction is, but it's very simple to navigate and very quick.

On the website there's a link called "Models". Click on that and you get the list of all the models you can run using Ollama: the Gemma models Google just released — I think it was a week ago, and this page was updated two days ago; it moves so fast — Llama 2, Mistral, Mixtral (with an x), LLaVA, and it just goes on and on. If you care about code, Code Llama is there too. You can download any of these; you can go into one of them and see the entire family — the 7B version, the 70B version, the chat model — and explore on your own; you'll get information about the model, how to use it, and so on.

Here's what I'm going to do, and it's very simple. On my command line, ollama is available as a command after you install it, and it shows a bunch of available subcommands. There's a pull command: ollama pull llama2 downloads the latest version of Llama 2 to your computer. I already did that, so I'm not going to run it here — it's several gigabytes of data, so it takes a few minutes. ollama list tells you which models are installed on your computer: I have the latest version of Llama 2, I have Mixtral — the 8x7B version — and I have the latest Mistral. You can see each model's ID, its size (26 GB for Mixtral; Llama 2 is only about 3 GB), and when it was modified.

Just to show you: when you download a model — I don't know how readable this is on the recording — it downloads a set of files into a folder. I'm on a Mac, so inside my home directory there's a .ollama directory, and every file it downloads goes in there. This one here, the 26 GB one, is Mixtral: all of the parameter values from the latest training run of Mixtral are stored in that file. So now you know where to go if you want to delete them — and you can also delete them with the ollama rm command. So, now that I have all of this,
I think I can run it — I don't remember the exact command. "ollama llama2"? No, not like that. How do I serve this... maybe serve? Let's see: show, run... let's do run. There we go: now Llama 2 is running here on my computer, and I can say "tell me a joke". And there we go — "Sure, here is one: why don't scientists trust..." — this is just a bad joke, I'm sorry. Anyway, here is the model running on my computer, which is awesome. Asking for help shows the available commands; let's just type /bye to exit.

So now I have an open-source model running on my computer from the command line. That's great, but it's not what I want — what I want is to access this model programmatically. To do that, I'm going to create a directory and build a very simple RAG system from scratch using LangChain to get data from a PDF file. I'll create a directory — let's call it "local-model" — and open Visual Studio Code on that directory. Here's Visual Studio Code; I'll make it nice and big so you can see, and I'll create a notebook — a Jupyter notebook. Just so you know, to use Jupyter from Visual Studio Code you need to install the Jupyter plugin (it's made by Microsoft) and, obviously, the Python plugin, because this is going to be Python.

I'll open a terminal window and create a virtual environment so I can install all the libraries we're going to need inside it — I don't want to install anything directly on my machine. From the terminal, I run python3 with the -m flag to invoke a module, and the module is venv,
the virtual environment module that comes with Python, and I name the environment .venv — that's the name of the folder it creates. I run that, and now there's a new (hidden) folder called .venv. People use all sorts of environment managers — Poetry, Conda, Miniconda — but I'm an old-school guy and I want to keep things very simple, which is why I'm using the built-in virtual environment. Let me activate that virtual environment, and now I can start installing things.

What do I need? Well, I don't know everything I need yet, but let's first run something from this notebook to make sure it's actually working. It asks me to select a kernel; I pick the Python environment I just created, Visual Studio Code installs whatever it needs to execute that print statement, and boom — it runs. This is working.

Next, I need an environment file. This .env file will store any environment variables I use during this presentation — in particular the OpenAI API key, so it will contain something like OPENAI_API_KEY followed by the value (which I'm obviously not showing here). Why do I need the OpenAI API at all? Because I want to test everything I'm doing locally against GPT as well, just to see how they compare. So I paste my key into the file — off-screen, so you don't all start building your applications for free on my key.

After that, here's what I do in the notebook: I import the os module and the dotenv library, which reads the environment variables stored in the .env file into memory when I call load_dotenv, and then I can read my OpenAI key from the environment — very simple. For this to work I need to install the library, which is called python-dotenv. Boom, done. Then I define a MODEL variable, and I'm going to start with GPT-3.5.
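A minimal sketch of this environment setup — the key value is a placeholder and the exact file layout is an assumption — might look like this:

```python
# .env (sits next to the notebook, one line, placeholder value):
# OPENAI_API_KEY=sk-...

import os
from dotenv import load_dotenv  # pip install python-dotenv

# Read the variables defined in .env into the process environment.
load_dotenv()

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

# Start with GPT-3.5; later we'll swap this for "llama2" or "mixtral".
MODEL = "gpt-3.5-turbo"
```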
Now I'm ready to use LangChain to create a very simple model and make sure the API is working. With some help from Copilot, I import the ChatOpenAI class and instantiate the model, passing the OpenAI key and the name of the model, which is going to be GPT-3.5, and then I can call invoke and pass "Tell me a joke". For this I need to install langchain-openai, and — it might already be installed, but just in case — langchain as well (it wasn't). Can I run this? Boom. Beautiful: I asked the OpenAI API to tell me a joke and I got a response back. That's awesome, but this is just ChatGPT — the GPT model — and we don't want that; we want to do it with the locally running model.

How? Very simple: instead of using the OpenAI chat model specifically, we use the Ollama model class, which I believe lives in a different library — the langchain-community library. Let's import that class, Ollama, and instantiate an Ollama model whenever the model is not a GPT model. Something like: if the model name starts with "gpt", create a ChatOpenAI; otherwise, create an Ollama model, passing the model's name. I'll test this first with GPT — it works like it did before. Now I change the model name to llama2, and there we go: you get an answer back from Llama.

Notice something interesting: when I call the Llama 2 model I get back a string, but when I call the GPT model I get back an AIMessage instance with the content inside. That's because GPT-3.5-turbo is what we call a chat model — a model meant for conversation. There are AI messages, which is what I get here, and human messages, which are the questions I ask the model; so when I interact with that class, LangChain returns a special structure, in this case an AIMessage instance containing the content. Llama 2, the way I'm using it here, is a completion model, not a chat model — I could use a chat version, but I'm not — and that's why you get back a plain string. How do we fix this? It's not really a problem, but I don't like seeing an AIMessage there, and we can use LangChain to parse it out and turn it into a string.
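Here's a rough sketch of that conditional setup, assuming the langchain-openai and langchain-community packages used in the video:

```python
from langchain_openai.chat_models import ChatOpenAI
from langchain_community.llms import Ollama

if MODEL.startswith("gpt"):
    # Chat model served by the OpenAI API.
    model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model=MODEL)
else:
    # Completion model served locally by Ollama (e.g. "llama2" or "mixtral").
    model = Ollama(model=MODEL)

model.invoke("Tell me a joke")
# ChatOpenAI returns an AIMessage; Ollama returns a plain string.
```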
LangChain supports the concept of output parsers. A parser is just a class that takes an input and transforms it in some way — in this case into a string, which is what we want. So I import StrOutputParser and create my first LangChain chain. LangChain is "language chain" — that's where the name comes from. My chain takes the model and pipes its output into the input of the next component, in this case the parser. When I invoke the chain, LangChain sends the request to the model, takes the model's output, and pipes it into the parser, which returns the string. So: re-executing the earlier line gives me the AIMessage, but invoking the chain — the chain, not the model — brings the parser into play, and boom, I get back just the string. Remove the parser, invoke again, and you get the AIMessage; put the parser back and you get the string. That's one of the main characteristics of LangChain: you can build increasingly complex chains out of different components.

But this is just the beginning. We can now run a model — a model that runs locally, a llama right here on my machine — and we know how to build a chain. Let's build a simple RAG system, very simple: I want to answer questions from a PDF. The first question is, what PDF? I'm going to use this website — the class I teach, a live program called "Building Machine Learning Systems That Don't Suck" — and save it as a PDF. So I go to the print dialog and save the whole page as a PDF into the same folder as the notebook (this dialog Apple decided to build for saving files is horrible; I just hate it). I'll call the file "mlschool", which is the name of the website. Back in the folder: there's our PDF.

What I want to do now is use my model to answer questions from that PDF, and the first step is to load the PDF into memory. For that we need a library: pip install pypdf. Then, with LangChain, we can load it: LangChain supports document loaders — and by the way, there are a bunch of different document loaders you can use to load information from just about anywhere. I'm going to use PyPDFLoader (that's why we needed the pypdf library in the first place), give it the name of the PDF, and call load-and-split — that part is important — and then print the pages so you can see the result. Executing this, LangChain's loader loaded my PDF and split it into pages: you can see "page 1 of 14", "page 2 of 14", all the way to "page 14 of 14". It split the entire document into pages and loaded each one into memory, so I have those 14 pages in memory right now.
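Here's a minimal sketch of the parser chain and the loading step, assuming the page was saved as mlschool.pdf next to the notebook:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader  # pip install pypdf

# Pipe the model's output into a parser so we always get a plain string back.
parser = StrOutputParser()
chain = model | parser
chain.invoke("Tell me a joke")

# Load the PDF and split it into one document per page.
loader = PyPDFLoader("mlschool.pdf")
pages = loader.load_and_split()
print(pages)
```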
That was a great step: I have the document in memory. The next thing is creating a template — when I say template, I mean a prompt template — to ask the model to answer questions from a specific context. Let's do it the right way. By the way, all of the steps I'm covering here about retrieval-augmented generation systems are covered in much more detail in another video; I'll put the link somewhere here or in the description below, or you can find the latest video on my YouTube channel — it goes into these steps in more depth.

Here is the prompt template we're going to pass to the model. It's a string that says: answer the question based on the context below; if you can't answer the question, reply "I don't know". You can make this more complex if you want; it's good enough for us. Then I provide the context, and then the question I want answered. I'm basically telling the model: here is the question, please answer it using this information — don't go to your memory, don't use what you learned before, answer only from this section. Ideally, I'll be able to grab some of the pages of that PDF, put their content here as the context, and then ask a question about those pages.

So I create the prompt template. Notice the curly braces around context and question: those are variables, and LangChain turns them into inputs I can provide values for. I use the PromptTemplate class to create the template from the string I just wrote, and just to test it I call the format function — format my prompt, passing context="Here is some context" (see how context became an argument to this function) and question="Here is a question". Let me print it so the newlines show up — much better: "Answer the question based on the context below. If you can't answer the question, reply 'I don't know'.", then the context, then the question. We have a prompt.

How do we pass this prompt to the model? We keep building our chain. Remember the chain was the model piped into a parser; we can make it better by starting with the prompt: the prompt feeds the model, and the model feeds the parser. Now, when we call chain.invoke, what do we need to pass? Remember there are two variables to provide, so we invoke the chain passing a context and a question. If I pass the context "The name I was given was Santiago" and ask "What's my name?", I get "Your name is Santiago." Boom, that works. Important lesson: when we invoke a chain, we need to understand what the input of that chain will be — and there's something that helps with that, an input-schema facility you can call on the chain.
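A minimal sketch of the prompt template and the extended chain:

```python
from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))

# The prompt feeds the model, and the model feeds the parser.
chain = prompt | model | parser
chain.invoke({
    "context": "The name I was given was Santiago",
    "question": "What's my name?",
})
```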
Obviously, you can just look at the first component of the chain, and if you understand what that component expects, you know what the invocation needs to look like. But I find this input-schema trick very helpful, because it gives me information about the chain without my having to over-analyze what the chain looks like. In this case, you ask the chain for its input schema and it reports the prompt's input: it expects an object whose properties are a context, which is a string, and a question, which is also a string. Those are the two variables, context and question — and that is how I know I need to invoke the chain with a context and a question.

So what do we have right now? A chain that already has a prompt, a model, and a parser, and we have the document — by "the document" I mean the PDF — in memory, split by pages. Now we need a way to take that document and pass it as the context, but only the relevant portions of it. How? I'm going to use a very simple vector store, which will do several things for us. Number one, it will serve as a database for the content — but in a particular way: we're not going to store the pages directly in the database; we're going to store embeddings of those pages. We take the whole PDF and generate an embedding for each page, and the reason we generate these embeddings is so that we can later compare them with the question the user is asking and find the embeddings — that is, the pages of the document — that are most similar to that question. I know I'm waving my hands a bit here; in the video I mentioned before, "Building a RAG system from scratch", I go into a lot of detail about how embeddings work. Hopefully you know this by now; if not, check that video out, because it explains why we create these embeddings. The good news is that all of these embeddings are created for us behind the scenes, and the in-memory vector store will help us retrieve the pages that are most relevant to a specific question.

First I need to install a couple of libraries: docarray (pip install docarray — that one is important) and a specific version of pydantic. By the way, I'm installing all of these by hand, but in the description of this video you'll find a link to the repo with all of this content, including the libraries you need to install, so you don't have to follow along — you can just grab it. Let me also hide the sidebar so I have a bit more space. Here's what I'm going to create: a DocArrayInMemorySearch, which is just a vector store that lives in memory. (Below is a toy illustration, with made-up numbers, of the similarity idea this retrieval is based on.)
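This isn't how the library computes things internally, but the general idea of comparing a question against page embeddings boils down to something like this — tiny made-up vectors standing in for real embeddings, which have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: close to 1 means "similar".
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embeddings for two pages and one question (made-up numbers).
page_about_pricing = np.array([0.9, 0.1, 0.0])
page_about_schedule = np.array([0.1, 0.8, 0.3])
question = np.array([0.85, 0.15, 0.05])  # "How much does the program cost?"

print(cosine_similarity(question, page_about_pricing))   # high -> relevant page
print(cosine_similarity(question, page_about_schedule))  # low  -> less relevant
```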
Now, if we were building a real application, we wouldn't use this in-memory vector store — we'd use something with permanent storage, like Pinecone or any other vector database out there. But for this video, for our purposes, this is good enough: it does the same thing, just in memory on our computer. And the nice thing about DocArrayInMemorySearch is that we can create it directly from the documents we generated: I pass it the pages of the document we got from the PDF — remember, pages is just an array with every single page — and it builds the vector store from all of those pages. What happens is that all of the pages go into the database, the database generates embeddings for them, and it keeps everything in memory.

There is one more thing I need to pass: the embeddings class — which class will be used to generate the embeddings. Here's the thing: every model uses a different model to generate embeddings, so depending on the model we're using, we need to generate embeddings one way or the other. Right now we have either a GPT model or an Ollama model, so let's create an embeddings variable (Copilot is not being helpful here): if we're using a GPT model, we instantiate the OpenAIEmbeddings class; otherwise we use OllamaEmbeddings, which is the one we want for Llama 2, Mixtral, or any other local model.

Let me execute that... and there's a problem with the library. Let me restart, just to make sure that's not what's happening — I remember having issues with the in-memory vector store the first time I installed it — and now it's working fine. So now I have a vector store, and I can try retrieving something related to "machine learning" — not "tell me a joke" this time. Let me see if this works... maybe like this... there we go. What I'm doing is taking my vector store and turning it into a retriever. A retriever is a LangChain component that lets you retrieve information from anywhere — to put it in a slightly less convoluted way, I'm creating a retriever off of the vector store, but you don't need a vector store to have a retriever: you could create a retriever backed by Google searches, or one that gets information from anywhere else. In this case it just comes from the vector store. Then I can invoke my retriever with a query, and the retriever — the vector store behind it — returns the top four documents that are most relevant to the concept "machine learning".
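Here's a minimal sketch of that step, again assuming the packages used in the video (docarray needs to be installed for the in-memory store):

```python
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai.embeddings import OpenAIEmbeddings

# Each model family gets a matching embedding model.
if MODEL.startswith("gpt"):
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
else:
    embeddings = OllamaEmbeddings(model=MODEL)

# Embed every page of the PDF and keep everything in memory.
vectorstore = DocArrayInMemorySearch.from_documents(pages, embedding=embeddings)

# Ask the store for the pages most similar to a query.
retriever = vectorstore.as_retriever()
retriever.invoke("machine learning")  # returns the most relevant pages
```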
Anything relevant to that concept comes back sorted in order of importance. I think there's a parameter to control how many documents come back — maybe k, maybe top-something; I don't remember the exact name right now and I'd have to look at the documentation — but you can control it through the retriever. It doesn't really matter here; four is fine.

So now we have a retriever, and the idea is to go back to our chain. Let me copy the chain down here: I have a prompt, a model, and a parser, and remember the prompt expects a context and a question — that's what I need to pass it. The context is now going to come from the retriever, and this is where it gets a little tricky, because the prompt expects a map. Imagine I build that map by hand: a context — "The name I was given was Santiago" — and a question, "What is my name?". Can I just wire that in? Let me try... invoke doesn't work... let's figure this out... it's a runnable thing... can I do this? No, that's not it either — I thought the dictionary was going to get turned into a runnable directly. Let me see why this isn't working. Let me recreate it — I know how to brute-force a fix, but I don't want to fix it that way. Instead, from the operator module, let me import itemgetter, grab the question with an item getter, and pipe that into the retriever... it still doesn't work — why? Let me check the documentation really quick... oh, I understand: I need to pass a question when I invoke, obviously. Let me check what I have: the retriever, a prompt, a model, my parser — this looks much better. Let me grab the same question I was passing before... there we go. That was the problem.

Let me quickly explain what was happening, because it isn't obvious. I have my prompt, my model, my parser, and I need to pass a context and a question to the prompt. The context is going to come from the retriever — but the retriever needs the question that I'm invoking the chain with. There's another way of doing this, but here we're using something called the itemgetter function. To understand itemgetter: if I call itemgetter("abc"), I get back a function (Copilot is actually being helpful here); when I later call that function with a dictionary that has an "abc" key, it returns the value stored under that key — 123, say. Execute it and you get the 123.
So in this particular case, if I have itemgetter("question") and apply it to the dictionary I pass at invocation time, what I get back is just the question — "What is machine learning?" — because that's the value under the question key. And that's how this chain gets put together (there's a consolidated sketch of the whole thing a bit further down). The first element of the chain is what they call a runnable — a component that can run — and its job is to generate a map, because that map, with a context and a question, is what gets passed to the prompt. The first value of the map, context, comes from the retriever: I grab the question from the invoke input, pipe that question into the retriever, and the output of the retriever — which, as we already saw, is an array of documents — is what goes into context. The second value of the map, question, is just the question passed straight through from the invocation. That creates the entire chain.

Now I need to test it, and I have a bunch of questions for that: What is the purpose of the course? How many hours of live sessions? How many coding assignments? Is there a program certificate? What programming language will be used? How much does the program cost? Let's go through them one by one: for each question in questions, invoke the chain with that question; I'll also print the question itself — an f-string with the question — then print the answer (switching to single quotes inside the f-string), then a newline.

Let's give it a try. "What is the purpose of the course?" — and this is the answer; this is the GPT model, by the way, and the answer looks fine. "How many hours of live sessions?" — the program offers 18 hours of live, interactive sessions: correct. "How many coding assignments?" — there are 30 coding assignments: correct. "Is there a program certificate?" — yes: correct. "What programming language will be used?" — Python. "How much does the program cost?" — the program costs $450 for lifetime access: correct. So that's the GPT model answering questions from my PDF.

Now let's change the model to something different. I go back up, set the model to llama2, and execute — and everything else should stay exactly the same; if we did our job, everything should work without any other changes to what we built. Let's run it all. By the way, loading the documents and running Llama 2 all happens here on my computer, and I don't have a big NVIDIA GPU — just my M3 laptop; it's a pretty good laptop, but with a bigger GPU this would obviously run much faster. Let's see... okay, look at this answer — Llama 2 is just so verbose.
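Putting the pieces together, here's a consolidated sketch of the retrieval chain and the test loop described above (the questions paraphrase the ones in the video):

```python
from operator import itemgetter

# itemgetter("question") pulls the question out of the dict the chain is invoked
# with; piping it into the retriever turns that question into relevant pages.
chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)

questions = [
    "What is the purpose of the course?",
    "How many hours of live sessions?",
    "How many coding assignments are there in the program?",
    "Is there a program certificate?",
    "What programming language will be used?",
    "How much does the program cost?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print()
```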
Its answer to the live-sessions question — "18 hours of hands-on live training spread over three weeks" — is correct, though; it works. "How many coding assignments?" — "I don't know the exact number of coding assignments in the program. According to the provided document, there are 30 coding assignments." Jesus Christ — it knows where the information is; it just sucks at summarizing it. "Is there a program certificate?" — yes. "What programming language?" — Python. "How much does the program cost?" — "Based on the context provided, the program costs $1,000 to join." That is just not true; I don't think $1,000 is even mentioned anywhere on the entire page, so it hallucinated that completely.

By the way, I also have the other model installed — let me try Mixtral, just for fun, and then I'll show you one more thing before we finish. One thing I want to mention: these models are obviously not as good as the GPT models, and I'm not even using GPT-4 here, which is so much better. But if you play with the prompts, you can get these models to do a very good job at summarizing. My prompt here is very bare-bones, just for this example — so don't be discouraged if the model answers some of these questions incorrectly; working on the prompt will go a long way.

It's still running — Mixtral is a big model, and it takes a while to generate all of the embeddings, because the 8x7B is huge compared to Llama 2. Have we started answering yet? Not yet — it's still invoking the chain; about 20 seconds just to invoke this first question. There we go — now it's going to try to answer all of those questions. Remember, Mixtral is the 26 GB download, compared to about 3 GB for Llama 2, so it takes quite a bit of time on my computer to produce any results.

And speaking of slowness, while that runs, let me show you something else that's pretty cool. The first thing is streaming. So far I've been invoking the chain and waiting for the whole answer to come back before displaying it; a nice trick for your users is to stream the answer back instead — calling the chain's stream method with the question instead of calling invoke. Meanwhile, Mixtral has finished answer one... it hallucinated "four hours of live sessions per week" — oh no, wait, that part is true: two live sessions, each lasting two hours, taking place every Monday and Thursday. But then — look at this — Mixtral starts doing multiplication, says that's 12 hours, and ignores... ah, there we go: "however, the document mentions that there are 18 hours of hands-on live training." It's just very, very verbose, trying to do math along the way. Then it says the number of coding assignments is not specified in the document, but it can be inferred that there are at least 30 coding assignments — what do you mean? You first tell me it isn't mentioned, and then that you can infer it? You just read that the document says 30 coding assignments. These models are just... yeah. Okay: which programming language will be used — Python; and for the cost, it says the document does not provide information on the cost of the program.
By the way, the information is there — you saw GPT find it. Let's go back to GPT really quick so we can test the streaming; I'll show you streaming and one more thing, and then we're done. This is how it works when it answers one question after the other: boom, boom, boom — each answer gets displayed all at once. But with streaming, look what happens: see how it sort of builds up the answer? Let me run it again. It's really fast, so you barely notice, but it builds the answer up piece by piece, because it's streaming the characters out as the model produces them. That's super cool.

The other thing you can do is batching, which is also super cool. Here I have a bunch of questions and I've been answering them one by one; with batching, instead of passing a single question, I pass an array of questions. When I do that, it takes a little more time, but then boom — it displays all of the answers at the same time. And the good news is that all of those calls happen in parallel behind the scenes, so we don't have to wait for one answer before asking the next question; we can ask many questions at the same time, and the overall result is way faster. All of that is thanks to LangChain.

Again, the code is in the description below. Just make sure you like this video — a ton of work goes into these videos, and it's your likes that make me create more of them; if you don't like them, well, I'll just stop making them. The final thing I want to mention is what you learned today: how to use these models locally — on your Linux server, on your own computer, wherever — and how to write your code so that it works regardless of the exact model: you can swap one for the other and the rest of your code doesn't need to change. Hopefully you enjoyed it. I have a bunch of videos coming; I think the next one will be a simpler one — instead of a PDF, connecting to the web directly and answering questions from a website. We'll see. Anyway, thank you, and I'll see you in the next one. Bye-bye.
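For reference, here's a minimal sketch of the streaming and batching calls described above:

```python
# Streaming: print each chunk as the model produces it instead of waiting
# for the full answer.
for chunk in chain.stream({"question": "How many hours of live sessions?"}):
    print(chunk, end="", flush=True)
print()

# Batching: send several questions at once; the calls run in parallel behind
# the scenes and the answers come back together.
answers = chain.batch([{"question": q} for q in questions])
for q, a in zip(questions, answers):
    print(f"{q}\n{a}\n")
```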
Info
Channel: Underfitted
Views: 22,938
Keywords: Machine learning, artificial intelligence, data science, software engineering, mlops, software, development, ML, AI
Id: HRvyei7vFSM
Length: 53min 15sec (3195 seconds)
Published: Sat Mar 09 2024