LangChain Crash Course for Beginners

Captions
LangChain is a framework designed to simplify the creation of applications using large language models. It makes it easy to connect AI models with a bunch of different data sources so you can create customized NLP applications. Rishab Kumar created this LangChain course for beginners. He is an experienced engineer and a great teacher.

Let's learn about what LangChain is. So LangChain is an open-source framework that allows developers working with AI to combine large language models like GPT-4 with external sources of computation and data. The framework is currently offered in Python and in JavaScript (well, TypeScript, to be specific), and you can connect large language models, like GPT-4 from OpenAI or models from Hugging Face, to your own application. So it's an open-source framework that allows you to build AI LLM applications. It allows you to connect a large language model like GPT-4 to your own sources of data. And we are not talking about pasting a snippet of text into a ChatGPT prompt; we're talking about referencing an entire database filled with your own data. So it could be a book in PDF format that you have converted into the right format for these LLMs to use, stored in what are known as vector databases. And not only that: once you get all the information you need, you can have LangChain perform a certain action for you by integrating external APIs. So let's say you want to send an email at the end of whatever task you did with your given data set.

This is where the main concepts of the LangChain framework come into play. I built this diagram to better understand the concepts. You have three main concepts: components, chains, and agents. Within components, we have LLM wrappers that allow us to connect to a large language model like GPT-4 or one from Hugging Face. Then we have prompt templates, which allow us to avoid hard-coding the text that is the input to LLMs. And then we have indexes that
allow us to extract the relevant information for the LLMs. The second concept is chains. Chains allow us to combine multiple components, which are these here, to solve a specific task and build an entire application. And finally we have agents, which allow an LLM to interact with its environment and any external APIs. Remember how I talked about the task you want to perform after you have retrieved the information?

There is a lot to unpack in LangChain, and new stuff is being added every day, but at a high level this is what the framework looks like. I have built a demo app, as projects are the way that all of this information really sticks.

Talking about requirements for this course: you will need Python installed, specifically version 3.8 or higher, and pip, which is Python's package manager; a code editor (I'll be using Visual Studio Code, but you can choose whatever code editor you like); and also an OpenAI account, since we'll be using OpenAI's LLM today to build our LangChain applications. I'll be using a Windows machine, so all of the commands you'll see in the terminal will be for Windows users, but they are quite similar on macOS or Linux systems.

So let's start with the first thing: you'll need an OpenAI account. To sign up for an OpenAI account, you can go to openai.com and click on Log in. This will take you to the login screen. I already have an account signed up with my Google account, so I'll go ahead and log in. The reason we need OpenAI is that we will be using OpenAI's LLM, and we need an API key. If you click on your user account in the top right-hand corner, you can click on View API keys. As you can see, I have generated a few of them. In your case you'll not see any API keys, so you can click on Create new API key, and this will give you a new OpenAI API key. Remember to save it safely somewhere because, as you can see, you can't reveal the API key again. So once you create a
new one, it'll be revealed only one time. So save that, and we'll be using it later as an environment variable in our code.

Now let me open up my terminal here. What I'm going to do is create the project directory where our code will reside. I want to make sure I'm in the right directory on my computer, which is GitHub, and I'll create a new directory by typing the command mkdir, and we'll call this langchain-llm-app. Now let's change directory into our project directory, and let me open it up in Visual Studio Code, which is the editor of my choice. Again, you can use any code editor that you like.

Okay, now that we have the project directory open in Visual Studio Code, I'm just going to open up a terminal in Visual Studio Code. What I want to do now is create a virtual environment. You can do that by typing python -m venv and then .venv. So this is the command, and .venv is the directory where the virtual environment will exist. As you can see on the right-hand side, where my project directory is open, we now have a folder called .venv. Once we have that prepared, we'll need to activate this virtual environment. You can do that on Windows by typing .venv\Scripts\Activate.ps1, which is a PowerShell script that activates the virtual environment. As you can see, there is green virtual environment text in front of the prompt, which means my virtual environment is active.

Now we'll use pip, the Python package manager, to install the packages we'll be using today. So I'm going to do pip install and then langchain, openai, streamlit, and also python-dotenv. langchain allows us to work with LangChain using Python; openai, since we'll be using OpenAI's LLM; streamlit allows us to build interfaces for Python applications, and you'll see how Streamlit makes it so easy to build
interfaces; and then python-dotenv allows us to use a .env file, which is where our OpenAI API key will reside safely, as environment variables in our Python code. Hit Enter.

Okay, after some time all the packages should be installed. You can see my terminal is giving me a warning that a new version of pip is available. If you get the same warning, you can either upgrade pip or ignore the warning for now. I'll just clear my terminal so that it has a clean screen, and I'll close it for now.

What I want you to do now is create a main.py file. So now we have main.py, where our Python code will reside. Let's start by importing from LangChain at the top, and we'll be using llms. I want to use OpenAI. Again, I know it will cost some money, and I'll show you in my OpenAI dashboard how much the API calls cost, but it's in cents, and it is the best one. If you want to use others, like the open-source Hugging Face LLMs, you can do that too; LangChain supports it. But right now I'm happy with OpenAI.

What I also want to do is use dotenv, the python-dotenv package that we installed, to load our environment variables, and we can initiate that by calling load_dotenv. Now I can go ahead and create a .env file and save my OPENAI_API_KEY as an environment variable there. This is where you will paste the sk- key that was created in the OpenAI dashboard. So let me do that, and I don't want to reveal my OpenAI API key. Okay, I copied my API key from my OpenAI account and pasted it in the .env file, so I'll close that.

What I want to do with the first sample application is generate pet names. Let's say I have a pet dog and I want to generate some cool names for it, and maybe we'll add a few parameters where people can select what kind of pet it is and maybe its color. So let's start with that function. You define a function in Python by typing def, and then we'll call this generate_pet_name.

Now we'll be using an LLM from the LangChain library and, as I said, I'll be using OpenAI today. This has a few properties; one of them is temperature. What temperature means is how creative you want your model to be. If the temperature is set to, let's say, 0, it is very safe and is not taking any bets or risks, but if it is set to 1, it will be very creative and will take risks, and might also generate wrong output. I tend to set my temperature to 0.5 or 0.6 so that it can get a little bit creative. So let's set that by typing temperature.

Now what I want to do is use this LLM to create cool names for my pet, which in my case is a dog. So I'll type something like: "I have a dog pet and I want cool names for it. Suggest me five cool names." That's what our LangChain app is going to be: it'll suggest five cool names for your pet. So let me type that out as a prompt. Okay, I have my prompt ready, and this function will return the name. Then we can add if __name__ == "__main__", which is boilerplate Python code, and I want it to print whatever that function generates, so the output of generate_pet_name will be printed in our console.

Let's give this a try by opening up the terminal and typing python main.py. As you can see, it gave me five names for my pet dog: Apollo, Blaze, Hershey, Kona, and Maverick, which are pretty good names. And as you can see, I ended up setting the temperature to 0.7, so it's getting a little bit creative. Again, you can test this out by varying it between 0 and 1 and seeing what temperature suits your needs, but for me anything between 0.5 and 0.7 is good, since I need my LLM to be a bit creative.

So we just introduced one component of LangChain, which is the LLM. The next thing that I want to introduce you to is PromptTemplate. Prompt templates make it easy to generate these prompts, so you don't have to keep writing a different prompt for OpenAI every time.
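The first version of the function described above can be sketched roughly as follows. This is a minimal sketch, not the instructor's exact file: it assumes the older `langchain.llms.OpenAI` import path that the walkthrough appears to use (newer LangChain releases have reorganized these modules), and `build_pet_prompt` is a hypothetical helper added here so the prompt text can be inspected without calling the API.

```python
def build_pet_prompt() -> str:
    """Return the hard-coded prompt from this step (hypothetical helper)."""
    return (
        "I have a dog pet and I want a cool name for it. "
        "Suggest me five cool names for my pet."
    )


def generate_pet_name(temperature: float = 0.7) -> str:
    """Ask the OpenAI LLM for pet names; needs OPENAI_API_KEY in a .env file."""
    # Imported lazily so this file loads even before the packages are
    # installed. Assumption: the older `langchain.llms` import path.
    from dotenv import load_dotenv
    from langchain.llms import OpenAI

    load_dotenv()  # reads OPENAI_API_KEY from .env into the environment
    llm = OpenAI(temperature=temperature)  # 0 = conservative, 1 = more creative
    return llm(build_pet_prompt())


if __name__ == "__main__":
    # Printing the prompt works offline; swap in generate_pet_name()
    # once the packages and the API key are in place.
    print(build_pet_prompt())
```

Keeping the prompt text in its own function is only a convenience for this sketch; it also makes the next step, replacing the hard-coded "dog" with a variable, easier to see.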
So we want to repurpose this so that people on the internet might be able to generate pet names. Imagine that you want to create a web app where people can come in and get pet names. We want to reuse this prompt, and we don't want to hard-code "dog"; and if we want to have pet color as an option, we don't want to hard-code that either. So we want the ability to reuse our LLM prompt for different kinds of animals and different colors, and the way we can do that is by using prompt templates.

So, prompt_template_name; let's just call it that. In LangChain it's called PromptTemplate, and we will also have to import it from LangChain. So going to the top, let's import PromptTemplate. Okay, so we're using langchain.prompts and importing PromptTemplate. Now you can see the squiggly line underneath it has gone. Let's give it input variables. Input variables are the parameters that can be dynamic. In our case it will be the animal type, so animal_type, and we'll also have to add that as a parameter to our Python function. There we go. So animal_type is the input variable, and the template for our prompt is the same as this, so I'll copy it, and instead of "a dog pet" I will use the input variable here, which is animal_type. Now you can imagine one person says, "Hey, I have a cat," and some other person comes and says, "Hey, I have a cow pet and I want a cool name for it. Suggest me five cool names." That is what prompt templates allow you to do.

Now we'll also have to get rid of this and use chains as a concept. So let's import the chain from LangChain: from langchain.chains import LLMChain. What LLMChain allows us to do is put these individual components of LangChain together, so the LLM and the prompt template in our case. So LLMChain: llm is equal to llm, because that's what we named it, and prompt is equal to the prompt template, so I'll just copy this, prompt_template_name. And instead of name,
let's call this name_chain, since this is an LLM chain. And instead of returning name, let's create a response here. That response will basically be name_chain, and we'll pass the animal_type parameter here, which is whatever animal type the person specifies, and we'll return the response. So the response will be whatever output this chain gives us.

Now, instead of dog, let's try cat. So we are using parameters to print five cool pet names using our name_chain, which is the LLM chain using OpenAI and this prompt template. Hit Ctrl+S and, going back to my terminal, let's run python main.py. Now you can see we are getting a JSON response with animal_type, which is cat, and the text that we got. These are the names: Mochi (or Moki), Nacho, Pebbles, Tiger, and Whiskers. So we got five names for our animal type cat. Similarly, you can try cow here, hit Ctrl+S, and run the Python file again. Now it says animal_type was cow, and the text response is where our cow pet names are: Hambone, Daisy, Moo Moody, Milky Way, and give her hugs. Awesome.

The other parameter that I want to add to our pet name generator is the pet color, because I think that is an important aspect when you name your pet. So we'll add pet_color as a parameter to our generate_pet_name function, but we'll also have to add it as an input variable in our prompt template. So let's add pet_color over here. There we go. Now we'll also have to change the template itself. So: "I have a [animal_type] pet and I want a cool name for it," and let's add "it is [pet_color] in color" (so maybe it's "black in color"), and then "Suggest me five cool names for my pet." There we go, that is our new prompt. Hit Ctrl+S, and in the name_chain call we'll also have to add the pet color, so pet_color, and that will be equal to whatever pet color the person picks or says. So now we can run this by
saying cow, and our cow color is black. So let's try that out. I'll toggle back to my terminal and type python main.py, and you can see we got animal_type cow, pet_color black, and we received a text response with those five names: one is Shadow, second is Midnight, Starlight, and we have Raven. Awesome, so our pet name generator is working as expected.

Maybe I want to publish this as a web app later, and that's where Streamlit comes in. Streamlit will build us a web interface, and we don't have to do much. We can use our Python file to build that beautiful interface, and then people can come in and select whatever kind of pet they have and whatever pet color they have, and it will output those five names using the LangChain app we built.

In order to do that, instead of having all of this code in main.py, I want to create another file called langchain_helper, and this is where all our LangChain code will go. So I'm going to go into main.py, press Ctrl+A to select all the code, and paste it into the langchain_helper file; then we can clear main.py. So our main.py is blank, and I have moved all my code to langchain_helper.py. Hit Ctrl+S to make sure you have saved that. In main.py, what I want to do is import our langchain_helper library, and we can do that with a simple import statement at the top. So I'm importing langchain_helper as lch, just a short form, so that I'll be able to call our generate_pet_name function by just using lch. Also, remember we did pip install streamlit in the beginning, so we'll be using that here too, and I'll refer to it throughout the Python code as st, which is just short for streamlit.

To create our Streamlit app you can use different text types, and you can also use Markdown, which Streamlit will render. But one of the main things is having a title for our web interface, and you can do that with st.title, and we'll call this "Pets name generator". Hit save.
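Putting the prompt template and chain steps above together, langchain_helper.py at this point might look roughly like the sketch below. It assumes the older `langchain.prompts.PromptTemplate` and `langchain.chains.LLMChain` imports named in the walkthrough; the module-level `PET_NAME_TEMPLATE` constant and the `preview_prompt` helper are illustrative additions, not something shown in the video. LangChain's default template format uses the same `{name}` placeholders as Python's `str.format`, so the wording can be checked offline.

```python
# Template with two input variables, as described in the walkthrough.
PET_NAME_TEMPLATE = (
    "I have a {animal_type} pet and I want a cool name for it. "
    "It is {pet_color} in color. Suggest me five cool names for my pet."
)


def preview_prompt(animal_type: str, pet_color: str) -> str:
    """Fill the template locally (hypothetical helper, no API call)."""
    return PET_NAME_TEMPLATE.format(animal_type=animal_type, pet_color=pet_color)


def generate_pet_name(animal_type: str, pet_color: str) -> dict:
    """Run the LLM chain; needs the packages and OPENAI_API_KEY in .env."""
    # Lazy imports; assumption: the older LangChain module layout.
    from dotenv import load_dotenv
    from langchain.chains import LLMChain
    from langchain.llms import OpenAI
    from langchain.prompts import PromptTemplate

    load_dotenv()
    llm = OpenAI(temperature=0.7)
    prompt_template_name = PromptTemplate(
        input_variables=["animal_type", "pet_color"],
        template=PET_NAME_TEMPLATE,
    )
    name_chain = LLMChain(llm=llm, prompt=prompt_template_name)
    # Calling the chain returns a dict with the inputs plus the generated text.
    return name_chain({"animal_type": animal_type, "pet_color": pet_color})
```

With this saved as langchain_helper.py, main.py would call it as `lch.generate_pet_name("cow", "black")` after `import langchain_helper as lch`.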
And now I'll just show you how to run a Streamlit app. The way you can do that is to open the terminal and type streamlit run main.py. Hit Enter, and let me open my browser on port 8501. There we go. As you can see, right out of the box we have this interface that was built using Streamlit. If you haven't heard about Streamlit, it's an amazing tool; you can go to streamlit.io and go through their documentation on how to make your web app even better, since I'll be using only some basic Streamlit components to display our pet name generator nicely.

So let's get back to our app. Back in the code editor, I'll hit Ctrl+C in my terminal to stop the Streamlit app and bring my terminal down. Now we need some variables and logic so users can pick their pet and the pet color. One of them is the animal type, whether it's a dog, a cat, or a cow. We'll give our users a sidebar selection, and st.sidebar.selectbox will allow you to do that. You can enter what the question is, so "What is your pet?", and then you can include the options. It will be a drop-down where people can select cat, dog, cow, and maybe a hen. So think of all the pets that people can have. Maybe hamster is more popular, I guess. So cat, dog, cow, hamster, and you can just keep going. So that is the animal type, and I can show you how this looks in our Streamlit app. So streamlit run, and you can see that (I can zoom in a little bit) on the left-hand side we have a sidebar now, and you can select what kind of pet you have: "What is your pet?"

The next piece of logic that I want to build is another option to select the color of your pet, but I want it to depend on the pet type you have selected. So if the animal type is cat, it should say "What color is your cat?", and we can do that with if statements. So if animal_type is cat, I want pet_color, which is another variable we pass to our generate_pet_name function
here. You can say pet_color is equal to, and then we use the selectbox component from Streamlit to ask "What color is your cat?" Now, I feel like there can be many variations, so you can't just put in black, blue, white, orange, since with cows, and even cats and dogs, you can have multi-colored pets, like a white dog with black spots on it. So we can't have a select box; let's just keep this as a text area. I just thought of that as I was building this. So instead of a select box, we have a sidebar text area that asks "What color is your cat?"

I also want to have a limit on the maximum number of characters that people can put into this, because remember, we are calling the OpenAI API, and the cost of the API calls depends on the amount of information you are sending in the prompt. If our prompt gets bigger, we'll be charged more. So to limit that, let's set a max characters property here. Again, this is available in Streamlit's documentation. We'll also use the label: the label is "What color is your cat?", the maximum number of characters users will be allowed to enter is 15, and we can hit save.

What you can do is copy this over for dog and cow. So I'll put dog here, and "What color is your dog?" For cow, it will be "What color is your cow?" And then I think we're left with one, which is hamster, so again I'll just copy this and paste it for hamster. Okay, there is a more efficient way to do this, but I'm just going to copy the code that I already have. I'll go back to my browser, refresh my Streamlit page, and now you can see that if we select dog, it will say "What color is your dog?", and you see the text area, which has a limit of 15 characters. Similarly, if you select cow, you can see it asks "What color is your cow?" So both of the parameters have been set.

Now what I want to do is send this information to our langchain_helper, because that is where it will generate those names and give them back to us. So let's do that. After we have set the pet color,
because that's the last question we ask our users, what we want to do is have a variable here called response. response is equal to lch, which is just short for langchain_helper, and the function in langchain_helper is generate_pet_name, so I'll copy that over. So you do lch.generate_pet_name, and that is how we access that function. animal_type was the first parameter that we need. Again, I'm using animal_type as a variable here too; maybe we can say user_animal_type, and I'll have to change that over here, over here, over here, and over here, just so that you're not confused. So there are two parameters, animal_type and pet_color, and I'm using user_animal_type as the variable in main.py. The second parameter is pet_color, and you can do the same here, so user_pet_color, and you'll have to update all of these just to avoid confusion. So we are passing in the values that the user set: the user will say "I have a dog and its color is white," and we pass those to our generate_pet_name function. Then we'll just write that out as a text field, so our text field will just display response. Let's save that.

Now let's go back to the browser and hit refresh. Let's select dog, type in the color black, and hit Ctrl+Enter to apply. You can see we got a response with the five pet names. What you can do to display this nicely is set an output key. So let's go back to our langchain_helper, and in name_chain we'll add a third property called output_key, and the output key is pet_name. So basically, instead of giving us a text output, it will associate the five names it generated with this output key, and we can access it in main.py. So instead of just returning the entire response, the whole text here (see how it looks weird?), we'll just access the names that it generated, and
we can do that by taking response and accessing pet_name, which was the key we set. So hit Ctrl+S, go back to the browser window, and click refresh. This time let's go with a cat, which is white. Hit Ctrl+Enter to apply, and you can see it displays the text better now. It looks beautiful, and we have the recommendations here: Snowy, Marshmallow, Cotton, Pearl, Blizzard. Let's go with a brown hamster, so hamster and brown: Coco, Mocha, Chestnut, Caramel, Biscuit. Love those names.

So now, as you can see, we have a Streamlit app, and we are using LangChain to generate five cool pet names for the pets that we might have. We saw how you can use an LLM, prompt templates, and a chain, which are three main components of LangChain. But the important one that's left is agents. Agents allow LLMs to interact with the environment, so think of APIs or things you want to do after gathering the information.

Going over the LangChain documentation about agents: the core idea of agents is to use an LLM to choose a sequence of actions to take. In chains, a sequence of actions is hard-coded in code, whereas in agents, a language model is used as a reasoning engine to determine which actions to take and in which order. There are several key components. LangChain provides a few different types of agents to get started; even then, you will likely want to customize those agents depending on the personality of the agent and the background context you are giving it. And then there are tools. Tools are functions that an agent calls. There are two important considerations: giving the agent access to the right tools, and describing the tools in a way that is most helpful to the agent.

So let's test it out. We already have a pet name generator that's working for us and gives us names for our pet. Now let me create another function here, which we'll name langchain_agent. Before we can interact with the agent, we have to import the LangChain
agents from the framework. You can do that by adding these three import statements at the top: we are importing the tools loader, the agent initializer, and the agent type. Coming back to our function: first we'll define the LLM that we want to use. I still want to use the OpenAI LLM, and we'll set the temperature to 0.5 here. Then we can load some tools that will perform the given action. There are various tools available, and you can go through the list of tools in the LangChain documentation, but I'll be using Wikipedia as the first tool (I'll get to why I want Wikipedia), and the other one is llm-math, because I want to perform some math. This is just to showcase what agents can do. And the LLM that we'll be using is the one defined here, which is OpenAI, so llm is equal to llm.

Now we'll have to initialize the agent. So agent, and to initialize it, it's initialize_agent. Here you specify the tools you'll be providing, which are stored right here (Wikipedia and llm-math), the LLM we want to use, and the agent type. One of the agent types that's available in the quick-start guide for LangChain is ReAct, and you can go to the agent types documentation here. Zero-shot ReAct is the one that I'll be using. This agent uses the ReAct framework to determine which tool to use based solely on the tool's description. Heading over to our code, the way you define the agent type is by setting it here, and we'll set the verbose flag to True, which means it'll show us the reasoning as it happens in our console. So that's the agent we want.

We'll create a result here where we run the agent, and now you can specify the task you want to perform through this agent. Since our app is all about pets, let's ask it "What is the average age of a dog?", and I'll ask it to do some math, which is the reason I loaded the LLM
math tool here: "Multiply the age by 3." At the end we'll print result. So that looks good. I will change this, so I'll comment this out, and instead we'll print whatever this generates, so langchain_agent. Hit save, and now we can run this. Just to demonstrate it, I'll not be linking this to our Streamlit app, which was the web interface; I'll just run the langchain_helper Python file to show you how the agent works. Before I do that, I have to make sure that Wikipedia is installed through pip, so pip install wikipedia will install that Python library.

Now if I run the langchain_helper file, we'll see the agent in action. Okay, you'll see that it finished the chain, and the answer was "The average age of a dog is 45 years when multiplied by three." But the figure it found first was 15, so the average age of a dog is 15, and then it multiplied that by 3, which is 45. So you can see that it was able to grab the information from Wikipedia, which is 15 as the average age of a dog, and it was also able to perform the math and get to this.

And since we set the verbose flag to True, you can see the reasoning that went into it. I'll increase my terminal window size here and get rid of the file explorer on the right. So you can see: "I need to find out the average age of a dog"; Action is Wikipedia; Action Input is "average age of a dog"; and this is the Observation that it found, so it did scan a few pages on Wikipedia. Thought: "I now know the average age of a dog and the age of the oldest dog." Then the Action is Calculator, where it's trying to multiply 15, which is the average age, by three, because that's what we asked it to do. Awesome, so that's how agents work.

I believe we have covered almost all the components within the LangChain framework. The only thing that's left is indexes. So what are indexes? Basically, as you can see, we are still working with the OpenAI LLM, but we are not providing any custom knowledge.
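Stepping back to the agent walkthrough for a moment, the function described above could be sketched like this. It is a hedged reconstruction that assumes the older `langchain.agents` helpers (`load_tools`, `initialize_agent`, `AgentType`) the three import statements refer to; the `AGENT_TOOLS` and `AGENT_QUESTION` constants are illustrative names, and running it needs the `wikipedia` package, network access, and an OpenAI API key.

```python
# Tool names as used in the walkthrough: a Wikipedia lookup and a calculator.
AGENT_TOOLS = ["wikipedia", "llm-math"]
AGENT_QUESTION = "What is the average age of a dog? Multiply the age by 3"


def langchain_agent() -> str:
    """Let the agent pick tools to answer the question (makes API calls)."""
    # Lazy imports; assumption: the older LangChain module layout.
    from dotenv import load_dotenv
    from langchain.agents import AgentType, initialize_agent, load_tools
    from langchain.llms import OpenAI

    load_dotenv()
    llm = OpenAI(temperature=0.5)
    # llm-math uses the LLM itself, so the model is passed to load_tools too.
    tools = load_tools(AGENT_TOOLS, llm=llm)
    agent = initialize_agent(
        tools,
        llm,
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # picks tools by description
        verbose=True,  # print the Thought / Action / Observation trace
    )
    return agent.run(AGENT_QUESTION)
```

The function is only defined here, not called, so importing the file has no cost; calling `print(langchain_agent())` is what triggers the Wikipedia lookup and the multiplication.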
So we are still relying on OpenAI and the information that they have gathered. But with LangChain you can also provide your own knowledge, or a knowledge base, on which you can ask the LLM to perform certain actions. So think of a PDF file, or even URLs that you can scrape; or maybe you have a large PDF file with a lot of text and you want to run an LLM chatbot over your own document. You can do that with the help of LangChain.

The next project that I want to show you will do exactly that. We'll take a long YouTube video, so think of a podcast that is hours long. What I have here is the Microsoft CEO Satya Nadella full interview on Recode, which is 51 minutes long, and what I want from LangChain is the ability to ask questions about this video, so the context that the LLM has is strictly that video. I'll be using a few libraries, like the YouTube transcript one, which takes whatever YouTube video URL we provide and gets its transcript. So let's build this YouTube assistant.

Now I'm going to show you how you can create this assistant that can answer questions about a specific YouTube video. Coming back to the concept of indexes: I touched briefly on it, and we also saw it in the LangChain diagram. We know that these large language models become really powerful when you combine them with your own data, and your own data in this scenario will be the YouTube transcript that we are going to download automatically. But you can replace that transcript with any information in this approach; it could be a PDF, or it could be a blog post URL. What LangChain offers is document loaders, and I can quickly show you the YouTube transcript one. So this is the YouTube transcript loader, and it allows you to get the transcript, which is the text version of the YouTube video. But there are several other document loaders that you can see on the left-hand side. So you can bring in an S3 file, you
could bring in an Azure Blob Storage file, or you could do Hacker News posts or articles. So these are some of the document loaders that are supported by LangChain as of now. We'll also be using text splitters and vector stores. So we are going to use these three components to load our YouTube video transcript, split it into smaller chunks, and then store them in a vector store. You can think of these as little helper tools that will make it easy for us to work with the transcript, which might be thousands of lines of text.

To get us started, I have already created a YouTube assistant directory, so I'll not be using the pet name generator directory that we had. What I have done is pretty similar to the pet name generator: I have main.py, which will hold our Streamlit interface, and langchain_helper will have the LangChain components. I've also created a virtual environment and installed all the necessary packages, which are langchain, openai, and the YouTube transcript package. Also, I went ahead and created a .env file, which holds my OpenAI API key. So it's pretty similar to the pet name generator, and now we can start with the langchain_helper first.

The first thing that we are going to import is the YouTube loader that we saw, which is a document loader. So from langchain.document_loaders we are importing YoutubeLoader. The second important thing we need is the text splitter. As I showed, the video that I have is 51 minutes long. You could also pick a podcast like Lex Fridman's; they have podcasts that are three hours long, which means you'll have thousands of lines, and that is where we'll use the text splitter to break those huge transcripts into smaller chunks. I'll show you how. For the rest of the imports, we are going to import the LangChain components like the LLM, which will be OpenAI, PromptTemplate, and LLMChain. The other thing, coming back to indexes: we'll be using vector stores. I'll be using the FAISS library, and I'll
quickly show you what the FAISS library is. FAISS is a library by Meta, or Facebook, for efficient similarity search. You might have heard of other vector stores or databases like Pinecone or Weaviate, but I'll be using FAISS for this project. So let's start writing some code.
I've done all the necessary imports here. The only import that's left is dotenv, which will load our environment variables, and I'll call load_dotenv here. Also, since I'll be using OpenAI embeddings, we'll initialize those here too; I forgot to import them, so I'll import OpenAIEmbeddings.
Now we can create our first function. To create a function in Python we use def, and let's give this function, which will be creating a vector DB, the name create_vector_db_from_youtube_url. That's a pretty big function name, but I want to be specific about what we are doing, and we'll be using FAISS here. For the parameters, let's give this a required parameter, which is the video URL. We'll be pasting this video URL into our Streamlit interface, and that's what we'll be using; it will be a string. The first thing we want to do is load the YouTube video from the URL, so we'll use the loader we imported at the top: YoutubeLoader.from_youtube_url, and we'll pass the video_url parameter here. After we have loaded the YouTube video, I want to save this into a transcript variable, so we'll create transcript here and just do loader.load(), and this should give us the transcript.
Now we'll be using the text splitter, and I'll tell you specifically why. We imported it here as RecursiveCharacterTextSplitter, and you can specify a few parameters when using it. The first one is chunk_size, which we'll set to 1000, and the other is chunk_overlap. Chunk size is how much each chunk will contain, so for me it will be 1000. Overlap means that once the splitter has created those individual docs from the long transcript, neighboring documents share some text, so for document one, the last
hundred or so characters would also be included at the start of document two. That is what overlap is. Now we'll save the chunks into a docs variable, so text_splitter.split_documents is the function, and we'll provide the transcript that we loaded from the YouTube URL. There we go. Now let's also initialize FAISS, so FAISS.from_documents, and we will be using the docs we stored here along with the OpenAI embeddings, and we'll return this db.
Now on to the explanation of why we have to split the text. What the text splitter has done is take a very large transcript, thousands of lines, and split it into chunks of 1000. That is the first step. Now you might wonder why we can't just provide all those thousands of lines to the OpenAI API. Remember, there is a token limit on how much information you can send to OpenAI's API, and that is why we split the amount of context we'll be sending for a YouTube transcript. The model that I'll be using is text-davinci-003, and as you can see it can only take 4097 tokens, so I cannot send the entire transcript to OpenAI's API. That is why we'll be splitting it and storing it in a vector store. Again, this is quite technical and I'll not go into much detail, but vectors are a numerical representation of the text we just created here. So the core responsibility of this function is to load the transcript, take all the text that's in the transcript, split it into smaller chunks, and then save those chunks in a vector store. Again, we can't just provide all of these chunks to OpenAI; we can't send over the 50, or maybe even 10,000, smaller chunks of text that we have created. That's where we'll use FAISS to do a similarity search, and that's what the next function will be.
Before I write that next function, we'll see if this works. I'm going to hard-code video_url with the URL we have for the podcast and see if we get the smaller chunk documents, so let's print the result of this function at the end. Hit save and open the terminal; make sure your virtual environment is activated and you have installed the required packages. Again, all of this will be available on GitHub for reference later. Let's run the LangChain helper Python file; it'll take some time to do the computation. I forgot to write print, so we'll have to print whatever this function returns, which should be the database that we created. So let's run it again, and this time we should get the vector store that was created. And if I return docs instead of db, you'll see those chunks. If I expand my terminal here, you can see we have quite a lot of text, but here are the docs: there's a Document, then it starts with the content, and you'll see multiple document chunks. These are the chunks that we created from the larger transcript, so this is one, this is the second one, and so on. I know the formatting is weird, so you can't really tell where a new document starts, but these are all the chunks that we have.
Awesome, so our function to create the vector DB from a YouTube URL is working as expected. I'll get rid of this print statement and return db here again. Now for the next function, which is going to get the response to the query we ask of this YouTube video. Let's create that function; we'll name it get_response_from_query, again a pretty self-explanatory name. We'll pass a few parameters to this function: one is db, the important one is query, which will be the question that the user asks, and then k, which is another argument that I'll go over; it is used for the similarity search that we'll do. Keep in mind the number of tokens that the text-davinci-003 model can take: it's 4097, so I'll just add
a comment here saying text-davinci-003 can handle 4097 tokens. Now, in order to do a similarity search, we'll save the result into a docs variable within this function. db is what we'll use: we'll perform a similarity search on the db, which is the database we created in the previous function, so db.similarity_search, and what we search for is the query. So the first thing I want this function to do is find the documents relevant to the query. Let's say in this podcast they talk about ransomware somewhere; right here they talk about ransomware. If I ask, what did they say about ransomware, my query is just about the ransomware they discussed in the podcast, so the search will only return the documents that have details about ransomware. We'll not send all the documents that were created, just the ones that are relevant to the query the user made. I hope that makes sense. This is also where we'll pass k as an argument, and I'll tell you what k is. Remember that we can have 4097 tokens, but our chunk size is 1000.
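Before settling on k, a rough budget check can help. This is only a sketch under two assumptions that are not from the video: that the splitter counts chunk_size in characters (the default length function of RecursiveCharacterTextSplitter is Python's len), and that English text averages about four characters per token, which is a rule of thumb rather than an exact figure. The helper names below are made up for illustration.

```python
# Rough context-budget check for the retrieval step.
# Assumption: ~4 characters per token is a rule of thumb, not an exact count.

CONTEXT_LIMIT_TOKENS = 4097  # limit quoted in the video for text-davinci-003
CHUNK_SIZE_CHARS = 1000      # chunk_size passed to the text splitter
CHARS_PER_TOKEN = 4          # assumed average for English text


def estimated_tokens(num_chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from a character count, rounding up."""
    return -(-num_chars // chars_per_token)


def remaining_budget(k: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Tokens left for the prompt wording and the answer after k chunks."""
    used = k * estimated_tokens(CHUNK_SIZE_CHARS, chars_per_token)
    return CONTEXT_LIMIT_TOKENS - used


if __name__ == "__main__":
    for k in (1, 4, 8):
        typical = remaining_budget(k)
        # Pessimistic case: pretend every character costs a whole token.
        worst = remaining_budget(k, chars_per_token=1)
        print(f"k={k}: typical headroom {typical}, worst-case headroom {worst}")
```

Under these assumptions, four chunks use roughly 1,000 tokens. Even in the pessimistic case of one token per character they still fit under 4,097, though with little room left for the prompt wording and the answer, which share the same limit.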
So that means we can send about four documents, because each document has a size of a thousand; let's set that value to four. We'll be sending the four most relevant docs based on the query the user made. Now I'll create another variable called docs_page_content, and what we'll do is join those four docs that we'll be sending. So we got those four docs and we are joining them to create one doc, because the token limit is 4097 and here we'll have at most about 4,000 tokens being sent to the text-davinci-003 model.
Awesome, now let's work with the LLM. Pretty similar to what we did with the pet's name generator, we'll initialize the LLM to be OpenAI, and as I said the model I'll be using is text-davinci-003, so let me go to OpenAI's documentation, copy this model name, come back here, and paste it. There is some white space at the end, so we'll get rid of that. The second thing we did with the pet's name generator was the prompt. Prompt templates are another main component of LangChain, so we'll use one, and this variable defines what the prompt should be for the OpenAI LLM. The first thing is to specify the input variables: the first one is question, which is whatever question is being asked by the user, and then docs, which is the result of the similarity search we did. There we go.
Now, the template we'll be using is a prompt that I've created here, so I'm going to copy it really quick since it's a long prompt. Okay, I've copied the prompt. Basically it says: you are a helpful YouTube assistant that can answer questions about videos based on the video's transcript. Answer the following question (this is where the input variable goes, whatever question the user is asking) by searching the following video transcript, which is the docs, the result of the similarity search we did. Only use factual information from the transcript to answer the question. If you feel like you don't have enough information,
simply say "I don't know", because we don't want the AI, or the LLM, to hallucinate. Your answers should be detailed. So that is the prompt we'll be using to answer questions.
Now we'll be using another main component within LangChain, which is the chain. Let's create an LLMChain where llm is equal to llm, because we specified here that we'll be using the OpenAI model text-davinci-003, and prompt is equal to prompt, which we specified here using PromptTemplate. Now we just have to get the response, so I'll create a variable called response, and it will be chain.run, which will run our chain. Since we had question as an input variable here, we'll say that question is equal to query, because that's what we called it in this function's parameters, and docs is equal to docs_page_content. Remember, we joined all four documents, because k is set to 4, into one doc, because we have the ability to send about four thousand tokens. Then response is equal to response.replace, and this is just some formatting that we have to do, because if you remember, in the pet's name generator the response we were getting was on one line and it included newline characters. We'll replace those with white space, and we'll return response.
So now we could test this out as it is in the console by hard-coding the question and the URL, which we already did. Let's get ready for that, but also build the interface, because it'll be really quick with Streamlit. Coming over to our main.py, let's do some imports at the top, pretty similar to what we did in our pet's name generator: I'm importing streamlit as st, and langchain_helper, where all of the LangChain code is, and I'm also importing textwrap, which gives you the ability to wrap text so that you don't have to scroll the page. The title of this page will be YouTube Assistant. And now on the sidebar we can have those parameters
that we need from the user. With st.sidebar, I want to create a form so that we have a submit button at the end, so st.form is how you do that, and you also have to specify a key, so key is my_form. Again, this is all Streamlit stuff, and let me know in the comments if I should create a course on Streamlit and how to build cool Python interfaces. I love this tool because I don't have to care about building a front end.
The first parameter we had in our LangChain helper was the YouTube URL, so we'll save that as youtube_url: youtube_url is equal to st.sidebar.text_area, as we used in the pet's name generator, and we'll say that the label is "What is the YouTube video URL?" and give it a maximum character limit of 50, because I don't think a video URL will exceed 50 characters. The other parameter we had was the question that the user can ask, and we'll save it as query here, so st.sidebar.text_area again, and the label will be "Ask me about the video". Again, you can have a limit here, so that people can only ask fairly short questions; we'll set max_chars to 50 here too, and also set the key to query. Lastly, since I created this as a form, we'll give it a submit button, and the label will be "Submit".
Now, if query, which is the question the user asks, and youtube_url both exist, what I want to do is run our functions to give us the answer. We are already importing the LangChain helper at the top as lch, so that's what we'll be using here. For db, which is the database, remember that we have to pass the video URL to the create vector DB function to create a new vector database based on the transcript that we got. So db is equal to lch, which is how we access that Python file, and then the create_vector_db_from_youtube_url function, and we'll pass youtube_url as the parameter, because remember we just need the video URL here.
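The 50-character cap on the URL field is a guess, and links copied from a playlist or with a timestamp can run longer than that. One hedged way to cope is to allow a larger limit in the form and normalize the URL before handing it to the helper. The sketch below uses only the standard library; extract_video_id and canonical_watch_url are hypothetical names, not part of the course code, and the sample video ID is simply the one listed in this page's info.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        # Standard watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def canonical_watch_url(url: str) -> Optional[str]:
    """Rebuild a short, predictable watch URL, dropping extra parameters."""
    video_id = extract_video_id(url)
    if video_id is None:
        return None
    return f"https://www.youtube.com/watch?v={video_id}"


if __name__ == "__main__":
    pasted = "https://www.youtube.com/watch?v=lG7Uxts9SXs&list=PLabc123&index=4&t=120s"
    print(len(pasted), canonical_watch_url(pasted))
```

In the app, the normalized URL (or an error message when the function returns None) would be what gets passed on to the vector DB function.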
Then response, docs is equal to the result of the other function, get_response_from_query. Remember the parameters that we'll be passing: lch.get_response_from_query, where the first one is db, which we just created, and query is the question that I'll be asking, so whatever the user asks right here will be the query. I am missing a comma here, as I'm going through my code, so I'll add that. Now we'll show that response in our interface with Streamlit. Let me create a subheader here which will say "Answer", and below that we'll have st.text, and we'll wrap that text; this is where the textwrap library is used, and you'll see it in the interface once I run it. So textwrap.fill with whatever response we get from the LangChain function, and you can also set the width of this text area; let's go with 80 and see how that looks. And that is basically it. Two parameters are necessary: one is the YouTube video URL and the other is the question that the user asks. If both of those parameters exist, first we create the database from the YouTube video URL, and then we get the response to the user's question using the LLM.
So now we can run our Streamlit app after saving the file. If I scroll down to the bottom here to my terminal, expand it, run streamlit run main.py, and hit enter, it should load the web interface for our Streamlit app. Awesome. On the left-hand side you can see we need to provide a YouTube video URL, so I'll go ahead and copy this interview's video URL and paste it here. For "Ask me about the video", let's say "What did they talk about ransomware?" is what I want to know, and hit submit. Okay, so we have got an error mentioning input variables; let's go to our terminal and see if we have any logging. Okay, I found the error: I was just missing an s. I thought I typed it right, but instead of input_variable it needs to be input_variables, and
we'll press Ctrl+C to stop our Streamlit app and do streamlit run main.py again after adding the s. Hit enter, and now we need the exact same information, so copy the YouTube video URL and the question, "What did they talk about ransomware?", and hit submit. There we go, we got our answer. It says they discussed how ransomware is difficult to track due to zero-day exploits, and how Microsoft is making it a mission to help with secure cloud backup for enterprises, better tracking of zero-day exploits, and helping with enforcement. They also discussed the importance of public-private partnerships in order to prioritize cybersecurity and create new standards, such as those for NIST. Remember our prompt: I asked it to be as detailed as possible, to say "I don't know" if it can't find the answer in the transcript we provided, and not to hallucinate. So I think this is a pretty good answer that we got out of this 52-minute video. Again, you can pick a longer video and ask about anything specific; think of long podcasts, where maybe the video is four to five hours long and you need to know a specific detail. I think that's where this tool, or the app we built, can be really handy.
But yeah, we learned a lot about LangChain today, specifically the main components: LLMs, so any of the large language models that you can use, like OpenAI or Hugging Face; prompt templates; and chains, so how you can combine these components into chains to perform the required task. And agents: remember, in the pet's name generator we talked a little bit about agents and how they reason about the tasks they perform, because we tried to calculate the average age of a dog and also multiply it by three, and the agent used Wikipedia and LLM math to get those answers. We also learned a bit about indexing and vector stores, so how you can split large documents into smaller chunks and store them as vectors, which are a numerical representation of the
documents that we created, and then pass the relevant ones on to the LLM, since there are limits on how much context you can send to the API.
One other thing I would like to mention, if you are planning to make these apps public: remember, we were storing our environment variables in a .env file, and since I also created an OpenAI API key, you might be wondering how much all of this is going to cost. So I'll go into billing in my dashboard to see how much it cost me to build this course out. You can see 10 cents and 30 cents, so less than a dollar; close to 50 cents is what it cost me to make all of these queries to the OpenAI LLM. What I was going to recommend, if you want to publish this app so that the public can use it, is to have a field here in the sidebar for the OpenAI API key, so that users have to submit their own OpenAI API key with the app. You can have a text field here saying "What is your OpenAI API key?", just so that you are not the one being charged, and you can make it a secret field so that the key is not displayed in the interface. You can then use that key to make these queries; you will just have to pass it into the LangChain helper. So whatever variable name you decide on, maybe openai_api_key, which gets its value from our Streamlit interface, you can pass it right here when you initialize the large language model: you'll specify openai_api_key as a parameter here, and its value will be the variable you decided on.
So yeah, that's pretty much it for this course. Again, we learned quite a bit about the LangChain framework, specifically in Python: models, prompts, indexes, chains, and agents are the five main concepts within LangChain that I wanted to cover. I hope this helps you understand the framework itself and how you can use this information to build something really cool with the power of LLMs.
But if you would like to see a Streamlit course, again, let me know in the comments. I hope you found this course helpful, and I'll see you in the next one. Peace.
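As a recap of the indexing idea from this course (split the transcript, turn chunks into vectors, fetch the k closest to the question), here is a dependency-free sketch. It is not how LangChain or FAISS work internally: the splitter below cuts fixed character windows instead of splitting on separators, the bag-of-words "embedding" is a toy stand-in for OpenAI embeddings, and cosine similarity is used purely for illustration. All function names are made up.

```python
import math
from collections import Counter
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size character windows that share an overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (no punctuation handling)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(chunks: List[str], query: str, k: int) -> List[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        "we talked about ransomware and backups",
        "the cloud business is growing",
        "zero day exploits make ransomware hard to track",
    ]
    # Joining the retrieved chunks mirrors the docs_page_content step.
    print(" ".join(top_k(chunks, "ransomware", k=2)))
```

The joined string plays the role of docs_page_content: only the retrieved chunks, not the whole transcript, would be placed into the prompt.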
Info
Channel: freeCodeCamp.org
Views: 120,011
Id: lG7Uxts9SXs
Length: 65min 30sec (3930 seconds)
Published: Thu Sep 28 2023