LLM Project | End to End LLM Project Using LangChain, Google Palm In Ed-Tech Industry

Video Statistics and Information

Captions
We are going to build an end-to-end LLM project where we will use LangChain, Hugging Face, Streamlit, and the Google PaLM model. The use case we are solving is for a real e-learning company; we have not used any toy dataset here. That company is my own company, Codebasics. We have the codebasics.io platform, which has many data-related courses with more than 20,000 students. Now, these existing students, or a person who wants to buy a course, will have questions related to courses, fees, etc., and we will build a Q&A system where they can type a question and get the answer. The project will be great for your resume as well as for your learning, so let's begin.
Let's check the project requirements. codebasics.io is a platform which has multiple courses, and before a person buys a course, or even after buying it, they might have some questions. For that they send an email; we have a specific email address where they can send inquiries such as what is the refund policy, what is the validity of the course, etc. Even after enrolling in a course they will have different types of questions. These emails that I'm showing you are real emails. We have more than 20,000 paid students, and if you include free courses we have more than 180,000 students, so this is a real company and a real dataset. We also have a Discord server for our courses where people can ask questions on a variety of topics. For example, I'm showing you this Excel chat where people ask questions related to the Excel course, and there is a Power BI channel where they ask questions related to Power BI. We also have a staff; these photos are pictures of my team members, Shani nain Kiran, etc., and they help with these user queries. Many times they get repetitive questions, so for that we have created this Excel file where there is a question and then there is an answer. If the question comes again, they simply copy-paste the response, because they don't want to type it every time. Okay, so this makes sense,
but the problem that we are facing right now is that there's a lot of manual work involved. These questions get repeated again and again, and even for these people to go to that Excel file, locate the question, and copy-paste is a lot of work. So we want to build a solution where our website can have this kind of Ask Question button; upon clicking it, this kind of interface opens where a person can type in a question and it will give an answer. Let's say: do you have a JavaScript course? No, we don't have it, so it will say we don't have a JavaScript course. Or: I have a Mac computer, can I use Power BI? So it will be a smart system which will just answer these questions.
Now let's look at some of the existing questions that we have in our CSV file. For example, we have these two questions, and if a person has a single question which includes both of them, such as 'do you guys provide internship and do you offer EMI payments', then our system will be smart: it will look into this CSV file, pull both answers, and answer in a single line in a very coherent fashion. See: yes, we provide internship, and we don't offer EMI payments. Let's look at this one. These are another two questions that I have picked, and a student can ask 'what is course validity and how do I get doubt clearing'. Now look at this answer. It is not a simple concatenation of the two individual answers that we have in our CSV file; it is a very coherent answer, as if some human is typing it. You can pause this video and read it. So this is a pretty good, pretty intelligent system, and if you have a question which is not present in the CSV file, the system will know that and will just say 'I don't know'. We will tune our LLM by telling it that if it doesn't find the answer in the CSV file, it should just say 'I don't know'.
All right, let's look at the technical architecture now. If you're working on any LLM
project, the concept that you must know before you begin is the vector database, for which I have created a simple six-minute video where I explain vector databases in very easy language. So pause this video right now and go watch that video, because it will be very useful in this particular project. I also created another end-to-end project with a lot of details on LangChain. If you have time (I know this one is a long video), in that video I have covered all the LangChain concepts pretty much in detail. In this project I will probably not go into too much detail explaining those concepts, so if you can watch that, it will be useful as well.
Now, coming back to our project architecture, we have these question-answer pairs. The first thing we need to do is create embeddings for them. If you don't know what text embeddings or sentence embeddings are, again, go to YouTube and type 'codebasics embedding'; you'll find I think three or four videos where I have explained embeddings in very easy language. An embedding is nothing but a numeric representation of our text. We then store these embeddings in a vector database. After that, a person can come to our Q&A system and ask a question such as 'what is course validity and how do I get doubt clearing'. We first convert this question into an embedding, so we create a numeric representation of the question. I have not shown it here in this presentation, but just assume that we have converted this question into another vector. Then we try to look for similar vectors. The vector database will have many vectors related to the question-answer pairs in our CSV file, and we are trying to find similar-looking question-answer pairs: for my given question, what could be my possible answers? That's what we look for. Then we will find those similar embeddings, and let's say for this question these two question-answer pairs are relevant; therefore, see, it
pulled those vectors, see, 6.77, 0.23, it pulled those vectors, and we'll convert these vectors back into the original sentences. Then we can form a prompt. We can say: I have some question, answer it based on the text chunk below, and the text chunk will be the possible answers that we have in our CSV file. When we give this prompt to our LLM, the LLM will produce a coherent, nice-looking, human-readable answer.
Now for this project, to load the CSV file we will use the CSVLoader class of LangChain. For embeddings we are going to use Hugging Face, specifically instructor embeddings. FAISS is something that we even used in a past project for the vector database; you can also use Chroma, Pinecone, a variety of vector databases. Then the RetrievalQA class of LangChain is what will be used to pull the similar-looking embedding vectors and to form a prompt, and for the LLM we are using Google PaLM. For LangChain basics I have a LangChain crash course where I covered all the basics, so if you are totally new to LangChain, I would say pause this video and watch that crash course first. Then, if you want to go into detail and implement a project in depth, this project will be perfect.
Now let's look at Google PaLM. What is Google PaLM? You have all used ChatGPT. Behind ChatGPT there is a model called GPT-4, so GPT-4 is the LLM behind ChatGPT, and the company which built GPT-4, and obviously ChatGPT, is OpenAI. Similar to OpenAI and GPT-4, there are two other popular LLMs in the market: Llama by Meta and PaLM by Google, and both of these are free. With OpenAI, see, ChatGPT is free, but if you want to build an application and call the OpenAI API to access GPT-4, that's a paid API. Google PaLM is very similar to OpenAI in that you will have an API key and you will call some server, and it will be free. Meta's model is, I would say, a little complicated because you have to download a big model. You have to download a huge model either in your cloud
infrastructure or locally on your computer, and then you have to do inference. Therefore we are using PaLM, because it's easy to use. So let's generate an API key for the Google PaLM model.
Now we'll set up the API key for the Google PaLM model. Using your Google account credentials, log in to makersuite.google.com and go to Get API key. Here you can create an API key in your existing Google Cloud project or you can pick a new project. So I clicked here, and let me just get a new API key. When you do that, you will get this API key generated, which you can simply copy, and then we will use it in our Jupyter notebook. But before we go there, you can take a look at some of the sandbox prompts that they have, for example the text prompt. Here you can type anything and then just say Run. Let's say I want to summarize this paragraph: I'll run it and it will show me this particular output. This is similar to what you get when you're using Bard. Bard, if you know it, is similar to ChatGPT; it is Google's equivalent of ChatGPT, and it is powered by the Google PaLM model. So let's say I want to request a refund and want to write an email; Bard will produce a certain output, and you can test a similar thing using this test pad. Here it is using the text-bison model; in the future they may create new models. You can just play with various things here.
By the way, when you're creating an API key, you will get a link to the API quickstart guide. There you can call the Google PaLM model using Google's own Python module, which is called google-generativeai, which you can install using pip install, and you can directly use palm.generate_text and it will work. So this is Google's native API, but we are not going to use this module; we are going to use LangChain. So go ahead and install LangChain if you have not installed it already. If you want to install all the modules which are required for this project, you can open this requirements.txt file. See, this file has all the modules which we will be using in our project, and you can go to this folder and simply run pip install -r requirements.txt and it will install all the modules. By the way, I'm giving you all this code; check the video description and you'll be able to download all of it. It has this FAQ text file.
From this place, I'm now going to import Google PaLM. You'll say from langchain.llms import GooglePalm, and then the llm object is equal to GooglePalm, and here you need to supply the Google API key. This API key is the one I created. You can create multiple API keys, by the way; you can create an API key per project, per team, per organization, things like that. So the API key goes here, and there is a temperature parameter which controls the creativity. It is between 0 and 1: if I say, let's say, 0.9, it will be very creative; if I say 0.1, it will be a little less creative. I don't know what parameters to supply, so I'll just give something to it, and then I will give a sample prompt. What do you want to give? All right, I love this food, samosa, so I will ask it to write a poem about my love of samosa, and then I will print that poem. See, it's a generative model, so it will give this particular output. You can also write an email requesting a refund; you can play with it. It is as good as typing a prompt in Google Bard.
Now that our LLM is set up, let's import that CSV file into our Jupyter notebook. You can import CSVLoader from langchain.document_loaders. I'm just going to import the CSVLoader class; if you hit Tab it will autocomplete. Then I will create an object of this class, and here you can provide the file path. Our file name is codebasics, I think our file name is this. You can also specify the source column, so when you are doing retrieval later on, if you want to know which source it was referring to, you can use this particular column. If you look at our CSV, I'm calling the question a prompt. I mean, you can call it question too; it's just a column name. Whatever the column name is, I will provide that here, and it is called prompt. This will be my loader object, and I will say loader.load, it is very intuitive, and that will go to data. Ctrl+Enter, and the data is loaded. You can of course see the data; it's simple. See, each question-answer pair is one document object, which has page content and metadata where you are providing the source column.
So now let me move on to the next step. If you look at our architecture, the next step is to create embeddings once your data is loaded. All this CSV data is loaded, so let's create embeddings now. There are a hundred different ways of creating embeddings. You can use OpenAI embeddings, which are the best ones in the market, but you have to pay money for them. Then you can use Google PaLM embeddings. You can Google search 'LangChain embeddings', and if you go to the documentation and click on Integrations, these are all the embeddings you have available: Bedrock, Embaas, so many, you know, Hugging Face instructor embeddings. My friend works for a company as a data scientist, and they use instructor embeddings because they're really good. If you do a Google search on instructor embedding you'll find their page, where they have mentioned their methodology, different benchmarks, and such. We are going to use instructor embeddings here because I tried a few and I found
the instructor embeddings' performance to be pretty good. So I will import that from LangChain again: see, from langchain.embeddings, HuggingFaceInstructEmbeddings, and you can create the embeddings object. Let me just create a sample embedding and test it out. You can also give a model name, by the way. The model that we're giving is the large one; I think if you don't give it, it uses a default one. If you do Shift+Tab, yeah, if you don't give it, it's a default one anyway, so it's okay if you don't give it. Then on the embeddings object you can say embed_query. So let's say you have a question, 'what is your refund policy for your courses'; it will create an embedding for that and store it in this e object. You can hit Ctrl+Enter, and if you look at e, by the way, it is simply a list, and if you look at the length of the list, it is 768. If you view these numbers they won't make sense; it's just a bunch of random-looking numbers, and you can't make out anything when you're looking at them. But they're pretty powerful: they represent the meaning of this sentence very accurately. If you want to try a few sentences and compare them using cosine similarity, you can use this documentation. I just searched 'Hugging Face instructor embedding' and I landed on this page, and here, see, you can have sentence A and sentence B and you can find the cosine similarity. If the meanings of the two sentences are similar, the cosine similarity will be close to one, and if they're unrelated, the cosine similarity will be close to zero. I think that's the range it has. You can try it out; I'm not going to worry about it. I'm going to use the default embedding, even the default query instruction; I don't want to give any specific one. The way instructor embedding works, by the way, is that you give an instruction for the embedding. If you look at this example, you have a sentence that you want to create an embedding for, and you give an instruction such as 'represent this sentence as a science
title', and that is the power of instructor embedding: you can give different types of instructions and it will represent the text in a different way. And folks, if you are still confused about what an embedding is, go to YouTube and search 'codebasics embedding'; you will find a very intuitive video. So make sure your concept of embeddings is clear.
Now I have embeddings. My next step is to create the vector database, and I'm going to use FAISS. You can use Chroma or other databases as well; by the way, if you want to use Chroma, you will just say this. But I'm using FAISS, so I will say FAISS.from_documents. When you do that, you need to specify the documents, which is my docs, you know, this guy, and then I need to provide the embedding, which is my embedding object. Well, we'll call it instructor embeddings so that it's kind of clear. And that will be my vector database. It's going to take time, so go for a coffee break; it will come back after some time. Once the vector database is created, you can save it; I think there is a persist method, or save local, there is a method called save_local. But we don't want to save it; we'll just use it in memory.
Next you need to get a retriever object. What is a retriever object? Let me explain. I'm creating this retriever object from the vector database, and the job of this object is: whenever you have a new question, see, we create an embedding of that question and then we pull the similar-looking vectors from the vector database. See this; the retriever object is doing this process. It takes an input question, compares its embedding with the embeddings which are stored in the vector database, and returns those embeddings, and eventually, actually, these sentences. So let's see how that works. If I look at the documentation of the retriever object (let's see, where is the documentation?), yes, see, you can use this method, get_relevant_documents. So I'm
going to use that. I will say retriever.get_relevant_documents, and I will ask, 'for how long is this course valid?' Sometimes people have this question: what is the validity, one year, lifetime, what is it? I will store the result in rdocs; these are the relevant docs. And look at this: 'once purchased, this course is available for lifetime'. It is so beautiful. If you look at the exact words, they are very different, but this is not searching based on the words; it is searching based on the meaning of those words. As a human, if I'm looking at 'how long is this course valid', I will naturally go to my CSV file and pull the answer to the question 'is this available for lifetime'. You can type in a different question, by the way, folks. Go to the CSV file and just make up some question, for example about job assistance. So here: 'how about job placement support?' Word-wise it's very different, and see: 'do you provide any job assistance'. I said job placement support and it found 'do you provide any job assistance'; it's pulling the relevant question which I have. It is pretty powerful.
I have imported this RetrievalQA class, and that class is going to perform the last step: you have the relevant documents, now you will form the prompt and pass it to the LLM. So let's create an object of this class. The first parameter is obviously llm, and we have already created the llm object, see, llm, so I'm going to pass that. Then there are a bunch of other parameters which I'm going to pass; I'll just copy-paste to save time and explain what they are. The first one is chain type, which can be stuff or, let's say, map reduce. In that other video which I have, for RockyBot, I have given a detailed explanation, so if you're interested you can go watch it. But there are two types of chain, basically, stuff and map reduce, and we're going to use the stuff one. Retriever
is something we already looked at; it is this particular object. Then there is the input key, query; I will explain what that is. And return source documents means that when you get an answer, you also want to return the source documents from the CSV file which were relevant to this answer. And here you will say chain; I explained chains in the LangChain crash course, so once again, if you have not seen it, go ahead and watch that one. Now here you have RetrievalQA. Let's see; okay, there was some typo, I think, and now we are ready to ask a question.
The first question I'm going to ask is actually two questions which I'll ask in one shot: 'do you have internship, and do you provide EMI payments?' And see, it answered correctly: yes, we provide virtual internship, and no, we don't offer EMI payments. If you look at our CSV file right here, see, these are the two questions from which it pulled that answer. It is also showing the source documents, see: 'do we provide virtual internship', 'do you have job assistance', 'do you provide virtual internship', and 'do you provide job [unclear]'. So it will pull the relevant ones and it will answer the questions. You can ask, okay, 'should I learn Power BI or Tableau', or, let's say, any question. So let me ask this particular question; again, I have it ready, I'm just going to copy-paste: 'should I learn Power BI or Tableau?' Now if you look at our CSV file, for Tableau our answer is a little different, actually. See, let's say, Ctrl+F; I actually want it to give this particular answer, and it looks like it generated an answer on its own, based on its own knowledge. So I want to tell it: look at my CSV file, and if the answer is present there, pull the answer from that; don't try to make things up.
The other problem that you will see with this is the concept of hallucination, and it is already hallucinating. See: 'do you have a JavaScript course?' Now this question is not present in the CSV file at all, and it is saying, 'no, we
do not have'. Well, how can you say that? We don't know. The appropriate answer would be 'I don't know', because the CSV file which I provided doesn't have any mention of a JavaScript course; maybe I have it, maybe I don't. And I have seen that if you execute again, sometimes it says 'yes, we have a JavaScript course'. See, now it is saying something else. So it just keeps on saying random things, which is bad, and I want to control that behavior. It is called hallucination: the model will try to make things up on its own. I want to tell it: don't try to make things up; use the CSV file, and if you don't find the answer there, just say 'I don't know'.
In order to do that you have to create something called a prompt template, and I covered prompt templates in detail in that other video. You are essentially giving a prompt with a specific instruction: given the following context, generate an answer based on the context only, and if the answer is not found, kindly state 'I don't know'; do not try to make things up. And what is the context and what is the question? Well, it is this. See, in this diagram (this diagram is very useful), this is the question and this is the context. The relevant documents that you are pulling are the context, and this particular thing is the question. You provide this prompt template text to a class called PromptTemplate and you will get this particular prompt. Then here you will provide one extra argument called chain, whatever, and there you will provide this prompt. I don't want to remember the syntax, so I just copied it.
And now when you say 'do you have JavaScript course'... okay, I think I have seen, by the way, that this Google PaLM model sometimes just doesn't perform really well. Okay, now it is performing well: 'I don't know'. Overall, OpenAI models are much better, but Google PaLM is free and maybe they will improve it over a period of time, and the majority of the time this still works. All right, so see, 'I don't
know'. You can say 'do you know double Sage'; it should say 'I don't know'. See, it's not hallucinating. You can say 'do you have a plan of launching blockchain course'; I'm asking questions which are not present in that CSV file. See, sometimes, again, this is hallucinating: 'it's in future, I don't know'. Okay, I mean, it's doing sentence completion; you see how a language model works? That's why you see this question mark here. If I move the question mark here, it will just say 'I don't know'. Okay, I think it's making sense now. Just try different sentences, folks. I'm going to provide this notebook, and you will see that it will try to pull the answer from the CSV file. As I said, since this model is not as great as OpenAI's, sometimes it will not perform well, but the majority of the time it does a pretty good job.
So far we wrote the experimental code in the notebook. Now we are going to put a proper project structure in place and prepare ourselves for productionizing this thing, and we'll also write the Streamlit UI code. I created this directory called codebasics Q&A where I have kept the CSV file, the Jupyter notebook, and requirements.txt. I'm now going to launch PyCharm, whose Community Edition is free; it is basically a Python editor by JetBrains. When you launch it you will get an option to open a folder. Right now I already have another project, so it is showing me this. So I will go to Open, and from the C code directory I will open the folder codebasics Q&A. I will say okay, New Window, and it is asking me to configure the virtual environment. I'll say I don't want a virtual environment. By the way, if you want a virtual environment you can use that as well, but I'm just going to use the plain default environment here. Here you can see the requirements.txt file (we have installed everything in it), the CSV file, and the Jupyter notebook.
Now I'm going to create a main Python file. I will just call it main.py and we'll write our code here. So now we are starting the business of copy-pasting. We have our notebook ready; let's see, this is our notebook, correct? We will just copy-paste things one by one. The first step is this one, right? And you are seeing this error, by the way, so let me configure the Python interpreter. I have 3.10, so I have configured that.
The API key is something you don't keep in your main code; you keep it in your environment file. That is standard industry practice. The practice is: you create a new file, you call it .env, and you create this kind of variable, GOOGLE_API_KEY, and you keep your API key there. To load this environment variable you use a module called dotenv. So you import load_dotenv and you call this function. What this does is go to the .env file, read all the key-value pairs, and set them as operating system environment variables so that you can access them later on. Let me show you: now when I say os.environ with this key, it will actually get that particular key, and this is the key we can set here directly so that you don't have to hardcode it here. I hope you understood what I just did; this is the proper way of doing it. I will keep the temperature variable at 0.1, or maybe zero, because I don't want this thing to be very creative and produce its own answers. It should do whatever we are asking it to do, basically read the CSV file and provide the answers.
What is the next step? The next step is to create the embeddings and the vector database, so let me copy-paste that part here. Now, I think we missed this one, loading the CSV file. And I will move all of these imports to the very beginning. Okay, this looks good: instructor embeddings, docs. Okay, what is docs? It is not docs, it's data; or maybe docs, we can call it docs. And folks, what I will do is not keep the vector DB in memory, because you already saw that creating the vector database, which means executing this cell, takes around one minute. I don't want to execute it every time I launch my Streamlit UI; therefore I will save this vector database to disk, to a file. For that reason I need to create a function called create_vector_db. The goal of this function will be to create a vector database and serialize it to disk, so that the database is stored in the file system. So the vector database creation goes here; I think this loader is also something we can move here. The instructor embeddings I will keep outside, because I will need to use them in a different function as well. You can specify a model name, or I think it's okay if you don't specify one. Once the vector database is created you can save it to a file: there is a function called save_local which will save it locally, and that file will be faiss_index; it will actually create a directory. Later on I'll have to use this file
path, so I will just say file path, or database path. So the vector DB file path is this, and I just use that here. This will create the vector database and save it to disk. You can test this, by the way: if __name__ == "__main__", then let's test this much first. So right-click and run. It is going to take a while, by the way, because we know that creating this vector database is a time-consuming process, so go for a coffee break, come back, and hopefully it will have been created. All right, we see a directory called faiss_index here; see, this is basically my vector database. So my vector database is created.
Now I can write the rest of the code, and for that I will give a function name of get_qa_chain, get question and answer chain. Here, once again, it is copy-paste business, so let me copy-paste whatever code we wrote from the notebook. See, here you are loading the vector database from that file. This line is new: we are calling FAISS.load_local, and using that vector database file path we are loading it. The second argument that it expects is the embeddings that you created, and that is the reason, now you understand, why the embeddings were created outside. Then retriever, prompt template, this is all the same. We need to import PromptTemplate, by the way, so let me import it here, and RetrievalQA also; there was a red line showing the error. Then you just return the chain. So what happens now is that when you call get_qa_chain you get that chain, and to that chain you can ask any question, such as 'do you provide internship, do you have EMI option', correct? And you can print that on the console, and you can test this whole thing out by right-clicking and running it. The benefit of creating the vector database in advance is that you don't have to create it every time, because it's a time-consuming process, right? So the vector database is something that you do
one time, and then for question answering you just use it. Whenever my CSV file gets updated, that is when I will create the database again. You can see that it is providing an answer: yes, we provide virtual internship; no, we don't have an EMI option. You can try different questions; it's working.
Now is the time to create the Streamlit UI. I could write my Streamlit UI code right here, but I like to separate UI code from the rest of the code. For that reason I will create a new file called langchain_helper, and I will move all of this code, see, Ctrl+A, Ctrl+X, Ctrl+V, all of this code goes into langchain_helper. And here comes the UI code. For the UI I have already installed Streamlit, so you can import it: import streamlit as st. Then st.title; what is the title? Okay, Codebasics Q&A, and I will include a little icon, so I will add a plant. Then there will be a text input for the question that a person will be asking; you can take that question here, and when they hit Enter the flow will come here. Below that Codebasics Q&A, you want to have a button which says Create Knowledgebase. This is basically creating your vector database. Let me just run it and show it to you first so that you get an idea. If someone presses that button, something happens; we'll look into that later, but let's create the skeleton code at least. You can run that code from here by saying streamlit run main.py, and it will launch the UI in a browser. See, it looks beautiful. Now when you click the button or type in any question, nothing happens, because this is just a skeleton.
Now, this Create Knowledgebase button should be under some kind of admin privilege, under some kind of role-based privilege system, so that only the admin or the data scientist has access to it. Normal website users will not have access to this. When
they click on that Ask Question button on the website, there will be a new button with the text Ask Question, and when they click it, it can show this UI, but this Create Knowledgebase button should not be available to normal users. They should just be able to ask a question. This button is for when a data scientist wants to create a new vector database. Let's say we get new questions and answers; now we need to recreate the database for the new questions in our CSV file, right? That is when this button will be useful.
Now from langchain_helper I'm going to import both of the functions that we wrote, create_vector_db and get_qa_chain. Let us test the get_qa_chain function. I will say chain is equal to get_qa_chain, and then to this chain we will ask a question and get some response, and the response's result will be the actual answer. So I will say st.header; basically this is the answer you're getting, and then you can say st.write. This is just the syntax of Streamlit; this is how you put an answer below the question. One good thing about Streamlit is that you don't have to rerun that command; you can just click on Rerun here and it will rerun. It shows Running, which means it's going to take some time to load. Now let me put in a question. Hooray, it is ready and working. Let's try a few other questions. See, it's really working now and producing all the answers.
Folks, we are done with our coding. All of this code is available, by the way; check the video description, and you can download everything and practice it. This particular project is going to look very good on your resume if you want to build a career in generative AI and NLP and want to become a data scientist. We put a lot of effort into creating this video and project, so if you like it you can give it a thumbs up and share it with your friends; that is how you can pay back for our work. Also remember that this entire project is based on a real industry use case. codebasics.io is a real e-learning company. We have 180,000 students including free and paid courses; for paid courses we have around 20,000 students. The website gets a lot of traffic, and our Discord community has, like, 40,000 people, so it's a bustling community, and the problem that we tried to solve is a real problem. In the future I will take this code and maybe integrate it into my website; for now it is created just for your learning purposes. But remember once again, it's a real project. If you have any questions, post in the comment box below. Thank you very much for watching.
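The retrieve-then-prompt flow described in the video (embed the question, find the most similar FAQ rows, paste them into a prompt that tells the model to say "I don't know") can be sketched with nothing but the Python standard library. This is only an illustration under loud assumptions: the FAQ rows are made-up stand-ins for the CSV, embed() is a toy word-count vector rather than the instructor embeddings used in the video (so it matches on shared words, not on meaning), and the names retrieve, build_prompt, k, and threshold are hypothetical. No LLM is called; the sketch stops at the prompt text that would be sent to one.

```python
import math
from collections import Counter

# Made-up FAQ rows standing in for the CSV's prompt/response columns (assumption).
FAQ = [
    ("Do you provide any virtual internship?", "Yes, we provide a virtual internship."),
    ("Do you offer EMI payments?", "No, we do not offer EMI payments."),
    ("For how long is this course valid?", "Once purchased, the course is available for lifetime access."),
]

def embed(text):
    """Toy word-count 'embedding' (assumption; the video uses instructor embeddings)."""
    words = "".join(c.lower() if c.isalnum() else " " for c in text).split()
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=2, threshold=0.1):
    """Return up to k (question, answer) rows whose similarity clears the threshold."""
    q = embed(question)
    scored = sorted(((cosine(q, embed(p)), p, r) for p, r in FAQ), reverse=True)
    return [(p, r) for score, p, r in scored[:k] if score >= threshold]

# Instruction wording paraphrased from the video's prompt template.
PROMPT_TEMPLATE = """Given the following context and a question, generate an answer based on this context only.
If the answer is not found in the context, kindly state "I don't know." Do not try to make up an answer.

CONTEXT: {context}

QUESTION: {question}"""

def build_prompt(question):
    """Stuff the retrieved rows into the template as context for the LLM."""
    context = "\n".join(f"{p} {r}" for p, r in retrieve(question))
    return PROMPT_TEMPLATE.format(context=context, question=question)

if __name__ == "__main__":
    print(build_prompt("Do you offer EMI payments?"))
```

Because the toy embedding only counts shared words, a question like "how about job placement support" would not find a row about "job assistance" as reliably as it does in the video; that gap is exactly what a real sentence-embedding model is there to close.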
Info
Channel: codebasics
Views: 28,098
Keywords: yt:cc=on, llm, llm project, end to end projects, langchain tutorial, end to end project, lang chain, langchain, streamlit, llm project ideas, google palm, chatgpt, PALM, streamlit project, QnA system, question answer system, QnA sytem using LLM
Id: AjQPRomyd-k
Length: 44min 0sec (2640 seconds)
Published: Fri Oct 27 2023