Question Answer ChatBot using LLama-Index & LangChain with Hugging Face Models - Part 1

Video Statistics and Information

Captions
Hey, what's up, coders! Welcome to 1littlecoder. Since I published the first tutorial on how to use LangChain and GPT Index to create your own Q&A bot, a lot of people have been asking whether they can do it using only open-source models: your organization might have restrictions, or you may not want to pay OpenAI to create the embeddings, build the index, or run the inference model. So in this video I am going to show you how to create your own Q&A bot using only open-source technology. We will only use models hosted on Hugging Face, which gives you the flexibility to run the model wherever you want and keep everything completely local without sending the data anywhere. It does require a GPU, but if you have your own GPU you can run this locally without sending data to a cloud provider or an API.

I'll first show you a demo of how it looks, and then take you through it step by step. This is very bare-minimum code: no UI, not a lot of sophistication. But I wanted to get it out so that you can build the applications you want on top of it, instead of waiting for me to ship the best version. If there is interest, I might build a Gradio UI on top of this; let me know in the comments if that is what you want. Today we use an open-source large language model, in this case Flan from Google, to build our own Q&A bot completely with open-source technology.

Let me quickly show you the demo first. I've got this document from EleutherAI: they recently announced that they are becoming (or already are) a non-profit, and I took that entire blog post and gave it to the bot I'm building, so now I can ask questions about it. For now, ignore this error. I ask, "What is this document about?" and it says it's about EleutherAI. "Who wrote this document?" Multiple people wrote it, and I got one name back, Connor, who is indeed one of the authors. It doesn't give me all the names, but it works.

We are going to learn how to build this bot inside a Google Colab notebook using open-source models. Which model are we using? Flan-T5, honestly one of the most underrated large language models currently available. We are going to use Flan-T5 base, but you can swap in whatever model you want depending on the RAM you have; the latest Flan model is huge, and you need enough RAM to load it.

First we need a GPU: in your Google Colab notebook, click Runtime, select "Change runtime type", and choose GPU as the hardware accelerator. A GPU is essential because we are running a large language model. After that, run nvidia-smi to check what GPU you got. Yes, we have a Tesla T4 with 16 GB of memory, so we are good.
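For reference, the runtime check is a single Colab cell; the leading "!" runs a shell command from the notebook:

```python
# Colab cell: "!" executes a shell command. The output is a table
# listing the attached GPU (a Tesla T4 with ~16 GB in this video).
!nvidia-smi
```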
Next, we install a bunch of libraries: langchain, gpt_index (GPT Index's new name is LlamaIndex), transformers, and sentence_transformers. As you might have noticed, these are all open-source libraries, and we will also use open-source models for the inference.

Once the required libraries are installed, we need to import a few things. From llama-index (the latest name for GPT Index) we import a directory reader, the LangChain embedding wrapper, a list index or, if you prefer, a simple vector index, and a prompt helper. I am not using the directory reader in this code, because I am not reading from a directory; I am reading from a text string and converting it into a document. So you will see it go unused, but it would be quite helpful if you want to extend this code to a whole folder of files. Then we import Hugging Face embeddings from LangChain: if you saw my previous tutorial, we used OpenAI embeddings there, but here we use Hugging Face embeddings. Next comes LLMPredictor from llama-index (I could have imported it with the rest, but I wanted to keep it separate), then torch, LangChain's LLM base class, and the Hugging Face Transformers pipeline. (The installs and imports are sketched below.)

Once that is in place, we create a simple class. This part of the code I took from a different Google Colab notebook I had created, so you can refer to the original as well. We call the class FlanLLM: it defines the model and uses the Transformers pipeline. If you are familiar with the Transformers ecosystem, you know that pipeline lets you build NLP or ML pipelines quite easily; here we ask for text-to-text generation. One reason we ended up with Flan: models like OPT and GPT-2 are quite good at text generation, but they are not instruct fine-tuned. One reason ChatGPT and text-davinci-003 have been doing so well is that they are not raw language models; they are instruct fine-tuned and further tuned with RLHF (reinforcement learning from human feedback). Flan is an instruct fine-tuned model. You can try the same thing with OPT, GPT-2, or any other large language model focused on text generation, but I went with Flan-T5 base for two reasons: one, I wanted it to fit within Google Colab's RAM; two, I wanted an instruct fine-tuned model.

After you build the pipeline, you define a few basic methods on the class, and then we call LLMPredictor, imported from llama-index, with the model we just created, the FlanLLM class: we are telling the predictor that this is the LLM to use. When you build a Q&A bot, there are two aspects to it.
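Here is a rough sketch of those installs and imports, based on the gpt-index/llama-index API from around the time of this video (early 2023); later releases renamed several of these classes, so treat the exact names as assumptions of that era:

```python
# Colab cells: install the open-source stack (versions are not pinned here).
!pip install langchain gpt_index transformers sentence_transformers

# Imports from llama-index (then still known as gpt_index), plus
# LangChain's LLM base class and the Transformers pipeline.
from llama_index import (
    SimpleDirectoryReader,   # reads a whole directory of files (unused here)
    LangchainEmbedding,      # wraps a LangChain embedding for llama-index
    GPTListIndex,            # the basic list index used in this video
    GPTSimpleVectorIndex,    # the vector-index alternative
    PromptHelper,            # optional prompt sizing/overlap control
    LLMPredictor,            # wraps our custom LLM for querying
    Document,                # builds documents from raw strings
)
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.llms.base import LLM
from transformers import pipeline
import torch
```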
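The custom class looks roughly like this: it subclasses LangChain's LLM base class and routes _call through a text2text-generation pipeline. The hook names follow early-2023 LangChain, and the max_length value is an illustrative assumption, so this is a sketch rather than a copy of the notebook:

```python
class FlanLLM(LLM):
    """Minimal LangChain LLM wrapper around google/flan-t5-base."""

    model_name = "google/flan-t5-base"
    # "text2text-generation" is the pipeline task for T5-style models;
    # device=0 places the model on the Colab GPU.
    pipeline = pipeline("text2text-generation", model=model_name, device=0)

    def _call(self, prompt: str, stop=None) -> str:
        # Run generation and return just the text; llama-index expects a string.
        return self.pipeline(prompt, max_length=512)[0]["generated_text"]

    @property
    def _identifying_params(self):
        return {"model_name": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

# Tell llama-index to use our Flan model on the inference (question) side.
llm_predictor = LLMPredictor(llm=FlanLLM())
```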
One aspect is storage: how do you extract knowledge from the existing text and store it? The other is the question side: when the user asks a question, how do you take it, process it, and respond? The LLM predictor handles the user side: the user asks a question, the large language model processes it, looks for the answer, and returns it. That is where FlanLLM comes in, so this is mostly the inference side.

Then there is the storage aspect: how do you extract knowledge from this text and store it? That is where Hugging Face embeddings come in. I think it internally uses sentence-transformers, which is one reason we installed sentence-transformers. So we create the Hugging Face embeddings and wrap them with the LangChain embedding class. Now the embedding part is defined, and the prediction (inference) part is ready.

Now it is time to give it the input text. Like I said, I literally copy-pasted the text of the blog post here, but there are lots of different ways to load a document: from a PDF, from a text file, from a website, or even like this, as a pure string. I pasted the entire text and called it text1. From that string (or a list of strings) I build something called a Document, and the result goes into documents; this comes from GPT Index / LlamaIndex. If instead you have a directory full of text files, you can uncomment the directory-reader code: you point it at the directory path and build the documents that way. Both approaches give the same result in the end; it is just the medium holding your input text or knowledge base that differs. In my case the knowledge is a string, so I build Documents from it, but with a folder of text files you would uncomment that code and use it.

Next is the prompt helper. I was not going to use it at all, since it was created for a different notebook, but if you want better results this is the place to optimize: you set the number of output tokens, the input size, and the chunk overlap, and the resulting PromptHelper shapes how the prompt should look. It helps you avoid going out of context and keeps a good prompt structure. I did not use it at first, so let me add it here. It is not mandatory, but it helps. (Both the document construction and the prompt helper are sketched below.)
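Putting the storage side together looks roughly like this; text1 stands in for the pasted blog post, and the commented-out variant is the directory-reader alternative mentioned above (names again follow the early-2023 API):

```python
# Embedding model: HuggingFaceEmbeddings defaults to a sentence-transformers
# model; LangchainEmbedding adapts it so llama-index can call it.
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

# Knowledge base as a plain string (placeholder for the pasted blog post).
text1 = """...entire blog post text pasted here..."""
documents = [Document(text1)]

# Alternative: load every file in a folder instead of one string.
# documents = SimpleDirectoryReader("path/to/your/folder").load_data()
```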
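And the prompt helper takes the three sizing parameters mentioned above; the specific numbers below are illustrative placeholders, not tuned values:

```python
# How prompts get packed: total context size, tokens reserved for the
# answer, and overlap between consecutive chunks. Tune for your model.
max_input_size = 512     # Flan-T5 base has a small context, so stay modest
num_output = 256         # tokens reserved for the generated answer
max_chunk_overlap = 20   # overlap between consecutive text chunks

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
```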
Next we build the index. What is the index? We take the knowledge from the existing text, in our case text1, and store it in such a way that it is easy to retrieve later. You can build different types of index; as you can see in my code, I have experimented with GPTSimpleVectorIndex and GPTListIndex. Every index has advantages and disadvantages; GPTListIndex is quite basic, which is why I used it here, but you can also experiment with the simple vector index. The index takes the documents as input, along with the embedding model you are using, the LLM predictor, and the prompt helper if required. Run this. You can also save the index locally, so that next time you skip this whole process and go straight to querying; for now I am not saving it, because this is just a demo. It was throwing a lot of warnings, so I have suppressed them here; not a good thing if you are productionizing this, but fine for a demo.

After the index is built, you call index.query() with whatever question you have, get the response, and print it. Let me ask a question; instead of "Who wrote this document?", let's try "What is this document about?" It is working on the response; last time I did not have the prompt helper, so I do not know if that is why it is taking longer. It shows a warning about using the pipeline sequentially on the GPU, which is fine, and it answers that the document is about the EleutherAI Institute. Let's ask another question: "What is the most important point in this document?" It is going to take some time. When I showed the demo at the start I did not have the prompt helper, so maybe we can disable it and see how that changes the speed and the response; speed matters, because waiting a long time for an answer is a bad experience. Okay, it did not work, so let me disable the prompt helper and ask the same question. I am not sure it will change anything, since we are using the same model. Okay, this time it is good; maybe if you use the prompt helper you need to play with its parameters. It answered the "most important point" question, which is good. Now: "Can you summarize this doc?" It says EleutherAI's second retrospective is out. The summarization is quite bad, but it works. You cannot literally compare this with OpenAI's embeddings, since they are still the leader in this space, but for an open-source model, a lot of the questions I asked seem to do really well.

Now let me grab some random article, copy the text, paste it in, and watch in real time how the whole thing changes. The article is about high-resolution image reconstruction from human brain activity with latent diffusion models. I copy the entire thing (it is a short text for now), go back to my notebook, and paste it; as you can see, I wrapped it in a triple-quoted string because the text contains quotes. I run the same cells again (this one is not required) and rebuild the index. Now: "Can you summarize this document?" It says "we propose a new method based on a diffusion model to reconstruct...", which is a good summarization. "Who are the authors?" Let's see if it can give the author names, or at least one name. It did not. "Who wrote this document?" The type of question you ask matters too; that is again where GPT-3 makes a difference.
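Building and querying the index comes down to a few lines. This again follows the early-2023 constructor-style API (newer llama-index releases replaced it with a service context), and the save/load calls mirror what existed back then, so treat it as a sketch:

```python
# Build the list index from our documents, with our own embedding model
# and LLM predictor; prompt_helper is optional.
index = GPTListIndex(
    documents,
    embed_model=embed_model,
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
)

# Ask a question against the indexed knowledge.
response = index.query("What is this document about?")
print(response)

# Optionally persist the index to skip rebuilding next time.
# index.save_to_disk("index.json")
# index = GPTListIndex.load_from_disk("index.json")
```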
Okay, it keeps saying the same thing. "What model is this using?" Stable Diffusion; that is good. "Where are they testing this?" They are testing it on fMRI. Cool. This is exactly what I wanted to build, completely with open-source technology: the data does not go anywhere in the cloud; it all stays within your own environment. So we have learned how to use large language models hosted on the Hugging Face model hub to build a Q&A bot with GPT Index (LlamaIndex), LangChain, and of course Hugging Face Transformers. I hope this video helped you learn something new in a space that is completely dominated by OpenAI, using the open-source ecosystem for the same task. If you have any questions, let me know in the comments. I am thinking of building a UI, or optimizing this code so you can upload a PDF; if that is something you are interested in, please let me know in the comment section, and that will help me prioritize the next video. The entire code will be linked in the YouTube description; I will put it on my GitHub and link it there, so you can get started right away. I hope this was helpful. See you in another video. Happy prompting!
Info
Channel: 1littlecoder
Views: 36,879
Id: 9TxEQQyv9cE
Length: 15min 46sec (946 seconds)
Published: Wed Mar 08 2023