Falcon-7B-Instruct LLM with LangChain Tutorial

Captions
One of the most powerful open-source models available today is Falcon, so in this video we are going to combine Falcon and LangChain. LangChain helps you do a lot of things with LLMs, but mostly people have been using it with proprietary LLMs like OpenAI's, so what I am trying to do here is show you how to use LangChain with one of the most powerful open-source models, Falcon, to do basic text generation. We are not going to cover embeddings or chat-with-PDF in this video; we are just going to look at a simple LangChain connection to the open-source LLM Falcon and do some text generation using LangChain prompt templates. Later on, we can develop this basic code into something like chat-with-PDF.

To start with, one of the happiest pieces of news is that this entire code runs on a free Google Colab notebook. As you can see, almost the entire GPU memory is used, and one of the reasons I am showing you this is that, thankfully, it works on free Google Colab, so you don't have to spin up a more powerful GPU. A T4 machine with 16 GB of GPU RAM is quite enough for the Falcon 7-billion-parameter model. I am not going to get into the details of the Falcon model itself; we have already made two videos, one explaining the Falcon announcement and another on how to use the Falcon 7B model in a Google Colab notebook, so I would strongly encourage you to check out both. This is going to be a purely hands-on coding tutorial about using Falcon with LangChain.

First, this is a GPU environment: go to Runtime, click "Change runtime type", and you can see we are on Python 3 with a GPU hardware accelerator on a T4 machine. If you have Colab Pro you can select more powerful GPUs, but for our use case a T4 is completely fine. Now do a pip install: transformers, of course, to get the models from Hugging Face; einops, which is a dependency; accelerate, which speeds things up and also helps manage resources so that some GPU memory and some CPU memory get used, among other advantages; langchain, which needs no introduction; and bitsandbytes, which I was exploring in order to load the int8 (quantized) version of Falcon, but which you do not need for this particular tutorial.

After the installation, the next step is to look at the GPU configuration: this is a Tesla T4 with approximately 15 GB of GPU RAM, which is what we are going to use; it is a decent enough GPU, and it comes from the free Google Colab tier. Next we import the libraries we need: from LangChain, HuggingFacePipeline, because we are going to use a pipeline; from Transformers, AutoTokenizer and pipeline; and torch, which is not strictly required here except for specifying the data type. These are the only three imports required; we are not going to do anything else with PyTorch. Once again: LangChain is where we make the LLM connection to build AI applications, Transformers is the Hugging Face library that connects us to the open-source models on the Hugging Face Model Hub, and pipeline is the easiest way to build any NLP task with Hugging Face Transformers.
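A minimal sketch of these setup cells, assuming a Colab notebook and the LangChain import paths as they were in mid-2023 (newer LangChain releases have since moved HuggingFacePipeline into langchain_community.llms):

```python
# Run these in Colab cells; the leading "!" executes a shell command
!pip install -q transformers einops accelerate langchain bitsandbytes

# Confirm the GPU configuration (free Colab should show a Tesla T4 with ~15 GB)
!nvidia-smi

# The three imports used in this tutorial
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline
import torch  # only used here to specify the torch dtype
```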
Now specify the model. I have also noted the 40-billion-parameter model's link here, but right now I am on a very small GPU with limited memory, which is why I am using the 7-billion instruct model. If you have a powerful enough GPU, like an A100 or bigger, you can try the 40B model, but for now we will stick with the 7-billion-parameter model because that is what works in the free Google Colab environment. The model is specified as a simple string, nothing more, but from that string we first get the tokenizer and then build a pipeline.

The pipeline has certain mandatory parameters. You have to specify the task: do you want to do text classification, text generation, or, say, audio classification? All of that goes into the pipeline call. Then there is the model, which we have already specified, and the tokenizer, which we have already downloaded. You also specify the torch data type if you want to run in, say, fp16. Because this model is not yet part of core Transformers, you need to enable trust_remote_code=True, and device_map="auto", which comes from accelerate, tries to match and manage memory between the CPU and the GPU; as you can see, we have about 12 GB of CPU RAM and 15 GB of GPU RAM, and device_map="auto" helps balance them. Then there are settings like the output token limit, which you can give here or in another place that I will show you shortly. (You must be kidding me, the runtime just disconnected; but for this tutorial I don't have to rerun any code, so this is completely fine.) Running this cell takes a bit of time because it downloads the model.

After that is done, we specify the LLM for LangChain. That comes from HuggingFacePipeline, which we just imported from LangChain, and into it we pass the entire pipeline we just created, along with model_kwargs, which is where you give model-related parameters like temperature=0; max_length, top_k, and the other model parameters can also go here. So you can give these parameters in model_kwargs, but you can also give them in the pipeline itself. If you ask me what the advantage is: model_kwargs is more abstracted and easier to play with, whereas anything given in the pipeline call is part of the model-loading process, so any change means redoing the whole download again, which is not very healthy. That is why I set temperature=0 here; a temperature of 0 just makes sure the model doesn't hallucinate much.
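A sketch of the model loading and the LangChain wrapper described above, assuming the tiiuae/falcon-7b-instruct repo id and an illustrative max_length of 200 (the exact generation settings in the video's notebook may differ):

```python
# The model is just a string: the Hugging Face repo id.
# Swap in "tiiuae/falcon-40b-instruct" if you have an A100-class GPU.
model = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)

falcon_pipeline = pipeline(
    "text-generation",           # the mandatory task argument
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # half precision so the 7B model fits the T4
    trust_remote_code=True,      # Falcon was not yet part of core Transformers
    device_map="auto",           # accelerate balances weights across GPU/CPU RAM
    max_length=200,              # output token limit (illustrative value)
)

# Wrap the pipeline as a LangChain LLM; temperature 0 keeps output deterministic
llm = HuggingFacePipeline(pipeline=falcon_pipeline, model_kwargs={"temperature": 0})
```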
Now for the main thing: from LangChain, import PromptTemplate. One option is to call the LLM directly and just give it a question, for example "What is the capital of India?"; you can do that, but one of the reasons people use frameworks like LangChain is to go beyond that, so that is not what I am covering. Instead, from LangChain import PromptTemplate and LLMChain. The prompt template is where you set the context and the template for how the large language model should handle this particular case. For example, I have written: "You are an intelligent chatbot. Help the following question with brilliant answers," followed by the question and the answer slots. You could also phrase it as "User:" and "Assistant:"; you can play with these things, but the point is that the prompt template is where you control how the input is given and how the output comes back. Once you have the template, which is just a simple docstring (not even an f-string), you build the prompt from the PromptTemplate, passing the template itself and the input variable; the input variable here is "question", and that is what gets fed in through the prompt. Then LLMChain is where you define the chain: the prompt is the prompt and the LLM is the LLM; typically people use an OpenAI LLM, but we are using Falcon here, so the Falcon 7B Instruct model is the LLM. A sketch of this setup appears below.

Now you can ask the question. The question here is "Explain what is artificial intelligence as a nursery rhyme", and we print the output. Let me read the response: "AI is like the sun that shines, bringing happiness and knowledge to everyone. It helps us in many ways, from finding answers to complex questions. AI can do amazing things that we can't even imagine. AI is like a powerful tool that helps us solve any riddle. It can even predict the future based on current situations and trends. AI is here to stay and will be an essential element in our lives, providing a better tomorrow and an even brighter future." You can see it ends by echoing the user prompt, because we gave it the context that it is an intelligent chatbot, which is why it behaves like this. That is the entire gist of how you connect the Falcon 7B Instruct model to LangChain using HuggingFacePipeline. The whole thing runs on free Colab, and you can keep changing the question; you can have multiple questions, and LangChain helps you deal with the different answers. As I said, we are using LangChain primarily because we want to play with a lot of different things and connect with a lot of different agents, so use the prompt template and see which templates get the best out of this LLM; let me know in the comment section if you find anything interesting with the prompt template itself.
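A minimal sketch of the prompt template and chain described above, assuming the llm wrapper built earlier and the LLMChain API as it existed in LangChain at the time (newer releases favor the prompt | llm runnable syntax):

```python
from langchain import PromptTemplate, LLMChain

# A plain triple-quoted string; {question} is filled in by the chain at run time
template = """You are an intelligent chatbot. Help the following question with brilliant answers.
Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["question"])

# Wire the prompt and the Falcon LLM together
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "Explain what is artificial intelligence as a nursery rhyme"
print(llm_chain.run(question))
```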
Otherwise, this code is completely open; I am going to share the Google Colab notebook in the YouTube description. All you have to do is open the description just below the like button, click the notebook, connect the runtime first, and then select "Run all"; that will run everything, and you don't have to touch anything. After everything has run, come back and play with the prompt template; just make sure you don't delete the {question} placeholder, because that is where the question gets fed in, and then play with the question itself. That should let you do whatever you want with the Falcon 7B model and LangChain. My next tutorial will most likely be building a Q&A Gradio application using LangChain, but meanwhile, if you have any other requests, feel free to let me know in the comment section. Share this video with your friends if you liked it, and let me know in the comments; otherwise, I hope this was helpful to you. Peace, happy birthday!
Info
Channel: 1littlecoder
Views: 23,329
Keywords: ai, machine learning, artificial intelligence
Id: mAoNANPOsd0
Length: 10min 45sec (645 seconds)
Published: Mon Jun 05 2023