AI Q&A with Falcon LLM on FREE Google Colab

Video Statistics and Information

Captions
The UAE's Falcon LLM, in 7-billion and 40-billion parameter versions, has recently become completely open source, with no royalties attached. In this video I'm going to use the Falcon-7B-Instruct model from the Hugging Face Model Hub, show you how to run it on a free Google Colab, and finally ask it some questions and get answers.

Before I begin with anything, here's a very quick demo so you get a sense of how long it takes. Let's ask a question: "What is the longest river in the world?" As you can see, it started spinning, which means the question has been submitted and the model is generating an answer. The maximum length of the answer is 200 tokens, and there are a few other parameters. Once inference is finished, we print the answer, and it comes back with the question: "What is the longest river in the world? The longest river is the Yellow River, which stretches over 6,500 miles (10,000 kilometers) through northern China." Is there actually a river like that? I'm not going to verify the fact at this moment, but this is exactly what we're going to do in this video: use the Falcon-7B-Instruct model. That's another thing you need to remember: it's not just the raw large language model; this is a model fine-tuned on chat and instruct datasets, so if you want to build a chatbot, this is the right model to use.

Quickly summarizing: this is the UAE's most famous model. They've got a 40-billion-parameter model and a 7-billion-parameter model, and, as you know if you watched our previous video, it originally had a royalty clause. Now it's purely Apache 2.0 licensed, and TII has waived all royalties and commercial usage restrictions, which means you can do anything you want with it: become rich, make money, completely fine. The next thing you need to remember is that while these
are the raw models, they've also shared instruct fine-tuned versions: a 40-billion instruct model and a 7-billion instruct model, and you can see the datasets that were used to fine-tune them. We're going to use the Falcon-7B-Instruct fine-tuned model, on the free version of Google Colab. Go to Runtime, click "Change runtime type": you can see this is a Python 3 environment, the hardware accelerator is GPU, and the GPU is a T4, because I don't have Colab Pro. If you have Colab Pro you can use machines like the A100 or V100, and you might be able to run the 40-billion-parameter model; for now, we're sticking to the 7-billion-parameter one.

First things first, you need to install the required libraries. We're going to use transformers to get the model and run inference, accelerate to make things faster even in low memory and to leverage CUDA, and einops, which is one of Falcon's dependencies (I'm not entirely sure what it's used for here, but it's required). So install all three Python libraries: transformers, accelerate, einops.

Once those are installed, the next step is to import what we need: from transformers import AutoTokenizer and AutoModelForCausalLM, import transformers, import torch. After the imports, specify the model you want to download. I'm using the 7-billion instruct model, but if you want the 40-billion instruct model, you can copy its model ID from the Hub and paste it in instead, and that will let you use the 40-billion instruct fine-tuned model. I'm sticking with the 7-billion instruct model primarily because of what we've got: approximately 13 GB of system RAM, about 15 GB of GPU RAM (the VRAM), of which we've already hit 14 GB, and roughly 78 GB of disk space, of which about 36 GB is already used. That's why we're sticking to the 7-billion-parameter model.

After you've specified the model, load the tokenizer: we get the tokenizer from the model first, then we build and download the model. But instead of doing it the classic way, we're going to use the Hugging Face Transformers pipeline for generation. If you don't want to write transformers.pipeline, you can just import pipeline directly and use that; it's up to you. The pipeline is one of the easiest, most abstracted ways of using Hugging Face Transformers for certain predefined tasks: if you want to do text summarization, you simply specify "summarization" as the task; for classification, "text-classification". In the same way, we're going to use this model for "text-generation". If you didn't want to use the pipeline, you would typically end up using AutoModelForCausalLM, for example if you wanted to build a chat interface or something else. For now we'll stick to a simple Q&A setup where you ask a question and get a reply; we're not going to build a chat interface.

So what are we doing? We build a pipeline with the NLP task ("text-generation"), the model that was specified at the top, and the tokenizer. For the dtype we're using bfloat16. I've got enough memory, but if you've got an old GPU, you might need to install libraries like bitsandbytes and use a different data type that will help you load
a bigger model as a quantized version in lower memory; that's something you can do. The next arguments are trust_remote_code=True and device_map="auto", which helps manage memory between system memory and GPU memory. Once you've successfully built the pipeline, you might have hit the 14 GB memory limit. It also suggests installing xformers so that inference runs faster; I didn't install xformers, but you can try it if you want. xformers was quite popular in the Stable Diffusion days for running certain parts of Stable Diffusion on Google Colab, though at the time it had a lot of installation overhead; you can always try it out yourself.

At this point, a quick summary of what we've got: we have installed the required libraries, imported them, specified the model we want, set up the tokenizer, and built the Transformers pipeline with text-generation as the task and the model we wanted, Falcon-7B-Instruct. The next thing we need to do is model inference. This is where we specify the prompt: you can either pass the prompt inline, or keep it in a variable and use that variable. Then there's max_length, which specifies how long the output can be, plus do_sample, top_k (play with these parameters as you like), num_return_sequences, and the eos_token_id, which is model-specific; here it's just set to 11 for open-ended generation.

Now, to ask a simple question: "Write a joke about Elon Musk." I know some of you do not like it when I use this prompt, but some of you absolutely do. This has become like our standard question for any large language model. What do you say, 400 characters? Let's stick to 200, that's fine. "Write a joke about Elon Musk", Shift+Enter, run it. It creates the output and stores it in sequences, which we finally decode and print to show the result. If you just look at the sequences object, it's a JSON-like object containing generated_text, but we print it so the escape sequences come out formatted. The result: "Write a joke about Elon Musk. Why didn't Elon Musk paint the moon? Because he wanted to own the stars." As usual with every Elon Musk joke, I don't find it funny, but if you do, please let me know in the comment section. Cool.

Any other question? Okay, let's ask one more: which is heavier, 20 kg of rice or 25 kg of sugar? It sounds like a dumb question, but let's see if the model can actually answer it. I expected the model to say 25 kg of sugar, but the model was wrong; as you can see, it made a mistake. Now let's try some arithmetic. Let's say 67 plus 53; the answer should be 120. I have a strong hunch the model might get it wrong: when ChatGPT first got released, it was quite bad at arithmetic, and I think they've since done a lot of work on that. Now you might ask me in the comment section, "hey, why do you want to use a large language model for arithmetic calculations? Why don't you use LangChain and connect it to, say, a Python REPL environment?" I know, I know, but I just wanted to try it to see if this large language model... oh my goodness, this is so bad. 67 plus 53, what is it even doing
printing a table: 30 plus 8, 21 plus 13, and so on. Yeah, it's quite bad. Anyway, now you know what the limitation is. I'm going to ask a final question: "Write a poem about Elon Musk firing Twitter employees." This has also become one of the things I always ask these models, just to see how good they are at a creative task. It already gave us a joke, but what happens if we ask for a poem? That's exactly what we're going to try in this final test, and once it's finished we'll wind up the video.

I'm going to share this Google Colab link in the YouTube description. All you have to do is open the description (if you want, make sure to follow the channel and please subscribe, or otherwise just like the video), go below the like button, click the link, and come back here. You'll see a button called Connect; just click it, then go to Runtime, click "Run all", and ideally everything should run. I don't know why it's taking so much time; the poem has been generating for more than 51 seconds. Okay: "Write a poem about Elon Musk firing Twitter employees. In a darkened office where the workers cower, Elon Musk with a cold and callous tower shoots down upon the staff like a god, firing with no regard..." Does it even understand what "firing" means in this case? I'm not sure. "...like a madman's toy. In his wake they quiver and cower and dare not blink, for fear that they too may be his latest victims; silent, cowed and meek, as Musk's tyrannical reign wreaks havoc." I mean, it sounds quite dark, to be honest; I expected something funny, but it's so dark. So maybe this is a good model; it's not doing badly at all.

And you get to run a seven-billion-parameter model completely on free Google Colab and use it for commercial purposes or anything you want. If you've got more memory, you can try the 40-billion model as well, but try it out yourself. Like I said, all the required links will be in the YouTube description so you can get started immediately after you finish the video. I hope this was helpful to you in running the top-scoring model from the UAE; the model ranks near the top worldwide, apart from OpenAI's models, and it was created in the United Arab Emirates and is free to use even for commercial purposes, no royalty required. Hope it's helpful; let me know in the comment section what you think. Otherwise, see you in another video. Happy prompting!
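The setup described in the captions (install transformers, accelerate, and einops, load the tokenizer, then build a text-generation pipeline with bfloat16, trust_remote_code, and device_map) can be sketched roughly as below. This is a hedged reconstruction of the video's Colab cells, not the exact notebook; the heavy imports are kept inside the function so the file can be read without the libraries installed.

```python
# Setup sketch for running Falcon-7B-Instruct on a free Colab T4.
# Assumes you have first run: pip install transformers accelerate einops
# (einops is a dependency of Falcon's custom modeling code).

DEFAULT_MODEL_ID = "tiiuae/falcon-7b-instruct"
# Swap in "tiiuae/falcon-40b-instruct" here if you have A100-class VRAM.


def build_falcon_pipeline(model_id: str = DEFAULT_MODEL_ID):
    """Build a Hugging Face text-generation pipeline for a Falcon instruct model."""
    import torch
    import transformers
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    pipe = transformers.pipeline(
        "text-generation",            # the NLP task, as in the video
        model=model_id,
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,   # the dtype used in the video
        trust_remote_code=True,       # Falcon ships custom modeling code
        device_map="auto",            # let accelerate split weights across GPU/CPU RAM
    )
    return pipe
```

Calling `build_falcon_pipeline()` in Colab downloads roughly 14 GB of weights, which is why the free T4 tier only fits the 7B model.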
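The video briefly mentions that on an older GPU you might install bitsandbytes and load the model with a different data type, i.e. quantized. A minimal sketch of that alternative path is below; the `load_in_8bit` flag is the classic transformers + bitsandbytes option, but treat the exact argument names as an assumption and check the current transformers documentation.

```python
# Hypothetical helper (not shown in the video) that collects the pipeline
# keyword arguments for 8-bit quantized loading via bitsandbytes.

def quantized_pipeline_kwargs(model_id: str) -> dict:
    """Keyword arguments for a transformers pipeline that loads 8-bit weights."""
    return {
        "model": model_id,
        "trust_remote_code": True,
        "device_map": "auto",
        # model_kwargs are forwarded to from_pretrained();
        # load_in_8bit requires `pip install bitsandbytes`.
        "model_kwargs": {"load_in_8bit": True},
    }


print(quantized_pipeline_kwargs("tiiuae/falcon-7b-instruct")["model_kwargs"])
```

Quantizing roughly halves the memory needed versus bfloat16, which is what lets an older, smaller GPU hold a bigger model.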
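The hardware reasoning in the captions (about 15 GB of T4 VRAM fits the 7B model in bfloat16, while the 40B model needs an A100-class machine) can be captured in a small helper. This function and its 80 GB threshold are hypothetical illustrations of that reasoning, not anything from the video.

```python
# Hypothetical helper mirroring the transcript's VRAM reasoning: 7B params in
# bfloat16 is ~14 GB of weights (just fits a 15 GB T4), while 40B in bfloat16
# is ~80 GB, so it needs a big (or multi-) GPU. The threshold is a rough guess.

FALCON_7B = "tiiuae/falcon-7b-instruct"
FALCON_40B = "tiiuae/falcon-40b-instruct"


def pick_falcon_model(vram_gb: float) -> str:
    """Pick the largest Falcon instruct model that plausibly fits in VRAM."""
    return FALCON_40B if vram_gb >= 80 else FALCON_7B


print(pick_falcon_model(15))   # free Colab T4 -> tiiuae/falcon-7b-instruct
```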
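Finally, the inference step with the generation parameters mentioned in the captions (max_length 200, do_sample, top_k, num_return_sequences, eos_token_id) might look like the sketch below. The pipeline is passed in as an argument, so the decoding of the returned `sequences` list can be demonstrated with a stub; in a real run you would pass the pipeline built in the setup step and `tokenizer.eos_token_id`.

```python
# Inference sketch using the generation parameters from the video.

def ask(pipe, prompt: str, max_length: int = 200, eos_token_id=None):
    """Run a text-generation pipeline and return the generated texts as strings."""
    sequences = pipe(
        prompt,
        max_length=max_length,        # upper bound on prompt + answer tokens
        do_sample=True,               # sample rather than greedy-decode
        top_k=10,                     # restrict sampling to the 10 likeliest tokens
        num_return_sequences=1,
        eos_token_id=eos_token_id,    # e.g. tokenizer.eos_token_id (11 for Falcon)
    )
    # The pipeline returns a list of dicts like [{"generated_text": "..."}].
    return [seq["generated_text"] for seq in sequences]


# Stub usage example; a real run would pass the Falcon pipeline instead.
fake_pipe = lambda prompt, **kwargs: [{"generated_text": prompt + " ..."}]
print(ask(fake_pipe, "What is the longest river in the world?")[0])
```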
Info
Channel: 1littlecoder
Views: 22,023
Keywords: ai, machine learning, artificial intelligence
Id: 21mHov4Whag
Length: 13min 34sec (814 seconds)
Published: Thu Jun 01 2023