NEW Falcon-based AI Coding LLM - Falcoder Tutorial

Video Statistics and Information

Captions
One of the most prolific model creators, Manuel Romero, has created a new model called Falcoder. Falcoder is a Falcon 7B model fine-tuned on the CodeAlpaca 20k instruction dataset using QLoRA and PEFT. It has been released under the Apache 2.0 license, which means you can use it for commercial purposes. From what I have tested, this is a really good model for coding, and in this video I am going to show you how to load it on the free version of Google Colab and run your own instructions.

Before we even begin, I actually ran the model: I asked it to write a Python code to build a matplotlib bar chart, it gave me the code, I ran the code, and it actually produced a bar chart. That means it produces usable Python code (Python is the language I understand best). The Google Colab notebook we're going to see in this video was kindly provided by Manuel Romero and uses the PEFT library.

Before we move further, I would like to quickly show you the model details. This model does a lot of things really well. For one, the base model, Falcon-7B, is something a lot of people in the open-source community love, and it has now been fine-tuned on the CodeAlpaca 20k instruction dataset, so it does pretty well on a lot of coding-related tasks. If you want to credit the author, the model card shows how to cite them. To use this model directly you technically need more memory than the free Google Colab notebook provides: the free tier has approximately 15 GB, in which, as we have seen in a different video, we can run the Falcon-7B Instruct model, but this model will unfortunately not fit directly.

So what we are going to do is load the model in 8-bit, and instead of loading the entire fine-tuned model, load the base model and apply the LoRA adapter on top of it. If you have more than 16 GB of memory, I would strongly suggest using the full-model code instead; you don't have to do what I'm doing. But if, like me, you love the free Google Colab notebook, the rest of the tutorial is for you. Manuel Romero has very kindly shared the adapter details (I'll link them in the YouTube description), so you can see the LoRA components: you load the base model, apply the adapter on top of it, and then you can run this model on the free version of Google Colab while it is loaded in 8-bit.

Now to the Google Colab notebook, which I'll link in the YouTube description so you don't have to take notes. Make sure it is on GPU: go to Runtime, click "Change runtime type", and check that it is set to GPU. If you have Colab Pro you can use different runtimes, but this is the free version. Next, install the required libraries: transformers, accelerate, peft, datasets, bitsandbytes, and einops (einops is a dependency for Falcon). Once you have installed all the required libraries, load them: torch, peft, and transformers.

Then specify the model ID. If you look closely, we are specifying the adapter ID, not the complete model. The full falcoder-7b model is around 10 GB; what we point to instead is the adapter. From that PEFT model ID the configuration is retrieved, and then the base model is downloaded: config.base_model_name_or_path points to a sharded bfloat16 version of Falcon-7B, and that is what gets downloaded. The tokenizer is also loaded from the PEFT model ID.

One thing to keep in mind is that the model is loaded in 8-bit to fit in Google Colab memory; this is 8-bit quantization, thanks to the bitsandbytes library we installed. To fit the model we pass load_in_8bit, and you also need to enable trust_remote_code=True so that transformers lets you run the custom Falcon code, because certain parts of the model code are not yet merged into the library. At this point we have the model ID, configuration, base model, and tokenizer all set: the base model plus the LoRA adapter. The download took about three minutes for me.

After it is downloaded, we load the PEFT model, which is slightly different from how we typically load a transformers model: PeftModel.from_pretrained(model, peft_model_id). Once you load that, you get to see the details: what model it is, what architecture it has, what layers it has, including the LoRA modules.

After that, you need to create a utility function. This is an instruction-following model (it has been instruction fine-tuned, as we have seen), which means you need to give the input in a certain format and get the output in a certain format, with certain hyperparameters. It takes the instruction as input, plus how many tokens to generate, the temperature, top-p, top-k, and the number of beams. Even if you do not play with these, it is important to know that temperature controls the creativity / hallucination / precision trade-off: the higher the temperature, the higher the randomness; the lower the temperature, the higher the precision but the lower the randomness, which means less creativity. Because we are using the model for coding tasks we do not need a high temperature, which is why it defaults to a low value.

Next, the function takes the instruction from the user and appends a solution marker at the end, just to tell the model that it has to produce the solution. Once that is done, it is typical transformers stuff: the tokenizer creates the tokens, you move them to CUDA, generate, decode the output, and return it. It is pretty straightforward; you can do this in different ways if you want, but all it is doing is taking the text, tokenizing it, sending it to the model, getting the result back, decoding it, and displaying it (that is the "Solution" split part).

Now that the utility function, generate, is created, all we have to do is give an instruction: "Design a class representing a person in Python." So it designs a class (an object-oriented class, not a classroom): it assumes a person would have attributes like name, age, and gender, defines the class with those attributes, and gives all the details. You can read the code and make sense of how well it is doing. It does not only do Python; it handles more programming languages too. For example, in this case we said "Write a script to upload files to an S3 bucket", gave the instruction and the number of output tokens we want (256), and it gave us a solution using the boto3 library. I'm not familiar with boto3, so I'm not sure the code is actually good, but as you can see it produces an output.

To make it easier, Manuel Romero has also given us an infinite loop using the typical Python input() field, which is something people still use a lot: you enter the instruction and wait for the output. That is exactly what I have done here: every time you enter an instruction, it is sent to the generate function. I went ahead and said "write a Python code to build a matplotlib bar chart", and it just worked completely fine. So now I am going to say "write a Python code to build a seaborn bar chart"; I want a seaborn bar chart instead of just matplotlib. As you can see, it takes a couple of seconds to run, partly because of the memory limitation. And if the output you see on screen is not the most perfect, there is one thing you can immediately blame: loading the model in 8-bit always reduces precision a little bit. It should not be way off, but you get it; we are not loading the complete model.

So we asked for a Python code to build a seaborn bar chart. If you are not familiar with seaborn, it is another Python library that helps you build charts and visualizations, built on top of matplotlib. Let me copy the code, go back to my JupyterLite notebook, and paste it. It has also assumed the data itself. Okay, seaborn is not available in the JupyterLite environment, and they don't let me install it, so most likely the code is fine; it is the environment, not the AI, that could not run the code. We have the DataFrame, the data is passed, the x-axis, y-axis, and axis details are there; most likely it should work, but I should probably refrain from asking seaborn questions because I cannot directly show you the quality of the code.

Let's do something else: "Write a Python code that uses regex to detect email IDs from a given input text." (I should have said "recognize", but anyway, let's see if it can actually work.) We are just testing Python at this point because it is easier for me to actually run the code and do the demo. In my previous video on GPT-Engineer, a couple of comments said that I did not run the code, which is a very fair point when I'm doing AI coding videos, so I wanted to put in that extra effort in this video: actually execute the code and show you whether it works. We are expecting a Python code that uses regex and finds an email ID. I grab the code and paste it here; as you can see it is not complete, maybe because of the token limit. It defines a detect_email function taking an input, so I give it an example: "1littlecoder@gmail.com and this is a great day, lots of good models". Done. Next I need to call this function; I'm adding this extra line of code myself because the AI didn't: detect_email(input_text), then print the result.

Cool, let's run this... oh, it's None. Oh my goodness, it didn't work. Okay, but the honest way to test this is to see what it actually did. My email ID did not get detected; I think the problem is in how it finds the match and returns it. If you give it just an email ID, it works, so that is probably because I did not give my instruction properly. If the text is only an email ID, it detects it, so maybe it fulfills what it is supposed to do.

Let's do one final test with SQL, because I also understand SQL: "Write a SQL query to get all the rows where the dob is before Jan 1st 1990." I am not sure it can do this, because the date is given in a weird human format, not a proper date, and as a human I would know "dob" means date of birth, but I am not sure the AI can figure that out and combine it into a date. The WHERE clause is what I expect it to struggle with; otherwise it should be a simple SQL query with SELECT * FROM some table. I also did not give a table name, which is probably another mistake on my part, but everything else should be in the WHERE clause. Let's see what output it gives.

While it is generating, I would like to quickly give you an overview of this video. First we checked GPU availability on the free Google Colab. We have the full model, and we have the adapter (the LoRA details) which we can apply to the base model. Once you have checked that you have a GPU, install all the required Python libraries, load them, download the base model and tokenizer using the configuration from the PEFT model ID, load the PEFT model itself, create the utility function that takes the instruction, generates, and gives you the output, and then start playing around with the model. That is what we have done here.

Okay, it has given a SELECT * query, and it has done a good job. In fact, as I said, it needed to assume "dob" means a date and translate the phrase into a date format, which is usually not an easy task. In the past I have struggled to do this with libraries like spaCy and regex; in the classical NLP world, before all these LLMs, there were a lot of libraries just to translate phrases like that into proper dates. So I am very, very happy to see that the LLM has successfully produced SELECT * FROM the table WHERE dob is less than that date ("before" means less than; "after" would be greater than). Cool.

From my testing I am honestly quite happy with this model; even as an 8-bit model it works super fine. But the reason I made this video is for you all to try it out and let me know. In the past few days we have seen a lot of AI coding models, from StarCoder to WizardCoder, and now we have Falcoder on top of Falcon. So check out Falcoder, which comes with an Apache 2.0 license, which means that if you want to build a Chrome extension, a Visual Studio Code extension, or use it for a commercial purpose, you get to use it. I would like to hear from you: what do you feel about it? If you have any questions, let me know in the comment section; otherwise all the links, including the Colab notebook, will be in the YouTube description. It should work completely fine, as you have seen in the video. Happy prompting, take care, and thanks also to Manuel Romero for giving me all the information.
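The loading steps described in the walkthrough can be sketched roughly as below. This is not the notebook's exact code: the repo ID `mrm8488/falcoder-7b` and the argument names are my assumptions from the description above, so follow the linked Colab notebook for the authoritative version.

```python
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# The adapter repo ID is an assumption -- check the linked notebook.
peft_model_id = "mrm8488/falcoder-7b"
config = PeftConfig.from_pretrained(peft_model_id)

# config.base_model_name_or_path points at the sharded bfloat16 Falcon-7B.
# load_in_8bit quantizes it to fit free Colab memory (via bitsandbytes);
# trust_remote_code lets transformers run Falcon's custom model code.
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    load_in_8bit=True,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

# Apply the LoRA adapter on top of the 8-bit base model.
model = PeftModel.from_pretrained(base_model, peft_model_id)
```

Downloading and applying the adapter took about three minutes in the video; the same pattern (config → base model → tokenizer → `PeftModel.from_pretrained`) works for any LoRA adapter hosted on the Hub.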
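The utility function wraps the instruction in a fixed template and splits the output on a solution marker, as described above. The exact template wording below is an assumption (an Alpaca-style "### Instruction / ### Solution" layout); only these pure string helpers are shown, with generation itself left to `model.generate`.

```python
def build_prompt(instruction):
    # Alpaca-style template; the exact wording is an assumption --
    # the key point is the trailing marker that cues the model to answer.
    return "### Instruction:\n{}\n\n### Solution:\n".format(instruction)

def extract_solution(decoded):
    # The decoded generation echoes the prompt, so keep only the part
    # after the last solution marker (the "Solution split" in the video).
    return decoded.split("### Solution:")[-1].strip()
```

In the notebook these helpers sit around a standard tokenize → move to CUDA → `generate` (with temperature, top-p, top-k, num_beams) → decode pipeline.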
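The first demo in the video was a matplotlib bar chart. A minimal sketch of what such generated code looks like (the sample data is invented, just as the model invented its own data in the video):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Hypothetical sample data -- the model assumed its own in the demo.
labels = ["A", "B", "C"]
values = [3, 7, 5]

fig, ax = plt.subplots()
ax.bar(labels, values)
ax.set_xlabel("Category")
ax.set_ylabel("Value")
ax.set_title("Bar chart")
fig.savefig("bar_chart.png")
```

A seaborn version would look similar (`sns.barplot(x=..., y=..., data=df)`), since seaborn builds on matplotlib, which is why it could not be verified in the seaborn-less JupyterLite environment.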
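For the email-detection test, the generated function returned None on a full sentence. A `findall`-based sketch of the same idea avoids that failure mode (the regex here is a simplified illustration, not a complete RFC-compliant email pattern):

```python
import re

# Simplified email pattern -- good enough for a demo, not RFC 5322.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def detect_emails(text):
    # findall returns every match, so it works inside full sentences,
    # not only when the input is exactly an email address.
    return EMAIL_RE.findall(text)

print(detect_emails("1littlecoder@gmail.com and this is a great day"))
# -> ['1littlecoder@gmail.com']
```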
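The final SQL test can be checked end to end with Python's built-in sqlite3. The table name `people` and the sample rows are assumptions, since no table name was given in the instruction; the point is that "before Jan 1st 1990" becomes a simple comparison against an ISO date string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, dob TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [
    ("Ada", "1985-06-01"),
    ("Ben", "1995-03-15"),
])

# "before Jan 1st 1990" -> dob < '1990-01-01'; ISO date strings
# compare correctly as plain strings in SQLite.
rows = conn.execute(
    "SELECT * FROM people WHERE dob < '1990-01-01'"
).fetchall()
print(rows)  # -> [('Ada', '1985-06-01')]
```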
Info
Channel: 1littlecoder
Views: 10,817
Keywords: ai, machine learning, artificial intelligence
Id: g5iAoMmf8OQ
Length: 17min 27sec (1047 seconds)
Published: Tue Jun 20 2023