Meet Gemma: Google's New Open-Source AI Model - Step-by-Step Fine-Tuning of Google Gemma With LoRA

Video Statistics and Information

Captions
Hello all, my name is Krish Naik and welcome to my YouTube channel. So guys, Google is on a roll again and has created its own open-source LLM model, called Gemma, joining that race of open-source LLM models. Until now the most accurate open model we had was from Meta, called Llama 2. In this video we'll look at a practical application and also fine-tune with the help of this Gemma model.

So what exactly is the Gemma model? Let me show you the blog first, and why they created it; at the end of the day, every company is saying the same thing. Gemma is built for responsible AI development, from the same research and technology used to create the Gemini models. Google says it believes in making AI helpful for everyone and has a long history of contributing to the open-source community, with projects such as Transformers, TensorFlow, BERT, T5, JAX, AlphaFold, and AlphaCode. Google has obviously been doing a lot of open-source research for many years — it has given us TensorFlow and many other things — and Meta is in the same race; both are doing a fabulous job. So let's go ahead and learn more about Gemma (however you prefer to pronounce it).

The main thing you really need to know is the performance. Two Gemma models are available: one with 7 billion parameters and one with 2 billion parameters. Comparing against Llama 2, which comes in 13-billion and 7-billion-parameter versions, Gemma performs very well: it scores an average of 64.3 across the benchmarks, and you can read the blog if you want more details. If we just compare open-source models like Gemma and Llama 2 at the 7-billion-parameter size, Gemma is well ahead: 64.3 general accuracy, 55.1 and 81.2 on the reasoning benchmarks, and 46.4 on math — far better than the Llama 2 models. So this looks great, and the accuracy is very good.

Gemma is intended for research use, and you can also build your own models on top of it. The 2-billion and 7-billion-parameter models are already available on Hugging Face: once you go to Hugging Face and search for gemma-7b or gemma-2b, you'll see a page where you need to be granted access to the model. They'll ask you to accept the terms and conditions of the license; it's just a checkbox — check it, give your confirmation, and you'll get access to the model.

The main aim of this video is to show you, with a practical implementation, how you can access this model and how you can actually use it. So let's go ahead and look at this fine-tuning technique with the Google Gemma model. Again, guys, this is just a simple use case that I have taken.
You can build some amazing use cases with this, and that is what I plan for the future: some use cases I've already developed with the paid OpenAI models, I'll try to develop with Gemma as well, along with fine-tuning. So initially we're going to see how this Google Gemma model performs and what kinds of NLP tasks you'll be able to do with it.

To get started, I'll install all these libraries: bitsandbytes, peft, trl, accelerate, datasets, and transformers. If you don't know, guys, I've already created a video on fine-tuning, where we did the fine-tuning with the Llama 2 model. (A "UTF-8 locale is required" error came up, so let me restart my kernel: Runtime > Disconnect and delete runtime, reconnect, and then try the installation again.) The same steps and process from that Llama 2 fine-tuning video apply here as well. OK, it's connected now, so let's go ahead and do the pip install; you'll see the installation happen for all these libraries.

To tell you what we're going to do: the bitsandbytes library is used for quantization. I've already created a video about quantization. What is the main aim of quantization? It helps when you have a huge model — right now we have the Google Gemma model with 7 billion or 2 billion parameters. If I want to load it in Google Colab — I'm using the premium version, which gives me around 50 GB of RAM and roughly 200 GB of disk; on the free tier you'll hardly get 15 GB of RAM — it may not be possible to load the entire model. What quantization does is convert the 32-bit floats (usually all the weights and biases are saved in 32 bits) into 8-bit or 16-bit values, so you need much less memory. That process is called quantization, and I've explained it, including how we do it mathematically, in that video; I'll provide the link in the description.

The next thing is that we import some important libraries: os, transformers, torch, the Google Colab userdata module, datasets, SFTTrainer, and from PEFT the LoraConfig, since, as I said, we're going to use the LoRA technique. SFTTrainer is used for the fine-tuning — supervised fine-tuning, as we call it. From transformers we'll also use AutoTokenizer, because if you want to perform tokenization while fine-tuning, AutoTokenizer will load the right tokenizer for the model; then AutoModelForCausalLM, which is used so that we can fine-tune a causal large language model; then BitsAndBytesConfig, where we give the configuration for quantization; and finally GemmaTokenizer. So let's go ahead and import all of these — a sketch of the installation and imports follows below.
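A minimal sketch of the installation and imports just described, assuming a Google Colab environment and recent library versions (nothing is pinned in the video, so the unversioned installs are an assumption):

```python
# Install the libraries mentioned above: quantization, PEFT/LoRA, and SFT training.
!pip install -q bitsandbytes peft trl accelerate datasets transformers

import os
import torch
from datasets import load_dataset
from google.colab import userdata          # Colab-only: reads the saved HF token secret
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, GemmaTokenizer)
from trl import SFTTrainer
```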
GemmaTokenizer is the tokenizer for the Gemma model itself — one tokenization option — but it's not strictly necessary that you always use GemmaTokenizer; you can also use other tokenizers. Now let me tell you one more step. After importing all this, remember that the Gemma model is hosted on Hugging Face, and in order to access and download it we really need something called an access token — an access token for Hugging Face. How do you get one? Go to your Hugging Face settings, open the Access Tokens section, and copy the token. Once we have that access token, we'll be able to download the model from there.

So here is what we'll do: set it in Google Colab. How do you set it? Click on the key (Secrets) button — everybody will have this key button — then click "Add new secret". I've named it HF_TOKEN, with the same value I copied from Hugging Face. Once it's saved there, any notebook you work in will be able to access this Hugging Face token. How do you access it? Simply write os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN"). When you run that, Colab will ask you to grant the notebook access to the secret; once you grant it, the token is available, and Hugging Face can identify which username you are pulling that model with. With this done, you'll be able to access any gated model from Hugging Face — be it Llama 2 or any other model you want to use.

Now we'll go ahead and call this particular model, google/gemma-2b. Initially I showed you the 7B page, but I'm going to load the 2B one — 2 billion parameters, not 7B. You can do 7B as well; it will just take more time for the fine-tuning. Here you can see I've taken the model ID and then written the BitsAndBytesConfig — the bitsandbytes configuration. The first parameter is load_in_4bit=True, which means the Gemma 2-billion-parameter model, stored in 32-bit precision, is being converted to 4-bit; this process is the quantization. Once we set load_in_4bit, all the weights that are stored in 32 bits get loaded in 4-bit form. The second parameter is bnb_4bit_quant_type="nf4". What is NF4? 4-bit NormalFloat — this parameter controls how the values get mapped down to 4 bits. If you really want to know more about it, I've given a link here ("What is 4-bit quantization and how does it help Llama 2"); the reason it mentions Llama 2 is that we're talking about open-source models. Then there is bnb_4bit_compute_dtype, which is torch.bfloat16. The reason we keep this at 16 bits: we're performing quantization, taking a big model and making it small by converting the 32-bit weights to 4-bit, but whatever fine-tuning happens on top — all the weights that get updated — will be computed in this 16-bit type. There is some loss of information with quantization, and to balance that we keep the new fine-tuned parameters in 16-bit; that's why it's written here. So once I execute this, I have my model ID and my configuration — a sketch of the token setup and this config follows below.
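A sketch of the token setup and quantization config described above, assuming the Colab secret is named HF_TOKEN as in the video:

```python
# Read the Hugging Face access token from the Colab "Secrets" panel.
os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")

model_id = "google/gemma-2b"

# 4-bit quantization: load the 32-bit weights as NF4, compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```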
Now, based on this model ID and configuration, I'm going to load the tokenizer: AutoTokenizer.from_pretrained with the model ID, and here too I pass the HF token, so that the tokenizer required for this particular model gets loaded. Then the model: AutoModelForCausalLM.from_pretrained with the same model ID, quantization_config set to the bitsandbytes config, device_map pointing to device 0 — which is nothing but the GPU — and again the HF token. Once I execute this, the whole model gets downloaded — the model ID I've given, with the quantization technique applied — and this entire process happens on the GPU that my Google Colab session is connected to. Quite quickly the entire model is downloaded, and everything is now available in this model object.

Now, with this model, let's test it. See how I'm going to do it: I've given a quote, "Imagination is more", I've used device cuda:0 (that is, whichever GPU is available), and I use the same tokenizer — I give it the text and get back the inputs as tensors. Then we take the same model and call .generate on those inputs with, let's say, a maximum of 20 new tokens initially. The output comes back as a list of token sequences, so we write tokenizer.decode(outputs[0]): at the end of the day we're doing text generation, so model.generate produces token IDs, and that output, which is in vector form, gets decoded back into real text with the help of the tokenizer. A sketch of the model loading and this first test follows below.
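A sketch of the model loading and the first generation test, continuing from the config above; passing the token via the `token` keyword assumes a recent transformers version, and the prompt and token count follow the video:

```python
# Load the tokenizer and the 4-bit-quantized model onto GPU 0.
tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ["HF_TOKEN"])
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},              # place the whole model on the first GPU
    token=os.environ["HF_TOKEN"],
)

# Quick sanity check: ask the base model to complete a famous quote.
text = "Imagination is more"
inputs = tokenizer(text, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```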
So if I execute this — and skip_special_tokens=True just means any special tokens get removed from the decoded text — you can see: "Imagination is more than knowledge. I'm a self-taught artist born in 1985..." This completion is coming from whatever the base model has learned. Let me execute it once more and see whether I get a different output. Here you can see: "Imagination is more than knowledge. Knowledge is limited, imagination encircles the world" — and this quote is attributed to Albert Einstein; if you search the internet you'll find it. So from two runs we got one plain completion and one where we also get the author name. Whatever text you put here, it will complete it based on the information it has.

Now I'm going to show you a fine-tuning technique. First, there is os.environ for WANDB_DISABLED — one parameter we need to keep as false, which was given in the documentation. I still haven't understood why this parameter is used, but once I know, I'll let you know, because I'm going to create a lot of projects as we go ahead.

We're going to fine-tune with a LoRA configuration. What information does LoRA require? One is the rank — the LoRA technique is a low-rank decomposition; I haven't uploaded the theory video yet, but here we select the rank value, and it's set to 8. Why 8 and not 16 or 64? You can select any of these numbers, but to understand exactly what rank decomposition is I need to create a dedicated video, and I'll come up with that soon. Then the target_modules: here you specify the attention projections (query/key/value and output) along with the gate, up, and down projections — these are the target modules required, and again, once I upload a video dedicated to the LoRA configuration, you'll understand them better. The task_type we always keep as CAUSAL_LM, which means it is used for the causal language-modeling task. These are the parameters I need to set up, so once I execute this, my LoRA configuration is ready — a sketch of it follows below — and then we can go ahead with the fine-tuning.
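A sketch of the LoRA configuration described above; the video does not spell out the full target-module list, so the projection names below are an assumption based on Gemma's layer names:

```python
# Low-rank adapter config: rank 8, applied to the attention and MLP projections.
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```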
Now, the dataset I'm going to use is this one. Let's look at it: Abirate/english_quotes — let me search Hugging Face for it. In this dataset you have two fields: one is the quote and the other is the author. Based on that — this is the quote and this is the author — we'll fine-tune. Let's say the Gemma model already knows some of the quotes on the internet along with their authors; my plan is to take this additional dataset, train on it, and then have the model identify the author for a given quote. We'll implement something like that. So here you can see I'm loading this dataset from Abirate/english_quotes and then taking a sample of the quotes. We execute this, and if I run data["train"]["quote"] I can see all the quotes — you can also see the authors and everything else — the train split gets generated, and once this is executed I have my dataset. If you go ahead and execute it you'll see all the quotes: "There is nothing noble in...", and so on. So this is my dataset.

Now, whenever we perform SFT, supervised fine-tuning, we need one function to indicate what my input is and what my output is. Here we have used something like example["quote"][0] and example["author"][0]: the quote — each sentence you see over here — is my input, and the author is my output, something like that. Once I format this, you'll see that the function returns my text: for any example, the quote and the author get pulled out and returned as a combined text string in a list.

Now it's time for the SFTTrainer. Here we pass the model and data["train"] — so what exactly is data["train"]? It has the features quote, author, tags, input_ids, and attention_mask. Then we use some training arguments: per_device_train_batch_size of 1, gradient_accumulation_steps of 4, some warm-up steps, a maximum of 100 steps, the learning rate, fp16=True, an output directory, and the optimizer. The peft_config is nothing but my LoRA configuration, and whatever formatting function I defined, I use it here as well — the same formatting function, so that I get the text in that formatted key-value form. Once I execute it, the trainer is set up and the training can start; a sketch of the dataset preparation and trainer setup follows below.
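A sketch of the dataset preparation, formatting function, and SFTTrainer setup described above, assuming the trl API current at the time of the video; the learning rate, warm-up steps, output directory, optimizer choice, and the "Quote:"/"Author:" template are not stated explicitly in the video and are placeholders:

```python
from transformers import TrainingArguments

# English quotes dataset with "quote" and "author" fields.
data = load_dataset("Abirate/english_quotes")
# Pre-tokenize the quotes, which yields the input_ids / attention_mask features mentioned.
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)

def formatting_func(example):
    # Combine each quote (input) with its author (target) into one training string.
    return [f"Quote: {example['quote'][0]}\nAuthor: {example['author'][0]}"]

trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,             # placeholder value
        max_steps=100,
        learning_rate=2e-4,         # placeholder value
        fp16=True,
        output_dir="outputs",       # placeholder path
        optim="paged_adamw_8bit",   # placeholder optimizer choice
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)
```

The formatting function returns a single-element list so that each example becomes one training string pairing the quote (input) with its author (target).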
Once I write trainer.train(), we run it for 100 steps. Now see how fast this goes — the reason is that it's a 2-billion-parameter model converted to 4-bit, and we're only taking a sample of the data, so the training happens quickly. Here you can see 26 of the 100 steps are done, and the loss is decreasing: 3.9, 2.1, 2.0, 1.3 — the losses keep coming down. Now, the quote I'll test with is taken from the dataset: "A woman is like a tea bag", and the author should be Eleanor Roosevelt. Steps 89, 91, 92, 100 — the loss is still reducing, increasing a bit here and there, but that's within the normal step-to-step variation. At the end we get the training output with the global step; if you still want to reduce the loss further, you can increase the number of steps.

Now here I have the quote "A woman is like a tea bag", device cuda:0, the tokenizer, and the same model.generate call as before. Let's execute it — I should now be able to get the author name as well. See, "Eleanor" is coming: "A woman is like a tea bag; you can't tell how strong she is until you put her in hot water." All that information is coming through, and the author is also coming out as Eleanor Roosevelt.

Let me take one more example and see whether we can generate the same thing, based on the dataset we have, since we've already done the fine-tuning. And again, any fine-tuning can be done in this same way; I've just shown you one example. So I'll write something like "The opposite of love is not hate, it's fear, and the opposite of fear is freedom, and the most wasted of all days..." — let's execute it and see whether it matches. The output: "The opposite of love is not... it's indifference; the opposite of art... the opposite of faith is not..." — it's not exact, but it's almost equivalent, right? Fear, the opposite of freedom, the most wasted of all days — OK, let's try some more.

This next one looks unique — unless the person is famous, the model shouldn't already know it: "A book is man's..." So I'll go ahead and execute this and see. And again, understand, guys, that max_new_tokens is only 20, and it also needs to complete the whole statement. "Outside of a dog, a book is a man's best friend" — see, "man's best friend" — and the author comes out as Nicholas Comfort; the quote that follows is again "the most wasted of all days..." So you can see it over here: "inside of a dog..." and so on — all the information looks great. You can check it out with different examples, but at the end of the day, yes, the fine-tuning works well here, and as we keep doing more and more, we'll be able to improve it. (A sketch of this training run and check follows below, after the transcript.)

Now, all the applications we've created before — like text-to-SQL and the invoice project — we'll try to do with this model, and we'll also try the fine-tuning; let's see how things come up. I hope you liked this video. That was it from my side. I'll see you in the next video. Have a great day. Thank you all, take care, bye-bye.
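Wrapping up the walkthrough, a sketch of the training call and the post-fine-tuning check narrated above; the test quote follows the video, while the "Quote:" prefix comes from the formatting sketch earlier and is an assumption:

```python
# Run the supervised fine-tuning for the 100 steps configured above.
trainer.train()

# After fine-tuning, the model should complete a dataset quote and name its author.
text = "Quote: A woman is like a tea bag;"
inputs = tokenizer(text, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```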
Info
Channel: Krish Naik
Views: 26,639
Keywords: yt:cc=on, google open source models, google gemma models, fine tuning with google gemma models, krish naik generative ai models
Id: UWo9r6flDjk
Length: 22min 26sec (1346 seconds)
Published: Fri Feb 23 2024