Fine-tuning GPT Neo 2.7B and 1.3B


Captions
What's up guys, today we're going to be going over GPT Neo, the open-source implementation of GPT-3. If you recall from my previous videos, what makes GPT-3, and to a lesser extent GPT Neo, so special is the vastly increased size: here's GPT-3, and here's GPT-2. At this time the largest GPT Neo model is 2.7 billion parameters. While that is still much smaller than 175 billion parameters, having a larger model has its benefits, specifically in the areas of fine-tuning and few-shot learning.

The diagram here shows how model size affects zero-shot, one-shot, and few-shot learning, that is, how many examples a model is given before it is asked to complete a specific task. Those tasks can include, but are not limited to, sentiment analysis of text and text generation in a certain style, such as generating text in the proper format of a programming language. The diagram compares three models: 175 billion, 13 billion, and 1.3 billion parameters. The 175 billion parameter model is a very good one-shot learner, jumping from around 10 percent to around 45 percent given a single example. The 13 billion parameter model does better with one example than with zero, but it is still not as good as the 175 billion parameter model, and it is a similar story with the 1.3 billion parameter model.

You may be asking why you should care about this. The takeaway is that larger models tend to do better than smaller models, and just as importantly, larger models tend to perform better given a limited amount of data. So if we want to fine-tune a model to complete a specific task, in general we will be better off using the larger model over the smaller one: we can get away with a smaller dataset to fine-tune on, or, given a large dataset, we will obtain better results. That is exactly what we're going over today: how to fine-tune GPT Neo 2.7B to get good results on fine-tuned tasks.

Like in my previous videos, we'll be using a Jupyter notebook to run the models. We will not be using Google Colab, because the resources needed to fine-tune these models are too great for most Colab instances. You will want to run this notebook on your own high-end PC, or on a Google Cloud or Azure VM instance with the proper rented hardware.

To get started, we first need to create a conda environment; let's call it gpt-neo-fine-tuned and use Python 3.7, so run conda create -n gpt-neo-fine-tuned python=3.7 and hit yes. Now let's enter that environment with conda activate gpt-neo-fine-tuned. Next we need to install Jupyter, so run conda install jupyter and hit yes. Lastly, before we enter the notebook, we need to install a kernel so that our conda environment is accessible inside of the notebook. To do this, just copy and paste what I have typed here; the main thing is the name of the kernel, which you're going to look for inside of the notebook. Before we run jupyter notebook to launch the Jupyter instance, let's take a look at our folder: running ls shows the notebook and also train.csv, the training data that will be used later. Now let's run jupyter notebook, and with the Jupyter instance started, let's enter the notebook that I have prepared for this video.
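For reference, the terminal setup described above looks roughly like this. This is a sketch: the environment name is just the one used in the video, and the kernel-registration line is my guess at the command that gets pasted in (the standard ipykernel one), so adjust names to taste.

    # create and activate the conda environment (Python 3.7, as in the video)
    conda create -n gpt-neo-fine-tuned python=3.7
    conda activate gpt-neo-fine-tuned

    # install Jupyter and register this environment as a notebook kernel
    conda install jupyter
    pip install ipykernel
    python -m ipykernel install --user --name gpt-neo-fine-tuned --display-name "gpt-neo-fine-tuned"

    # launch the notebook server from the folder containing the notebook and train.csv
    jupyter notebook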
The first thing we need to do is select a kernel, so go to Kernel, Change kernel, and select gpt-neo-fine-tuned. You may need to restart the notebook at this point; if you're having issues installing packages, this may be the fix, and you can do that by hitting Restart.

Now let's get started with the actual notebook. Of course, we must first install PyTorch, so let's go ahead and do that. Once that's finished, let's very briefly go over a more naive approach to fine-tuning these models, an approach that I originally explored and had some issues with. One easy way to fine-tune small transformers is a library called Happy Transformer. Happy Transformer is a package built on top of the Hugging Face Transformers library, and with it you can normally fine-tune small transformers very easily. Let's install it and see how it works for our use case. Okay, it's installing, and we're done.

At this point we'll begin fine-tuning on the Shakespeare data (train.csv) that I mentioned earlier, and we're going to do it on my RTX 3090, which has 24 gigabytes of VRAM. First we set CUDA_VISIBLE_DEVICES to 0 so everything runs only on the 3090, and that's done. We're going to start with the smallest GPT Neo model, the 125 million parameter model, and fine-tune it on that train.csv file containing Shakespeare. Let's go ahead now and run this segment.
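That naive fine-tuning cell looks roughly like the following. This is a minimal sketch assuming Happy Transformer's HappyGeneration API: the "GPT-NEO" model-type string, the GENTrainArgs field names, the accepted training-file format, and the save path all vary between library versions, so treat it as an outline rather than the exact cell from the video.

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # only use the first GPU (the RTX 3090)

    from happytransformer import HappyGeneration, GENTrainArgs

    # load the smallest GPT Neo checkpoint from the Hugging Face hub
    happy_gen = HappyGeneration("GPT-NEO", "EleutherAI/gpt-neo-125M")

    # fine-tune on the Shakespeare data; depending on the library version this
    # may need to be a plain-text file rather than a CSV
    train_args = GENTrainArgs(num_train_epochs=1)
    happy_gen.train("train.csv", args=train_args)

    # save the fine-tuned weights so they can be reloaded later
    happy_gen.save("model/")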
As the model loads, you can see in nvidia-smi that the VRAM is starting to fill up: it's at 2 gigabytes, and once training begins it jumps to about 10 gigabytes. What this means is that to train even just the 125 million parameter model with a batch size of one, we need a GPU with at least 10.5 gigabytes of VRAM. That unfortunately eliminates many cards from being able to train even the 125 million parameter model, including the RTX 3080 and 3070. Ironically, the RTX 3060 would be able to do it, as would the RTX 3080 Ti, the GTX 1080 Ti, the RTX 2080 Ti, and of course the RTX 3090, but that's about it as far as consumer cards go.

At this point let's stop the run early; we know how much VRAM it uses, and we don't want the 125 million parameter model anyway. Next we run the segment that clears the GPU's VRAM, which lets us run the training block above again, this time with the 1.3 billion parameter model. So run the clearing section, go back up to the code block that was training the 125 million parameter model, and change it to 1.3 billion parameters. Spoiler: this is going to error out, but let's see exactly what it does. Keeping an eye on the GPU VRAM usage, it's at 8 gigabytes before training starts, then it hits 24 gigabytes, maxes out, and if we scroll down we see the out-of-memory error. So while Happy Transformer makes it very easy to train small transformers, it is insufficient for the 1.3 billion parameter model, and even more so for the 2.7 billion parameter model. And as I said at the beginning of the video, we want to fine-tune the large models, because they learn better and with less data than smaller models.

So is there any hope of training these larger models on consumer-grade hardware? Fortunately, yes, but we need to move to a different library: DeepSpeed. DeepSpeed is a deep learning optimization library by Microsoft that lets researchers run and train larger models than they otherwise could. First we need to clone the DeepSpeed repo, because we must build some optional items in the package from source. Running this block clones the repo, and once it's done an ls of our folder shows DeepSpeed sitting there. We cd into DeepSpeed, and another ls shows the contents of the repo. The next step is to install DeepSpeed from source using a flag from the documentation that ensures all the needed ops get built: DS_BUILD_OPS=1 pip install . (the period meaning the current directory). Installing DeepSpeed from source may take a while; on my computer it took over six minutes, but just wait it out and you should be good to go.

Once DeepSpeed is installed, run ds_report. ds_report tells you whether the compatible ops are installed on your PC. I'm not quite sure which ops all need to be installed, but I can tell you that cpu_adam needs to be there, along with ninja and the transformer op, and that async_io does not need to be installed. Those are the ones compatible with my system, and with them installed I can get DeepSpeed to work. After that we need to download the repo that will actually fine-tune GPT Neo using DeepSpeed, so exit the DeepSpeed repo with cd .., and an ls shows we're back in the parent folder.
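To recap, the DeepSpeed setup just described looks roughly like this (a sketch; in the video these run as notebook cells, so they would carry a leading exclamation mark there, and DS_BUILD_OPS is the flag the DeepSpeed docs describe for pre-building the optional ops):

    # clone DeepSpeed and build it from source with the optional ops included
    git clone https://github.com/microsoft/DeepSpeed.git
    cd DeepSpeed
    DS_BUILD_OPS=1 pip install .

    # check which ops were actually built for this machine
    # (cpu_adam, ninja, and the transformer op were the important ones here)
    ds_report

    # back out to the working folder for the next step
    cd ..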
What we need to do now is clone the repo that handles the fine-tuning. To do this, simply run the code block here to download and clone the finetune-gpt2xl repo; once it's done we can see it in the main folder, so let's enter it with cd finetune-gpt2xl. The last thing we need now that we're inside the fine-tune repo is to download and install the datasets package, which this repo uses, so we'll do that as well.

At this point we're able to fine-tune GPT Neo, including the 2.7 billion parameter model, as well as other GPT models such as GPT-2, GPT-2 XL, and so on. For GPT Neo 2.7B we need a high-end machine: the readme says roughly 70 gigabytes of RAM is the minimum required, along with roughly 16 gigabytes of VRAM on a GPU. GPT Neo 1.3B and the other smaller GPT-2-based models don't have requirements that high. If you don't have a powerful enough machine, one can be rented for an okay price from a cloud provider: normally the cost would be a couple hundred dollars for a month, but to fine-tune a model you'll probably only need it for a couple of hours, so it works out to around 10 bucks at most. Personally, I had to upgrade my PC from 64 gigabytes of RAM to 96 gigabytes just to make this video and do some cool things with GPT Neo 2.7B. But as I've covered in previous videos, once you have a trained model you can run it on much lower-end hardware, even on just a CPU, as long as you have a decent amount of RAM.

Now let's fine-tune GPT Neo 2.7B with the Shakespeare dataset I mentioned earlier. The flags we're using are the ones in the readme, which I highly recommend you read yourself, and this setup should work with 16 gigabytes of VRAM and roughly 70 gigabytes of RAM. I can personally guarantee that it works on an RTX 3090 with 96 gigabytes of RAM, but you can almost certainly get away with much less. Once you run this code block it will take a few minutes to get started, and once it does, it will take a couple of hours to fine-tune the model, depending on factors such as the number of training epochs, the batch size, and the other parameters and flags you can set for your own hardware and requirements. Let's run the code block and check how it affects my computer: how much RAM it uses, how much VRAM it uses, and what the overall impact of running this model is.

I've just started the fine-tuning process, so I'll skip ahead to when it actually starts training so we can truly see the impact on the hardware. Now that we're training, we're on step 8 of 178 (that's just how the data gets batched up with this model), and we're using 19 gigabytes of VRAM and 63 gigabytes of RAM. The reason we're using 19 gigabytes of VRAM is that the resources are available; my understanding is that with less VRAM we would be using more RAM, although I could be wrong about that.

The last thing we'll do with this notebook is kill the training process and relaunch it with different parameters to further reduce the hardware demands. I've now restarted the notebook, and we can change some of the flags; one of the more obvious changes is the model we use.
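For reference, the fine-tuning invocation described above looks roughly like the following. This is a sketch: the launcher and run_clm.py flags are the standard DeepSpeed and Hugging Face ones, but the exact values, the DeepSpeed config filename, and the validation.csv file are taken on the assumption that they match the repo's readme, so double-check the command there. Notice the --model_name_or_path flag, which is the one we're about to change.

    # fine-tune GPT Neo 2.7B on train.csv with DeepSpeed
    # (run from inside finetune-gpt2xl, as a notebook cell or a terminal)
    deepspeed --num_gpus=1 run_clm.py \
        --deepspeed ds_config_gptneo.json \
        --model_name_or_path EleutherAI/gpt-neo-2.7B \
        --train_file train.csv \
        --validation_file validation.csv \
        --do_train \
        --do_eval \
        --fp16 \
        --evaluation_strategy steps \
        --output_dir finetuned \
        --num_train_epochs 1 \
        --per_device_train_batch_size 4 \
        --gradient_accumulation_steps 2 \
        --learning_rate 5e-06 \
        --warmup_steps 10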
We can change it from the 2.7 billion parameter model to the 1.3 billion parameter model. Let's run that and see what kind of impact it has on our hardware. Right here we are training the GPT Neo 1.3 billion parameter model, and we can see it's using significantly less RAM, about 38 gigabytes, and significantly less VRAM, about 13 gigabytes. So of course the 1.3 billion parameter model requires less powerful hardware to fine-tune, and you can definitely do this on more mid-range hardware or on cheaper hardware rented in the cloud.

In both cases, whenever we're training a model, the output produced with the flags we ran will be in the finetuned folder it creates by default. With the flags we used, it only saves at the final evaluation step, when training is done, but you can use flags described in the readme to have it save every epoch, every two epochs, and so on. That folder is where you will find the trained model for later use.

To end the video I want to briefly go over the GitHub page and the readme. Here we can see the steps to create a VM, which I talked about in case you don't have the required hardware; feel free to follow that or ask questions and I'll be happy to help. There are also steps to fine-tune GPT-2 XL, which is 1.5 billion parameters, if you're interested; it's very similar to what we did, and in fact the only difference is that instead of running the commands we have in the Jupyter notebook, you would run these. Then we can see steps on how to generate text with your fine-tuned model (written for GPT-2 XL, but similar for GPT Neo), the steps on how to train GPT Neo, and the steps to generate text with your fine-tuned GPT Neo model. I plan on making another video on how to best generate text with your fine-tuned model; what's in the readme will work, but I have found a few things you can do to make it easier to play around with the result.

Lastly, and still very importantly, are the optional configurations. I recommend looking at the training arguments, which are linked at the bottom of the readme. There are optional flags you can append to the training command that let you change, for example, how often the model is saved, whether that's every 10 epochs, every 5 epochs, or whatever makes it easiest to get the result you're looking for. When training, you'll want to keep going until your evaluation loss is as low as it can get; at some point the evaluation loss will start increasing again, so it's a good idea to stop once you notice it has reached its minimum and is starting to go up. For me, training on a custom dataset rather than the one provided, I found it takes roughly four epochs to get the best result, but that's something you ought to play around with to find what works best for you.

At this point I'm going to end the video. I hope it has helped you fine-tune these very cool, very large models. For future videos, I plan on making one on how to best use your fine-tuned model to get the outputs you want, and another on fine-tuning a model on your own custom dataset rather than the provided one, so feel free to subscribe if that interests you so you can catch those videos when they're published. As always, thanks for watching, please like and subscribe if this helped you out, and have a great day.
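For anyone who wants to play with the result right away, loading the fine-tuned model from the finetuned output folder and generating text looks roughly like this. It's a minimal sketch using the standard Hugging Face transformers API rather than the repo's own generation script, and the prompt and sampling settings are just examples.

    # load the model saved in the default output_dir ("finetuned") and sample text
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("finetuned")
    model = AutoModelForCausalLM.from_pretrained("finetuned")

    prompt = "Shall I compare thee"
    inputs = tokenizer(prompt, return_tensors="pt")

    # sample a continuation in the style the model was fine-tuned on
    outputs = model.generate(
        inputs["input_ids"],
        do_sample=True,
        max_length=100,
        temperature=0.9,
        top_p=0.95,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))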
Info
Channel: Blake
Views: 4,225
Keywords: GPT3, GPT Neo, NLP, Machine Learning, Transformers, Fine-tuning, Transfer Learning, DeepSpeed
Id: Igr1tP8WaRc
Length: 20min 49sec (1249 seconds)
Published: Mon Jun 14 2021