How to train a YOLO-NAS Pose Estimation Model on a custom dataset step-by-step

Video Statistics and Information

Captions
So in this video we're going to see how we can train a custom pose estimation model. We're going to use the YOLO-NAS Pose estimation model from Deci AI, and we're going to go through the whole training pipeline: how you can take a dataset, annotate it, and export it into a Google Colab notebook. Then we'll run through every single step so you can go and fine-tune your own custom pose estimation models.

First of all, let's jump straight into the SuperGradients GitHub repository and go inside this YOLO-NAS Pose section. If you take a look at this benchmark comparing YOLOv8 with YOLO-NAS for pose estimation, we can see that the YOLO-NAS Pose estimation models are actually significantly better and state-of-the-art compared to the YOLOv8 models. We can even see that the small model has comparable performance with the medium model from YOLOv8, the medium model actually outperforms the large model, and it's the exact same thing with the nano and small, and the small and medium, where we get way faster inference speed with the YOLO-NAS Pose models. The YOLO-NAS models are optimized for the NVIDIA T4 GPU, but also for Intel CPUs and the Jetson Nano, if you want to deploy these YOLO-NAS and YOLO-NAS Pose models on an edge device.

Deci AI has a bunch of different models, both open source and proprietary, and I definitely recommend you guys go in and check them out. They have these YOLO-NAS models, which are open source, and they also have some proprietary models — you will learn a ton and you'll be able to use some pretty cool models. I also have other videos here on the channel where I'm basically just running inference with some pre-trained models on human poses, which could be used for fitness apps or whatever you can come up with. But this is actually a pretty cool pose model, and again, I'm going to show you how we can do the whole training pipeline, and at the end we will have a custom model that we can use in our own applications and products.

Here in the GitHub repository you can get some information about the YOLO-NAS Pose model and also AutoNAC from Deci AI. We get some metrics for the models: the average precision and also the latency. If you scroll a bit further down, we can see the quick start guide for how to run predictions with a pre-trained model. It's actually pretty simple to run inference: we just import SuperGradients, set up a model (there are four different variations of it), specify the pre-trained weights as COCO Pose, and then we can just call predict, throwing in whatever URL, PIL image, NumPy array, and so on, and then we'll get the predictions.
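As a reference, here's a minimal sketch of that quick start in Python; the model name and weights identifier follow the SuperGradients conventions, and the image URL is a placeholder.

```python
# Minimal quick-start sketch; the URL is a placeholder.
from super_gradients.training import models

yolo_nas_pose = models.get("yolo_nas_pose_l", pretrained_weights="coco_pose")

# predict() accepts a URL, a file path, a PIL image or a NumPy array.
prediction = yolo_nas_pose.predict("https://example.com/people.jpg", conf=0.5)
prediction.show()  # draws the detected keypoints and skeleton edges
```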
This is how we can extract the bounding boxes, the poses, and also the confidence scores for each detection. Here we can see the results: these are four guys walking, and we get all the poses, so we detect the ears, eyes, shoulders — all the different keypoints that we're interested in on the human body. The results are very accurate; we can see all the keypoints and also the edges between each keypoint.

We're going to take a custom dataset, and I'm going to show you how we can label it and train our own custom model, because the pre-trained models only work on humans. They also have some documentation for this pose estimation if you want to go into more detail, but we're basically going to cover most of it throughout this tutorial, so you can just follow along. They have some prerequisites for the training, like how you set up the dataset (it needs to be in the COCO dataset format) and how you can run training from a recipe — but we're going to cover all of it: how to set up these custom config files and what you need to change, and I'm going to have templates for you guys as well. There are the different dataset target generators (we don't really care too much about those), the different metrics for pose estimation — so we're looking at average precision and average recall for the keypoints we're detecting — and we also do some post-processing and visualizations at the end. You can also read about how to connect your own dataset, set up a new dataset class, and create PyTorch data loaders, for example, but I'm going to show you all of that as well.

So let's now jump into the fun part and get started creating the whole training pipeline, starting with the dataset, where we will go in and annotate it, first of all explore the dataset, and then train our models. Let's go inside Roboflow and see how we can set up our project and label our dataset. I have a bunch of examples in here that you can use directly as well. The good thing about Roboflow is that we can export our dataset directly into our Google Colab notebook, but I'm also going to have a dataset in a Google Drive with some tigers — all of those are already labeled, and you will have access to that as well if you want to use it. We also have these glue sticks here that we're also going to do pose estimation on; you can pretty much do pose estimation on whatever object you want, you just have to draw the keypoints and then train the model on that. Maybe you want to detect multiple different points instead of just doing traditional object detection with bounding boxes or instance segmentation, or you actually want to know the orientation with respect to some axis, and all those kinds of things.

Right now, let's just jump straight into the tiger dataset. We have a bunch of images here with tigers, and we can go in, label our images, draw each individual keypoint, and also mark the edges between the keypoints. Before we start the annotation, we first need to set up some classes, so we'll go back and do that. If we go inside classes, we can add a new class — let's call it "tiger" — and add the class. Now we can actually set up the keypoints.
Let's keep it simple and have two keypoints in the legs, so we're just going to choose keypoint zero and then set a new one with keypoint one. Then let's take a new keypoint here and make it the hip; the hip is actually connected to two, so we're going to have two keypoints for the hip, and then an additional leg and an additional foot down here at the bottom. Then we connect the hips to the shoulders — again, we're going to have two shoulders, and these shoulders can be connected to two more legs. This is basically just how you set up these classes for doing pose estimation in Roboflow. Now we can hit save and go in and start the annotation of our images.

We hit Save and Continue, go inside Annotate, and choose all the images. Now we can just draw a bounding box around our object — we need to do that first of all — and then we'll get all the keypoints. You can see we have all the keypoints, and we can go in and modify each individual keypoint and give it a specific name, for example "left foot". Here we're just going to have the foot and the upper leg, and then let's take the hip — I'm just going to set some rough points here. We're also going to have a simpler dataset; again, you can use these directly from Roboflow or annotate your own images and datasets. Now let's take a shoulder, then the other shoulder, and our right leg — there we go — and the last one here. Now we basically have our keypoints for the first image; we just need to draw our bounding box, probably also taking this one up here at the top. Then we can skip to the next one (this is pretty blurry), take this one, and basically just place all the individual keypoints at the desired locations. You can do this for whatever type of data, in whatever domain you want to do pose estimation on.

I also have this other dataset here with the glue stick; you can use that directly, it is already annotated in Roboflow, with 134 images. Let's just take one of the examples: this is a glue stick where we have a bottom and a top, so we just have two keypoints that we want to detect. Let's say we want to find the orientation — the rotation with respect to, for example, the x-axis — and how it is actually positioned in the frame. When we do pose estimation later on, we're both going to get the bounding box for localizing where our object actually is in the image, and then we can use our keypoints to get the orientation, so we get a bit more information compared to just traditional object detection. Now you can just label your dataset — let me save this one, delete this, and give you guys one more example — and you can use these datasets out of the box or go in and label your own. Right now I have the bottom, I drag it down to the bottom, and also the top.

Once we have our dataset inside Roboflow, we can go in and generate a new version. After you've labeled your images, you can go inside Generate and add preprocessing steps, augmentations, and so on. Here we just have it at 640 by 640, and we're going to add a bunch of different augmentation steps if we don't have enough data or we want our models to generalize better to different conditions. Then we hit Continue and hit Create, and it will create a version of our dataset that we can export and use directly inside a Google Colab notebook. When we're working with YOLO-NAS, we need to export the data in the COCO format, so here, where we can choose between COCO and YOLOv5 PyTorch, we go with COCO — that gives us a JSON file with all the annotations, file paths, the whole skeleton, the connections with the edges, and so on. Then you can either download a zip folder to your computer or hit "show downloadable code", which gives us a code snippet we can copy and paste directly into the Google Colab notebook, and we'll have the dataset in the correct format, ready to train with the Colab notebook that I'm going to share (it will also be down in the description).
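The snippet Roboflow generates looks roughly like the sketch below; the workspace, project name, and version are placeholders, so copy the real snippet (with your own API key) from your project.

```python
# Hedged sketch of the Roboflow export snippet (pip install roboflow first).
# Workspace, project and version below are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("tiger-pose")
dataset = project.version(1).download("coco")  # downloads in COCO format
```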
I also have this dataset in my Google Drive, and I'm going to give you guys access to that as well. There we have our train, validation, and test splits, a YAML file that I'm going to cover in just a second (which is the config file for custom pose estimation), and all our annotations: a keypoint_train.json, which contains all the different annotations in the COCO format.

Okay, so let's now jump straight into the Google Colab notebook and see how we can train our custom YOLO-NAS Pose model from Deci AI. We have the labeled dataset — I have it both on my Google Drive and on Roboflow. First, we're going to connect to a runtime; make sure that you're actually using a GPU, because it takes a lot of resources to train these custom pose estimation models. Right now we have a Tesla T4. Then we need to pip install super-gradients, and we also set up youtube-dl if you want to download YouTube videos directly and run inference on them — so we're just pip installing all the dependencies and frameworks that we need. After it's done downloading, we import all the different modules and frameworks: OpenCV, JSON, matplotlib, NumPy, torch, yaml, sklearn, google.colab, and so on. We're going to set the device to CUDA if it's available, or else we're going to use the CPU. So let's just run these two blocks of code after the installation is done.
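A sketch of those setup cells might look like this, assuming a GPU runtime is selected (the install line runs in its own Colab cell):

```python
# Run in a Colab cell first:
#   !pip install -q super-gradients youtube-dl

# Core imports used throughout the notebook.
import json

import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch
import yaml

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```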
In the next code snippet, from super_gradients.training we can import models. By doing this, we can set up the model — create an instance of a model — in just a single line of code: models.get, and then we specify the version that we want to use. This is just a pre-trained one, so yolo_nas_pose_l for large, and then we have the pre-trained weights, which is the COCO pose dataset, and we're going to throw it on the GPU. Let's create an instance of our model; it runs directly after it has imported the modules, so it'll just take a second. There we go — we can see that it's downloading the model, around 300 megabytes, and that it has successfully loaded the pre-trained weights for the architecture YOLO-NAS Pose large.

Before we go in and do custom training, let's run inference on an image just to make sure that we have everything installed and it actually works. It takes a PIL image, NumPy array, path to a video file or image file, URL, and so on. So let's grab this image and do inference on it: I just have a URL here for the image, and we're going to call the predict function with a confidence threshold of 0.7 — around 70% confidence — and then call .show() to visualize our results. Let's run this: it's going to download the image, throw it through the pose estimation model, extract the results, and visualize them directly. Now we can see the pose estimation — these are some very, very good results — and the confidence scores are very high.

So now we know how to set up a model and do inference; let's go down and fine-tune it on our custom dataset. First we need to set up a trainer, download our dataset and do all the dataset handling, set up a config file, and create our data loaders, so we can train our models directly in here. From super_gradients.training we import the Trainer, we set up our checkpoint directory, where we're going to store all the checkpoints and all the outputs from our training pipeline, and we set an experiment name and the checkpoint root directory. The only requirement is that our dataset is exported in the COCO dataset format.

Let's now go ahead and connect to Drive — it's going to ask for permission, I'm going to connect to Google Drive with my account, and then I can access it over here to the left in the files. First of all, we're going to copy my dataset from my Google Drive into /content — the local memory and hard disk of my Google Colab environment. Just remember that everything inside this virtual environment in Google Colab will be deleted if you disconnect from the runtime or close the notebook. The reason I'm copying it from Google Drive into /content is that we only have to do it once: it actually takes longer to load the dataset from Google Drive while training, compared to copying it in here first. It will probably take some time, so while it's copying the files, let's see how we can do it with Roboflow instead: I just copy the snippet from Roboflow and paste it in here, you only need to specify your API key, and it's going to set up your dataset in the exact same format as I have this tiger dataset.
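Here is a minimal sketch of that trainer setup and the Drive mount; the experiment name and checkpoint directory are placeholders.

```python
# Sketch of the trainer setup; names and paths are illustrative.
from google.colab import drive
from super_gradients.training import Trainer

drive.mount("/content/drive")

CHECKPOINT_DIR = "/content/checkpoints"
trainer = Trainer(
    experiment_name="yolo_nas_pose_tigers",  # placeholder experiment name
    ckpt_root_dir=CHECKPOINT_DIR,
)
```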
I've been talking about the configuration file a couple of times, so let's now take a look at it and see how to set it up. I'm going to have a custom one, which will be available on my GitHub, but if you want to build your own, you can download a template from the SuperGradients GitHub repository: go inside the repo, then inside the recipes directory, and then inside dataset_params, where you can see all the different configuration files — the YAML files for setting up different datasets. You can scroll down to the COCO pose datasets if you're going to use any of those; if you're using a fully custom dataset, you'll have to modify them, as I'm going to show in just a second. They contain the number of joints, the edge links, the edge colors, the keypoints, and so on, so we need to specify all of that.

If we go over and take a look at my specific configuration file: first, we specify the number of joints — in this example, for the tiger dataset, we have 15 joints. Then we also need sigma values for each individual joint; make sure you have a sigma value for every single joint in your dataset. I'll go over all the different parameters first and then explain the sigma values in a second. The edge links are the links connecting the individual keypoints in your dataset — you will get those from your annotations when you export your dataset — so you need to set up the edge links, and we also need to specify the edge colors. There have to be exactly as many edge colors as there are edge links; these are just colors, so you can choose whatever you want, but ideally you want a unique color for each individual edge. We also need specific keypoint colors, to be able to identify the different keypoints and distinguish between them, so we have unique keypoint colors as well. That's pretty much everything we have to set up in this configuration file; the data loaders and so on we'll set up in the code.

Back to the sigma values: right now I've just put 0.1 for all of my keypoints. The sigma basically determines how much error we want to allow for an individual keypoint. For the upper part of the body we might be able to allow more localization error — the higher the sigma value, the more localization error we allow, and the lower the sigma, the less localization error we allow. So now we have our configuration file; we can load it in, use it, and then go in and train on our data.
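A hedged sketch of what loading that config looks like in Python; the file name is a placeholder, and the key names follow the parameters just described (treat them as assumptions — check your own YAML).

```python
# Load the custom dataset-params YAML; the file name is a placeholder.
import yaml

with open("dataset_params_tiger.yaml") as f:
    config = yaml.safe_load(f)

# Keys described in the walkthrough (values are illustrative):
#   config["num_joints"]       -> 15
#   config["sigmas"]           -> [0.1] * 15, one per joint
#   config["edge_links"]       -> [[0, 1], [1, 2], ...] keypoint index pairs
#   config["edge_colors"]      -> one RGB triple per edge link
#   config["keypoint_colors"]  -> one RGB triple per keypoint
```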
Let's take a look at what the COCO dataset annotations look like. We have our annotations: each one has a bounding box for the specific object, a category ID (we just have one category in this example), and then all the individual keypoints — these are the labels we created inside Roboflow. So we have our bounding box, our class (category) ID, our keypoints, the number of keypoints, the image ID, the area, and so on; this is basically all the annotations from our labeling tool, and for each individual image we have a specific annotation. If we close the annotations, we also have some information about our images: the width and the height (the image dimensions) and the file name, so it can be loaded from the folder while training — this is what our data loaders are going to load in when we set them up. If we close the images, we also have some categories: these are the keypoints — left ear, right ear, nose, right shoulder, right front paw (this is the tiger dataset) — and for the keypoints we also have the skeleton, which holds all the values for our links. So we have our keypoints, we have our skeleton, and that's pretty much it: we have this JSON file for each of our train, test, and validation splits, and it's just the COCO dataset format.

Here I've created a utility function to open a file; we're going to use it for opening both our JSON files and our YAML configuration file with the pose estimation dataset parameters. We set up our annotations and our config file and specify the paths — keypoint_train.json and our dataset parameters tiger file. I'm just going to copy the path here to make sure I have the correct one; this is really important because we just made the modifications over here to the right. Here we can see that we're able to run the code and load in both our annotations and our config. Then there's the annotation file breakdown if you're just going through the dataset — this is everything I just went over inside the JSON file (the supercategory, ID, name, keypoints, skeleton, and so on), so let's skip that for now.

Then I have another utility function for plotting some random images — right now we're going to plot five. It's good practice to plot some images from your dataset, just to make sure that you're able to load the images and annotations correctly, and that you can index them and get the correct keys and values in your dictionaries. Let's run that and call the function plot_random_images; we set our data equal to the annotations and our image base directory, so we take five images from our training set and plot them with their annotations. Here we can see that we're able to load our data, we're not getting any errors, and we're visualizing five frames of our tigers.
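As a sketch, the file-opening utility and the loading step described above might look like this; the paths are placeholders and the helper name is illustrative.

```python
# Sketch of the utility and loading step; paths are placeholders.
import json

import yaml

def open_file(path):
    """Open a JSON annotation file or a YAML config file."""
    with open(path) as f:
        if path.endswith((".yaml", ".yml")):
            return yaml.safe_load(f)
        return json.load(f)

annotations = open_file("/content/tiger_dataset/train/keypoint_train.json")
config = open_file("/content/dataset_params_tiger.yaml")
print(len(annotations["annotations"]), config["num_joints"])
```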
Now let's go down and see how we can create our data loaders and load in our data, so we can start training our custom YOLO-NAS Pose model. To start with, we need to create a class called PoseEstimationDataset, and we need to set up a bunch of different things: our data directory, image directory, and JSON file, extract all the information, and create our data loaders around that. I won't go too much into detail, but we're extracting the joints and the number of joints, and we index all the images — we run through our images and store the paths in a list so we can use them later on, keeping both the image IDs and the image files. Then we also have a list for annotations: we run through all the annotations, extract the keypoints and the bounding box from each one, put them into arrays, and append them to self.annotations.

So then we have all the images and all the annotations extracted. We also have a class method for loading a sample, because we're going to use that during training: every time we train the model, it calls this load_sample function, loads a sample from the dataset, throws it through the model, does some optimization of the model and its weights, loads a new sample from the dataset, and it does this in batches. Right now let's just run this block of code and create our PoseEstimationDataset class, which is really important for creating our data loaders. Then we open up all our annotation files for the train, validation, and test splits, and run it directly here.

Now we can apply some transformations to our keypoints — basically some random augmentation and so on. We won't dive too much into the details; it's just good practice to do data augmentation on top of your data, whether we're talking about pose estimation, object detection, segmentation, or whatever. Here we just run this block of code — you can dive into the details if you want, but let's save some time and skip past it. We set up our transforms for the training set and the validation set separately, because we don't want the same augmentation steps in both.

Now we can go down and create instances of our PoseEstimationDataset, which back our data loaders. We just need to specify the data directory, image directory, JSON files, the transforms that we want to use, and the edge links, edge colors, and keypoint colors, which we extract from our configuration file. We do the exact same thing for our training dataset, our validation dataset, and our test dataset. Now we should be able to run this — if it runs, we have loaded in our data and all our annotations — and we can see that it prints 15 15 15, which means we have 15 keypoints extracted for each of these splits. Then we can set up our data loaders: we have our train data loader parameters, which we can extract from our configuration file or set directly in here. For the data loaders we use PyTorch, so from torch.utils.data we import DataLoader, and we throw in our training set, validation set, and test set together with the parameters. Let's run this and see if we're able to do that — now we have the whole dataset ready to go, we have all the data, and we can train our models.
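A sketch of that wiring; PoseEstimationDataset is the class built above, and the constructor arguments, paths, and parameter values here are illustrative assumptions mirroring the walkthrough.

```python
# Sketch of dataset + dataloader wiring; paths and parameters are placeholders.
from torch.utils.data import DataLoader

train_dataset = PoseEstimationDataset(
    data_dir="/content/tiger_dataset/train",
    images_dir="images",
    json_file="keypoint_train.json",
    transforms=train_transforms,              # augmented transforms
    edge_links=config["edge_links"],
    edge_colors=config["edge_colors"],
    keypoint_colors=config["keypoint_colors"],
)
valid_dataset = PoseEstimationDataset(
    data_dir="/content/tiger_dataset/valid",
    images_dir="images",
    json_file="keypoint_valid.json",
    transforms=valid_transforms,              # no heavy augmentation here
    edge_links=config["edge_links"],
    edge_colors=config["edge_colors"],
    keypoint_colors=config["keypoint_colors"],
)

# PyTorch dataloaders; batch size 16 matches the training log shown later.
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, num_workers=2)
valid_loader = DataLoader(valid_dataset, batch_size=16, shuffle=False, num_workers=2)
```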
Now we create an instance of our model — you already know how to do that, it's the same as when we ran inference at the start of the video. So models.get, and let's use the large model. We need to specify the number of classes when we're creating our own custom models, which is the number of joints, so 15, and we're also going to use the pre-trained COCO pose weights — we're just going to fine-tune this model on our own custom dataset. If you have a large enough dataset you can also train from scratch — we could probably do that with this tiger dataset — but let's save some time and fine-tune from pre-trained weights instead. Run this, and there we go: we've created an instance of our model.

Now we need to set up the different training parameters: our maximum number of epochs, the loss, the optimizer we want to use, and the metrics we want to look at while training. First, there's a non-maximum suppression IoU threshold and a pose confidence threshold we need to set, the configuration for our number of joints, and our sigmas — these need to be exactly the same length as in our configuration file, as I went over earlier — plus the keypoint colors, edge colors, and edge links for the visualization callback. We can also enable early stopping in case our model stops improving from epoch to epoch. Then we have our train parameters — there are quite a few, but you're probably familiar with most of them. We have an initial learning rate, learning rate warm-up epochs, and a warm-up initial learning rate — the learning rate is probably the one you want to go in and fine-tune a bit. We specify the number of epochs (how many times we run the full dataset through the model during fine-tuning), and we can also specify when to save checkpoints. The loss is the YOLO-NAS Pose loss. If you don't know how to tune each of these individual parameters, just use the defaults — they're pretty good and you will get good results anyway. If you do want to tune something, try the learning rate and how long you're training (how does your model actually converge?), and you can probably also play around with the optimizer.

Now we have everything and can start training: we pass in the model (yolo_nas_pose), the training parameters (train_params), the train loader, and the validation loader, and run it to start the whole training. First it does some setup, and we can see where it's storing our checkpoints — our first YOLO-NAS Pose run — and inside it stores all the weight files during training: for every single epoch it stores the best model, the average model, and the last model. We also see some information: the mode (single GPU, one GPU here), the full dataset size (we have 3,600 images), the batch size per GPU (16), and a bunch of other stuff. Now the training has started — train epoch zero, batch 9 out of 225 in this first iteration; 225 is just the size of our dataset divided by the batch size, so each step is a single batch. It will run through the whole dataset, optimize our weights, go to the next epoch, and keep doing that until our metrics and our model converge and we get good results that we can use and export in our own applications and products.
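A hedged sketch of the fine-tuning setup and call; the exact training-params schema follows SuperGradients' pose recipes, so treat the keys below (and the loss, metrics, and callback entries, which are only noted in comments) as illustrative rather than a complete configuration.

```python
# Sketch of fine-tuning; hyperparameter values are illustrative defaults.
from super_gradients.training import models

yolo_nas_pose = models.get(
    "yolo_nas_pose_l",
    num_classes=config["num_joints"],   # 15 joints for the tiger dataset
    pretrained_weights="coco_pose",
)

train_params = {
    "max_epochs": 5,
    "initial_lr": 1e-4,                 # worth tuning
    "lr_warmup_epochs": 1,
    "warmup_initial_lr": 1e-6,
    # ... plus the YOLO-NAS Pose loss, pose-estimation metrics (with the
    # sigmas from the config), the visualization callback, and early
    # stopping described above; see the notebook/recipes for the schema.
}

trainer.train(
    model=yolo_nas_pose,
    training_params=train_params,
    train_loader=train_loader,
    valid_loader=valid_loader,
)
```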
So let's just let it run here for some time, and we'll get back once it's done training; then we'll see how we can either resume the training if the results aren't good enough yet, or use the model to run inference and make predictions on our own images with our custom trained YOLO-NAS Pose model.

Our model is now done training, so let's take a look at the results: we end up with an average precision of around 0.60 and an average recall of around 0.74. We can also go up and look at the losses — the pose classification loss is relatively low, along with the IoU loss and the classification loss for our validation set — and at the metrics for our training. Right now we only trained for 5 epochs to start with (we can always resume the training), but if we go up to the start, we can see that the average precision and recall begin very low, pretty much close to zero, and then the model just learns over time. After the first epoch it has improved by about 0.36 — in green you can see the improvements, and in red where a metric decreases from epoch to epoch — and the average precision and average recall keep increasing until we end up at the values I mentioned, roughly 0.60 and 0.74.

We can also resume the training: we set resume equal to true in our train params and the maximum number of epochs to 8, then train the model again — it will use the earlier checkpoint and keep training from that epoch. Before we had 5 epochs and now we have 8, so we'll train for three more epochs to see if we can squeeze a tiny bit more accuracy out of the model. Of course, we can also tune hyperparameters, train for longer, and so on, to make fully sure that it has converged, but this is a rather large dataset and it would take very long to train; definitely try some small datasets — including the ones I showed you on Roboflow — and test it out with only 100 or 200 images. Here we can see that our model is still improving after training a bit longer: if we go all the way down to the bottom, the average precision increases from about 0.60 to 0.65, and the recall to about 0.76.

Now let's see how we can use this best trained model. After you've finished training and you're satisfied with the results, you can create a new instance of your model — we need to choose the same architecture that we trained, and we also need to specify the number of joints (the number of classes) that we trained the model on. Then we specify the path to our best checkpoint: if you go up inside our checkpoints, we have our first YOLO-NAS Pose run, and inside it our average model checkpoint and our best checkpoint. If you right-click and hit download, you can download the models to your local computer and use them in your own applications and projects — you'll just get a blue circle, and it will take some time in Google Colab before it actually downloads, but just let it run and the files will download. Now we can go down and evaluate our best trained model.
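Resuming and then loading the best checkpoint, as just described, might look like this sketch; the checkpoint path is a placeholder following the layout above.

```python
# Resume training for three more epochs from the saved checkpoint.
train_params["resume"] = True
train_params["max_epochs"] = 8
trainer.train(
    model=yolo_nas_pose,
    training_params=train_params,
    train_loader=train_loader,
    valid_loader=valid_loader,
)

# Re-create the same architecture and load the best checkpoint; the path
# below is a placeholder matching the checkpoint layout described above.
best_model = models.get(
    "yolo_nas_pose_l",                  # must match the trained architecture
    num_classes=15,                     # same number of joints as in training
    checkpoint_path="/content/checkpoints/yolo_nas_pose_tigers/ckpt_best.pth",
)
```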
This evaluation is actually on the validation set again, so we should get the exact same values as before, but let's run it for good practice. After that, we can go down and make predictions with our best trained model: we create an instance of it, we evaluate it, and then we predict with it — this is exactly what you would do if you were using it in your own custom applications and projects. We get the same values, and now we can do predictions with the best model.

Let's specify an image path: in the tiger dataset, go inside our validation set and grab one of the images. Let's open it up first to see the actual image before we feed it into the model, then copy the path, paste it in, and run this block of code. We need to set the confidence threshold — you can play around with that as well if you're not getting any detections. Here we can see the keypoint detections for this tiger: in the face it's a bit off, and also for the front legs — it could be because of the low image quality — but this is still fairly good: we have the hip, we have the shoulders, and we have all the connections between the different keypoints, and these are correct predictions. The predictions on the rear legs are actually significantly better compared to the front legs, which seem to be shifted a bit — it might be because of this shoulder position, which should probably have been a bit more forward.

This is just one of the examples; let's run through a couple more and see how they work. Scrolling further down, let's try this one — it's still a bit blurry, but again, you can test it out on your own dataset; right now I'm just testing on the validation set to make sure we haven't trained on these images before. Okay, it didn't change, so let's change it manually to image 436 — there we go — keep the same confidence threshold, and run it. This also looks pretty good: there's something with the rear legs again — the right rear foot is a bit off — and the front leg is off as well; the face here is also off (the eyes are probably pretty close, but the nose is definitely a bit off), and again, it could be because of the shoulders, which should probably have been a bit further up. The back of the tiger looks pretty good. So you can download the model, test it out on your own, go through some more examples, and try it on other datasets — you can also use the ones I have in Roboflow directly in this Google Colab notebook. Everything will be available down in the description, so you can use it directly for training your own custom YOLO-NAS Pose models.
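Running the fine-tuned model on a validation image is the same predict call as before; a minimal sketch, with a placeholder image path:

```python
# Predict with the fine-tuned model; the image path is a placeholder.
result = best_model.predict(
    "/content/tiger_dataset/valid/images/tiger_436.jpg",
    conf=0.5,
)
result.show()  # visualize keypoints and skeleton edges on the image
```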
So that's it for this video, guys — I hope you've learned a ton and that you're now familiar with how to do custom training of a YOLO-NAS Pose estimation model on your own custom dataset. I showed you the whole pipeline, so try it out in your own applications and projects and train your own models — it is really cool to play around with. If you want to dive more into AI and computer vision — how we can use object tracking, object detection, and whole pipelines — or if you want to learn how to read research papers and implement architectures from them (basically take an architecture from a research paper and implement it in code), I have a bunch of different courses; they'll also be down in the description, so you can go in and check them out. We have a whole community, all the code available, and structured courses with videos and quizzes, so if you're interested in learning more, definitely check those out as well. Otherwise, I'll see you in the next video, guys — until then, happy learning!
Info
Channel: Nicolai Nielsen
Views: 6,653
Keywords: deep learning, yolo, yolov8 vs yolonas, yolonas, yolo nas, yolo-nas, yolo-nas google colab, yolo-nas vs yolov8, ai, computer vision, deci ai, pose estimation, pose estimation python, state of the art pose estimation, how to train pose estimation model, yolo-nas-pose, Yolo-nas-pose, fine-tune pose estimation model, Yolo-nas-pose fine tune, animal pose estimation, Yolo-nas-pose vs yolov8-pose, custom pose estimation, yolo-nas pose custom
Id: J83ZvWfxjoA
Length: 31min 58sec (1918 seconds)
Published: Tue Feb 06 2024