Train YOLO-NAS - SOTA Object Detection Model - on Custom Dataset

Video Statistics and Information

Captions
I know I'm late to the party. YOLO-NAS was released last week, and judging by the pace of improvements in the computer vision scene, that's pretty much old news, but I decided to take a bit more time to give you quite possibly the best training tutorial you can find on YouTube. Today we'll cover everything from using models pre-trained on the COCO dataset and understanding the YOLO-NAS output data structure, through training and keeping track of all important metrics, up to evaluation and using the fine-tuned model for inference. So, as usual, sit back, relax, and let me show you how to use YOLO-NAS and how to fine-tune it on a custom dataset.

But before we dive into the code, let's take a step back and let me tell you a bit more about the model itself. According to Deci, the company behind YOLO-NAS, this model is the new state of the art when it comes to real-time object detection, most notably beating YOLOv6 and YOLOv8 in accuracy and speed at the same time. It also looks like a new state of the art on Roboflow 100, a benchmark of 100 datasets from Roboflow Universe that you can use to measure a model's ability to be fine-tuned. According to those materials, YOLO-NAS is also very good at detecting small objects, even at low inference resolutions, which from my experience is usually a very hard problem to solve, and that's one of the things we plan to test in our second YOLO-NAS video. But for now, let's jump into Colab and learn how to train YOLO-NAS on a custom dataset. By the way, if you plan to use the model for enterprise purposes, as usual, make sure to read the license.

Enough of the talking, let's dive into the code. Okay, as usual, we start in the Roboflow Notebooks repository. By the way, we crossed 2,000 stars quite recently, I guess one week ago, so if you haven't left a star until now, make sure to do it right now. We select the first notebook from the top, "Train YOLO-NAS on Custom Dataset", and open it in Google Colab. At the
top of the notebook, you can read a bit more about the model itself, but we already covered that part, so let's scroll a bit lower into the "Before you start" section and trigger the first cell, which runs the nvidia-smi command. We obviously need to confirm that we want to run this notebook, and after just a few seconds we should be good to go.

The next thing we need to do is set up our Python environment, and for that we need to install three Python packages. YOLO-NAS is distributed via the super-gradients package, and because the model is so fresh (although I'm like a week late with this video), the team is still actively developing it. For that reason, we decided to pin the version of this package, just to keep the Python environment as stable as possible. The two remaining packages are roboflow and supervision, because we will use a dataset coming from Roboflow Universe to train our model, and we'll use supervision to display the results. Spoiler alert: the installation process is actually quite time-consuming, and that's because the super-gradients package serves multiple purposes other than being home to the YOLO-NAS model, so as usual I will speed up that process and see you on the other side.

Okay, the installation is complete, but there is one more thing we need to do, and that is restarting the runtime. It is important to restart the runtime, not delete the runtime. The difference is that when you delete the runtime, you lose all the packages you installed and all the files you downloaded or created; when you restart the runtime, all you do is restart the Python interpreter. We need to do that because one of YOLO-NAS's dependencies comes pre-imported into the Python environment, but in an incorrect version, and during the installation process we changed the version of that library. But who cares — that problem only exists in Google Colab; you will not face it when you run locally or in Docker. So let's just restart. To do that, we go into Runtime and select Restart runtime. Now we just need to
confirm, and after a few seconds we'll see that a new runtime is initialized. We can now open up the Resources tab and keep it open, because it will be important during the training process; it will influence our batch size selection.

I really believe that one of the first things you should do when you start to play with a new model, even if you want to train it on a custom dataset, is simply to load pre-trained weights, probably from the COCO dataset, and run inference on a few example images. But before we load the model, we need to select the size of the model we want to use. YOLO-NAS comes in three different sizes, S, M, and L, and as usual, when selecting the size of the model, we need to deal with the accuracy-speed trade-off. Essentially, the larger the model we use, the higher the accuracy of predictions we can expect, but at the same time the inference time also goes up. In this tutorial we'll use the L version, but in real life you need to take all the different factors into account: maybe your model needs to run in real time, maybe you have limited resources and need to make sure the model is small, or maybe high accuracy is the main priority. It all depends on your use case. By the way, let me know if you would like to watch a video about model selection; I was thinking about it but I'm still not sure, so let me know in the comments if that sounds interesting to you.

Let's hit the ground running and load the model into memory. We select the device, in our case the GPU accessible in Google Colab, and we select the large version of the model. Loading takes a little bit of time, but after a few seconds we should be fine. To put the model to the test, we need data. This particular model is pre-trained on the COCO dataset, so anything with people, dogs, chairs, and tables should be fine. As usual, I'm using images from my own gallery; you can use those too, or upload your own into Google Colab, or you can even download an example dataset from Roboflow Universe. Now we pick one of those image
addresses, load it using OpenCV, and push it through the neural network. You can see that the first inference takes a little bit of time, but when I change the path and hit Shift+Enter once again, the second inference goes much, much faster. This is common with neural networks: they very often need a little bit of warm-up to reach their full performance capacity. That's why you should always preheat your neural network when you do any benchmarking.

Anyway, now is a good opportunity to take a look at the output format produced by YOLO-NAS; you very often ask about this kind of insight in videos where we show new models, so let's take a look. For every image you use during inference, the network returns an ImageDetectionPrediction object. That object contains three public properties. The first one is image; this one is pretty self-explanatory, it's pretty much a NumPy array containing the image we used for inference. The second one is class_names, which is the list of categories that were used during training. The third one is prediction (not predictions; as you can see, I made that mistake), and this one stores boxes in xyxy format, confidences, and labels; all three are float32 NumPy arrays.

Now you can see I'm creating a new supervision Detections object, passing bounding boxes, confidences, and labels as arguments, and when I hit Shift+Enter we can see the visualization. Funnily enough, YOLO-NAS detected Leo as both a dog and a bird. Now let's change the path leading to the image to something different, rerun the inference, and go back to our visualization. Instead of creating the Detections object manually, we can use one of the latest supervision features, the from-YOLO-NAS connector: we just pass in the result of the inference, comment out the old implementation, hit Shift+Enter once again, and we see the visualization. Awesome, looks like we are finally ready to start the training.

I will use the Bundesliga dataset, the same dataset I actually used previously
during my YOLOv8 custom training tutorial. If you have your own dataset, feel free to use that; if you don't, feel free to use mine or just pick another dataset from Roboflow Universe. I guess we have like 100,000 different datasets there, so I'm pretty sure you will find something suitable.

Before I can download my dataset from Roboflow Universe, I need to authenticate myself, so I just click the URL, get redirected to Roboflow, click Generate Token, copy the token, go back to the notebook, paste it into the input field, and press Shift+Enter, and my dataset should be downloaded in just a second. You can see that the dataset is already in my Colab environment. Now we can examine the location; in my case it's under content/football-players-detection-1, and you can see in the file explorer that the train, test, and validation subsets are all in that directory. So far so good.

We are very close to starting the training, but before we do, we need to select values for a few key parameters. The first one is model size. I'm going for the large one, but like I said, we have two other options, small and medium. Keep in mind that this decision may influence the training process: a larger neural network may take longer to train and require more memory, so if you don't have a lot of resources, maybe going for a smaller architecture is the right choice for you.

Next up is batch size. This parameter basically dictates how many images go through the neural network with every iteration. With a large batch size, the neural network trains faster, but it requires more memory to do so. Before I started recording, I did some experiments, and for my neural network, when I train in Colab, 8 is pretty much as high as I can go; anything over that may result in an out-of-memory error, which is something I really wouldn't like to happen in the middle of a multi-hour training, so let's keep it safe and stay at eight. Ah, and we keep the batch size as a power of two; it helps with memory allocation, although I saw
some papers that argue with that strategy and say that with newer GPUs it doesn't really matter anymore. I'm not sure; I'm used to using powers of two, so that's what I do, but if you have experience with batch sizes that are not powers of two, let me know in the comments. I'm actually super curious and would like to learn more. And the last parameter we need to set is the epoch count; I'm going for 25, and we can start training.

We start by importing the Trainer, and then we need to set the dataset parameters. Those are pretty much information like the dataset location, the dataset splits, and the names of the classes we are going to use during training. Now we can take those dataset parameters and pass them into our data loaders to cache our annotations before the training. We hit Shift+Enter, and we should see those small loading bars below the cell fill up to 100 percent. Our data is ready; everything worked as expected.

Now we need to pick the checkpoint we are going to use as the starting point for our training. I will use the model pre-trained on the COCO dataset, but remember, you can use any checkpoint you want; you can, for example, use that mechanism to restart your training if you feel that your model is not strong enough. To keep track of our key metrics during the training, we'll use TensorBoard. It will refresh automatically after every epoch. It is, however, important to trigger it before you run the actual trainer; unfortunately, Google Colab won't allow you to run a cell during the training, so it's very important to start it beforehand.

Finally, the time has come: we can hit Shift+Enter for the last time and actually start the training. One thing you have probably noticed is that training YOLO-NAS is much more verbose than, for example, YOLOv8. It looks like it's more flexible at the same time, but people who are used to YOLOv5 or YOLOv8 will need to get used to the fact that you have to do some things manually; stuff that
with YOLOv8 requires just passing an additional parameter to the CLI here forces us to import things from the pip package and write a bit of Python code. The first thing that came to my mind when I was creating this tutorial was our recent tutorial for DETR with Transformers; over there we also needed to write a bit of a Python script to train the model. At the end of the day it's just a preference, but I believe there are some people who would take that into consideration, so keep that in mind.

In the meantime, our training has started, and I want to show you the spike in memory consumption that is visible right now on the right-hand side in the Resources tab. If we used a larger batch size, we might go over the allowed 15-ish gigabytes of memory, which would obviously result in training failure. Okay, I think it's time to speed up the training process; obviously I won't keep you here for two hours to see the results.

Okay, this is epoch number 13, I believe, so we are somewhere in the middle of training, and it is always a good idea to take a look along the way, just to confirm that everything is going according to plan. Like I said, TensorBoard is an absolutely great tool for that, because those charts are refreshed in real time, and although YOLO-NAS is absolutely great at providing a lot of information during the training, especially if you use verbose mode, it is always better to look at a chart than to compare values in the logs. And if you prefer logging mechanisms like Weights & Biases over TensorBoard, YOLO-NAS supports that too, out of the box.

Okay, now let me just show you a few charts that are accessible in TensorBoard. The one I'm obviously most interested in is mAP at 0.5. We see that we are still going up, although the curve is flattening, but the F1 score is still going up by a lot. Everything looks fine, so let's speed up the rest of the training process and take a look at the final results.

And the training is done. I know it was fast; in real life it
was like another one and a half hours of training, something like that. The video is getting a bit long, so let me just quickly show you how you can use that model for inference and for evaluation.

The trained model is stored in the average_model.pth file inside our experiment directory. To load it into memory, we can use the models.get method; once again we need to pass the architecture, the names of the classes that will be used during inference, and the path leading to our .pth file. YOLO-NAS provides us with a convenient way of evaluating our model: we just call the test method of our trainer and provide it with the test data loader we created a few steps above. We hit Shift+Enter, and after a few seconds we are provided with a list of metrics, among them mAP.

Metrics are great; they give us a deeper understanding of model performance, but I decided it would be cool to use our trained model for inference on test images and visualize those results. This way we can pick a few images from our test set and compare annotations and predictions side by side, to straight away see some patterns emerging. For example, I see the model performs quite well when it comes to detecting players, goalkeepers, and referees, but ball detection is quite off. We can confirm that intuition by calculating the confusion matrix, which shows us model performance class by class, and after just a few seconds we get a nice chart we can analyze. Sure enough, the two classes that go undetected most often are ball and referee, and at the same time goalkeepers are most frequently misclassified as players; in principle they are players, they just have a completely different role on the pitch and wear completely different uniforms.

Now, if during evaluation you decide that your model underperforms, you can go back and train it a bit more; like I said before, you can use your final weights as the starting checkpoint for a second phase of training. If you like your model, you can download it
from Google Colab or, I hope quite soon, use it with Roboflow deployment. And that's all for today. I hope you found this video interesting; I tried to show you everything I learned about YOLO-NAS in this short period of time. I'm actually in the process of creating the second part of the video, where we will use the weights we trained and push YOLO-NAS to the limit when it comes to speed and the ability to detect small objects; if we have enough time, maybe we will also compare it against YOLOv8. Let me know in the comments if that's something you would like to see. But that's all for today; if you liked the video, make sure to like and subscribe to stay up to date with more computer vision content coming to this channel soon. My name is Peter, and I'll see you next time. Bye!
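For readers following along outside the notebook, the model-loading step described in the captions boils down to a few lines against the super-gradients API. A hedged sketch (the `models.get` call and the `yolo_nas_l` name reflect super-gradients 3.x as I understand it and may shift as the package evolves; the function name here is my own):

```python
def load_yolo_nas(size: str = "l", device: str = "cpu"):
    """Load a COCO-pretrained YOLO-NAS model of the given size ('s', 'm', or 'l').

    The import is deferred so this sketch can be read without super-gradients
    installed; the call itself assumes the 3.x API described in the video.
    """
    from super_gradients.training import models

    return models.get(f"yolo_nas_{size}", pretrained_weights="coco").to(device)
```

In the video, the device is the Colab GPU (`"cuda"`) and the size is `"l"`, but as discussed, smaller variants trade accuracy for speed and memory.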
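The output format discussed in the captions (an ImageDetectionPrediction with `image`, `class_names`, and a `prediction` holding xyxy boxes, confidences, and labels as float32 NumPy arrays) can be illustrated with plain NumPy stand-ins. The arrays and class names below are made up for illustration, not real model output:

```python
import numpy as np

# Hypothetical stand-ins for the three float32 arrays the video describes:
# boxes in xyxy format, one confidence score and one label index per box.
bboxes_xyxy = np.array([[10.0, 20.0, 110.0, 220.0],
                        [50.0, 60.0, 90.0, 120.0]], dtype=np.float32)
confidence = np.array([0.91, 0.34], dtype=np.float32)
labels = np.array([0.0, 1.0], dtype=np.float32)  # indices into class_names
class_names = ["person", "dog"]  # illustrative, not the full COCO list

# Filtering by confidence, the way you might before visualization.
keep = confidence > 0.5
boxes_kept = bboxes_xyxy[keep]
labels_kept = labels[keep].astype(int)

print(boxes_kept.shape)                  # (1, 4)
print(class_names[labels_kept[0]])       # person
```

In the notebook, these arrays are what you would pass into a supervision Detections object (or hand off via the from-YOLO-NAS connector) for annotation.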
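The dataset parameters and training knobs discussed above amount to a small configuration block. A sketch under stated assumptions: the directory layout matches the Roboflow export described in the video, the class list matches the football example, and the key names approximate what a YOLO-format detection config typically holds (they are not guaranteed to match super-gradients' exact schema):

```python
# Illustrative configuration only; paths and key names are assumptions.
dataset_params = {
    "data_dir": "/content/football-players-detection-1",
    "train_images_dir": "train/images",
    "train_labels_dir": "train/labels",
    "val_images_dir": "valid/images",
    "val_labels_dir": "valid/labels",
    "test_images_dir": "test/images",
    "test_labels_dir": "test/labels",
    "classes": ["ball", "goalkeeper", "player", "referee"],
}

BATCH_SIZE = 8   # the largest value that fit Colab's ~15 GB GPU in the video
EPOCHS = 25

# The power-of-two convention discussed in the video.
assert BATCH_SIZE & (BATCH_SIZE - 1) == 0
```

These values then feed the train/validation/test data loaders and the trainer call.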
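The per-class analysis near the end (which classes go undetected most often) can be approximated in a few lines of Python: match each ground-truth box to the best-overlapping same-class prediction and count misses per class. This is a simplified sketch with made-up boxes; a real confusion matrix also accounts for class swaps and duplicate matches:

```python
def iou_xyxy(a, b):
    """IoU between two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def count_misses(gt_boxes, gt_labels, pred_boxes, pred_labels, thr=0.5):
    """Per-class count of ground-truth boxes with no matching prediction."""
    misses = {}
    for box, label in zip(gt_boxes, gt_labels):
        matched = any(iou_xyxy(box, p) >= thr and label == pl
                      for p, pl in zip(pred_boxes, pred_labels))
        if not matched:
            misses[label] = misses.get(label, 0) + 1
    return misses

# Toy example: the ball (class 0) is missed, the player (class 2) is found.
gt = [[0, 0, 10, 10], [50, 50, 100, 150]]
preds = [[52, 48, 98, 152]]
print(count_misses(gt, [0, 2], preds, [2]))  # {0: 1}
```

Counting misses this way over the whole test set surfaces the same pattern the video observes: small objects like the ball are the hardest to detect.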
Info
Channel: Roboflow
Views: 17,950
Keywords: object detection, deep learning, yolo, object detection deep learning, object detection python, object detection tutorial, yolov8 vs yolonas, yolonas, yolo nas, state of the art object detection, yolov5 vs yolov8 vs yolo-nas, yolo-nas, how to train custom object detection model, real time custom object detection, yolo-nas custom object detection, custom object detection yolo-nas, yolo-nas google colab, yolo-nas vs yolov8, best object detection model, ai, computer vision, deci ai
Id: V-H3eoPUnA8
Length: 19min 48sec (1188 seconds)
Published: Thu May 11 2023