Object Detection on Custom Dataset with YOLO (v5) | Fine-tuning with PyTorch and Python Tutorial

Captions
Hey guys, in this video we're going to have a look at how we can fine-tune YOLO version 5 using our custom dataset and see how well it performs on some images from our test set. Let's get started.

YOLO stands for You Only Look Once, and the most current or latest version is known as YOLO version 5. Note that there is a huge controversy going on right now about the naming of this project; there is a post on Hacker News whose very first comment calls it out, and there are a lot of accusations aimed at the authors of YOLO, or rather the author of YOLO version 5. I'm not going to spend any more time on this issue, but just be warned that it is ongoing. This video is going to focus on the yolov5 repo by Ultralytics, so we're going to have a look at that one.

YOLO version 5 is completely written in Python, and in my opinion this project is very well done. The implementation is heavily based on YOLO version 3 and on the experience the folks at Ultralytics have with YOLO version 3, at least that is roughly what they say, and this model appears to be very efficient and very performant, at least among the YOLO implementations. Of course, you can see here that EfficientDet, a model introduced by Google, is better at this task, and the task we are talking about here is real-time object detection; I'm going to have a look at the leaderboards in a minute. You can also see that the authors acknowledge they might include updates in the future using some of the features presented in YOLO version 4.

YOLO version 4 was introduced in the paper "YOLOv4: Optimal Speed and Accuracy of Object Detection", which is of course available on arXiv; at least this version is dated April 23rd. Right here we have a similar chart comparing the performance of YOLO version 4 against EfficientDet on the MS COCO object detection validation set, and at least in this chart YOLO version 4 gets an AP of around 44, while YOLO version 5, at least the largest model, gets around 47 or 48, I would say. So from this very barebones comparison we see that YOLO version 5 might be performing better.

The YOLOv4 paper introduced a lot of cool new features and cool new ways to speed up YOLO version 3, and here we see a chart, or image, from this paper showing the main difference between a one-stage detector and a two-stage detector. All the YOLO implementations are known as one-stage detectors: they take some input, run it through a backbone network, and then produce dense predictions. The purpose of all this is to have something very fast that runs in real time, meaning something like 30 or even 60-plus frames per second, and YOLO models are very good at that. If you want more accuracy, and you don't care that much about the inference speed of your model, then you might want to use something like RetinaNet or Faster R-CNN. When you use those, you have an
additional second stage, which is essentially a classifier: it takes the dense predictions coming out of the first stage of the detector, something like YOLO, and scores each one by how believable or probable that prediction really is. So two-stage detectors are more accurate, but they're slower; there is a trade-off depending on the task you want to handle.

Finally, we are going to have a look at the leaderboards provided by Papers with Code, which is of course backed by Facebook research. On real-time object detection on COCO, at least as this task is defined right here, you can see that EfficientDet with the largest number of parameters performs very well compared to YOLOv4, but look at the frame rates; I believe these are measured on a Tesla V100, though I'm not really sure about that. We get around nine frames per second with EfficientDet, which is very poor, while with YOLO version 4 we get 62 frames. And on Hacker News we have a post right here by someone, with a breakdown which states that YOLO version 5 is much smaller, much faster, and right about there on the accuracy. So, and this again can be controversial, you might expect to get a similar accuracy but much faster inference speed using YOLO version 5. In the next video we are going to try out YOLO version 5, maybe on a mobile device, and see how well it performs in the real world.

Let's also compare YOLO version 4 with the state-of-the-art object detectors, which are not that interested in speed but in how far the accuracy goes. Here we have a leaderboard, again on Papers with Code, and you can see that DetectoRS, or something like that, which was introduced during this year, has the best result on object detection, again on COCO. If you look at this result we have almost 55 box AP, where AP stands for average precision, and if you look for YOLO version 4, the first result right here, we have 43.5. So this is a very low result, at least compared to the first model, and you can see that the leaderboard position is very low as well.

All right, it is time to start coding. I'll open up a Google Colab notebook, and right here I'm going to start by checking the current GPU that we have on this machine: a Tesla P100 with 16 gigabytes of VRAM. Right here I'm going to install the tree command-line tool, and then I'm going to copy and paste some of the requirements for YOLO version 5. You can see that I'm installing PyTorch 1.5.1, which is currently the latest release and contains some very important bug fixes, and I'm also installing torchvision 0.6.1. We are pinning the version of NumPy that is required by the YOLO version 5 project, we are also going to use PyYAML for some configuration files, and then I'm installing the COCO API (pycocotools), which is again required by the YOLO version 5 project. If I run this, it will go ahead and start downloading everything.

After the installation is complete we are required to restart the runtime, so I'm going to do that, and then I'm going to install the final dependency, a project known as Apex, which is provided by NVIDIA. I'm going to paste in the command right here and run it; what this does is roughly clone the repo and start the installation.
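For reference, here is a minimal sketch of those install cells as I would reconstruct them from the description above. The exact NumPy pin isn't stated in the video, so the line below is a placeholder, and the Apex command is the standard one from NVIDIA's README rather than a verbatim copy from the notebook:

```
!nvidia-smi

!apt install tree  # handy for inspecting directory structures (assumed)
!pip install torch==1.5.1 torchvision==0.6.1
!pip install PyYAML
!pip install numpy  # placeholder: yolov5 pins a specific version in its requirements

# COCO API (pycocotools), required by yolov5
!pip install "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI"

# After restarting the runtime: NVIDIA Apex for mixed-precision training
!git clone https://github.com/NVIDIA/apex
!pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex
```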
So what is this thing called Apex? It is the installation recommended by the author of YOLO version 5: a PyTorch extension used for mixed-precision computations. YOLO version 5 can be trained on more than one GPU, and I guess they're using mixed precision, so they might be using FP16 or FP32, different floating-point precisions, and this project speeds up the computations when you're doing that kind of arithmetic. So it is a nice additional benefit when you're training YOLO version 5.

Now that the installation of Apex is complete, we are going to rerun basically the whole code that we used to generate the dataset, and I'm going to run all of the cells below this one. This will set up some imports and download the JSON file, then we'll see the example image right here. Next we execute our create-dataset function for both the training and the validation set, and then we see the examples right here for at least one of the annotations. The one thing that I did here, compared to the last video, is that I've taken the categories, converted them to a list, and then sorted them, so we have reproducible class names, which is going to be important when we are training our YOLO model.

The dataset creation is now complete, and I'm going to start the fine-tuning of YOLO version 5. I'm going to go to the GitHub repo, get the clone URL, copy that one, then git clone the repo right here, and then check that the repo is indeed here; so here it is. Next, just for the purposes of reproducibility, I'm going to enter the cloned directory, yolov5, and then check out a specific commit that I have tested the project with. This might change in the blog post that I'm going to write, and it might move a bit further if there are some bugs that need to be fixed in the current implementation.

Next, I'm going to go to the repo once more and have a look at the types of models that we have: there are pretrained checkpoints for YOLOv5-S, M, L, and X. For our purposes I'm going to take the best-performing model, which is of course the slowest, but it has the largest number of parameters, and I expect that it will indeed give us somewhat good performance. In our case we are not interested in real-time object detection, but you might be, and you will have to take that into consideration depending on your hardware and the accuracy you want to achieve.

Next I'm going to show you how you can create a specification for your dataset, and then we are going to download, or create, a configuration file for the model we are going to use. Both of those files are stored in Google Drive, and I'm going to show you their contents right now. I'm going to download the two files; I'll copy and paste the commands. We are in the yolov5 directory, and you can see that we are downloading data/clothing.yaml and yolov5x.yaml, which is again the config for the largest model. So let me show you the contents of this clothing.yaml file: here we have the path to the training set, which, if you recall from the last video, is in this directory, clothing. The structure right here says that the dataset should be outside of the yolov5 directory, and we have the training and the validation set; for each of those, if you recall, we have the images and the labels. Then we have nc, which stands for the number of classes, and we have nine classes. A sketch of this file follows below.
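Reconstructed from the description above, clothing.yaml looks roughly like this. The directory layout follows what the video describes (dataset next to, not inside, the yolov5 directory); the names list here is a loud placeholder, since only "jacket" and "jeans" are visible later in the video, and the real file lists all nine sorted class names:

```yaml
# data/clothing.yaml - hypothetical reconstruction
train: ../clothing/images/train
val: ../clothing/images/val

# number of classes
nc: 9

# class names - must exactly match the sorted category list printed earlier;
# placeholder, incomplete: fill in all nine names
names: ['jacket', 'jeans']
```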
One thing that is really important: the class names should be the same as those we printed out earlier for all of the categories we have, because you want them to match; this is why the sorting earlier was required.

All right, so what about the next file, yolov5x.yaml? Let's go and check it; I'm going to download this as well, open it, and show you the config. Essentially the only thing I've changed here compared to the default file is, again, the number of classes: we have nine. The rest is a specification of the backbone and the anchors, which are the initial positions at which the bounding boxes are looked for; this is pretty much unchanged and the same as what is going on in the current implementation.

Now that we have both of those files, we can continue. Next we are going to basically follow the guide, which says that if you want to train the YOLO model, you have to run the train.py file and pass it some configs. I'm going to do just that: I'm going to call the file, and I'm going to specify the maximum image size, which is going to be 640 pixels. I'm going to specify that each batch should contain 4 images; this is roughly how many we can fit into the memory of one P100. You might try 6, it might work as well, but I'm not risking any out-of-memory exceptions here. Then we specify the number of epochs, and I'm going to fine-tune this for 30 epochs. I'm going to specify the config for the dataset, which says that we are going to use the clothing dataset config file I just showed you. Next I specify the config of the model, which is again models/yolov5x.yaml. Another thing that is very important to pass in right here is the weights: I'm going to start from yolov5x.pt, a pretrained checkpoint for the PyTorch implementation of YOLOv5, which will be downloaded automatically, since this checkpoint is not currently available in the directory of our project. I'm going to specify the name of the model, yolov5x_clothing, and finally I want to cache the images into the proper format for maybe later training, if you are inclined to do some fine-tuning or hyperparameter tuning afterwards. The full command is sketched below.

So the whole process has started, at least for now, and you can see that this is using CUDA and Apex, so the actual Apex implementation is being used, and we have some of the hyperparameters that we are going to use for fine-tuning the current model; you can also see basically the whole architecture of the model. Another interesting thing right here is that this actually starts TensorBoard for us, and all the results will be logged right there if you are interested in checking those out. You can see that we are downloading the weights of the model, yolov5x.pt; let's see if we have the size information for this one. I'm going to open the models page and have a look at yolov5x.pt: it's roughly under 200 megabytes, at least that checkpoint. Next you see that we are training for 30 epochs, and we are analyzing anchors, which is basically a smart way to take all those anchor values that we've seen in the model config file and fine-tune them a bit for our dataset. We have 453 images that we are going to use for training.
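Putting all of those flags together, the training cell looks roughly like this, and the only edit in the model config, as described above, is the class count:

```yaml
# models/yolov5x.yaml - the only change vs. the default file:
nc: 9  # number of classes
```

```
# img: max image size; batch: 4 images fit on one P100; epochs: 30
!python train.py --img 640 --batch 4 --epochs 30 \
  --data ./data/clothing.yaml --cfg ./models/yolov5x.yaml \
  --weights yolov5x.pt --name yolov5x_clothing --cache
```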
We have 51 examples, or images, for the validation, and you can see that we are using roughly 10 gigabytes of VRAM. Now that the training is finally complete, you can see that we are done in approximately 0.45 hours, which is a bit surprising. Anyway, we've done 30 epochs, and you can see that the best and the last weights were stored right here in the weights directory, so here you can get your fine-tuned model, or rather the checkpoint with the best mAP, which stands for mean average precision. You can see that the final results are actually very good, at least on the roughly 50 images we've tested right here.

Another thing you can see in the project structure is that there are some images; unfortunately, I don't think any predictions were drawn on those. For example, you can have a look at some training batches, where you can see the annotations for some of the images, including of course some image augmentations that were done by YOLO version 5. Next I want to show you this image, which we are going to see in a minute, and it takes its information from the runs directory; sorry, that one is actually for TensorBoard, and the results were taken from this text file, which is a summary of all of the metrics recorded during each epoch. I'm going to show you a quick plot that is produced from these results: from the utils I'm going to import plot_results, and I'm going to call the plot_results function right here (both cells are sketched below). This takes into account the charting preferences that we had, and you can see that roughly we have a recall that was very close to one at some point, while the precision was increasing up until around epoch 20; the classification loss was steadily decreasing, and overall the model trained very well. You might argue that you could benefit from training this model for even more epochs, for example 50 or more, or you might want to go ahead and fine-tune the hyperparameters that were used. That is not something I'm going to do, at least for this video, but if you are working on some real-world project I would suggest this might be a viable next step, of course if you are using YOLO version 5.

Next I am going to gather some images on which we are going to do inference. I'm going to select images from the validation set, take just the first 50, and copy those into inference/images; this is the folder right here, and we have two images there already. So I'm going to copy some of the validation images right here; so, inference... I made a typo, and again I have an error, so clothing... I should exit this directory first. All right, taking the first 50, yep, and this should copy the images right here; we have those.

From this directory I'm going to run the detect.py file, and I'm going to pass in the weights that did the best during our training. I'm going to specify that, again, I want the images to be at most 640 pixels, I want the predictions to be at least 40 percent confident, and I'm going to pass in the source, which is again inference/images. I guess this should be it, but let me just check the command again; yeah, let's run this. This should be relatively fast; the slowest part, I guess, is the loading of the model, and after that you can see that the inferences are done in under 0.1 seconds each.
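Here is a sketch of those two cells, under two assumptions not confirmed on screen: that this mid-2020 version of the repo keeps plot_results in utils/utils.py, and that train.py saved the best checkpoint under the run name as weights/best_yolov5x_clothing.pt:

```python
# plot precision, recall and the losses recorded in results.txt
from utils.utils import plot_results
plot_results()
```

```
# run inference on the copied validation images,
# keeping only predictions with at least 40% confidence
!python detect.py --weights weights/best_yolov5x_clothing.pt \
  --img 640 --conf 0.4 --source ./inference/images/
```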
If I open this and go to the output, let's open up an image right here: you can see that the model draws this bounding box, which is very good, and it predicts with a confidence of 46 percent that this is indeed jeans, correctly identified. Let's open up another one; this should be trousers, I guess, which is again very good. Again jeans, I guess; yeah, the labels of the bounding boxes are not rendered very well in this project. This is a jacket, so this is very good. And let's see, we have this image, which was not part of the dataset in any way, and you can see that at least one of the jackets was detected right here. So this is again an image that the model hasn't seen and hasn't been fine-tuned on, because all the images that we do have are in this other format. I would say that this model performs much better than I expected, because YOLO models are not very well known for their accuracy, but this one does a very good job.

So now you know how to fine-tune a YOLO version 5 model using your own custom dataset, and you can see that the best-performing model was stored as a checkpoint, which is available for you to download. In the next video I'm going to show you how you can take that checkpoint, load it on a mobile device, and deploy it right there with a very simple mobile app in which we are going to use our own custom model. So please make sure to watch the next video as well. Thanks for watching, guys, I'll see you in the next one. Please like, share and subscribe. Bye-bye.
Info
Channel: Venelin Valkov
Views: 69,645
Keywords: Machine Learning, Artificial Intelligence, Data Science, Deep Learning, YOLO, Object Detection, PyTorch, Python, Finetuning
Id: XNRzZkZ-Byg
Length: 30min 51sec (1851 seconds)
Published: Sun Jun 21 2020