Object detection yolov8 | How much data you need to train a machine learning model?

Captions

So in today's tutorial we are going to run an experiment: we are going to find out how much data we need to train a machine learning model. There is a huge misconception in machine learning which says that the more data you use to train a model, the better. That is not true, because more data means you are going to spend more time training the model, and also more time annotating and curating the data, so more data just makes the entire process way more expensive. It's not about maximizing the amount of data you use to train a model; it's actually the other way around. Obviously you want to achieve a very good accuracy when you are training a machine learning model, but for a given threshold of performance you want to minimize how much data you are using to train it.

Now let me give you more details about the experiment. I have already prepared all the datasets. Each one of these directories you can see over here is a different dataset, and the difference between them is how many images we have in each one. We are going to train an object detector with YOLOv8 on each one of these datasets and then compare their performances. For example, this one, which is called 10, has exactly 10 images; this one, which is called 50, has 50 images; this one, which is called 100, has 100 images, and so on. Then we have these other
datasets, which are comprised of 200, 500, 1,000, 2,000 and 4,000 images. We are going to use all the default parameters, as you can see over here; the only parameter we are going to specify is the number of epochs, which we are going to set to 20. So we are going to train each one of these object detectors for 20 epochs, take the model we produced at the end of the 20 epochs, and then the only thing we're going to do is compare the performances of all these models. This is exactly the experiment we will be doing today.

This tutorial is about showing you this experiment and its results; it's not so much about showing you the training process, how to train an object detector using YOLOv8 on a custom dataset. If you want to know how I trained these object detectors, I invite you to take a look at one of my previous videos, where I show you the entire process: how to annotate the data, how to train the model, how to evaluate the performance of the model and so on. In that previous video I show you absolutely every single detail involved. But for now let's continue.

Now let me show you the data I used. In each one of these cases we're going to train an object detector to detect ducks. You can see we have many images of ducks, and this is the data we are
going to use in order to train all these object detectors. For each one of these datasets, the only thing I did was take a random sample of the images you can see in this directory. So for the dataset which is comprised of only 10 images I took 10 images at random from this directory, for the dataset which is comprised of 50 images I took 50 images at random, and so on.

This is a dataset I downloaded from the Google Open Images Dataset version 7, which is an amazing dataset with a lot of images, a lot of categories and millions of annotations you can use to train your machine learning models. If you want to know how I downloaded this data, I invite you to take a look at this other previous video, where I show you every single step of how to download an object detection dataset from the Google Open Images Dataset version 7. That video is not available on my YouTube channel, but it is available on my Patreon, to all my Patreon supporters.

That is all about the data we are going to use. Let me tell you something else about the experiment. Remember, we are going to train each one of these object detectors for exactly 20 epochs, take the model we produce at the end of the training process, and compute the performance of this model on a test set which is comprised of 100 images. And this is very important: we are always going to use the same test set of 100 images. So we are going to change the datasets we use as
training set, but as a test set we are always going to use exactly the same 100 images. Please remember this, because otherwise the experiment doesn't make any sense whatsoever.

Let me show you a few examples. I'm going to open these two directories over here, the one that's comprised of 10 images and the one that's comprised of 50 images, and you can see that the data is already in the format we need in order to train a model with YOLOv8. If I open the images directory in each one of them, you can see that in this case the training set is comprised of 10 items and the test set is comprised of 100 items. In the other dataset the training set is comprised of 50 items, but the test set is also comprised of 100 items. Now I'm going to open the test set in each one of these datasets, and you can see that we have exactly the same images in each one of them. The only thing we are going to change is the training set.

So this is exactly the experiment we will be doing today, and now let me show you the results, because remember, this video is not about showing you
how I trained these object detectors; it's only about showing you the experiment and the results. So I'm going to take this script I have over here, which takes the results from the training process on all these different datasets and produces a few plots we are going to use to analyze the experiment. You can see we have two plots. One of them is the mean average precision in the last epoch as a function of dataset size, and I'm talking about the mean average precision on the test set: we have the mAP on the Y axis and the dataset size on the X axis. The other plot is the training time as a function of the dataset size: we have the training time on the Y axis and the dataset size on the X axis.

Let's start with the first plot. You can see that the mean average precision increases as we increase the dataset size. We start with a mean average precision of around 60%, and in the last dataset, with 4,000 images, we have a mean average precision of 91.1%. But please notice that although the mean average precision is always increasing, in some cases it is not really increasing that much. In some cases it's only a very small improvement,
for example, from 50 images to 100 images you can see that we have pretty much the same mean average precision. And if I show you these four models over here, you can see that the mean average precision, although it's increasing with the dataset size, is not really increasing that much. For the dataset with 500 images we have a mean average precision of around 86%, I'm looking at this number over here, and for the dataset with 4,000 images we have 91%. So it's increasing, but only a little.

But if we look at the training time as a function of the dataset size, you can see that the training time keeps growing steeply, roughly in proportion to the number of images. Let me do the same as before: I'm going to take these four models over here and compare the performance with these four training times. You can see that although we have only a very small improvement in the mean average precision, we have a huge increase in the training time. If we take the model we trained with 500 images and the one we trained with 4,000 images, we are improving the mean average precision by something around 0.05, that is, 5 percentage points, because in one case we have a 0.86 mean average precision and in the other a 0.91. So it's a very small improvement of only 0.05, but if we look
at the training time, you can see that it increases by a factor of seven. If we take this value over here, which is 500 seconds, and compare it with this other value, which is 3,500 seconds, it takes seven times longer to train a model with 4,000 images than it takes to train a model with 500 images. We have a very small improvement in the mean average precision but a huge increase in the training time, so that's the first conclusion we should take from these plots. This is very important, because it's not only about achieving the highest performance; you have to look at many other factors. If you are taking much more time to train the model and you are only gaining a very small amount of performance, then maybe it doesn't make any sense. You will need to decide in each particular case whether it makes sense or not, but I would say that in the most generic case it doesn't really make a lot of sense.

Now let me show you another way to evaluate the performance of all these models, which is looking at some results. Remember, in my previous videos I always told you that the mean average precision and all these metrics are very important, but at the end of the day the most important thing is to look at how the model performs on a few images, on a few videos. So I prepared this video over here, let me show you. In this video we have many ducks which are just doing nothing, or walking in the water, or actually they are swimming,
or they are doing something, I'm not sure what this action is called, but it doesn't matter. We are going to use this video, which has many, many ducks, to see how each one of the object detectors we trained performs. This is a very important test. Always remember: yes, look at the mean average precision, look at all these numbers, but also take a look at how the model performs on a few videos, on a few images, because otherwise the numbers may not tell you much.

So these are the videos I produced with all the results, and I'm going to open them one next to the other so it's much easier to evaluate all of them at the same time. This is the video I produced with the model I trained with only 10 images, this is the one from the model I trained with 50 images, this is the one from the model I trained with 100 images, and so on.

You can see that in these two cases, when I used the dataset of only 10 images and the dataset with 50 images, we are not really detecting anything at all. The model doesn't perform well at all: we're not detecting any duck whatsoever. This is the first thing we should notice. Then, with the model I trained with 100 images, you can see we are detecting something. It doesn't perform very well, we have many misdetections and it's not really very stable, but at the very least we are detecting something. And then for this other model,
with 200 images, it also performs okay; we have a few misdetections and so on, but it's okay. The same happens for the model with 500 images and the one with 1,000 images: it's okay but not perfect. Then I would say the model with 2,000 images performs better, I really like how it performs, and the one I trained with 4,000 images also performs very well.

So these are all the results, and there are many conclusions we can take from here. The first one is that we are not detecting any duck whatsoever with the two models we trained with 10 images and with 50 images. If we go back to the performance plot, you can see that for the model we trained with 10 images we have a 60% mean average precision, and for the model we trained with 50 images we have something like a 73% mean average precision. If we looked at the mean average precision by itself, we would say it doesn't really perform that badly; it's not perfect, but it's an okay performance. But if we look at these videos, you can see that the 60% and the 73% don't really mean anything, because we are detecting nothing whatsoever. This is a very important conclusion, and it's one of the reasons why I always tell you: yes, look at the mean average precision, look at the accuracy, look at all these metrics, but also look at how the model performs on a few images, on a few videos, because otherwise the numbers may be meaningless.

Another very important conclusion: if we look at the video we produced with the model we trained with 100 images, we can see that it performs okay: many
misdetections, not stable at all, but at the very least we are detecting something. And if we look at the mean average precision of the model we trained with 50 images and the model we trained with 100 images, it is pretty much the same in both cases, something like 73%. So we have pretty much the same mean average precision, but the performance is completely different. This is the video we produced with 50 images and this is the video we produced with 100 images: with 100 images we are detecting something, and in the other case we are not detecting anything whatsoever. So that's another very interesting conclusion. The mean average precision is important to take into consideration, but also take many other things into consideration, because if you look at the mean average precision by itself,
it doesn't say anything, and you can see this is a very good example.

Now let's take a look at this other video again with all the results. If I were to select the best models based on the performance we see here, I would say that the best are these two: the video we produced with the model we trained with 2,000 images and the one from the model we trained with 4,000 images. In these two cases the detections are more stable and we have the fewest misdetections; we are detecting all the ducks and everything looks very stable. And if you ask me, I don't really see a huge difference between these two videos; I would say they perform pretty much the same.

Now let's get back to the plots. I'm going to focus on this one over here, the training time as a function of the dataset size, and look at the training time of these two models. You can see that the model we trained with 4,000 images took pretty much twice as long to train as the model we trained with 2,000 images. So
both of them perform pretty much the same, based on the example I showed you, but this model took twice as long to train as the other one. So in this particular case it seems it doesn't make any sense to train the model with more than 2,000 images, because you are not really improving the performance that much and you are just wasting a lot of time, and therefore a lot of money. So this is another very interesting conclusion from these results.

This is going to be pretty much all for this video; this is the experiment I wanted to show you in this tutorial. Let me know what you think about this video in the comments below, and let me know if you would like me to make other similar videos in the future with other types of experiments. I have ideas for other experiments we could do in other tutorials, but let me know what you think about this video first. My name is Felipe, I'm a computer vision engineer, and see you in my next video.
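
The trade-off described in the captions can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the script used in the video: it takes the figures quoted for the 500-image and 4,000-image runs (mAP of 0.86 and 0.91, training times of 500 and 3,500 seconds) and works out the mAP gain in percentage points, the training-time factor, and the extra training seconds spent per point gained. The `RUNS` layout and the `compare_runs` helper are hypothetical names introduced here.

```python
# Figures quoted in the captions for two of the training runs.
# mAP is a fraction in [0, 1]; training time is in seconds.
RUNS = {
    500: {"map": 0.86, "train_seconds": 500},
    4000: {"map": 0.91, "train_seconds": 3500},
}


def compare_runs(small, large):
    """Summarise what the larger run buys and what it costs (hypothetical helper).

    Assumes the larger run has a strictly higher mAP than the smaller one.
    """
    # Difference between two mAP fractions, expressed in percentage points.
    gain_points = (large["map"] - small["map"]) * 100
    # How many times longer the larger run took to train.
    time_factor = large["train_seconds"] / small["train_seconds"]
    # Additional training time paid for the larger dataset.
    extra_seconds = large["train_seconds"] - small["train_seconds"]
    return {
        "gain_points": round(gain_points, 1),
        "time_factor": round(time_factor, 1),
        "extra_seconds_per_point": round(extra_seconds / gain_points),
    }


summary = compare_runs(RUNS[500], RUNS[4000])
print(summary)
```

With these inputs the gain is 5 percentage points of mAP for 7 times the training time. Whether that is worth it depends on the project, and annotation cost, which the captions also mention, is not included in this calculation.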

Info

Channel: Computer vision engineer

Views: 1,638

Id: 8YXk_zcllC8

Length: 20min 48sec (1248 seconds)

Published: Mon Feb 26 2024