Object detection yolov8 | How much data you need to train a machine learning model?

So in today's tutorial we are going to run an experiment: we are going to find out how much data we need to train a machine learning model. There is a huge misconception in machine learning which says that the more data you use to train a model, the better. That is not true, because using more data means you are going to spend more time training the model, and it also means you are going to spend more time annotating and curating the data. So using more data just makes the entire process way more expensive. It's not about maximizing the amount of data you use to train a model; it's actually the other way around: you want to minimize how much data you use. Obviously you want to achieve a very good performance, a very good accuracy, when you are training a machine learning model, but for a given threshold of performance you want to minimize how much data you are using to train the model.

Now let me give you more details about the experiment we will be doing today. I have already prepared all the datasets you can see over here. Each one of these directories is a different dataset, and the difference between them is how much data we have in each one. We are going to train a machine learning model with each one of these datasets and then we are going to compare their performances. We are going to train an object detector, a model that detects objects, and we are going to use YOLOv8. The difference between these datasets is how many images we have in each one of them. For example, this one which is called 10 means we have exactly 10 images in this dataset, this one which is called 50 means we have 50 images in this dataset, this one which is called 100 means
we have 100 images, and so on. Then we have these other datasets, which comprise 200, 500, 1,000, 2,000 and 4,000 images. Remember, we are going to train an object detector with each one of these datasets and then compare their performances. We're going to use YOLOv8 to train this object detector, and we are just going to use all the default parameters. As you can see over here, the only parameter we are going to specify is the number of epochs, which we are going to set to 20. So we are going to train each one of these object detectors for 20 epochs, take the model we produced at the end of the training process, at the end of the 20 epochs, and then the only thing we're going to do is compare the performances of all these models. This is exactly the experiment we will be doing today.

So this tutorial is about showing you this experiment and its results; it's not really about showing you the whole training process, how to train an object detector using YOLOv8 on a custom dataset. If you want to know how I trained this object detector, I invite you to take a look at one of my previous videos, where I show you the entire process of how to train an object detector using YOLOv8 on a custom dataset. In that video I show you absolutely every single detail involved in this process: how to annotate the data, how to train the model, how to evaluate the performance of the model, and so on. So if you want to know how I trained each one of these object detectors, take a look at that video. But for now let's continue.

Now let me show you the data I used to train these models. In each one of these cases we're going to train an object detector to detect ducks.
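A minimal sketch of the training setup described above, assuming the `ultralytics` Python package is installed. The `datasets/` root, the per-dataset `config.yaml` file name, the run names and the `yolov8n.pt` checkpoint are assumptions for illustration; the video does not state which paths or model size were used. Only the dataset sizes and `epochs=20` come from the description, and everything else is left at the library defaults.

```python
from pathlib import Path

# Training-set sizes named in the video; one directory per dataset.
DATASET_SIZES = [10, 50, 100, 200, 500, 1000, 2000, 4000]
EPOCHS = 20  # the only non-default training parameter


def config_path(root: str, size: int) -> Path:
    """Path to the dataset YAML for a given size (hypothetical layout)."""
    return Path(root) / str(size) / "config.yaml"


def train_all(root: str = "datasets", weights: str = "yolov8n.pt") -> None:
    """Train one detector per dataset size, 20 epochs each."""
    # Imported lazily so the helpers above work without ultralytics installed.
    from ultralytics import YOLO

    for size in DATASET_SIZES:
        model = YOLO(weights)  # fresh model for every dataset
        model.train(
            data=str(config_path(root, size)),
            epochs=EPOCHS,
            name=f"ducks_{size}",  # run name, hypothetical
        )
```

Calling `train_all()` would then run the eight trainings one after the other. Each dataset YAML is assumed to point at its own training images and at the same shared test images.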
So this is the data we will be using today. You can see we have many images of ducks, and this is the data we are going to use to train all these object detectors. For each one of these datasets, the only thing I did was take a random sample of the images you can see in this directory. For example, for the dataset which comprises only 10 images I took 10 images at random from this directory, for the dataset which comprises 50 images I took 50 images at random from this directory, and so on. So the only thing I did was take this dataset and pick a few images at random to generate each one of the datasets you can see over here.

This is a dataset I downloaded from the Google Open Images Dataset version 7, which is an amazing dataset with a lot of images, a lot of categories and millions of annotations you can use to train your machine learning models. If you want to know how I downloaded this data from the Google Open Images Dataset version 7, I invite you to take a look at this other previous video, where I show you the entire process of how to download an object detection dataset from the Google Open Images Dataset version 7. I show you absolutely every single step of this process. That video is not available on my YouTube channel, but it is available on my Patreon, to all my Patreon supporters.

That is all about the data we are going to use. Now let me tell you something else about the experiment. Remember, we are going to train each one of these object detectors for exactly 20 epochs, we're going to take the model we produce at the end of the training process, and we are going to compute the performance of this model on a test
set which comprises 100 images. And this is very important: we are always going to use the same test set of 100 images. We are going to change the dataset we use as the training set, but the test set is always going to be exactly the same 100 images. Please remember this, because otherwise the experiment doesn't make any sense whatsoever.

Let me show you a few examples. I'm going to open these two directories over here, the one with 10 images and the one with 50 images. You can see that the data is already in the format we need in order to train a model with YOLOv8. If I open the images directory in each one of them, you can see that in this case the training set comprises 10 items and the test set comprises 100 items. If I show you the other dataset, the training set comprises 50 items, and the test set also comprises 100 items. Now I'm going to open the test set in each one of these datasets, and you can see that we have exactly the same images in each one, because we are going to use exactly the same 100 images to test the performance of each one of the object detectors we train today. The only thing we are going to change is the training set; the test set is always going to be the same. We need to use exactly the same test data in all cases, otherwise the experiment doesn't make any sense.

So this is exactly the experiment we will be doing today, and now let me show you the results, because remember
this video is not about showing you how I trained these object detectors; it is only about showing you the experiment and the results. So I'm going to take this script I have over here. It takes all the results from all the training processes on all these different datasets and produces a few plots we are going to use to analyze the results of this experiment.

You can see we have two plots over here. One of them is the mean average precision in the last epoch as a function of dataset size, and I'm talking about the mean average precision on the test set. This plot has the mAP on the Y axis and the dataset size on the X axis. Then we also have this other plot over here, which is the training time as a function of the dataset size: training time on the Y axis and dataset size on the X axis.

So let's start by analyzing this plot we have over here. You can see that the mean average precision, the performance, increases as we increase the dataset size. We start with a mean average precision of around 60%, and for the last dataset, with 4,000 images, we have a mean average precision of 91.1%. But please notice that although the mean average precision keeps increasing, in some cases it is not really increasing that much. In some cases it's only a very small improvement,
for example, from 50 images to 100 images you can see that we have pretty much the same performance, pretty much the same mean average precision. And if I show you these four models over here, you can see that the mean average precision, although it's increasing with the dataset size, is not really increasing that much. For the dataset with 500 images we have a mean average precision of around 86%, I'm looking at this number over here, and for the dataset with 4,000 images we have 91%. So it's increasing, but only a little; it's a very small improvement.

But if we look at the training time as a function of the dataset size, you can see that in this case the training time is increasing a lot, you can see how fast this is growing. Let me do exactly the same as before: I'm going to take these four models over here and compare their performance with these four training times. You can see that although we have only a very small improvement in the mean average precision, we have a huge increase in the training time. If we take the model we trained with 500 images and the one we trained with 4,000 images, you can see that we are improving the mean average precision by only about 0.05, that is, about 5 percentage points, because in one case we have a mean average precision of 0.86 and in the other 0.91. So it's a very small improvement of only about 5 percentage points, but if we look at the execution time...
at the training time, you can see that the training time increases by a factor of seven. If we take this value over here, which is 500 seconds, and compare it with this other value over here, which is 3,500 seconds, you can see that it takes seven times longer to train a model with 4,000 images than it takes to train a model with 500 images. We have a very small improvement in the mean average precision but a huge increase in the training time. That's the first conclusion we should take from looking at these plots. This is very important, because remember, it's not only about achieving the highest performance; you have to look at many other factors. If you are taking much more time to train the model and you are only gaining a very small amount of performance, then maybe it doesn't make any sense. You will need to decide in each particular case whether it makes sense or not, but I would say that in the most generic case it doesn't really make a lot of sense.

Now let me show you another way to evaluate the performance of all these models, which is looking at some results. Remember, in my previous videos I always told you that yes, the mean average precision is important, and all these metrics are very important, but at the end of the day the most important thing is to look at how the model performs on a few images, a few samples, a few videos. So I prepared this video over here, let me show you. In this video we have many ducks which are just doing nothing... or they are just like walking in the water... or actually they are swimming...
or they are doing something, I'm not sure what this action is called, but it doesn't matter. We are going to use this video, which has many, many ducks, to see how each one of the object detectors we trained performs. This is a very important test. Always remember: yes, look at the mean average precision, look at all these numbers, but also take a look at how the model performs on a few videos, on a few images, because otherwise it doesn't make any sense.

These are the videos I produced with all the results, and I'm just going to open them one next to the other so it's much easier to evaluate all of them at the same time. So these are the results. This is the video I produced with the model I trained with only 10 images, this other one is the video I produced with the model I trained with 50 images, this is the video I produced with the model I trained with 100 images, and so on. These are all the results from all the models, and you can see that in these two cases, with the dataset of only 10 images and the dataset of 50 images, we are not really detecting anything at all. The model doesn't perform well at all; we're not detecting any duck whatsoever. This is the first thing we should notice. Then in this other case, with the model I trained with 100 images, you can see we are detecting something. It doesn't perform very well, we have many missed detections and it's not really very stable, but at the very least we are detecting something. And then for this other model...
with 200 images it also performs okay; we have a few missed detections and so on, but it's okay. The same happens for this other model with 500 images, and with 1,000 images you can see it's okay but it's not perfect. Then I would say this other model with 2,000 images performs better, I really like how it performs, and this other one I trained with 4,000 images also performs very well.

So these are all the results from all the models, and there are many conclusions we can take from here. The first one is this situation we have over here: we are not detecting any duck whatsoever with the two models we trained with 10 images and with 50 images. If we go back to the performance plot, you can see that for the model we trained with 10 images we have a 60% mean average precision, and for the model we trained with 50 images we have something like a 73% mean average precision. So if we look at the mean average precision by itself we would say, okay, it doesn't really perform that badly, it's not perfect but it's an okay performance, 60%, 73%. But if we look at these videos, you can see that the 60% and the 73% don't really mean anything at all, because we are detecting nothing whatsoever. These numbers are completely meaningless in this case. This is a very important conclusion, and it is one of the reasons why I always tell you: yes, look at the mean average precision, look at the accuracy, look at all these metrics, but also look at how the model performs on a few images, on a few videos, because otherwise the metrics may not be very relevant, they may be meaningless.

Another very important conclusion: if we look at the video we produced with the model we trained with 100
images, we can see that it performs okay. Many missed detections, not stable at all, but at the very least we are detecting something. And if we look at the mean average precision in these two cases, the model we trained with 50 images and the model we trained with 100 images, we can see that it is pretty much the same in both cases, something like 73%. So we have practically the same mean average precision, but the performance is completely different. This is the video we produced with 50 images and this is the video we produced with 100 images: with 100 images we are detecting something at the very least, and in the other case we are not detecting anything whatsoever. So that's another very interesting conclusion. The mean average precision is important, it's important to take it into consideration, but also take many other things into consideration, because if you look at the mean average precision by itself...
it doesn't say anything, and you can see this is a very good example of that. So this is another very interesting conclusion.

Now let's take a look at this other video again, with all the results. If I were to select the best models based on the performance we have over here, I would say that the best models are these two, the one we trained with 2,000 images and the one we trained with 4,000 images. In these two cases the detections are more stable and we have the fewest missed detections; we are detecting all the ducks and everything looks very stable. And if you ask me, I don't really see a huge difference between these two videos; I would say they perform pretty much the same.

Now let's get back to the plots. I'm going to focus on this one over here, which is the training time as a function of the dataset size, and let's take a look at the training time of these two models. You can see that the model we trained with 4,000 images took pretty much twice as long to train as the model we trained with 2,000 images, so they...
both of them perform pretty much the same based on the example I showed you, but this model took twice as long to train as the other one. So in this particular case it seems it doesn't make any sense to train the model with more than 2,000 images, because you are not really improving the performance that much and you are just wasting a lot of time, and therefore a lot of money. This is another very interesting conclusion from these results.

So this is going to be pretty much all for this video. This is the experiment I wanted to show you in this tutorial. Let me know what you think about this video in the comments below, and let me know if you would like me to make other similar videos in the future with other types of experiments. I have ideas for other experiments we could do in other tutorials, but let me know what you think about this video first. So this is going to be all for this video. My name is Felipe, I'm a computer vision engineer, and see you in my next video.
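The trade-off discussed above can be checked with a few lines of arithmetic. This sketch uses only the two data points quoted in the video (500 images: mAP of about 0.86 and about 500 seconds of training; 4,000 images: mAP of 0.911 and about 3,500 seconds). The `Run` class and the `compare` function are hypothetical names introduced for illustration, and the values are approximate readings from the plots.

```python
from dataclasses import dataclass


@dataclass
class Run:
    images: int
    map_score: float      # mean average precision on the fixed test set, 0..1
    train_seconds: float  # total training time for 20 epochs


def compare(small: Run, large: Run) -> dict:
    """Gain in mAP versus extra training cost when moving from small to large."""
    gain_points = (large.map_score - small.map_score) * 100  # percentage points
    time_ratio = large.train_seconds / small.train_seconds
    extra_seconds = large.train_seconds - small.train_seconds
    # No gain (or a loss) means the extra time bought nothing.
    cost = extra_seconds / gain_points if gain_points > 0 else float("inf")
    return {
        "gain_points": gain_points,
        "time_ratio": time_ratio,
        "seconds_per_point": cost,
    }


# Approximate values quoted in the video.
run_500 = Run(images=500, map_score=0.86, train_seconds=500)
run_4000 = Run(images=4000, map_score=0.911, train_seconds=3500)

result = compare(run_500, run_4000)
print(f"mAP gain: {result['gain_points']:.1f} percentage points")
print(f"training time: x{result['time_ratio']:.1f}")
print(f"cost: {result['seconds_per_point']:.0f} s per extra point")
```

With these numbers the gain is about 5 percentage points of mAP for seven times the training time. Whether that is worth it depends on the application, and as the video stresses, the mAP figures should be read together with a visual check on real images or videos.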
Channel: Computer vision engineer
Views: 1,638
Id: 8YXk_zcllC8
Length: 20min 48sec (1248 seconds)
Published: Mon Feb 26 2024
