Object detection with Yolov8 | Data issues vs model performance

Video Statistics and Information

Captions
One of the most challenging parts of building a computer vision model is that you need a huge, super clean dataset, and the way it usually goes in a real project is that you don't have access to one. The data you do have access to may come with different kinds of issues: it may be very noisy, the images may be in a different color space (for example, grayscale), or the dataset may be very small, comprised of only a few images. Whatever the issue may be, you are a computer vision engineer and you need to be able to train a model with the data you have available. Remember also that building a huge, carefully curated dataset is very expensive, and it very often happens that you need to make compromises with the data you use because you simply don't have the budget, so you end up using data which may not be as clean or as perfect as you would like.

So I decided to make a few tutorials where I show you what happens when you train a computer vision model using data with different kinds of issues, and in this one I'm going to show you what happens when you train a model on a dataset comprised of grayscale images. We are going to use the images you are currently looking at to build a computer vision model, and I'm going to show you how this model performs. This is the experiment we will be doing today: we are going to train two object detectors, one of them on this data you can see over here,
which is comprised of 1,000 grayscale images, and the other on this dataset over here, which is comprised of exactly the same 1,000 images but in BGR. Then we are going to evaluate how both of them perform on a test set comprised of 100 images in BGR. By doing this experiment we are going to gain a lot of intuition about training computer vision models in real-life conditions, because in real life the data will most likely not be perfect: you will need to make compromises, and a different color space is one of the issues you will encounter in the data you end up using.

Now let me show you the results, because this tutorial is only about the results of the experiment, not about the training process itself. I trained these object detectors using YOLOv8, this powerful and super amazing technology, and if you want to know exactly how I trained them, I invite you to take a look at one of my previous videos where I show the entire process of training an object detector with YOLOv8. As for the data, I downloaded it from the Google Open Images Dataset version 7. This is an amazing dataset; I definitely invite you to take a look at this dataset, where
you will find many categories and millions of annotations; you can choose between many different categories. I also invite you to take a look at another of my previous videos where I show the entire process of downloading data from this dataset, and where I give you a couple of Python scripts that make the whole process much simpler. That video, about downloading data from the Google Open Images Dataset version 7, is available on my Patreon, to all my Patreon supporters.

Now let's continue. Remember, this tutorial is only about the results of the experiment, so let me show you the results. First, the training results of the model trained on the 1,000 BGR images. Remember that when we look at these plots over here...
all the plots related to the loss function, these three for the training set and these three for the validation set, should go down as we increase the number of epochs, and you can see that this is exactly what's going on in all of them, so everything is okay for now. Then these four plots over here, which are related to the performance of the model on the validation set, should go up as we increase the number of epochs, and again that is exactly what we see: for example, the mean average precision reaches a very good value, around 85% or maybe 90%. So far everything looks the way it should. Remember, these are the results of the model trained on the 1,000 BGR images.

Now let's see what happens with the model trained on the 1,000 grayscale images. This is how the plots look in this case: the loss function on the training set and on the validation set is going down as we increase the number of epochs, so far so good, and the other four plots look okay as well; everything is going up, and we reach some very high values, for example around 80% mean average precision. So after a very quick evaluation we could say the model learns the distribution of the data very well, because if the loss goes down on the training set it means we are learning something from the data, and if the loss goes down on the validation set as well, it means we are learning the same thing both in the
training set and the validation set, so everything is going well. That's only a very quick evaluation based on these plots, but let me show you something. With the model trained on the 1,000 grayscale images, the validation loss reaches a value around 1 in the last epoch, while with the model trained on the 1,000 BGR images it reaches a much lower value, around 0.9. So although we learn the distribution of the data in both cases, the BGR model learns it better, because it reaches lower values of the validation loss. That's one conclusion we can draw from these plots. If we also look at the mean average precision, the BGR model reaches something like 85%, or eighty-something percent, while the grayscale model reaches something like 80%, so the BGR model also achieves better performance. This is a very interesting conclusion.

Now let's look at some examples to see how the models perform in each case. First, some predictions from the model trained on the 1,000 BGR images. What we are currently looking at is the ground truth: all the red bounding boxes are the annotations, and
if I show you the predictions we got with this model, everything looks okay: we are able to detect all the ducks. It may not be a perfect performance, and there is a situation over here where we are detecting a lot of objects that aren't really there, but we are getting very good detections nevertheless. Now let's see how the other model performs, the one trained on the grayscale images. Again, all the red bounding boxes are the annotations, the labels, the ground truth, which is exactly the same in both cases. Looking at the predictions, they look pretty good with the grayscale model too; the results are very decent, although maybe not as good as with the other model. For example, over here we are detecting only one duck when there are many others, over here we have a misdetection because we are detecting two ducks and we only have one, and over here we are detecting I don't know how many ducks when there is only one. If I show you the other model's performance again, you can see it is much better: we are detecting only one duck over here, only one duck there as well... over here we are detecting...
everything seems to be better as well; yes, here we are detecting more than one duck. So the model trained on the 1,000 BGR images is performing way better on these specific examples, but nevertheless the model trained on the 1,000 grayscale images performs very well; these are very decent results. By using a dataset of grayscale images we are missing a lot of information, because a grayscale image has only one channel while a BGR image has three and therefore carries much more information, but the model we trained is very decent anyway. So if you are ever in a situation in which, for whatever reason, using grayscale images is cheaper or more affordable than using BGR images, based on this experiment it seems that this is not really a huge issue at all, and that we can train a very powerful object detector even with grayscale images. That's a very interesting conclusion from this experiment. Obviously this is only a very simple experiment, and we would need to run more tests and more experiments in order to reach a more general, more robust conclusion, but I would say it's a very good experiment for getting an overall idea of what happens when your images are in grayscale instead of BGR.

This is going to be all for this tutorial. If you enjoyed this video, I invite you to take a look at another of my previous videos where I showed a similar experiment regarding the size of the dataset we
use to train the model: I show what happens when you train a model with 10 images, 50 images, 100 images, 1,000 images and so on, and how the size of the dataset affects the performance of the trained model. I also invite you to take a look at another of my previous videos where I show what happens when we train a computer vision model using a very noisy dataset, a dataset with many annotations which are just wrong, which are not enclosing any object whatsoever; this is a much more important case in computer vision, so please take a look at that video as well. That video is not on my YouTube channel but is available on my Patreon, to all my Patreon supporters. So this is going to be all for today. My name is Felipe, I'm a computer vision engineer, and see you on my next video.
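As a footnote to the experiment described above: a grayscale variant of a BGR dataset can be produced with a simple channel-weighted sum. Below is a minimal sketch; the function name and the synthetic test image are my own, and the weights are the standard ITU-R BT.601 luma coefficients, which is also what OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` applies.

```python
import numpy as np

def bgr_to_grayscale(img: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 BGR image to a single-channel grayscale image
    using the ITU-R BT.601 luma weights (the same weighting OpenCV's
    cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) uses)."""
    b, g, r = img[..., 0], img[..., 1], img[..., 2]
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.clip(gray, 0, 255).astype(np.uint8)

# Tiny synthetic "image": one pure-white pixel and one pure-blue pixel.
img = np.array([[[255, 255, 255], [255, 0, 0]]], dtype=np.uint8)
gray = bgr_to_grayscale(img)
print(gray.shape)  # (1, 2) -- one channel instead of three
```

Note that YOLOv8, like most detectors, expects three-channel input, so in practice a grayscale image is often replicated across three identical channels before training; whether that is how the dataset in this video was prepared is not stated.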
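Also, for anyone who wants to compare two training runs numerically rather than by eyeballing the plots: ultralytics YOLOv8 writes a `results.csv` file into each run directory. The sketch below pulls the last-epoch validation box loss and mAP@50 from such a file; the run paths are hypothetical placeholders, and the column names (`val/box_loss`, `metrics/mAP50(B)`) are the ones recent ultralytics versions use, which may differ in other versions.

```python
import csv

def final_metrics(csv_path, columns=("val/box_loss", "metrics/mAP50(B)")):
    """Read a YOLOv8-style results.csv and return the requested metric
    columns from the last row (i.e. the final epoch). Header names are
    stripped because ultralytics pads them with spaces."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    last_epoch = {key.strip(): value for key, value in rows[-1].items()}
    return {c: float(last_epoch[c]) for c in columns}

# Hypothetical run directories for the two models in this experiment:
# bgr = final_metrics("runs/detect/train_bgr/results.csv")
# gray = final_metrics("runs/detect/train_gray/results.csv")
# print(bgr["val/box_loss"], gray["val/box_loss"])
# print(bgr["metrics/mAP50(B)"], gray["metrics/mAP50(B)"])
```

This replaces a visual comparison of the final loss values (around 0.9 vs around 1 in the video) with exact numbers, which is useful when the difference between runs is small.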
Info
Channel: Computer vision engineer
Views: 1,320
Id: pluJWGpVh5Y
Length: 13min 56sec (836 seconds)
Published: Mon Mar 25 2024