Object detection with Yolov8 | Data issues vs model performance

Video Statistics and Information

Captions
One of the most challenging parts of building a computer vision model is that you need a huge, super clean dataset, and the way it usually goes in a real project is that you don't have access to one. The data you do have access to may come with different kinds of issues: it may be very noisy, the images may be in a different color space (for example, grayscale), or the dataset may be very small, comprised of only a few images. Whatever the issue may be, you are a computer vision engineer and you need to be able to train a model with the data you have available. Remember also that building a huge, carefully curated dataset is very expensive, and it very often happens that you need to make compromises with the data you use because you simply don't have the budget, so you end up using data which may not be as clean or as perfect as you would like.

So I decided to make a few tutorials where I show you what happens when you train a computer vision model using data with different kinds of issues, and in this one I'm going to show you what happens when you train a model on a dataset comprised of grayscale images. We are going to use the images you are currently looking at to build a computer vision model, and I'm going to show you how this model performs. This is the experiment we will be doing today: we are going to train two object detectors, one of them on this data you can see over here,
which is comprised of 1,000 grayscale images, and the other on this dataset over here, which is comprised of exactly the same 1,000 images but in BGR. Then we are going to evaluate how both of them perform on a test set comprised of 100 images in BGR. By doing this experiment we are going to gain a lot of intuition about training computer vision models in real-life conditions, because in real life the data will most likely not be perfect: you will need to make compromises, and a different color space is one of the issues you will encounter in the data you end up using.

Now let me show you the results, because this tutorial is only about the results of the experiment, not about the training process itself. I trained these object detectors using YOLOv8, this powerful and super amazing technology, and if you want to know exactly how I trained them, I invite you to take a look at one of my previous videos where I show the entire process of training an object detector with YOLOv8. As for the data, I downloaded it from the Google Open Images Dataset version 7. This is an amazing dataset; I definitely invite you to take a look at this dataset, where
you will find many categories and millions of annotations; you can choose between many different categories. I also invite you to take a look at another of my previous videos where I show the entire process of downloading data from this dataset, and where I give you a couple of Python scripts that make the whole process much simpler. That video, about downloading data from the Google Open Images Dataset version 7, is available on my Patreon, to all my Patreon supporters.

Now let's continue. Remember, this tutorial is only about the results of the experiment, so let me show you the results. First, the training results of the model trained on the 1,000 BGR images. Remember that when we look at these plots over here...
all the plots related to the loss function, these three for the training set and these three for the validation set, should go down as we increase the number of epochs, and you can see that this is exactly what's going on in all of them, so everything is okay for now. Then these four plots over here, which are related to the performance of the model on the validation set, should go up as we increase the number of epochs, and again that is exactly what we see: for example, the mean average precision reaches a very good value, around 85% or maybe 90%. So far everything looks the way it should. Remember, these are the results of the model trained on the 1,000 BGR images.

Now let's see what happens with the model trained on the 1,000 grayscale images. This is how the plots look in this case: the loss function on the training set and on the validation set is going down as we increase the number of epochs, so far so good, and the other four plots look okay as well; everything is going up, and we reach some very high values, for example around 80% mean average precision. So after a very quick evaluation we could say the model learns the distribution of the data very well, because if the loss goes down on the training set it means we are learning something from the data, and if the loss goes down on the validation set as well, it means we are learning the same thing both in the
training set and the validation set, so everything is going well. That's only a very quick evaluation based on these plots, but let me show you something. With the model trained on the 1,000 grayscale images, the validation loss reaches a value around 1 in the last epoch, while with the model trained on the 1,000 BGR images it reaches a much lower value, around 0.9. So although we learn the distribution of the data in both cases, the BGR model learns it better, because it reaches lower values of the validation loss. That's one conclusion we can draw from these plots. If we also look at the mean average precision, the BGR model reaches something like 85%, or eighty-something percent, while the grayscale model reaches something like 80%, so the BGR model also achieves better performance. This is a very interesting conclusion.

Now let's look at some examples to see how the models perform in each case. First, some predictions from the model trained on the 1,000 BGR images. What we are currently looking at is the ground truth: all the red bounding boxes are the annotations, and
if I show you the predictions we got with this model, everything looks okay: we are able to detect all the ducks. It may not be a perfect performance, and there is a situation over here where we are detecting a lot of objects that aren't really there, but we are getting very good detections nevertheless. Now let's see how the other model performs, the one trained on the grayscale images. Again, all the red bounding boxes are the annotations, the labels, the ground truth, which is exactly the same in both cases. Looking at the predictions, they look pretty good with the grayscale model too; the results are very decent, although maybe not as good as with the other model. For example, over here we are detecting only one duck when there are many others, over here we have a misdetection because we are detecting two ducks and we only have one, and over here we are detecting I don't know how many ducks when there is only one. If I show you the other model's performance again, you can see it is much better: we are detecting only one duck over here, only one duck there as well... over here we are detecting...
everything seems to be better as well; yes, here we are detecting more than one duck. So the model trained on the 1,000 BGR images is performing way better on these specific examples, but nevertheless the model trained on the 1,000 grayscale images performs very well; these are very decent results. By using a dataset of grayscale images we are missing a lot of information, because a grayscale image has only one channel while a BGR image has three and therefore carries much more information, but the model we trained is very decent anyway. So if you are ever in a situation in which, for whatever reason, using grayscale images is cheaper or more affordable than using BGR images, based on this experiment it seems that this is not really a huge issue at all, and that we can train a very powerful object detector even with grayscale images. That's a very interesting conclusion from this experiment. Obviously this is only a very simple experiment, and we would need to run more tests and more experiments in order to reach a more general, more robust conclusion, but I would say it's a very good experiment for getting an overall idea of what happens when your images are in grayscale instead of BGR.

This is going to be all for this tutorial. If you enjoyed this video, I invite you to take a look at another of my previous videos where I showed a similar experiment regarding the size of the dataset we
use to train the model: I show what happens when you train a model with 10 images, 50 images, 100 images, 1,000 images and so on, and how the size of the dataset affects the performance of the trained model. I also invite you to take a look at another of my previous videos where I show what happens when we train a computer vision model using a very noisy dataset, a dataset with many annotations which are just wrong, which are not enclosing any object whatsoever; this is a much more important case in computer vision, so please take a look at that video as well. That video is not on my YouTube channel but is available on my Patreon, to all my Patreon supporters. So this is going to be all for today. My name is Felipe, I'm a computer vision engineer, and see you on my next video.
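As a footnote to the experiment described above: a grayscale variant of a BGR dataset can be produced with a simple channel-weighted sum. Below is a minimal sketch; the function name and the synthetic test image are my own, and the weights are the standard ITU-R BT.601 luma coefficients, which is also what OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` applies.

```python
import numpy as np

def bgr_to_grayscale(img: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 BGR image to a single-channel grayscale image
    using the ITU-R BT.601 luma weights (the same weighting OpenCV's
    cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) uses)."""
    b, g, r = img[..., 0], img[..., 1], img[..., 2]
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.clip(gray, 0, 255).astype(np.uint8)

# Tiny synthetic "image": one pure-white pixel and one pure-blue pixel.
img = np.array([[[255, 255, 255], [255, 0, 0]]], dtype=np.uint8)
gray = bgr_to_grayscale(img)
print(gray.shape)  # (1, 2) -- one channel instead of three
```

Note that YOLOv8, like most detectors, expects three-channel input, so in practice a grayscale image is often replicated across three identical channels before training; whether that is how the dataset in this video was prepared is not stated.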
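Also, for anyone who wants to compare two training runs numerically rather than by eyeballing the plots: ultralytics YOLOv8 writes a `results.csv` file into each run directory. The sketch below pulls the last-epoch validation box loss and mAP@50 from such a file; the run paths are hypothetical placeholders, and the column names (`val/box_loss`, `metrics/mAP50(B)`) are the ones recent ultralytics versions use, which may differ in other versions.

```python
import csv

def final_metrics(csv_path, columns=("val/box_loss", "metrics/mAP50(B)")):
    """Read a YOLOv8-style results.csv and return the requested metric
    columns from the last row (i.e. the final epoch). Header names are
    stripped because ultralytics pads them with spaces."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    last_epoch = {key.strip(): value for key, value in rows[-1].items()}
    return {c: float(last_epoch[c]) for c in columns}

# Hypothetical run directories for the two models in this experiment:
# bgr = final_metrics("runs/detect/train_bgr/results.csv")
# gray = final_metrics("runs/detect/train_gray/results.csv")
# print(bgr["val/box_loss"], gray["val/box_loss"])
# print(bgr["metrics/mAP50(B)"], gray["metrics/mAP50(B)"])
```

This replaces a visual comparison of the final loss values (around 0.9 vs around 1 in the video) with exact numbers, which is useful when the difference between runs is small.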
Info
Channel: Computer vision engineer
Views: 1,320
Id: pluJWGpVh5Y
Length: 13min 56sec (836 seconds)
Published: Mon Mar 25 2024