one of the most challenging parts of building a computer vision model is that you need a huge and super clean dataset, and the way it usually goes in real life, in a real project, is that you don't have access to one. The data you do have access to may come with different types of issues or challenges: it may be a very noisy dataset, the images may be in a different color space (for example, they could be in grayscale), or it may not be a huge dataset at all but only a few images, a very small dataset. Whatever the issue may be, you are a computer vision engineer and you need to be able to train a computer vision model with the data you have available. Also remember that building a huge and super curated dataset is very expensive, and very often you just need to make compromises with the data you use because you don't really have the budget for it, so you end up using data which may not be as clean or as perfect as you would expect.
So I decided to make a few videos, a few tutorials, where I show you what happens when you train a computer vision model using data with different types of issues, and in this tutorial, the one you are currently watching, I'm going to show you what happens when you train a computer vision model using a dataset comprised of images in grayscale. We are going to use the images you are currently looking at to build a computer vision model, and I'm going to show you how this model performs. So this is the experiment we will be doing today: we are going to train two object detectors, one of them on this data you can see over here, which is comprised of 1,000 images in grayscale, and the other one on this other dataset over here, which is comprised of exactly the same 1,000 images but in BGR. Then we are going to evaluate how they perform on a test set comprised of 100 images in BGR. This is exactly the experiment we will be doing today.
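In case you are wondering how a grayscale copy of a dataset like this can be produced, here is a minimal sketch, assuming a simple folder of images and using OpenCV; it is not the exact script I used, and the folder names are just placeholders.

```python
# Minimal sketch (not the exact script I used): create a grayscale copy of a folder
# of BGR images with OpenCV. The folder names below are placeholders.
import os
import cv2

SRC_DIR = "images_bgr"        # hypothetical folder with the original BGR images
DST_DIR = "images_grayscale"  # hypothetical output folder
os.makedirs(DST_DIR, exist_ok=True)

for name in os.listdir(SRC_DIR):
    img = cv2.imread(os.path.join(SRC_DIR, name))       # loaded as BGR, shape (H, W, 3)
    if img is None:
        continue                                         # skip anything that is not an image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)         # single channel, shape (H, W)
    # YOLOv8's default pipeline works with 3-channel inputs, so the grayscale image
    # is replicated back to 3 identical channels before saving
    gray_3ch = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    cv2.imwrite(os.path.join(DST_DIR, name), gray_3ch)
```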
By doing this experiment we are going to gain a lot of intuition regarding how to train a computer vision model in real-life conditions, because remember, the way it usually goes in real life is that the data will most likely not be perfect, so you will need to make some compromises and you will have to use data with different types of issues, and a different color space is one of the issues you will encounter with the data you end up using to train a computer vision model. So this is going to be an amazing video, and now let me just show you the results, because this tutorial is only about showing you the results from this experiment; it's not about showing you the training process itself.
I trained these object detectors using YOLOv8, this powerful and super amazing technology, and if you want to know exactly how I trained them, I invite you to take a look at one of my previous videos where I show you the entire process of how to train an object detector with YOLOv8. Please take a look at that video if you want to know how I did the training.
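Just to give you an idea of what that training looks like, here is a rough sketch with the ultralytics package; the dataset YAML and the hyperparameters are placeholders, not my exact settings, so check the dedicated video for the full process.

```python
# Rough sketch of training a YOLOv8 detector with the ultralytics package; the
# dataset YAML and the hyperparameters are placeholders, not my exact settings.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # start from a pretrained nano checkpoint
model.train(
    data="config.yaml",         # YAML pointing to the train/val folders and class names
    epochs=100,
    imgsz=640,
)
```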
Now, about the data I used to train these models: I got it from the Google Open Images Dataset version 7. All the images you can see over here were downloaded from the Google Open Images Dataset version 7. This is an amazing dataset and I definitely invite you to take a look at it; you will find many categories and millions of annotations, and you can just choose between many different categories, so it's an amazing dataset for sure. I also invite you to take a look at another of my previous videos, where I show you the entire process of how to download data from this dataset and where I also give you a couple of Python scripts that make the entire process much simpler.
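Those scripts are not shown here, but as a rough illustration, this is one public way to pull a small subset of Open Images v7 using the fiftyone package; the class name, label field, and export settings below are assumptions you may need to adjust.

```python
# One public way (not the scripts from my Patreon video) to download a small subset
# of Open Images v7 with the fiftyone package. The class name, label field and
# export settings are assumptions you may need to adjust.
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="train",
    label_types=["detections"],
    classes=["Duck"],           # the category used in this experiment
    max_samples=1000,
)

# Export to a YOLO-style folder structure so it can be fed to YOLOv8
dataset.export(
    export_dir="duck_dataset",
    dataset_type=fo.types.YOLOv5Dataset,
    label_field="detections",   # field name used by the Open Images zoo loader
    classes=["Duck"],
)
```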
That previous video, about how to download data from the Google Open Images Dataset version 7, is available on my Patreon, for all my Patreon supporters. Now let's continue. Remember, the tutorial you are currently watching is only about showing you the results from this experiment, so let me show you the results. I'm going to show you first the results of the training process of the model I trained with the 1,000 images in BGR, these images over here. These are the results from the training process.
Remember that when we are looking at these plots: all the plots related to the loss function, which are these three over here for the training set and these three over here for the validation set, should go down as we increase the number of epochs, and you can see that this is exactly what's going on in all of them, so everything is okay for now. Then, if we look at these four plots over here, which are related to the performance of the model on the validation set, they should go up as we increase the number of epochs, and you can see that this is exactly what happens: all of them go up, and for example the mean average precision reaches a very good value, around 85% or maybe 90%. So everything looks the way it should; so far everything is okay. Remember, these are the results of the model we trained with the 1,000 images in BGR.
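If you want to inspect curves like these yourself, YOLOv8 saves them in a results.csv file inside the run folder; here is a small sketch of one way to plot a couple of them, assuming the default run path and the usual column names.

```python
# Small sketch of plotting a couple of these curves from YOLOv8's results.csv;
# the run path is a placeholder and the column names are the ultralytics defaults
# (they come padded with spaces, hence the strip).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("runs/detect/train/results.csv")
df.columns = [c.strip() for c in df.columns]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(df["epoch"], df["train/box_loss"], label="train box loss")
ax1.plot(df["epoch"], df["val/box_loss"], label="val box loss")
ax1.set_xlabel("epoch")
ax1.legend()

ax2.plot(df["epoch"], df["metrics/mAP50(B)"], label="mAP@0.5")
ax2.set_xlabel("epoch")
ax2.legend()
plt.show()
```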
Now let's see what happens with the model we trained with the 1,000 images in grayscale, and this is how the plots look in this case. If we look at the loss function in the training set and in the validation set, we can see that all of them are going down as we increase the number of epochs, so so far so good, and if we look at these other four plots over here, everything looks okay as well: everything is going up and we are reaching some very high values, for example around 80% mean average precision. So after a very quick evaluation we could say that the model is able to learn the distribution of the data very well, because if the loss function is going down in the training set, it means we are able to learn something from the data, and if the loss function is going down in the validation set as well, it means we are able to learn the same thing in both the training set and the validation set. So everything seems to be going well, but this is only after a very quick evaluation of these plots. Let me show you something.
In the model we trained with the 1,000 images in grayscale, you can see that we are reaching a loss value of around 1 in the last epoch, and if I open the plot for the model we trained with the 1,000 images in BGR, you can see we are reaching a much lower value of the loss function in the validation set in the last epoch, something around 0.9. So although we are learning the distribution of the data in both cases, with the model trained on the images in BGR we are able to learn it much better, because we reach much lower values of the loss function in the validation set. That's one of the conclusions we can draw from these plots. Also, if we look at the mean average precision, you can see that the model trained with the 1,000 images in BGR reaches something like 85%, or 80-something percent, while the model trained with the 1,000 images in grayscale reaches something like 80%, so we are also achieving better performance with the model trained on the 1,000 images in BGR. That is a very interesting conclusion.
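By the way, if you want to run this kind of comparison on a held-out test set, like the 100 BGR images I mentioned, one way is to evaluate both trained models with the ultralytics val API; the weights paths and the data YAML below are placeholders for your own files.

```python
# Sketch of evaluating both trained models on the same held-out test split; the
# weights paths and the data YAML are placeholders for your own files.
from ultralytics import YOLO

runs = {
    "bgr": "runs/detect/train_bgr/weights/best.pt",
    "grayscale": "runs/detect/train_gray/weights/best.pt",
}

for name, weights in runs.items():
    model = YOLO(weights)
    metrics = model.val(data="config.yaml", split="test")  # the 100 BGR test images
    print(name, "mAP50:", metrics.box.map50, "mAP50-95:", metrics.box.map)
```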
But now let's take a look at some examples to see how the models perform in each case. I'm going to show you first some predictions from the model we trained with the 1,000 images in BGR. Right now we are looking at the annotations: all the red bounding boxes are the annotations, the ground truth. If I show you the predictions we got with this model, you can see everything looks okay: we are able to detect all the ducks. It may not be a perfect performance, but we are getting a very good one nevertheless; maybe we have a situation over here where we are detecting a lot of objects when we don't really have that many, but I would still say the detections are very good. Now let's see how the other model performs, the one we trained with the images in grayscale. I'm going to open this one over here; remember, all the red bounding boxes we are currently looking at are annotations, the labels, the ground truth, and the ground truth is exactly the same in both cases.
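As a side note, if you want to generate and draw predictions like the ones we are about to look at, here is a rough sketch with the ultralytics package; the weights path, image folder, and confidence threshold are placeholders.

```python
# Rough sketch of generating and drawing predictions with a trained model; the
# weights path, image folder and confidence threshold are placeholders.
from ultralytics import YOLO
import cv2

model = YOLO("runs/detect/train_bgr/weights/best.pt")
results = model.predict("test_images", conf=0.25)

for r in results:
    annotated = r.plot()            # image with the predicted boxes drawn on it
    cv2.imshow("prediction", annotated)
    cv2.waitKey(0)
cv2.destroyAllWindows()
```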
Now let's take a look at the predictions, and you can see that they look pretty good with the model we trained with the images in grayscale: the results are very decent, although they may not be as good as with the other model. For example, over here we are detecting only one duck when we have many other ducks, over here we have a misdetection, over here we have an error because we are detecting two ducks when we only have one, and so on; we also have an issue over here where we are detecting I don't know how many ducks but we only have one. If I show you the other model's performance again, you can see that in this case we have a much better performance: we are detecting only one duck over here, only one duck over here as well, and everything seems to be better, although yeah, over here we are detecting more than one duck. So the model we trained with the 1,000 images in BGR is performing way better on these very specific examples, but nevertheless the model we trained with the 1,000 images in grayscale performs very well too; I would say this is a very decent performance. These are some very decent results.
So, by using a dataset comprised of images in grayscale, we are missing a lot of information, because with grayscale data we only have one channel for every single image, while the images in BGR have three channels and therefore carry way more information. But nevertheless, the model we trained is very decent.
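To make that concrete, this is what the difference in channels looks like when you load an image with OpenCV; the file name is just an example.

```python
# A grayscale image has a single channel while a BGR image has three; the file
# name here is just an example.
import cv2

bgr = cv2.imread("duck.jpg")                         # shape: (height, width, 3)
gray = cv2.imread("duck.jpg", cv2.IMREAD_GRAYSCALE)  # shape: (height, width)
print(bgr.shape, gray.shape)
```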
The results we got over here are very decent anyway. So if you are ever in a situation in which, for whatever reason, using this type of dataset, images in grayscale, is cheaper or more affordable than using images in BGR, then based on this experiment it seems that this is not really a huge issue at all and that we are able to train a very powerful object detector even with images in grayscale. That is a very interesting conclusion from this experiment. Obviously, this is only a very simple experiment, and we would need to run more tests and more experiments in order to reach a more general or more robust conclusion, but nevertheless I would say it's a very good experiment to get an overall idea of what happens when your images are in grayscale instead of BGR.
So this is going to be all for this tutorial. If you enjoyed this video, I invite you to take a look at another of my previous videos where I showed you a similar experiment regarding the size of the dataset used to train the model: I show you what happens when you train a model with 10 images, 50 images, 100 images, 1,000 images and so on, and how the size of the dataset affects the performance of the model we trained. I also invite you to take a look at another of my previous videos where I show you what happens when you train a computer vision model using a very noisy dataset, a dataset with many annotations which are just wrong, which are not enclosing any object whatsoever; this is a much more important case in computer vision, so please take a look at that other video as well. That video is not on my YouTube channel but is available on my Patreon, for all my Patreon supporters. So this is going to be all for today. My name is Felipe, I'm a computer vision engineer, and see you on my next video.