Hey, my name is Felipe and welcome to this fully comprehensive course on object detection. We will start by discussing what object detection is and how to measure the performance of an object detector. Then I'm going to show you a step-by-step guide on how to train your own object detector on a custom dataset. And I'm going to show you three different
ways to detect objects in your images and videos: YOLOv8, Detectron2 and AWS Rekognition. This course is ideal for beginners as well as for more advanced developers, as it contains very valuable information and insights
I gathered from years of experience as a computer vision engineer. By the end of this course, you will be familiar with different object detection algorithms and you will be able to create amazing projects using state-of-the-art computer vision technologies. And now, let's get started. So let's start with this lesson about what object detection is. I'm going to cover the definition and I'm also going to mention a few examples. So, object detection is a computer vision technique to identify and locate objects within images and videos. And there are many technologies to perform object detection; these are only a few of all the available technologies, of all the available algorithms, which you can use to do object detection. For example, you can use the Python library MediaPipe, which is a very popular library to do
hand detection and face detection. You can also use OpenCV,
which is a library available for Python and C++. You can use YOLOv8,
which is the most recent version of YOLO. You can use Detectron2,
which is a high-level framework based on PyTorch. And this is a very popular framework in order to do many different
computer vision related tasks. You can also use AWS Rekognition,
which is a service available through a cloud provider. And these are only a few of all the different ways to do object detection; there are many, many more ways, I don't know how many. These are only a few of them. And although there are many algorithms and many technologies, all of them work pretty much the same way from a high-level perspective. From an input-output perspective, all of them receive an image as input, and the output is a list of all the detected objects in that image. And the objects in that image are given
by these three values: the bounding box, which is the location of the object in the image; then the confidence score, which is a value from 0 to 1 that means how confident the object detector is regarding that detection; and then the object category, or the class name, right? Because if we have detected an object, we want to know what object we have detected, we want to know the name of that object. So pretty much all the object detectors
work pretty much the same way and they are going to return
something which looks like this. The bounding box is usually specified with four values, and there are many different formats, many different conventions, to specify the bounding box. This is one of the most popular formats: the X and Y position of the top-left corner, and then the X and Y position of the bottom-right corner. With these two corners, we have specified the bounding box; and then we have the confidence score, and then the class name. So remember, although there are many,
many, many ways to do object detection, they all work pretty much the same way
from an input output perspective. And this is a very specific example
of how to do object detection in this image. You can see that this is an image of a cat and a dog. And this is a Python script, a very simple Python script, which uses YOLOv8 in order to detect all the objects within this image. I'm not going into the details of how this script works, but it will be available in the GitHub repository of this tutorial.
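For reference, here is a minimal sketch of what such a script might look like using the ultralytics package (the image file name and model weights below are placeholders; the exact script in the repository may differ):

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 model (the nano variant here)
model = YOLO("yolov8n.pt")

# Run inference on the image of the cat and the dog (hypothetical file name)
results = model("cat_and_dog.jpg")

# Iterate over all detections and print bounding box, score and class name
for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()        # top-left / bottom-right corners
        score = float(box.conf[0])                   # confidence score, from 0 to 1
        class_name = result.names[int(box.cls[0])]   # e.g. "cat" or "dog"
        print(class_name, [x1, y1, x2, y2], score)
```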
At the end of the script, we iterate over all the detections we found in this image, and we print all of them. And if we execute this script, we are going to print something like this: you can see that we have detected a cat. This is the bounding box where the cat is located, and this is the confidence score
of the object detector regarding this detection. And then we have also detected a dog. This is the bounding box of the dog and this is how confident the object detector is regarding this detection. So this is a very simple example
of how object detection works on a very specific image and this is going
to be pretty much all for this lesson. So remember, object detection is a technique, a computer vision technique to identify and locate all the objects within an image or a video. And although there are many, many
many, many different ways to do object detection, all of them work pretty much the same way
from an input output perspective. Now let's move to the next lesson
about object detection metrics. So let's talk about object detection metrics. We will answer the question
how to measure the performance of an object detector. And you can see that we are
just starting with this lesson. And we immediately got this huge warning sign which says when using object detection metrics, you are only comparing your predictions with your ground truth. This is very, very, very important and you're going
to see exactly why later on this lesson. But for now, let's continue. So this is the road map we will be covering today. I have divided all the content in this lesson into two sections. The first one is about fundamentals
and this is where we will discuss all the definitions of all the metrics
we will be using today, and all the different examples I'm going to show you about these metrics, and we will assume we are working under ideal conditions. This is very, very important, and you're going to see exactly what I mean by ideal conditions later on. And then we have the other section, which is the one for the more advanced topics, and this is where we will assume real-life conditions. For now, let's continue. So let's start with the fundamentals:
we are going to cover the most common metrics. And we will assume the data we are using to
train the model is perfect. This is what I mean by ideal conditions, right?
We will assume our dataset is perfect, which means we have many samples, we have a huge dataset. And in case we have many different classes in our dataset, we assume all of our
classes are equally distributed, which means we have the same
amount of objects for each one of our classes. But most importantly,
we will assume our dataset is perfectly annotated. So we have no issue in our dataset whatsoever, right? These are the ideal conditions
we will be assuming in this section. Now let's continue. These are the metrics most commonly used in object detection. We have the loss function, which is used during the training process, and then we have these two other metrics, which are part of the evaluation process of an object detector: the intersection over union
and the mean average precision. Now let me show you a very specific example
of how this looks in real life, right? Remember from our previous lesson
I told you there were many, many, many different ways to do object detection,
many different technologies, many different algorithms. Now, YOLOv8 is only one of all the different options for doing object detection, and when training a model with YOLOv8, this is what we get at the end of the training process: we will have all these many plots, so we can analyze the training process itself, and we can also analyze
the performance of the object detector we have just trained. And from all of these plots, you can see that six of them are related to the loss function. And I'm not going into the details
on why we have so many plots for the loss function. But just keep in mind that the loss function is such an important metric that we have all these many plots in order to analyze the performance of the model and the performance of the training process with respect to it; it's such an important metric, that's why we have so many plots. And then the remaining four plots are related to the mean average precision. And in the case of YOLOv8, right, in the case of training a detector with YOLOv8, the intersection over union is not provided. But this is also a very important metric
in object detection. Now let's continue, let's start with the loss function. This metric is related to the learning process
to the training process. And there are different loss functions. There are many loss functions and they usually involve very complex mathematical expressions, very, very, very complex mathematical formulations and expressions. And the only thing I'm going to cover in this course about object detection is that
regarding the loss function lower is better. So a lower value of the of the loss function
means it's better. And if we go back to these plots we have over here, you can see that in all of these plots,
regarding the loss function in all of them, you can see that the loss function
is going down as we increase the number of epochs, right? So this is the only thing I want you to remember for now: the loss function is related to the learning process, loss functions usually involve very complex and very advanced mathematical expressions, and lower is better. Now let's continue, let's move on to the intersection over union. This metric measures the detection accuracy, and it ranges between 0 and 1. So the intersection over union is a value between 0 and 1, and higher is better. And this is exactly how
the intersection over union is computed. We are given two bounding boxes, right? Remember we are going to be comparing our detections with the ground truth, so we will have a bounding box for our detections and a bounding box for the ground truth. Given these two bounding boxes, we will measure the area of overlap and the area of union, and then we will just compute the intersection over union by making this very simple calculation: the area of overlap divided by the area of union.
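In code, assuming both boxes are given in the corner format described earlier (x1, y1, x2, y2), the calculation might look like this sketch:

```python
def iou(box_a, box_b):
    # Intersection over union for two boxes in (x1, y1, x2, y2) format.
    # Corners of the overlapping region
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Area of overlap (zero if the boxes don't intersect at all)
    overlap = max(0, x2 - x1) * max(0, y2 - y1)
    # Area of union = sum of both areas minus the overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0
```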
Let me show you an example. So we have these two pictures of a cat,
we have a cat in each one of these pictures. And you can assume these are
the ground truth bounding boxes for these objects, right? You can see in each case, this is a bounding box
which encloses the object perfectly. This is the ground truth. Now let's assume we are using an object detector and these are the detections we got
with our object detector. And now let's assume we want to compute
the intersection over union for each one of these cases. In the case of this example over here,
we have a very small intersection, a very small overlap. So if we apply this formula we have over here, which is the area of overlap over the area of union, we will have a very low value, and this value is 0.15, right? So in this case, we have a very small value because these two bounding boxes have a very small area of overlap, of intersection. But in the other case, you can see that our prediction is very, very close to the ground truth, right? It's almost perfectly matching the ground truth. So in this case, we will have a higher value of intersection over union, and in this case it is 0.95. So this is a very,
very simple example for you to get like a much better idea regarding
the intersection over union. Now let's continue, let's move to the mean
average precision. The mean average precision
is based on the precision recall curve. And the precision recall curve is based on the intersection over union
and the detection confidence score. Right? Remember from our previous lesson
on what is object detection, remember I mentioned that all
of the different frameworks, all the different algorithms
in order to do object detection, all of them have pretty much the same
structure regarding the input output and the output will always involve a
bounding box and also a confidence score. So the precision recall curve is based on the intersection over union and the
detection confidence score. From the precision-recall curve, we have two elements: one of them is precision, the other one is recall. Recall measures how effectively we can find objects, and precision measures how well we perform once we find an object. Please mind these two definitions, please mind the difference between them. This is very, very important, and it is going to become much clearer in a few minutes, because I'm going to show you a few examples. But please focus on each one of these two definitions, on how we are defining recall and how we are defining precision. And then, about the mean average precision,
remember that higher is better. Now let's move on; now is where we are going to describe an example of how to compute the mean average precision. So this is our dataset, right? Let's assume we have 10 apples in our dataset. And for each one of these apples,
for each one of these objects, we have the ground truth, right? We have a bounding box which
encloses the object perfectly, right? So this is our data and these are our annotations,
this is our ground truth. Now let's assume we are working with an object detector and these are our detections, right? In some cases, for example,
here or here or here we are getting like an OK detection. But in other cases like here or here,
we are not getting a very good detection, right. So let's see how we can compute the
mean average precision in this example. So this is the ground truth with the predictions on top. Now, we are visualizing both
the predictions and the ground truth. And these are values which are going to be super important in order to compute
the mean average precision. You can see that for each one of these objects,
we have two values: the score, which is the confidence score of that prediction, right? It's the confidence score of the green bounding box. And then the intersection over union, which is the intersection over union between the green bounding box and the blue bounding box, between our prediction and the ground truth. You can see that for each object in our dataset, we have these two values: the confidence score of the green bounding box, and the intersection over union between the green bounding box and the blue bounding box. And what we will be doing now
is we will be applying this very, very simple process. This is pseudo code, this is not real code, right? It looks like Python, but it's not really Python; this is the pseudo code of the process we will follow in order to compute the mean average precision. Please pay attention, because this is very important. You can see that we are defining a variable, which is the intersection over union threshold,
and we are defining this variable as 0.5. Then we are iterating over many different values for the confidence score threshold. For each one of these iterations, we are defining two variables, which are true positives and false positives, and we are initializing each of them to zero. Then, for each one of our detections, for each one of our green bounding boxes, we are going to verify if the confidence score we got is greater than the confidence score threshold we are computing in this iteration. If it is greater than this confidence score threshold, we will take a look at the intersection over union between the green bounding box and the blue bounding box, and if this is greater than the intersection over union threshold, then we are going to increment
the true positives variable. And in any other case, we are going
to increment the false positives variable. So this is a very simple and a very straightforward
process in order to compute the mean average precision. But please focus on this process, please go through it more than once, and be super clear on how it works, because it's very, very important to understand. So once we have computed the
true positives and the false positives, right? Remember that for each one of the values of the confidence score threshold, we will be computing these two variables. Once we have computed them, we are going to define precision and recall exactly like this: precision will be the true positives divided by the number of true positives plus false positives, and in the case of recall, we will be dividing the true positives by the total number of ground truth objects. In our case, the total number of ground truth objects is 10; remember we have 10 blue bounding boxes, and that's exactly our ground truth. So in our case, this number will always be 10.
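Here is a runnable Python version of that pseudo code. The (score, IoU) pairs below are hypothetical values which I chose so that they reproduce the numbers of the example we are about to walk through; in practice you would compute them by matching your detections against your ground truth:

```python
# Hypothetical (confidence score, IoU with ground truth) pairs, one per detection
detections = [(0.9, 0.85), (0.85, 0.85), (0.8, 0.8), (0.7, 0.9), (0.65, 0.7),
              (0.6, 0.4), (0.4, 0.75), (0.3, 0.1), (0.2, 0.6), (0.1, 0.2)]
n_ground_truth = 10   # the 10 annotated apples (blue bounding boxes)
iou_threshold = 0.5

for score_threshold in [0.75, 0.5, 0.25, 0.0]:
    true_positives = 0
    false_positives = 0
    for score, iou_value in detections:
        if score > score_threshold:
            if iou_value > iou_threshold:
                true_positives += 1   # confident detection with enough overlap
            else:
                false_positives += 1  # confident detection with too little overlap
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / n_ground_truth
    print(f"threshold={score_threshold}: precision={precision:.2f}, recall={recall:.2f}")
```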
Let's go through this process once and again. And let's start with a
confidence score threshold of 0.75, right? In order to do so, we are going to go through each one of these green bounding boxes, through each one of our detections, and we are going to keep only the ones which have a confidence score greater than 0.75, which are these three bounding boxes, right? If we go back, you can see that in this case the confidence score is 0.7, in this case 0.4, in this case 0.2, and so on: in all the other bounding boxes, the confidence score is lower than our threshold of 0.75. And we are going to take a look
at the intersection over union. And in case the intersection over union
is greater than the threshold we have defined of 0.5 then we are
going to increment the true positives. And if not, if it is not greater,
if it's lower than 0.5 we are going to increment the false positives. And in this case, you can see the
intersection over union is 0.85. So this is greater than 0.5.
So this is a true positive. This is also 0.85. So this is also greater than 0.5.
So this is also a true positive. And in this case, this is also a true positive because
the intersection over union is 0.8. So in this case, we have three true positives
and we have zero false positives. If we go... if we move here, you can see that this is exactly
what we have just mentioned. The true positives is three,
the false positives is zero. So if we compute precision and recall,
we get that precision is 1 and recall is 0.3 right? It's a very, very, very simple process.
A very straightforward process. Please go through this example once
and again until you are completely and 100% clear on
what we are doing because once you get familiar with
the process, it's very, very simple. But now let's move on to a
confidence score threshold of 0.5. In this case, we are going to do exactly the same. We're going to filter all the detections
with a confidence score lower than 0.5. And this is what we got. And now let's go
through each one of these detections. And let's see if the intersection over union
is greater or lower than 0.5. In this case, it's greater than 0.5.
So this is a true positive. This is also a true positive, also true positive,
this is also true positive, this is also true positive. But in this case, the intersection over union is 0.4, which is lower than 0.5, so this is a false positive. So we have 5 true positives and only 1 false positive. And if we compute the precision and recall, we get that the precision is 0.83 and the recall is 0.5, right? Let's continue. Now, let's move to
confidence score threshold equal to 0.25. We are going to filter out all the detections with a confidence score lower than 0.25. This is what we got. And let's take a look at the intersection over union. You can see, in this case: true positive, true positive, true positive. In this case it's 0.1, so this is a false positive. This is also a true positive, a true positive, and this is a false positive, and this is also a true positive. So we have 1, 2, 3, 4, 5, 6 true positives
and 2 false positives. And this is what we have over here. We have six true positives, two false positives
and the precision is 0.75 and the recall is 0.6. Now let's continue. Now let's compute exactly the same values
but for a confidence score threshold of zero. In this case, we are not going to
filter out any detection, because all of them have a confidence score
which is greater than zero. And in this case, you can see that
this is a true positive true positive, true positive. This is a false positive. This is a true positive.
This is also a false positive. And then all the other ones are true positives... except this one which is a false positive. So we have 1 2 3 4 5 6 7 true positives
and only three false positives. And this is exactly what we have over here. So the precision is 0.7 and the recall is 0.7. So we have computed all these different
values for precision and recall. And from here it is super easy and straightforward to put everything together into a precision-recall curve, right? We can very easily take all these pairs of precision and recall values and put them together on a plot which looks like this. And if we compute the area under the curve, we will be computing the average precision, which is a very important value we need to compute before computing the
mean average precision. And please do the math yourself,
and if I'm mistaken, please let me know in the comments below. But if I'm not mistaken, this is the value I have computed for this curve we have over here. And a very quick note is that, as we were using an intersection over union threshold of 0.5, this is sometimes referred to as average precision at 50, right? If you search the literature, or other blogs or YouTube videos and so on, if you search other places in which they talk about the average precision or
the mean average precision you will find that sometimes this value
is referred to as average precision at 50. And we can also compute other values.
For example, if we were using an intersection over
union threshold of, for example, 0.9, then this would be the average precision at 90, right? This is a very quick note.
But for now let's just continue. So we have computed the average precision. And from here,
if we want to compute the mean average precision, the only thing we need to do is a very simple calculation, because in our case we are working with only one class, right? We are detecting apples, and we are working with only one class, which is apple. But in the most generic case, you will be computing the average precision
for many many many different classes, right? So in the most generic case, the mean average precision will look
something like this, right? You will have many different average precision values, one for each of your classes. And then, in order to compute the mean average precision, the only thing you need to do is sum everything together and take the average, right? That's exactly how you can compute the mean average precision. And remember, in our case we are always working with an intersection over union threshold of 0.5; as we are using only one value for the intersection over union threshold, this is exactly how the mean average precision looks in our case.
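As a sketch, here is how you could approximate this calculation in code. The recall and precision values are the ones from our example, and note that real benchmarks (e.g. Pascal VOC, COCO) use specific interpolation schemes for the area under the precision-recall curve, so this simple trapezoidal approximation is only for illustration:

```python
import numpy as np

def average_precision(recalls, precisions):
    # Approximate area under the precision-recall curve (trapezoidal rule);
    # assumes the recall values are sorted in increasing order.
    return float(np.trapz(precisions, recalls))

# One average precision value per class; here we only have "apple",
# so the mean average precision equals the average precision.
ap_per_class = {
    "apple": average_precision([0.3, 0.5, 0.6, 0.7], [1.0, 0.83, 0.75, 0.7]),
}
mean_ap = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP@50 = {mean_ap:.3f}")
```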
Now, let's continue: this is your homework from this tutorial. Tell me which model performs better. We have two models, and for each one
of these models, we have the intersection over union and we also have the mean average precision, right? The intersection over union of the model A is 0.70 and the mean average precision is 0.80. And for the model B, the intersection over
union is 0.55 and the mean average precision is 0.72. Now, your homework from this video
is to tell me which model performs better. So let me know in the comments below
if you find the answer to this question and I will be super happy to read
your answer in the comments below. And if you don't know which
model performs better, then also let me know in the comments below. And I will be super happy to help you
or maybe another member in our community will be super happy
to help you as well. But this is going to be all for this section,
for the fundamentals. And now let's move to the more advanced section. This is where we are going to work
with imperfect data, right? This is where we are going to have a dataset which complies with one of the following statements: maybe we don't have enough samples, maybe we are working with a very small dataset; maybe we have an unbalanced dataset, which means we have many different
classes and we don't have the same amount of objects for all of our classes. Or maybe most importantly,
we have errors in our annotations, right? If you have one or more of these issues in your dataset, this is where it's super, super,
super important to remember we are comparing our detections
against the ground truth, right? With all of the metrics we have mentioned so far, the only thing we're doing is comparing
the detections against the ground truth. This is where it's super super super
important to remember the warning we got when we
were starting this lesson. So if we are in this situation, I want you
to take your performance metrics with a grain of salt, which means: compute everything you want to compute, compute the intersection over union, compute the mean average precision,
all your losses, compute every single metric you want. But please take all of your metrics
with a grain of salt. And a very good example of this
situation is one of my previous tutorials where I showed you how to train a
semantic segmentation model using yolov8. This previous tutorial was not really
about object detection, but this was about semantic segmentation,
but I think it's a very good example nevertheless. In this previous tutorial, we had a ground truth, we had a dataset, which had many, many different errors. And in this previous tutorial, we noticed
that the detections we got with the model we trained, were even better than the data
we used to train the model, were even better than the ground truth. So this is a very, very good example
of what happens in a situation like this, right? This is a very good example of a situation
in which we have many issues in our data. And we have to be super, super, super
cautious in the way we interpret, read, and make sense of the object detection metrics. Now, I'm not going to show you the entire previous tutorial on semantic segmentation using YOLOv8, but let's just watch a few minutes from it, those few minutes where we noticed we had an issue with our data and that the predictions
we got with our model were even better than the data
we used to train this model. Let's remember these few minutes. In order to continue with this process, with
this validation is that we are going to take a look at what happens with our predictions how
is this model performing with some data with some predictions and for this we are going to take
a look what happens with all of these images right you can see that these are some batches
and these are some some of our labels some of our annotations for all of these images and then
these are some of the predictions for these images right so we are going to take a look what happens
here and for example I'm going to show you these results, the first image. You can see that, looking at this image (which again, these are not our predictions, this is our data, these are our annotations, these are our labels), there are many, many missing annotations. For example, in this image we only have one mask, we only have the mask for one of the ducks; we have one, two, three, four, five ducks but only one of them is annotated. We have a similar behavior here, only one of the ducks is annotated; here is something similar, only one of them is annotated,
annotations in this data we are currently looking at and if I look at the predictions now these are
the same images but these are our predictions we can see that nevertheless we had a lot of missing
annotations, the predictions don't really look that bad, right? For example, in this case we are detecting one, two, three of the five ducks, so we have an even better prediction than the one we have over here. I would say it's not a perfect detection, but I would say it's very good, right? It's like
it's not 100% accurate but it's like very good and I would say it's definitely better than the
data we used to train this model so that's what happens with the first image and if I take a look
at the other images I can see a similar Behavior right this is the data
we used for training this algorithm and these are the predictions we got for these images and
so on, right? It seems it's exactly the same behavior, exactly the same situation for
this image as well so my conclusions by looking at these images by looking at these predictions
is that the model is not perfect but I would say performs very well especially considering that
the data we are using to train this model seems to be not perfect seems to have a lot a lot
of missing detections have a lot of missing elements right a lot of missing objects so
that's our conclusion that's my conclusion by looking at these results and that's
another reason for which I don't recommend you to go crazy analyzing these plots because when
you are analyzing these plots remember the only thing you're doing is that you are comparing your
data the data you are using in order to train this model with your predictions right the only thing
you're doing, you're comparing your data with your predictions with the predictions you had with
the model right so as the only thing you are doing is a comparison between these two things then
if you have many missing annotations or many missing objects or if you have many different errors
in your data, in the data you're using to train the algorithm, then this comparison is a little meaningless, right? It doesn't really make a lot of sense, because if you're just comparing one thing against the other, but the thing you are comparing with has a lot of errors, a lot of missing objects and so on, maybe the comparison doesn't make a lot of sense whatsoever, right?
that's why I also recommend you to not go crazy when you are analyzing these plots because they
are going to give you a lot of information but you are going to have even more information
when you are analyzing all of these results and this is a very very very good example of what
happens in real life when you are training a model in a real project because remember that building
an entire dataset, a dataset which is 100% clean and absolutely 100% perfect is very very very
expensive so this is a very good example of what happens in real life usually the data you're using
to train the model, to train the algorithm has a few errors and sometimes there are many many many
errors so this is a very good example of how this validation process looks like with data which
is very similar to the data we have in real life which in most cases is not perfect And obviously you are more
than welcome to watch the entire tutorial after you complete this course. For now, let's just move to the next video
where I'm going to show you how to train an object detector
on your own custom data. Hey, my name is Felipe and welcome to my channel.
in this video we are going to train an object detector using yolo V8 and I'm going to walk you
step by step through the entire process from how to collect the data you need in order to train an
object detector how to annotate the data using a computer vision annotation tool how to structure
the data into the exact format you need in order to use yolo V8, how to do the training and
I'm going to show you two different ways to do it, from your local environment and also from a Google Colab, and how to test the performance of the model you trained. So this is going to be
a super comprehensive step-by-step guide of everything you need to know in order to train
an object detector using yolo v8 on your own custom data set so let's get started so let's
start with this tutorial let's start with this process and the first thing we need to do is to
collect data the data collection is the first step in this process remember that if you want to train
an object detector or any type of machine learning model you definitely need data, the algorithm, the
specific algorithm you're going to use in this case yolo V8 is very very important but the data
is as important as the algorithm if you don't have data you cannot train any machine learning model
that's very important so let me show you the data I am going to use in this process these are some
images I have downloaded and which I'm going to use in order
to train this object detector and let me show you a few of them these are some images of alpacas
this is an alpaca data set I have downloaded for today's tutorial and you can see these are all
images containing alpacas in different postures and in different situations right so this is
exactly the data I am going to use in this process but obviously you could use whatever data set you
want you could use exactly the same data set I am going to use or you can just collect the data
yourself you could just take your cell phone or your camera or whatever and you can just take the
pictures the photos the images you are going to use you can just do your own data collection
or something else you could do is to just use a publicly available dataset so let
me show you this data set this is the open image dataset version 7 and this is a dataset which is
publicly available and you can definitely use it in order to work on today's tutorial in order to
train the object detector we are going to train in today's tutorial so let me show you how it looks
like if I go to explore and I select detection, you can see that I'm going to unselect all these options; you can see that this is a huge dataset containing many, many categories, I don't know how many, but they are many; this is a huge dataset:
it contains millions of images, hundreds of thousands if not millions of annotations thousands
of categories this is a super super huge data set and you can see that you have many many different
categories now we are looking at trumpet and you can see these are different images with trumpets
and from each one of these images we have a bounding box around the trumpet and if I show you
another one for example we also have Beetle and in this category you can see we have many different
images from many different type of beetles so this is another example or if I show you this one
which is bottle and we have many different images containing bottles for example there you can see
many different type of bottles and in all cases we have a bounding box around the bottle and I could
show you I don't know how many examples because there are many many many different categories
so remember the first step in this process is the data collection this is the data I am going
to use in this project, which is a dataset of alpacas, and you can use the exact same data
I am using if you want to you can use the same data set of alpacas or you can just collect your
own data set by using your cell phone your camera or something like that or you can also download
the images from a publicly available dataset for example the open images dataset version 7. if you
decide to use open images dataset version 7 let me show you another category which is alpaca this
is exactly from where I have downloaded all of the images of alpacas so if in case you decide to use
this publicly available dataset, I can provide you with a couple of scripts I have used in order to download all this data, to parse through all the different annotations, and to format this data in the exact format we need in order to work on today's tutorial. So in case you decide to use the Open Images dataset, I am going to give you a couple of scripts which are going to be super useful for you.
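And as one alternative to those scripts, here is a hedged sketch using the FiftyOne library, which can download Open Images subsets filtered by class; the class name and the sample cap below are illustrative, and this is not necessarily how my own scripts work:

```python
import fiftyone.zoo as foz

# Download Open Images V7 images + detection labels for a single class
dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="train",
    label_types=["detections"],
    classes=["Alpaca"],   # only download samples containing this class
    max_samples=500,      # cap the download size
)
print(dataset)
```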
That's all I can say about the data collection: remember, you need to collect data if you want to train an object detector, and you have all those different ways to do it, all these different categories, all these different options. So now let's move on
to the next step and now let's continue with the data annotation you have collected a lot of images
as I have over here you have a lot of images which you have collected yourself or maybe you have
downloaded this data from a publicly available data set and now it's the time to annotate this
data set maybe you were lucky enough when you were creating the dataset and maybe this data set you
are using is already annotated maybe you already have all the bounding boxes from all of your
objects from all your categories maybe that's the case so you don't really need to annotate your
data but in any other case for example if you were using a custom data set, a dataset you have collected
yourself with your own cell phone your camera and so on something you have collected in that case
you definitely need to annotate your data so in order to make this process more comprehensive in
order to show you like the entire process let me show you as well how to annotate data so we are
going to use this tool which is CVAT this is a labeling tool I have used it many many times in
many projects I would say it's one of my favorite tools I have used pretty much absolutely all
the object detection computer vision related annotation tools I have used maybe I haven't used
them all but I have used many many of them and if you are familiar with annotation tools you would
know that there are many many of them and none of them is perfect I will say all of the different
annotation tools have their advantages and their disadvantages and for some situations you prefer
to use one of them and for other situations it's better to use another one CVAT has many advantages
and it also has a few disadvantages I'm not saying it's perfect but nevertheless this is a tool I
have used in many projects and I really really like it so let me show you how to use it you
have to go to cvat.ai and then you select try for free there are different pricing options
but if you are going to work on your own or or in a very small team you
can definitely use the free version so I have already logged in this is already logged into my
account, but if you don't have an account then you will have to create a new one; you're going to see like a sign-up page, and you can just create a new account and then log in into that account. So once you are logged into this annotation tool, you need to
go to projects and then create a new one I'm going to create a project which is called alpaca
detector because this is the project I am going to be working in and I'm going to add a label
which in my case is going to be only one label which is alpaca and then that's pretty much all
submit and open I have created the project it has one label which is alpaca remember if your project
has many many different labels add all the labels you need, and then I will go here which is create
a new task I am going to create a new annotation task and I'm going to call this task something
like alpaca detector annotation task zero zero one this is from the project alpaca detector and this
will take all the labels from that project now you need to upload all the images you are going to
annotate so in my case I'm obviously not going to annotate all the images because you can see these
are too many images and it doesn't make any sense to annotate all these images in this video. These
are 452 images so I'm not going to annotate them all but I'm going to select a few in order to show
you how exactly this annotation tool works and how exactly you can use it in your project also in my
case, as I have downloaded these images from a publicly available dataset, from
the open images dataset version 7 I already have the annotations I already have all the
bounding boxes so in my case I don't really need to annotate this data because I already have the
annotations but I'm going to pretend I don't so I can just label a few images and I can show you
how it works so now I go back here and I'm just going to select something like this many images
right yeah I'm just going to select this many images I'm going to open these images and then
I'm going to click on submit and open right so this is going to create this task and at the same
time it's going to open this task so we can start working on our annotation process okay so this is
the task I have just created I'm going to click here in job number and this and the job number
and this will open all the images and now I'm going to start annotating all these images so we
are working on an object detection problem so we are going to annotate bounding boxes we need to
go here and for example if we will be detecting many different categories we would select what
is the category we are going to label now and that's it in my case I'm going to always label the same
category which is alpaca so I don't really need to do anything here so I'm going to select shape
and let me show you how I do it I'm going to click in the upper left corner and then in the
bottom right corner so the idea is to enclose the object and only the object right the idea is to
draw a bounding box around the object you only want to enclose this object
and you can see that we have other animals in the back right we have other alpacas so I'm just going
to label them too and there is a shortcut which is pressing the letter N and you can just create
a new bounding box so that's another one this is another one this is another alpaca and this is
the last one okay that's pretty much all so once you're ready you can just press Ctrl s that's
going to save the annotations I recommend you to press Ctrl S as often as possible because it's
always a good practice so now everything is saved I can just continue to the next image now we are
going to annotate this alpaca and I'm going to do exactly the same process I can start here obviously
you can just start in whatever corner you want and I'm going to do something like this okay
this image is completely annotated I'm going to continue to the next image in this case I am going
to annotate this alpaca too. This is not a real alpaca, but I want my object detector to be able
to detect these type of objects too so I'm going to annotate it as well this is going to be a very
good exercise because if you want to work as a machine learning engineer or as a computer
vision engineer, annotating data is something you have to do very often; actually, training machine learning models is something you have to do very often, and usually the data annotation is done by other people, right, it is done by annotators. There are different
services you can hire in order to annotate data but in whatever case whatever service you use
it's always a very good practice to annotate some of the images yourself right because if
you annotate some of the images yourself you are going to be more familiar with the data
and you're also going to be more familiar on how to instruct the annotators on how to annotate this
particular data for example in this case it's not really challenging we just have to annotate these
two objects but let me show you there will be other cases because there will be always situations
which are a little confusing. In this case it's not confusing either, I just have to label
that object but for example a few images ago when we were annotating this image if an annotator
is working on this image and the instructions you provide are not clear enough, that person is going to ask you: hey, what do I do here? Should I annotate this image or not? Is this an alpaca or not? So that's one example of such a situation; another situation would be
what happened here which we had many different alpacas in the background and some of them for
example this one is a little occluded so there could be an annotator someone who ask you hey do
you want me to annotate absolutely every single alpaca or maybe I can just draw a huge bonding box
here in the background and just say everything in the background is an alpaca? The thing is that
when an annotator is working on the images they are going to have many many different questions
regarding how to annotate the data and they are all perfect questions and very good questions
because this is exactly what this is about; I mean, when you are annotating data you are defining exactly what are the objects you are going to detect, right? So what I'm saying is that if you annotate some
of the images yourself you are going to be more familiar on what are all the different situations
and what exactly is going on with your data so you are more clear in exactly what are the objects
you want to detect right so let's continue this is only to show a few examples this is another
situation in my case I want to say that both of them are alpacas so I'm just going to say
something like this but there could be another person who says no this is only one annotation
is something like this right I'm just going to draw one bounding box enclosing both of them
something like that, and it would be a good criterion, I mean, it would be a criterion which I guess would be fine, but whatever your criterion is, you need one, right? You need a criterion. So while you are annotating some of the images, you are going to further understand what exactly is
an alpaca what exactly is the object you want to consider as alpaca so I'm just going to continue
this is another case which may not be clear but I'm just going to say this is an alpaca this
black one which we can only see this part and we don't really see the head but I'm going to
say it's an alpaca anyway this one too this one too this one too also this
is something that always happens to me when I am working when I am annotating images that I am more
aware of all the diversity of all these images for example this is a perfect perfect example because
we have an alpaca which is being reflected on a mirror and it's only like a very small
section of the alpaca, it's only like a very small piece of the alpaca's face, so what do we do here? I am going to annotate this one too, because yeah, that's my criterion, but another person
could say no this is not the object I want to detect this is only the object I want to detect and maybe
another person would say no this is not an alpaca alpacas don't really apply makeup on them this is
not real so I'm not going to annotate this image you get the idea right there could be many different
situations and the only way you get familiar with all the different type of situations
is if you annotate some of the images yourself so now let's continue in my case I'm going
to do something like this because yeah I would say the most important
object is this one and then other ones are like... yeah it's not really that important if we detect
them or not okay so let's continue this is very similar to another image I don't know how many I have
selected but I think we have only a few left I don't know if this type of animals are natural... I'm very surprised about this like the head right it's like it has a lot of
hair over here and then it's completely hairless the entire body I mean I don't know I'm
surprised maybe they are made like that or maybe it's like a natural alpaca who cares who cares...
let's continue so we have let's see how many we have only a few left so let's continue uh let's
see if we find any other strange situation where we have to decide if that's an alpaca or not so
I can show you an additional example. Also, when you are annotating, you could define your bounding box in many different ways: for example, in this case we could define it like this, or we could define it like this; I mean, we could define it super fit to the object, something like this, super
fit and we could enclose exactly the object or we could be a little more relaxed right for example
something like this would be okay too and if we want to do it like this it will be okay too right you
don't have to be super super super accurate you could be like a little more relaxed and it's
going to work anyway. Now on to this last one, and that's pretty much all,
and this is the last one okay I'm going to do something like this now I'm
going to take this I think this is also alpaca but anyway I'm just going to annotate this part
so that's pretty much all, I'm going to save and those are the few images I have selected in order
to show you how to use this annotation tool so that's pretty much all for the data annotation and
remember this is also a very important step this is a very important task in this process because
if we want to train an object detector we need data and we need annotated data so this is a very
very important part in this process. Remember this tool, CVAT, is only one of the many, many available image annotation tools; you can definitely use another one if you want, it's perfectly fine, it's not like you have to use this one at all, you can use whatever annotation tool
you want but this is a tool I think it's very easy to use I like the fact it's very easy to use it's
also a web application so you don't really need to download anything to your computer you can
just go ahead and use it from the web that's also one of its advantages so yeah so this is a
tool I showed you how to use in this video in order to train this object detector so this is going
to be all for this step and now let's continue with the next part in this process and now that
we have collected and annotated all of our data now it comes the time to format this data to
structure this data into the format we need in order to train an object detector using yolo V8
when you're working in machine learning and you're training a machine learning model every single
algorithm you work with is going to have its own requirements on how to input the data; that's going to happen with absolutely every single algorithm you work with, it's going to happen with YOLO, with all the different YOLO versions. And specifically, YOLOv8 needs the data in a very specific format, so
I created this step in this process so we can just take all the data we have generated all the
images and all the annotations and we can convert all these images into the format we need in order
to input this data into yolo V8 so let me show you exactly how we are going to do that if you
have annotated data using cvat you have to go to tasks and then you have to select this option and
it's export task data set it's going to ask you the export format so you can export this data into
many different formats and you're going to choose you're going to scroll all the way down and you're
going to choose YOLO 1.1 right then you can also save the images but in this case it's not really
needed we don't really need the images we already have the images and you're just going to click ok
now if you wait a few seconds or a few minutes if you have a very large data set you are going to
download a file like this and if I open this file you are going to see all these different files
right you can see we have four different files so actually three files and a directory and if I open
the directory this is what you are going to see which is many many different file names and if I
go back to the images directory you will see that all these images file names they all look pretty
much the same, right? You can see that the file name, the structure of these file names, looks pretty much the same as the ones we have just downloaded from CVAT. So basically, the way
it works is that when you are downloading this data into this format into the YOLO format every
single annotation file is going to be downloaded with the same name as the image you have annotated
but with a different extension so if you have an image which was called something.jpg then The
annotation file for that specific image will be something.txt right so that's the way it works
and if I open this file you are going to see
case only one row but let me show you another one which contains more than one annotation I
remember there were many for example this one which contains two different rows and each one of
these rows is a different object in my case as I only have alpacas in this data set each one of
these rows is a different alpaca and this is how you can make sense of this information the first
character is the class, the class you are detecting. So the first number is the class you are detecting; in my case I only have one, so it's always a zero, because it's my only class. And then these four numbers define the bounding box. This is encoded in the YOLO format, which means the first two numbers are the position of the center of the bounding box, then you have the width of your bounding box, and then the height of your bounding box. You will notice these are all float numbers, which basically means everything is relative to the entire size of the image. So these are the annotations we have downloaded, and this is the exact same format we need in order to train this object detector.
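To make the format concrete, here is a small sketch that parses one such row; the values and the image size are made up for illustration:

```python
# One row from a YOLO-format .txt label file:
# class_id x_center y_center width height (all relative to image size)
line = "0 0.52 0.48 0.30 0.41"   # hypothetical values

class_id, x_center, y_center, width, height = line.split()
img_w, img_h = 1024, 768          # hypothetical image size in pixels

# Convert back to absolute corner coordinates (top-left, bottom-right)
x1 = (float(x_center) - float(width) / 2) * img_w
y1 = (float(y_center) - float(height) / 2) * img_h
x2 = (float(x_center) + float(width) / 2) * img_w
y2 = (float(y_center) + float(height) / 2) * img_h
print(int(class_id), (round(x1), round(y1), round(x2), round(y2)))
```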
Remember, when I was downloading these annotations, we noticed there were many different options. All of these different options are different formats in which we could save the annotations, and this is very important: you definitely need to download YOLO, because we are going to work with YOLO, and everything is pretty much ready as we need it in order to input into YOLOv8. If you select YOLO, that's exactly the same format you need in order to continue with the next steps. And if you have your data in
a different format, maybe if you have already collected and annotated your data and you have your data in whatever other format, please remember you will need to convert these images, or actually to convert these annotations, into the YOLO format. Now, this is one of the things we need for the data, one of the things we need in order to format and structure the data in a way we can use to train this object detector with YOLOv8. But another thing we should do is
to create very specific directories containing this data right we are going to need two directories
one of them should be called images and the other one should be called labels you definitely need
to input these names you cannot choose whatever name you want you need to choose these two names
right the images should be located in an directory called images and the labels should be located in
a directory called labels that's the way yolo V8 works so you need to create these two directories
within your image directory is where you are going to have your images if I click here you can
see that these are all my images they are all within the images directory they are all within
the train directory, which is within the images directory; this train directory is not absolutely needed
right you could perfectly take all your images all these images and you could just paste all your
images here right in the images directory and everything will be just fine but if you want you
could do something exactly as I did over here and you could have an additional directory which is
in between the images directory and your images, and you can call it whatever you want. This
is a very good strategy in case you want to have for example a train directory containing all the
training images and then another directory which could be called validation for example and this
is where you are going to have many images in order to validate your process your training
process your algorithm and you could do the same with an additional directory which could be
called test for example or you can just use these directories in order to label the data right
to create different versions of your data which is another thing which is very commonly done so you
could create many directories for many different purposes and that will be perfectly fine but you
could also just paste all the images here and that's also perfectly fine and you can see that
for the labels directory I did exactly the same we have a directory which is called train and within
this directory is that we have all these different files and for each one of these files let me
show you like this it's going to be much better for each one of these files for each one of
these txt files we will have an image in the images directory which is called exactly the
same exactly the same file name but a different extension right so in this case this one is called
.txt and this one is called .jpg but you can see that it's exactly exactly the same file name
for example the first image is called oa2ea8f and so on and that's exactly the same name as
for the first image in the images directory which is called oa2ea8f and so on so basically for
absolutely every image in your images directory you need to have an annotations file and a file in
the labels directory which is called exactly the same exactly the same but with a different extension
if your images are .jpg your annotations files are .txt so that's another thing which also
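Since this one-to-one pairing is easy to get wrong, here is a small sanity check you could run before training; a minimal sketch, assuming your files live under images/train and labels/train:

    # Hypothetical check: every image must have a label file with the same name.
    import os

    image_dir = 'images/train'
    label_dir = 'labels/train'
    for fname in os.listdir(image_dir):
        stem, ext = os.path.splitext(fname)
        label_path = os.path.join(label_dir, stem + '.txt')
        if not os.path.exists(label_path):
            print('missing annotation for', fname)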
And that's pretty much all. So remember: you need two directories, one called images and the other called labels. Within the images directory you will have all your images, and within the labels directory you will have all your annotations, all your labels. For absolutely every single image in your images directory you need a file in the labels directory with exactly the same name but a different extension: if your images are .jpg, your annotation files should be .txt. The labels should be expressed in the YOLO format, which means as many rows as there are objects in the image, and every row has the same structure of five values. The first is the class ID; in my case I only have one class, I'm only detecting alpacas, so this number will always be zero, but if you're detecting more than one class you will have different numbers. Then you have the x and y position of the center of the bounding box, then the width, and then the height, everything expressed in relative coordinates. So basically, that's the structure you need for your data, and that's what this step is about. That's pretty much all about converting, about formatting the data. Now let's move on to the training: now that we have taken the data into the format YOLOv8 needs, it's time to take this custom dataset and train an object detector using YOLOv8.
This is the YOLOv8 official repository. One of the things I like the most about YOLOv8 is that in order to train an object detector we can do it either from Python, with only a few Python instructions, or with a command-line utility: we can execute a command like this in the terminal, and that's pretty much all we need to do to train the object detector. That's something I really, really liked, and something I'm definitely going to use in my projects from now on, because I think it's a very convenient and very easy way to train an object detector, or any machine learning model. So this is the first thing to notice about YOLOv8: there are two different ways to run the training, from Python as we usually do, or with a command in the terminal, and I'm going to show you both so you're familiar with both ways. I also mentioned that I'm going to show you the entire process both in a local environment, in a Python project, and in a Google Colab. I know there are people who prefer to work in a local environment (I'm one of those people), and I know there are other people who prefer to work in a Google Colab, so depending on which group you're in, you can just choose the way you like the most. So let's start. Let's go to PyCharm: this is a PyCharm project I created for this training.
This is the file we are going to edit in order to train the object detector. The first thing I'm going to do is copy a few lines from the repository and then remove everything we don't need. We want to build a new model from scratch, so we keep this sentence, and then we train the model; everything else we remove. And that's all: these are the two lines we need in order to train an object detector using YOLOv8. Now we need to make some adjustments. Obviously, the first thing we need to do is import ultralytics, which is the library we need in order to import YOLO and train a YOLOv8 model. It's a Python library we need to install, as we usually do: go to the terminal and run pip install with the library name. In my case nothing is going to happen because I have already installed it, but please remember to install it, and also please mind that this library has many, many dependencies: you are going to install many different Python packages, so it's going to take a lot of disk space and also some time. Anyway, remember to install the library; these are the two sentences we need in order to run this training from a Python script. The first sentence we're just going to leave as it is: this is where we load the specific YOLOv8 architecture, the specific YOLOv8 model we are going to use. You can see we can choose from different sizes of YOLOv8: Nano, Small, Medium, Large, or Extra Large. We are using the Nano version, which is the smallest and lightest one: yolov8n.
Then, about the training, about this other sentence: we need to edit this argument. We need a yaml file which is going to contain all the configuration for our training, so I have created this file and named it config.yaml. I'm not sure it's the most appropriate name, but anyway, that's the name I chose. So what I'm going to do is just edit this parameter and input config.yaml. The config.yaml file is located in the same directory as main.py, so specifying it like this is going to work just fine.
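At this point, main.py looks roughly like this; a short sketch of the two sentences we just discussed:

    from ultralytics import YOLO

    # build a new model from scratch, using the YOLOv8 Nano architecture
    model = YOLO('yolov8n.yaml')

    # train the model using our configuration file
    results = model.train(data='config.yaml', epochs=1)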
Now let me show you the structure of this config.yaml. You can see it's a very, very simple configuration file: we only have a few keys, which are path, train, val, and then names. Let's start with names: this is where you set all your different classes. You are training an object detector, you may be detecting many different categories, many different classes, and this is where you type all of them. In my case I'm just detecting alpacas; that's the only class I'm detecting, so I only have one class: it's number zero and it's called alpaca. But if you are detecting additional objects, please remember to include the full list of all the objects you are detecting. Then, about the other parameters: path is the absolute path to the directory containing your images and annotations, and please remember to make it an absolute path. I ran into some issues when I tried to specify a relative path, relative from my current directory, where this project is created, to the directory where my data is located; and then I noticed, in the issues section of the YOLOv8 GitHub repository, that other people were having problems with relative paths too. The way I fixed it, and it's a very easy way to fix it, is to simply specify an absolute path. So remember: this should be the absolute path to the directory containing the images and labels directories. Then you specify the relative path from that location to where your specific images are located; in my case that's images/train relative to that path. If I show you this location, which is my root directory, and I go to images/train, this is where my images are located, so that's exactly what I need to specify. This is the train data, the data the algorithm is going to use as training data. Then we have another keyword, val, for the validation dataset. In this case we are going to specify the same data we used for training, and the reason is that I want to keep things simple in this tutorial: I'm just showing you the entire process of how to train an object detector using YOLOv8 on a custom dataset, so I'm just going to use the same data. And that's pretty much all for this configuration file.
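To make it concrete, the config.yaml I'm describing looks more or less like this; the absolute path here is obviously just an example, so replace it with the location of your own data:

    # config.yaml
    path: /home/user/alpaca-dataset   # absolute path to the data root directory
    train: images/train               # training images, relative to 'path'
    val: images/train                 # same data used for validation in this tutorial
    names:
      0: alpaca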
Now, going back to main.py: these two lines are really all we need in order to train an object detector using YOLOv8 from Python; that's how simple it is. So now I'm going to execute this file, but first I'll change the number of epochs: I'm going to run this for only one epoch, because for now the only thing I want to show you is how it executes, the entire process end to end; once we see that everything is up and running and working fine, we can continue with a proper training. So let's run this training for only one epoch. You can see it's loading the data... it has already loaded the data, and you can make use of all the debugging information we see here: we were loading 452 images, and all of them were loaded correctly, 452 out of 452. If I scroll down, you can see additional values related to the training process: this is how the training is going as we train this object detector. For now, the only thing we have to do is wait until the process is completed, so I'm going to stop the video here and fast-forward to the end of the training, and let's see what happens. Okay, the training is now completed, and you can see that
we have an output which says results saved to runs/detect/train39. So if I go to that directory, runs/detect/train39, you can see we have many different files, all related to how the training process went. For example, these images are a few batches of images which were used to train the algorithm: you can see they are named train_batch0 and train_batch1, and I think we have a train_batch2. So we have a lot of different images of a lot of different alpacas, all concatenated together into these large mosaics, so we can see exactly which images were used for training, with the annotations, the bounding boxes, drawn on top of them. We also have similar images for the validation dataset; remember, in this case the validation data is the same data we used for training, it's not different data. These were the labels in the validation set, and these were the predictions on the same images: you can see we are not detecting anything, we don't have a single bounding box. This is because we did a very shallow, very dummy training: we trained this algorithm for only one epoch, just as an example to show you what the output looks like and walk you through the process, not as a real training. Nevertheless, these are files I'm going to show you in more detail in the next step.
For now, let me show you how the training is done from the command line, from the terminal, using the command I showed you earlier; and after that, let me show you how this training is done in a Google Colab. Going to the terminal, we type yolo detect train, then the data argument, where I have to specify the configuration file, which is config.yaml, then the model, yolov8n.yaml, and then the number of epochs. This is exactly the same as what we did from Python and is going to produce exactly the same output; I'm just going to set the number of epochs to one so the two runs match. Let's see what happens: you can see we get exactly the same output, all the images are loaded, and a new training process starts. After this training process we would have a new run saved into a new directory, train40, where all the information related to this training process gets stored. I'm not going to run it to completion, because it would be exactly the same as the one we did before, but this is exactly how you can use the command-line utility to do this training from the terminal; you can see how simple it is.
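For reference, the command I'm running looks like this:

    yolo detect train data=config.yaml model=yolov8n.yaml epochs=1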
It's amazing how simple it is. And now let me show you how everything is done from a Google Colab, so let's go back to the browser so I can show you the notebook I created to train YOLOv8 in a Google Colab. If you're not familiar with Google Colab: the way you can create a new notebook is to go to Google Drive, click New, then More, and select the option Google Colaboratory; this creates a new Google Colab notebook, and you can just use that notebook to train this object detector. Now let me show you this notebook: you can see it contains only five cells; this is how simple this will be. The first thing you need to do is upload the data you are going to use to train this detector; it's exactly the same data as we used before, the same images and labels directories. Then the first cell we execute mounts Google Drive into this instance of Google Colab: the only thing I do is press Enter on this cell, and it may take some time, but basically all it does is connect to Google Drive, so we can access the data we have there. I select my account, then Allow, and that's pretty much all. Then it all comes down to where you have uploaded the data in your Google Drive, the specific directory. In my case, my data is located in this path: this is my home in Google Drive, and then the relative path to the location of the data and all the files related to this project. So remember to set this root directory to the directory where you uploaded your data; that's pretty much all, and then I just execute this cell to save that variable.
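For reference, those first cells look more or less like this; the Drive path is just an example, use wherever you uploaded your data:

    from google.colab import drive

    # connect this Colab instance to Google Drive
    drive.mount('/content/gdrive')

    # root directory in Drive where the data was uploaded (assumed path)
    ROOT_DIR = '/content/gdrive/MyDrive/my-project'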
Next, I'm going to execute this other cell, which is pip install ultralytics, the same command I ran from the terminal in my local environment; now I'm running it in Google Colab. Remember that you have to start this command with an exclamation mark, which means you are running a command in the terminal of the environment where this notebook is being executed; so remember to include the exclamation mark. Everything seems to be okay, everything seems to be ready, and now we can continue to the next cell, which is this one. You can see it has exactly the same structure, exactly the same lines as in our local environment: if I show you this again, we import ultralytics, then we define this YOLO object, and then we call model.train; it's exactly the same as we are doing here. Obviously, we are going to need another yaml file, one that lives in our Google Drive, and this is the file I have specified: it has exactly the same configuration as the yaml file I showed you in my local environment, it's exactly the same idea. The only difference is that now you should specify an absolute path to your Google Drive directory; that's the only difference. And I see I have a very small mistake: the config says data, but I just uploaded images and labels directly, not within another directory called data. So let me fix that: I'm going to create a new directory called data and put images and labels inside it, so everything is consistent. Now everything is okay: images, then train, and the images are within this directory. Let's go back to the Google Colab. Every time you make an edit or do something in Google Drive, it's always a good idea to restart your runtime, so that's what I'm going to do, and then I'll execute the cells again; I don't really need to pip install the library again because it's already installed in this environment. I also have to make one additional edit: the config file is now called google_colab_config.yaml. And that's pretty much all; I'm just going to run it for one epoch, so everything is exactly the same as we did in our local environment. Now let's see what happens.
You can see we are going through exactly the same process: everything looks pretty much the same as it did before, we are loading the data, we are loading the model, everything is going fine, and this is going to be pretty much the same process as before. You'll notice it now takes some additional time to load the data, because you are running this notebook in one environment and taking the data from your Google Drive, so it's a slower process, but it's definitely the same idea. The only thing we need to do now is wait until the process is completed, and I don't think it makes sense to wait on camera, because it's going to be exactly the same process we ran in our local environment. At the end of this execution, all the results will be in a runs directory local to the environment where the notebook is running. So at the end of the process, please remember to execute this command, which takes the runs directory, containing all the runs you have made and all the results you have produced, and copies it into the directory you have chosen in your Google Drive for your files and your data. Please remember to do this; otherwise you will not be able to access the results of the training you have just done.
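The exact cell depends on where you keep your files, but it's essentially a copy of the runs directory into your Drive folder; a minimal sketch with assumed paths:

    import shutil

    # copy the training results from the Colab environment into Google Drive
    shutil.copytree('./runs', '/content/gdrive/MyDrive/my-project/runs')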
So that's how you can train an object detector using YOLOv8 in a Google Colab, and you can see the process is very straightforward: it's pretty much exactly the same process, exactly the same idea, as in our local environment. And that's it; that's how easy it is to train an object detector using YOLOv8 once you have done everything we did with the data: once you have collected the data, annotated it, and taken everything into the format YOLOv8 needs, running this training is super straightforward. So that's going to be all about the training process. Now let's continue with the testing, and see how the models we have trained actually perform.
Let's move to the next step, which is the last step in this process: this is where we are going to take the model we produced in the training step and test how it performs; this is how we complete this training of an object detector using YOLOv8. Once we have trained a model, we go back to the directory I showed you before, the directory where all the information regarding this training process was saved. Obviously, I'm not going to show you the training we just did, because it was a very shallow, very dummy training; instead I'm going to show you the results from another training I did while preparing this video, where I conducted exactly the same process but trained for 100 epochs, so a much deeper training. Let me show you all the files that were produced, so you know what tools you have to assess the performance of the model you have trained.
First, you have a confusion matrix, which gives you a lot of information about how the different classes are predicted, or how the different classes get confused with each other. If you're familiar with how a confusion matrix should look, you will know how to read this information. In my case I only have one class, alpaca, but you can see the matrix generates another, default category, background, and we get some information here. It doesn't really say much: it shows how these classes are confused, but given that this is an object detector, I think the most valuable information is in other metrics, in other outputs, so we are not really going to mind this confusion matrix. Then you have some plots, some curves; for example, this is the F1-confidence curve. We are not going to mind this plot either. Remember, we are just getting started with training object detectors using YOLOv8; the idea of this tutorial is to keep it very introductory, so we are not going to dig into all these different plots, because extracting all the information from them involves a lot of knowledge and a lot of expertise, and that's not really the idea of
this tutorial. Let's do things differently and focus on this plot, which is also saved in the results directory. You can see we have many, many different plots here; you could definitely go crazy analyzing all the information, because there are ten different plots, and you could knock yourself out extracting everything from them, but again, the idea is to keep this a very introductory video, a very introductory tutorial. So, long story short, I'm just going to give you one tip, the one thing you should focus on in these plots for now. If you're going to take one thing from this video about how to test the performance of a model you have trained with YOLOv8, it's this: make sure your loss is going down. You have many plots; some of them relate to the loss function, which are this one, this one, and this one for the training set, and these for the validation set. Make sure all of your losses are going down. This is, I would say, a very simple way to analyze these plots, but I will say it's more powerful
than it would appear. Make sure all your losses are going down, because given the loss function, we could be in several different situations. We could have a loss function which is going down, which I would say is a very good situation. We could have a loss function which started going down and then flattened into something like a flat line; if we're on something like a flat line, it means the training process has plateaued. That could be a good thing, because maybe the machine learning model has really learned everything it had to learn from this data; so a flat line is not necessarily a bad thing, but you would have to analyze other things to decide. Or you could be in a situation where your loss function is going up, and if you, my friend, have a loss function which is going up, then you have a huge problem: something is obviously not right with your training. That's why I'm saying that analyzing what happens with your loss gives you a lot of information. Ideally it should go down; if it's going down, then most likely everything is going well. If it's something like a flat line, well, it could be a good thing or a bad thing, depending on the situation. But if it's going up, something is seriously wrong somewhere in your code or in your training process. So that's a very simple, even naive, way to analyze all this information, but trust me, it's going to give you a lot of information to start working with when testing the performance of the model you have trained.
But I would say that looking at the plots and analyzing all this information is more of a research thing; that's what people who do research like to do, and I'm more of a freelancer, I don't really do research. So I'm going to show you another way to analyze the performance of the model we have just trained, which from my perspective makes more sense: it involves seeing how the model performs on real data, on the kind of data you will actually run your inferences on, and seeing what happens. The first step in this more practical, more visual evaluation of how this model performs is looking at these images.
Remember that before, when we looked at these images, we had this one with the labels in the validation set, and this other one with the predictions, which was completely empty. Now you can see the predictions we produce are no longer empty, and we are detecting the position of our alpacas very accurately. We do have some mistakes: for example, here we are detecting a person as an alpaca, and here also a person as an alpaca; and we have some misdetections, for example this should be an alpaca and it's not being detected. But you can see the results are pretty much okay. Same over here: we are detecting pretty much everything; we have a misdetection here, and an error over here, because we are detecting an alpaca where there is actually nothing. So things are not perfect, but everything seems pretty much okay. That's the first way we are going to analyze the performance of this model, and it's worth a lot, because it's a very visual check: we are not looking at plots, we are not looking at metrics; we are looking at real examples, seeing how this model performs on real data.
Maybe I am biased toward analyzing things like this because I'm a freelancer, and the way it usually works when you're a freelancer is this: if you build a model to deliver a project for a client, and you tell your client "oh yeah, the model was perfect, take a look at all these plots, take a look at all these metrics, everything was just amazing", and then your client tests the model and it doesn't work, the client will not care about all the pretty plots. So that's why I don't really mind these plots a lot; maybe I am biased because that's how freelancing works, but I prefer a more visual evaluation. So that's the first check we'll do, and we can already see we're getting an okay performance. But the data we are currently looking at, remember, is the validation data, which in this case is pretty much the same data we used for training, so it doesn't really say much. Next I'm going to show you how the model performs on data the algorithm has never seen, completely and absolutely unseen data, which is a very good practice if you want to test the performance of a model. I have prepared a few videos, so let me show you them; remember, this is completely unseen data.
In this first video, you can see an alpaca which is just being an alpaca: it's walking around, doing its alpaca stuff, living its alpaca everyday life, going from one place to the other, doing nothing... no, doing its alpaca stuff, which is a lot. That's one of the videos I have prepared. This is another video, also of an alpaca doing alpaca-related stuff. And I have one more video over here. So I'm going to show you how the model performs on these three videos. I have made a script in Python which loads these videos and just calls the predict method from YOLOv8: it loads the model we have trained, runs all the predictions, and lets us see how it performs on these videos.
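I'm not going to walk through that script here, but it's essentially something like this; the weights path and the video file name are assumptions:

    from ultralytics import YOLO

    # load the weights produced by our training run
    model = YOLO('runs/detect/train/weights/best.pt')

    # run the detector on a video and save the annotated output
    results = model.predict('alpaca_video.mp4', save=True)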
This is the first video I showed you, and these are the detections we are getting: you can see we are getting practically perfect detections. Remember, this is completely unseen data. I'm not going to say 100% perfect detection, because it's not, but I would say it's pretty, pretty good as a starting point for this training process. That's one of the examples. Then let me show you another example, the other video I showed you: you can see we are also detecting the position of the alpaca; in some cases the label text goes outside of the frame because there isn't enough space, but everything seems okay in this video too. We are capturing the alpaca's position; in some frames the bounding box isn't tightly fit to the alpaca's face, but everything seems to be working fine. And then the other video I showed you: in this case the detection is a little broken and we have several misdetections, but overall it's much better than nothing, and it's working reasonably well too. Of these three examples, this one performs the best, and I also really like how it performed in the one where the alpaca was starting its alpaca journey: we get a very good, very stable detection, then it breaks a little, but nevertheless I would say it's okay; it's also detecting this other alpaca over here, so I'd say it's working pretty much okay.
So this is pretty much how we are going to do the testing in this phase. Remember: if you want to test the performance of a model you have just trained using YOLOv8, you will have a lot of information in the directory that is created at the end of the training process. You will have all of these files, and a lot of information to knock yourself out, to go crazy analyzing all these different plots; or you can keep it simple and just take a look at what happened with the training loss, the validation loss, all the loss functions, and make sure they are going down. That's the very least thing you need to verify. Then you can just see how the model performs on a few images or a few videos, take a look at how it performs on unseen data, and make decisions from there: maybe you can use the model as it is, or maybe you decide to train it again. In this case, if I analyze all this information, I see the loss functions are going down, and not only are they going down, but I notice there is a lot of room to improve this training, to improve the performance, because we haven't reached the point where everything plateaus into a flat line; we are very far away from there. So that's something I would do: a new, deeper training, so we keep improving. I would also change the validation data to something completely different from the training data, so we get even more information. That's pretty much what I would do in order to iterate and build a better, more powerful model. And now let's get started with the next tutorial.
This is the Detectron2 official repository, and Detectron2 is exactly the framework we are going to use today. I have used Detectron2 many, many times in my projects as a computer vision engineer; I think it's an amazing framework, an amazing algorithm, and in this video I'm going to show you how to train an object detector using Detectron2. The first thing I'm going to do is show you the data we are going to use today: the same alpaca dataset we already used in one of my previous tutorials. If you watched my previous video on how to train an object detector using YOLOv8, then most likely you are already familiar with this dataset; this is exactly the data we are going to use in this tutorial too, and this is what the images look like. Now, an important note: in my case I already have the annotations for this data; you can see all these txt files, these are my annotations for all my data, for all my images. But if you're watching this tutorial, most likely you want to know how to train Detectron2 on your own custom data, and most likely you want to know how to do all the annotation: you want to build this dataset from scratch and annotate all of your images. The annotation of an object detection dataset is something I already covered in my previous video, where I showed you how to train an object detector using YOLOv8, and I don't think it makes sense to cover the entire process again in this tutorial; so if you're curious about how to annotate your custom data, go ahead and watch that other video. I'm going to post a link somewhere in this video. Now let's go to PyCharm, to a Python project I created for this tutorial. These are the requirements for this project; as always, please remember to install these requirements before starting with this tutorial, otherwise nothing is going to work. So please remember to install these packages, and now let me show you these three files: train.py, util.py, and loss.py. Let's start with train.py.
train.py is the file we are going to execute in order to run the entire training process, and you can see it all starts with a very, very long docstring explaining how you need to format your data; this is very, very important. In util.py and loss.py we have many different functions and a class definition: a lot of code that already handles the entire training process, already handles parsing the data, already handles everything. So the only thing you need to do in order to make this training process work as expected is to put your data, and your file system, into the format specified in this docstring. Let me show you. You can see that the annotations should be provided in YOLO format: class, xc, yc, which is the x and y position of the center of the bounding box, and then the width and the height of the bounding box. Now let me show you one of my annotation files, let me show you what it looks like: you can see, for example, that in this case we have five numbers; the first one is a zero, and then we have four float numbers. This is exactly a bounding box annotation in YOLO format: the first number, the zero, is the class ID, which in my case is always going to be 0 because I only have one class in this dataset; then these two numbers are the x and y positions of the center of the bounding box; then this number is the width, and this number is the height of our bounding box. So please remember to format all of your annotations into the YOLO format, which looks exactly like this: class ID, x and y position of the center of the bounding box, then the width, and then the height. Your file system also needs to be structured in a very specific way; let me show you on my computer.
In my file system, if I go to data, this is the root directory where my data is located. You can see I have two folders: one of them is called train, and the other is called val. Within train I have two other folders, one called images and the other called anns: within images is where I have all my training images, and within anns is where I have all the annotations for my training images. Then, if I go to val, you can see exactly the same structure: two folders, images and anns; within images I have all my validation images, and within anns all the annotations for the validation data. This is exactly what's described here: we have a data directory, and within it two folders, train and val; within train we have two additional folders, images and anns, and the same for val. This is exactly what I showed you on my local computer. And for absolutely every single image in the images directory, we have an annotation file in the anns directory with exactly the same name but a different extension; this is very important. Please remember to structure your data, your file system, exactly like this, because all the functions in this file, which handle all the parsing, reading the data, getting the annotations, and so on, expect the data exactly like this. So please structure everything as described in this docstring, otherwise you are going to have issues in the training process.
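Summing up, the file system should look like this; a sketch using the directory names from the docstring:

    data/
        train/
            images/    # training images (e.g. .jpg)
            anns/      # training annotations (.txt, YOLO format)
        val/
            images/    # validation images
            anns/      # validation annotations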
Now let's continue. If I scroll down, you can see I have this argparser, and these are all the arguments we can specify, we can define, for this training process. You can see we have the data directory, which is obviously very important. Then we also need to define the names of all our classes: if I go to my file system, you can see that in my case I have this file, class.names, and in my case it only contains one class name, which is alpaca; but if you are doing something like a multi-class object detector, then most likely you are going to have other classes as well. Now let's go back to PyCharm. You can see that another argument is the output directory: this is where all the models and all the results, everything, is going to be saved. Then we have different hyperparameters: for example, the learning rate of our training process, then the batch size, the number of iterations of our training process, and then the device, whether we want to do this training on a CPU or on a GPU; this is very, very important. Then there's the checkpoint period, which controls how often we save the weights of the model we are training: we are going to train for a given number of iterations, and every 500 iterations we are going to save the weights of this model; this is something we are going to use later in this tutorial when we validate the model we trained. And another argument which is very, very, super amazingly important is model: this is where we specify the baseline we are going to use for this training process. In my case, the baseline I have set is the COCO-Detection RetinaNet R101.
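To give you an idea, the argparser is along these lines; the exact argument names and defaults here are assumptions, so check the actual train.py:

    import argparse

    # Hypothetical sketch of the training arguments described above.
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-dir', default='./data')
    parser.add_argument('--class-list', default='./class.names')
    parser.add_argument('--output-dir', default='./output')
    parser.add_argument('--learning-rate', type=float, default=0.00025)
    parser.add_argument('--batch-size', type=int, default=4)
    parser.add_argument('--iterations', type=int, default=10000)
    parser.add_argument('--checkpoint-period', type=int, default=500)
    parser.add_argument('--device', default='gpu')  # 'cpu' or 'gpu'
    parser.add_argument('--model', default='COCO-Detection/retinanet_R_101_FPN_3x.yaml')
    args = parser.parse_args()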
Now let me show you something very important: where this model comes from. In my browser, this is the Detectron2 Model Zoo and Baselines page. This is very important when you are working with Detectron2, because you have many, many models to choose from: this model zoo is basically a very large collection of all the baselines, all the pre-trained models, you can choose from when working with Detectron2. If I scroll down, you can see we have many different sections: for example, here we have a section of COCO object detection baselines, then another one for instance segmentation, another for keypoint detection, then panoptic segmentation, and so on; many different sections for all the different types of algorithms. You can see we have many, many models, many architectures, many baselines to choose from, and basically the idea is that when we are training our own custom model, we can take whatever pre-trained model we want, whatever baseline we want, and train our own model on top of it. This is very useful, because the page also lists many different metrics, the performance of all these models, and their inference times, so we can choose the model we like the most for the specific project we are working on. In my case, I have selected the RetinaNet R101, so that's the model we are going to use in this tutorial; but in your case, please go ahead and choose whatever model you want, because the entire process I'm going to show you in this tutorial works exactly the same for any other model you choose from here. So this is the Detectron2 model zoo; please take a look at it and at all the models which are available, it's a very, very large collection and it's just amazing. Let's go back to PyCharm: this is where you specify the architecture, the model you are going to use in your training process. Now let's continue.
You can see that after parsing all of these different arguments, the only thing I'm doing is calling util.train: I'm calling the train function which is defined in my util.py file, and I'm calling this function from a very, very high level: I just pass all these arguments as input, and that's it; this function takes care of the entire training process. If you have watched my previous tutorials on YOLOv8 (the image classifier, the object detector, the instance segmentation model, the keypoint detector, all of my models), you will remember that the training process there is super simple, super straightforward: the only thing we need to do with YOLOv8 is write a couple of lines, and that's it. So for this video, for Detectron2, I wanted to give you something with the same level of complexity, the same level of abstraction: something super high level, which you can just go ahead and use without really caring about all the different details and about everything that's working under the hood. That's why I made train.py like this: you can just set up all of your arguments, then call train, and forget about all the complexity of using Detectron2. That's something I wanted to do for you, because it makes things much, much simpler when training your model on your custom data. And I don't know about you, but in my case, Detectron2, YOLOv8, or whatever other machine learning framework or algorithm you can think of, for me they are only tools I use to solve my problems; so being able to train an object detector using Detectron2 by just calling a function like this, from a very high level, is amazing. If you're anything like me, then you're going to be super happy with this function, and if that's the case, just jump to the next chapter, where I'm going to show you how to run this training from a Google Colab. But if you do care about the details, about how everything works under the hood, if you want to know exactly how these functions work and exactly how the data is parsed, then just continue watching and I'm going to give you more details. Now let's move to util.py: we have four different functions here, and
these are the functions that take care of the entire training process. Let me start with train: this is the function we are calling over here, in train.py, in order to start the training process, so it's a very good place to start with this util.py file. You can see this function takes many different parameters, many different arguments, and for each one of them we have a very short description of what it means. This is very important: please take a look at this documentation, at this docstring, when you are reviewing this file, because it's going to help you a lot to further understand what each one of these parameters and arguments means. Now let's continue. In the first line, we are calling another function in this util.py file, which is register_datasets; because of the way Detectron2 works, we always need to 'register' the datasets before starting the training process. Now let me show you that function.
register_datasets is another function in the util.py file. You can see it takes two parameters as input, a root directory and the class list file, and basically the only thing we're doing in this function is calling DatasetCatalog.register. Just remember that in this function we are registering all of our data, all of our annotations, into Detectron2, and this is a very important step when working with this framework. You can see we are registering the training set under the keyword train, and the validation set under the keyword val; this is very important, because we are going to make a reference to these two words (train and val) later on, so please remember them. The second argument is this lambda function we have over here, and you can see that it basically calls another function in this util.py file, called get_dicts, with these two arguments: the location of the images and the location of the annotations, for the training set and for the validation set respectively. We iterate over the training set and the validation set, and in each iteration we register that set.
Now let me show you this other function, get_dicts. You can see the documentation in this function is very good, and it says: read the annotations for the dataset in YOLO format and create a list of dictionaries containing information for each image. The arguments are a directory containing images and another directory containing annotations, and the return value is a list of dictionaries with all this information: the file name of every single image and a unique identifier for every image, then the height and the width of the image, and then the annotations, the bounding box and also the category ID, the class ID. And if I show you the code, it's very straightforward: the only thing we're doing is iterating over absolutely all the files in the annotations directory, and for each one of these files we open the image that belongs to that annotation, take the height and the width of that image, and create this dictionary with all the information: the image file name, the ID, the height, and the width. Then the only thing we're doing is parsing through all the annotations, getting all the bounding boxes and also the class IDs. So basically we are walking through all of our data, getting all the information about our images and all the information about our annotations. And something that's very, very important: remember that our annotations are specified in the YOLO format, which means class ID, then the x and y position of the center of the bounding box, then the width, and then the height; and we are converting each annotation into another format, which is x y w h in absolute coordinates. This is very important, because it may be confusing: just remember that we are taking the annotation, which is in the YOLO format, and converting it into this other format, where x and y are the upper-left corner, followed by the width and then the height of the bounding box, in pixels. That's basically what we are doing here: converting the annotation from one format into the other, and then just getting the class ID. At the end, this function returns the list of dictionaries with all this information. So just go through this function and it's going to be super straightforward; you have this comprehensive docstring telling you exactly how everything works, all the input parameters and the output. So that's all for get_dicts.
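The conversion itself is simple arithmetic; here is a sketch of the idea, for an image of width W and height H in pixels:

    # Convert a YOLO box (relative xc, yc, w, h) to absolute (x, y, w, h),
    # where (x, y) is the upper-left corner in pixels.
    def yolo_to_xywh_abs(xc, yc, w, h, W, H):
        box_w = w * W
        box_h = h * H
        x = xc * W - box_w / 2
        y = yc * H - box_h / 2
        return x, y, box_w, box_h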
Now let's continue reading register_datasets. After we register the training dataset and the validation dataset, we need to tell Detectron2 exactly what the class names are. So far we have only been parsing the class IDs: the annotations only contain the class ID, they don't have the class name; so it's very important that we tell Detectron2 what the class names are, and that's why we call this function right after registering the datasets. That's pretty much all for register_datasets.
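For reference, registering a dataset and its class names in detectron2 looks roughly like this; a simplified sketch of what util.py does, with assumed paths:

    from detectron2.data import DatasetCatalog, MetadataCatalog

    # register the train and val sets under the keywords we reference later;
    # the lambda default argument captures the current split in the loop
    for split in ['train', 'val']:
        DatasetCatalog.register(
            split,
            lambda split=split: get_dicts(f'data/{split}/images',
                                          f'data/{split}/anns'))
        MetadataCatalog.get(split).set(thing_classes=['alpaca'])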
Now let's continue. After we register the datasets, the next line builds the entire configuration we are going to use in this training process. The first call is get_cfg, which is basically a Detectron2 built-in function: it returns something like a default configuration, a very long, very comprehensive set of default hyperparameters. Then, in the next line, we update this configuration with many other values which are specific to the particular model we are using; in my case that's RetinaNet R101. So after getting the default configuration, we merge in the values that are specific to this model; that's very important. Then the only remaining thing is setting other values manually in this config: you can see that each of the following lines is config dot a given key, and the value for that key. Here we are updating the value of the training set, the validation set, the test set, and so on; each of these configuration values is quite self-explanatory. For example, here we are telling Detectron2 to use the CPU; here we are setting the weights of the model, which are basically the pre-trained weights of the baseline we chose over here; and then we set the batch size, the checkpoint period (which is how often we save the checkpoints), the learning rate, and so on. And I would say this is by far the most important part of this function, because this is where we tell Detectron2 where the training data is and where the validation data is. We registered the datasets, and if you remember, we called one of them train and the other val; so this is where we tell Detectron2 that the training data is the dataset we registered under the keyword train, and the validation data is the dataset we registered under the keyword val. This is very, very important, and I would say it's the most important part of this function. That's basically all for the configuration step.
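Put together, the configuration step looks something like this; a simplified sketch, since the real function sets more values than shown here:

    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    config_file = 'COCO-Detection/retinanet_R_101_FPN_3x.yaml'  # chosen baseline

    cfg = get_cfg()                                              # default config
    cfg.merge_from_file(model_zoo.get_config_file(config_file))  # model-specific values
    cfg.DATASETS.TRAIN = ('train',)            # the keywords we registered earlier
    cfg.DATASETS.TEST = ('val',)
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_file)  # pre-trained weights
    cfg.MODEL.DEVICE = 'cpu'
    cfg.SOLVER.BASE_LR = 0.00025
    cfg.SOLVER.CHECKPOINT_PERIOD = 500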
Now let's continue. You can see that next we create the output directory, and then we create this object, the trainer: the one that's going to take care of the training process. Then this line, and actually these three lines we have over here, are also very important. When we train a model using Detectron2, during the training process we get a lot of information about the loss function on the training set, but we don't get any information about the loss function on the validation set; that's the way Detectron2 works by default. So if we want to add this information, if we want access to the loss function on the validation set, this is exactly what we need to do, and this is why I created this class, ValidationLoss, which is defined in loss.py. Long story short, these three lines create this custom output, this custom debugging information about the training process, so we have more information about how the training is going; and this is very important, because this additional information will be super useful once we are validating the model. Now let's continue. This line is resume_or_load, which determines whether we are resuming a previous training or training from scratch; in my case I'm training from scratch, so resume equals False. And then the only thing left is calling trainer.train, and that's pretty much all it takes to start the training process.
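Those last lines follow the standard detectron2 training idiom; a sketch that omits the custom validation-loss hook registered in between:

    import os
    from detectron2.engine import DefaultTrainer

    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)  # create the output directory

    trainer = DefaultTrainer(cfg)               # the object driving the training
    trainer.resume_or_load(resume=False)        # False = train from scratch
    trainer.train()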
and that's pretty much all so this is a much more detailed explanation of these four functions we
have here, in the util.py file and also of the function or actually the class definition we have
here in loss.py so this is in order to give you more details regarding all these other functions
and this class definitions and so on and now let's continue and let's go back to train.py because
now that we already have all the code we need in order to do this training the only thing we need
to do is to press play, and that's it. You can see I press play, I get some huge output, and... I'm
just going to stop the training to show you something: this is pretty much all the
hyperparameters for the network we are using in order to train this model,
in my case, remember, RetinaNet R-101. Then the only thing we would need to do from now on is
just wait until the training is completed, but in my case I'm not going to train it locally because
it's going to take a lot of time; I'm going to show you how to do this training from a Google Colab,
because this is going to make the process much, much simpler and much, much faster
than if I did it on my local computer. So I'm
going to tell you how to do it from a Google Colab. The first thing you need to do, obviously, is
upload your data; this is very important, please remember to upload your data, otherwise you
will not be able to train this model from Google Colab. In my case you can see that this is my
data, the same data I showed you on my local computer, these are my train and my val directories. You
also need to upload these files, which are util.py, train.py, loss.py
and class.names, right, so basically these files over here: train.py, util.py, loss.py and also the class
names file, which is this one. So remember to upload all these files, otherwise nothing is going to work. Now
let's move to this Google Colab, to this Jupyter notebook, and I'm going to tell you exactly how you
can train this model from here basically this is a very straightforward process the only thing you
need to do is to execute each one of these cells so it's something very very simple to do and
we are just going to be executing the code we have over here. The first step is to connect your
Google Colab with Google Drive, so basically you need to execute this cell and wait a
few moments... click on connect to Google Drive, I select my account, I scroll all the way down
and I click allow, and that's pretty much all; then I need to wait a few seconds and that's going to
take care of connecting my Google Colab with Google Drive. Okay, so everything is completed.
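Under the hood, that cell is essentially the standard Colab mount call, and the working-directory cell a bit further down is a plain os.chdir; the path below is a placeholder you replace with your own Drive layout:

```python
from google.colab import drive

drive.mount('/content/gdrive')  # prompts you to authorize access to your Drive

# later on, change the working directory to wherever you uploaded your files
import os
os.chdir('/content/gdrive/MyDrive/<path-to-your-project>')  # update this path
```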
Then I'm going to continue with the next cell: I'm going to install the requirements, which
is running all this pip installs... this is going to be very straightforward the only thing you need
to do is wait until everything is completed... okay that took a few minutes but now it's completed
and you can see I have an output over here you must restart the runtime in order to use newly
installed versions and if I scroll up I got a similar output over here so basically remember
to restart the runtime if you have a similar message and that's going to be pretty much all
so... Google Drive is now mounted, so everything is okay, we have installed the requirements, now
let's continue then you need to change the working directory of this notebook so you need to execute
this cell, but it's very important that you update this path to the path where you have uploaded the
data and all of your files. In my case it's /content/gdrive/MyDrive followed by the
location of my data; if I show you my Google Drive, you can see this is My Drive > computer vision engineer
> TrainDetectron2ObjectDetector, and that's exactly the path over here. So please remember to update this
path with the location of your data in your Google Drive, your data and also your files, right, everything should be located
in the same directory once you have edited this location the only thing you need to do is to press
Ctrl enter and then that's going to be pretty much all in order to change the working directory and
then the only thing you need to do is to execute this cell so you can see that we are executing
the train.py file and I'm setting these arguments. I am setting the device to GPU; this is very
important, because that's pretty much the reason why we are using a Jupyter notebook in Google
Colab. Then I'm also setting the learning rate to this value, and I am going to train for 6,000
iterations. I would say these two arguments are not absolutely needed, you can just use the
default values, but in my case, for my data, for my problem, I noticed it was much better to use
this learning rate, and a shorter training of only 6,000 iterations was just fine.
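The cell is just an invocation of train.py; the exact flag names below are assumptions (check the argparse definitions in train.py for the real spelling), and the learning rate is whatever value works for your data:

```python
# In a Colab cell (hypothetical flag names; adapt to your train.py):
!python train.py --device gpu --learning-rate <your-learning-rate> --iterations 6000
```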
So now I just press enter to execute this cell, and that's pretty much all it takes to do this
training process. You can see how simple this is once everything is within
this train.py file right once I created these functions and I put everything in this util.py...
we can just execute everything from a super super high level calling train.py and please let
me know what you think in the comments below but I think it's just amazing we can just train
detectron2 from a super super high level as we are doing over here the only thing we're doing is
calling train.py and we are passing the arguments exactly like this from a super high level we
don't really care about the details, we don't really care about the complexity, we don't really
care about anything. It's amazing, I don't know what you think but I think it's amazing, please let
me know in the comments below what you think so this is pretty much all for this training the
only thing we will need to do now is we will need to wait until everything is completed and
this is going to take some time this is going to depend on your data on your annotations on your
problem on your specific problem in my case for my data it took something like two hours to do the
entire training process so we are not really going to wait until this is completed because I have
already trained this model when I was preparing this tutorial so let me just show you what the
output looks like. Once you've trained your model, you are going to have a directory called
output, exactly like the one I have over here, and this is where you're going to have all
the results of your training process: you're going to see all of these checkpoints,
which are the weights of your training process at all these
different steps, right, and this is where we are going to notice this argument over here,
checkpoint period, because we have set that the checkpoints should be saved every 500 steps, and if
you notice, these are all the checkpoints, these are all the weights we have saved, and if you check
the numbers you can see that these are... 499, then 999, 1499 and so on, so all of these files are
500 steps apart. And yeah, that's basically what it means to save the checkpoints every 500 steps:
during your training process you are going to be saving the checkpoints, the weights, exactly
like this, so at the end of your process you're going to have many, many weights files exactly
like I have over here. So these are my weights, but
what I'm going to do is I'm going to take this file... this is the file with all the information
of our training process in the training set and in the validation set so this metrics.json
file is the one we are going to inspect is the one we are going to analyze to validate this
model so I'm just going to download this file and now let's go to pycharm because I want to
show you this file which is plot_loss.py so I have already downloaded this file and it's in
my directory, let me show you: today's tutorial, detectron2, code, and this is the metrics.json
file I have just downloaded. And now if I show you this plot_loss.py, basically what we are
doing over here is parsing through this file, right, parsing through all the information we have
in this file. Let me open this file for you so you can see exactly what the information looks like;
you can see it looks very crazy right we have a lot of values we have a lot of information and
basically we need a way to parse through this information and we need a way to visualize all
this information super super quickly so that's why I created this... plot_loss.py because
it's going to help us a lot in order to just get all the information we want from this file and
just plot everything into a very nice looking plot so we can just do this validation much much
quicker. So let me show you what it looks like: I'm just going to press play (I'm going to tell you in
a few minutes why I have commented these two lines), and you
can see that this is the training loss and the validation loss; the blue values are the training
loss and the orange values are the validation loss. But obviously this is something that we cannot
analyze, because this is a lot of information and it doesn't really look very good, right,
so we're going to do something now which is going to make everything much much prettier which is
we're going to do a moving average on each one of these functions basically we are going to apply
another function which is going to smooth these values and it's going to make everything much much
smoother. I already wrote all the code we need, including this moving_average function,
so the only thing I need to do is delete these comments, and basically now we are
going to plot the loss values we are getting from this metrics.json file
and we're also plotting their moving averages, right, we are plotting the same functions but
averaged. And you can see this is what the averages look like, which is
much, much prettier; and in order to show it to you even better, I'm just going to remove the two raw
plots and we are only going to plot the moving averages. This is much prettier, this is much,
much better: now we have the training loss in blue, and I'm going to adjust the labels.
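A minimal sketch of what plot_loss.py is doing is shown below. metrics.json is written by detectron2 as one JSON object per line; 'total_loss' is a standard key, while 'validation_loss' is an assumption about the name the ValidationLoss hook logs under, so adjust it if yours differs:

```python
import json

import matplotlib.pyplot as plt
import numpy as np

def moving_average(values, window=40):
    # simple rolling mean to smooth the noisy per-iteration loss values
    return np.convolve(values, np.ones(window) / window, mode='valid')

train_loss, val_loss = [], []
with open('metrics.json') as f:
    for line in f:                      # one JSON object per line
        metrics = json.loads(line)
        if 'total_loss' in metrics:
            train_loss.append(metrics['total_loss'])
        if 'validation_loss' in metrics:
            val_loss.append(metrics['validation_loss'])

plt.plot(moving_average(train_loss), label='training loss')
plt.plot(moving_average(val_loss), label='validation loss')
plt.legend()
plt.show()
```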
Okay, now you can see that everything looks better: with these values over here, in blue we are
plotting the training loss and in orange the validation loss, and we can see that both of
these functions are going down and that's a very good sign but in the case of the training loss
it seems we have reached a plateau over here: the training process goes super smoothly until
it reaches something like 5,000 steps, and in the case of the validation loss it seems we also
reach a plateau, but much, much sooner. So this is basically where we are going to validate
this model and this is also where we are going to decide which one of our checkpoints we are going
to choose from this model right because we have many many many weights we have many checkpoints
and we can just use any of these files in order to produce our inferences in order to produce
our predictions so this is where we're going to Define exactly which one of these checkpoints
we are going to use and I would say I would... I like how everything is going until this point over
here because you can see that the training loss is going down and the validation loss is kind of
going down as well, and this is pretty much where everything plateaus, right. So if you ask
me, I would keep this checkpoint over here, the one at 3,000 steps, this
checkpoint over here so this is where you're going to draw your conclusions and you're going to make
decisions regarding what you're going to do next obviously another conclusion could be to do the
training again; it all depends on what's going on with your data. Ideally, your training loss
and your validation loss should be closer together; in this case they are very far
apart, and that's something I don't really like. I think that if we take
this model over here, at the 3,000 steps, everything is going to be just
fine, but ideally I would like to have these two plots closer together, because
otherwise this could mean the model is overfitting to the training data and is not going to
perform well on unseen data. But never mind, let's just take this model, the one we saved over here
in the 3000 steps and let's see what happens so I'm just going to get back to pycharm because now
it's time to make our predictions now it's time to take the model we trained to take the checkpoint
we chose and let's just make some predictions with this checkpoint with this model so let me
show you how to do that I'm going to this file which is predict.py and this is the file we are going to
use in order to make our predictions and you may see that everything is already coded so everything
is ready and I'm just going to explain absolutely every single line of this file so you understand
exactly how it works and you understand exactly every single line of this file you can see that
the first few lines are a few Imports so I'm just importing a few functions which are important
in order to make these predictions these are a few imports from detectron2 and I'm also
importing CV2 then the first line is getting a configuration file absolutely every single time we
use detectron2 we need a configuration file we need an object which is going to contain all the
configuration for the specific task we are going to do with detectron2 in this case we are
just getting this default configuration file with a lot of default values and then we are updating
this file this default configuration with many other values which are specific to the model we use
in my case, as I used this pre-trained model, this baseline, I have to use
exactly the same one here and the only thing I'm doing is updating this default configuration
file with many other values which are specific to this model then this is very very very very
important I am setting the model.weights to the location, the path, of the checkpoint we are going
to use and if I show you my google drive remember these are all the checkpoints we generated with
this training process and as I am going to use the one we generated at the 3000 steps this
is the one I have already downloaded and it's already in my file system you can see that this
is the directory, the folder, of my python project and this is the file we are going to use:
model_0002999.pth so this is exactly the file we are going to use
and this is exactly the location of this file then as I am going to make these predictions in
my local CPU I am setting device to CPU then I am creating this object which is this predictor and
basically this is going to be the predictor we are going to use in order to make our predictions
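So the first part of predict.py boils down to something like this sketch (the checkpoint file name matches the one downloaded above; the paths are whatever your layout is):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# must be the same baseline used for training, so the architecture matches the weights
config_file = "COCO-Detection/retinanet_R_101_FPN_3x.yaml"
cfg.merge_from_file(model_zoo.get_config_file(config_file))

cfg.MODEL.RETINANET.NUM_CLASSES = 1      # one class in this dataset: alpaca
cfg.MODEL.WEIGHTS = "model_0002999.pth"  # the checkpoint chosen at ~3000 steps
cfg.MODEL.DEVICE = "cpu"                 # predictions on the local CPU

predictor = DefaultPredictor(cfg)
```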
then I am loading an image, very important, because we definitely need an image if we are going
to make predictions and this is the image I am loading let me show you this is exactly
the image of an alpaca we are going to use in order to make predictions ideally we should
be getting the location, the bounding box of this alpaca let's see what happens but this is what we
should get ideally we should get the location of this alpaca and this is the location the path to
this image, right: you can see it's data, then val, then the images directory, and then it is just this file name; if I
search, you can see this is the image we are going to use. Now let's go back here, and then the only
thing we need to do is to call predictor and we need to input the image we are going to predict
and then we are just going to get all the outputs, right, all the results. But let's just stop for
a minute and let me show you exactly what the output looks like: I'm just going to print outputs,
I'm going to comment everything else this is maybe the only coding I'm doing in
this tutorial so I'm just going to press play and you can see that this is the output we got
right so these are the predictions for this image you can see that we have many different
fields one of them is pred boxes and these are basically all the bounding boxes, all the objects
we are detecting, this is the first one and then these are all the other objects we are detecting
something like 8 different objects in this image. And this is very important: these are
all the bounding boxes, and these are the X and Y coordinates of the top left corner and these are
the X Y coordinates of the bottom right corner so these are the bounding boxes you can see that we
are also detecting the scores we are also getting information regarding the scores the confidence
values of each one of these bounding boxes for example the first one is 88.6 percent and the
last one is 5 percent so... you can see that these are all the different confidence values
and then we are also getting this information which is the class we are predicting right in
my case I'm only using one class which is alpaca and it's encoded with the number zero but
this is where you will have all the different numbers of all the different class IDs of all
the objects you are detecting. And please mind that in my case, although my dataset contains only
one class ID, because I'm only detecting alpacas, you may notice that some of these objects were
detected with a different class ID: I got a 39, a 47, a 56. I have used detectron2 many times in many
projects and this is an issue I have found in different projects, so you can see, for example, in
this case I should be getting only zeros, because I only have one class in my dataset, but
I'm also detecting other random numbers. So please, please take a look at the numbers you are getting
here, check that everything makes sense, and just make sure you are only detecting
the numbers you should be detecting and if you are getting some random numbers as I'm doing right now
just don't use those predictions: add something like an if statement, and if the number you
are getting is not within your classes, then just don't use
those predictions, something like that. Also, in my case, for example, you can see that the random
numbers the random values are some detections with a very very low confidence for example
this one is the fourth one so one two three four this one which is something like an 8
percent confidence and then this one which is a 5.8 percent and this one which is a 5.3 percent
so I guess it's most likely this is going to be an issue with those objects with a very very low
confidence value, but you never know, so please make sure the numbers you are getting make sense.
Now let's continue: I showed you the output you are getting from detectron2, now I'm just
going to uncomment everything and I'm going to continue explaining this file so you can see that
the next line it says threshold equals 0.5 and this is the detection threshold we are defining so
we are only going to consider valid all of those detections with a confidence value Which is higher
than 0.5. Now let's continue: you can see that this is basically parsing through the outputs,
extracting three fields, the pred boxes, the scores and the pred classes,
right, so the only thing I'm doing is parsing through this information and getting
these objects: pred classes, scores and bounding boxes. Then I am iterating over all the bounding
boxes, and for each one of these boxes I am getting the score, the confidence score, of that specific
detection, and the class ID, the number of the class ID I am detecting, and then, if
the confidence value is greater than the threshold then I am just getting all the values, the X Y
position of the top left corner and the bottom right corner and then I'm just drawing a rectangle
on top of my image. I'm not checking here that I am getting only zeros, but that's a very good
homework for you: I invite you to make an edit to this file, for example here, and say something
like: if the confidence score is greater than the detection threshold AND the predicted class is
within the class IDs of my class.names file, of this file over here, right, if the prediction we
are getting is within my classes, if I am getting a number which makes sense, then and only then
draw the bounding box. That's a very, very good homework for you.
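Here is a sketch of that parsing loop with the suggested homework check included; the detectron2 field names are standard, while valid_class_ids is the hypothetical set you would build from class.names (just {0} in this dataset), and img is the image loaded earlier with cv2.imread:

```python
threshold = 0.5          # detection threshold
valid_class_ids = {0}    # class IDs read from class.names; only "alpaca" here

# outputs is what the predictor returned for img
instances = outputs["instances"].to("cpu")
boxes = instances.pred_boxes.tensor.numpy()
scores = instances.scores.numpy()
classes = instances.pred_classes.numpy()

for bbox, score, class_id in zip(boxes, scores, classes):
    # keep the detection only if it is confident enough AND the class ID makes sense
    if score > threshold and int(class_id) in valid_class_ids:
        x1, y1, x2, y2 = map(int, bbox)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
```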
So, continuing: you can see that I'm drawing the bounding box and then the only thing I'm doing is plotting this
image so let's see what happens now I'm going to press play and let's see if we are detecting this
alpaca properly or not... amazing, we are detecting the alpaca perfectly. Remember, this is the
image we are using, and we are detecting the only alpaca we have in this image, and yeah, we are
drawing the bounding box and the bounding box is enclosing the alpaca very accurately,
so everything is working just fine. So this is going to be all for this tutorial; this is exactly
how you can train your object detector on your own custom data using detectron2, and this is
going to be all for today. Hey, my name is Felipe and welcome to my channel.
In this video I'm going to show you how to use Amazon Rekognition as an object detector. Amazon
Rekognition is a very interesting and very powerful tool which I have used many times in my
projects as a computer vision engineer. Now let me show you super quickly all the different
categories, all the different objects you can detect using AWS Rekognition, and you can see that
this is a very very long and a very comprehensive list of objects right for example you can detect
dinosaurs you can also detect diamonds you can detect driving licenses e-scooters and so on if
I scroll down you can see that these are many many categories and in total we have something like 290
different objects so this is definitely a lot and this is a very interesting tool because there are
many cases in many situations in many projects in which you need to detect a very specific type of
object and in some cases it may not make a lot of sense to train an entire object detector only to
detect one very specific object; in some cases it may be more convenient, easier and
much quicker to just use something like Amazon Rekognition out of the box, and you can just
detect any of the different objects in the list. For example, if we were working on a project
and we needed to detect wheels, we could either train an object detector from scratch to detect wheels,
or we could just use Amazon Rekognition out of the box, right. So this is a very interesting tool,
I have used it many times in my projects, and this is exactly what we will be doing today:
in this video I'm going to show you how to use Amazon Rekognition to detect zebras. This is a
random category, a random object I have chosen from this list, so this is exactly the object we will
be using in order to show you how to use Amazon Rekognition. Now let me show you super quickly the
video we are going to use as an example so we can use this tool and you can see that this is a
video in which we have many many many zebras we are going to use this video in order to detect
all the zebras, and in order to show you how to use Amazon Rekognition. What we're going
to do now is go to pycharm, and I'm going to show you the entire process of how to create
a project how to create all the files we need how to install the requirements I'm going to show you
absolutely every single step of this process we are going to start this project and we're going
to build this project from scratch right so the first thing I'm going to do is I have already
opened pycharm I'm going to file new project and I'm going to just create a project and I'm
going to create this project exactly here which is this folder I have over here and I'm going to
create it here where it says tutorial AWS reko this is where you are going to choose the exact
directory where you want to create your project then I'm going to create a new environment and
this is where my environment is going to be located and I'm going to create this environment
using python 3.8 now I'm going to click on create I'm going to choose this window because I'm going
to open this project over here and you can see that this is a completely and fully and absolutely
empty project the only thing we have is the virtual environment which is called env and that
is it, now the first thing I'm going to do is to install the requirements is to install the python
libraries we are going to use today. So I'm going to Settings, then Project and Python
Interpreter, and I'm going to click on this plus button over here, and then I'm just
going to type opencv-python, this is one of the libraries we're going to
use, I am going to click on install package, and then we are also going to install boto3,
and that's pretty much all, these are the two libraries we need in this project
and then I'm just going back to pycharm and now I'm going to create the first file
we are going to use in this project and I'm going to click here new python file and
then I'm going to call this file main.py. So the first thing I'm going to do for now
is to just write down the entire pipeline, the entire process we will be doing today: the first step
will be to create an AWS Rekognition client... an AWS reko client, right, this is going to be
the first step in this process then we are going to set the class set the target class we are going
to be detecting right I already mentioned we were going to detect zebras in this tutorial so this is
exactly where we are going to specify exactly what the object what's the category we are going to
be detecting then we are going to load the video right the video we are going to detect today then
we are going to read frames from the video the next step is to convert the frame to jpg this
is a very important step then we are going to convert this... we are going to get a buffer from
this conversion and we're going to convert this buffer to... to bytes right, it's going to be
the next step in this process. Then the only thing we need to do is to use Amazon Rekognition in
order to detect objects, and then we are going to write all the detections to our file
system right we are going to write everything to our disk to our local computer and this is exactly
the process in which we are going to be working today now let me show you something else I'm going
to create another file which is called credentials because in the first step in this process
we are going to create this AWS reko client, and in order to do so we are going to need a
couple of keys: we are going to need an access key, which I'm just going to set to None for now,
and we're also going to need a secret key, which I'm also going to set to None for now,
right we are going to need these two keys in order to continue with this project because we
need to use these two keys in order to create a client, an AWS rekognition client, now
let's go back to my browser and let me show you exactly how to create these two keys so
let's go back to my AWS Management console and I'm going to show you super quickly how to
create these two keys we need in this project but first obviously you need an AWS account
in order to continue right this is very very important and also you need to login into your
account once you have an account once you have created an account and you are logged into your
account you are going to see something like this this is your AWS Management console and these are
all the services which you have available in AWS right these are a lot... but in today's tutorial
we are only going to use one service only one service which is IAM so we need to type IAM
over here and we need to select this option then this is your IAM Management console
and you need to select users we are going to create a new user then you need to select add
users and we are going to choose a name for this user I'm going to say something like AWS reko
tutorial right this is the name of my user this is the user I'm going to create then you need
to select attach policies directly and we are going to search for rekognition right and I'm
going to select AmazonRekognitionFullAccess, I click here, then next, and that's pretty
much all so I'm just going to create user so the user is now created and then I'm going to
select the user over here AWS reko tutorial and then I am going to security credentials because now
it's where we are going to create the two keys we need in our project so we scroll down until
we reach this section over here, access keys, and create access key. Then you can see that
we have all these different options, and if I'm not mistaken it's pretty much the same however you
create this access key: pretty much all of these options are going to create
exactly the same keys and you can just use them from your project, if I'm not mistaken, but
we are going to use this one over here which is local code because this is the description which
fits better to our project right you plan to use this access key to enable application code in a
local development environment to access your AWS account if I'm not mistaken it's pretty much the
same if we use any other option but let's just use the option which fits better with our use case
and now you can see we have a warning over here, which is: 'Alternative recommended: use an integrated
development environment (IDE) which supports the AWS Toolkit, enabling authentication through
IAM Identity Center'. And this is important, because this is a warning we get from AWS meaning
there is a better, more secure way to create these keys and to access this
service, but in this tutorial we are not going to mind this warning, because it would involve
creating a solution which is only useful for one very specific IDE, right; in my case I'm using
pycharm and if I follow these instructions I would be using a solution which is only useful for
pycharm right and I want to make this tutorial as generic as possible and I want you to use it as
well so in case you're using a different IDE let's just create these access keys in a different
way right the only thing I'm going to do is I'm going to select this checkbox over here I
understand the above recommendation and I want to proceed to create an access key and I'm going
to click next right I'm going to show you a very very generic way to do it which is going to work
for whatever your IDE is right if you use pycharm or if you use visual studio and so on so I'm not
going to type anything here, so just create access key, and these are our access keys. Something that's
very, very important: access keys are personal and you should never disclose them to
anyone in any situation, right, so you should never do something like I'm doing right now,
just making a video with my access keys completely visible to anyone watching this tutorial. Never
do something like this; in my case it's not really that important because I'm just going to
delete these keys once this tutorial is over, but please be super, super mindful and super careful
about who has access to your access keys, to your private access keys, because this is very, very
sensitive information. So the only thing I'm going to do for now is to copy these two fields:
I'm going to start with this one which is access key I'm going to copy this field and I'm going
to get back to pycharm I'm going to my file to credentials.py and the only thing I'm going to
do is to paste the access key over here right then let's get back to this page and I'm just
going to copy the secret access key and I'm going to head back to pycharm I'm just going to
paste the secret key and that's pretty much all so these are the two keys you need in this project
and now we can continue with the main.py file and we can just start coding our entire pipeline
so let's get started and the first thing I'm going to do is to import boto3 and let's
import opencv as well so we can just focus on everything else right so I have
imported the two libraries we have installed in this project and now let's get started by
creating this AWS reko client, and this is how we're going to do it: I'm going to call this client
reko_client, and this is something like boto3.client, where I need to input 'rekognition',
and then this is where we input the access keys. So we're going to have two
keyword arguments, one of them is aws_access_key_id and the other one is aws_secret_access_key,
and now the only thing I need to do is to import credentials so I can use
these two variables: the first one will be credentials.access_key and
the other one will be credentials.secret_key, and that's pretty much all.
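In code, the client creation looks something like this; the region is an assumption, since boto3 needs one configured somewhere (pick whichever region you use):

```python
import boto3

import credentials  # the file holding access_key and secret_key

reko_client = boto3.client(
    'rekognition',
    aws_access_key_id=credentials.access_key,
    aws_secret_access_key=credentials.secret_key,
    region_name='us-east-1',  # assumption: use your own region
)
```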
So now let's continue, and we're going to set the target class: I'm going to create a variable
which is target_class, and this is where I define the class we are going to be detecting today; as I
already told you, we are going to be detecting zebras. So now let's continue, now it's time to load the
video and what I'm going to do is to go to my directory where I have the video and I'm going
to copy and paste this video to my directory where I have created this pycharm project so now
the video is located in this pycharm project and it's called zebras.mp4 so let's go back to
pycharm so now let's call... let's load this video exactly like this I'm going to call CV2
video capture and then this will be zebras dot MP4 and this will be cap okay now let's read frames
from the video so I'm going to define a variable which is ret I'm going to initialize it as true
and then while ret I'm going to read frames from the video like this ret frame equal to cap
dot read, right, so we are reading frames from the video. And now let's convert this frame to
jpg, and this is how we are going to do it: I'm going to call CV2 imencode, if I'm not mistaken, then
this will be '.jpg' and then the frame, and this is going to return two variables; one of them we
are not going to use, so it doesn't matter, and the other one is a buffer. Okay, now let's convert
the buffer to bytes, and I'm going to do it like this: let's call this something like image_bytes, and
this will be buffer to bytes, if I'm not mistaken, something like this; I'm not sure about this
character, so I'm just going to execute this file, for only one frame, so we make
sure everything's okay, and let's see what happens. Okay, I got an error and it says something like
'could not find encoder for the specified extension' in function imencode; let's see if
I have a character missing, I think it's '.jpg', let's see now... now I have an error which is 'object
has no attribute', so I'm almost sure that this is tobytes, without the underscore; let's see now...
and now everything is okay. Okay, so I'm just going to remove this break, and now comes the most fun
part of this tutorial, because now it's time to use Amazon Rekognition to detect objects
in this video. So this is exactly how we are going to do it: I'm going to call the client we have
just created, reko_client, and I'm going to call detect_labels; I'm going to
input the image we have just created, this image_bytes, and this will be something like Image, I'm
going to open a dictionary, and this will be Bytes and then image_bytes, and that's pretty much all.
And now I'm going to set the minimum confidence value for which we are
going to detect objects: we are going to set this value to 50, and I think this
is with a capital M, MinConfidence, and this means that we are only going to detect objects if the
confidence value is greater than 50%; we are going to filter out all the detections with a confidence
value lower than 50 percent, that's exactly what it means. And this will be something like response.
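Putting the encoding and the call together, this step looks roughly like the sketch below (reko_client and cap are the objects created earlier; the detect_labels parameters are the standard Rekognition API):

```python
import cv2

ret, frame = cap.read()                    # read one frame from the video

# convert the raw frame to an in-memory jpg, then to bytes
_, buffer = cv2.imencode('.jpg', frame)
image_bytes = buffer.tobytes()

response = reko_client.detect_labels(
    Image={'Bytes': image_bytes},
    MinConfidence=50,                      # drop detections below 50% confidence
)
```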
And now I'm just going to iterate: for label in response Labels, right, I'm going to iterate over all the results
we got. So this is how I'm going to do it... if label Name equals our target class, right, so
if the object we have detected is a zebra, then we are going to iterate:
for instance_number in range(len(label Instances)), and if I'm not mistaken
this is with a capital I. So we are going to iterate over all the zebras
we have detected, and now let's continue, now let's get the bounding box we have
detected in each one of these objects, so this will
be something like label Instances, instance_number, and then
we need to call BoundingBox. Let's execute the code so far to make sure
everything is okay, and let's just do it for only one frame, so
I'm going to break the loop here... Labels, right, because this is
with a capital L most likely... okay, everything's just fine, so I'm going
to delete the break and I'm going to get back here and let's continue so
now I'm going to unwrap all the information in the bounding box, and this is
something like: X1 is equal to bounding box Left, and I'm going to cast it to int,
okay; then y1 is equal to int of bounding box Top; then the width of this bounding
box is equal to bounding box Width, with a capital W if I am not
mistaken; and then the height is equal to int of bounding box Height. And let's see what happens if we
just print these values, so I'm going to print x1, y1, width and height, and I'm also going to remove the int for now,
so I can show you something, and then I'm going to add
the int again. I'm just going to execute
this as it is... okay, you can see these are the values we are getting, and
this is why I removed the int, because otherwise everything
would be a zero or a one: these values are in relative
coordinates, and this is very, very important. So what we need to do now is to multiply these values
by the width and the height of the frame we are reading, so I'm going to define two
new variables, H and W, which are the height and the width of every frame, and this will
be frame dot shape. And now let's continue by doing something like this: x1 will be bounding box
Left multiplied by the width of the image, then y1 will be exactly the same but times H,
then the width is times W and the height is times H, and now I'm going to cast everything to int.
Then let's print the values for x1, y1, width and height again, and let's see what happens. Okay,
now you can see that we are getting integers and everything seems to be okay; we are detecting
objects, so everything is fine. The next step of this pipeline is to write the detections, but
before we do so, let's make sure everything is working properly and visualize some of the frames
with all the bounding boxes we are detecting on top, and let's see what happens. So I'm going to
call cv2 dot rectangle, I'm going to input the frame, then x1, y1, and then x1 plus width and y1 plus
height; then I need to input the color, if I'm not mistaken, which is going to be green, and then
the thickness of the rectangle, which will be three for now. And then let's see what happens:
I'm going to visualize this frame by calling imshow and CV2 waitKey. Okay, so we are plotting
a bounding box on top of absolutely every single frame we are plotting a bounding box for each
one of our objects and let's see what happens I'm just going to execute this file and let's
see if we are detecting all of our zebras and everything seems to be working just fine right if
I just press a key, you can see that we are stepping through all the frames. This is not running in
real time, because obviously we are detecting many, many zebras and we are plotting a rectangle,
a bounding box, for each one of these zebras; but you can see that
nevertheless this is working just fine. So the only thing we need to do now is to take all these
detections and we need to write these detections to our file system to our computer so this is how
we are going to do I'm going to remove all the plotting because we are not going to do it anymore
and now let's just write the detections and in order to do so I'm going to create a new variable...
with the output directory with the location of the output directory which is
where we are going to save all these detections so I'm going to Define this variable like output
dir, and this will be on my local computer, in a directory called data. So let's go back
to the directory of this pycharm project and create a new directory called data;
I'm going to press enter and that is it. Now let's save all these detections into the YOLO format,
so I'm going to create another directory, imgs: I'm going to create another variable for
the images directory, which will be something like output_dir_imgs, and this is os.path.join of output
dir and imgs, right, and I'm going to import OS. Then I'm going to create another variable
for the annotations for the detections I'm going to call this other variable anns...output dir anns
and this will be something like this and now I'm going back to my local computer to my
file system and within this data directory I'm going to create two additional directories one of
them for the frames for the images which I'm just going to call imgs exactly how I have called
this variable over here and then I'm going to create another folder which is called anns right
exactly as I have called this other variable over here so now everything is set everything is
ready we have just created the directories where we are going to save all the data now let's
get back over here and the only thing we need to do is I'm going to do something like with
open... I'm going to do it here, before we start this iteration, it's going to be much, much better if
we do it here: for every single one of these frames we are going to open a text file, and the path
name will be something like os.path.join of the output annotations directory and the file name,
which I'm going to call frame_{}.txt, where I'm going to input the frame
number with format; frame number hasn't been defined yet, so I'm going to define it in a second, but
let's just say str(frame_number).zfill(6). Now let me explain this in a
couple of seconds, but for now let's just get here: I'm going to define a new variable, frame_number,
initialize it as -1, and then increment it
for every single frame we read over here. Okay, so we start at -1, we increment
this variable here, and then for absolutely every single image we are creating this file name,
which is frame and then this number, padded with zeros to six digits, so we make sure all the
file names are the same length. That's actually more for formatting reasons, it's not 100% needed, but
it's going to make everything look much, much nicer. So now let's just continue, and I'm
going to open this... as write and then as f and then that's pretty much all okay and now
for each one of our detections the only thing we need to do is to write these detections and
this is how we are going to do it: f.write, and we are going to write five numbers, remember we
are going to do it in the YOLO format, so we need five numbers. The
first one of these numbers will be a zero, because we will be detecting only one object, which is
zebra, so this will always be the number zero; then it's the X and the Y coordinates of the
center of this bounding box, so this is something like x1 plus width, divided by 2, and then
exactly the same for the y coordinate, y1 plus height, divided by two; and then it's the
width and then the height. I see there's an issue here, okay, a parenthesis missing, let's
see now... okay, perfect. And remember we were converting all these values into integers,
but if we are going to save the annotations in the YOLO format we don't really need that
conversion, so I'm just going to delete the int and this multiplication, something like this,
because remember how the YOLO format works: we need relative coordinates, so
values like this will be just fine, and that's pretty much all. Okay, so we are writing
all the detections and once we have written all the detections the only thing we need to do
is to close the file and that's pretty much all and let's save the images as well let's just
prepare this dataset as if it were a data set in the yolo format so we can just take this
dataset and we could potentially train a model we could train an object detector with the data
we are going to be saving, and in order to do so we need to save the detections but we also need to
save the images so I'm going to save the images over here we can just do it after we save
all the detections: we can call cv2 imwrite, then the file location, which will be pretty
similar to the one for the detections, but we are going to change txt to jpg, and we also
need to change the directory, which will be imgs, okay, and then we need to
input the frame... and that is all. Okay, so let's see now if everything is okay, let's just run it
for only one image and let's see what happens everything is just fine and if I go to my local
directory I open anns you can see I have a file with many many detections which makes sense because
we have many many zebras and then if I go to the images directory you can see I have a frame
the first frame from the video so everything seems to be just fine so the only thing we need
to do now is to execute exactly the same process but for absolutely all the frames so I'm going to
remove this break and then let's see what happens. Okay, I see I got an error, because we
should be doing everything else only if we have actually read a frame, right, so this is a very
small mistake. And also, while I was waiting for the execution to be completed, I realized another
mistake: we should be dividing only the width and only the height by two, since these are the
X and Y coordinates of the center of the bounding box. Everything should be okay now.
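With both fixes in, the writing step looks roughly like this sketch; the directory variables and the exact file-name pattern are reconstructions of what is typed in the video:

```python
import os

frame_name = 'frame_{}'.format(str(frame_number).zfill(6))  # zero-padded name

# one annotation file per frame, YOLO format:
# "class x_center y_center width height", all in relative coordinates
with open(os.path.join(output_dir_anns, frame_name + '.txt'), 'w') as f:
    for label in response['Labels']:
        if label['Name'] == target_class:
            for instance in label['Instances']:
                bbox = instance['BoundingBox']
                x1, y1 = bbox['Left'], bbox['Top']
                width, height = bbox['Width'], bbox['Height']
                # only width/height are halved: that gives the box center
                f.write('0 {} {} {} {}\n'.format(x1 + width / 2,
                                                 y1 + height / 2,
                                                 width, height))

# save the matching frame so the pair forms a YOLO-style dataset
cv2.imwrite(os.path.join(output_dir_imgs, frame_name + '.jpg'), frame)
```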
So, in order to be 100% sure everything is okay, I'm just going to execute this file again. Now the
execution has been completed and we don't have any errors, so everything is just fine, and if I
go to the images directory, you can see I have 755 images (the last one is numbered 754, but since
we are starting from zero that makes 755 files), and these are the images of our zebras, right,
these are all the frames from the video. And then if I go to the annotations directory, you can see
I have all my annotations, and I also have 755 files, numbered up to 754 but starting from zero, so 755 again,
so everything is working just fine. So this is exactly how you can use Amazon Rekognition
as an object detector, this is exactly how you can detect objects using Amazon Rekognition, and
that's going to be all for this tutorial. In this video we're going to work with
automatic number plate recognition and this is exactly what you will be able to do with this
tutorial you can see that we are detecting all the license plates in this video and we're also
reading the text from these license plates, using 100% Python. We're going to use an object
detector based on YOLOv8, we are going to do object tracking, and we are going
to read the text from the license plates using EasyOCR,
so this will be an amazing tutorial my name is Felipe welcome to my
channel and now let's get started so let's get started with this tutorial today we are going to
work with automatic number plate recognition and let me show you a few resources a few repositories
which are going to be super, super useful for today's tutorial. The first one is YOLOv8, because
we are going to be detecting license plates and then we're going to be reading the text from
those license plates, and in order to detect our license plates we are going to use an object
detector based on YOLOv8, so YOLOv8 is going to be super important in today's
tutorial; I'm going to show you more details in a few minutes. But for now let me show you the
other repository which we are also going to use in this tutorial, and it's going to be super
important too: it's an object tracking algorithm called SORT, because today we're
going to do object detection and we're also going to do object tracking, this is going to be an
amazing tutorial, and in order to do object tracking we are going to use SORT. And then, once we have detected
the license plates, once we have implemented all the object tracking, once we have done everything
we need to do, we are going to read the content of the license plates using EasyOCR, so this
is a python Library which is going to be super super super important in this tutorial and now let
me show you the data we are going to use in this tutorial let me show you the video we are going to
use in order to test the automatic license plate recognition software we are going to use in this
tutorial you can see that this is a video of a highway and we have many many cars which are going
through this highway and the important thing about this video is that all the cars... we have like a
very very frontal view of absolutely all the cars and most importantly we have a very frontal view
of all the license plates right you can see that for absolutely every license plate we detect in
this video we have a very very very frontal view and this is an ideal point of view to build a
project like this so this is exactly the video we are going to use in this project and now let me
show you something else if I go to Google and I search for license plate and I go to images let me
show you something you can see that we have a lot of diversity when it comes to license plates right
we have many different types of license plates: we have some license plates which are composed only
of numbers, like this one, then we have other license plates which are only letters, like these
two, and we have many, many different examples, many different types, many different formats.
I would say that absolutely every single country, every single state, every
single period in history has its own license plate format, its own license plate style,
its own license plate system; there are many, many different types of license plates, there's a
lot of diversity when it comes to license plates, and obviously it's very, very challenging
to build automatic license plate recognition software that deals with absolutely every single type
of license plate. I'm not going to say it's impossible, it's not impossible, but it's a
very, very challenging task, so in order to simplify our problem we
are going to focus only on one very specific type of license plate which is this one we are going
to be working with the United Kingdom license plate system, with the United Kingdom license plate
format, which is comprised of seven characters the first two characters are letters then we have two
numbers and then we have three more letters right so we have two letters two numbers and three
letters and this is the exact structure of the license plate type we are going to be working
today in this tutorial right, this is the exact same type we are going to be detecting with the
software we are going to build in this tutorial but today I'm going to show you a very generic
process and a very generic pipeline so by making some adjustments into the code we are going to be
making today you will be able to apply the same process to other types of license plates right we
are going to work with this type in this project but you will be able to make some adjustments in
everything we're going to be doing today so you will be able to apply the same process to other
types of license plates right so that's something I'm going to show you better in a few minutes
but for now let's continue now let me show you something else, when we were starting this tutorial
I showed you that we were going to use an object detector based on yolo V8 to detect license plates
now let me show you the data I used in order to train this license plate detector right this is
exactly the data set I used in order to train this detector, and I'm going to give you a link to this
dataset in the description of this video, and if you want to know exactly how I trained this object
detector I invite you to take a look at one of my previous videos where I show you how I train an
object detector using YOLOv8; that video is a step-by-step guide on how to train an object
detector using YOLOv8, and that's exactly the same process I followed when I was creating this
license plate detector so this is the data I used and if you want to know exactly how I trained that
object detector then just take a look at the video I'm going to be posting over there right so now let's
continue I already showed you all the resources we were going to use in this tutorial I already
showed you the type of license plate we are going to be detecting today and now it's time to go to
pycharm so we can start implementing all the code of today's tutorial, and now let's go to pycharm
let's go to this pycharm project and let me show you some files I have over here, you can see I have
many many different files and for now let's just focus on these two: main.py and util.py. main.py
is the file in which we are going to be coding the entire pipeline of this tutorial right you
can see that this is a sequence of steps which we are going to follow in order to build this
automatic license plate recognition software you can see that the first step is loading
the models then loading the video then we're going to read frames and so on this is the entire
pipeline the entire process we are going to be building today and then we have this other file
which is util.py, in this utils file we have five functions let me show you these are the
functions we have defined over here and from all of these functions we are going to focus
on these two which are read license plate and get car, these functions... if I open these functions
you can see that they are completely empty right we need to implement these functions in this
video, and then the other three functions are already implemented, right, everything is ready
and we're just going to use them. The idea is to focus on these two functions over here because
they are way more important from a computer vision point of view, so these are
the functions we are going to focus on the most, and this is the util.py file. Now if I go back to main.py,
now it's time we start with this process now it's time we start with this Pipeline and in order to
do so we are going to start importing YOLO so I'm going to say from ultralytics import YOLO and then
we are going to load the models that's the first step in this process and the interesting part is
that we are going to have two models because we are going to be detecting license plates but we
are also going to be detecting cars that's going to be a very important part in this process so
I'm going to be loading two models I'm going to call the first one of these two models coco model
because this is a model which was trained on the COCO dataset, and this is going to be YOLO with a
pre-trained YOLOv8 model, which is yolov8n.pt, the YOLOv8 nano. We are just going to call this
pre-trained model, and this is the model we're going to use in order to detect
cars. It's very important that we detect cars: I know we are going to detect license plates and we are
going to read license plates, but detecting cars is going to be super, super important, and
you're going to see exactly why in a few minutes. Then we're also going to load the license plate
detector: we're going to call it license_plate_detector, and this is going to be YOLO,
and we need to input the path to this license plate detector, which is
located in a directory called models and is called license_plate_detector.pt, so I'm
just going to point at models, okay.
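So the model-loading step is just two lines; yolov8n.pt is downloaded automatically by ultralytics, and the path of the custom detector is whatever matches your models directory:

```python
from ultralytics import YOLO

coco_model = YOLO('yolov8n.pt')  # COCO-pretrained YOLOv8 nano: detects cars, buses, trucks, ...
license_plate_detector = YOLO('./models/license_plate_detector.pt')  # custom detector (path is an assumption)
```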
Now it's time to load the video we are going to use today, and in order to do so I'm going to
import CV2 and call CV2 VideoCapture, and I'm going to input the video location, which is the
current directory and it's called sample.mp4, and this is going to be cap. Okay, now
we are going to read frames from the video so I'm going to define a variable which is
ret I'm going to initialize it as true and then while ret I'm going to read frames from the
video like this ret frame equal to cap dot read if ret then I am going to continue okay and this is
going to be pretty much all for now so we are reading frames from the video and now it's time to
Now it's time to continue detecting all the vehicles. We are going to be detecting all the cars, and in order to do so, this is where we are going to use the first model, which is the model trained on the COCO dataset. So we are going to do something like this: I'm going to call coco_model, and I'm going to input the frame, and this is going to be results; I need to access the first element, and I'm going to call this object detections. In order to move one step at a time, I'm going to print detections, and I'm only going to execute the first 10 frames, otherwise this is going to take a lot of time; so: and frame_nmr less than 10. And obviously I need to define a variable which is frame_nmr; I'm going to initialize it at -1 and then I'm just going to increment it here. Okay, and I don't really need the pass anymore,
and let's see what happens if I print detections okay so everything seems to be working just fine
this is all we got and you can see that this is a lot of information these are all of our detections
so everything seems to be working just fine so what I'm going to do now is we are going
to iterate: for detection in detections.boxes.data.tolist(), and let's print detection again so we know exactly how this looks
and we know how to access all the information okay so this is how each one of our detections
looks like right you can see that we have one two three four five six numbers and the
way this works this is going to be something like X1 Y1 X2 Y2 then we will have the score
and then we will have the class ID right this is detection so remember we are using a
model which was trained on the coco dataset so we are detecting many many different
objects right this is the class ID this is exactly the type of object we are detecting at
every single time at every single one of these detections so this is very important and then
we have the confidence value right this is how confident our object detector is of this specific
detection and then this is the bounding box right so we have X1 Y1 X2 Y2 the bounding box then
the confidence score and then the class ID and something that's very very important we are
doing all of this in order to detect vehicles. As this model which was trained on the COCO dataset is detecting many different objects, we are going to say something like this: if int(class_id) in vehicles, then we are going to continue. And vehicles is a
variable which we haven't defined, and we are going to define it with the class IDs of all the vehicles in the COCO dataset. This is a list of all the objects which we can detect
using this model right you can see that these are a lot of objects and some of these objects
are related to vehicles and some other objects are not for example you can see we have person
bicycle car motorbike airplane bus train truck and so on right so from all this very very
long and very comprehensive list we are going to make sure we are detecting a vehicle so we are
going to say if the class ID we are detecting is either a car or a motorbike or a bus or a truck
then we are going to continue, and if not we are going to neglect the bounding box, the detection we just got. The class IDs we are interested in are 2 for car, 3 for motorbike, 5 for bus and 7 for truck, so vehicles is going to be [2, 3, 5, 7]. We don't
really have any motorbike in this video I know for sure because I already watched the video but
nevertheless in order to make this more generic I'm just going to add a motorbike as well so
if our class ID is within our vehicles then we are going to continue and I'm going to
create another variable which is detections_ and this is where I'm going to save
all the bounding boxes of all the vehicles we are going to detect in this video so I'm going
to do something like this if we have detected a vehicle then I'm going to append the bounding
box and the confidence score to this new variable and please mind that I'm not saving the class ID
from now on it's not really that important; we are not really going to care about the specific class ID we have detected. From now on, the only thing we care about regarding our detections is that they are vehicles, and we don't really care to know exactly what type of vehicle. So this is the new variable with which I'm going to be working from now on in this process.
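Here is a sketch of this detection-and-filtering step as it sits inside the frame loop; it assumes the YOLOv8 results API (results[0].boxes.data.tolist()) and the standard COCO class indices mentioned above:

```python
# COCO class IDs: 2 car, 3 motorbike, 5 bus, 7 truck
vehicles = [2, 3, 5, 7]

# detect all objects in the current frame with the COCO model
detections = coco_model(frame)[0]
detections_ = []
for detection in detections.boxes.data.tolist():
    x1, y1, x2, y2, score, class_id = detection
    if int(class_id) in vehicles:
        # keep only the bounding box and the confidence score;
        # from now on the specific vehicle class no longer matters
        detections_.append([x1, y1, x2, y2, score])
```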
Now let's continue, and now it's the time in which we are going to implement the object tracking. Remember we were going to work with object tracking in this tutorial, and now it's the time where we are going to implement this tracking functionality into this project. Before we do so, let me give you a very quick explanation regarding why exactly
we are going to implement this object tracking and basically every time we solve a problem every
time we solve not only a computer vision problem but any type of problem you need to use absolutely
all the information you have available regarding that problem and in this case we are going to be
tracking license plates which are moving through a video right we are going to be detecting license
plates on individual frames and these license plates are objects which are moving through a
video so if we are able to track this license plate through all this video we will have more
information and this additional information is going to be super valuable in order to build a
more robust solution so that's pretty much the reason why we are going to implement this object
tracking and we are going to be tracking... we're not going to be tracking the license plates
themselves but we are going to be tracking the cars, the vehicles, and I'm going to show you exactly
why later on so this is what we are going to do we are going to work with this repository remember
I showed you this repository when we were starting with this tutorial and the first thing you should
do is cloning this repository into your local drive into your local directory you need to clone
this repository into the directory into the root directory of your pycharm project so in my case
this is the root directory of my pycharm project this is where I have all my Python scripts and
this is where I have all my files related to this project and you can clone this repository in one
of these two ways let me show you one of the ways is opening a terminal and typing something like
git clone and the repository URL so I'm going to click here I'm going to copy the repository URL
and then I'm going to paste the repository URL here and then the only thing you will need to do
is to press enter right and that's exactly how you can clone this repository into your local computer
but there is another way in which you can do it and actually this is a much more simple way and
maybe you prefer to do it like this which is just downloading the entire repository as a zip file
and once you have downloaded this ZIP file, the only thing you need to do is to copy and paste this sort-master directory into your local directory; just drag and drop this directory into your local computer, and that's it. Please mind that this directory is called sort-master, but you will need to edit the name,
you will need to rename this directory into sort right you can see here in my computer
this is my directory this is called sort if I open this directory you can see these are all
the files which are in this repository. So basically, remember to rename this directory into sort; it's going to be called sort-master, but you need to rename it into sort. That's very important, otherwise you will possibly have some issues with the next
steps in this tutorial so let's go back to pycharm this is the repository you need to clone
into your local directory and remember to call the directory containing this repository remember
to call this directory sort. Now let's get back to pycharm, and what I'm going to do now is just importing sort: from sort.sort I'm going to import everything; we are going
to import absolutely everything from this library and then I'm going to call an object I'm
going to create a new object which is called mot_tracker, and this is going to be equal to Sort(),
right this is the object tracker we are going to use in order to track all the vehicles
in our video. Now let's get back here, and what I'm going to do now is just calling mot_tracker.update, and I'm going to input a numpy array of this list we have created containing all the vehicles in our video, and this is going to be something like track_ids. So track_ids is going to contain all the bounding boxes of all the vehicles we have detected in this frame, but with the tracking information: it's going to add an additional field which is going to be the car ID, the vehicle ID, for each one of the cars we are going to detect, and this vehicle ID or this car ID is going to represent that specific car through all the video.
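As a sketch, the tracking step looks something like this; it assumes the Sort class provided by the repository you just cloned into the sort directory:

```python
import numpy as np
from sort.sort import *  # provides the Sort tracker class

# create the tracker once, before the frame loop
mot_tracker = Sort()

# inside the frame loop: update the tracker with this frame's vehicle
# detections; each returned row is [x1, y1, x2, y2, car_id]
track_ids = mot_tracker.update(np.asarray(detections_))
```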
Let's continue. Now we are tracking all of our objects, all of our cars, and now it's the time to detect the license plates. So far, the only thing we have detected is
the cars in the video but now it's the time to detect the license plates in order to do so we are
going to use this detector over here which is license plate detector and we're going to do it
exactly the same way as we have detected the cars right I'm just going to copy and paste
this sentence and I'm going to replace coco_model by license_plate_detector, and this
way we are going to be detecting all the license plates. I'm going to call this object license_plates, and then I'm going to iterate over all the license plates we detected within this frame, and in order to do so I'm going to call for license_plate in license_plates.boxes.data.tolist(), and that's pretty much all. Then let's unwrap all the information we got from this license plate exactly as we did before, so this is going to be something like x1, y1, x2, y2, score and class_id; this is going to be license_plate.
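In code, this step mirrors the vehicle detection above; a minimal sketch:

```python
# detect license plates in the current frame
license_plates = license_plate_detector(frame)[0]
for license_plate in license_plates.boxes.data.tolist():
    x1, y1, x2, y2, score, class_id = license_plate
```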
Okay, then we will need to assign each license plate to a given car, because we have detected all the cars in every frame and all
the license plates in every frame but so far we have cars and we have license plates and we
don't really know which license plates belong to which car right and we know for sure that every
single license plate will be on one of our cars but we don't really know which one goes with
which one right so now in this step is where we are going to assign a car to absolutely every
single one of our license plates right and in order to do so we are going to use one of the
functions in our util.py file we are going to use this function which is get car this function
receives a license plate and this object we have over here with all the tracking information for all the cars in that specific frame, and it returns a tuple containing the vehicle coordinates and its ID. So we are going to call this function
get car and this function is going to return the car this license plate belongs to right this
is what we're going to do I'm going to import from util import get_car
and now I'm going to call get_car I'm going to input the license plate and
I'm going to input this object which is track IDs remember this object contains all
the bounding boxes and also all the tracking related information right that's very important
and the return will be the coordinates of the car this license plate belongs to, so it's going to be something like xcar1, ycar1, xcar2, ycar2, and then the car ID for this car. Remember, every single car in our video will have a unique ID which is going to identify the car through all the frames in the video; that's very important.
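A sketch of the call; the five return values follow the docstring described above:

```python
from util import get_car

# assign the detected license plate to one of the tracked vehicles
xcar1, ycar1, xcar2, ycar2, car_id = get_car(license_plate, track_ids)
```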
Also, please mind that this function is completely empty for now; it is only returning some dummy values, and this is exactly what we will need to implement
in the next step in this project right once we are completely ready once we have completed this
pipeline, then at the end of this process we are going back here to the util.py file, and we're going to implement this function. So
now we have assigned the license plate to a very specific car now we know what's the car this license
plate belongs to and now we can continue with the next step which is cropping the license plate
and this is how we're going to do it: we are going to index frame with the license plate coordinates, which is int(y1) to int(y2) and then int(x1) to int(x2), and this is the license plate crop. That's pretty much all we need to do in this
step of this process and now let's continue to the next step which is processing this license
plate right now we are going to apply some image processing filters to this crop we have over
here in order to further process this image so we improve this image so it's much simpler
for the OCR technology for easyocr to read the content from the license plate now it's time to
apply some image processing filters to this crop and specifically the filters we are going
to apply are a grayscale conversion and then we are going to apply a threshold so let's see
how we can do that. I'm going to call cv2.cvtColor, I'm going to input the license plate crop, and then I'm going to call cv2.COLOR_BGR2GRAY, and this is going to be license_plate_crop_gray. Now we have converted the license plate crop into a grayscale image, and the only thing we need to do is to call cv2.threshold. We are going to input this grayscale image, then the threshold, which I'm going to set at 64, and then the value, 255, at which we are going to take all the pixels which are lower than the given threshold. I say the pixels which are lower than the threshold because we are going to use the inverse threshold, the THRESH_BINARY_INV type of threshold, and this type of threshold is going to take all the pixels which are lower than 64 and set them to 255, and all the pixels which are higher than 64 it is going to set to zero. That's exactly how this threshold works,
and if you want more details on how this function works, I invite you to take a look at one of my
previous videos where I show you an entire course of opencv with python and one of the lessons in
this course is exactly about thresholding right it's exactly about this function so I'm going
to be posting a link to this course somewhere in this video so you are welcome to take a look
at this course and this lesson particularly to get more details on how thresholding works now
let's continue. The first return value is going to be a variable which we are not going to use in this tutorial, so it doesn't really matter, and then the output I'm going to call license_plate_crop_thresh. So this is going to be the thresholded image, and it's exactly the image we are going to input into our OCR technology, into our easyocr algorithm.
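Putting the cropping and the two filters together, this step is roughly the following sketch (the variable names are the ones used in this video):

```python
# crop the license plate out of the frame
license_plate_crop = frame[int(y1):int(y2), int(x1):int(x2), :]

# grayscale conversion
license_plate_crop_gray = cv2.cvtColor(license_plate_crop, cv2.COLOR_BGR2GRAY)

# inverse binary threshold: pixels below 64 become 255, the rest become 0
_, license_plate_crop_thresh = cv2.threshold(
    license_plate_crop_gray, 64, 255, cv2.THRESH_BINARY_INV)
```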
In order to be more clear about the difference between these two images, I am going to visualize them super
super quickly so you see exactly how they look like I'm going to call imshow and I'm going
to input this image which is license plate crop I'm going to call this window crop I'm
going to call it original Crop so it's more clear this is the image we are cropping
from the frame and then I'm going to call cv2 imshow again and in this case I'm
going to be plotting the threshold and I'm going to input this other variable and then the only
thing I'm going to do is to call CV2 wait key and let's take a look at these two images super
super quickly so you see exactly how they look like and this is what we got and you can see
that this is the frame this is the crop we are making from the frame so this is the license
plate and this is exactly how we are cropping this license plate from the frame and this is
the thresholded image right you can see that in this image absolutely every single Pixel is
either white or black, and this type of image, this thresholded image, will make it much simpler for easyocr, our OCR technology, to read the content from this image. This is the image we are going to use in order to read the license plate number, because this is going to make it much simpler for our OCR to read the
license plate so that was like a very very quick way to show you how these two images look like
and now let's continue now it's the time to read the license plate number we are almost there we
have almost completed this process, and this is how we're going to do it: now we're going to call
another function which is defined in util.py and this function is read license plate and you
can see that this function is not implemented either this function is completely empty we are
returning some dummy values and this is another function which we are going to implement later
on we are going to implement after we are happy with this process once we are completely and
absolutely happy with this pipeline then we are going to move to util.py and we are going
to implement this function as well. But for now we're just going to use this function, so I'm going to import it as well: read_license_plate, something like this. And now let's see how we can use this function. I'm going to call
read_license_plate, and this is going to return two values. Let's look at the function
documentation to see exactly what are the values which are going to be returned here... we are going
to... it is going to return a tuple containing the formatted license plate text and its confidence
score. So this is going to be something like license_plate_text and then license_plate_text_score; these are the two values we are going to be getting from here, and the input should be the license plate crop. In our case we are going to input the thresholded crop, this thresholded version of our crop, and that's pretty much all.
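A minimal sketch of the call, following that docstring:

```python
from util import read_license_plate

# read the text on the thresholded license plate crop
license_plate_text, license_plate_text_score = read_license_plate(
    license_plate_crop_thresh)
```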
Remember, we are just completing the pipeline, the most generic process; then we are going to get back here in order to implement this function and this other function. And now let's continue: now the only thing we
need to do is to write the results we are almost there we have almost completed this process and
now obviously if we want to take these results and we want to visualize these results or if we want
to analyze these results whatever thing we want to do with these results we obviously need to write
these results to our local computer. So this is how we are going to do it: in order to write these results
we are going to use another function which is also defined in this util.py file and it's called write csv
and this function is implemented this function is 100% and fully implemented you can see that
this is all the code we have for this function and everything is just ready and we can just use
this function as it is remember in this tutorial and in basically all my tutorials we always focus
on the computer vision part of the problems so writing this csv file is not really that important
from a computer vision point of view so that's why we are not really going to implement this function
live in this video but this is already implemented and we're just going to use it so let's see what
this function does and it says write the results to a CSV file and it receives two arguments which
are the results which is a dictionary containing the results and then it also receives a path to
the CSV file we are going to produce and this is going to be the path in which we are going to
write this CSV file right it's the path in which we are going to save the CSV file we are going to
produce so if we are going to input a dictionary then we need to produce a dictionary in order to
input into this function; we need to take all of our information and we need to put all of
this information into a dictionary right that's very very important so that's what we are going
to do now, because for now the only thing we have done is just computing all the information, but
we have not saved this information into any type of dictionary or anything like that so I'm going
to create a new variable which is called results, and results is going to be a dictionary, and
then this is where I'm going to save all the information and this is how we are going to
do the first key in this dictionary will be the frame number right we are going to save all the
information and we are going to start with the frame number we are going to have a different
key for absolutely every single frame in our video and then for absolutely every single frame
we are going to save all the information which is related to all the cars we are detecting
and most importantly to all the license plates right so then I'm going back to the end of this
pipeline here and I'm going to say something like... I'm going to make a very quick edit first which
is going back to this function and instead of returning two None I'm going to be returning two
zeros right because we are going to reserve this other output we are going to reserve the None, None
output for those times in which we are going to find an error or we are going to have any type
of issues reading the license plate and this is going to be much more clear later on once we are
implementing this function but for now just bear with me that it's much more convenient to return
some dummy values which are different than None so let's get back here and this is where we're
going to say if license plate text is not None we are going to save all the information about
this license plate in this dictionary we have just created so we are going to take this variable
over here which is results for that specific frame number and we're going to create a new entry with
all the information for the license plate we have detected right and this is how we're
going to do: I'm just going to write it first and I'm going to explain it once it's done. The next key is the car ID, so this is going to be results[frame_nmr][car_id], and then for this car I'm going to create a new dictionary which is going to have two keys: one
of them is car and the other one is license plate for car we are going to have another
dictionary which is the bounding box and that's it right and for the license plate
we are going to have another dictionary which is something like bbox... the bounding box
then also the text we have detected then the confidence value for the bounding box and
then the confidence value for the text right okay and that's pretty much all so I'm just
going to format this a little nicer and that's pretty much all now let's see what
exactly we need to input in each one of these fields okay so basically for the car bounding
box we are going to input these values over here which are the car bounding box right
these are the coordinates of the bounding box of this specific car and then for the license
plate bounding box we are going to input these values which are the coordinates for the bounding
box of this license plate and then for the text we are going to input this value which is license
plate text for bounding box score we are going to input this value which is the score in for
in which we have detected this license plate then for text score we are going to input this variable
which is license plate text score and by doing so we don't have any errors and everything is
okay so for every single frame for every single frame number we are going to be saving all the
information which is related to each one of our cars and all the information for each car will
be the information of that specific car where the car is located and then all the information
about the license plate which we have detected in that specific car right and for the license plate
we are going to save all the information we have right and we're going to save all this information
only in those cases in which we have detected the license plate and every time we have successfully
read the license plate number from this license plate, so this object is not None. We are going to be saving all this information into this dictionary only in that case, only when we have
detected the license plate and when we have read its license plate number right and please notice the
structure I have built for this information for this dictionary because remember every time we
detect a license plate it will not be floating around in space completely isolated no that will
never happen every time we detect a license plate it will be on a given car and this car will
be on a given frame. So this is exactly why I have decided on this structure for this dictionary,
and once we have created all this information the only thing we need to do is to call... I'm going to
import this function as well; the name was something like write_csv, so let's import write_csv as well. And something is going on, because we are not really using this import we have over here... if I scroll down I see I'm not really importing the function itself; now we should be okay. So let's go back here and I'm going to call this function which
is write csv and I need to input the dictionary so I'm going to input results and I'm also going
to input where I want this CSV file to be saved and I'm going to save all this information into
a CSV file called test.csv so what I'm going to do now is I'm going to execute this pipeline I'm
going to execute this process as it is and then we are going to take a look at this file and then we
are going to continue right then we are going to see if the file we are going to create it
makes sense right so I'm just going to press play okay the execution is now completed and now if I
go to my local directory, to the directory of this pycharm project, this is test.csv, so this is the
file we have just created and if I open this file you can see that this is all the information we
have saved and we have extracted from this video right remember we are processing only the first
10 frames we are still processing only the first 10 frames so this is the all the information we
have extracted so far. And please remember we are just computing some dummy values from some of our functions, so this is not really all the information, this is all the information we have computed so far; but other than all of these zeros over here, you can see everything looks pretty well. We are just producing an entire
CSV file with all the information we have computed from this video. We are almost there; actually, we have completed this pipeline, we have completed this process. The only thing we need to do now is going back to util.py, because we need
to implement these two functions get car and read license plate and once these functions are
implemented then we are going to be producing a real file right we are going to be using a
file with the entire information here and here right we are going to be producing the real
license plate number and the real license plate score and also the car bounding box and the car
ID for absolutely every single license plate in absolutely every single frame in which we have a
detection right so we are almost there I am super excited and now let's continue to the util.py
file so we can Implement these functions and let's start with get car remember from the main.py
pipeline we were using this function which is get_car in order to assign which car each license
plate belongs to right we have many many cars and many many license plates and for each one of these
license plates we want to know what's the car this license plate belongs to so this is exactly where
we were using this function get car and now let's see exactly how we are going to implement the
function and in order to do so I'm going to show you a few pictures this is a random frame from our
video right you can see that this is a frame we have many many cars and this is only a frame
from the video once we have detected all the cars we are going to have a situation like this we are
going to have many detections, because at every single frame we are going to have many cars; I don't know how many cars we have in this picture, but there are many, something like 20, 30, 50, maybe 60 cars. So for every single frame
we are going to have many many detections which are going to be our cars, we are going to have
many bounding boxes for all of our cars and also at every single frame we are going to have all
of our license plates; but please mind that we are only going to have maybe one
or two or three license plates for every single frame right so we are going to have many cars but
only a few license plates and the idea is to know which car this license plate belongs to and the
way we are going to know that is by looking at all of these bounding Boxes by looking at all of
these cars and by finding the car which contains the license plate right by finding the bounding
box of the car which contains the bounding box of this license plate right that's the way we
are going to find what's the car which belongs to this license plate so that's exactly the idea
of what we are going to be implementing in this function now let's see exactly how we can do that
the first thing I'm going to do is unwrap all the information in license plate so in order to do
so I'm going to do something like this because this is exactly the same object license plate
so I'm just going to do this okay then I'm going to iterate in all the cars we have over here I'm
going to say for... let's say for j in len(vehicle_track_ids), we are going to be iterating in all the
cars we have detected and remember this is the entire information this is the bounding box and
this is also the car ID remember so now we are going to unwrap all the information for each one
of these cars and this is going to be something like x car 1 y car 1 x car 2 y car 2 and car
ID; this is exactly the information which is in each one of the elements of this object vehicle_track_ids, and this is vehicle_track_ids[j]. Okay, so that's pretty much all; we are iterating in
absolutely all the bounding boxes of all the cars we found in this Frame we are iterating
in all these bounding boxes for each one of these bounding boxes we are going to verify
if it contains the license plate right that's exactly what we are going to verify and this is
how we're going to do it we are going to see if X1 is greater... remember X1 is the upper
left coordinate of the license plate if X1 is greater than x car 1 and Y1 is greater than
y car 1 right we are verifying that this coordinate over here it's greater than this
other coordinate over here we are trying to verify if we meet this condition right and then the
other condition we need to meet is if this point we have over here these coordinates
we have over here they are lesser than this other point we have over here right we need to
meet these two conditions and this is exactly how we're going to do it if X1 greater than x
car1 and y1 greater than ycar1 and x2 lesser than xcar2 and y2 lesser than ycar2, then we
have found the bounding box this license plate belongs to we have found the car on which
this license plate is located right that's what it means if we have met all of these
conditions that's what it means so in this situation we are going to... I'm going to
define a new variable which is foundIt and foundIt is going to be false at the beginning
and then it's going to be true in this case and in this case we're also
going to break the loop. And then I'm also going to define another variable, which is going to be car_index, and car_index will be j.
Okay, now if foundIt, then we return. So if we have found the car which contains this license plate, then we are going to return these values, which are the bounding box of the car and also the car ID, and in any other case we are going to return -1 for all the values, in order to make it more clear that we have not found the car. That's pretty much all; we have implemented this function which is get_car. Now let's see if everything works well; we should now have the right values for all the cars we are detecting, and the only thing I'm going to do is execute this script again and see what happens. Okay, I got an error, and I think I know what's the problem: we need to iterate in range(len(vehicle_track_ids)). Now everything should be okay, let's try again... okay, now it's completed.
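For reference, here is a sketch of the function as we have implemented it in this step, including the range(len(...)) fix; it simply checks whether the license plate bounding box is fully contained inside one of the tracked car bounding boxes:

```python
def get_car(license_plate, vehicle_track_ids):
    """Return the coordinates and ID of the vehicle containing the plate."""
    x1, y1, x2, y2, score, class_id = license_plate

    foundIt = False
    for j in range(len(vehicle_track_ids)):
        xcar1, ycar1, xcar2, ycar2, car_id = vehicle_track_ids[j]

        # the plate belongs to this car if its box lies inside the car box
        if x1 > xcar1 and y1 > ycar1 and x2 < xcar2 and y2 < ycar2:
            car_index = j
            foundIt = True
            break

    if foundIt:
        return vehicle_track_ids[car_index]

    # -1 everywhere means: no car was found for this license plate
    return -1, -1, -1, -1, -1
```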
Now let's see the new file we have created, the new test.csv file, and now you can see that we have some values for car ID
and we also have some values for the car bounding box so we are moving one step at the time but we
are making progress right so now let's continue with the util.py file and now let's move to the
next function which is read license plate now it's time to implement this function over here and
something I'm going to do first is I'm going to do an if over here and I'm going to continue with
this pipeline only if car ID is different than -1 right and now let's continue and let's see
how we can implement this function which is read license plate and the only thing we need to do is
to call easyocr and let's see how we can read the license plate and let me show you some variables
I have defined over here; these variables are going to be super important, and you're going to see exactly why. And then also let me show you
this reader we have here I have already defined I have already initialized this OCR reader and
you can see that I'm calling easyocr and then I'm calling this class, which is Reader. So the only thing we need to do now is calling reader.readtext, and I'm going to input the license
plate crop and this is going to be detections then I'm going to iterate for detection
in detections because remember we could be detecting many many many many different objects
many different text objects in this image so for each one of these objects we are going to
unwrap these objects first and this is going to be something like bounding box text and score
this is going to be the detection right each one of these detections is going to be something
like the bounding box of the text we have detected then the text we have detected and then
the confidence value for which we have detected this text and then we are going to convert this text to
uppercase and we are going to remove all the white spaces; this is exactly how we are going to do it, and this will be equal to text. Okay, and now it's the time in which we are going to use this
format right remember when we were starting this tutorial I told you we were going to focus on
this very specific type of license plate right we are going to work with this type of license
plates each license plate is going to have seven characters the first two characters are going to
be letters then two numbers and then three letters this is the format of absolutely every single
license plate we are going to be working with in this tutorial so we are going to make sure every
single text we detect complies with this format and in order to do so I have already created a
function which is license complies format this function returns a Boolean value which is pretty
much the verification of if this license plate complies with the format or not we are going
to be verifying if we have seven characters and we're also going to be verifying the first two
characters are letters and then the second... the third and the fourth characters are numbers and
then the last three characters are letters again right this is exactly what we are doing with this
function and this is a very important function we are going to use now so let me show you
exactly how we are going to use this function: if license_complies_format(text), then and
only then we are going to return the text and the confidence score we are going to return these two
values, these two variables, which are text and score right only if the text complies with the
format we are asking absolutely all the license plates right only in this case we are going to
be returning these values and in any other case we are going to return None right this is very
very very important and this is going to make our solution way more robust and way way better
and something that makes the solution even better is that we are not going to return the text itself; we are going to call another function which is format_license, and let me show you exactly
what we are going to be doing with this function I'm going to call format license text and let me
show you the... let me give you the idea, the high level idea behind this function sometimes
when we are using an OCR technology when we are using a library like easyocr sometimes it's very
challenging to tell some characters apart for example it's very challenging to tell a five apart
from an S right so you can see that the letter S and the number five are very similar and it's
very very very challenging for an OCR to tell the difference between these two characters and
we are going to have exactly the same situation for the letter I and the number 1 or for the
letter O and the number 0 for example right those are characters which are very very hard
to differentiate, they are very hard to tell apart so this function I have over here
format license the only thing it does is going through all the characters in the license plate
in the text and for each one of these characters it fixes whatever issue we may have with
this type of confusion right if for example we are reading this character over here and easyocr,
the OCR technology we are using, it says is the letter S we know for sure it's not the letter
S because we are expecting a number here so if we have detected the letter S then we convert this
value to the number 5 and the same happens here if we are reading this value this character and
we are getting the number 5 we know for sure for a fact that that's not the number 5 because
we are expecting a letter here so we are going to convert the number 5 into the letter S that's
exactly the idea the high level idea of what we are going to be doing with this function
we are going to be going through absolutely all the characters in the license plate and for each
one of these characters we are going to be fixing these type of issues in case we find any type of
issues like this and that's pretty much all and I invite you to take a look at these two functions
format_license and license_complies_format, and to take a much closer look and to properly
understand exactly how they work; that's your homework from this
video, so you properly understand how they work. So now let's continue: now we are returning format_license(text) and score if our license plate complies with our format, and we are returning None, None in any other case, and we are done; we have completed our process.
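Here is a hedged sketch of the function as we have built it; it assumes the easyocr reader initialized at the top of util.py and the two helper functions license_complies_format and format_license already present in that file:

```python
import easyocr

# initialize the OCR reader once, at module level
reader = easyocr.Reader(['en'], gpu=False)

def read_license_plate(license_plate_crop):
    """Read a license plate crop; return (formatted_text, score) or (None, None)."""
    detections = reader.readtext(license_plate_crop)

    for detection in detections:
        bbox, text, score = detection

        # normalize: uppercase and strip all white spaces
        text = text.upper().replace(' ', '')

        # accept only texts matching the expected 7-character format
        if license_complies_format(text):
            return format_license(text), score

    return None, None
```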
Now let's see what happens: I'm going to execute this file again. I'm going to make a very small change: I'm only going to execute it for 10 frames, but I'm going to do it like this: if ret, then if frame_nmr is greater than 10, I'm going to break the loop; this is going to be much better. And now let's see what happens, I'm going to execute main again.
okay it seems I have a typo over here this is obviously not remove but this is replace I got
confused because I was removing the white spaces but this is obviously not the name of the function
we want to use here so now let's see what happens okay now the execution has been completed and
now we have produced a new test.csv file and if I open this file you can see that we still
have all the information related to the car ID and the car bounding box and now we have all the
license plate numbers we have read from the frames from the license plates and also the confidence
score for each one of these license plates. So we made it, we have completed this process, we are done. Everything is ready; the only thing I'm going to do now
is to execute this script execute this main pipeline for the entire video so I'm just going
to remove this break over here and that's pretty much all and now I'm going to press play again
and then I'm going to show you how to visualize this data so everything looks like the video I
showed you in the intro so let's see what happens and now let's go back to pycharm so I can show
you exactly how you can create a visualization as the one I showed you when we were starting
this video in order to do so this is where we are going to use these two files visualize.py
and add missing data.py and you're going to find these two files in the GitHub repository
of today's tutorial so you can just go ahead and use them in your project and before using these
two files let me show you something first if I go back here to the test.csv file we have created
let me do something I'm going to filter by car ID I'm going to show you all the data all the
information we have extracted for only one of our cars I'm going to select only the car ID
number three right this is only a random car ID in our data you can see that all the frame numbers we
have detected for this car ID are not consecutive. This means that we have detected the frame number zero, then the number one, then it jumps to the number four, then it jumps to the number nine, then 12, 13, 14, 15, 16, 17, then 27; so we have many missing frames. For some
reason we don't have the information for this car ID for many frames which are in between these two, for example; we don't have the information for the frame number two, the frame number three, or the frame numbers five, six, seven, eight, 10, 11. There are many missing
frames for this car ID so that's something that's going on and remember that we are not saving all
the information because we are only saving the information for those license plates for which
we have detected the car the license plate is on, the car where the license plate is located; and also we're only saving the information for the license plates for which we have read a license plate number which complies with our format.
so we are not saving all the information, there's a lot of information which we are not
saving into this CSV file. Remember how OCR technologies usually work: they perform very well, but in some cases they make mistakes. So if
in some cases they are not reading a number which complies with this format then we are not going
to be saving the information for those frames so that's the reason why we have some missing
frames over here that's the first thing I want you to notice then another thing which is going to
be much more important: take a look at what happens with the license plate numbers. Now, we have read
the license plate numbers in all of these frames and we have read a number which complies with
our format so everything it's okay but you can see that we have many numbers right for example
we have many many different values many different numbers if I show you the number we have detected
in the first frame it's different than the one we have detected here in the frame number four right
and then if I continue scrolling down you can see that we have also detected other values for
example here this is different and if I continue scrolling down this is also different here we have
an N we have a P so for every single car ID we are going to have many many different values for
the license plate, and this is a huge issue, a very important thing we need to solve, because obviously every single car has only one valid value for its license plate. So if we have so many values for the license plate, how do we make a decision? How do we know what's the real one, what's the real value, the
most accurate value for the license plate how do we make a decision what's our criteria that's a
huge problem and this is exactly where the object tracking is involved because for every single car
in the video... because we are going to be tracking the car through all the different frames in the
video, for every single car we are going to have the value for the license plate we have detected
in that given frame for that car so if we want to know what's the value for the license plate of a
given car through all the frames in the video the only thing we need to do is to select the license
plate we have read we have detected with the highest confidence score right you can see this
column is the confidence score in which we have detected every single one of these license plates
so the only thing we need to do is to take a look what's the license plate we have detected with
the highest confidence, and that's it; that's going to be our criteria to know what's
the license plate number of this car and that's it that's the way we are going to solve our problem
and that's exactly where the object tracking is involved and that's exactly why it's so important
to track... to implement an object tracking algorithm in this problem, because this
is how we are going to solve this problem this is going to be our criteria to select the license
plate number for every single car in this video so remember we have these two problems this
is how we are going to solve this problem, and then we still have this other problem, which is that we have some missing frames for every single car. This problem actually is not really a big problem, and the only thing it's going to affect is the visualization, because now we are going to take all this information and we are going to visualize it. So the only thing that's going to happen with all these missing frames is that we are just not going to visualize the license plate, and we are not
going to visualize the license plate value for that given frame so let me show you what happens
if we create a video from the CSV file I just showed you, we will have a visualization which looks like this, which will be okay I guess, but it's not an ideal visualization; it's not really good looking. Ideally we would like to have a visualization
which is more stable for every single license plate we would like to see the license plate on
a fixed position through all the different frames in which we are detecting the license plate for
that car. That's exactly what we would expect, and this doesn't really look good. So in order to fix this problem, which again is not a huge problem
and the only thing it does is to affect the visualization we are going to use one of these
two scripts which is called add missing data and the only thing this script does is interpolate
all of those frames in which we have not detected a license plate or in which we are not extracting
the information for the license plate. The only thing we're going to do is interpolate the values for the bounding boxes of the car and the license plate in all of those frames. For example, in the frame number 41: you can see we have the information for the frame number 40 and we have the information for the frame number 42, but we
don't have the information in the frame number 41. So the only thing the add_missing_data.py script does is consider the bounding boxes for these two frames and take the average of all the different coordinates; by taking the average it's going to compute the value of the bounding box in the missing frame, and it's going to apply exactly the same process in absolutely all the other missing frames. That's the way we are going to solve this problem of all the missing frames. Remember, this is only a matter of visualization, it's not a huge problem.
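The underlying idea is plain linear interpolation; a minimal sketch of the principle with numpy (the coordinate values here are hypothetical, and the actual script in the repository handles the full CSV):

```python
import numpy as np

# frames where we do have a value for one bounding box coordinate
frames = [40, 42]
x1_values = [612.0, 620.0]  # hypothetical coordinates for those frames

# estimate the missing coordinate at frame 41
x1_at_41 = np.interp(41, frames, x1_values)  # -> 616.0
```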
Then, once we have fixed that issue, we can just create the video, and that's it. I'm going to give you
these two files in the GitHub repository of this tutorial and now let me show you how this works
so the first thing you need to do is to execute add missing data and you need to change here the
path to the file name you are going to interpolate, in our case it's test.csv, and then you need to
specify what's the file name of the CSV you are going to create with the interpolated data let me
show you super quickly how this file looks like I'm going to filter by car ID and I'm going to
select the number three again, and you can see that in this case we have values for absolutely every single frame: we are starting at the number zero just as before, but now we have computed the values for the bounding boxes for absolutely every single frame until the number 65, which is the last frame in which we have detected this car. So this is exactly the data
we are creating with add missing data.py and once we have created this data this new CSV
file then we go to visualize.py and then we input something like test interpolated.csv and then we
specify what's the file name of the video we are going to create, in this case out.mp4, and the only thing we need to do is to execute this file, and then after a
few minutes we are going to have a video which looks exactly like this and this is going to be
all for today my name is Felipe I'm a computer vision engineer and these are exactly the type
of videos and the type of tutorials I make in this channel if you enjoyed this video remember
to click the like button and also remember to subscribe to my channel this is going to be all
for today and see you on my next video so on today's tutorial we will
be making an object detection web application we will be detecting
tumors on a brain MRI image now let me show you how it works I'm going to
drag and drop an image from my computer so this is the image we have uploaded and if I
click here on detections you can see that we have detected two objects we have detected two tumors
on this image so this is exactly the project in which we will be working today on today's tutorial
we are going to make the entire web application using Python and streamlit, and we're going to detect
objects using an object detector trained with detectron2 so my name is Felipe welcome to
my channel and now let's get started so let's get started with this tutorial and the first thing we
need to do is to create a new pycharm project you can see that this is pycharm and now let me
show you how to create a new project we need to click here on new project I'm going to select the
directory where I'm going to create this project which in my case is here and then I'm going to
select tutorial this is the directory in which I'm going to create this pycharm project and
then I'm going to create a new environment, and the interpreter will be Python 3.8; everything else
will be just the default values so I'm going to click on Create and that's pretty much all now the next
step will be to install all the requirements we are going to use today so I'm going to create
a new file which is called requirements.txt; I'm going to name this file, press enter, and then I'm going to paste all the requirements, all the
dependencies we need to install in this project which are all of these packages we have over
here so I'm just going to copy and paste these packages over here and that's pretty much all now
I'm going to the terminal and I'm going to type pip install -r requirements.txt, I press enter, and that's going to take care of installing all the requirements,
and you can see that I got an error and basically this error is because we need to install all
of these dependencies first and then we need to install this final dependency... right you can
see that this one is called detectron2 we need to install everything else first and then at
the end we need to install detectron2. So I'm just going to comment this line, and then I'm going to run pip install -r requirements.txt again. Okay, now all the requirements have been installed, and the only thing we need to do is to install detectron2, so I'm going to uncomment this line and I'm going to run pip install -r requirements.txt again; this is going to take care of installing all the requirements, but as we have already installed all these packages, the only one that's going to be installed now is detectron2, so we need to wait a few minutes. Okay, and that's
pretty much all in order to install detectron2 and now we are all set all of our requirements
have been installed so it's time to continue let me show you how to create a new file let's create
a new python file so we're going to select file new python file and this file will be main.py
so this is the file in which we are going to be coding the entire web application of today's
tutorial and remember in this tutorial we are going to be detecting tumors on brain MRIs
so we definitely need an object detector in order to detect this type of object. Let me show
you the data I used in order to train this object detector this is a dataset I found in roboflow
and I'm going to give you a link to this dataset in the GitHub repository of today's tutorial so you
can just go ahead and take a look at this dataset if you want to, and this is an object detector
I trained using detectron2, and I'm not going to show you the details of how I trained this
object detector because that's something I have already covered in one of my previous videos in
one of my previous videos I showed you how to train an object detector using detectron2 and
I showed you the step-by-step guide I showed you the entire process so if you are curious to know
how exactly I trained this object detector, I invite you to take a look at the video I'm going to
be posting over there and now let's continue this is the data I used in order to train this object
detector and now let me show you the entire pipeline in which we are going to be working
today let's get back to pycharm and let me show you exactly what are all the steps we are going
to be making in this tutorial the first step will be setting up the title of the web application
so this is the first step in this process then the next step is setting up the header right
the third step will be creating a file upload widget so the user can upload an image of
a brain MRI so we can detect all these objects on top of this image then the next step is loading
the model right loading the object detector we are going to be using to detect objects then we
are going to load the image the user has uploaded then we are going to detect objects and then the
last step in this process will be to visualize the objects we have detected on top of the original
image and we are just going to display this visualization to the user right so these are the
steps of the entire process the entire pipeline in which we are going to be working today and
I'm going to show you every single step of this process so you can see these are one two three
four five six seven steps in only 7 steps we will have this web application up and running so let's
get started. The first step in this process is importing streamlit as st. OK, then in order to set up the title I'm going to call st.title, and the title will be something like 'Brain MRI tumor detection'. Then, in order to set up the header, I'm calling st.header, and this will be something like 'Please upload an image'. OK, then in order to create the file upload widget I'm going to call st.file_uploader, and I'm going to input two parameters: the first one is an empty string, and then all the file types we support in this widget, something like png, jpg and jpeg. OK, and that's pretty much all.
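Put together, a minimal sketch of main.py at this point could look like this (the strings are the ones from the video; everything else is just the straightforward streamlit calls):

```python
import streamlit as st

# title and header of the web application
st.title('Brain MRI tumor detection')
st.header('Please upload an image')

# file upload widget: an empty label plus the image types we support
file = st.file_uploader('', type=['png', 'jpg', 'jpeg'])
```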
Now, in order to move one step at a time, let's see if everything executes just fine. I'm going to execute the code as it is so far, so I'm going back to the terminal and I'm going to type streamlit run main.py. This is going to open my browser, and we are going to see exactly how our web application looks so far. Everything looks just perfect, so we are OK to continue. Let's get back to PyCharm and continue with the next step in this process, which is loading the model, the object detector we are going
to be using today. Remember, we are going to be using an object detector which I trained using detectron2, and remember, I already showed you how to use detectron2 in one of my previous tutorials. So let's go back to my browser and let's see exactly how we can use this model I trained with detectron2. Let's go to the GitHub repository of that previous tutorial and let's see what using this model was all about. I'm going to this file over here, which is predict.py; this is the file we used in order to load the model and to make predictions with a model trained with detectron2. So the only thing I'm going to do in this tutorial is to copy some of the code in this file, and I'm just going to paste it into the main.py file of our PyCharm project. Remember that in this tutorial we are not going into the details of how to use detectron2, so I strongly recommend you take a look at my previous video, this video over here, which you are going to find on my YouTube channel, so you can see exactly how using this model, how using detectron2, works; because we are not going into those details in this tutorial. So this is my strong recommendation for
you: please take a look at that previous video. The only thing we're going to do now is copy and paste some of these lines, which I'm going to explain super, super quickly. You can see that we are getting a configuration file, then we are getting the weights for this model, and we are getting those weights from our local drive: we are specifying a file path, a location on our local drive, and the only thing we're doing is specifying the weights location. Then we are creating an object which is our predictor, and this is exactly the model we need in order to continue with this process. So this is a very quick explanation of the code we have over here, and now let's continue. Now you can see that we need to make a few imports, because these objects, these functions we have over here, are not being found. So I'm going all the way up and I'm going to say something like from detectron2.config import get_cfg, and that should be all for this function we have over here. Then from detectron2.engine import DefaultPredictor, and that should fix this issue over here. And now we need to import from detectron2 import model_zoo, and that should be all in order to fix this issue over here. I'm going to delete these comments, and that's pretty much all. Everything that's here is everything we need in order to load this model; but obviously we need a model to load, right? Because this is just the default code we had in our GitHub repository. So let me show you
exactly where my model is on my local drive. If I go to my file system, you can see that I have this file over here, model.pth, and I have this other file, labels.txt. This is the model we need: model.pth, these are the weights of our model. What I'm going to do is copy this file and paste it into the directory of this PyCharm project. You can see that this is the main.py we are currently working in, and this is the requirements.txt file we created a few minutes ago, and this is exactly where I'm going to paste this model. I'm also going to do something else, which is creating a new directory called model, and this is where I'm going to put the model. Everything is OK. And remember, I showed you we have another file which contains all the labels we are detecting; but in our case this is a very, very dummy labels.txt file, because we only have one category, we are only detecting one class, which is tumor.
A very quick note: remember, in the dataset I used in order to train this object detector we had two classes, which were negative and positive; these are something like two different types of tumors, or that's what I think. But what I decided to do when I was training this object detector was to merge these two labels, these two categories, into only one object, and I called it tumor. So that's exactly why we have only one class over here, although the original dataset I used had two categories. That was a very quick note regarding the model I trained. Now let's go back to my file system. We are not going to use this directory anymore; I go to model, and this is the model we are going to be using, these are the model weights we are going to be using. So remember, this is within another directory, which
is called model. I go back to PyCharm, and the only thing I'm going to say is something like 'model' and then the file name, 'model.pth'. OK, and that's pretty much all. In my case I'm going to run this code on my local computer, which is using a CPU, so this is what I need to specify. If the computer where you are running this code has a GPU, the only thing you need to do is comment this line, and everything will run on your GPU. But in my case I'm going to run it locally on my CPU, so I'm just going to leave this line as it is.
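Putting this step together, the model-loading code could look something like the sketch below. The weights path and the CPU device are the ones from the video; the base config file we merge from the model zoo, the class count and the score threshold are assumptions on my side, since those details live in the predict.py of the previous tutorial:

```python
import os

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2 import model_zoo

# build the configuration; the exact base config is an assumption here
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file('COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml'))

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1          # we only detect one class: tumor
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # only keep detections above 50 percent

# the weights we trained with detectron2, saved under ./model
cfg.MODEL.WEIGHTS = os.path.join('model', 'model.pth')

# run on CPU; comment this line out if you have a GPU available
cfg.MODEL.DEVICE = 'cpu'

predictor = DefaultPredictor(cfg)
```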
Now let's continue: it's time to load the image we are going to use in order to detect all these objects. So this is what I'm going to do: if file... actually, I have to make another edit first. We are uploading a file, and we are calling the object the user has uploaded file. So now, if file: if the user has uploaded something, we are going to continue, and we're going to call this object image, and image will be Image.open(file) and then something like '.to RGB', right? Image is an object we are going to import from Pillow: from PIL import Image. OK, that should be all.
Now, in order to move one step at a time, let's go back to my browser and see if everything executes just fine. I'm going to refresh, and everything is just fine, and now I'm going to select an image. Let's see if everything is OK. The data I'm going to use is located over here; this is train and val. I'm just going to select a random image, which is this one, and let's see what happens. We have an error, because this method is not called 'to': it's called 'convert', if I'm not mistaken. Let's see. Now I'm going to refresh and do the same process again: I'm going to select the same image and drop it over here, and you can see that now we have another error, because it's not 'covert' but 'convert'; I had another typo. OK, now let's see what happens. I'm going to refresh again, and let's hope everything is OK now. I'm going to take the image, drop it here, and let's see what happens. Now we have to wait a couple of seconds; we may be loading the model, so this may take a few seconds... and everything is OK. We are not visualizing the image yet, but if we are not getting any error, that means everything is OK. So let's go back to PyCharm; everything is OK so far.
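With the typos fixed, this step of main.py looks something like this minimal sketch:

```python
from PIL import Image

if file is not None:
    # load the image the user has uploaded and make sure it is in RGB mode
    image = Image.open(file).convert('RGB')
```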
Now it's time to detect objects. We are moving super, super quickly; we are almost there, we have almost completed this process, this pipeline, and the only thing we need to do now is detect objects. In order to detect objects with this model, which was trained with detectron2, I'm going back to my browser and to this repository, because let's see exactly how we can make this prediction. The only thing I'm going to do is copy and paste everything from here up to here; we don't really need to draw the rectangle, but let's just copy everything. So I'm going to copy, then I'm going to PyCharm and I'm going to paste it here. We will need to make a few edits, but most of the code will remain the same. I'm just going to fix how we input this image over here, because if I go back to my GitHub repository, you can see that this image is actually a numpy array: we were reading the image using OpenCV, so the format is a numpy array, and we need to input a numpy array right over here. So I'm going to define a variable, image_array, and this will be np.asarray(image). We will need to import numpy, so I'm going to say import numpy as np, and that's pretty much all. Now I'm going to input image_array, and that should be it. So this is pretty much all.
We are going to return all the objects we have detected with a confidence value greater than 50 percent, and other than that everything is just fine, and that's it. We don't really need to draw the rectangle, so I'm just going to delete it, and that's pretty much all. So we have loaded the image, and we have detected all the objects on top of this image using our model.
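For reference, the detection step might look something like the sketch below. In the copied code the 50-percent threshold is most likely already applied when the model is configured (as in the loading sketch above), so here we only run the predictor and move the results to the CPU; the field names follow detectron2's standard Instances format:

```python
import numpy as np

# detectron2's DefaultPredictor expects a numpy array, not a PIL image
image_array = np.asarray(image)

outputs = predictor(image_array)

# detections for this image: boxes, confidence scores and class ids
instances = outputs['instances'].to('cpu')
bboxes = instances.pred_boxes      # boxes in x1, y1, x2, y2 format
scores = instances.scores          # confidence scores from 0 to 1
classes = instances.pred_classes   # class ids (only one class here: tumor)
```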
Now it's time to continue with the visualization: we are going to take all the detections, all these objects we have detected, and we are going to draw bounding boxes on top of the image the user has uploaded. This is amazing, because we're moving super, super quickly, so let's see how we can continue with the visualization. Now is the time when we are going to draw bounding boxes on top of our images, and in order to do so we are going to use plotly. Plotly is an amazing Python library which I have used many times in my projects; you can do some very, very crazy visualizations using plotly, some very dynamic visualizations, so this is an amazing library we are going to use now. And something that's very important: in my tutorials we always focus on the computer vision part of the problem, and everything that's related to the visualization is not really that interesting from a computer vision perspective. So what we are going to do now is just take the code for the visualization, which I have already prepared over here: this is a function called visualize, and this is the function we are going to use in order to visualize the bounding boxes on top of our images. So please pay attention, please focus, because otherwise you may get lost; please take a look at what we are going to be doing
now. I'm going to the project, I'm going to File, New, Python File, and I'm going to create a new Python file called util.py. Then I'm going back to this file I have over here, and I'm just going to copy the entire file: I'm going to press Ctrl C and then Ctrl V over here. So this is all the code we need in order to do the visualization. Remember, the visualization is very interesting and very important, but it may not be the most interesting thing from a computer vision perspective, and that's why we are not really minding everything that's related to how to visualize all these bounding boxes on top of the images; we are just going to use this function, and that's pretty much all. I need to do a few imports, otherwise this is not going to work: I'm going to import streamlit as st, and that's pretty much all, if I'm not mistaken. Yeah. Now let me show you something related to all the code I have just
copied. You can see that this is the code of two different functions. One of them is called visualize, and this is the function we are going to use in a few minutes in order to visualize all the bounding boxes on top of our images. The other function is called set_background, and this is another function which is only going to add a very small aesthetic detail at the end of this tutorial: changing the background of the web application. This is only a detail, definitely not the most important thing from a computer vision perspective; it's just changing the background of the web application in the browser. So this is something we are going to do at the end, and it's also in the code I have just copied and pasted into this file. But now let's focus on this other function, visualize. You can see this function receives two parameters: one of them is image and the other one is the bounding boxes. The image is the input image, and the bounding boxes are a list of all the bounding boxes in the format x1, y1, x2, y2.
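The actual visualize function is in the GitHub repository of today's tutorial, so take the sketch below only as a rough illustration of how such a function could be written with plotly. I'm assuming here that the 'original' and 'detections' buttons you will see in a minute are plotly button menus that toggle the rectangles on and off; the real implementation may differ:

```python
import numpy as np
import plotly.graph_objects as go
import streamlit as st


def visualize(image, bboxes):
    # image: a PIL image; bboxes: list of boxes in x1, y1, x2, y2 format
    fig = go.Figure(go.Image(z=np.asarray(image)))

    # one rectangle per detected bounding box
    shapes = [
        dict(type='rect', x0=x1, y0=y1, x1=x2, y1=y2,
             line=dict(color='red', width=3))
        for x1, y1, x2, y2 in bboxes
    ]

    # two buttons that switch the rectangles off ('original') and on ('detections')
    fig.update_layout(updatemenus=[dict(
        type='buttons',
        buttons=[
            dict(label='original', method='relayout', args=[{'shapes': []}]),
            dict(label='detections', method='relayout', args=[{'shapes': shapes}]),
        ],
    )])

    st.plotly_chart(fig)
```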
So now let's go back to main, because let's see exactly how we can call this function over here. The first thing I'm going to do is from util import visualize; now the function is imported into our main process. Now let's go back here, and this is where we are going to call this function. Remember, we need to input two parameters: one of them is the image we are going to use in order to draw all the bounding boxes, and we need to input the image in the Pillow format; the other one is the bounding boxes, bboxes. And please, please focus, please pay attention, because we already have a variable called bboxes, but if we go back to the documentation, you can see that this parameter is a list of bounding boxes in the format x1, y1, x2, y2, so it is not the same as this other variable we have over here. Please pay attention, because otherwise it may be a little confusing. So this is what I'm going to do: I'm going to define a new variable, bboxes_ (bboxes underscore), this is going to be a list, and what I'm going to do here is just append the bounding boxes exactly as we need them to be.
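A sketch of this step, reusing the instances object from the detection sketch above (the exact unpacking of detectron2's boxes is my own way of writing it):

```python
# bboxes_ holds the boxes as plain [x1, y1, x2, y2] lists,
# which is the format the visualize function expects
bboxes_ = []
for box in instances.pred_boxes:
    x1, y1, x2, y2 = [int(v) for v in box]
    bboxes_.append([x1, y1, x2, y2])

# draw the detections on top of the image the user uploaded
visualize(image, bboxes_)
```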
And if I go back to util.py, this is exactly what we need to input. OK, so we have this object over here, and the only thing I'm going to do is pass it over here. I invite you to take a look at this file, to take a look at this function visualize, so you can see exactly how it works: you are going to see that we are using the plotly library, we are calling some functions, and we are doing some things related to visualization. I invite you to take a look at this function; it's going to be available in the GitHub repository of today's tutorial. But now
let's continue, and let's see exactly what happens if we refresh this website and upload a new image; let's see exactly what type of visualization we get with this function. I'm going back to my local computer, to my file system, I'm going to take a random image again, and I'm going to drop it over here. You can see that this is what we get, which is exactly the same image I uploaded, this image over here; but now we have these two buttons. One of them is 'original', which means this is the original image we have uploaded, and the other one is called 'detections', and if I press this button you can see that we are plotting the bounding box exactly on top of the tumor in this brain. I mean, I'm not a doctor, so I have no idea what I'm looking at; I have the impression this is a brain and this is an MRI, and based on the colors I have the feeling that this is the issue, this is a tumor. So it looks like we have detected exactly what we should have detected. But this is the data I used in order to train the model, this is the training data; now let's see if we have exactly the same performance with an image from our validation set, which is completely and absolutely unseen data for my model. So let's see what happens if I just take
a random image like this one. I'm going back here; this is the image I have just uploaded. Remember, now we are using completely unseen data for my model. Let's see what happens if I move to the other tab, to the other button, which is 'detections'... and we are successfully detecting the bounding box, the object we should be detecting in this image. So everything is working just fine. And in order to make it more challenging and more fun, let's see if we can detect an image with two objects; I know that there are a few, like this one, which has two objects. So I'm just going to drop this image here and see if we can detect both of these objects, both of these issues... and we can see that we detect both of them. So everything seems to be working just fine, and this is pretty much all in order to get this web application up and running: you can see that we are uploading images, we are detecting all the issues in those images, and we are plotting everything exactly as we should. The only thing I'm going to do now is use this other function we have
over here, which is set_background. The only thing I'm going to do is change the background of this web application, so we make it a little nicer, and this is exactly how I'm going to call this function. I'm going to main.py, and I'm going to change the import to from util import visualize and then set_background. Then I'm going back to my file system, and this is an image I have prepared in order to change the background; it may not be the perfect background ever, but I think it's going to work. We are going to put this background in our web application, so let's see what happens. I'm going to copy and paste it over here, and now I'm going back to PyCharm, and I'm just going to call set_background and input bg.png. Let's see what happens if I refresh...
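As a side note, a common way to implement this kind of helper in streamlit is to inject the image as a base64-encoded CSS background. The sketch below is only an illustration of that pattern and not necessarily identical to the set_background in the repository:

```python
import base64

import streamlit as st


def set_background(image_file):
    # read the image and inject it as a base64-encoded CSS background
    with open(image_file, 'rb') as f:
        encoded = base64.b64encode(f.read()).decode()
    style = f'''
        <style>
        .stApp {{
            background-image: url("data:image/png;base64,{encoded}");
            background-size: cover;
        }}
        </style>
    '''
    st.markdown(style, unsafe_allow_html=True)
```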
And you can see that now we have a much better-looking background; everything looks much, much better now. Now let me open a new image; I'm just going to select, for example, this image over here,
so we can see how the entire web application looks with this new background. We have to wait a couple of seconds... and now we are getting the image with all the detections on top. So this is going to be pretty much all for this tutorial; this is exactly how you can create an object detection web application using Python and streamlit, and this is going to be all for today. If you enjoyed this video, I invite you to take a look at another of my previous videos, where I show you how to make an image classification web application; I'm going to be posting a link to that other tutorial over there. Remember, if you enjoyed this video, most likely you will enjoy that video too, because it's exactly the same process and it's a very, very similar web application. Congratulations! You have completed my course on object detection. My name is Felipe, I'm a computer vision engineer, and this is exactly the type of videos and courses I make on this channel. If you enjoyed this video, I invite you to click the like button, and I also invite you to subscribe to my channel. This is going to be all for today, and see you on my next video.