Hey, my name is Felipe and welcome to this fully comprehensive course on object detection. We will start by discussing what object detection is and how to measure the performance of an object detector. Then I'm going to show you a step-by-step guide on how to train your own object detector on a custom dataset. And I'm going to show you three different
ways to detect objects in your images and videos: YOLOv8, Detectron2 and AWS Rekognition. This course is ideal for beginners as well as for more advanced developers, as it contains very valuable information and insights
I gathered from years of experience as a computer vision engineer. By the end of this course, you will be familiar with different object detection algorithms and you will be able to create amazing projects using state-of-the-art computer vision technologies. And now, let's get started. So let's start with this lesson about what object detection is. I'm going to cover the definition and I'm also going to mention a few examples. So, object detection is a computer vision technique to identify and locate objects within images and videos. And there are many technologies to perform object detection; these are only a few of all the available technologies, of all the available algorithms, which you can use to do object detection. For example, you can use the Python library MediaPipe, which is a very popular library to do
hand detection and face detection. You can also use OpenCV,
which is a library available for Python and C++. You can use YOLOv8,
which is the most recent version of YOLO. You can use Detectron2,
which is a high-level framework based on PyTorch. And this is a very popular framework in order to do many different
computer vision related tasks. You can also use AWS Rekognition,
which is a service available through a cloud provider. And these are only a few of all the different ways to do object detection; there are many, many more ways, I don't know how many. These are only a few of them. And although there are many algorithms and many technologies, all of them work pretty much the same way from a high-level perspective. From an input-output perspective, all of them receive an image as input, and the output is a list of all the detected objects in that image. And the objects in that image are given
by these three values: the bounding box, which is the location of the object in the image; then the confidence score, which is a value from 0 to 1 that means how confident the object detector is regarding that detection; and then the object category, or the class name, right? Because if we have detected an object, we want to know what object we have detected, we want to know the name of that object. So pretty much all the object detectors
work pretty much the same way and they are going to return
something which looks like this. The bounding box is usually specified with four values, and there are many different formats, many different conventions, to specify the bounding box. This is one of the most popular formats: the X and Y position of the top-left corner, and then the X and Y position of the bottom-right corner. With these two corners, we have specified the bounding box; and then we have the confidence score, and then the class name. So remember, although there are many,
many, many ways to do object detection, they all work pretty much the same way
from an input output perspective. And this is a very specific example
of how to do object detection in this image. You can see that this is an image of a cat and a dog. And this is a Python script, a very simple Python script, which uses YOLOv8 in order to detect all the objects within this image. I'm not going into the details of how this script works, but it will be available in the GitHub repository of this tutorial.
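For reference, here is a minimal sketch of what such a script might look like using the ultralytics package (the image file name and model weights below are placeholders; the exact script in the repository may differ):

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 model (the nano variant here)
model = YOLO("yolov8n.pt")

# Run inference on the image of the cat and the dog (hypothetical file name)
results = model("cat_and_dog.jpg")

# Iterate over all detections and print bounding box, score and class name
for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()        # top-left / bottom-right corners
        score = float(box.conf[0])                   # confidence score, from 0 to 1
        class_name = result.names[int(box.cls[0])]   # e.g. "cat" or "dog"
        print(class_name, [x1, y1, x2, y2], score)
```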
At the end of the script, we iterate over all the detections we found in this image, and we print all of them. And if we execute this script, we are going to print something like this: you can see that we have detected a cat. This is the bounding box where the cat is located, and this is the confidence score
of the object detector regarding this detection. And then we have also detected a dog. This is the bounding box of the dog and this is how confident the object detector is regarding this detection. So this is a very simple example
of how object detection works on a very specific image and this is going
to be pretty much all for this lesson. So remember, object detection is a technique, a computer vision technique to identify and locate all the objects within an image or a video. And although there are many, many
many, many different ways to do object detection, all of them work pretty much the same way
from an input output perspective. Now let's move to the next lesson
about object detection metrics. So let's talk about object detection metrics. We will answer the question
how to measure the performance of an object detector. And you can see that we are
just starting with this lesson. And we immediately got this huge warning sign which says when using object detection metrics, you are only comparing your predictions with your ground truth. This is very, very, very important and you're going
to see exactly why later on this lesson. But for now, let's continue. So this is the road map we will be covering today. I have divided all the content in this lesson into two sections. The first one is about fundamentals
and this is where we will discuss all the definitions of all the metrics
we will be using today, and all the different examples I'm going to show you about these metrics, and we will assume we are working under ideal conditions. This is very, very important, and you're going to see exactly what I mean by ideal conditions later on. And then we have the other section, which is the one for the more advanced topics, and this is where we will assume real-life conditions. For now, let's continue. So let's start with the fundamentals:
we are going to cover the most common metrics. And we will assume the data we are using to
train the model is perfect. This is what I mean by ideal conditions, right?
We will assume our dataset is perfect, which means we have many samples, we have a huge dataset. And in case we have many different classes in our dataset, we assume all of our
classes are equally distributed, which means we have the same
amount of objects for each one of our classes. But most importantly,
we will assume our dataset is perfectly annotated. So we have no issue in our dataset whatsoever, right? These are the ideal conditions
we will be assuming in this section. Now let's continue. These are the metrics most commonly used in object detection. We have the loss function, which is used during the training process, and then we have these two other metrics, which are part of the evaluation process of an object detector: the intersection over union
and the mean average precision. Now let me show you a very specific example
of how this looks in real life, right? Remember from our previous lesson
I told you there were many, many, many different ways to do object detection,
many different technologies, many different algorithms. Now, YOLOv8 is only one of all the different options for doing object detection, and when training a model with YOLOv8, this is what we get at the end of the training process: we will have all these many plots, so we can analyze the training process itself, and we can also analyze
the performance of the object detector we have just trained. And from all of these plots, you can see that six of them are related to the loss function. And I'm not going into the details
on why we have so many plots for the loss function. But just keep in mind that the loss function is such an important metric that we have all these many plots in order to analyze the performance of the model and the performance of the training process with respect to it; it's such an important metric, that's why we have so many plots. And then the remaining four plots are related to the mean average precision. And in the case of YOLOv8, right, in the case of training a detector with YOLOv8, the intersection over union is not provided. But this is also a very important metric
in object detection. Now let's continue, let's start with the loss function. This metric is related to the learning process
to the training process. And there are different loss functions. There are many loss functions and they usually involve very complex mathematical expressions, very, very, very complex mathematical formulations and expressions. And the only thing I'm going to cover in this course about object detection is that
regarding the loss function lower is better. So a lower value of the of the loss function
means it's better. And if we go back to these plots we have over here, you can see that in all of these plots,
regarding the loss function in all of them, you can see that the loss function
is going down as we increase the number of epochs, right? So this is the only thing I want you to remember for now: the loss function is related to the learning process, loss functions usually involve very complex and very advanced mathematical expressions, and lower is better. Now let's continue, let's move on to the intersection over union. This metric measures the detection accuracy, and it ranges between 0 and 1. So the intersection over union is a value between 0 and 1, and higher is better. And this is exactly how
the intersection over union is computed. We are given two bounding boxes, right? Remember we are going to be comparing our detections with the ground truth, so we will have a bounding box for our detections and a bounding box for the ground truth. Given these two bounding boxes, we will measure the area of overlap and the area of union, and then we will just compute the intersection over union by making this very simple calculation: the area of overlap divided by the area of union.
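In code, assuming both boxes are given in the corner format described earlier (x1, y1, x2, y2), the calculation might look like this sketch:

```python
def iou(box_a, box_b):
    # Intersection over union for two boxes in (x1, y1, x2, y2) format.
    # Corners of the overlapping region
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Area of overlap (zero if the boxes don't intersect at all)
    overlap = max(0, x2 - x1) * max(0, y2 - y1)
    # Area of union = sum of both areas minus the overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0
```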
Let me show you an example. So we have these two pictures of a cat,
we have a cat in each one of these pictures. And you can assume these are
the ground truth bounding boxes for these objects, right? You can see in each case, this is a bounding box
which encloses the object perfectly. This is the ground truth. Now let's assume we are using an object detector and these are the detections we got
with our object detector. And now let's assume we want to compute
the intersection over union for each one of these cases. In the case of this example over here,
we have a very small intersection, a very small overlap. So if we apply this formula we have over here, which is the area of overlap over the area of union, we will have a very low value, and this value is 0.15, right? So in this case, we have a very small value because these two bounding boxes have a very small area of overlap, of intersection. But in the other case, you can see that our prediction is very, very close to the ground truth, right? It's almost perfectly matching the ground truth. So in this case, we will have a higher value of intersection over union, and in this case it is 0.95. So this is a very,
very simple example for you to get like a much better idea regarding
the intersection over union. Now let's continue, let's move to the mean
average precision. The mean average precision
is based on the precision recall curve. And the precision recall curve is based on the intersection over union
and the detection confidence score. Right? Remember from our previous lesson
on what is object detection, remember I mentioned that all
of the different frameworks, all the different algorithms
in order to do object detection, all of them have pretty much the same
structure regarding the input output and the output will always involve a
bounding box and also a confidence score. So the precision recall curve is based on the intersection over union and the
detection confidence score. From the precision-recall curve, we have two elements: one of them is precision, the other one is recall. Recall measures how effectively we can find objects, and precision measures how well we perform once we find an object. Please mind these two definitions, please mind the difference between them. This is very, very important, and it is going to become much clearer in a few minutes, because I'm going to show you a few examples. But please focus on each one of these two definitions, on how we are defining recall and how we are defining precision. And then, about the mean average precision,
remember that higher is better. Now let's move on; now is where we are going to describe an example of how to compute the mean average precision. So this is our dataset, right? Let's assume we have 10 apples in our dataset. And for each one of these apples,
for each one of these objects, we have the ground truth, right? We have a bounding box which
encloses the object perfectly, right? So this is our data and these are our annotations,
this is our ground truth. Now let's assume we are working with an object detector and these are our detections, right? In some cases, for example,
here or here or here we are getting like an OK detection. But in other cases like here or here,
we are not getting a very good detection, right. So let's see how we can compute the
mean average precision in this example. So this is the ground truth with the predictions on top. Now, we are visualizing both
the predictions and the ground truth. And these are values which are going to be super important in order to compute
the mean average precision. You can see that for each one of these objects,
we have two values: the score, which is the confidence score of that prediction, right? It's the confidence score of the green bounding box. And then the intersection over union, which is the intersection over union between the green bounding box and the blue bounding box, between our prediction and the ground truth. You can see that for each object in our dataset, we have these two values: the confidence score of the green bounding box, and the intersection over union between the green bounding box and the blue bounding box. And what we will be doing now
is we will be applying this very, very simple process. This is pseudo code, this is not real code, right? It looks like Python, but it's not really Python; this is the pseudo code of the process we will follow in order to compute the mean average precision. Please pay attention, because this is very important. You can see that we are defining a variable, which is the intersection over union threshold,
and we are defining this variable as 0.5. Then we are iterating over many different values for the confidence score threshold. For each one of these iterations, we are defining two variables, which are true positives and false positives, and we are initializing each of them to zero. Then, for each one of our detections, for each one of our green bounding boxes, we are going to verify if the confidence score we got is greater than the confidence score threshold we are computing in this iteration. If it is greater than this confidence score threshold, we will take a look at the intersection over union between the green bounding box and the blue bounding box, and if this is greater than the intersection over union threshold, then we are going to increment
the true positives variable. And in any other case, we are going
to increment the false positives variable. So this is a very simple and a very straightforward
process in order to compute the mean average precision. But please focus on this process, please go through it more than once, and be super clear on how it works, because it's very, very important to understand. So once we have computed the
true positives and the false positives, right? Remember that for each one of the values of the confidence score threshold, we will be computing these two variables. Once we have computed them, we are going to define precision and recall exactly like this: precision will be the true positives divided by the number of true positives plus false positives, and in the case of recall, we will be dividing the true positives by the total number of ground truth objects. In our case, the total number of ground truth objects is 10; remember we have 10 blue bounding boxes, and that's exactly our ground truth. So in our case, this number will always be 10.
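Here is a runnable Python version of that pseudo code. The (score, IoU) pairs below are hypothetical values which I chose so that they reproduce the numbers of the example we are about to walk through; in practice you would compute them by matching your detections against your ground truth:

```python
# Hypothetical (confidence score, IoU with ground truth) pairs, one per detection
detections = [(0.9, 0.85), (0.85, 0.85), (0.8, 0.8), (0.7, 0.9), (0.65, 0.7),
              (0.6, 0.4), (0.4, 0.75), (0.3, 0.1), (0.2, 0.6), (0.1, 0.2)]
n_ground_truth = 10   # the 10 annotated apples (blue bounding boxes)
iou_threshold = 0.5

for score_threshold in [0.75, 0.5, 0.25, 0.0]:
    true_positives = 0
    false_positives = 0
    for score, iou_value in detections:
        if score > score_threshold:
            if iou_value > iou_threshold:
                true_positives += 1   # confident detection with enough overlap
            else:
                false_positives += 1  # confident detection with too little overlap
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / n_ground_truth
    print(f"threshold={score_threshold}: precision={precision:.2f}, recall={recall:.2f}")
```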
Let's go through this process once and again. And let's start with a
confidence score threshold of 0.75, right? In order to do so, we are going to go through each one of these green bounding boxes, through each one of our detections, and we are going to keep only the ones which have a confidence score greater than 0.75, which are these three bounding boxes, right? If we go back, you can see that in this case the confidence score is 0.7, in this case 0.4, in this case 0.2, and so on: in all the other bounding boxes, the confidence score is lower than our threshold of 0.75. And we are going to take a look
at the intersection over union. And in case the intersection over union
is greater than the threshold we have defined of 0.5 then we are
going to increment the true positives. And if not, if it is not greater,
if it's lower than 0.5 we are going to increment the false positives. And in this case, you can see the
intersection over union is 0.85. So this is greater than 0.5.
So this is a true positive. This is also 0.85. So this is also greater than 0.5.
So this is also a true positive. And in this case, this is also a true positive because
the intersection over union is 0.8. So in this case, we have three true positives
and we have zero false positives. If we go... if we move here, you can see that this is exactly
what we have just mentioned. The true positives is three,
the false positives is zero. So if we compute precision and recall,
we get that precision is 1 and recall is 0.3 right? It's a very, very, very simple process.
A very straightforward process. Please go through this example once
and again until you are completely and 100% clear on
what we are doing because once you get familiar with
the process, it's very, very simple. But now let's move on to a
confidence score threshold of 0.5. In this case, we are going to do exactly the same. We're going to filter all the detections
with a confidence score lower than 0.5. And this is what we got. And now let's go
through each one of these detections. And let's see if the intersection over union
is greater or lower than 0.5. In this case, it's greater than 0.5.
So this is a true positive. This is also a true positive, also true positive,
this is also true positive, this is also true positive. But in this case, the intersection over union is 0.4, which is lower than 0.5, so this is a false positive. So we have 5 true positives and only 1 false positive. And if we compute the precision and recall, we get that the precision is 0.83 and the recall is 0.5, right? Let's continue. Now, let's move to
confidence score threshold equal to 0.25. We are going to filter out all the detections with a confidence score lower than 0.25. This is what we got. And let's take a look at the intersection over union. You can see, in this case: true positive, true positive, true positive. In this case it's 0.1, so this is a false positive. This is also a true positive, a true positive, and this is a false positive, and this is also a true positive. So we have 1, 2, 3, 4, 5, 6 true positives
and 2 false positives. And this is what we have over here. We have six true positives, two false positives
and the precision is 0.75 and the recall is 0.6. Now let's continue. Now let's compute exactly the same values
but for a confidence score threshold of zero. In this case, we are not going to
filter out any detection, because all of them have a confidence score
which is greater than zero. And in this case, you can see that
this is a true positive true positive, true positive. This is a false positive. This is a true positive.
This is also a false positive. And then all the other ones are true positives... except this one which is a false positive. So we have 1 2 3 4 5 6 7 true positives
and only three false positives. And this is exactly what we have over here. So the precision is 0.7 and the recall is 0.7. So we have computed all these different
values for precision and recall. And from here it is super easy and straightforward to put everything together into a precision-recall curve, right? We can very easily take all these pairs of precision and recall values and put them together on a plot which looks like this. And if we compute the area under the curve, we will be computing the average precision, which is a very important value we need to compute before computing the
mean average precision. And please do the math yourself,
and if I'm mistaken, please let me know in the comments below. But if I'm not mistaken, this is the value I have computed for this curve we have over here. And a very quick note is that, as we were using an intersection over union threshold of 0.5, this is sometimes referred to as average precision at 50, right? If you search the literature, or other blogs or YouTube videos and so on, if you search other places in which they talk about the average precision or
the mean average precision you will find that sometimes this value
is referred to as average precision at 50. And we can also compute other values.
For example, if we were using an intersection over
union threshold of, for example, 0.9, then this would be the average precision at 90, right? This is a very quick note.
But for now let's just continue. So we have computed the average precision. And from here,
if we want to compute the mean average precision, the only thing we need to do is a very simple calculation, because in our case we are working with only one class, right? We are detecting apples, and we are working with only one class, which is apple. But in the most generic case, you will be computing the average precision
for many many many different classes, right? So in the most generic case, the mean average precision will look
something like this, right? You will have many different average precision values, one for each of your classes. And then, in order to compute the mean average precision, the only thing you need to do is sum everything together and take the average, right? That's exactly how you can compute the mean average precision. And remember, in our case we are always working with an intersection over union threshold of 0.5; as we are using only one value for the intersection over union threshold, this is exactly how the mean average precision looks in our case.
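As a sketch, here is how you could approximate this calculation in code. The recall and precision values are the ones from our example, and note that real benchmarks (e.g. Pascal VOC, COCO) use specific interpolation schemes for the area under the precision-recall curve, so this simple trapezoidal approximation is only for illustration:

```python
import numpy as np

def average_precision(recalls, precisions):
    # Approximate area under the precision-recall curve (trapezoidal rule);
    # assumes the recall values are sorted in increasing order.
    return float(np.trapz(precisions, recalls))

# One average precision value per class; here we only have "apple",
# so the mean average precision equals the average precision.
ap_per_class = {
    "apple": average_precision([0.3, 0.5, 0.6, 0.7], [1.0, 0.83, 0.75, 0.7]),
}
mean_ap = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP@50 = {mean_ap:.3f}")
```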
Now, let's continue: this is your homework from this tutorial. Tell me which model performs better. We have two models, and for each one
of these models, we have the intersection over union and we also have the mean average precision, right? The intersection over union of the model A is 0.70 and the mean average precision is 0.80. And for the model B, the intersection over
union is 0.55 and the mean average precision is 0.72. Now, your homework from this video
is to tell me which model performs better. So let me know in the comments below
if you find the answer to this question and I will be super happy to read
your answer in the comments below. And if you don't know which
model performs better, then also let me know in the comments below. And I will be super happy to help you
or maybe another member in our community will be super happy
to help you as well. But this is going to be all for this section,
for the fundamentals. And now let's move to the more advanced section. This is where we are going to work
with imperfect data, right? This is where we are going to have a dataset which complies with one of the following statements: maybe we don't have enough samples, maybe we are working with a very small dataset; maybe we have an unbalanced dataset, which means we have many different
classes and we don't have the same amount of objects for all of our classes. Or maybe most importantly,
we have errors in our annotations, right? If you have one or more of these issues in your dataset, this is where it's super, super,
super important to remember we are comparing our detections
against the ground truth, right? With all of the metrics we have mentioned so far, the only thing we're doing is comparing
the detections against the ground truth. This is where it's super super super
important to remember the warning we got when we
were starting this lesson. So if we are in this situation, I want you
to take your performance metrics with a grain of salt, which means: compute everything you want to compute, compute the intersection over union, compute the mean average precision,
all your losses, compute every single metric you want. But please take all of your metrics
with a grain of salt. And a very good example of this
situation is one of my previous tutorials where I showed you how to train a
semantic segmentation model using yolov8. This previous tutorial was not really
about object detection, but this was about semantic segmentation,
but I think it's a very good example nevertheless. In this previous tutorial, we had a ground truth, we had a dataset, which had many, many different errors. And in this previous tutorial, we noticed
that the detections we got with the model we trained, were even better than the data
we used to train the model, were even better than the ground truth. So this is a very, very good example
of what happens in a situation like this, right? This is a very good example of a situation
in which we have many issues in our data. And we have to be super, super, super
cautious in the way we interpret, read, and make sense of the object detection metrics. Now, I'm not going to show you the entire previous tutorial on semantic segmentation using YOLOv8, but let's just watch a few minutes from it, those few minutes where we noticed we had an issue with our data and that the predictions
we got with our model were even better than the data
we used to train this model. Let's remember these few minutes. In order to continue with this process, with
this validation is that we are going to take a look at what happens with our predictions how
is this model performing with some data with some predictions and for this we are going to take
a look what happens with all of these images right you can see that these are some batches
and these are some some of our labels some of our annotations for all of these images and then
these are some of the predictions for these images right so we are going to take a look what happens
here and for example I'm going to show you these results, the first image. You can see that, looking at this image (which again, these are not our predictions, this is our data, these are our annotations, these are our labels), there are many, many missing annotations. For example, in this image we only have one mask, we only have the mask for one of the ducks; we have one, two, three, four, five ducks but only one of them is annotated. We have a similar behavior here, only one of the ducks is annotated; here is something similar, only one of them is annotated,
annotations in this data we are currently looking at and if I look at the predictions now these are
the same images but these are our predictions we can see that nevertheless we had a lot of missing
annotations, the predictions don't really look that bad, right? For example, in this case we are detecting one, two, three of the five ducks, so we have an even better prediction than the one we have over here. I would say it's not a perfect detection, but I would say it's very good, right? It's like
it's not 100% accurate but it's like very good and I would say it's definitely better than the
data we used to train this model so that's what happens with the first image and if I take a look
at the other images I can see a similar Behavior right this is the data
we used for training this algorithm and these are the predictions we got for these images and
so on, right? It seems it's exactly the same behavior, exactly the same situation for
this image as well so my conclusions by looking at these images by looking at these predictions
is that the model is not perfect but I would say performs very well especially considering that
the data we are using to train this model seems to be not perfect seems to have a lot a lot
of missing detections have a lot of missing elements right a lot of missing objects so
that's our conclusion that's my conclusion by looking at these results and that's
another reason for which I don't recommend you to go crazy analyzing these plots because when
you are analyzing these plots remember the only thing you're doing is that you are comparing your
data the data you are using in order to train this model with your predictions right the only thing
you're doing, you're comparing your data with your predictions with the predictions you had with
the model right so as the only thing you are doing is a comparison between these two things then
if you have many missing annotations or many missing objects or if you have many different errors
in your data, in the data you're using to train the algorithm, then this comparison is a little meaningless, right? It doesn't really make a lot of sense, because if you're just comparing one thing against the other, but the thing you are comparing with has a lot of errors, a lot of missing objects and so on, maybe the comparison doesn't make a lot of sense whatsoever, right?
that's why I also recommend you to not go crazy when you are analyzing these plots because they
are going to give you a lot of information but you are going to have even more information
when you are analyzing all of these results and this is a very very very good example of what
happens in real life when you are training a model in a real project because remember that building
an entire dataset, a dataset which is 100% clean and absolutely 100% perfect is very very very
expensive so this is a very good example of what happens in real life usually the data you're using
to train the model, to train the algorithm has a few errors and sometimes there are many many many
errors so this is a very good example of how this validation process looks like with data which
is very similar to the data we have in real life which in most cases is not perfect And obviously you are more
than welcome to watch the entire tutorial after you complete this course. For now, let's just move to the next video
where I'm going to show you how to train an object detector
on your own custom data. Hey, my name is Felipe and welcome to my channel.
in this video we are going to train an object detector using yolo V8 and I'm going to walk you
step by step through the entire process from how to collect the data you need in order to train an
object detector how to annotate the data using a computer vision annotation tool how to structure
the data into the exact format you need in order to use yolo V8, how to do the training and
I'm going to show you two different ways to do it, from your local environment and also from a Google Colab, and how to test the performance of the model you trained. So this is going to be
a super comprehensive step-by-step guide of everything you need to know in order to train
an object detector using yolo v8 on your own custom data set so let's get started so let's
start with this tutorial let's start with this process and the first thing we need to do is to
collect data the data collection is the first step in this process remember that if you want to train
an object detector or any type of machine learning model you definitely need data, the algorithm, the
specific algorithm you're going to use in this case yolo V8 is very very important but the data
is as important as the algorithm if you don't have data you cannot train any machine learning model
that's very important so let me show you the data I am going to use in this process these are some
images I have downloaded and which I'm going to use in order
to train this object detector and let me show you a few of them these are some images of alpacas
this is an alpaca data set I have downloaded for today's tutorial and you can see these are all
images containing alpacas in different postures and in different situations right so this is
exactly the data I am going to use in this process but obviously you could use whatever data set you
want you could use exactly the same data set I am going to use or you can just collect the data
yourself you could just take your cell phone or your camera or whatever and you can just take the
pictures the photos the images you are going to use you can just do your own data collection
or something else you could do is to just use a publicly available dataset so let
me show you this data set this is the open image dataset version 7 and this is a dataset which is
publicly available and you can definitely use it in order to work on today's tutorial in order to
train the object detector we are going to train in today's tutorial so let me show you how it looks
like if I go to explore and I select detection, you can see that I'm going to unselect all these options; you can see that this is a huge dataset containing many, many categories, I don't know how many, but they are many; this is a huge dataset:
it contains millions of images, hundreds of thousands if not millions of annotations thousands
of categories this is a super super huge data set and you can see that you have many many different
categories now we are looking at trumpet and you can see these are different images with trumpets
and from each one of these images we have a bounding box around the trumpet and if I show you
another one for example we also have Beetle and in this category you can see we have many different
images from many different type of beetles so this is another example or if I show you this one
which is bottle and we have many different images containing bottles for example there you can see
many different type of bottles and in all cases we have a bounding box around the bottle and I could
show you I don't know how many examples because there are many many many different categories
so remember the first step in this process is the data collection this is the data I am going
to use in this project, which is a dataset of alpacas, and you can use the exact same data
I am using if you want to you can use the same data set of alpacas or you can just collect your
own data set by using your cell phone your camera or something like that or you can also download
the images from a publicly available dataset for example the open images dataset version 7. if you
decide to use open images dataset version 7 let me show you another category which is alpaca this
is exactly from where I have downloaded all of the images of alpacas so if in case you decide to use
this publicly available dataset, I can provide you with a couple of scripts I have used in order to download all this data, to parse through all the different annotations, and to format this data in the exact format we need in order to work on today's tutorial. So in case you decide to use the Open Images dataset, I am going to give you a couple of scripts which are going to be super useful for you.
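And as one alternative to those scripts, here is a hedged sketch using the FiftyOne library, which can download Open Images subsets filtered by class; the class name and the sample cap below are illustrative, and this is not necessarily how my own scripts work:

```python
import fiftyone.zoo as foz

# Download Open Images V7 images + detection labels for a single class
dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="train",
    label_types=["detections"],
    classes=["Alpaca"],   # only download samples containing this class
    max_samples=500,      # cap the download size
)
print(dataset)
```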
That's all I can say about the data collection: remember, you need to collect data if you want to train an object detector, and you have all those different ways to do it, all these different categories, all these different options. So now let's move on
to the next step and now let's continue with the data annotation you have collected a lot of images
as I have over here you have a lot of images which you have collected yourself or maybe you have
downloaded this data from a publicly available data set and now it's the time to annotate this
data set maybe you were lucky enough when you were creating the dataset and maybe this data set you
are using is already annotated maybe you already have all the bounding boxes from all of your
objects from all your categories maybe that's the case so you don't really need to annotate your
data but in any other case for example if you were using a custom data set, a dataset you have collected
yourself with your own cell phone your camera and so on something you have collected in that case
you definitely need to annotate your data so in order to make this process more comprehensive in
order to show you like the entire process let me show you as well how to annotate data so we are
going to use this tool which is CVAT this is a labeling tool I have used it many many times in
many projects I would say it's one of my favorite tools I have used pretty much absolutely all
the object detection computer vision related annotation tools I have used maybe I haven't used
them all but I have used many many of them and if you are familiar with annotation tools you would
know that there are many many of them and none of them is perfect I will say all of the different
annotation tools have their advantages and their disadvantages and for some situations you prefer
to use one of them and for other situations it's better to use another one CVAT has many advantages
and it also has a few disadvantages I'm not saying it's perfect but nevertheless this is a tool I
have used in many projects and I really really like it so let me show you how to use it you
have to go to cvat.ai and then you select try for free there are different pricing options
but if you are going to work on your own or or in a very small team you
can definitely use the free version so I have already logged in this is already logged into my
account, but if you don't have an account then you will have to create a new one; you're going to see like a sign-up page, and you can just create a new account and then log in into that account. So once you are logged into this annotation tool, you need to
go to projects and then create a new one I'm going to create a project which is called alpaca
detector because this is the project I am going to be working in and I'm going to add a label
which in my case is going to be only one label which is alpaca and then that's pretty much all
submit and open I have created the project it has one label which is alpaca remember if your project
has many many different labels add all the labels you need, and then I will go here which is create
a new task I am going to create a new annotation task and I'm going to call this task something
like alpaca detector annotation task zero zero one this is from the project alpaca detector and this
will take all the labels from that project now you need to upload all the images you are going to
annotate so in my case I'm obviously not going to annotate all the images because you can see these
are too many images and it doesn't make any sense to annotate all these images in this video. These
are 452 images so I'm not going to annotate them all but I'm going to select a few in order to show
you how exactly this annotation tool works and how exactly you can use it in your project also in my
case, as I have downloaded these images from a publicly available dataset, from
the open images dataset version 7 I already have the annotations I already have all the
bounding boxes so in my case I don't really need to annotate this data because I already have the
annotations but I'm going to pretend I don't so I can just label a few images and I can show you
how it works so now I go back here and I'm just going to select something like this many images
right yeah I'm just going to select this many images I'm going to open these images and then
I'm going to click on submit and open right so this is going to create this task and at the same
time it's going to open this task so we can start working on our annotation process okay so this is
the task I have just created I'm going to click here in job number and this and the job number
and this will open all the images and now I'm going to start annotating all these images so we
are working on an object detection problem so we are going to annotate bounding boxes we need to
go here and for example if we will be detecting many different categories we would select what
is the category we are going to label now and that's it in my case I'm going to always label the same
category which is alpaca so I don't really need to do anything here so I'm going to select shape
and let me show you how I do it I'm going to click in the upper left corner and then in the
bottom right corner so the idea is to enclose the object and only the object right the idea is to
draw a bounding box around the object you only want to enclose this object
and you can see that we have other animals in the back right we have other alpacas so I'm just going
to label them too and there is a shortcut which is pressing the letter N and you can just create
a new bounding box so that's another one this is another one this is another alpaca and this is
the last one okay that's pretty much all so once you're ready you can just press Ctrl s that's
going to save the annotations I recommend you to press Ctrl S as often as possible because it's
always a good practice so now everything is saved I can just continue to the next image now we are
going to annotate this alpaca and I'm going to do exactly the same process I can start here obviously
you can just start in whatever corner you want and I'm going to do something like this okay
this image is completely annotated I'm going to continue to the next image in this case I am going
to annotate this alpaca too. This is not a real alpaca, but I want my object detector to be able
to detect these type of objects too so I'm going to annotate it as well this is going to be a very
good exercise because if you want to work as a machine learning engineer or as a computer
vision engineer, annotating data is something you have to do very often; actually, training machine learning models is something you have to do very often, and usually the data annotation is done by other people, right, it is done by annotators. There are different
services you can hire in order to annotate data but in whatever case whatever service you use
it's always a very good practice to annotate some of the images yourself right because if
you annotate some of the images yourself you are going to be more familiar with the data
and you're also going to be more familiar on how to instruct the annotators on how to annotate this
particular data for example in this case it's not really challenging we just have to annotate these
two objects but let me show you there will be other cases because there will be always situations
which are a little confusing. In this case it's not confusing either, I just have to label
that object but for example a few images ago when we were annotating this image if an annotator
is working on this image and the instructions you provide are not clear enough, that person is going to ask you: hey, what do I do here? Should I annotate this image or not? Is this an alpaca or not? So that's one example of such a situation; another situation would be
what happened here which we had many different alpacas in the background and some of them for
example this one is a little occluded so there could be an annotator someone who ask you hey do
you want me to annotate absolutely every single alpaca or maybe I can just draw a huge bonding box
here in the background and just say everything in the background is an alpaca? The thing is that
when an annotator is working on the images they are going to have many many different questions
regarding how to annotate the data and they are all perfect questions and very good questions
because this is exactly what this is about; I mean, when you are annotating data you are defining exactly what are the objects you are going to detect, right? So what I'm saying is that if you annotate some
of the images yourself you are going to be more familiar on what are all the different situations
and what exactly is going on with your data so you are more clear in exactly what are the objects
you want to detect right so let's continue this is only to show a few examples this is another
situation in my case I want to say that both of them are alpacas so I'm just going to say
something like this but there could be another person who says no this is only one annotation
is something like this right I'm just going to draw one bounding box enclosing both of them
something like that, and it would be a good criterion, I mean, it would be a criterion which I guess would be fine, but whatever your criterion is, you need one, right? You need a criterion. So while you are annotating some of the images, you are going to further understand what exactly is
an alpaca what exactly is the object you want to consider as alpaca so I'm just going to continue
this is another case which may not be clear but I'm just going to say this is an alpaca this
black one which we can only see this part and we don't really see the head but I'm going to
say it's an alpaca anyway this one too this one too this one too also this
is something that always happens to me when I am working when I am annotating images that I am more
aware of all the diversity of all these images for example this is a perfect perfect example because
we have an alpaca which is being reflected on a mirror and it's only like a very small
section of the alpaca, it's only like a very small piece of the alpaca's face, so what do we do here? I am going to annotate this one too, because yeah, that's my criterion, but another person
could say no this is not the object I want to detect this is only the object I want to detect and maybe
another person would say no this is not an alpaca alpacas don't really apply makeup on them this is
not real so I'm not going to annotate this image you get the idea right there could be many different
situations and the only way you get familiar with all the different type of situations
is if you annotate some of the images yourself so now let's continue in my case I'm going
to do something like this because yeah I would say the most important
object is this one and then other ones are like... yeah it's not really that important if we detect
them or not okay so let's continue this is very similar to another image I don't know how many I have
selected but I think we have only a few left I don't know if this type of animals are natural... I'm very surprised about this like the head right it's like it has a lot of
hair over here and then it's completely hairless the entire body I mean I don't know I'm
surprised maybe they are made like that or maybe it's like a natural alpaca who cares who cares...
let's continue so we have let's see how many we have only a few left so let's continue uh let's
see if we find any other strange situation where we have to decide if that's an alpaca or not so
I can show you an additional example. Also, when you are annotating, you could define your bounding box in many different ways: for example, in this case we could define it like this, or we could define it like this; I mean, we could define it super fit to the object, something like this, super
fit and we could enclose exactly the object or we could be a little more relaxed right for example
something like this would be okay too and if we want to do it like this it will be okay too right you
don't have to be super super super accurate you could be like a little more relaxed and it's
going to work anyway. Now on to this last one, and that's pretty much all,
and this is the last one okay I'm going to do something like this now I'm
going to take this I think this is also alpaca but anyway I'm just going to annotate this part
so that's pretty much all, I'm going to save and those are the few images I have selected in order
to show you how to use this annotation tool so that's pretty much all for the data annotation and
remember this is also a very important step this is a very important task in this process because
if we want to train an object detector we need data and we need annotated data so this is a very
very important part in this process. Remember this tool, CVAT, is only one of the many, many available image annotation tools; you can definitely use another one if you want, it's perfectly fine, it's not like you have to use this one at all, you can use whatever annotation tool
you want but this is a tool I think it's very easy to use I like the fact it's very easy to use it's
also a web application so you don't really need to download anything to your computer you can
just go ahead and use it from the web that's also one of its advantages so yeah so this is a
tool I showed you how to use in this video in order to train this object detector so this is going
to be all for this step and now let's continue with the next part in this process and now that
we have collected and annotated all of our data now it comes the time to format this data to
structure this data into the format we need in order to train an object detector using yolo V8
when you're working in machine learning and you're training a machine learning model every single
algorithm you work with is going to have its own requirements on how to input the data; that's going to happen with absolutely every single algorithm you work with, it's going to happen with YOLO, with all the different YOLO versions. And specifically, YOLOv8 needs the data in a very specific format, so
I created this step in this process so we can just take all the data we have generated all the
images and all the annotations and we can convert all these images into the format we need in order
to input this data into yolo V8 so let me show you exactly how we are going to do that if you
have annotated data using cvat you have to go to tasks and then you have to select this option and
it's export task data set it's going to ask you the export format so you can export this data into
many different formats and you're going to choose you're going to scroll all the way down and you're
going to choose YOLO 1.1 right then you can also save the images but in this case it's not really
needed we don't really need the images we already have the images and you're just going to click ok
now if you wait a few seconds or a few minutes if you have a very large data set you are going to
download a file like this and if I open this file you are going to see all these different files
right you can see we have four different files so actually three files and a directory and if I open
the directory this is what you are going to see which is many many different file names and if I
go back to the images directory you will see that all these images file names they all look pretty
much the same, right? You can see that the file name, the structure of these file names, looks pretty much the same as the ones we have just downloaded from CVAT. So basically, the way
it works is that when you are downloading this data into this format into the YOLO format every
single annotation file is going to be downloaded with the same name as the image you have annotated
but with a different extension so if you have an image which was called something.jpg then The
annotation file for that specific image will be something.txt right so that's the way it works
and if I open this file you are going to see
case only one row but let me show you another one which contains more than one annotation I
remember there were many for example this one which contains two different rows and each one of
these rows is a different object in my case as I only have alpacas in this data set each one of
these rows is a different alpaca and this is how you can make sense of this information the first
character is the class, the class you are detecting. So the first number is the class you are detecting; in my case I only have one, so it's always a zero, because it's my only class. And then these four numbers define the bounding box. This is encoded in the YOLO format, which means the first two numbers are the position of the center of the bounding box, then you have the width of your bounding box, and then the height of your bounding box. You will notice these are all float numbers, which basically means everything is relative to the entire size of the image. So these are the annotations we have downloaded, and this is the exact same format we need in order to train this object detector.
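To make the format concrete, here is a small sketch that parses one such row; the values and the image size are made up for illustration:

```python
# One row from a YOLO-format .txt label file:
# class_id x_center y_center width height (all relative to image size)
line = "0 0.52 0.48 0.30 0.41"   # hypothetical values

class_id, x_center, y_center, width, height = line.split()
img_w, img_h = 1024, 768          # hypothetical image size in pixels

# Convert back to absolute corner coordinates (top-left, bottom-right)
x1 = (float(x_center) - float(width) / 2) * img_w
y1 = (float(y_center) - float(height) / 2) * img_h
x2 = (float(x_center) + float(width) / 2) * img_w
y2 = (float(y_center) + float(height) / 2) * img_h
print(int(class_id), (round(x1), round(y1), round(x2), round(y2)))
```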
Remember, when I was downloading these annotations, we noticed there were many different options. All of these different options are different formats in which we could save the annotations, and this is very important: you definitely need to download YOLO, because we are going to work with YOLO, and everything is pretty much ready as we need it in order to input into YOLOv8. If you select YOLO, that's exactly the same format you need in order to continue with the next steps. And if you have your data in
a different format, maybe if you have already collected and annotated your data and you have your data in whatever other format, please remember you will need to convert these images, or actually to convert these annotations, into the YOLO format. Now, this is one of the things we need for the data, one of the things we need in order to format and structure the data in a way we can use to train this object detector with YOLOv8. But another thing we should do is
to create very specific directories containing this data right we are going to need two directories
one of them should be called images and the other one should be called labels you definitely need
to input these names you cannot choose whatever name you want you need to choose these two names
right the images should be located in an directory called images and the labels should be located in
a directory called labels that's the way yolo V8 works so you need to create these two directories
within your image directory is where you are going to have your images if I click here you can
see that these are all my images they are all within the images directory they are all within
the train directory, which is within the images directory; this train directory is not absolutely needed
right you could perfectly take all your images all these images and you could just paste all your
images here right in the images directory and everything will be just fine but if you want you
could do something exactly as I did over here and you could have an additional directory which is
in between the images directory and your images, and you can call it whatever you want. This
is a very good strategy in case you want to have for example a train directory containing all the
training images and then another directory which could be called validation for example and this
is where you are going to have many images in order to validate your process your training
process your algorithm and you could do the same with an additional directory which could be
called test for example or you can just use these directories in order to label the data right
to create different versions of your data which is another thing which is very commonly done so you
could create many directories for many different purposes and that will be perfectly fine but you
could also just paste all the images here and that's also perfectly fine and you can see that
for the labels directory I did exactly the same we have a directory which is called train and within
this directory is that we have all these different files and for each one of these files let me
show you like this it's going to be much better for each one of these files for each one of
these txt files we will have an image in the images directory which is called exactly the
same exactly the same file name but a different extension right so in this case this one is called
.txt and this one is called .jpg but you can see that it's exactly exactly the same file name
for example the first image is called oa2ea8f and so on and that's exactly the same name as
for the first image in the images directory which is called oa2ea8f and so on so basically for
absolutely every image in your images directory you need to have an annotations file and a file in
the labels directory which is called exactly the same exactly the same but with a different extension
if your images are .jpg your annotations files are .txt so that's another thing which also
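Since this one-to-one pairing is easy to get wrong, here is a small sanity check you could run before training; a minimal sketch, assuming your files live under images/train and labels/train:

    # Hypothetical check: every image must have a label file with the same name.
    import os

    image_dir = 'images/train'
    label_dir = 'labels/train'
    for fname in os.listdir(image_dir):
        stem, ext = os.path.splitext(fname)
        label_path = os.path.join(label_dir, stem + '.txt')
        if not os.path.exists(label_path):
            print('missing annotation for', fname)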
And that's pretty much all. So remember: you need two directories, one called images and the other called labels. Within the images directory you will have all your images, and within the labels directory you will have all your annotations, all your labels. For absolutely every single image in your images directory you need a file in the labels directory with exactly the same name but a different extension: if your images are .jpg, your annotation files should be .txt. The labels should be expressed in the YOLO format, which means as many rows as there are objects in the image, and every row has the same structure of five values. The first is the class ID; in my case I only have one class, I'm only detecting alpacas, so this number will always be zero, but if you're detecting more than one class you will have different numbers. Then you have the x and y position of the center of the bounding box, then the width, and then the height, everything expressed in relative coordinates. So basically, that's the structure you need for your data, and that's what this step is about. That's pretty much all about converting, about formatting the data. Now let's move on to the training: now that we have taken the data into the format YOLOv8 needs, it's time to take this custom dataset and train an object detector using YOLOv8.
This is the YOLOv8 official repository. One of the things I like the most about YOLOv8 is that in order to train an object detector we can do it either from Python, with only a few Python instructions, or with a command-line utility: we can execute a command like this in the terminal, and that's pretty much all we need to do to train the object detector. That's something I really, really liked, and something I'm definitely going to use in my projects from now on, because I think it's a very convenient and very easy way to train an object detector, or any machine learning model. So this is the first thing to notice about YOLOv8: there are two different ways to run the training, from Python as we usually do, or with a command in the terminal, and I'm going to show you both so you're familiar with both ways. I also mentioned that I'm going to show you the entire process both in a local environment, in a Python project, and in a Google Colab. I know there are people who prefer to work in a local environment (I'm one of those people), and I know there are other people who prefer to work in a Google Colab, so depending on which group you're in, you can just choose the way you like the most. So let's start. Let's go to PyCharm: this is a PyCharm project I created for this training.
This is the file we are going to edit in order to train the object detector. The first thing I'm going to do is copy a few lines from the repository and then remove everything we don't need. We want to build a new model from scratch, so we keep this sentence, and then we train the model; everything else we remove. And that's all: these are the two lines we need in order to train an object detector using YOLOv8. Now we need to make some adjustments. Obviously, the first thing we need to do is import ultralytics, which is the library we need in order to import YOLO and train a YOLOv8 model. It's a Python library we need to install, as we usually do: go to the terminal and run pip install with the library name. In my case nothing is going to happen because I have already installed it, but please remember to install it, and also please mind that this library has many, many dependencies: you are going to install many different Python packages, so it's going to take a lot of disk space and also some time. Anyway, remember to install the library; these are the two sentences we need in order to run this training from a Python script. The first sentence we're just going to leave as it is: this is where we load the specific YOLOv8 architecture, the specific YOLOv8 model we are going to use. You can see we can choose from different sizes of YOLOv8: Nano, Small, Medium, Large, or Extra Large. We are using the Nano version, which is the smallest and lightest one: yolov8n.
Then, about the training, about this other sentence: we need to edit this argument. We need a yaml file which is going to contain all the configuration for our training, so I have created this file and named it config.yaml. I'm not sure it's the most appropriate name, but anyway, that's the name I chose. So what I'm going to do is just edit this parameter and input config.yaml. The config.yaml file is located in the same directory as main.py, so specifying it like this is going to work just fine.
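At this point, main.py looks roughly like this; a short sketch of the two sentences we just discussed:

    from ultralytics import YOLO

    # build a new model from scratch, using the YOLOv8 Nano architecture
    model = YOLO('yolov8n.yaml')

    # train the model using our configuration file
    results = model.train(data='config.yaml', epochs=1)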
Now let me show you the structure of this config.yaml. You can see it's a very, very simple configuration file: we only have a few keys, which are path, train, val, and then names. Let's start with names: this is where you set all your different classes. You are training an object detector, you may be detecting many different categories, many different classes, and this is where you type all of them. In my case I'm just detecting alpacas; that's the only class I'm detecting, so I only have one class: it's number zero and it's called alpaca. But if you are detecting additional objects, please remember to include the full list of all the objects you are detecting. Then, about the other parameters: path is the absolute path to the directory containing your images and annotations, and please remember to make it an absolute path. I ran into some issues when I tried to specify a relative path, relative from my current directory, where this project is created, to the directory where my data is located; and then I noticed, in the issues section of the YOLOv8 GitHub repository, that other people were having problems with relative paths too. The way I fixed it, and it's a very easy way to fix it, is to simply specify an absolute path. So remember: this should be the absolute path to the directory containing the images and labels directories. Then you specify the relative path from that location to where your specific images are located; in my case that's images/train relative to that path. If I show you this location, which is my root directory, and I go to images/train, this is where my images are located, so that's exactly what I need to specify. This is the train data, the data the algorithm is going to use as training data. Then we have another keyword, val, for the validation dataset. In this case we are going to specify the same data we used for training, and the reason is that I want to keep things simple in this tutorial: I'm just showing you the entire process of how to train an object detector using YOLOv8 on a custom dataset, so I'm just going to use the same data. And that's pretty much all for this configuration file.
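To make it concrete, the config.yaml I'm describing looks more or less like this; the absolute path here is obviously just an example, so replace it with the location of your own data:

    # config.yaml
    path: /home/user/alpaca-dataset   # absolute path to the data root directory
    train: images/train               # training images, relative to 'path'
    val: images/train                 # same data used for validation in this tutorial
    names:
      0: alpaca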
Now, going back to main.py: these two lines are really all we need in order to train an object detector using YOLOv8 from Python; that's how simple it is. So now I'm going to execute this file, but first I'll change the number of epochs: I'm going to run this for only one epoch, because for now the only thing I want to show you is how it executes, the entire process end to end; once we see that everything is up and running and working fine, we can continue with a proper training. So let's run this training for only one epoch. You can see it's loading the data... it has already loaded the data, and you can make use of all the debugging information we see here: we were loading 452 images, and all of them were loaded correctly, 452 out of 452. If I scroll down, you can see additional values related to the training process: this is how the training is going as we train this object detector. For now, the only thing we have to do is wait until the process is completed, so I'm going to stop the video here and fast-forward to the end of the training, and let's see what happens. Okay, the training is now completed, and you can see that
we have an output which says results saved to runs/detect/train39. So if I go to that directory, runs/detect/train39, you can see we have many different files, all related to how the training process went. For example, these images are a few batches of images which were used to train the algorithm: you can see they are named train_batch0 and train_batch1, and I think we have a train_batch2. So we have a lot of different images of a lot of different alpacas, all concatenated together into these large mosaics, so we can see exactly which images were used for training, with the annotations, the bounding boxes, drawn on top of them. We also have similar images for the validation dataset; remember, in this case the validation data is the same data we used for training, it's not different data. These were the labels in the validation set, and these were the predictions on the same images: you can see we are not detecting anything, we don't have a single bounding box. This is because we did a very shallow, very dummy training: we trained this algorithm for only one epoch, just as an example to show you what the output looks like and walk you through the process, not as a real training. Nevertheless, these are files I'm going to show you in more detail in the next step.
For now, let me show you how the training is done from the command line, from the terminal, using the command I showed you earlier; and after that, let me show you how this training is done in a Google Colab. Going to the terminal, we type yolo detect train, then the data argument, where I have to specify the configuration file, which is config.yaml, then the model, yolov8n.yaml, and then the number of epochs. This is exactly the same as what we did from Python and is going to produce exactly the same output; I'm just going to set the number of epochs to one so the two runs match. Let's see what happens: you can see we get exactly the same output, all the images are loaded, and a new training process starts. After this training process we would have a new run saved into a new directory, train40, where all the information related to this training process gets stored. I'm not going to run it to completion, because it would be exactly the same as the one we did before, but this is exactly how you can use the command-line utility to do this training from the terminal; you can see how simple it is.
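For reference, the command I'm running looks like this:

    yolo detect train data=config.yaml model=yolov8n.yaml epochs=1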
It's amazing how simple it is. And now let me show you how everything is done from a Google Colab, so let's go back to the browser so I can show you the notebook I created to train YOLOv8 in a Google Colab. If you're not familiar with Google Colab: the way you can create a new notebook is to go to Google Drive, click New, then More, and select the option Google Colaboratory; this creates a new Google Colab notebook, and you can just use that notebook to train this object detector. Now let me show you this notebook: you can see it contains only five cells; this is how simple this will be. The first thing you need to do is upload the data you are going to use to train this detector; it's exactly the same data as we used before, the same images and labels directories. Then the first cell we execute mounts Google Drive into this instance of Google Colab: the only thing I do is press Enter on this cell, and it may take some time, but basically all it does is connect to Google Drive, so we can access the data we have there. I select my account, then Allow, and that's pretty much all. Then it all comes down to where you have uploaded the data in your Google Drive, the specific directory. In my case, my data is located in this path: this is my home in Google Drive, and then the relative path to the location of the data and all the files related to this project. So remember to set this root directory to the directory where you uploaded your data; that's pretty much all, and then I just execute this cell to save that variable.
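For reference, those first cells look more or less like this; the Drive path is just an example, use wherever you uploaded your data:

    from google.colab import drive

    # connect this Colab instance to Google Drive
    drive.mount('/content/gdrive')

    # root directory in Drive where the data was uploaded (assumed path)
    ROOT_DIR = '/content/gdrive/MyDrive/my-project'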
Next, I'm going to execute this other cell, which is pip install ultralytics, the same command I ran from the terminal in my local environment; now I'm running it in Google Colab. Remember that you have to start this command with an exclamation mark, which means you are running a command in the terminal of the environment where this notebook is being executed; so remember to include the exclamation mark. Everything seems to be okay, everything seems to be ready, and now we can continue to the next cell, which is this one. You can see it has exactly the same structure, exactly the same lines as in our local environment: if I show you this again, we import ultralytics, then we define this YOLO object, and then we call model.train; it's exactly the same as we are doing here. Obviously, we are going to need another yaml file, one that lives in our Google Drive, and this is the file I have specified: it has exactly the same configuration as the yaml file I showed you in my local environment, it's exactly the same idea. The only difference is that now you should specify an absolute path to your Google Drive directory; that's the only difference. And I see I have a very small mistake: the config says data, but I just uploaded images and labels directly, not within another directory called data. So let me fix that: I'm going to create a new directory called data and put images and labels inside it, so everything is consistent. Now everything is okay: images, then train, and the images are within this directory. Let's go back to the Google Colab. Every time you make an edit or do something in Google Drive, it's always a good idea to restart your runtime, so that's what I'm going to do, and then I'll execute the cells again; I don't really need to pip install the library again because it's already installed in this environment. I also have to make one additional edit: the config file is now called google_colab_config.yaml. And that's pretty much all; I'm just going to run it for one epoch, so everything is exactly the same as we did in our local environment. Now let's see what happens.
You can see we are going through exactly the same process: everything looks pretty much the same as it did before, we are loading the data, we are loading the model, everything is going fine, and this is going to be pretty much the same process as before. You'll notice it now takes some additional time to load the data, because you are running this notebook in one environment and taking the data from your Google Drive, so it's a slower process, but it's definitely the same idea. The only thing we need to do now is wait until the process is completed, and I don't think it makes sense to wait on camera, because it's going to be exactly the same process we ran in our local environment. At the end of this execution, all the results will be in a runs directory local to the environment where the notebook is running. So at the end of the process, please remember to execute this command, which takes the runs directory, containing all the runs you have made and all the results you have produced, and copies it into the directory you have chosen in your Google Drive for your files and your data. Please remember to do this; otherwise you will not be able to access the results of the training you have just done.
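The exact cell depends on where you keep your files, but it's essentially a copy of the runs directory into your Drive folder; a minimal sketch with assumed paths:

    import shutil

    # copy the training results from the Colab environment into Google Drive
    shutil.copytree('./runs', '/content/gdrive/MyDrive/my-project/runs')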
So that's how you can train an object detector using YOLOv8 in a Google Colab, and you can see the process is very straightforward: it's pretty much exactly the same process, exactly the same idea, as in our local environment. And that's it; that's how easy it is to train an object detector using YOLOv8 once you have done everything we did with the data: once you have collected the data, annotated it, and taken everything into the format YOLOv8 needs, running this training is super straightforward. So that's going to be all about the training process. Now let's continue with the testing, and see how the models we have trained actually perform.
Let's move to the next step, which is the last step in this process: this is where we are going to take the model we produced in the training step and test how it performs; this is how we complete this training of an object detector using YOLOv8. Once we have trained a model, we go back to the directory I showed you before, the directory where all the information regarding this training process was saved. Obviously, I'm not going to show you the training we just did, because it was a very shallow, very dummy training; instead I'm going to show you the results from another training I did while preparing this video, where I conducted exactly the same process but trained for 100 epochs, so a much deeper training. Let me show you all the files that were produced, so you know what tools you have to assess the performance of the model you have trained.
First, you have a confusion matrix, which gives you a lot of information about how the different classes are predicted, or how the different classes get confused with each other. If you're familiar with how a confusion matrix should look, you will know how to read this information. In my case I only have one class, alpaca, but you can see the matrix generates another, default category, background, and we get some information here. It doesn't really say much: it shows how these classes are confused, but given that this is an object detector, I think the most valuable information is in other metrics, in other outputs, so we are not really going to mind this confusion matrix. Then you have some plots, some curves; for example, this is the F1-confidence curve. We are not going to mind this plot either. Remember, we are just getting started with training object detectors using YOLOv8; the idea of this tutorial is to keep it very introductory, so we are not going to dig into all these different plots, because extracting all the information from them involves a lot of knowledge and a lot of expertise, and that's not really the idea of
this tutorial. Let's do things differently and focus on this plot, which is also saved in the results directory. You can see we have many, many different plots here; you could definitely go crazy analyzing all the information, because there are ten different plots, and you could knock yourself out extracting everything from them, but again, the idea is to keep this a very introductory video, a very introductory tutorial. So, long story short, I'm just going to give you one tip, the one thing you should focus on in these plots for now. If you're going to take one thing from this video about how to test the performance of a model you have trained with YOLOv8, it's this: make sure your loss is going down. You have many plots; some of them relate to the loss function, which are this one, this one, and this one for the training set, and these for the validation set. Make sure all of your losses are going down. This is, I would say, a very simple way to analyze these plots, but I will say it's more powerful
than it would appear. Make sure all your losses are going down, because given the loss function, we could be in several different situations. We could have a loss function which is going down, which I would say is a very good situation. We could have a loss function which started going down and then flattened into something like a flat line; if we're on something like a flat line, it means the training process has plateaued. That could be a good thing, because maybe the machine learning model has really learned everything it had to learn from this data; so a flat line is not necessarily a bad thing, but you would have to analyze other things to decide. Or you could be in a situation where your loss function is going up, and if you, my friend, have a loss function which is going up, then you have a huge problem: something is obviously not right with your training. That's why I'm saying that analyzing what happens with your loss gives you a lot of information. Ideally it should go down; if it's going down, then most likely everything is going well. If it's something like a flat line, well, it could be a good thing or a bad thing, depending on the situation. But if it's going up, something is seriously wrong somewhere in your code or in your training process. So that's a very simple, even naive, way to analyze all this information, but trust me, it's going to give you a lot of information to start working with when testing the performance of the model you have trained.
But I would say that looking at the plots and analyzing all this information is more of a research thing; that's what people who do research like to do, and I'm more of a freelancer, I don't really do research. So I'm going to show you another way to analyze the performance of the model we have just trained, which from my perspective makes more sense: it involves seeing how the model performs on real data, on the kind of data you will actually run your inferences on, and seeing what happens. The first step in this more practical, more visual evaluation of how this model performs is looking at these images.
Remember that before, when we looked at these images, we had this one with the labels in the validation set, and this other one with the predictions, which was completely empty. Now you can see the predictions we produce are no longer empty, and we are detecting the position of our alpacas very accurately. We do have some mistakes: for example, here we are detecting a person as an alpaca, and here also a person as an alpaca; and we have some misdetections, for example this should be an alpaca and it's not being detected. But you can see the results are pretty much okay. Same over here: we are detecting pretty much everything; we have a misdetection here, and an error over here, because we are detecting an alpaca where there is actually nothing. So things are not perfect, but everything seems pretty much okay. That's the first way we are going to analyze the performance of this model, and it's worth a lot, because it's a very visual check: we are not looking at plots, we are not looking at metrics; we are looking at real examples, seeing how this model performs on real data.
Maybe I am biased toward analyzing things like this because I'm a freelancer, and the way it usually works when you're a freelancer is this: if you build a model to deliver a project for a client, and you tell your client "oh yeah, the model was perfect, take a look at all these plots, take a look at all these metrics, everything was just amazing", and then your client tests the model and it doesn't work, the client will not care about all the pretty plots. So that's why I don't really mind these plots a lot; maybe I am biased because that's how freelancing works, but I prefer a more visual evaluation. So that's the first check we'll do, and we can already see we're getting an okay performance. But the data we are currently looking at, remember, is the validation data, which in this case is pretty much the same data we used for training, so it doesn't really say much. Next I'm going to show you how the model performs on data the algorithm has never seen, completely and absolutely unseen data, which is a very good practice if you want to test the performance of a model. I have prepared a few videos, so let me show you them; remember, this is completely unseen data.
In this first video, you can see an alpaca which is just being an alpaca: it's walking around, doing its alpaca stuff, living its alpaca everyday life, going from one place to the other, doing nothing... no, doing its alpaca stuff, which is a lot. That's one of the videos I have prepared. This is another video, also of an alpaca doing alpaca-related stuff. And I have one more video over here. So I'm going to show you how the model performs on these three videos. I have made a script in Python which loads these videos and just calls the predict method from YOLOv8: it loads the model we have trained, runs all the predictions, and lets us see how it performs on these videos.
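I'm not going to walk through that script here, but it's essentially something like this; the weights path and the video file name are assumptions:

    from ultralytics import YOLO

    # load the weights produced by our training run
    model = YOLO('runs/detect/train/weights/best.pt')

    # run the detector on a video and save the annotated output
    results = model.predict('alpaca_video.mp4', save=True)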
This is the first video I showed you, and these are the detections we are getting: you can see we are getting practically perfect detections. Remember, this is completely unseen data. I'm not going to say 100% perfect detection, because it's not, but I would say it's pretty, pretty good as a starting point for this training process. That's one of the examples. Then let me show you another example, the other video I showed you: you can see we are also detecting the position of the alpaca; in some cases the label text goes outside of the frame because there isn't enough space, but everything seems okay in this video too. We are capturing the alpaca's position; in some frames the bounding box isn't tightly fit to the alpaca's face, but everything seems to be working fine. And then the other video I showed you: in this case the detection is a little broken and we have several misdetections, but overall it's much better than nothing, and it's working reasonably well too. Of these three examples, this one performs the best, and I also really like how it performed in the one where the alpaca was starting its alpaca journey: we get a very good, very stable detection, then it breaks a little, but nevertheless I would say it's okay; it's also detecting this other alpaca over here, so I'd say it's working pretty much okay.
So this is pretty much how we are going to do the testing in this phase. Remember: if you want to test the performance of a model you have just trained using YOLOv8, you will have a lot of information in the directory that is created at the end of the training process. You will have all of these files, and a lot of information to knock yourself out, to go crazy analyzing all these different plots; or you can keep it simple and just take a look at what happened with the training loss, the validation loss, all the loss functions, and make sure they are going down. That's the very least thing you need to verify. Then you can just see how the model performs on a few images or a few videos, take a look at how it performs on unseen data, and make decisions from there: maybe you can use the model as it is, or maybe you decide to train it again. In this case, if I analyze all this information, I see the loss functions are going down, and not only are they going down, but I notice there is a lot of room to improve this training, to improve the performance, because we haven't reached the point where everything plateaus into a flat line; we are very far away from there. So that's something I would do: a new, deeper training, so we keep improving. I would also change the validation data to something completely different from the training data, so we get even more information. That's pretty much what I would do in order to iterate and build a better, more powerful model. And now let's get started with the next tutorial.
This is the Detectron2 official repository, and Detectron2 is exactly the framework we are going to use today. I have used Detectron2 many, many times in my projects as a computer vision engineer; I think it's an amazing framework, an amazing algorithm, and in this video I'm going to show you how to train an object detector using Detectron2. The first thing I'm going to do is show you the data we are going to use today: the same alpaca dataset we already used in one of my previous tutorials. If you watched my previous video on how to train an object detector using YOLOv8, then most likely you are already familiar with this dataset; this is exactly the data we are going to use in this tutorial too, and this is what the images look like. Now, an important note: in my case I already have the annotations for this data; you can see all these txt files, these are my annotations for all my data, for all my images. But if you're watching this tutorial, most likely you want to know how to train Detectron2 on your own custom data, and most likely you want to know how to do all the annotation: you want to build this dataset from scratch and annotate all of your images. The annotation of an object detection dataset is something I already covered in my previous video, where I showed you how to train an object detector using YOLOv8, and I don't think it makes sense to cover the entire process again in this tutorial; so if you're curious about how to annotate your custom data, go ahead and watch that other video. I'm going to post a link somewhere in this video. Now let's go to PyCharm, to a Python project I created for this tutorial. These are the requirements for this project; as always, please remember to install these requirements before starting with this tutorial, otherwise nothing is going to work. So please remember to install these packages, and now let me show you these three files: train.py, util.py, and loss.py. Let's start with train.py.
train.py is the file we are going to execute in order to run the entire training process, and you can see it all starts with a very, very long docstring explaining how you need to format your data; this is very, very important. In util.py and loss.py we have many different functions and a class definition: a lot of code that already handles the entire training process, already handles parsing the data, already handles everything. So the only thing you need to do in order to make this training process work as expected is to put your data, and your file system, into the format specified in this docstring. Let me show you. You can see that the annotations should be provided in YOLO format: class, xc, yc, which is the x and y position of the center of the bounding box, and then the width and the height of the bounding box. Now let me show you one of my annotation files, let me show you what it looks like: you can see, for example, that in this case we have five numbers; the first one is a zero, and then we have four float numbers. This is exactly a bounding box annotation in YOLO format: the first number, the zero, is the class ID, which in my case is always going to be 0 because I only have one class in this dataset; then these two numbers are the x and y positions of the center of the bounding box; then this number is the width, and this number is the height of our bounding box. So please remember to format all of your annotations into the YOLO format, which looks exactly like this: class ID, x and y position of the center of the bounding box, then the width, and then the height. Your file system also needs to be structured in a very specific way; let me show you on my computer.
In my file system, if I go to data, this is the root directory where my data is located. You can see I have two folders: one of them is called train, and the other is called val. Within train I have two other folders, one called images and the other called anns: within images is where I have all my training images, and within anns is where I have all the annotations for my training images. Then, if I go to val, you can see exactly the same structure: two folders, images and anns; within images I have all my validation images, and within anns all the annotations for the validation data. This is exactly what's described here: we have a data directory, and within it two folders, train and val; within train we have two additional folders, images and anns, and the same for val. This is exactly what I showed you on my local computer. And for absolutely every single image in the images directory, we have an annotation file in the anns directory with exactly the same name but a different extension; this is very important. Please remember to structure your data, your file system, exactly like this, because all the functions in this file, which handle all the parsing, reading the data, getting the annotations, and so on, expect the data exactly like this. So please structure everything as described in this docstring, otherwise you are going to have issues in the training process.
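Summing up, the file system should look like this; a sketch using the directory names from the docstring:

    data/
        train/
            images/    # training images (e.g. .jpg)
            anns/      # training annotations (.txt, YOLO format)
        val/
            images/    # validation images
            anns/      # validation annotations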
Now let's continue. If I scroll down, you can see I have this argparser, and these are all the arguments we can specify, we can define, for this training process. You can see we have the data directory, which is obviously very important. Then we also need to define the names of all our classes: if I go to my file system, you can see that in my case I have this file, class.names, and in my case it only contains one class name, which is alpaca; but if you are doing something like a multi-class object detector, then most likely you are going to have other classes as well. Now let's go back to PyCharm. You can see that another argument is the output directory: this is where all the models and all the results, everything, is going to be saved. Then we have different hyperparameters: for example, the learning rate of our training process, then the batch size, the number of iterations of our training process, and then the device, whether we want to do this training on a CPU or on a GPU; this is very, very important. Then there's the checkpoint period, which controls how often we save the weights of the model we are training: we are going to train for a given number of iterations, and every 500 iterations we are going to save the weights of this model; this is something we are going to use later in this tutorial when we validate the model we trained. And another argument which is very, very, super amazingly important is model: this is where we specify the baseline we are going to use for this training process. In my case, the baseline I have set is the COCO-Detection RetinaNet R101.
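To give you an idea, the argparser is along these lines; the exact argument names and defaults here are assumptions, so check the actual train.py:

    import argparse

    # Hypothetical sketch of the training arguments described above.
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-dir', default='./data')
    parser.add_argument('--class-list', default='./class.names')
    parser.add_argument('--output-dir', default='./output')
    parser.add_argument('--learning-rate', type=float, default=0.00025)
    parser.add_argument('--batch-size', type=int, default=4)
    parser.add_argument('--iterations', type=int, default=10000)
    parser.add_argument('--checkpoint-period', type=int, default=500)
    parser.add_argument('--device', default='gpu')  # 'cpu' or 'gpu'
    parser.add_argument('--model', default='COCO-Detection/retinanet_R_101_FPN_3x.yaml')
    args = parser.parse_args()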
Now let me show you something very important: where this model comes from. In my browser, this is the Detectron2 Model Zoo and Baselines page. This is very important when you are working with Detectron2, because you have many, many models to choose from: this model zoo is basically a very large collection of all the baselines, all the pre-trained models, you can choose from when working with Detectron2. If I scroll down, you can see we have many different sections: for example, here we have a section of COCO object detection baselines, then another one for instance segmentation, another for keypoint detection, then panoptic segmentation, and so on; many different sections for all the different types of algorithms. You can see we have many, many models, many architectures, many baselines to choose from, and basically the idea is that when we are training our own custom model, we can take whatever pre-trained model we want, whatever baseline we want, and train our own model on top of it. This is very useful, because the page also lists many different metrics, the performance of all these models, and their inference times, so we can choose the model we like the most for the specific project we are working on. In my case, I have selected the RetinaNet R101, so that's the model we are going to use in this tutorial; but in your case, please go ahead and choose whatever model you want, because the entire process I'm going to show you in this tutorial works exactly the same for any other model you choose from here. So this is the Detectron2 model zoo; please take a look at it and at all the models which are available, it's a very, very large collection and it's just amazing. Let's go back to PyCharm: this is where you specify the architecture, the model you are going to use in your training process. Now let's continue.
You can see that after parsing all of these different arguments, the only thing I'm doing is calling util.train: I'm calling the train function which is defined in my util.py file, and I'm calling this function from a very, very high level: I just pass all these arguments as input, and that's it; this function takes care of the entire training process. If you have watched my previous tutorials on YOLOv8 (the image classifier, the object detector, the instance segmentation model, the keypoint detector, all of my models), you will remember that the training process there is super simple, super straightforward: the only thing we need to do with YOLOv8 is write a couple of lines, and that's it. So for this video, for Detectron2, I wanted to give you something with the same level of complexity, the same level of abstraction: something super high level, which you can just go ahead and use without really caring about all the different details and about everything that's working under the hood. That's why I made train.py like this: you can just set up all of your arguments, then call train, and forget about all the complexity of using Detectron2. That's something I wanted to do for you, because it makes things much, much simpler when training your model on your custom data. And I don't know about you, but in my case, Detectron2, YOLOv8, or whatever other machine learning framework or algorithm you can think of, for me they are only tools I use to solve my problems; so being able to train an object detector using Detectron2 by just calling a function like this, from a very high level, is amazing. If you're anything like me, then you're going to be super happy with this function, and if that's the case, just jump to the next chapter, where I'm going to show you how to run this training from a Google Colab. But if you do care about the details, about how everything works under the hood, if you want to know exactly how these functions work and exactly how the data is parsed, then just continue watching and I'm going to give you more details. Now let's move to util.py: we have four different functions here, and
these are the functions that take care of the entire training process. Let me start with train: this is the function we are calling over here, in train.py, in order to start the training process, so it's a very good place to start with this util.py file. You can see this function takes many different parameters, many different arguments, and for each one of them we have a very short description of what it means. This is very important: please take a look at this documentation, at this docstring, when you are reviewing this file, because it's going to help you a lot to further understand what each one of these parameters and arguments means. Now let's continue. In the first line, we are calling another function in this util.py file, which is register_datasets; because of the way Detectron2 works, we always need to 'register' the datasets before starting the training process. Now let me show you that function.
register_datasets is another function in the util.py file. You can see it takes two parameters as input, a root directory and the class list file, and basically the only thing we're doing in this function is calling DatasetCatalog.register. Just remember that in this function we are registering all of our data, all of our annotations, into Detectron2, and this is a very important step when working with this framework. You can see we are registering the training set under the keyword train, and the validation set under the keyword val; this is very important, because we are going to make a reference to these two words (train and val) later on, so please remember them. The second argument is this lambda function we have over here, and you can see that it basically calls another function in this util.py file, called get_dicts, with these two arguments: the location of the images and the location of the annotations, for the training set and for the validation set respectively. We iterate over the training set and the validation set, and in each iteration we register that set.
Now let me show you this other function, get_dicts. You can see the documentation in this function is very good, and it says: read the annotations for the dataset in YOLO format and create a list of dictionaries containing information for each image. The arguments are a directory containing images and another directory containing annotations, and the return value is a list of dictionaries with all this information: the file name of every single image and a unique identifier for every image, then the height and the width of the image, and then the annotations, the bounding box and also the category ID, the class ID. And if I show you the code, it's very straightforward: the only thing we're doing is iterating over absolutely all the files in the annotations directory, and for each one of these files we open the image that belongs to that annotation, take the height and the width of that image, and create this dictionary with all the information: the image file name, the ID, the height, and the width. Then the only thing we're doing is parsing through all the annotations, getting all the bounding boxes and also the class IDs. So basically we are walking through all of our data, getting all the information about our images and all the information about our annotations. And something that's very, very important: remember that our annotations are specified in the YOLO format, which means class ID, then the x and y position of the center of the bounding box, then the width, and then the height; and we are converting each annotation into another format, which is x y w h in absolute coordinates. This is very important, because it may be confusing: just remember that we are taking the annotation, which is in the YOLO format, and converting it into this other format, where x and y are the upper-left corner, followed by the width and then the height of the bounding box, in pixels. That's basically what we are doing here: converting the annotation from one format into the other, and then just getting the class ID. At the end, this function returns the list of dictionaries with all this information. So just go through this function and it's going to be super straightforward; you have this comprehensive docstring telling you exactly how everything works, all the input parameters and the output. So that's all for get_dicts.
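The conversion itself is simple arithmetic; here is a sketch of the idea, for an image of width W and height H in pixels:

    # Convert a YOLO box (relative xc, yc, w, h) to absolute (x, y, w, h),
    # where (x, y) is the upper-left corner in pixels.
    def yolo_to_xywh_abs(xc, yc, w, h, W, H):
        box_w = w * W
        box_h = h * H
        x = xc * W - box_w / 2
        y = yc * H - box_h / 2
        return x, y, box_w, box_h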
Now let's continue reading register_datasets. After we register the training dataset and the validation dataset, we need to tell Detectron2 exactly what the class names are. So far we have only been parsing the class IDs: the annotations only contain the class ID, they don't have the class name; so it's very important that we tell Detectron2 what the class names are, and that's why we call this function right after registering the datasets. That's pretty much all for register_datasets.
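For reference, registering a dataset and its class names in detectron2 looks roughly like this; a simplified sketch of what util.py does, with assumed paths:

    from detectron2.data import DatasetCatalog, MetadataCatalog

    # register the train and val sets under the keywords we reference later;
    # the lambda default argument captures the current split in the loop
    for split in ['train', 'val']:
        DatasetCatalog.register(
            split,
            lambda split=split: get_dicts(f'data/{split}/images',
                                          f'data/{split}/anns'))
        MetadataCatalog.get(split).set(thing_classes=['alpaca'])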
Now let's continue. After we register the datasets, the next line builds the entire configuration we are going to use in this training process. The first call is get_cfg, which is basically a Detectron2 built-in function: it returns something like a default configuration, a very long, very comprehensive set of default hyperparameters. Then, in the next line, we update this configuration with many other values which are specific to the particular model we are using; in my case that's RetinaNet R101. So after getting the default configuration, we merge in the values that are specific to this model; that's very important. Then the only remaining thing is setting other values manually in this config: you can see that each of the following lines is config dot a given key, and the value for that key. Here we are updating the value of the training set, the validation set, the test set, and so on; each of these configuration values is quite self-explanatory. For example, here we are telling Detectron2 to use the CPU; here we are setting the weights of the model, which are basically the pre-trained weights of the baseline we chose over here; and then we set the batch size, the checkpoint period (which is how often we save the checkpoints), the learning rate, and so on. And I would say this is by far the most important part of this function, because this is where we tell Detectron2 where the training data is and where the validation data is. We registered the datasets, and if you remember, we called one of them train and the other val; so this is where we tell Detectron2 that the training data is the dataset we registered under the keyword train, and the validation data is the dataset we registered under the keyword val. This is very, very important, and I would say it's the most important part of this function. That's basically all for the configuration step.
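Put together, the configuration step looks something like this; a simplified sketch, since the real function sets more values than shown here:

    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    config_file = 'COCO-Detection/retinanet_R_101_FPN_3x.yaml'  # chosen baseline

    cfg = get_cfg()                                              # default config
    cfg.merge_from_file(model_zoo.get_config_file(config_file))  # model-specific values
    cfg.DATASETS.TRAIN = ('train',)            # the keywords we registered earlier
    cfg.DATASETS.TEST = ('val',)
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_file)  # pre-trained weights
    cfg.MODEL.DEVICE = 'cpu'
    cfg.SOLVER.BASE_LR = 0.00025
    cfg.SOLVER.CHECKPOINT_PERIOD = 500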
Now let's continue. You can see that next we create the output directory, and then we create this object, the trainer: the one that's going to take care of the training process. Then this line, and actually these three lines we have over here, are also very important. When we train a model using Detectron2, during the training process we get a lot of information about the loss function on the training set, but we don't get any information about the loss function on the validation set; that's the way Detectron2 works by default. So if we want to add this information, if we want access to the loss function on the validation set, this is exactly what we need to do, and this is why I created this class, ValidationLoss, which is defined in loss.py. Long story short, these three lines create this custom output, this custom debugging information about the training process, so we have more information about how the training is going; and this is very important, because this additional information will be super useful once we are validating the model. Now let's continue. This line is resume_or_load, which determines whether we are resuming a previous training or training from scratch; in my case I'm training from scratch, so resume equals False. And then the only thing left is calling trainer.train, and that's pretty much all it takes to start the training process.
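Those last lines follow the standard detectron2 training idiom; a sketch that omits the custom validation-loss hook registered in between:

    import os
    from detectron2.engine import DefaultTrainer

    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)  # create the output directory

    trainer = DefaultTrainer(cfg)               # the object driving the training
    trainer.resume_or_load(resume=False)        # False = train from scratch
    trainer.train()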
and that's pretty much all so this is a much more detailed explanation of these four functions we
have here, in the util.py file and also of the function or actually the class definition we have
here in loss.py so this is in order to give you more details regarding all these other functions
and this class definitions and so on and now let's continue and let's go back to train.py because
now that we already have all the code we need in order to do this training the only thing we need
to do is to press play, and that's it. You can see I press play, I get some huge output, and... I'm
just going to stop the training to show you something: this is pretty much all the
hyperparameters for the network we are using in order to train this model,
in my case, remember, RetinaNet R-101. Then the only thing we would need to do from now on is
just wait until the training is completed, but in my case I'm not going to train it locally because
it's going to take a lot of time; I'm going to show you how to do this training from a Google Colab,
because this is going to make the process much, much simpler and much, much faster
than if I did it on my local computer. So I'm
going to tell you how to do it from a Google Colab. The first thing you need to do, obviously, is
upload your data; this is very important, please remember to upload your data, otherwise you
will not be able to train this model from Google Colab. In my case you can see that this is my
data, the same data I showed you on my local computer, these are my train and my val directories. You
also need to upload these files, which are util.py, train.py, loss.py
and class.names, right, so basically these files over here: train.py, util.py, loss.py and also the class
names file, which is this one. So remember to upload all these files, otherwise nothing is going to work. Now
let's move to this Google Colab, to this Jupyter notebook, and I'm going to tell you exactly how you
can train this model from here basically this is a very straightforward process the only thing you
need to do is to execute each one of these cells so it's something very very simple to do and
we are just going to be executing the code we have over here. The first step is to connect your
Google Colab with Google Drive, so basically you need to execute this cell and wait a
few moments... click on connect to Google Drive, I select my account, I scroll all the way down
and I click allow, and that's pretty much all; then I need to wait a few seconds and that's going to
take care of connecting my Google Colab with Google Drive. Okay, so everything is completed.
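Under the hood, that cell is essentially the standard Colab mount call, and the working-directory cell a bit further down is a plain os.chdir; the path below is a placeholder you replace with your own Drive layout:

```python
from google.colab import drive

drive.mount('/content/gdrive')  # prompts you to authorize access to your Drive

# later on, change the working directory to wherever you uploaded your files
import os
os.chdir('/content/gdrive/MyDrive/<path-to-your-project>')  # update this path
```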
Then I'm going to continue with the next cell: I'm going to install the requirements, which
is running all this pip installs... this is going to be very straightforward the only thing you need
to do is wait until everything is completed... okay that took a few minutes but now it's completed
and you can see I have an output over here you must restart the runtime in order to use newly
installed versions and if I scroll up I got a similar output over here so basically remember
to restart the runtime if you have a similar message and that's going to be pretty much all
so... Google Drive is now mounted, so everything is okay, we have installed the requirements, now
let's continue then you need to change the working directory of this notebook so you need to execute
this cell, but it's very important that you update this path to the path where you have uploaded the
data and all of your files. In my case it's /content/gdrive/MyDrive followed by the
location of my data; if I show you my Google Drive, you can see this is My Drive > computer vision engineer
> TrainDetectron2ObjectDetector, and that's exactly the path over here. So please remember to update this
path with the location of your data in your Google Drive, your data and also your files, right, everything should be located
in the same directory once you have edited this location the only thing you need to do is to press
Ctrl enter and then that's going to be pretty much all in order to change the working directory and
then the only thing you need to do is to execute this cell so you can see that we are executing
the train.py file and I'm setting these arguments. I am setting the device to GPU; this is very
important, because that's pretty much the reason why we are using a Jupyter notebook in Google
Colab. Then I'm also setting the learning rate to this value, and I am going to train for 6,000
iterations. I would say these two arguments are not absolutely needed, you can just use the
default values, but in my case, for my data, for my problem, I noticed it was much better to use
this learning rate, and a shorter training of only 6,000 iterations was just fine.
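The cell is just an invocation of train.py; the exact flag names below are assumptions (check the argparse definitions in train.py for the real spelling), and the learning rate is whatever value works for your data:

```python
# In a Colab cell (hypothetical flag names; adapt to your train.py):
!python train.py --device gpu --learning-rate <your-learning-rate> --iterations 6000
```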
So now I just press enter to execute this cell, and that's pretty much all it takes to do this
training process. You can see how simple this is once everything is within
this train.py file right once I created these functions and I put everything in this util.py...
we can just execute everything from a super super high level calling train.py and please let
me know what you think in the comments below but I think it's just amazing we can just train
detectron2 from a super super high level as we are doing over here the only thing we're doing is
calling train.py and we are passing the arguments exactly like this from a super high level we
don't really care about the details, we don't really care about the complexity, we don't really
care about anything. It's amazing, I don't know what you think but I think it's amazing, please let
me know in the comments below what you think so this is pretty much all for this training the
only thing we will need to do now is we will need to wait until everything is completed and
this is going to take some time this is going to depend on your data on your annotations on your
problem on your specific problem in my case for my data it took something like two hours to do the
entire training process so we are not really going to wait until this is completed because I have
already trained this model when I was preparing this tutorial so let me just show you what the
output looks like. Once you've trained your model, you are going to have a directory called
output, exactly like the one I have over here, and this is where you're going to have all
the results of your training process: you're going to see all of these checkpoints,
which are the weights of your training process at all these
different steps, right, and this is where we are going to notice this argument over here,
checkpoint period, because we have set that the checkpoints should be saved every 500 steps, and if
you notice, these are all the checkpoints, these are all the weights we have saved, and if you check
the numbers you can see that these are... 499, then 999, 1499 and so on, so all of these files are
500 steps apart. And yeah, that's basically what it means to save the checkpoints every 500 steps:
during your training process you are going to be saving the checkpoints, the weights, exactly
like this, so at the end of your process you're going to have many, many weights files exactly
like I have over here. So these are my weights, but
what I'm going to do is I'm going to take this file... this is the file with all the information
of our training process in the training set and in the validation set so this metrics.json
file is the one we are going to inspect is the one we are going to analyze to validate this
model so I'm just going to download this file and now let's go to pycharm because I want to
show you this file which is plot_loss.py so I have already downloaded this file and it's in
my directory, let me show you: today's tutorial, detectron2, code, and this is the metrics.json
file I have just downloaded. And now if I show you this plot_loss.py, basically what we are
doing over here is parsing through this file, right, parsing through all the information we have
in this file. Let me open this file for you so you can see exactly what the information looks like;
you can see it looks very crazy right we have a lot of values we have a lot of information and
basically we need a way to parse through this information and we need a way to visualize all
this information super super quickly so that's why I created this... plot_loss.py because
it's going to help us a lot in order to just get all the information we want from this file and
just plot everything into a very nice looking plot so we can just do this validation much much
quicker. So let me show you what it looks like: I'm just going to press play (I'm going to tell you in
a few minutes why I have commented these two lines), and you
can see that this is the training loss and the validation loss; the blue values are the training
loss and the orange values are the validation loss. But obviously this is something that we cannot
analyze, because this is a lot of information and it doesn't really look very good, right,
so we're going to do something now which is going to make everything much much prettier which is
we're going to do a moving average on each one of these functions basically we are going to apply
another function which is going to smooth these values and it's going to make everything much much
smoother. I already wrote all the code we need, including this moving_average function,
so the only thing I need to do is delete these comments, and basically now we are
going to plot the loss values we are getting from this metrics.json file
and we're also plotting their moving averages, right, we are plotting the same functions but
averaged. And you can see this is what the averages look like, which is
much, much prettier; and in order to show it to you even better, I'm just going to remove the two raw
plots and we are only going to plot the moving averages. This is much prettier, this is much,
much better: now we have the training loss in blue, and I'm going to adjust the labels.
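A minimal sketch of what plot_loss.py is doing is shown below. metrics.json is written by detectron2 as one JSON object per line; 'total_loss' is a standard key, while 'validation_loss' is an assumption about the name the ValidationLoss hook logs under, so adjust it if yours differs:

```python
import json

import matplotlib.pyplot as plt
import numpy as np

def moving_average(values, window=40):
    # simple rolling mean to smooth the noisy per-iteration loss values
    return np.convolve(values, np.ones(window) / window, mode='valid')

train_loss, val_loss = [], []
with open('metrics.json') as f:
    for line in f:                      # one JSON object per line
        metrics = json.loads(line)
        if 'total_loss' in metrics:
            train_loss.append(metrics['total_loss'])
        if 'validation_loss' in metrics:
            val_loss.append(metrics['validation_loss'])

plt.plot(moving_average(train_loss), label='training loss')
plt.plot(moving_average(val_loss), label='validation loss')
plt.legend()
plt.show()
```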
Okay, now you can see that everything looks better: with these values over here, in blue we are
plotting the training loss and in orange the validation loss, and we can see that both of
these functions are going down and that's a very good sign but in the case of the training loss
it seems we have reached a plateau over here: the training process goes super smoothly until
it reaches something like 5,000 steps, and in the case of the validation loss it seems we also
reach a plateau, but much, much sooner. So this is basically where we are going to validate
this model and this is also where we are going to decide which one of our checkpoints we are going
to choose from this model right because we have many many many weights we have many checkpoints
and we can just use any of these files in order to produce our inferences in order to produce
our predictions so this is where we're going to Define exactly which one of these checkpoints
we are going to use and I would say I would... I like how everything is going until this point over
here because you can see that the training loss is going down and the validation loss is kind of
going down as well, and this is pretty much where everything plateaus, right. So if you ask
me, I would keep this checkpoint over here, the one at 3,000 steps, this
checkpoint over here so this is where you're going to draw your conclusions and you're going to make
decisions regarding what you're going to do next obviously another conclusion could be to do the
training again; it all depends on what's going on with your data. Ideally, your training loss
and your validation loss should be closer together; in this case they are very far
apart, and that's something I don't really like. I think that if we take
this model over here, at the 3,000 steps, everything is going to be just
fine, but ideally I would like to have these two plots closer together, because
otherwise this could mean the model is overfitting to the training data and is not going to
perform well on unseen data. But never mind, let's just take this model, the one we saved over here
in the 3000 steps and let's see what happens so I'm just going to get back to pycharm because now
it's time to make our predictions now it's time to take the model we trained to take the checkpoint
we chose and let's just make some predictions with this checkpoint with this model so let me
show you how to do that I'm going to this file which is predict.py and this is the file we are going to
use in order to make our predictions and you may see that everything is already coded so everything
is ready and I'm just going to explain absolutely every single line of this file so you understand
exactly how it works and you understand exactly every single line of this file you can see that
the first few lines are a few Imports so I'm just importing a few functions which are important
in order to make these predictions these are a few imports from detectron2 and I'm also
importing CV2 then the first line is getting a configuration file absolutely every single time we
use detectron2 we need a configuration file we need an object which is going to contain all the
configuration for the specific task we are going to do with detectron2 in this case we are
just getting this default configuration file with a lot of default values and then we are updating
this file this default configuration with many other values which are specific to the model we use
in my case, as I used this pre-trained model, this baseline, I have to use
exactly the same one here and the only thing I'm doing is updating this default configuration
file with many other values which are specific to this model then this is very very very very
important I am setting the model.weights to the location, the path, of the checkpoint we are going
to use and if I show you my google drive remember these are all the checkpoints we generated with
this training process and as I am going to use the one we generated at the 3000 steps this
is the one I have already downloaded and it's already in my file system you can see that this
is the directory, the folder, of my python project and this is the file we are going to use:
model_0002999.pth so this is exactly the file we are going to use
and this is exactly the location of this file then as I am going to make these predictions in
my local CPU I am setting device to CPU then I am creating this object which is this predictor and
basically this is going to be the predictor we are going to use in order to make our predictions
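So the first part of predict.py boils down to something like this sketch (the checkpoint file name matches the one downloaded above; the paths are whatever your layout is):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# must be the same baseline used for training, so the architecture matches the weights
config_file = "COCO-Detection/retinanet_R_101_FPN_3x.yaml"
cfg.merge_from_file(model_zoo.get_config_file(config_file))

cfg.MODEL.RETINANET.NUM_CLASSES = 1      # one class in this dataset: alpaca
cfg.MODEL.WEIGHTS = "model_0002999.pth"  # the checkpoint chosen at ~3000 steps
cfg.MODEL.DEVICE = "cpu"                 # predictions on the local CPU

predictor = DefaultPredictor(cfg)
```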
then I am loading an image, very important, because we definitely need an image if we are going
to make predictions and this is the image I am loading let me show you this is exactly
the image of an alpaca we are going to use in order to make predictions ideally we should
be getting the location, the bounding box of this alpaca let's see what happens but this is what we
should get ideally we should get the location of this alpaca and this is the location the path to
this image, right: you can see it's data, then val, then the images directory, and then it is just this file name; if I
search, you can see this is the image we are going to use. Now let's go back here, and then the only
thing we need to do is to call predictor and we need to input the image we are going to predict
and then we are just going to get all the outputs, right, all the results. But let's just stop for
a minute and let me show you exactly what the output looks like: I'm just going to print outputs,
I'm going to comment everything else this is maybe the only coding I'm doing in
this tutorial so I'm just going to press play and you can see that this is the output we got
right so these are the predictions for this image you can see that we have many different
fields one of them is pred boxes and these are basically all the bounding boxes, all the objects
we are detecting, this is the first one and then these are all the other objects we are detecting
something like 8 different objects in this image. And this is very important: these are
all the bounding boxes, and these are the X and Y coordinates of the top left corner and these are
the X Y coordinates of the bottom right corner so these are the bounding boxes you can see that we
are also detecting the scores we are also getting information regarding the scores the confidence
values of each one of these bounding boxes for example the first one is 88.6 percent and the
last one is 5 percent so... you can see that these are all the different confidence values
and then we are also getting this information which is the class we are predicting right in
my case I'm only using one class which is alpaca and it's encoded with the number zero but
this is where you will have all the different numbers of all the different class IDs of all
the objects you are detecting. And please mind that in my case, although my dataset contains only
one class ID, because I'm only detecting alpacas, you may notice that some of these objects were
detected with a different class ID: I got a 39, a 47, a 56. I have used detectron2 many times in many
projects and this is an issue I have found in different projects, so you can see, for example, in
this case I should be getting only zeros, because I only have one class in my dataset, but
I'm also detecting other random numbers. So please, please take a look at the numbers you are getting
here, check that everything makes sense, and just make sure you are only detecting
the numbers you should be detecting and if you are getting some random numbers as I'm doing right now
just don't use those predictions: add something like an if statement, and if the number you
are getting is not within your classes, then just don't use
those predictions, something like that. Also, in my case, for example, you can see that the random
numbers the random values are some detections with a very very low confidence for example
this one is the fourth one so one two three four this one which is something like an 8
percent confidence and then this one which is a 5.8 percent and this one which is a 5.3 percent
so I guess it's most likely this is going to be an issue with those objects with a very very low
confidence value, but you never know, so please make sure the numbers you are getting make sense.
Now let's continue: I showed you the output you are getting from detectron2, now I'm just
going to uncomment everything and I'm going to continue explaining this file so you can see that
the next line it says threshold equals 0.5 and this is the detection threshold we are defining so
we are only going to consider valid all of those detections with a confidence value Which is higher
than 0.5. Now let's continue: you can see that this is basically parsing through the outputs,
extracting three fields, the pred boxes, the scores and the pred classes,
right, so the only thing I'm doing is parsing through this information and getting
these objects: pred classes, scores and bounding boxes. Then I am iterating over all the bounding
boxes, and for each one of these boxes I am getting the score, the confidence score, of that specific
detection, and the class ID, the number of the class ID I am detecting, and then, if
the confidence value is greater than the threshold then I am just getting all the values, the X Y
position of the top left corner and the bottom right corner and then I'm just drawing a rectangle
on top of my image. I'm not checking here that I am getting only zeros, but that's a very good
homework for you: I invite you to make an edit to this file, for example here, and say something
like: if the confidence score is greater than the detection threshold AND the predicted class is
within the class IDs of my class.names file, of this file over here, right, if the prediction we
are getting is within my classes, if I am getting a number which makes sense, then and only then
draw the bounding box. That's a very, very good homework for you.
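Here is a sketch of that parsing loop with the suggested homework check included; the detectron2 field names are standard, while valid_class_ids is the hypothetical set you would build from class.names (just {0} in this dataset), and img is the image loaded earlier with cv2.imread:

```python
threshold = 0.5          # detection threshold
valid_class_ids = {0}    # class IDs read from class.names; only "alpaca" here

# outputs is what the predictor returned for img
instances = outputs["instances"].to("cpu")
boxes = instances.pred_boxes.tensor.numpy()
scores = instances.scores.numpy()
classes = instances.pred_classes.numpy()

for bbox, score, class_id in zip(boxes, scores, classes):
    # keep the detection only if it is confident enough AND the class ID makes sense
    if score > threshold and int(class_id) in valid_class_ids:
        x1, y1, x2, y2 = map(int, bbox)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
```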
So, continuing: you can see that I'm drawing the bounding box and then the only thing I'm doing is plotting this
image so let's see what happens now I'm going to press play and let's see if we are detecting this
alpaca properly or not... amazing, we are detecting the alpaca perfectly. Remember, this is the
image we are using, and we are detecting the only alpaca we have in this image, and yeah, we are
drawing the bounding box and the bounding box is enclosing the alpaca very accurately,
so everything is working just fine. So this is going to be all for this tutorial; this is exactly
how you can train your object detector on your own custom data using detectron2, and this is
going to be all for today. Hey, my name is Felipe and welcome to my channel.
In this video I'm going to show you how to use Amazon Rekognition as an object detector. Amazon
Rekognition is a very interesting and very powerful tool which I have used many times in my
projects as a computer vision engineer. Now let me show you super quickly all the different
categories, all the different objects you can detect using AWS Rekognition, and you can see that
this is a very very long and a very comprehensive list of objects right for example you can detect
dinosaurs you can also detect diamonds you can detect driving licenses e-scooters and so on if
I scroll down you can see that these are many many categories and in total we have something like 290
different objects so this is definitely a lot and this is a very interesting tool because there are
many cases in many situations in many projects in which you need to detect a very specific type of
object and in some cases it may not make a lot of sense to train an entire object detector only to
detect one very specific object; in some cases it may be more convenient, easier and
much quicker to just use something like Amazon Rekognition out of the box, and you can just
detect any of the different objects in the list. For example, if we were working on a project
and we needed to detect wheels, we could either train an object detector from scratch to detect wheels,
or we could just use Amazon Rekognition out of the box, right. So this is a very interesting tool,
I have used it many times in my projects, and this is exactly what we will be doing today:
in this video I'm going to show you how to use Amazon Rekognition to detect zebras. This is a
random category, a random object I have chosen from this list, so this is exactly the object we will
be using in order to show you how to use Amazon Rekognition. Now let me show you super quickly the
video we are going to use as an example so we can use this tool and you can see that this is a
video in which we have many many many zebras we are going to use this video in order to detect
all the zebras, and in order to show you how to use Amazon Rekognition. What we're going
to do now is go to pycharm, and I'm going to show you the entire process of how to create
a project how to create all the files we need how to install the requirements I'm going to show you
absolutely every single step of this process we are going to start this project and we're going
to build this project from scratch right so the first thing I'm going to do is I have already
opened pycharm I'm going to file new project and I'm going to just create a project and I'm
going to create this project exactly here which is this folder I have over here and I'm going to
create it here where it says tutorial AWS reko this is where you are going to choose the exact
directory where you want to create your project then I'm going to create a new environment and
this is where my environment is going to be located and I'm going to create this environment
using python 3.8 now I'm going to click on create I'm going to choose this window because I'm going
to open this project over here and you can see that this is a completely and fully and absolutely
empty project the only thing we have is the virtual environment which is called env and that
is it, now the first thing I'm going to do is to install the requirements is to install the python
libraries we are going to use today. So I'm going to Settings, then Project and Python
Interpreter, and I'm going to click on this plus button over here, and then I'm just
going to type opencv-python, this is one of the libraries we're going to
use, I am going to click on install package, and then we are also going to install boto3,
and that's pretty much all, these are the two libraries we need in this project
and then I'm just going back to pycharm and now I'm going to create the first file
we are going to use in this project and I'm going to click here new python file and
then I'm going to call this file main.py. So the first thing I'm going to do for now
is to just write down the entire pipeline, the entire process we will be doing today: the first step
will be to create an AWS Rekognition client... an AWS reko client, right, this is going to be
the first step in this process then we are going to set the class set the target class we are going
to be detecting right I already mentioned we were going to detect zebras in this tutorial so this is
exactly where we are going to specify exactly what the object what's the category we are going to
be detecting then we are going to load the video right the video we are going to detect today then
we are going to read frames from the video the next step is to convert the frame to jpg this
is a very important step then we are going to convert this... we are going to get a buffer from
this conversion and we're going to convert this buffer to... to bytes right, it's going to be
the next step in this process. Then the only thing we need to do is to use Amazon Rekognition in
order to detect objects, and then we are going to write all the detections to our file
system right we are going to write everything to our disk to our local computer and this is exactly
the process in which we are going to be working today now let me show you something else I'm going
to create another file which is called credentials because in the first step in this process
we are going to create this AWS reko client, and in order to do so we are going to need a
couple of keys: we are going to need an access key, which I'm just going to set to None for now,
and we're also going to need a secret key, which I'm also going to set to None for now,
right we are going to need these two keys in order to continue with this project because we
need to use these two keys in order to create a client, an AWS rekognition client, now
let's go back to my browser and let me show you exactly how to create these two keys so
let's go back to my AWS Management console and I'm going to show you super quickly how to
create these two keys we need in this project but first obviously you need an AWS account
in order to continue right this is very very important and also you need to login into your
account once you have an account once you have created an account and you are logged into your
account you are going to see something like this this is your AWS Management console and these are
all the services which you have available in AWS right these are a lot... but in today's tutorial
we are only going to use one service only one service which is IAM so we need to type IAM
over here and we need to select this option then this is your IAM Management console
and you need to select users we are going to create a new user then you need to select add
users and we are going to choose a name for this user I'm going to say something like AWS reko
tutorial right this is the name of my user this is the user I'm going to create then you need
to select attach policies directly and we are going to search for rekognition right and I'm
going to select AmazonRekognitionFullAccess, I click here, then next, and that's pretty
much all so I'm just going to create user so the user is now created and then I'm going to
select the user over here AWS reko tutorial and then I am going to security credentials because now
it's where we are going to create the two keys we need in our project so we scroll down until
we reach this section over here, access keys, and create access key. Then you can see that
we have all these different options, and if I'm not mistaken it's pretty much the same however you
create this access key: pretty much all of these options are going to create
exactly the same keys and you can just use them from your project, if I'm not mistaken, but
we are going to use this one over here which is local code because this is the description which
fits better to our project right you plan to use this access key to enable application code in a
local development environment to access your AWS account if I'm not mistaken it's pretty much the
same if we use any other option but let's just use the option which fits better with our use case
and now you can see we have a warning over here, which is: 'Alternative recommended: use an integrated
development environment (IDE) which supports the AWS Toolkit, enabling authentication through
IAM Identity Center'. And this is important, because this is a warning we get from AWS meaning
there is a better, more secure way to create these keys and to access this
service, but in this tutorial we are not going to mind this warning, because it would involve
creating a solution which is only useful for one very specific IDE, right; in my case I'm using
pycharm and if I follow these instructions I would be using a solution which is only useful for
pycharm right and I want to make this tutorial as generic as possible and I want you to use it as
well so in case you're using a different IDE let's just create these access keys in a different
way right the only thing I'm going to do is I'm going to select this checkbox over here I
understand the above recommendation and I want to proceed to create an access key and I'm going
to click next right I'm going to show you a very very generic way to do it which is going to work
for whatever your IDE is right if you use pycharm or if you use visual studio and so on so I'm not
going to type anything here, so just create access key, and these are our access keys. Something that's
very, very important: access keys are personal and you should never disclose them to
anyone in any situation, right, so you should never do something like I'm doing right now,
just making a video with my access keys completely visible to anyone watching this tutorial. Never
do something like this; in my case it's not really that important because I'm just going to
delete these keys once this tutorial is over, but please be super, super mindful and super careful
about who has access to your access keys, to your private access keys, because this is very, very
sensitive information. So the only thing I'm going to do for now is to copy these two fields:
I'm going to start with this one which is access key I'm going to copy this field and I'm going
to get back to pycharm I'm going to my file to credentials.py and the only thing I'm going to
do is to paste the access key over here right then let's get back to this page and I'm just
going to copy the secret access key and I'm going to head back to pycharm I'm just going to
paste the secret key and that's pretty much all so these are the two keys you need in this project
and now we can continue with the main.py file and we can just start coding our entire pipeline
so let's get started and the first thing I'm going to do is to import boto3 and let's
import opencv as well so we can just focus on everything else right so I have
imported the two libraries we have installed in this project and now let's get started by
creating this AWS reko client, and this is how we're going to do it: I'm going to call this client
reko_client, and this is something like boto3.client, where I need to input 'rekognition',
and then this is where we input the access keys. So we're going to have two
keyword arguments, one of them is aws_access_key_id and the other one is aws_secret_access_key,
and now the only thing I need to do is to import credentials so I can use
these two variables: the first one will be credentials.access_key and
the other one will be credentials.secret_key, and that's pretty much all.
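In code, the client creation looks something like this; the region is an assumption, since boto3 needs one configured somewhere (pick whichever region you use):

```python
import boto3

import credentials  # the file holding access_key and secret_key

reko_client = boto3.client(
    'rekognition',
    aws_access_key_id=credentials.access_key,
    aws_secret_access_key=credentials.secret_key,
    region_name='us-east-1',  # assumption: use your own region
)
```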
So now let's continue, and we're going to set the target class: I'm going to create a variable
which is target_class, and this is where I define the class we are going to be detecting today; as I
already told you, we are going to be detecting zebras. So now let's continue, now it's time to load the
video and what I'm going to do is to go to my directory where I have the video and I'm going
to copy and paste this video to my directory where I have created this pycharm project so now
the video is located in this pycharm project and it's called zebras.mp4 so let's go back to
pycharm so now let's call... let's load this video exactly like this I'm going to call CV2
video capture and then this will be zebras dot MP4 and this will be cap okay now let's read frames
from the video so I'm going to define a variable which is ret I'm going to initialize it as true
and then while ret I'm going to read frames from the video like this ret frame equal to cap
dot read, right, so we are reading frames from the video. And now let's convert this frame to
jpg, and this is how we are going to do it: I'm going to call CV2 imencode, if I'm not mistaken, then
this will be '.jpg' and then the frame, and this is going to return two variables; one of them we
are not going to use, so it doesn't matter, and the other one is a buffer. Okay, now let's convert
the buffer to bytes, and I'm going to do it like this: let's call this something like image_bytes, and
this will be buffer to bytes, if I'm not mistaken, something like this; I'm not sure about this
character, so I'm just going to execute this file, for only one frame, so we make
sure everything's okay, and let's see what happens. Okay, I got an error and it says something like
'could not find encoder for the specified extension' in function imencode; let's see if
I have a character missing, I think it's '.jpg', let's see now... now I have an error which is 'object
has no attribute', so I'm almost sure that this is tobytes, without the underscore; let's see now...
and now everything is okay. Okay, so I'm just going to remove this break, and now comes the most fun
part of this tutorial, because now it's time to use Amazon Rekognition to detect objects
in this video. So this is exactly how we are going to do it: I'm going to call the client we have
just created, reko_client, and I'm going to call detect_labels; I'm going to
input the image we have just created, this image_bytes, and this will be something like Image, I'm
going to open a dictionary, and this will be Bytes and then image_bytes, and that's pretty much all.
And now I'm going to set the minimum confidence value for which we are
going to detect objects: we are going to set this value to 50, and I think this
is with a capital M, MinConfidence, and this means that we are only going to detect objects if the
confidence value is greater than 50%; we are going to filter out all the detections with a confidence
value lower than 50 percent, that's exactly what it means. And this will be something like response.
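Putting the encoding and the call together, this step looks roughly like the sketch below (reko_client and cap are the objects created earlier; the detect_labels parameters are the standard Rekognition API):

```python
import cv2

ret, frame = cap.read()                    # read one frame from the video

# convert the raw frame to an in-memory jpg, then to bytes
_, buffer = cv2.imencode('.jpg', frame)
image_bytes = buffer.tobytes()

response = reko_client.detect_labels(
    Image={'Bytes': image_bytes},
    MinConfidence=50,                      # drop detections below 50% confidence
)
```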
And now I'm just going to iterate: for label in response Labels, right, I'm going to iterate over all the results
we got. So this is how I'm going to do it... if label Name equals our target class, right, so
if the object we have detected is a zebra, then we are going to iterate:
for instance_number in range(len(label Instances)), and if I'm not mistaken
this is with a capital I. So we are going to iterate over all the zebras
we have detected, and now let's continue, now let's get the bounding box we have
detected in each one of these objects, so this will
be something like label Instances, instance_number, and then
we need to call BoundingBox. Let's execute the code so far to make sure
everything is okay, and let's just do it for only one frame, so
I'm going to break the loop here... Labels, right, because this is
with a capital L most likely... okay, everything's just fine, so I'm going
to delete the break and I'm going to get back here and let's continue so
now I'm going to unwrap all the information in the bounding box, and this is
something like: X1 is equal to bounding box Left, and I'm going to cast it to int,
okay; then y1 is equal to int of bounding box Top; then the width of this bounding
box is equal to bounding box Width, with a capital W if I am not
mistaken; and then the height is equal to int of bounding box Height. And let's see what happens if we
just print these values, so I'm going to print x1, y1, width and height, and I'm also going to remove the int for now,
so I can show you something, and then I'm going to add
the int again. I'm just going to execute
this as it is... okay, you can see these are the values we are getting, and
this is why I removed the int, because otherwise everything
would be a zero or a one: these values are in relative
coordinates, and this is very, very important. So what we need to do now is to multiply these values
by the width and the height of the frame we are reading, so I'm going to define two
new variables, H and W, which are the height and the width of every frame, and this will
be frame dot shape. And now let's continue by doing something like this: x1 will be bounding box
Left multiplied by the width of the image, then y1 will be exactly the same but times H,
then the width is times W and the height is times H, and now I'm going to cast everything to int.
Then let's print the values for x1, y1, width and height again, and let's see what happens. Okay,
now you can see that we are getting integers and everything seems to be okay; we are detecting
objects, so everything is fine. The next step of this pipeline is to write the detections, but
before we do so, let's make sure everything is working properly and visualize some of the frames
with all the bounding boxes we are detecting on top, and let's see what happens. So I'm going to
call cv2 dot rectangle, I'm going to input the frame, then x1, y1, and then x1 plus width and y1 plus
height; then I need to input the color, if I'm not mistaken, which is going to be green, and then
the thickness of the rectangle, which will be three for now. And then let's see what happens:
I'm going to visualize this frame by calling imshow and CV2 waitKey. Okay, so we are plotting
a bounding box on top of absolutely every single frame we are plotting a bounding box for each
one of our objects and let's see what happens I'm just going to execute this file and let's
see if we are detecting all of our zebras and everything seems to be working just fine right if
I just press a key, you can see that we are stepping through all the frames. This is not running in
real time, because obviously we are detecting many, many zebras and we are plotting a rectangle,
a bounding box, for each one of these zebras; but you can see that
nevertheless this is working just fine. So the only thing we need to do now is to take all these
detections and we need to write these detections to our file system to our computer so this is how
we are going to do I'm going to remove all the plotting because we are not going to do it anymore
and now let's just write the detections and in order to do so I'm going to create a new variable...
with the output directory with the location of the output directory which is
where we are going to save all these detections so I'm going to Define this variable like output
dir, and this will be on my local computer, in a directory called data. So let's go back
to the directory of this pycharm project and create a new directory called data;
I'm going to press enter and that is it. Now let's save all these detections into the YOLO format,
so I'm going to create another directory, imgs: I'm going to create another variable for
the images directory, which will be something like output_dir_imgs, and this is os.path.join of output
dir and imgs, right, and I'm going to import OS. Then I'm going to create another variable
for the annotations for the detections I'm going to call this other variable anns...output dir anns
and this will be something like this and now I'm going back to my local computer to my
file system and within this data directory I'm going to create two additional directories one of
them for the frames for the images which I'm just going to call imgs exactly how I have called
this variable over here and then I'm going to create another folder which is called anns right
exactly as I have called this other variable over here so now everything is set everything is
ready we have just created the directories where we are going to save all the data now let's
get back over here and the only thing we need to do is I'm going to do something like with
open... I'm going to do it here, before we start this iteration, it's going to be much, much better if
we do it here: for every single one of these frames we are going to open a text file, and the path
name will be something like os.path.join of the output annotations directory and the file name,
which I'm going to call frame_{}.txt, where I'm going to input the frame
number with format; frame number hasn't been defined yet, so I'm going to define it in a second, but
let's just say str(frame_number).zfill(6). Now let me explain this in a
couple of seconds, but for now let's just get here: I'm going to define a new variable, frame_number,
initialize it as -1, and then increment it
for every single frame we read over here. Okay, so we start at -1, we increment
this variable here, and then for absolutely every single image we are creating this file name,
which is frame and then this number, padded with zeros to six digits, so we make sure all the
file names are the same length. That's actually more for formatting reasons, it's not 100% needed, but
it's going to make everything look much, much nicer. So now let's just continue, and I'm
going to open this... as write and then as f and then that's pretty much all okay and now
for each one of our detections the only thing we need to do is to write these detections and
this is how we are going to do it: f.write, and we are going to write five numbers, remember we
are going to do it in the YOLO format, so we need five numbers. The
first one of these numbers will be a zero, because we will be detecting only one object, which is
zebra, so this will always be the number zero; then it's the X and the Y coordinates of the
center of this bounding box, so this is something like x1 plus width, divided by 2, and then
exactly the same for the y coordinate, y1 plus height, divided by two; and then it's the
width and then the height. I see there's an issue here, okay, a parenthesis missing, let's
see now... okay, perfect. And remember we were converting all these values into integers,
but if we are going to save the annotations in the YOLO format we don't really need that
conversion, so I'm just going to delete the int and this multiplication, something like this,
because remember how the YOLO format works: we need relative coordinates, so
values like this will be just fine, and that's pretty much all. Okay, so we are writing
all the detections and once we have written all the detections the only thing we need to do
is to close the file and that's pretty much all and let's save the images as well let's just
prepare this dataset as if it were a data set in the yolo format so we can just take this
dataset and we could potentially train a model we could train an object detector with the data
we are going to be saving, and in order to do so we need to save the detections but we also need to
save the images so I'm going to save the images over here we can just do it after we save
all the detections: we can call cv2 imwrite, then the file location, which will be pretty
similar to the one for the detections, but we are going to change txt to jpg, and we also
need to change the directory, which will be imgs, okay, and then we need to
input the frame... and that is all. Okay, so let's see now if everything is okay, let's just run it
for only one image and let's see what happens everything is just fine and if I go to my local
directory I open anns you can see I have a file with many many detections which makes sense because
we have many many zebras and then if I go to the images directory you can see I have a frame
the first frame from the video so everything seems to be just fine so the only thing we need
to do now is to execute exactly the same process but for absolutely all the frames so I'm going to
remove this break and then let's see what happens. Okay, I see I got an error, because we
should be doing everything else only if we have actually read a frame, right, so this is a very
small mistake. And also, while I was waiting for the execution to be completed, I realized another
mistake: we should be dividing only the width and only the height by two, since these are the
X and Y coordinates of the center of the bounding box. Everything should be okay now.
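With both fixes in, the writing step looks roughly like this sketch; the directory variables and the exact file-name pattern are reconstructions of what is typed in the video:

```python
import os

frame_name = 'frame_{}'.format(str(frame_number).zfill(6))  # zero-padded name

# one annotation file per frame, YOLO format:
# "class x_center y_center width height", all in relative coordinates
with open(os.path.join(output_dir_anns, frame_name + '.txt'), 'w') as f:
    for label in response['Labels']:
        if label['Name'] == target_class:
            for instance in label['Instances']:
                bbox = instance['BoundingBox']
                x1, y1 = bbox['Left'], bbox['Top']
                width, height = bbox['Width'], bbox['Height']
                # only width/height are halved: that gives the box center
                f.write('0 {} {} {} {}\n'.format(x1 + width / 2,
                                                 y1 + height / 2,
                                                 width, height))

# save the matching frame so the pair forms a YOLO-style dataset
cv2.imwrite(os.path.join(output_dir_imgs, frame_name + '.jpg'), frame)
```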
So, in order to be 100% sure everything is okay, I'm just going to execute this file again. Now the
execution has been completed and we don't have any errors, so everything is just fine, and if I
go to the images directory, you can see I have 755 images (the last one is numbered 754, but since
we are starting from zero that makes 755 files), and these are the images of our zebras, right,
these are all the frames from the video. And then if I go to the annotations directory, you can see
I have all my annotations, and I also have 755 files, numbered up to 754 but starting from zero, so 755 again,
so everything is working just fine. So this is exactly how you can use Amazon Rekognition
as an object detector, this is exactly how you can detect objects using Amazon Rekognition, and
that's going to be all for this tutorial. In this video we're going to work with
automatic number plate recognition and this is exactly what you will be able to do with this
tutorial you can see that we are detecting all the license plates in this video and we're also
reading the text from these license plates, using 100% Python. We're going to use an object
detector based on YOLOv8, we are going to do object tracking, and we are going
to read the text from the license plates using EasyOCR,
so this will be an amazing tutorial my name is Felipe welcome to my
channel and now let's get started so let's get started with this tutorial today we are going to
work with automatic number plate recognition and let me show you a few resources a few repositories
which are going to be super, super useful for today's tutorial. The first one is YOLOv8, because
we are going to be detecting license plates and then we're going to be reading the text from
those license plates, and in order to detect our license plates we are going to use an object
detector based on YOLOv8, so YOLOv8 is going to be super important in today's
tutorial; I'm going to show you more details in a few minutes. But for now let me show you the
other repository which we are also going to use in this tutorial, and it's going to be super
important too: it's an object tracking algorithm called SORT, because today we're
going to do object detection and we're also going to do object tracking, this is going to be an
amazing tutorial, and in order to do object tracking we are going to use SORT. And then, once we have detected
the license plates, once we have implemented all the object tracking, once we have done everything
we need to do, we are going to read the content of the license plates using EasyOCR, so this
is a python Library which is going to be super super super important in this tutorial and now let
me show you the data we are going to use in this tutorial let me show you the video we are going to
use in order to test the automatic license plate recognition software we are going to use in this
tutorial you can see that this is a video of a highway and we have many many cars which are going
through this highway and the important thing about this video is that all the cars... we have like a
very very frontal view of absolutely all the cars and most importantly we have a very frontal view
of all the license plates right you can see that for absolutely every license plate we detect in
this video we have a very very very frontal view and this is an ideal point of view to build a
project like this so this is exactly the video we are going to use in this project and now let me
show you something else if I go to Google and I search for license plate and I go to images let me
show you something you can see that we have a lot of diversity when it comes to license plates right
we have many different types of license plates: we have some license plates which are composed only
of numbers, like this one, then we have other license plates which are only letters, like these
two, and we have many, many different examples, many different types, many different formats.
I would say that absolutely every single country, every single state, every
single period in history has its own license plate format, its own license plate style,
its own license plate system; there are many, many different types of license plates, there's a
lot of diversity when it comes to license plates, and obviously it's very, very challenging
to build automatic license plate recognition software that deals with absolutely every single type
of license plate. I'm not going to say it's impossible, it's not impossible, but it's a
very, very challenging task, so in order to simplify our problem we
are going to focus only on one very specific type of license plate which is this one we are going
to be working with the United Kingdom license plate system, with the United Kingdom license plate
format, which is comprised of seven characters the first two characters are letters then we have two
numbers and then we have three more letters right so we have two letters two numbers and three
letters and this is the exact structure of the license plate type we are going to be working
today in this tutorial right, this is the exact same type we are going to be detecting with the
software we are going to build in this tutorial but today I'm going to show you a very generic
process and a very generic pipeline so by making some adjustments into the code we are going to be
making today you will be able to apply the same process to other types of license plates right we
are going to work with this type in this project but you will be able to make some adjustments in
everything we're going to be doing today so you will be able to apply the same process to other
types of license plates right so that's something I'm going to show you better in a few minutes
but for now let's continue now let me show you something else, when we were starting this tutorial
I showed you that we were going to use an object detector based on yolo V8 to detect license plates
now let me show you the data I used in order to train this license plate detector right this is
exactly the data set I used in order to train this detector, and I'm going to give you a link to this
dataset in the description of this video, and if you want to know exactly how I trained this object
detector I invite you to take a look at one of my previous videos where I show you how I train an
object detector using YOLOv8; that video is a step-by-step guide on how to train an object
detector using YOLOv8, and that's exactly the same process I followed when I was creating this
license plate detector so this is the data I used and if you want to know exactly how I trained that
object detector then just take a look at the video I'm going to be posting over there right so now let's
continue I already showed you all the resources we were going to use in this tutorial I already
showed you the type of license plate we are going to be detecting today and now it's time to go to
pycharm so we can start implementing all the code of today's tutorial, and now let's go to pycharm
let's go to this pycharm project and let me show you some files I have over here, you can see I have
many many different files and for now let's just focus on these two: main.py and util.py. main.py
is the file in which we are going to be coding the entire pipeline of this tutorial right you
can see that this is a sequence of steps which we are going to follow in order to build this
automatic license plate recognition software you can see that the first step is loading
the models then loading the video then we're going to read frames and so on this is the entire
pipeline the entire process we are going to be building today and then we have this other file
which is util.py, in this utils file we have five functions let me show you these are the
functions we have defined over here and from all of these functions we are going to focus
on these two which are read license plate and get car, these functions... if I open these functions
you can see that they are completely empty right we need to implement these functions in this
video, and then the other three functions are already implemented, right, everything is ready
and we're just going to use them. The idea is to focus on these two functions over here because
they are way more important from a computer vision point of view, so these are
the functions we are going to focus on the most, and this is the util.py file. Now if I go back to main.py,
now it's time we start with this process now it's time we start with this Pipeline and in order to
do so we are going to start importing YOLO so I'm going to say from ultralytics import YOLO and then
we are going to load the models that's the first step in this process and the interesting part is
that we are going to have two models because we are going to be detecting license plates but we
are also going to be detecting cars that's going to be a very important part in this process so
I'm going to be loading two models I'm going to call the first one of these two models coco model
because this is a model which was trained on the COCO dataset, and this is going to be YOLO with a
pre-trained YOLOv8 model, which is yolov8n.pt, the YOLOv8 nano. We are just going to call this
pre-trained model, and this is the model we're going to use in order to detect
cars. It's very important that we detect cars: I know we are going to detect license plates and we are
going to read license plates, but detecting cars is going to be super, super important, and
you're going to see exactly why in a few minutes. Then we're also going to load the license plate
detector: we're going to call it license_plate_detector, and this is going to be YOLO,
and we need to input the path to this license plate detector, which is
located in a directory called models and is called license_plate_detector.pt, so I'm
just going to point at models, okay.
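So the model-loading step is just two lines; yolov8n.pt is downloaded automatically by ultralytics, and the path of the custom detector is whatever matches your models directory:

```python
from ultralytics import YOLO

coco_model = YOLO('yolov8n.pt')  # COCO-pretrained YOLOv8 nano: detects cars, buses, trucks, ...
license_plate_detector = YOLO('./models/license_plate_detector.pt')  # custom detector (path is an assumption)
```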
Now it's time to load the video we are going to use today, and in order to do so I'm going to
import CV2 and call CV2 VideoCapture, and I'm going to input the video location, which is the
current directory and it's called sample.mp4, and this is going to be cap. Okay, now
we are going to read frames from the video so I'm going to define a variable which is
ret I'm going to initialize it as true and then while ret I'm going to read frames from the
video like this ret frame equal to cap dot read if ret then I am going to continue okay and this is
going to be pretty much all for now so we are reading frames from the video and now it's time to
Now it's time to continue detecting all the vehicles. We are going to be detecting all the cars, and in order to do so, this is where we are going to use the first model, which is the model trained on the COCO dataset. So we are going to do something like this: I'm going to call coco_model, and I'm going to input the frame, and this is going to be results; I need to access the first element, and I'm going to call this object detections. In order to move one step at a time, I'm going to print detections, and I'm only going to execute the first 10 frames, otherwise this is going to take a lot of time; so: and frame_nmr less than 10. And obviously I need to define a variable which is frame_nmr; I'm going to initialize it at -1 and then I'm just going to increment it here. Okay, and I don't really need the pass anymore,
and let's see what happens if I print detections okay so everything seems to be working just fine
this is all we got and you can see that this is a lot of information these are all of our detections
so everything seems to be working just fine so what I'm going to do now is we are going
to iterate: for detection in detections.boxes.data.tolist(), and let's print detection again so we know exactly how this looks
and we know how to access all the information okay so this is how each one of our detections
looks like right you can see that we have one two three four five six numbers and the
way this works this is going to be something like X1 Y1 X2 Y2 then we will have the score
and then we will have the class ID right this is detection so remember we are using a
model which was trained on the coco dataset so we are detecting many many different
objects right this is the class ID this is exactly the type of object we are detecting at
every single time at every single one of these detections so this is very important and then
we have the confidence value right this is how confident our object detector is of this specific
detection and then this is the bounding box right so we have X1 Y1 X2 Y2 the bounding box then
the confidence score and then the class ID and something that's very very important we are
doing all of this in order to detect vehicles. As this model which was trained on the COCO dataset is detecting many different objects, we are going to say something like this: if int(class_id) in vehicles, then we are going to continue. And vehicles is a
variable which we haven't defined, and we are going to define it with the class IDs of all the vehicles in the COCO dataset. This is a list of all the objects which we can detect
using this model right you can see that these are a lot of objects and some of these objects
are related to vehicles and some other objects are not for example you can see we have person
bicycle car motorbike airplane bus train truck and so on right so from all this very very
long and very comprehensive list we are going to make sure we are detecting a vehicle so we are
going to say if the class ID we are detecting is either a car or a motorbike or a bus or a truck
then we are going to continue, and if not we are going to neglect the bounding box, the detection we just got. The class IDs we are interested in are 2 for car, 3 for motorbike, 5 for bus and 7 for truck, so vehicles is going to be [2, 3, 5, 7]. We don't
really have any motorbike in this video I know for sure because I already watched the video but
nevertheless in order to make this more generic I'm just going to add a motorbike as well so
if our class ID is within our vehicles then we are going to continue and I'm going to
create another variable which is detections_ and this is where I'm going to save
all the bounding boxes of all the vehicles we are going to detect in this video so I'm going
to do something like this if we have detected a vehicle then I'm going to append the bounding
box and the confidence score to this new variable and please mind that I'm not saving the class ID
from now on it's not really that important; we are not really going to care about the specific class ID we have detected. From now on, the only thing we care about regarding our detections is that they are vehicles, and we don't really care to know exactly what type of vehicle. So this is the new variable with which I'm going to be working from now on in this process.
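Here is a sketch of this detection-and-filtering step as it sits inside the frame loop; it assumes the YOLOv8 results API (results[0].boxes.data.tolist()) and the standard COCO class indices mentioned above:

```python
# COCO class IDs: 2 car, 3 motorbike, 5 bus, 7 truck
vehicles = [2, 3, 5, 7]

# detect all objects in the current frame with the COCO model
detections = coco_model(frame)[0]
detections_ = []
for detection in detections.boxes.data.tolist():
    x1, y1, x2, y2, score, class_id = detection
    if int(class_id) in vehicles:
        # keep only the bounding box and the confidence score;
        # from now on the specific vehicle class no longer matters
        detections_.append([x1, y1, x2, y2, score])
```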
Now let's continue, and now it's the time in which we are going to implement the object tracking. Remember we were going to work with object tracking in this tutorial, and now it's the time where we are going to implement this tracking functionality into this project. Before we do so, let me give you a very quick explanation regarding why exactly
we are going to implement this object tracking and basically every time we solve a problem every
time we solve not only a computer vision problem but any type of problem you need to use absolutely
all the information you have available regarding that problem and in this case we are going to be
tracking license plates which are moving through a video right we are going to be detecting license
plates on individual frames and these license plates are objects which are moving through a
video so if we are able to track this license plate through all this video we will have more
information and this additional information is going to be super valuable in order to build a
more robust solution so that's pretty much the reason why we are going to implement this object
tracking and we are going to be tracking... we're not going to be tracking the license plates
themselves but we are going to be tracking the cars, the vehicles, and I'm going to show you exactly
why later on so this is what we are going to do we are going to work with this repository remember
I showed you this repository when we were starting with this tutorial and the first thing you should
do is cloning this repository into your local drive into your local directory you need to clone
this repository into the directory into the root directory of your pycharm project so in my case
this is the root directory of my pycharm project this is where I have all my Python scripts and
this is where I have all my files related to this project and you can clone this repository in one
of these two ways let me show you one of the ways is opening a terminal and typing something like
git clone and the repository URL so I'm going to click here I'm going to copy the repository URL
and then I'm going to paste the repository URL here and then the only thing you will need to do
is to press enter right and that's exactly how you can clone this repository into your local computer
but there is another way in which you can do it and actually this is a much more simple way and
maybe you prefer to do it like this which is just downloading the entire repository as a zip file
and once you have downloaded this ZIP file, the only thing you need to do is to copy and paste this sort-master directory into your local directory; just drag and drop this directory into your local computer, and that's it. Please mind that this directory is called sort-master, but you will need to edit the name,
you will need to rename this directory into sort right you can see here in my computer
this is my directory this is called sort if I open this directory you can see these are all
the files which are in this repository. So basically, remember to rename this directory into sort; it's going to be called sort-master, but you need to rename it into sort. That's very important, otherwise you will possibly have some issues with the next
steps in this tutorial so let's go back to pycharm this is the repository you need to clone
into your local directory and remember to call the directory containing this repository remember
to call this directory sort. Now let's get back to pycharm, and what I'm going to do now is just importing sort: from sort.sort I'm going to import everything; we are going
to import absolutely everything from this library and then I'm going to call an object I'm
going to create a new object which is called mot_tracker, and this is going to be equal to Sort(),
right this is the object tracker we are going to use in order to track all the vehicles
in our video. Now let's get back here, and what I'm going to do now is just calling mot_tracker.update, and I'm going to input a numpy array of this list we have created containing all the vehicles in our video, and this is going to be something like track_ids. So track_ids is going to contain all the bounding boxes of all the vehicles we have detected in this frame, but with the tracking information: it's going to add an additional field which is going to be the car ID, the vehicle ID, for each one of the cars we are going to detect, and this vehicle ID or this car ID is going to represent that specific car through all the video.
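As a sketch, the tracking step looks something like this; it assumes the Sort class provided by the repository you just cloned into the sort directory:

```python
import numpy as np
from sort.sort import *  # provides the Sort tracker class

# create the tracker once, before the frame loop
mot_tracker = Sort()

# inside the frame loop: update the tracker with this frame's vehicle
# detections; each returned row is [x1, y1, x2, y2, car_id]
track_ids = mot_tracker.update(np.asarray(detections_))
```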
Let's continue. Now we are tracking all of our objects, all of our cars, and now it's the time to detect the license plates. So far, the only thing we have detected is
the cars in the video but now it's the time to detect the license plates in order to do so we are
going to use this detector over here which is license plate detector and we're going to do it
exactly the same way as we have detected the cars right I'm just going to copy and paste
this sentence and I'm going to replace coco_model by license_plate_detector, and this
way we are going to be detecting all the license plates. I'm going to call this object license_plates, and then I'm going to iterate over all the license plates we detected within this frame, and in order to do so I'm going to call for license_plate in license_plates.boxes.data.tolist(), and that's pretty much all. Then let's unwrap all the information we got from this license plate exactly as we did before, so this is going to be something like x1, y1, x2, y2, score and class_id; this is going to be license_plate.
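In code, this step mirrors the vehicle detection above; a minimal sketch:

```python
# detect license plates in the current frame
license_plates = license_plate_detector(frame)[0]
for license_plate in license_plates.boxes.data.tolist():
    x1, y1, x2, y2, score, class_id = license_plate
```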
Okay, then we will need to assign each license plate to a given car, because we have detected all the cars in every frame and all
the license plates in every frame but so far we have cars and we have license plates and we
don't really know which license plates belong to which car right and we know for sure that every
single license plate will be on one of our cars but we don't really know which one goes with
which one right so now in this step is where we are going to assign a car to absolutely every
single one of our license plates right and in order to do so we are going to use one of the
functions in our util.py file we are going to use this function which is get car this function
receives a license plate and this object we have over here with all the tracking information for all the cars in that specific frame, and it returns a tuple containing the vehicle coordinates and its ID. So we are going to call this function
get car and this function is going to return the car this license plate belongs to right this
is what we're going to do I'm going to import from util import get_car
and now I'm going to call get_car I'm going to input the license plate and
I'm going to input this object which is track IDs remember this object contains all
the bounding boxes and also all the tracking related information right that's very important
and the return will be the coordinates of the car this license plate belongs to, so it's going to be something like xcar1, ycar1, xcar2, ycar2, and then the car ID for this car. Remember, every single car in our video will have a unique ID which is going to identify the car through all the frames in the video; that's very important.
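A sketch of the call; the five return values follow the docstring described above:

```python
from util import get_car

# assign the detected license plate to one of the tracked vehicles
xcar1, ycar1, xcar2, ycar2, car_id = get_car(license_plate, track_ids)
```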
Also, please mind that this function is completely empty for now; it is only returning some dummy values, and this is exactly what we will need to implement
in the next step in this project right once we are completely ready once we have completed this
pipeline, then at the end of this process we are going back here to the util.py file, and we're going to implement this function. So
now we have assigned the license plate to a very specific car now we know what's the car this license
plate belongs to and now we can continue with the next step which is cropping the license plate
and this is how we're going to do it: we are going to index frame with the license plate coordinates, which is int(y1) to int(y2) and then int(x1) to int(x2), and this is the license plate crop. That's pretty much all we need to do in this
step of this process and now let's continue to the next step which is processing this license
plate right now we are going to apply some image processing filters to this crop we have over
here in order to further process this image so we improve this image so it's much simpler
for the OCR technology for easyocr to read the content from the license plate now it's time to
apply some image processing filters to this crop and specifically the filters we are going
to apply are a grayscale conversion and then we are going to apply a threshold so let's see
how we can do that. I'm going to call cv2.cvtColor, I'm going to input the license plate crop, and then I'm going to call cv2.COLOR_BGR2GRAY, and this is going to be license_plate_crop_gray. Now we have converted the license plate crop into a grayscale image, and the only thing we need to do is to call cv2.threshold. We are going to input this grayscale image, then the threshold, which I'm going to set at 64, and then the value, 255, at which we are going to take all the pixels which are lower than the given threshold. I say the pixels which are lower than the threshold because we are going to use the inverse threshold, the THRESH_BINARY_INV type of threshold, and this type of threshold is going to take all the pixels which are lower than 64 and set them to 255, and all the pixels which are higher than 64 it is going to set to zero. That's exactly how this threshold works,
and if you want more details on how this function works, I invite you to take a look at one of my
previous videos where I show you an entire course of opencv with python and one of the lessons in
this course is exactly about thresholding right it's exactly about this function so I'm going
to be posting a link to this course somewhere in this video so you are welcome to take a look
at this course and this lesson particularly to get more details on how thresholding works now
let's continue. The first return value is going to be a variable which we are not going to use in this tutorial, so it doesn't really matter, and then the output I'm going to call license_plate_crop_thresh. So this is going to be the thresholded image, and it's exactly the image we are going to input into our OCR technology, into our easyocr algorithm.
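Putting the cropping and the two filters together, this step is roughly the following sketch (the variable names are the ones used in this video):

```python
# crop the license plate out of the frame
license_plate_crop = frame[int(y1):int(y2), int(x1):int(x2), :]

# grayscale conversion
license_plate_crop_gray = cv2.cvtColor(license_plate_crop, cv2.COLOR_BGR2GRAY)

# inverse binary threshold: pixels below 64 become 255, the rest become 0
_, license_plate_crop_thresh = cv2.threshold(
    license_plate_crop_gray, 64, 255, cv2.THRESH_BINARY_INV)
```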
In order to be more clear about the difference between these two images, I am going to visualize them super
super quickly so you see exactly how they look like I'm going to call imshow and I'm going
to input this image which is license plate crop I'm going to call this window crop I'm
going to call it original Crop so it's more clear this is the image we are cropping
from the frame and then I'm going to call cv2 imshow again and in this case I'm
going to be plotting the threshold and I'm going to input this other variable and then the only
thing I'm going to do is to call CV2 wait key and let's take a look at these two images super
super quickly so you see exactly how they look like and this is what we got and you can see
that this is the frame this is the crop we are making from the frame so this is the license
plate and this is exactly how we are cropping this license plate from the frame and this is
the thresholded image right you can see that in this image absolutely every single Pixel is
either white or black, and this type of image, this thresholded image, will make it much simpler for easyocr, our OCR technology, to read the content from this image. This is the image we are going to use in order to read the license plate number, because this is going to make it much simpler for our OCR to read the
license plate so that was like a very very quick way to show you how these two images look like
and now let's continue now it's the time to read the license plate number we are almost there we
have almost completed this process, and this is how we're going to do it: now we're going to call
another function which is defined in util.py and this function is read license plate and you
can see that this function is not implemented either this function is completely empty we are
returning some dummy values and this is another function which we are going to implement later
on we are going to implement after we are happy with this process once we are completely and
absolutely happy with this pipeline then we are going to move to util.py and we are going
to implement this function as well. But for now we're just going to use this function, so I'm going to import it as well: read_license_plate, something like this. And now let's see how we can use this function. I'm going to call
read_license_plate, and this is going to return two values. Let's look at the function
documentation to see exactly what are the values which are going to be returned here... we are going
to... it is going to return a tuple containing the formatted license plate text and its confidence
score. So this is going to be something like license_plate_text and then license_plate_text_score; these are the two values we are going to be getting from here, and the input should be the license plate crop. In our case we are going to input the thresholded crop, this thresholded version of our crop, and that's pretty much all.
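A minimal sketch of the call, following that docstring:

```python
from util import read_license_plate

# read the text on the thresholded license plate crop
license_plate_text, license_plate_text_score = read_license_plate(
    license_plate_crop_thresh)
```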
Remember, we are just completing the pipeline, the most generic process; then we are going to get back here in order to implement this function and this other function. And now let's continue: now the only thing we
need to do is to write the results we are almost there we have almost completed this process and
now obviously if we want to take these results and we want to visualize these results or if we want
to analyze these results whatever thing we want to do with these results we obviously need to write
these results to our local computer. So this is how we are going to do it: in order to write these results
we are going to use another function which is also defined in this util.py file and it's called write csv
and this function is implemented this function is 100% and fully implemented you can see that
this is all the code we have for this function and everything is just ready and we can just use
this function as it is remember in this tutorial and in basically all my tutorials we always focus
on the computer vision part of the problems so writing this csv file is not really that important
from a computer vision point of view so that's why we are not really going to implement this function
live in this video but this is already implemented and we're just going to use it so let's see what
this function does and it says write the results to a CSV file and it receives two arguments which
are the results which is a dictionary containing the results and then it also receives a path to
the CSV file we are going to produce and this is going to be the path in which we are going to
write this CSV file right it's the path in which we are going to save the CSV file we are going to
produce so if we are going to input a dictionary then we need to produce a dictionary in order to
input into this function; we need to take all of our information and we need to put all of
this information into a dictionary right that's very very important so that's what we are going
to do now, because for now the only thing we have done is just computing all the information, but
we have not saved this information into any type of dictionary or anything like that so I'm going
to create a new variable which is called results, and results is going to be a dictionary, and
then this is where I'm going to save all the information and this is how we are going to
do the first key in this dictionary will be the frame number right we are going to save all the
information and we are going to start with the frame number we are going to have a different
key for absolutely every single frame in our video and then for absolutely every single frame
we are going to save all the information which is related to all the cars we are detecting
and most importantly to all the license plates right so then I'm going back to the end of this
pipeline here and I'm going to say something like... I'm going to make a very quick edit first which
is going back to this function and instead of returning two None I'm going to be returning two
zeros right because we are going to reserve this other output we are going to reserve the None, None
output for those times in which we are going to find an error or we are going to have any type
of issues reading the license plate and this is going to be much more clear later on once we are
implementing this function but for now just bear with me that it's much more convenient to return
some dummy values which are different than None so let's get back here and this is where we're
going to say if license plate text is not None we are going to save all the information about
this license plate in this dictionary we have just created so we are going to take this variable
over here which is results for that specific frame number and we're going to create a new entry with
all the information for the license plate we have detected right and this is how we're
going to do: I'm just going to write it first and I'm going to explain it once it's done. The next key is the car ID, so this is going to be results[frame_nmr][car_id], and then for this car I'm going to create a new dictionary which is going to have two keys: one
of them is car and the other one is license plate for car we are going to have another
dictionary which is the bounding box and that's it right and for the license plate
we are going to have another dictionary which is something like bbox... the bounding box
then also the text we have detected then the confidence value for the bounding box and
then the confidence value for the text right okay and that's pretty much all so I'm just
going to format this a little nicer and that's pretty much all now let's see what
exactly we need to input in each one of these fields okay so basically for the car bounding
box we are going to input these values over here which are the car bounding box right
these are the coordinates of the bounding box of this specific car and then for the license
plate bounding box we are going to input these values which are the coordinates for the bounding
box of this license plate and then for the text we are going to input this value which is license
plate text for bounding box score we are going to input this value which is the score in for
in which we have detected this license plate then for text score we are going to input this variable
which is license plate text score and by doing so we don't have any errors and everything is
okay so for every single frame for every single frame number we are going to be saving all the
information which is related to each one of our cars and all the information for each car will
be the information of that specific car where the car is located and then all the information
about the license plate which we have detected in that specific car right and for the license plate
we are going to save all the information we have right and we're going to save all this information
only in those cases in which we have detected the license plate and every time we have successfully
read the license plate number from this license plate, so this object is not None. We are going to be saving all this information into this dictionary only in that case, only when we have
detected the license plate and when we have read its license plate number right and please notice the
structure I have built for this information for this dictionary because remember every time we
detect a license plate it will not be floating around in space completely isolated no that will
never happen every time we detect a license plate it will be on a given car and this car will
be on a given frame. So this is exactly why I have decided on this structure for this dictionary,
and once we have created all this information the only thing we need to do is to call... I'm going to
import this function as well; the name was something like write_csv, so let's import write_csv as well. And something is going on, because we are not really using this import we have over here... if I scroll down I see I'm not really importing the function itself; now we should be okay. So let's go back here and I'm going to call this function which
is write csv and I need to input the dictionary so I'm going to input results and I'm also going
to input where I want this CSV file to be saved and I'm going to save all this information into
a CSV file called test.csv so what I'm going to do now is I'm going to execute this pipeline I'm
going to execute this process as it is and then we are going to take a look at this file and then we
are going to continue right then we are going to see if the file we are going to create it
makes sense right so I'm just going to press play okay the execution is now completed and now if I
go to my local directory, to the directory of this pycharm project, this is test.csv, so this is the
file we have just created and if I open this file you can see that this is all the information we
have saved and we have extracted from this video right remember we are processing only the first
10 frames we are still processing only the first 10 frames so this is the all the information we
have extracted so far. And please remember we are just computing some dummy values from some of our functions, so this is not really all the information, this is all the information we have computed so far; but other than all of these zeros over here, you can see everything looks pretty well. We are just producing an entire
CSV file with all the information we have computed from this video. We are almost there; actually, we have completed this pipeline, we have completed this process. The only thing we need to do now is going back to util.py, because we need
to implement these two functions get car and read license plate and once these functions are
implemented then we are going to be producing a real file right we are going to be using a
file with the entire information here and here right we are going to be producing the real
license plate number and the real license plate score and also the car bounding box and the car
ID for absolutely every single license plate in absolutely every single frame in which we have a
detection right so we are almost there I am super excited and now let's continue to the util.py
file so we can Implement these functions and let's start with get car remember from the main.py
pipeline we were using this function which is get_car in order to assign which car each license
plate belongs to right we have many many cars and many many license plates and for each one of these
license plates we want to know what's the car this license plate belongs to so this is exactly where
we were using this function get car and now let's see exactly how we are going to implement the
function and in order to do so I'm going to show you a few pictures this is a random frame from our
video right you can see that this is a frame we have many many cars and this is only a frame
from the video once we have detected all the cars we are going to have a situation like this we are
going to have many detections, because at every single frame we are going to have many cars; I don't know how many cars we have in this picture, but there are many, something like 20, 30, 50, maybe 60 cars. So for every single frame
we are going to have many many detections which are going to be our cars, we are going to have
many bounding boxes for all of our cars and also at every single frame we are going to have all
of our license plates; but please mind that we are only going to have maybe one
or two or three license plates for every single frame right so we are going to have many cars but
only a few license plates and the idea is to know which car this license plate belongs to and the
way we are going to know that is by looking at all of these bounding Boxes by looking at all of
these cars and by finding the car which contains the license plate right by finding the bounding
box of the car which contains the bounding box of this license plate right that's the way we
are going to find what's the car which belongs to this license plate so that's exactly the idea
of what we are going to be implementing in this function now let's see exactly how we can do that
the first thing I'm going to do is unwrap all the information in license plate so in order to do
so I'm going to do something like this because this is exactly the same object license plate
so I'm just going to do this okay then I'm going to iterate in all the cars we have over here I'm
going to say for... let's say for j in len(vehicle_track_ids), we are going to be iterating in all the
cars we have detected and remember this is the entire information this is the bounding box and
this is also the car ID remember so now we are going to unwrap all the information for each one
of these cars and this is going to be something like x car 1 y car 1 x car 2 y car 2 and car
ID; this is exactly the information which is in each one of the elements of this object vehicle_track_ids, and this is vehicle_track_ids[j]. Okay, so that's pretty much all; we are iterating in
absolutely all the bounding boxes of all the cars we found in this Frame we are iterating
in all these bounding boxes for each one of these bounding boxes we are going to verify
if it contains the license plate right that's exactly what we are going to verify and this is
how we're going to do it we are going to see if X1 is greater... remember X1 is the upper
left coordinate of the license plate if X1 is greater than x car 1 and Y1 is greater than
y car 1 right we are verifying that this coordinate over here it's greater than this
other coordinate over here we are trying to verify if we meet this condition right and then the
other condition we need to meet is if this point we have over here these coordinates
we have over here they are lesser than this other point we have over here right we need to
meet these two conditions and this is exactly how we're going to do it if X1 greater than x
car1 and y1 greater than ycar1 and x2 lesser than xcar2 and y2 lesser than ycar2, then we
have found the bounding box this license plate belongs to we have found the car on which
this license plate is located right that's what it means if we have met all of these
conditions that's what it means so in this situation we are going to... I'm going to
define a new variable which is foundIt and foundIt is going to be false at the beginning
and then it's going to be true in this case and in this case we're also
going to break the loop. And then I'm also going to define another variable, which is going to be car_index, and car_index will be j.
Okay, now if foundIt, then we return. So if we have found the car which contains this license plate, then we are going to return these values, which are the bounding box of the car and also the car ID, and in any other case we are going to return -1 for all the values, in order to make it more clear that we have not found the car. That's pretty much all; we have implemented this function which is get_car. Now let's see if everything works well; we should now have the right values for all the cars we are detecting, and the only thing I'm going to do is execute this script again and see what happens. Okay, I got an error, and I think I know what's the problem: we need to iterate in range(len(vehicle_track_ids)). Now everything should be okay, let's try again... okay, now it's completed.
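For reference, here is a sketch of the function as we have implemented it in this step, including the range(len(...)) fix; it simply checks whether the license plate bounding box is fully contained inside one of the tracked car bounding boxes:

```python
def get_car(license_plate, vehicle_track_ids):
    """Return the coordinates and ID of the vehicle containing the plate."""
    x1, y1, x2, y2, score, class_id = license_plate

    foundIt = False
    for j in range(len(vehicle_track_ids)):
        xcar1, ycar1, xcar2, ycar2, car_id = vehicle_track_ids[j]

        # the plate belongs to this car if its box lies inside the car box
        if x1 > xcar1 and y1 > ycar1 and x2 < xcar2 and y2 < ycar2:
            car_index = j
            foundIt = True
            break

    if foundIt:
        return vehicle_track_ids[car_index]

    # -1 everywhere means: no car was found for this license plate
    return -1, -1, -1, -1, -1
```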
Now let's see the new file we have created, the new test.csv file, and now you can see that we have some values for car ID
and we also have some values for the car bounding box so we are moving one step at the time but we
are making progress right so now let's continue with the util.py file and now let's move to the
next function which is read license plate now it's time to implement this function over here and
something I'm going to do first is I'm going to do an if over here and I'm going to continue with
this pipeline only if car ID is different than -1 right and now let's continue and let's see
how we can implement this function which is read license plate and the only thing we need to do is
to call easyocr and let's see how we can read the license plate and let me show you some variables
I have defined over here; these variables are going to be super important, and you're going to see exactly why. And then also let me show you
this reader we have here I have already defined I have already initialized this OCR reader and
you can see that I'm calling easyocr and then I'm calling this class, which is Reader. So the only thing we need to do now is calling reader.readtext, and I'm going to input the license
plate crop and this is going to be detections then I'm going to iterate for detection
in detections because remember we could be detecting many many many many different objects
many different text objects in this image so for each one of these objects we are going to
unwrap these objects first and this is going to be something like bounding box text and score
this is going to be the detection right each one of these detections is going to be something
like the bounding box of the text we have detected then the text we have detected and then
the confidence value for which we have detected this text and then we are going to convert this text to
uppercase and we are going to remove all the white spaces; this is exactly how we are going to do it, and this will be equal to text. Okay, and now it's the time in which we are going to use this
format right remember when we were starting this tutorial I told you we were going to focus on
this very specific type of license plate right we are going to work with this type of license
plates each license plate is going to have seven characters the first two characters are going to
be letters then two numbers and then three letters this is the format of absolutely every single
license plate we are going to be working with in this tutorial so we are going to make sure every
single text we detect complies with this format and in order to do so I have already created a
function which is license complies format this function returns a Boolean value which is pretty
much the verification of if this license plate complies with the format or not we are going
to be verifying if we have seven characters and we're also going to be verifying the first two
characters are letters and then the second... the third and the fourth characters are numbers and
then the last three characters are letters again right this is exactly what we are doing with this
function and this is a very important function we are going to use now so let me show you
exactly how we are going to use this function: if license_complies_format(text), then and
only then we are going to return the text and the confidence score we are going to return these two
values, these two variables, which are text and score right only if the text complies with the
format we are asking absolutely all the license plates right only in this case we are going to
be returning these values and in any other case we are going to return None right this is very
very very important and this is going to make our solution way more robust and way way better
and something that makes the solution even better is that we are not going to return the text itself; we are going to call another function which is format_license, and let me show you exactly
what we are going to be doing with this function I'm going to call format license text and let me
show you the... let me give you the idea, the high level idea behind this function sometimes
when we are using an OCR technology when we are using a library like easyocr sometimes it's very
challenging to tell some characters apart for example it's very challenging to tell a five apart
from an S right so you can see that the letter S and the number five are very similar and it's
very very very challenging for an OCR to tell the difference between these two characters and
we are going to have exactly the same situation for the letter I and the number 1 or for the
letter O and the number 0 for example right those are characters which are very very hard
to differentiate, they are very hard to tell apart so this function I have over here
format license the only thing it does is going through all the characters in the license plate
in the text and for each one of these characters it fixes whatever issue we may have with
this type of confusion right if for example we are reading this character over here and easyocr,
the OCR technology we are using, it says is the letter S we know for sure it's not the letter
S because we are expecting a number here so if we have detected the letter S then we convert this
value to the number 5 and the same happens here if we are reading this value this character and
we are getting the number 5 we know for sure for a fact that that's not the number 5 because
we are expecting a letter here so we are going to convert the number 5 into the letter S that's
exactly the idea the high level idea of what we are going to be doing with this function
we are going to be going through absolutely all the characters in the license plate and for each
one of these characters we are going to be fixing these type of issues in case we find any type of
issues like this and that's pretty much all and I invite you to take a look at these two functions
format_license and license_complies_format, and to take a much closer look and to properly
understand exactly how they work; that's your homework from this
video, so you properly understand how they work. So now let's continue: now we are returning format_license(text) and score if our license plate complies with our format, and we are returning None, None in any other case, and we are done; we have completed our process.
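Here is a hedged sketch of the function as we have built it; it assumes the easyocr reader initialized at the top of util.py and the two helper functions license_complies_format and format_license already present in that file:

```python
import easyocr

# initialize the OCR reader once, at module level
reader = easyocr.Reader(['en'], gpu=False)

def read_license_plate(license_plate_crop):
    """Read a license plate crop; return (formatted_text, score) or (None, None)."""
    detections = reader.readtext(license_plate_crop)

    for detection in detections:
        bbox, text, score = detection

        # normalize: uppercase and strip all white spaces
        text = text.upper().replace(' ', '')

        # accept only texts matching the expected 7-character format
        if license_complies_format(text):
            return format_license(text), score

    return None, None
```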
Now let's see what happens: I'm going to execute this file again. I'm going to make a very small change: I'm only going to execute it for 10 frames, but I'm going to do it like this: if ret, then if frame_nmr is greater than 10, I'm going to break the loop; this is going to be much better. And now let's see what happens, I'm going to execute main again.
okay it seems I have a typo over here this is obviously not remove but this is replace I got
confused because I was removing the white spaces but this is obviously not the name of the function
we want to use here so now let's see what happens okay now the execution has been completed and
now we have produced a new test.csv file and if I open this file you can see that we still
have all the information related to the car ID and the car bounding box and now we have all the
license plate numbers we have read from the frames from the license plates and also the confidence
score for each one of these license plates. So we made it, we have completed this process, we are done. Everything is ready; the only thing I'm going to do now
is to execute this script execute this main pipeline for the entire video so I'm just going
to remove this break over here and that's pretty much all and now I'm going to press play again
and then I'm going to show you how to visualize this data so everything looks like the video I
showed you in the intro so let's see what happens and now let's go back to pycharm so I can show
you exactly how you can create a visualization as the one I showed you when we were starting
this video in order to do so this is where we are going to use these two files visualize.py
and add missing data.py and you're going to find these two files in the GitHub repository
of today's tutorial so you can just go ahead and use them in your project and before using these
two files let me show you something first if I go back here to the test.csv file we have created
let me do something I'm going to filter by car ID I'm going to show you all the data all the
information we have extracted for only one of our cars I'm going to select only the car ID
number three right this is only a random car ID in our data you can see that all the frame numbers we
have detected for this car ID are not consecutive. This means that we have detected the frame number zero, then the number one, then it jumps to the number four, then it jumps to the number nine, then 12, 13, 14, 15, 16, 17, then 27; so we have many missing frames. For some
reason we don't have the information for this car ID for many frames which are in between these two, for example; we don't have the information for the frame number two, the frame number three, or the frame numbers five, six, seven, eight, 10, 11. There are many missing
frames for this car ID so that's something that's going on and remember that we are not saving all
the information because we are only saving the information for those license plates for which
we have detected the car the license plate is on, the car where the license plate is located; and also we're only saving the information for the license plates for which we have read a license plate number which complies with our format.
so we are not saving all the information, there's a lot of information which we are not
saving into this CSV file. Remember how OCR technologies usually work: they perform very well, but in some cases they make mistakes. So if
in some cases they are not reading a number which complies with this format then we are not going
to be saving the information for those frames so that's the reason why we have some missing
frames over here that's the first thing I want you to notice then another thing which is going to
be much more important: take a look at what happens with the license plate numbers. Now, we have read
the license plate numbers in all of these frames and we have read a number which complies with
our format so everything it's okay but you can see that we have many numbers right for example
we have many many different values many different numbers if I show you the number we have detected
in the first frame it's different than the one we have detected here in the frame number four right
and then if I continue scrolling down you can see that we have also detected other values for
example here this is different and if I continue scrolling down this is also different here we have
an N we have a P so for every single car ID we are going to have many many different values for
the license plate, and this is a huge issue, a very important thing we need to solve, because obviously every single car has only one valid value for its license plate. So if we have so many values for the license plate, how do we make a decision? How do we know what's the real one, what's the real value, the
most accurate value for the license plate how do we make a decision what's our criteria that's a
huge problem and this is exactly where the object tracking is involved because for every single car
in the video... because we are going to be tracking the car through all the different frames in the
video, for every single car we are going to have the value for the license plate we have detected
in that given frame for that car so if we want to know what's the value for the license plate of a
given car through all the frames in the video the only thing we need to do is to select the license
plate we have read we have detected with the highest confidence score right you can see this
column is the confidence score in which we have detected every single one of these license plates
so the only thing we need to do is to take a look what's the license plate we have detected with
the highest confidence, and that's it; that's going to be our criteria to know what's
the license plate number of this car and that's it that's the way we are going to solve our problem
and that's exactly where the object tracking is involved and that's exactly why it's so important
to track... to implement an object tracking algorithm in this problem, because this
is how we are going to solve this problem this is going to be our criteria to select the license
plate number for every single car in this video so remember we have these two problems this
is how we are going to solve this problem, and then we still have this other problem, which is that we have some missing frames for every single car. This problem actually is not really a big problem, and the only thing it's going to affect is the visualization, because now we are going to take all this information and we are going to visualize it. So the only thing that's going to happen with all these missing frames is that we are just not going to visualize the license plate, and we are not
going to visualize the license plate value for that given frame so let me show you what happens
if we create a video from the CSV file I just showed you, we will have a visualization which looks like this, which will be okay I guess, but it's not an ideal visualization; it's not really good looking. Ideally we would like to have a visualization
which is more stable for every single license plate we would like to see the license plate on
a fixed position through all the different frames in which we are detecting the license plate for
that car. That's exactly what we would expect, and this doesn't really look good. So in order to fix this problem, which again is not a huge problem
and the only thing it does is to affect the visualization we are going to use one of these
two scripts which is called add missing data and the only thing this script does is interpolate
all of those frames in which we have not detected a license plate or in which we are not extracting
the information for the license plate. The only thing we're going to do is interpolate the values for the bounding boxes of the car and the license plate in all of those frames. For example, in the frame number 41: you can see we have the information for the frame number 40 and we have the information for the frame number 42, but we
don't have the information in the frame number 41. So the only thing the add_missing_data.py script does is consider the bounding boxes for these two frames and take the average of all the different coordinates; by taking the average it's going to compute the value of the bounding box in the missing frame, and it's going to apply exactly the same process in absolutely all the other missing frames. That's the way we are going to solve this problem of all the missing frames. Remember, this is only a matter of visualization, it's not a huge problem.
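The underlying idea is plain linear interpolation; a minimal sketch of the principle with numpy (the coordinate values here are hypothetical, and the actual script in the repository handles the full CSV):

```python
import numpy as np

# frames where we do have a value for one bounding box coordinate
frames = [40, 42]
x1_values = [612.0, 620.0]  # hypothetical coordinates for those frames

# estimate the missing coordinate at frame 41
x1_at_41 = np.interp(41, frames, x1_values)  # -> 616.0
```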
Then, once we have fixed that issue, we can just create the video, and that's it. I'm going to give you
these two files in the GitHub repository of this tutorial and now let me show you how this works
so the first thing you need to do is to execute add missing data and you need to change here the
path to the file name you are going to interpolate, in our case it's test.csv, and then you need to
specify what's the file name of the CSV you are going to create with the interpolated data let me
show you super quickly how this file looks like I'm going to filter by car ID and I'm going to
select the number three again, and you can see that in this case we have values for absolutely every single frame: we are starting at the number zero just as before, but now we have computed the values for the bounding boxes for absolutely every single frame until the number 65, which is the last frame in which we have detected this car. So this is exactly the data
we are creating with add missing data.py and once we have created this data this new CSV
file then we go to visualize.py and then we input something like test interpolated.csv and then we
specify what's the file name of the video we are going to create, in this case out.mp4, and the only thing we need to do is to execute this file, and then after a
few minutes we are going to have a video which looks exactly like this and this is going to be
all for today my name is Felipe I'm a computer vision engineer and these are exactly the type
of videos and the type of tutorials I make in this channel if you enjoyed this video remember
to click the like button and also remember to subscribe to my channel this is going to be all
for today and see you on my next video so on today's tutorial we will
be making an object detection web application we will be detecting
tumors on a brain MRI image now let me show you how it works I'm going to
drag and drop an image from my computer so this is the image we have uploaded and if I
click here on detections you can see that we have detected two objects we have detected two tumors
on this image so this is exactly the project in which we will be working today on today's tutorial
we are going to make the entire web application using Python and streamlit, and we're going to detect
objects using an object detector trained with detectron2 so my name is Felipe welcome to
my channel and now let's get started so let's get started with this tutorial and the first thing we
need to do is to create a new pycharm project you can see that this is pycharm and now let me
show you how to create a new project we need to click here on new project I'm going to select the
directory where I'm going to create this project which in my case is here and then I'm going to
select tutorial this is the directory in which I'm going to create this pycharm project and
then I'm going to create a new environment, and the interpreter will be Python 3.8; everything else
will be just the default values so I'm going to click on Create and that's pretty much all now the next
step will be to install all the requirements we are going to use today so I'm going to create
a new file which is called requirements.txt; I'm going to name this file, press enter, and then I'm going to paste all the requirements, all the
dependencies we need to install in this project which are all of these packages we have over
here so I'm just going to copy and paste these packages over here and that's pretty much all now
I'm going to the terminal and I'm going to type pip install -r requirements.txt, I press enter, and that's going to take care of installing all the requirements,
and you can see that I got an error and basically this error is because we need to install all
of these dependencies first and then we need to install this final dependency... right you can
see that this one is called detectron2 we need to install everything else first and then at
the end we need to install detectron2. So I'm just going to comment this line, and then I'm going to run pip install -r requirements.txt again. Okay, now all the requirements have been installed, and the only thing we need to do is to install detectron2, so I'm going to uncomment this line and I'm going to run pip install -r requirements.txt again; this is going to take care of installing all the requirements, but as we have already installed all these packages, the only one that's going to be installed now is detectron2, so we need to wait a few minutes. Okay, and that's
pretty much all in order to install detectron2 and now we are all set all of our requirements
have been installed so it's time to continue let me show you how to create a new file let's create
a new python file so we're going to select file new python file and this file will be main.py
so this is the file in which we are going to be coding the entire web application of today's
tutorial and remember in this tutorial we are going to be detecting tumors on brain MRIs
so we definitely need an object detector in order to detect this type of object. Let me show
you the data I used in order to train this object detector this is a dataset I found in roboflow
and I'm going to give you a link to this dataset in the GitHub repository of today's tutorial so you
can just go ahead and take a look at this dataset if you want to, and this is an object detector
I trained using detectron2, and I'm not going to show you the details of how I trained this
object detector because that's something I have already covered in one of my previous videos in
one of my previous videos I showed you how to train an object detector using detectron2 and
I showed you the step-by-step guide I showed you the entire process so if you are curious to know
how exactly I trained this object detector, I invite you to take a look at the video I'm going to
be posting over there and now let's continue this is the data I used in order to train this object
detector and now let me show you the entire pipeline in which we are going to be working
today let's get back to pycharm and let me show you exactly what are all the steps we are going
to be making in this tutorial the first step will be setting up the title of the web application
so this is the first step in this process then the next step is setting up the header right
the third step will be creating a file upload widget so the user can upload an image of
a brain MRI so we can detect all these objects on top of this image then the next step is loading
the model right loading the object detector we are going to be using to detect objects then we
are going to load the image the user has uploaded then we are going to detect objects and then the
last step in this process will be to visualize the objects we have detected on top of the original
image and we are just going to display this visualization to the user right so these are the
steps of the entire process the entire pipeline in which we are going to be working today and
I'm going to show you every single step of this process so you can see these are one two three
four five six seven steps in only 7 steps we will have this web application up and running so let's
get started. The first step in this process is importing streamlit as st. OK, then in order to set up the title I'm going to call st.title, and the title will be something like 'Brain MRI tumor detection'. Then, in order to set up the header, I'm calling st.header, and this will be something like 'Please upload an image'. OK, then in order to create the file upload widget I'm going to call st.file_uploader, and I'm going to input two parameters: the first one is an empty string, and then all the file types we support in this widget, something like png, jpg and jpeg. OK, and that's pretty much all.
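Put together, a minimal sketch of main.py at this point could look like this (the strings are the ones from the video; everything else is just the straightforward streamlit calls):

```python
import streamlit as st

# title and header of the web application
st.title('Brain MRI tumor detection')
st.header('Please upload an image')

# file upload widget: an empty label plus the image types we support
file = st.file_uploader('', type=['png', 'jpg', 'jpeg'])
```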
Now, in order to move one step at a time, let's see if everything executes just fine. I'm going to execute the code as it is so far, so I'm going back to the terminal and I'm going to type streamlit run main.py. This is going to open my browser, and we are going to see exactly how our web application looks so far. Everything looks just perfect, so we are OK to continue. Let's get back to PyCharm and continue with the next step in this process, which is loading the model, the object detector we are going
to be using today. Remember, we are going to be using an object detector which I trained using detectron2, and remember, I already showed you how to use detectron2 in one of my previous tutorials. So let's go back to my browser and let's see exactly how we can use this model I trained with detectron2. Let's go to the GitHub repository of that previous tutorial and let's see what using this model was all about. I'm going to this file over here, which is predict.py; this is the file we used in order to load the model and to make predictions with a model trained with detectron2. So the only thing I'm going to do in this tutorial is to copy some of the code in this file, and I'm just going to paste it into the main.py file of our PyCharm project. Remember that in this tutorial we are not going into the details of how to use detectron2, so I strongly recommend you take a look at my previous video, this video over here, which you are going to find on my YouTube channel, so you can see exactly how using this model, how using detectron2, works; because we are not going into those details in this tutorial. So this is my strong recommendation for
you: please take a look at that previous video. The only thing we're going to do now is copy and paste some of these lines, which I'm going to explain super, super quickly. You can see that we are getting a configuration file, then we are getting the weights for this model, and we are getting those weights from our local drive: we are specifying a file path, a location on our local drive, and the only thing we're doing is specifying the weights location. Then we are creating an object which is our predictor, and this is exactly the model we need in order to continue with this process. So this is a very quick explanation of the code we have over here, and now let's continue. Now you can see that we need to make a few imports, because these objects, these functions we have over here, are not being found. So I'm going all the way up and I'm going to say something like from detectron2.config import get_cfg, and that should be all for this function we have over here. Then from detectron2.engine import DefaultPredictor, and that should fix this issue over here. And now we need to import from detectron2 import model_zoo, and that should be all in order to fix this issue over here. I'm going to delete these comments, and that's pretty much all. Everything that's here is everything we need in order to load this model; but obviously we need a model to load, right? Because this is just the default code we had in our GitHub repository. So let me show you
exactly where my model is on my local drive. If I go to my file system, you can see that I have this file over here, model.pth, and I have this other file, labels.txt. This is the model we need: model.pth, these are the weights of our model. What I'm going to do is copy this file and paste it into the directory of this PyCharm project. You can see that this is the main.py we are currently working in, and this is the requirements.txt file we created a few minutes ago, and this is exactly where I'm going to paste this model. I'm also going to do something else, which is creating a new directory called model, and this is where I'm going to put the model. Everything is OK. And remember, I showed you we have another file which contains all the labels we are detecting; but in our case this is a very, very dummy labels.txt file, because we only have one category, we are only detecting one class, which is tumor.
A very quick note: remember, in the dataset I used in order to train this object detector we had two classes, which were negative and positive; these are something like two different types of tumors, or that's what I think. But what I decided to do when I was training this object detector was to merge these two labels, these two categories, into only one object, and I called it tumor. So that's exactly why we have only one class over here, although the original dataset I used had two categories. That was a very quick note regarding the model I trained. Now let's go back to my file system. We are not going to use this directory anymore; I go to model, and this is the model we are going to be using, these are the model weights we are going to be using. So remember, this is within another directory, which
is called model. I go back to PyCharm, and the only thing I'm going to say is something like 'model' and then the file name, 'model.pth'. OK, and that's pretty much all. In my case I'm going to run this code on my local computer, which is using a CPU, so this is what I need to specify. If the computer where you are running this code has a GPU, the only thing you need to do is comment this line, and everything will run on your GPU. But in my case I'm going to run it locally on my CPU, so I'm just going to leave this line as it is.
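Putting this step together, the model-loading code could look something like the sketch below. The weights path and the CPU device are the ones from the video; the base config file we merge from the model zoo, the class count and the score threshold are assumptions on my side, since those details live in the predict.py of the previous tutorial:

```python
import os

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2 import model_zoo

# build the configuration; the exact base config is an assumption here
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file('COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml'))

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1          # we only detect one class: tumor
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # only keep detections above 50 percent

# the weights we trained with detectron2, saved under ./model
cfg.MODEL.WEIGHTS = os.path.join('model', 'model.pth')

# run on CPU; comment this line out if you have a GPU available
cfg.MODEL.DEVICE = 'cpu'

predictor = DefaultPredictor(cfg)
```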
Now let's continue: it's time to load the image we are going to use in order to detect all these objects. So this is what I'm going to do: if file... actually, I have to make another edit first. We are uploading a file, and we are calling the object the user has uploaded file. So now, if file: if the user has uploaded something, we are going to continue, and we're going to call this object image, and image will be Image.open(file) and then something like '.to RGB', right? Image is an object we are going to import from Pillow: from PIL import Image. OK, that should be all.
Now, in order to move one step at a time, let's go back to my browser and see if everything executes just fine. I'm going to refresh, and everything is just fine, and now I'm going to select an image. Let's see if everything is OK. The data I'm going to use is located over here; this is train and val. I'm just going to select a random image, which is this one, and let's see what happens. We have an error, because this method is not called 'to': it's called 'convert', if I'm not mistaken. Let's see. Now I'm going to refresh and do the same process again: I'm going to select the same image and drop it over here, and you can see that now we have another error, because it's not 'covert' but 'convert'; I had another typo. OK, now let's see what happens. I'm going to refresh again, and let's hope everything is OK now. I'm going to take the image, drop it here, and let's see what happens. Now we have to wait a couple of seconds; we may be loading the model, so this may take a few seconds... and everything is OK. We are not visualizing the image yet, but if we are not getting any error, that means everything is OK. So let's go back to PyCharm; everything is OK so far.
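With the typos fixed, this step of main.py looks something like this minimal sketch:

```python
from PIL import Image

if file is not None:
    # load the image the user has uploaded and make sure it is in RGB mode
    image = Image.open(file).convert('RGB')
```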
Now it's time to detect objects. We are moving super, super quickly; we are almost there, we have almost completed this process, this pipeline, and the only thing we need to do now is detect objects. In order to detect objects with this model, which was trained with detectron2, I'm going back to my browser and to this repository, because let's see exactly how we can make this prediction. The only thing I'm going to do is copy and paste everything from here up to here; we don't really need to draw the rectangle, but let's just copy everything. So I'm going to copy, then I'm going to PyCharm and I'm going to paste it here. We will need to make a few edits, but most of the code will remain the same. I'm just going to fix how we input this image over here, because if I go back to my GitHub repository, you can see that this image is actually a numpy array: we were reading the image using OpenCV, so the format is a numpy array, and we need to input a numpy array right over here. So I'm going to define a variable, image_array, and this will be np.asarray(image). We will need to import numpy, so I'm going to say import numpy as np, and that's pretty much all. Now I'm going to input image_array, and that should be it. So this is pretty much all.
We are going to return all the objects we have detected with a confidence value greater than 50 percent, and other than that everything is just fine, and that's it. We don't really need to draw the rectangle, so I'm just going to delete it, and that's pretty much all. So we have loaded the image, and we have detected all the objects on top of this image using our model.
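For reference, the detection step might look something like the sketch below. In the copied code the 50-percent threshold is most likely already applied when the model is configured (as in the loading sketch above), so here we only run the predictor and move the results to the CPU; the field names follow detectron2's standard Instances format:

```python
import numpy as np

# detectron2's DefaultPredictor expects a numpy array, not a PIL image
image_array = np.asarray(image)

outputs = predictor(image_array)

# detections for this image: boxes, confidence scores and class ids
instances = outputs['instances'].to('cpu')
bboxes = instances.pred_boxes      # boxes in x1, y1, x2, y2 format
scores = instances.scores          # confidence scores from 0 to 1
classes = instances.pred_classes   # class ids (only one class here: tumor)
```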
Now it's time to continue with the visualization: we are going to take all the detections, all these objects we have detected, and we are going to draw bounding boxes on top of the image the user has uploaded. This is amazing, because we're moving super, super quickly, so let's see how we can continue with the visualization. Now is the time when we are going to draw bounding boxes on top of our images, and in order to do so we are going to use plotly. Plotly is an amazing Python library which I have used many times in my projects; you can do some very, very crazy visualizations using plotly, some very dynamic visualizations, so this is an amazing library we are going to use now. And something that's very important: in my tutorials we always focus on the computer vision part of the problem, and everything that's related to the visualization is not really that interesting from a computer vision perspective. So what we are going to do now is just take the code for the visualization, which I have already prepared over here: this is a function called visualize, and this is the function we are going to use in order to visualize the bounding boxes on top of our images. So please pay attention, please focus, because otherwise you may get lost; please take a look at what we are going to be doing
now. I'm going to the project, I'm going to File, New, Python File, and I'm going to create a new Python file called util.py. Then I'm going back to this file I have over here, and I'm just going to copy the entire file: I'm going to press Ctrl C and then Ctrl V over here. So this is all the code we need in order to do the visualization. Remember, the visualization is very interesting and very important, but it may not be the most interesting thing from a computer vision perspective, and that's why we are not really minding everything that's related to how to visualize all these bounding boxes on top of the images; we are just going to use this function, and that's pretty much all. I need to do a few imports, otherwise this is not going to work: I'm going to import streamlit as st, and that's pretty much all, if I'm not mistaken. Yeah. Now let me show you something related to all the code I have just
copied. You can see that this is the code of two different functions. One of them is called visualize, and this is the function we are going to use in a few minutes in order to visualize all the bounding boxes on top of our images. The other function is called set_background, and this is another function which is only going to add a very small aesthetic detail at the end of this tutorial: changing the background of the web application. This is only a detail, definitely not the most important thing from a computer vision perspective; it's just changing the background of the web application in the browser. So this is something we are going to do at the end, and it's also in the code I have just copied and pasted into this file. But now let's focus on this other function, visualize. You can see this function receives two parameters: one of them is image and the other one is the bounding boxes. The image is the input image, and the bounding boxes are a list of all the bounding boxes in the format x1, y1, x2, y2.
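The actual visualize function is in the GitHub repository of today's tutorial, so take the sketch below only as a rough illustration of how such a function could be written with plotly. I'm assuming here that the 'original' and 'detections' buttons you will see in a minute are plotly button menus that toggle the rectangles on and off; the real implementation may differ:

```python
import numpy as np
import plotly.graph_objects as go
import streamlit as st


def visualize(image, bboxes):
    # image: a PIL image; bboxes: list of boxes in x1, y1, x2, y2 format
    fig = go.Figure(go.Image(z=np.asarray(image)))

    # one rectangle per detected bounding box
    shapes = [
        dict(type='rect', x0=x1, y0=y1, x1=x2, y1=y2,
             line=dict(color='red', width=3))
        for x1, y1, x2, y2 in bboxes
    ]

    # two buttons that switch the rectangles off ('original') and on ('detections')
    fig.update_layout(updatemenus=[dict(
        type='buttons',
        buttons=[
            dict(label='original', method='relayout', args=[{'shapes': []}]),
            dict(label='detections', method='relayout', args=[{'shapes': shapes}]),
        ],
    )])

    st.plotly_chart(fig)
```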
So now let's go back to main, because let's see exactly how we can call this function over here. The first thing I'm going to do is from util import visualize; now the function is imported into our main process. Now let's go back here, and this is where we are going to call this function. Remember, we need to input two parameters: one of them is the image we are going to use in order to draw all the bounding boxes, and we need to input the image in the Pillow format; the other one is the bounding boxes, bboxes. And please, please focus, please pay attention, because we already have a variable called bboxes, but if we go back to the documentation, you can see that this parameter is a list of bounding boxes in the format x1, y1, x2, y2, so it is not the same as this other variable we have over here. Please pay attention, because otherwise it may be a little confusing. So this is what I'm going to do: I'm going to define a new variable, bboxes_ (bboxes underscore), this is going to be a list, and what I'm going to do here is just append the bounding boxes exactly as we need them to be.
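A sketch of this step, reusing the instances object from the detection sketch above (the exact unpacking of detectron2's boxes is my own way of writing it):

```python
# bboxes_ holds the boxes as plain [x1, y1, x2, y2] lists,
# which is the format the visualize function expects
bboxes_ = []
for box in instances.pred_boxes:
    x1, y1, x2, y2 = [int(v) for v in box]
    bboxes_.append([x1, y1, x2, y2])

# draw the detections on top of the image the user uploaded
visualize(image, bboxes_)
```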
And if I go back to util.py, this is exactly what we need to input. OK, so we have this object over here, and the only thing I'm going to do is pass it over here. I invite you to take a look at this file, to take a look at this function visualize, so you can see exactly how it works: you are going to see that we are using the plotly library, we are calling some functions, and we are doing some things related to visualization. I invite you to take a look at this function; it's going to be available in the GitHub repository of today's tutorial. But now
let's continue, and let's see exactly what happens if we refresh this website and upload a new image; let's see exactly what type of visualization we get with this function. I'm going back to my local computer, to my file system, I'm going to take a random image again, and I'm going to drop it over here. You can see that this is what we get, which is exactly the same image I uploaded, this image over here; but now we have these two buttons. One of them is 'original', which means this is the original image we have uploaded, and the other one is called 'detections', and if I press this button you can see that we are plotting the bounding box exactly on top of the tumor in this brain. I mean, I'm not a doctor, so I have no idea what I'm looking at; I have the impression this is a brain and this is an MRI, and based on the colors I have the feeling that this is the issue, this is a tumor. So it looks like we have detected exactly what we should have detected. But this is the data I used in order to train the model, this is the training data; now let's see if we have exactly the same performance with an image from our validation set, which is completely and absolutely unseen data for my model. So let's see what happens if I just take
a random image like this one. I'm going back here; this is the image I have just uploaded. Remember, now we are using completely unseen data for my model. Let's see what happens if I move to the other tab, to the other button, which is 'detections'... and we are successfully detecting the bounding box, the object we should be detecting in this image. So everything is working just fine. And in order to make it more challenging and more fun, let's see if we can detect an image with two objects; I know that there are a few, like this one, which has two objects. So I'm just going to drop this image here and see if we can detect both of these objects, both of these issues... and we can see that we detect both of them. So everything seems to be working just fine, and this is pretty much all in order to get this web application up and running: you can see that we are uploading images, we are detecting all the issues in those images, and we are plotting everything exactly as we should. The only thing I'm going to do now is use this other function we have
over here, which is set_background. The only thing I'm going to do is change the background of this web application, so we make it a little nicer, and this is exactly how I'm going to call this function. I'm going to main.py, and I'm going to change the import to from util import visualize and then set_background. Then I'm going back to my file system, and this is an image I have prepared in order to change the background; it may not be the perfect background ever, but I think it's going to work. We are going to put this background in our web application, so let's see what happens. I'm going to copy and paste it over here, and now I'm going back to PyCharm, and I'm just going to call set_background and input bg.png. Let's see what happens if I refresh...
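As a side note, a common way to implement this kind of helper in streamlit is to inject the image as a base64-encoded CSS background. The sketch below is only an illustration of that pattern and not necessarily identical to the set_background in the repository:

```python
import base64

import streamlit as st


def set_background(image_file):
    # read the image and inject it as a base64-encoded CSS background
    with open(image_file, 'rb') as f:
        encoded = base64.b64encode(f.read()).decode()
    style = f'''
        <style>
        .stApp {{
            background-image: url("data:image/png;base64,{encoded}");
            background-size: cover;
        }}
        </style>
    '''
    st.markdown(style, unsafe_allow_html=True)
```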
And you can see that now we have a much better-looking background; everything looks much, much better now. Now let me open a new image; I'm just going to select, for example, this image over here,
so we can see how the entire web application looks with this new background. We have to wait a couple of seconds... and now we are getting the image with all the detections on top. So this is going to be pretty much all for this tutorial; this is exactly how you can create an object detection web application using Python and streamlit, and this is going to be all for today. If you enjoyed this video, I invite you to take a look at another of my previous videos, where I show you how to make an image classification web application; I'm going to be posting a link to that other tutorial over there. Remember, if you enjoyed this video, most likely you will enjoy that video too, because it's exactly the same process and it's a very, very similar web application. Congratulations! You have completed my course on object detection. My name is Felipe, I'm a computer vision engineer, and this is exactly the type of videos and courses I make on this channel. If you enjoyed this video, I invite you to click the like button, and I also invite you to subscribe to my channel. This is going to be all for today, and see you on my next video.