Object detection with Python FULL COURSE | Computer vision

Hey, my name is Felipe and welcome to this fully comprehensive course on object detection. We will start by discussing what object detection is and how to measure the performance of an object detector. Then I'm going to show you a step-by-step guide on how to train your own object detector on a custom dataset, and I'm going to show you three different ways to detect objects in your images and videos: YOLOv8, Detectron2 and AWS Rekognition. This course is ideal for beginners as well as for more advanced developers, as it contains very valuable information and insights I gathered from years of experience as a computer vision engineer. By the end of this course, you will be familiar with different object detection algorithms and you will be able to create amazing projects using state-of-the-art computer vision technologies. And now, let's get started.

So let's start with this lesson about what object detection is. I'm going to cover the definition and I'm also going to mention a few examples. Object detection is a computer vision technique to identify and locate objects within images and videos, and there are many technologies to perform it. These are only a few of all the available technologies, of all the available algorithms, which you can use to do object detection. For example, you can use the Python library MediaPipe, which is a very popular library for hand detection and face detection. You can also use OpenCV, which is a library available for Python and C++. You can use YOLOv8, which is the most recent version of YOLO. You can use Detectron2, which is a high-level framework based on PyTorch and a very popular framework for many different computer vision related tasks. You can also use AWS Rekognition, which is a service available through a cloud provider. And these are only a few of the many, many different ways to do object detection.

Although there are many algorithms and many technologies, all of them work pretty much the same from a high-level perspective, from an input/output perspective: all of them receive an image as input, and the output is a list of all the detected objects in that image. Each of those objects is given by three values. The first is the bounding box, which is the location of the object in the image. Then the confidence score, which is a value from 0 to 1 and means how confident the object detector is regarding that detection. And then the object category, or the class name, because if we have detected an object, we want to know what object we have detected, right? We want to know the name of that object. So pretty much all object detectors work the same way, and they are going to return something which looks like this. The bounding box is usually specified with four values, and there are many different formats, many different conventions, for specifying the bounding box. One of the most popular formats is the X and Y position of the top-left corner and then the X and Y position of the bottom-right corner; with these two corners, we have specified the bounding box. Then we have the confidence score, and then the class name. So remember: although there are many, many ways to do object detection, they all work pretty much the same way from an input/output perspective.
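To make this input/output structure concrete, here is a minimal sketch of what such a script can look like using YOLOv8 through the ultralytics package. The model weights file and the image path are assumptions for illustration, not the exact script from the course:

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 model (the nano variant here is an assumption).
model = YOLO("yolov8n.pt")

# Run the detector on an image; the file name is hypothetical.
results = model("cat_and_dog.jpg")

# Iterate over all detections: bounding box, confidence score, class name.
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # top-left and bottom-right corners
    confidence = float(box.conf[0])        # value between 0 and 1
    class_name = results[0].names[int(box.cls[0])]
    print(class_name, (x1, y1, x2, y2), confidence)
```

Each printed line is one detected object: its class name, the two corners of its bounding box, and how confident the detector is about it.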
And this is a very specific example of how to do object detection on an image. You can see that this is a picture of a cat and a dog, and this is a very simple Python script which uses YOLOv8 to detect all the objects within this image. I'm not going into the details of how this script works, but it's going to be available in the GitHub repository of this tutorial. At the end of this script, we iterate over all the detections we got for this image and we print them. If we execute the script, we print something like this: you can see that we have detected a cat, with the bounding box where the cat is located and the confidence score of the object detector regarding this detection, and we have also detected a dog, with the bounding box of the dog and how confident the object detector is regarding that detection. So this is a very simple example of how object detection works on a specific image, and this is going to be pretty much all for this lesson. Remember: object detection is a computer vision technique to identify and locate all the objects within an image or a video, and although there are many, many different ways to do it, all of them work pretty much the same from an input/output perspective. Now let's move to the next lesson, about object detection metrics.

So let's talk about object detection metrics. We will answer the question: how do we measure the performance of an object detector? You can see that we are just starting this lesson and we immediately got this huge warning sign, which says: when using object detection metrics, you are only comparing your predictions with your ground truth. This is very, very important, and you're going to see exactly why later in this lesson. But for now, let's continue. This is the roadmap we will be covering today. I have divided all the content in this lesson into two sections. The first one is about fundamentals: this is where we will discuss the definitions of all the metrics we will be using today and all the different examples I'm going to show you about these metrics, and we will assume we are working under ideal conditions. This is very important, and you're going to see exactly what I mean by ideal conditions later on. Then we have the other section, for the more advanced topics, where we will assume real-life conditions. For now, let's continue.

So let's start with the fundamentals. We are going to cover the most common metrics, and we will assume the data we are using to train the model is perfect. This is what I meant by ideal conditions: we will assume our dataset is perfect, which means we have many samples, a huge dataset, and in case we have many different classes in our dataset, we assume all of our classes are equally distributed, which means we have the same number of objects for each one of our classes. But most importantly, we will assume our dataset is perfectly annotated. So we have no issue in our dataset whatsoever; these are the ideal conditions we will be assuming in this section. Now let's continue. These are the metrics most commonly used in object detection. We have the loss function, which is used during the training process.
And then we have two other metrics which are part of the evaluation process of an object detector: the intersection over union and the mean average precision. Now let me show you a very specific example of how this looks in real life. Remember, from our previous lesson, I told you there are many different ways to do object detection, many different technologies, many different algorithms; YOLOv8 is only one of all the different options. When training a model with YOLOv8, this is what we get at the end of the training process: all these plots, so we can analyze the training process itself, and we can also analyze the performance of the object detector we have just trained. From all of these plots, you can see that six of them are related to the loss function. I'm not going into the details of why we have so many plots for the loss function, but just keep in mind that this is such an important metric that we have all these plots in order to analyze the performance of the model and of the training process regarding the loss function. Then the remaining four plots are related to the mean average precision. In the case of training a detector with YOLOv8, the intersection over union is not provided, but this is also a very important metric in object detection.

Now let's continue with the loss function. This metric is related to the learning process, the training process. There are many different loss functions, and they usually involve very complex mathematical formulations and expressions. The only thing I'm going to cover about the loss function in this course is that lower is better: a lower value of the loss function means the model is doing better. And if we go back to these plots we have over here, you can see that in all of the plots regarding the loss function, the loss is going down as we increase the number of epochs. So this is the only thing I want you to remember for now: the loss function is related to the learning process, it usually involves very complex and very advanced mathematical expressions, and lower is better.

Now let's move on to the intersection over union. This metric measures the detection accuracy. It ranges between 0 and 1 (the intersection over union is a value between 0 and 1) and higher is better. And this is exactly how the intersection over union is computed. Given two bounding boxes (remember, we are going to be comparing our detections with the ground truth, so we will have a bounding box for our detection and a bounding box for the ground truth), we measure the area of overlap and we measure the area of union, and then we compute the intersection over union with this very simple calculation: the area of overlap divided by the area of union.
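As a sketch, this is what that calculation looks like in Python for boxes in the (x1, y1, x2, y2) corner format described earlier; the two example boxes at the end are hypothetical values:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2): top-left and bottom-right corners.
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Hypothetical ground truth box and a prediction with a large overlap:
print(iou((100, 100, 300, 400), (110, 120, 310, 410)))  # prints roughly 0.82
```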
Let me show you an example. We have these two pictures of a cat, one cat in each picture, and you can assume these are the ground truth bounding boxes for these objects: in each case, the bounding box encloses the object perfectly. This is the ground truth. Now let's assume we are using an object detector, and these are the detections we got with it. Let's compute the intersection over union for each one of these cases. In the first example we have a very small intersection, a very small overlap, so if we apply the formula we have over here, the area of overlap over the area of union, we get a very low value, which is 0.15. We get a very small value because these two bounding boxes have a very small area of overlap, of intersection. But in the other case, you can see that our prediction is very close to the ground truth, almost perfectly matching it, so we get a higher value of intersection over union, which in this case is 0.95. So this is a very simple example to give you a much better idea of the intersection over union.

Now let's continue, let's move to the mean average precision. The mean average precision is based on the precision-recall curve, and the precision-recall curve is based on the intersection over union and the detection confidence score. Remember, from our previous lesson on what object detection is, I mentioned that all the different frameworks and all the different algorithms for object detection have pretty much the same input/output structure, and the output will always involve a bounding box and also a confidence score. So the precision-recall curve is based on the intersection over union and the detection confidence score. From the precision-recall curve we have two elements, precision and recall. Recall measures how effectively we can find objects, and precision measures how well we perform once we find an object. Please mind these two definitions, and please mind the difference between them. This is very important, and it's going to be much clearer in a few minutes, because I'm going to show you a few examples; but please focus on each of these two definitions, on how we are defining recall and how we are defining precision. And about the mean average precision, remember that higher is better.

Now let's move on to an example of how to compute the mean average precision. This is our dataset: let's assume we have 10 apples in it, and for each one of these apples, for each one of these objects, we have the ground truth, a bounding box which encloses the object perfectly. So this is our data and these are our annotations, this is our ground truth. Now let's assume we are working with an object detector and these are our detections. In some cases, for example here, or here, or here, we are getting an OK detection, but in other cases, like here or here, we are not getting a very good detection. So let's see how we can compute the mean average precision in this example. This is the ground truth with the predictions on top; now we are visualizing both the predictions and the ground truth, and these are values which are going to be super important in order to compute the mean average precision.
You can see that for each one of these objects we have two values: the score, which is the confidence score of that prediction (the confidence score of the green bounding box), and the intersection over union between the green bounding box and the blue bounding box, which is the intersection over union between our prediction and the ground truth. For each object in our dataset we have these two values. And what we will be doing now is applying this very simple process. This is pseudo-code, not real code; it looks like Python, but it's not really Python. This is the pseudo-code of the process we will follow in order to compute the mean average precision. Please pay attention, because this is very important. You can see that we are defining a variable, the intersection over union threshold, and we are setting it to 0.5. Then we are iterating over many different values for the confidence score threshold, and for each one of these iterations we are defining two variables, true positives and false positives, each initialized to zero. Then, for each one of our detections, for each one of our green bounding boxes, we verify if the confidence score is greater than the confidence score threshold of the current iteration. If it is, we take a look at the intersection over union between the green bounding box and the blue bounding box, and if this is greater than the intersection over union threshold, then we increment the true positives variable; in any other case, we increment the false positives variable. So this is a very simple and very straightforward process, but please go through it more than once, and please be super clear on how it works, because it's very important to understand.

Once we have computed the true positives and the false positives (remember, for each one of the values of the confidence score threshold, we will be computing these two variables), we define precision and recall exactly like this: precision is the true positives divided by the sum of true positives and false positives, and recall is the true positives divided by the total number of ground truth objects. In our case, the total number of ground truth objects is 10; remember, we have 10 blue bounding boxes and that's exactly our ground truth, so this number will always be 10. Now, let's continue.
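Written out as runnable Python, that pseudo-code looks roughly like this. The (score, IoU) pairs below are hypothetical values chosen so that the output reproduces the numbers of the walkthrough that follows; they are not the exact values from the slides:

```python
# One (confidence score, IoU with the ground truth) pair per detection.
# These exact values are made up to match the worked example.
detections = [
    (0.90, 0.85), (0.80, 0.85), (0.78, 0.80),
    (0.70, 0.90), (0.60, 0.75), (0.55, 0.40),
    (0.40, 0.80), (0.30, 0.10), (0.20, 0.80), (0.10, 0.20),
]
NUM_GROUND_TRUTH = 10  # the 10 blue bounding boxes
IOU_THRESHOLD = 0.5

for conf_threshold in [0.75, 0.5, 0.25, 0.0]:
    true_positives = 0
    false_positives = 0
    for score, iou in detections:
        if score > conf_threshold:
            if iou > IOU_THRESHOLD:
                true_positives += 1
            else:
                false_positives += 1
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / NUM_GROUND_TRUTH
    print(f"conf > {conf_threshold:.2f}: precision={precision:.2f}, recall={recall:.2f}")
```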
So let's go through this process once and again, and let's start with a confidence score threshold of 0.75. In order to do so, we go through each one of these green bounding boxes, each one of our detections, and we keep only the ones which have a confidence score greater than 0.75, which are these three bounding boxes. If we go back, you can see that in this case the confidence score is 0.7, in this case it's 0.40, then 0.2, and so on: in all the other bounding boxes, the confidence score is lower than our threshold of 0.75. Then we take a look at the intersection over union, and in case the intersection over union is greater than the threshold we have defined, 0.5, we increment the true positives; and if not, if it's lower than 0.5, we increment the false positives. In this case, you can see the intersection over union is 0.85, which is greater than 0.5, so this is a true positive. This one is also 0.85, so this is also a true positive. And in this case, this is also a true positive, because the intersection over union is 0.8. So we have three true positives and zero false positives. If we move here, you can see that this is exactly what we have just counted: the true positives are three and the false positives are zero. So if we compute precision and recall, we get that precision is 1 and recall is 0.3. It's a very simple, very straightforward process. Please go through this example once and again until you are completely, 100% clear on what we are doing, because once you get familiar with the process, it's very simple.

Now let's move on to a confidence score threshold of 0.5. In this case we do exactly the same: we filter out all the detections with a confidence score lower than 0.5, and this is what we get. Now let's go through each one of these detections and see if the intersection over union is greater or lower than 0.5. In this case it's greater than 0.5, so this is a true positive. This is also a true positive, and this one too, and this one, and this one. But in this case the intersection over union is 0.4, which is lower than 0.5, so this is a false positive. So we have 5 true positives and only 1 false positive, and if we compute precision and recall, we get that precision is 0.83 and recall is 0.5. Let's continue.

Now let's move to a confidence score threshold of 0.25. We filter out all the detections with a confidence score lower than 0.25, and this is what we get. Let's take a look at the intersection over union. You can see: true positive, true positive, true positive; in this case it's 0.1, so this is a false positive; this is a true positive, and this one too; this is a false positive; and this last one is also a true positive. So we have 1, 2, 3, 4, 5, 6 true positives and 2 false positives, and this is what we have over here: six true positives, two false positives, a precision of 0.75 and a recall of 0.6.

Now let's compute exactly the same values, but for a confidence score threshold of zero. In this case we are not going to filter out any detection, because all of them have a confidence score greater than zero. And in this case, you can see that this is a true positive, true positive, true positive; this is a false positive; this is a true positive; this is also a false positive; and then all the other ones are true positives, except this one, which is a false positive. So we have 1, 2, 3, 4, 5, 6, 7 true positives and only three false positives, and this is exactly what we have over here: the precision is 0.7 and the recall is 0.7.
So we have computed all these different values for precision and recall, and from here it's super easy and super straightforward to put everything together into a precision-recall curve. We can very easily take all these pairs of precision and recall values and put them together on a plot which looks like this. And if we compute the area under the curve, we will be computing the average precision, which is a very important value we need to compute before computing the mean average precision. Please do the math yourself, and if I'm mistaken, please let me know in the comments below; but if I'm not mistaken, this is the value I have computed for this curve we have over here. A very quick note: as we were using an intersection over union threshold of 0.50, this is sometimes referred to as average precision at 50. If you search the literature, or other blogs, YouTube videos and so on, other places where they talk about the average precision or the mean average precision, you will find that sometimes this value is referred to as average precision at 50. And we can also compute other values: for example, if we were using an intersection over union threshold of 0.90, then this would be the average precision at 90. This is a very quick note, but for now let's just continue.

So we have computed the average precision, and from here, if we want to compute the mean average precision, the only thing we need to do is a very simple calculation, because in our case we are working with only one class: we are detecting apples, so our only class is apple. But in the most generic case, you will be computing the average precision for many different classes. So in the most generic case, the mean average precision will look something like this: you will have an average precision value for each one of your classes, and in order to compute the mean average precision, the only thing you need to do is to sum everything together and take the average. That's exactly how you can compute the mean average precision. And remember, in our case we are always working with an intersection over union threshold of 0.50; as we are using only one value for the intersection over union threshold, this is exactly how the mean average precision looks in our case.
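As a small sketch, here is one way to approximate that area under the curve from the precision and recall pairs we just computed. Note that real evaluators, such as the COCO or Pascal VOC tools, sample many more thresholds and use interpolated precision, so their numbers will differ:

```python
import numpy as np

# Precision/recall pairs from the walkthrough, ordered by increasing recall.
recalls = np.array([0.3, 0.5, 0.6, 0.7])
precisions = np.array([1.0, 0.83, 0.75, 0.7])

# Trapezoidal area under the precision-recall curve: the average precision.
# With an IoU threshold of 0.5, this is the "average precision at 50" (AP@50).
ap_50 = np.trapz(precisions, recalls)

# With several classes, the mean average precision is just the mean of the
# per-class average precisions; here we only have one class, "apple".
ap_per_class = {"apple": ap_50}
map_50 = sum(ap_per_class.values()) / len(ap_per_class)
print(ap_50, map_50)
```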
Now, let's continue. This is your homework from this tutorial: tell me which model performs better. We have two models, and for each one of them we have the intersection over union and the mean average precision. For model A, the intersection over union is 0.70 and the mean average precision is 0.80; for model B, the intersection over union is 0.55 and the mean average precision is 0.72. Your homework from this video is to tell me which model performs better. Let me know in the comments below if you find the answer to this question, and I will be super happy to read it. And if you don't know which model performs better, then also let me know in the comments below, and I will be super happy to help you, or maybe another member of our community will be super happy to help you as well. But this is going to be all for this section, for the fundamentals, and now let's move to the more advanced section. This is where we are going to work with imperfect data.

This is where we are going to have a dataset with one or more of the following issues. Maybe we don't have enough samples; maybe we are working with a very small dataset. Maybe we have an unbalanced dataset, which means we have many different classes and we don't have the same number of objects for all of our classes. Or maybe, most importantly, we have errors in our annotations. If you have one or more of these issues in your dataset, this is where it's super important to remember that we are comparing our detections against the ground truth: with all of the metrics we have mentioned so far, the only thing we're doing is comparing the detections against the ground truth. This is where it's super important to remember the warning we got when we were starting this lesson. So if we are in this situation, I want you to take your performance metrics with a grain of salt. That means: compute everything you want to compute, compute the intersection over union, please compute the mean average precision, all your losses, compute every single metric you want, but please take all of your metrics with a grain of salt.

A very good example of this situation is one of my previous tutorials, where I showed you how to train a semantic segmentation model using YOLOv8. That previous tutorial was not really about object detection but about semantic segmentation, but I think it's a very good example nevertheless. In that tutorial we had a ground truth, a dataset, which had many, many different errors, and we noticed that the detections we got with the model we trained were even better than the data we used to train the model, even better than the ground truth. So this is a very good example of what happens in a situation like this, a situation in which we have many issues in our data, where we have to be super cautious in the way we interpret, read and make sense of the object detection metrics. Now, I'm not going to show you the entire previous tutorial on semantic segmentation using YOLOv8, but let's just remember a few minutes from it, those few minutes where we noticed we had an issue with our data and that the predictions we got with our model were even better than the data we used to train this model. Let's remember these few minutes.
In order to continue with this process, with this validation, we are going to take a look at what happens with our predictions, how this model is performing on some data, on some predictions. For this we are going to take a look at what happens with all of these images. You can see that these are some batches: these are some of our labels, some of our annotations, for all of these images, and then these are some of the predictions for the same images. So we are going to take a look at what happens here. For example, I'm going to show you these results, the first image. Looking at this image (and again, these are not our predictions; this is our data, these are our annotations, these are our labels) you can see that there are many, many missing annotations. For example, in this image we only have one mask, we only have the mask for one of the ducks: we have one, two, three, four, five ducks, but only one of them is annotated. We have a similar behavior here, where only one of the ducks is annotated, and here it's something similar, only one of them is annotated. And the same happens for absolutely every single one of these images, so there are a lot of missing annotations in the data we are currently looking at.

And if I look at the predictions now (these are the same images, but now these are our predictions), we can see that even though we had a lot of missing annotations, the predictions don't really look that bad. For example, in this case we are detecting one, two, three of the five ducks, so we have an even better prediction than the one we have over here. I would say it's not a perfect detection, it's not 100% accurate, but it's very good, and I would say it's definitely better than the data we used to train this model. So that's what happens with the first image, and if I take a look at the other images, I can see a similar behavior: this is the data we used for training this algorithm, and these are the predictions we got for these images, and so on. It's exactly the same behavior, exactly the same situation for this image as well. So my conclusion, by looking at these images, by looking at these predictions, is that the model is not perfect, but I would say it performs very well, especially considering that the data we are using to train this model seems to be not perfect: it seems to have a lot of missing annotations, a lot of missing elements, a lot of missing objects. That's my conclusion by looking at these results.

And that's another reason why I don't recommend you to go crazy analyzing these plots, because when you are analyzing these plots, remember, the only thing you're doing is comparing your data (the data you are using in order to train this model) with your predictions, the predictions you got with the model. So, as the only thing you are doing is a comparison between these two things, then if you have many missing annotations, or many missing objects, or many different errors in your data, in the data you're using to train the algorithm, this comparison is a little meaningless. It doesn't really make a lot of sense, because you're just comparing one thing against the other, and the thing you are comparing with has a lot of errors, a lot of missing objects, and so on, so maybe the comparison doesn't make a lot of sense whatsoever.
That's why I also recommend you not to go crazy when you are analyzing these plots: they are going to give you a lot of information, but you are going to have even more information when you analyze all of these results. And this is a very good example of what happens in real life, when you are training a model in a real project, because remember that building an entire dataset which is 100% clean and absolutely 100% perfect is very, very expensive. So this is a very good example of what happens in real life: usually the data you're using to train the model, to train the algorithm, has a few errors, and sometimes it has many, many errors. So this is a very good example of how this validation process looks with data which is very similar to the data we have in real life, which in most cases is not perfect.

And obviously you are more than welcome to watch the entire tutorial after you complete this course. For now, let's just move to the next video, where I'm going to show you how to train an object detector on your own custom data.

Hey, my name is Felipe and welcome to my channel. In this video we are going to train an object detector using YOLOv8, and I'm going to walk you step by step through the entire process: how to collect the data you need in order to train an object detector, how to annotate the data using a computer vision annotation tool, how to structure the data into the exact format you need in order to use YOLOv8, how to do the training (and I'm going to show you two different ways to do it, from your local environment and also from a Google Colab), and how to test the performance of the model you trained. So this is going to be a super comprehensive, step-by-step guide of everything you need to know in order to train an object detector using YOLOv8 on your own custom dataset. So let's get started.

Let's start with this tutorial, let's start with this process, and the first thing we need to do is to collect data. The data collection is the first step in this process. Remember that if you want to train an object detector, or any type of machine learning model, you definitely need data. The algorithm, the specific algorithm you're going to use (in this case YOLOv8) is very important, but the data is as important as the algorithm: if you don't have data, you cannot train any machine learning model. That's very important. So let me show you the data I am going to use in this process. These are some images I have downloaded and which I'm going to use in order to train this object detector; let me show you a few of them. These are some images of alpacas, an alpaca dataset I have downloaded for today's tutorial, and you can see these are all images containing alpacas in different postures and in different situations. So this is exactly the data I am going to use in this process, but obviously you could use whatever dataset you want. You could use exactly the same dataset I am going to use, or you can just collect the data yourself: you could take your cell phone, or your camera, or whatever, and take the pictures, the photos, the images you are going to use; you can just do your own data collection. Or, something else you could do is to use a publicly available dataset. So let me show you this dataset: this is the Open Images Dataset version 7, a dataset which is publicly available, and you can definitely use it in order to work on today's tutorial,
in order to train the object detector we are going to train in today's tutorial. So let me show you how it looks. If I go to explore and I select detection (I'm going to unselect all these other options), you can see that this is a huge dataset containing many, many categories. I don't know how many, but there are many. This is a huge dataset: it contains millions of images, hundreds of thousands if not millions of annotations, and thousands of categories. And you can see that you have many different categories. Now we are looking at trumpet, and you can see these are different images with trumpets, and for each one of these images we have a bounding box around the trumpet. If I show you another one: we also have beetle, and in this category you can see we have many different images of many different types of beetles, so this is another example. Or if I show you this one, which is bottle, we have many different images containing bottles (there you can see many different types of bottles), and in all cases we have a bounding box around the bottle. And I could show you I don't know how many examples, because there are many, many different categories.

So remember, the first step in this process is the data collection. This is the data I am going to use in this project, which is a dataset of alpacas, and you can use the exact same data I am using if you want to (you can use the same dataset of alpacas), or you can just collect your own dataset by using your cell phone, your camera or something like that, or you can also download the images from a publicly available dataset, for example the Open Images Dataset version 7. If you decide to use the Open Images Dataset version 7, let me show you another category, which is alpaca: this is exactly where I have downloaded all of the images of alpacas from. So in case you decide to use this publicly available dataset, I can provide you with a couple of scripts I have used in order to download all this data, in order to parse through all the different annotations, and to format this data into the exact format we need in order to work on today's tutorial. So in case you decide to use the Open Images Dataset, I am going to give you a couple of scripts which are going to be super useful for you. That's all I can say about the data collection: remember, you need to collect data if you want to train an object detector, and you have all these different ways to do it, all these different options. So now let's move on to the next step.

And now let's continue with the data annotation. You have collected a lot of images, as I have over here, a lot of images which you have collected yourself, or maybe you have downloaded this data from a publicly available dataset, and now it's time to annotate this dataset. Maybe you were lucky enough when you were creating the dataset and the dataset you are using is already annotated; maybe you already have all the bounding boxes for all of your objects, for all your categories. Maybe that's the case, so you don't really need to annotate your data. But in any other case, for example if you are using a custom dataset, a dataset you have collected yourself with your own cell phone, your camera and so on, in that case you definitely need to annotate your data. So in order to make this process more comprehensive, in order to show you
the entire process, let me show you as well how to annotate data. We are going to use this tool, which is CVAT. This is a labeling tool I have used many, many times in many projects; I would say it's one of my favorite tools. I have used pretty much all the object detection and computer vision related annotation tools (maybe I haven't used them all, but I have used many of them), and if you are familiar with annotation tools, you will know that there are many of them and none of them is perfect. I would say all the different annotation tools have their advantages and their disadvantages: for some situations you prefer to use one of them, and for other situations it's better to use another one. CVAT has many advantages and it also has a few disadvantages (I'm not saying it's perfect), but nevertheless this is a tool I have used in many projects and I really, really like it. So let me show you how to use it.

You have to go to cvat.ai and then select 'try for free'. There are different pricing options, but if you are going to work on your own or in a very small team, you can definitely use the free version. I have already logged into my account, but if you don't have an account, you will have to create a new one: you're going to see a sign-up page, and you can just create a new account and then log into it. Once you are logged into this annotation tool, you need to go to projects and then create a new one. I'm going to create a project which is called 'alpaca detector', because this is the project I am going to be working on, and I'm going to add a label, which in my case is going to be only one label, 'alpaca', and then that's pretty much all: submit and open. I have created the project and it has one label, which is alpaca. Remember, if your project has many different labels, add all the labels you need.

Then I will go here, to 'create a new task'. I am going to create a new annotation task, and I'm going to call this task something like 'alpaca detector annotation task 001'. This is from the project 'alpaca detector', and this will take all the labels from that project. Now you need to upload all the images you are going to annotate. In my case, I'm obviously not going to annotate all the images, because you can see these are too many images (these are 452 images) and it doesn't make any sense to annotate them all in this video. So I'm not going to annotate them all, but I'm going to select a few in order to show you exactly how this annotation tool works and how exactly you can use it in your project. Also, in my case, as I have downloaded these images from a publicly available dataset, from the Open Images Dataset version 7, I already have the annotations, I already have all the bounding boxes, so I don't really need to annotate this data; but I'm going to pretend I don't, so I can just label a few images and show you how it works. So now I go back here and I'm just going to select something like this many images... yeah, I'm just going to select this many images. I'm going to open these images and then I'm going to click on 'submit and open', which is going to create this task and, at the same time, open it, so we can start working on our annotation process. Okay, this is the task I have just created. I'm going to click here on the job number,
and this will open all the images, and now I'm going to start annotating them. We are working on an object detection problem, so we are going to annotate bounding boxes. We need to go here and, if we were detecting many different categories, we would select the category we are going to label now. In my case, I'm always going to label the same category, which is alpaca, so I don't really need to do anything here. So I'm going to select 'shape', and let me show you how I do it: I'm going to click on the upper-left corner and then on the bottom-right corner. The idea is to enclose the object, and only the object; the idea is to draw a bounding box around the object, and you only want to enclose this object. You can see that we have other animals in the back, we have other alpacas, so I'm just going to label them too. There is a shortcut, which is pressing the letter N, and you can just create a new bounding box: so that's another one, this is another one, this is another alpaca, and this is the last one. Okay, that's pretty much all. Once you're ready, you can just press Ctrl+S, which is going to save the annotations. I recommend you to press Ctrl+S as often as possible, because it's always a good practice. So now everything is saved, and I can just continue to the next image.

Now we are going to annotate this alpaca, and I'm going to do exactly the same process. I can start here (obviously, you can just start in whatever corner you want) and I'm going to do something like this. Okay, this image is completely annotated; I'm going to continue to the next image. In this case, I am going to annotate this alpaca too. This is not a real alpaca, but I want my object detector to be able to detect this type of object too, so I'm going to annotate it as well.

This is going to be a very good exercise, because if you want to work as a machine learning engineer or as a computer vision engineer, annotating data is something you have to do very often; actually, training machine learning models is something you have to do very often, and usually the data annotation is done by other people, by annotators. There are different services you can hire in order to annotate data, but whatever service you use, it's always a very good practice to annotate some of the images yourself, because if you annotate some of the images yourself, you are going to be more familiar with the data, and you're also going to be more familiar with how to instruct the annotators to annotate this particular data. For example, in this case it's not really challenging, we just have to annotate these two objects, but there will always be situations which are a little confusing. In this case it's not confusing either, I just have to label that object. But, for example, a few images ago, when we were annotating this image: if an annotator is working on this image and the instructions you provide are not clear enough, that person is going to ask you, hey, what do I do here? Should I annotate this image or not? Is this an alpaca or not? So that's one situation. Another situation would be what happened here, where we had many different alpacas in the background and some of them (for example, this one) are a little occluded. So there could be an annotator who asks you: hey, do
you want me to annotate absolutely every single alpaca, or can I just draw a huge bounding box here in the background and say that everything in the background is an alpaca? When an annotator is working on the images, they are going to have many different questions regarding how to annotate the data, and they are all perfectly good questions, because this is exactly what this is about: when you are annotating data, you are defining exactly what are the objects you are going to detect. So what I'm saying is that if you annotate some of the images yourself, you are going to be more familiar with all the different situations and with what exactly is going on with your data, so you are clearer on exactly what are the objects you want to detect.

So let's continue; this is only to show a few examples. This is another situation: in my case, I want to say that both of them are alpacas, so I'm just going to do something like this. But there could be another person who says, no, this is only one annotation, something like this, and just draws one bounding box enclosing both of them. And that would be a criterion which, I guess, would be fine too; but whatever your criterion is, you need one. The point is that while you are annotating some of the images, you are going to further understand what exactly is an alpaca, what exactly is the object you want to consider as an alpaca. So I'm just going to continue. This is another case which may not be clear, but I'm just going to say this is an alpaca: this black one, where we can only see this part and we don't really see the head, I'm going to say it's an alpaca anyway. This one too, this one too, this one too.

Also, something that always happens to me when I am annotating images is that I become more aware of all the diversity of these images. For example, this is a perfect, perfect example, because we have an alpaca which is being reflected in a mirror, and it's only a very small section of the alpaca, only a very small piece of the alpaca's face. So what do we do here? I am going to annotate this one too, because that's my criterion, but another person could say, no, this is not the object I want to detect, this is only the object I want to detect. And maybe another person would say, no, this is not an alpaca, alpacas don't really apply makeup on themselves, this is not real, so I'm not going to annotate this image. You get the idea: there could be many different situations, and the only way you get familiar with all the different types of situations is if you annotate some of the images yourself.

So now let's continue. In my case, I'm going to do something like this, because I would say the most important object is this one, and the other ones... it's not really that important if we detect them or not. Okay, so let's continue. This is very similar to another image. I don't know how many I have selected, but I think we have only a few left. I don't know if this type of animal is natural... I'm very surprised about this one: like, the head has a lot of hair over here, and then the entire body is completely hairless. I don't know, I'm surprised. Maybe they are made like that, or maybe it's a natural alpaca. Who cares, who cares...
Let's continue. Let's see how many we have left, only a few, and let's see if we find any other strange situation where we have to define whether something is an alpaca or not. I can show you an additional example: when you are annotating, you could define your bounding box in many different ways. For example, in this case we could define it like this, or we could define it like this; we could make it fit the object super tightly, something like this, enclosing exactly the object; or we could be a little more relaxed, for example something like this would be okay too, and if we want to do it like this, it would be okay as well. You don't have to be super, super accurate; you can be a little more relaxed and it's going to work anyway. Now, this last one... and that's pretty much all. And this is the last one: okay, I'm going to do something like this, and now I'm going to take this. I think this is also an alpaca, but anyway, I'm just going to annotate this part. So that's pretty much all. I'm going to save, and those are the few images I have selected in order to show you how to use this annotation tool.

So that's pretty much all for the data annotation, and remember, this is also a very important step, a very important task in this process, because if we want to train an object detector, we need data, and we need annotated data. Remember this tool, CVAT, is only one of the many, many available image annotation tools. You can definitely use another one if you want, it's perfectly fine; it's not like you have to use this one at all, you can use whatever annotation tool you want. But this is a tool I think is very easy to use (I like the fact that it's very easy to use), and it's also a web application, so you don't really need to download anything to your computer; you can just go ahead and use it from the web. That's also one of its advantages. So yeah, this is the tool I showed you how to use in this video in order to train this object detector. This is going to be all for this step, and now let's continue with the next part of this process.

Now that we have collected and annotated all of our data, it comes the time to format this data, to structure this data into the format we need in order to train an object detector using YOLOv8. When you're working in machine learning and you're training a machine learning model, every single algorithm you work with is going to have its own requirements on how to input the data. That's going to happen with absolutely every single algorithm you work with: it's going to happen with YOLO, with all the different YOLO versions, and with absolutely every single algorithm you are working with. YOLOv8 specifically needs the data in a very specific format, so I created this step in the process so we can take all the data we have generated, all the images and all the annotations, and convert all of it into the format we need in order to input this data into YOLOv8. So let me show you exactly how we are going to do that. If you have annotated data using CVAT, you have to go to tasks and then you have to select this option, 'export task dataset'. It's going to ask you for the export format: you can export this data into many different formats, and you're going to scroll all the way down and choose YOLO 1.1. You can also save the images, but
in this case it's not really needed; we don't really need the images, we already have them, so you're just going to click OK. Now, if you wait a few seconds, or a few minutes if you have a very large dataset, you are going to download a file like this, and if I open this file, you are going to see all these different files: four entries, so actually three files and a directory. If I open the directory, this is what you are going to see, which is many different file names, and if I go back to the images directory, you will see that all these image file names look pretty much the same: you can see that the structure of these file names looks pretty much the same as the ones we have just downloaded from CVAT. Basically, the way it works is that when you download this data in this format, in the YOLO format, every single annotation file is downloaded with the same name as the image you have annotated, but with a different extension. So if you have an image which is called something.jpg, then the annotation file for that specific image will be something.txt. That's the way it works.

And if I open this file, you are going to see something like this: in this case, only one row, but let me show you another one which contains more than one annotation (I remember there were many), for example this one, which contains two different rows. Each one of these rows is a different object; in my case, as I only have alpacas in this dataset, each one of these rows is a different alpaca. And this is how you can make sense of this information. The first number is the class, the class you are detecting (I wanted to enlarge the entire file and I don't know what I'm doing there... okay). The first number is the class you are detecting; in my case I only have one, so it's only a zero, because it's my only class. Then there are four numbers which define the bounding box. This is encoded in the YOLO format, which means that the first two numbers are the position of the center of the bounding box, then you have the width of your bounding box, and then the height of your bounding box. You will notice these are all float numbers, and this basically means they are relative to the entire size of the image.

So these are the annotations we have downloaded, and this is exactly the format we need in order to train this object detector. Remember, when I was downloading these annotations, we noticed there were many different options: all of those options are different formats in which we could save the annotations, and this is very important, because you definitely need to download YOLO. We are going to work with YOLO, and if you select YOLO, everything is pretty much ready as we need it in order to input into YOLOv8; that's exactly the format you need in order to continue with the next steps. And if you have your data in a different format (maybe you have already collected and annotated your data, and you have it in whatever other format), please remember you will need to convert those annotations into the YOLO format.
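As a small sketch, this is roughly what reading one of those label files looks like in Python; the file name here is the hypothetical something.txt from above:

```python
# Each row of a YOLO-format label file is:
#   class_id x_center y_center width height
# where the last four values are floats relative to the image size.
with open("something.txt") as f:  # hypothetical annotation file
    for line in f:
        class_id, x_center, y_center, width, height = line.split()
        print(int(class_id), float(x_center), float(y_center),
              float(width), float(height))
```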
Now, this is one of the things we need for the data, one of the things we need in order to format, in order to structure the data in the way we need to use it with YOLOv8. But another thing we should do is to create very specific directories containing this data. We are going to need two directories: one of them should be called images, and the other one should be called labels. You definitely need to use these names, you cannot choose whatever name you want: the images should be located in a directory called images, and the labels should be located in a directory called labels. That's the way YOLOv8 works, so you need to create these two directories. Within your images directory is where you are going to have your images. If I click here, you can see that these are all my images; they are all within the train directory, which is within the images directory. This train directory is not absolutely needed: you could perfectly take all your images and just paste them directly into the images directory, and everything would be just fine. But if you want, you could do exactly as I did over here and have an additional directory in between images and your actual images, and you can call it whatever you want. This is a very good strategy in case you want to have, for example, a train directory containing all the training images, and then another directory, which could be called validation, for example, where you are going to have the images you use to validate your training process, your algorithm. And you could do the same with an additional directory, which could be called test, for example. Or you can just use these directories to label the data, to create different versions of your data, which is another thing that is very commonly done. So you could create many directories for many different purposes, and that would be perfectly fine, but you could also just paste all the images here, and that's also perfectly fine.

And you can see that for the labels directory I did exactly the same: we have a directory which is called train, and within this directory we have all these different files. For each one of these files (let me show you like this, it's going to be much better), for each one of these .txt files, we have an image in the images directory which is called exactly the same, exactly the same file name but with a different extension. In this case, this one is called .txt and this one is called .jpg, but you can see that it's exactly the same file name: for example, the first label file is called oa2ea8f and so on, and that's exactly the same name as the first image in the images directory, which is also called oa2ea8f and so on. So basically, for absolutely every image in your images directory, you need to have an annotations file in the labels directory which is called exactly the same, but with a different extension. That's another thing which defines the structure you need for your data. So remember: you need to have two directories, one called images and the other called labels; within the images directory is where you're going to have all your images, and within the labels directory is where you will have all your annotations, all your labels; and for absolutely every single image in your images directory, you will need a file in the labels directory which is called exactly the same, but with a different extension. If your images are .jpg, your annotation files should be .txt.
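Putting that together, the directory structure being described looks something like this; the train subdirectory and the file names are just the ones from this example:

```
images/
└── train/
    ├── oa2ea8f....jpg
    └── ...
labels/
└── train/
    ├── oa2ea8f....txt
    └── ...
```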
If your images are .jpg, your annotation files should be .txt, and the labels should be expressed in the YOLO format: as many rows as objects in that image, and every single row has the same structure. You are going to have five terms. The first one is the class ID; in my case I only have one class, I'm only detecting alpacas, so this number will always be zero, but if you're detecting more than one class you will have different numbers. Then you have the X and Y position of the center of the bounding box, then the width, and then the height, and everything is expressed in relative coordinates. So basically this is the structure you need for your data, and this is what this step is about. That's pretty much all about formatting the data, and now let's move on to the training; this is where we take all this data and train our object detector using yolo V8. So, now that we have the data in the format we need, it's time for the training, the time where we take this custom dataset and train an object detector using yolo V8. This is yolo V8's official repository, and one of the things I like the most about yolo V8 is that in order to train an object detector we can do it either in Python, with only a few Python instructions, or with a command line utility; let me see if I can find it over here... we can execute a command like this in our terminal, and that's pretty much all we need to do in order to train this object detector. That's something I really liked and something I'm definitely going to use in our projects from now on, because I think it's a very convenient and very easy way to train an object detector, or any machine learning model. So this is the first thing we should notice about yolo V8: there are two different ways in which we can train an object detector, either in Python, as we usually do, or by running a command in our terminal, and I'm going to show you both ways so you're familiar with both of them. Also, I mentioned that I'm going to show you the entire process in a local environment, in a Python project, and also in a Google Colab; I know there are people who prefer to work in a local environment, I am one of those people, and I know there are other people who prefer to work on Google Colab, so depending on which group you are in, you can just choose the one you like the most. So let's start; let's go to PyCharm. This is a PyCharm project I created for this training, and this is the file we are going to edit in order to train the object detector. The first thing I'm going to do is copy a few lines and remove everything we don't need: we want to build a new model from scratch, so we are going to keep this sentence, and then we are going to train the model, so we keep that one too. These are the two lines we need in order to train an object detector using yolo V8. Now we are going to make some adjustments; obviously, the first thing we need to do is to import ultralytics, which is the library we need in order to import YOLO and train a yolo V8 model.
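Put together, the Python version comes down to roughly this, mirroring the two lines from the ultralytics README; config.yaml is the configuration file we build next:

    # main.py -- training yolov8 from python
    from ultralytics import YOLO

    # build a new yolov8 nano model from scratch
    model = YOLO('yolov8n.yaml')

    # train it on the custom dataset described by config.yaml
    results = model.train(data='config.yaml', epochs=1)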
Ultralytics is a Python library we need to install; as we usually do, we go to our terminal and run something like pip install ultralytics. In my case nothing is going to happen, because I have already installed it, but please remember to install it, and also please mind that this library has many, many dependencies, so you are going to install a lot of different Python packages; it's going to take a lot of disk space and some time, so definitely be ready for that. But anyway, let's continue. These are the two sentences we need in order to run this training from a Python script. The first sentence we're just going to leave as it is; this is where we load the specific yolo V8 architecture, the specific yolo V8 model we are going to use. You can see that we can choose from any of these different models; these are different sizes of yolo V8: nano, small, medium, large or extra large. We are using the nano version, which is the smallest, lightest one: yolo V8 n. Then, about the training, about this other sentence, we need to edit this parameter: we need a yaml file which is going to contain all the configuration for our training. I have created this file and named it config.yaml; I'm not sure if that's the most appropriate name, but anyway, it's the name I have chosen. So I'm just going to edit this parameter and input config.yaml; the config.yaml is located in the same directory as main.py, so if I do this it's going to work just fine. Now let me show you the structure of this config.yaml. You can see it's a very, very simple configuration file; we only have a few keys, which are path, train, val and names. Let's start with names: this is where you set all your different classes. You are training an object detector, you may be detecting many different categories, many different classes, and this is where you type all of them. In my case I'm only detecting alpacas, so I only have one class; it's the number zero and it's called alpaca. But if you are detecting additional objects, please remember to include the list of all the objects you are detecting. Then, about the other keys: path is the absolute path to your directory containing images and annotations, and please remember to make it an absolute path.
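So the file looks roughly like this; the absolute path is of course specific to my machine, so treat it as a placeholder:

    # config.yaml -- a minimal sketch of the training configuration
    path: /absolute/path/to/data    # root directory with the images and labels directories
    train: images/train             # training images, relative to path
    val: images/train               # validation images (here, the same data, to keep things simple)

    names:
      0: alpaca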
I ran into some issues when I was trying to specify a relative path, relative from my current directory, where this project is created, to the directory where my data is located. When I was using a relative path I had problems, and then I noticed there were other people having issues as well: in the issues section of the yolo V8 GitHub repository, other people reported problems when specifying a relative path. The way I fixed it, and it's a very easy fix, is just to specify an absolute path. So remember: this should be the absolute path to the directory containing the images and labels directories. Then you have to specify the relative path from this location to where your images are located; in my case they are in images/train, relative to this path. If I show you this location, which is my root directory, and I go to images/train, this is where my images are located, so that's exactly what I need to specify, and this is the data the algorithm is going to use as training data. Then we have another keyword, which is val, the validation dataset; in this case we are going to specify the same data as we used for training, and the reason I'm doing this is that I want to keep things simple in this tutorial. I'm just going to show you the entire process of how to train an object detector using yolo V8 on a custom dataset, so I'm just going to use the same data. That's pretty much all for this configuration file. Now, going back to main.py, that's pretty much all we need in order to train an object detector using yolo V8 from Python; that's how simple it is. So now I'm going to execute this file. I'm going to change the number of epochs to only one, because the only thing I want to show you for now is how it is executed; once we see that everything is up and running and working fine, we can continue. You can see that now it's loading the data; it has already loaded the data, and you can make use of all the debugging information we see here: we were loading 452 images and we were able to load all of them, 452 of 452. If I scroll down, you can see additional values related to the training process; this is how the training is going, and this is the information we are given through this process. For now, the only thing we have to do is wait until the process is completed, so I am going to stop the video now and fast forward until the end of this training, and let's see what happens. Okay, so the training is now completed, and you can see we have an output which says results saved to runs/detect/train39. If I go to that directory, runs/detect/train39, you can see we have many different files, and these files are all related to how the training process went.
For example, if I show you these images, these are a few batches of images which were used to train this algorithm; you can see the names are train_batch0 and train_batch1, and I think we have a train_batch2. So we have a lot of different images of a lot of different alpacas which we used for training, and they were all concatenated into these huge images, so we can see exactly which images were used for training with the annotations, the bounding boxes, on top of them. We also have similar images for the validation dataset; remember, in this case we are using the same data for validation as for training, so it's exactly the same data. These were the labels in the validation dataset, and these were the predictions on the same images, and you can see we are not detecting anything; we don't have a single bounding box. This is because we are doing a very shallow, very dummy training: we trained this algorithm for only one epoch. This was only an example to show you what the output looks like; it is not a real training. Nevertheless, these are files I'm going to show you in more detail in the next step. For now, let me show you how the training is done from the command line, from the terminal, using the command I showed you over here. Going to the terminal, we type something like: yolo detect train data=config.yaml model=yolov8n.yaml epochs=1. This is exactly the same as we did in Python and is going to produce exactly the same output; I'm just setting the number of epochs to one so we make it exactly the same, and let's see what happens. You can see we get exactly the same output: we have loaded all the images and we are starting a new training process, and it has already created a new directory, train40, which is where all the information related to this training run is going to be saved. I'm not going to let it finish, because it's going to be exactly the same as the one we did before, but this is exactly how you can use the command line utility to run this training from the terminal, and you can see how simple it is; it's just amazing. And now let me show you how everything is done from a Google Colab. Let's go back to the browser so I can show you this notebook I created in order to train yolo V8 from a Google Colab. If you're not familiar with Google Colab, the way you create a new notebook is by going to Google Drive, clicking New, then More, and selecting the option Google Colaboratory; this creates a new Google Colab notebook, and you can just use that notebook to train this object detector. Now let me show you this notebook; you can see it contains only five cells, that's how simple this will be. The first thing you need to do is upload the data you are going to use to train this detector; it's going to be exactly the same data as we used before, the same images and labels directories.
Then the first thing we need to do is execute this cell, which mounts Google Drive into this instance of Google Colab. The only thing I do is press enter on this cell; this may take some time, but basically all it does is connect to Google Drive so we can access the data we have there. I'm going to select my account and then click Allow, and that's pretty much all. Then it all comes down to where you have uploaded the data in your Google Drive. In my case my data is located in this path: this is my home in Google Drive, and then this is the relative path to the location of the data and all the files related to this project. So remember to set this root directory to the directory where you have uploaded your data; then I'm just going to execute this cell to save this variable. Next I'm going to execute this other cell, which is pip install ultralytics, the same command I ran from the terminal in my local environment; now I'm going to run it in Google Colab. Remember you have to start this command with an exclamation mark, which means you are running a command in the terminal where this notebook is being executed. Everything seems to be okay, everything seems to be ready, and now we can continue to the next cell, which is this one. You can see we have exactly the same structure, exactly the same lines as in our local environment: if I show you this again, you can see we have imported ultralytics, then we have defined this YOLO object, and then we have called model.train, and this is exactly what we are doing here. Obviously we are going to need another yaml file in our Google Drive, and this is the file I have specified; it has exactly the same configuration as the yaml file I showed you in my local environment, exactly the same idea, only now you should specify an absolute path to your Google Drive directory; that's the only difference. And I see I have a very small mistake: the config says data, but I have just uploaded images and labels directly, they are not within another directory called data. So let me fix that: I'm going to create a new directory called data and put images and labels inside it, so everything is consistent. Now everything is okay: images, then train, and then the images are within this directory. Let's go back to the Google Colab. Every time you make an edit or do something on Google Drive, it's always a good idea to restart your runtime, so that's what I'm going to do; I'm going to execute the cells again. I don't really need to pip install the library again, because it's already installed in this environment, and then I'm going to execute this cell; I think I have to make one additional edit, because the config file is now called google_colab_config.yaml, and that's pretty much all. I'm just going to run it for one epoch, so everything is exactly the same as we did in our local environment, and now let's see what happens.
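As a sketch, the notebook cells amount to something like this; the Drive path and the variable name are just examples of my setup, yours will differ:

    # cell 1: connect the notebook to Google Drive
    from google.colab import drive
    drive.mount('/content/gdrive')

    # cell 2: where the data and the config live in Drive (example path)
    ROOT_DIR = '/content/gdrive/MyDrive/alpaca-detector'

    # cell 3: install the library (the leading '!' runs it in the notebook's terminal)
    # !pip install ultralytics

    # cell 4: the same two training lines as in the local environment
    import os
    from ultralytics import YOLO

    model = YOLO('yolov8n.yaml')
    model.train(data=os.path.join(ROOT_DIR, 'google_colab_config.yaml'), epochs=1)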
You can see we are doing exactly the same process; everything looks pretty much the same as before: we are loading the data, we are loading the model, everything is going fine, and this is going to be pretty much the same process as before. You can see that now it takes some additional time to load the data, because you are running this notebook in a given environment and taking the data from your Google Drive, so it's a slower process, but it's definitely the same idea. The only thing we need to do now is wait until the process is completed. I think it doesn't really make sense to wait, because it's going to be exactly the same process we ran in our local environment. At the end of this execution we are going to have all the results in a directory of the environment which is running this process, so at the end please remember to execute this command, which takes the runs directory, the one containing all the runs you have made and all the results you have produced, and copies it into the directory you have chosen for your files and your data in your Google Drive. Please remember to do this, because otherwise you will not be able to access this data, which contains all the results of everything you have just trained.
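The exact cell isn't shown here, but conceptually it does something like this; treat the paths as assumptions:

    # copy the results out of the colab environment into Drive so they survive the session
    import shutil

    shutil.copytree('/content/runs', '/content/gdrive/MyDrive/alpaca-detector/runs')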
So this is how you can train an object detector using yolo V8 in a Google Colab, and you can see the process is very straightforward; it's pretty much exactly the same idea as in our local environment. And that's how easy it is to train an object detector using yolo V8 once you have done everything we did with the data: once you have collected the data, annotated it, and taken everything into the format yolo V8 needs, running this training is super straightforward. So that's all about the training process, and now let's continue with the testing; let's see how the models we have trained perform. This is the last step in this process, where we take the model we produced in the training step and test how it performs. Once we have trained a model, we go to the directory I showed you before, the directory where all the information regarding this training process was saved. Obviously I'm not going to show you the training we just did, because it was a very shallow, very dummy training; instead I'm going to show you the results from another training I did when I was preparing this video, where I conducted exactly the same process but trained for 100 epochs, a much deeper training. Let me show you all the files we have produced, so you know what tools you have in order to test the performance of the model you have trained. Basically, you have a confusion matrix, which gives you information about how the different classes are predicted, or how they are confused with each other; if you are familiar with how a confusion matrix should be read, you will know how to interpret this. In my case I only have one class, which is alpaca, but you can see this generates another category, a default category called background, and we have some information here. It doesn't really say much; given that this is an object detector, I think the most valuable information is in other metrics, in other outputs, so we are not really going to mind this confusion matrix. Then you have some plots, some curves; for example, this is the F1-confidence curve. We are not going to mind this plot either; remember, we are just starting to train an object detector using yolo V8, and the idea is to keep this tutorial very introductory, while extracting everything from these plots takes a lot of knowledge and expertise. Let's do things differently and focus on this plot, which is also available among the results saved in this directory. You can see we have many, many different plots; you could definitely go crazy, you could knock yourself out analyzing all ten of them and extracting every bit of information, but again, the idea is to keep this introductory. So, long story short, I'm going to give you one tip, the one thing you should focus on in these plots for now. If you're going to take one thing from this video about how to test the performance of a model you have trained with yolo V8, it's this: make sure your loss is going down. Some of these plots show the loss function: this one, this one and this one are for the training set, and these are for the validation set. Make sure all of your losses are going down.
That's a very simple way to analyze these plots, but I would say it's more powerful than it appears, because with a loss function we could be in a few different situations. We could have a loss which is going down, which I would say is a very good situation. We could have a loss which started going down and then turned into something like a flat line; if we are in something like a flat line, it means our training process has stalled. That could actually be a good thing, because maybe the machine learning model really has learned everything it had to learn from this data, so a flat line is not necessarily bad; you would have to analyze other things to decide. Or we could have a loss function which is going up, and if you, my friend, have a loss function which is going up, then you have a huge problem; something is obviously not right with your training. That's why I'm saying that looking at what happens with your loss gives you a lot of information: ideally it should go down, and if it's going down then most likely everything is going well; if it's something like a flat line, it could be a good thing or a bad thing, we could be in different situations; but if it's going up, something has gone very, very wrong, I don't know what's going on in your code or in your training process, but something is obviously wrong. So that's a very simple, even naive, way to analyze all this information, but trust me, it gives you a lot to start with when testing the performance of this model. I would say that poring over all the plots and every metric is more about research; that's what people who do research like to do, and I'm more of a freelancer, I don't really do research. So I'm going to show you another way to analyze the performance of the model we have just trained, which from my perspective makes more sense:
it involves seeing how the model performs on real data, on the kind of data you will use for your inferences, and seeing what happens. The first step in this more practical, more visual evaluation is looking at these images. Remember that before, when we looked at these images, we had this one, which showed the labels in the validation set, and this other one, the predictions, which was completely empty. Now you can see the predictions are not empty anymore, and we are detecting the position of our alpacas very accurately. We do have some mistakes: for example, here we are detecting a person as an alpaca, and here too; and we have some misdetections, for example this should be an alpaca and it's not being detected. But you can see the results are pretty much okay. The same over here: we are detecting pretty much everything; we have a misdetection here, and we have an error over here, because we are detecting an alpaca where there is actually nothing. So things are not perfect, but everything seems to be pretty much okay. That's the first way in which we are going to analyze the performance of this model, and it is worth a lot, because it's a very visual way to see how it performs: we are not looking at plots, we are not looking at metrics, we are looking at real examples and seeing how this model performs on real data. Maybe I am biased toward analyzing things like this because I'm a freelancer, and the way it usually works when you are a freelancer is that if you are building a model to deliver a project for a client, and you tell your client "oh yeah, the model was perfect, take a look at all these plots, take a look at all these metrics, everything was just amazing", and then your client tests the model and it doesn't work, the client will not care about all the pretty plots. So that's why I don't really mind these plots that much; maybe I'm biased because that's how freelancing works, but I prefer a more visual evaluation. So that's the first step, and we can already see we are getting an okay performance. But the data we are currently looking at, the validation data, was, remember, pretty much the same data we used for training, so it doesn't really say much. I'm going to show you how the model performs on data the algorithm has never seen, completely and absolutely unseen data, which is a very good practice if you want to test the performance of a model. I have prepared a few videos, so let me show you them.
Remember, this is completely unseen data, and this is the first video: you can see this is an alpaca which is just being an alpaca, walking around, doing its alpaca stuff, living its everyday alpaca life, going from one place to the other. This is one of the videos I have prepared; this is another one, also of an alpaca doing alpaca-related stuff, and I have one more video over here, so I'm going to show you how the model performs on these three videos. I have made a script in Python which loads these videos and just calls the predict method from yolo V8: we load the model we have trained and apply it to these videos to see how it performs. So this is the first video I showed you, and these are the detections we are getting; you can see we are getting an almost perfect detection. Remember, this is completely unseen data; I'm not going to say it's a 100% perfect detection, because it's not, but I would say it's pretty, pretty good as a starting point for this training process. Then let me show you the other video I showed you, and you can see we are also detecting the position of the alpaca; in some cases the text goes outside the frame, because we don't really have space, but everything seems to be okay in this video too. We are getting exactly the position of this alpaca; in some cases the bounding box doesn't fit the alpaca's face very well, but everything seems to be working fine. And then in the third video, you can see the detection is a little broken, we have more misdetections, but overall it's working well here too. Of these three examples, this first one is the one that's performing best.
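By the way, the Python script I mentioned isn't shown line by line, but with ultralytics it boils down to something like this; the weights path and the video file name are assumptions based on the run directory we saw earlier:

    # predict_video.py -- a minimal sketch of running the trained model on a video
    from ultralytics import YOLO

    # weights saved by the training run (path is an example)
    model = YOLO('runs/detect/train39/weights/best.pt')

    # run the detector on an unseen video and save an annotated copy
    results = model.predict(source='alpaca1.mp4', save=True)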
I also really like how it performed in the video where the alpaca was starting its alpaca journey: we have a very good, very stable detection; then it breaks a little, but nevertheless I would say it's okay, and it's also detecting this other alpaca over here, so I would say it's working pretty much okay. So this is pretty much how we do the testing in this phase. Remember: if you want to test the performance of the model you have just trained using yolo V8, you will have a lot of information in the directory which is created at the end of your training process. You will have all of these files, and you can knock yourself out going crazy analyzing all the different plots, or you can keep it simple and just take a look at what happened with the training loss and the validation loss, making sure all the loss functions are going down; that's the very least you need to check. And then just see how the model performs on a few images or videos, take a look at how it does on unseen data, and make decisions from there: maybe you can use the model as it is, or you can decide to train it again. In this case, if I analyze all this information, I see that the loss functions are going down, and not only are they going down, but there seems to be a lot of room to improve this training, to improve the performance, because we haven't reached the moment where everything appears to be stuck, that flat line; we are very far from there. So that's something I would do: I would do a new, deeper training, so we keep learning through this process. I would also change the validation data for something completely different from the training data, so we get even more information. That's pretty much what I would do in order to iterate and make a better, more powerful model. Now let's get started with the next tutorial. This is the detectron2 official repository, and this is exactly the framework we are going to use today. I have used detectron2 many, many times in my projects as a computer vision engineer; I think it's an amazing framework, an amazing algorithm, and in this video I'm going to show you how to train an object detector using detectron2.
Now, the first thing I'm going to do is show you the data we are going to use today. We're going to use the same alpaca dataset we already used in one of my previous tutorials; if you watched my previous video on how to train an object detector using yolo V8, then most likely you are already familiar with this dataset, and this is exactly the data we are going to use in this tutorial too. This is what the images look like. Now, it's very important to note that in my case I already have the annotations for this data; you can see all of these txt files, these are my annotations for all my images. But if you're watching this tutorial, most likely you want to train detectron2 on your own custom data, and most likely you want to build your dataset from scratch and annotate all of your images. The annotation of an object detection dataset is something I have already covered in my previous video, where I showed you how to train an object detector using yolo V8; I think it doesn't really make sense to cover the entire process again here, so if you are curious about how to annotate your custom data, go ahead and watch that other video; I'm going to post a link somewhere in this video. Now let's go to PyCharm, to a Python project I created for this tutorial. These are the requirements for this project; as always, please remember to install them before starting, otherwise nothing is going to work. Now let me show you these three files, called train.py, util.py and loss.py. Let's start with train.py: this is the file we are going to execute in order to run the entire training process, and you can see it all starts with a very, very long docstring explaining how you need to format your data; this is very, very important. In util.py and loss.py we have many different functions and a class definition; we have a lot of code which already handles the entire training process:
parsing the data, everything. So the only thing you need to do in order to make this training process work as expected is to put your data, your file system, into the format specified in this docstring. Let me show you: the annotations should be provided in YOLO format, that is, class xc yc w h, which is the class ID, then the X and Y position of the center of the bounding box, and then the width and the height of the bounding box. Let me show you one of my annotation files so you can see what it looks like: in this case we have five numbers; the first one is a zero, and then we have four float numbers. This is exactly the bounding box annotation in YOLO format: the first number, the zero, is the class ID, which in my case is always 0 because I only have one class in this dataset; then these two numbers are the X and Y position of the center of the bounding box; then this number is the width, and this number is the height. So please remember to format all of your annotations into the YOLO format. Then your file system needs to be structured exactly like this; let me show you on my computer. If I go to data, this is my root directory where my data is located; you can see I have two folders, one called train and the other called val. Within train I have two other folders, one called images and the other called anns: within images is where I have all my training images, and within anns is where I have all the annotations for my training images. And if I go to val, you see exactly the same structure: two folders, images and anns, with all my validation images and their annotations. This is exactly what's described here: we have a data directory, and within it two folders, train and val; within train we have two additional folders, images and anns, and the same for val. And then, for absolutely every single image in the images directory, we have an annotation file in the anns directory with exactly the same name but a different extension. This is very important; please remember to structure your file system exactly like this, otherwise nothing is going to work, because all the functions in this file which handle parsing and reading the data and getting the annotations expect the data exactly like this. If you don't structure everything as described in this docstring, you are going to have issues in this training process.
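In short, the expected layout is something like this (the image file names are just examples):

    data/
        train/
            images/
                image01.jpg
                ...
            anns/
                image01.txt
                ...
        val/
            images/
            anns/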
Now let's continue. If I scroll down, you can see I have this argparser, and these are all the arguments we can define for this training process. We have the data directory, which is obviously very important. Then we also need to define the names of all our classes: if I go to my file system, you can see that in my case I have this file, class.names, and in my case it only contains one class name, which is alpaca; but if you are building something like a multi-class object detector, then most likely you are going to have other classes as well. Back in PyCharm, you can see that another argument is the output directory; this is where all the models and all the results are going to be saved. Then we have different hyperparameters: the learning rate of our training process, the batch size, and the number of iterations. Then the device, that is, whether we want to run this training on a CPU or on a GPU, which is very important. Then this argument is the checkpoint period, which means how often we save the weights of the model we are training: we train for a given number of iterations, and every 500 iterations we save the weights of the model; this is something we will come back to later when we are validating the model we trained. And another hyperparameter which is very, very important is model: this is where we specify the baseline we are going to use for this training process. In my case, this is the baseline I have set; you can see it's the COCO-Detection retinanet R 101.
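Taken together, an argparser like the one described might look roughly like this; the flag names and default values are illustrative, not the author's exact code:

    # a rough sketch of the arguments described above (names and defaults are illustrative)
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--data-dir', default='./data')
    parser.add_argument('--class-list', default='./class.names')
    parser.add_argument('--output-dir', default='./output')
    parser.add_argument('--learning-rate', type=float, default=0.00025)
    parser.add_argument('--batch-size', type=int, default=4)
    parser.add_argument('--iterations', type=int, default=10000)
    parser.add_argument('--checkpoint-period', type=int, default=500)
    parser.add_argument('--device', default='cpu')
    parser.add_argument('--model', default='COCO-Detection/retinanet_R_101_FPN_3x.yaml')
    args = parser.parse_args()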
Now let me show you something very important, which is where this model comes from. In my browser, this is the detectron2 model zoo and baselines page, and this matters a lot when you are working with detectron2, because you have many, many models to choose from. This model zoo is basically a very large collection of all the baselines, all the pre-trained models, you can choose from when working with detectron2. If I scroll down, you can see many different sections: here is a section with COCO object detection baselines, then one for instance segmentation, another for keypoint detection, then panoptic segmentation, and so on; many sections for all the different types of algorithms. You can see we have many, many models, many architectures, many baselines to choose from, and basically the idea is that when we are training our own custom model, we can take whatever pre-trained model, whatever baseline, we want, and just train our own model on top. This is very important, because we have all the metrics, the performance of each of these models, and we also have their inference times, so we can choose the model we like the most for the specific project we are working on. In my case, I have selected the retinanet R 101, so this is the model we are going to be using in this tutorial; but in your case, go ahead and choose whatever model you want, because the entire process I'm going to show you works exactly the same for whatever other model you choose from here. So this is the detectron2 model zoo; please take a look at all the models that are available, it's a very, very large collection and it's just amazing. Now let's go back to PyCharm, and you can see this is where you specify the architecture, the model you are going to use in your training process. Let's continue: you can see that after parsing all of these arguments, the only thing I'm doing is calling util.train; I am calling the train function defined in my util.py file from a very, very high level, putting all these arguments as input, and that's it; this function takes care of the entire training process. If you have watched my previous tutorials on yolo V8 (the image classifier, the object detector, the instance segmentation model, the keypoint detector... all of my models), you will remember that the training process is super simple, super straightforward; the only thing we need to do with yolo V8 is write a couple of lines, and that's it. So for this video, for detectron2, I wanted to give you something with the same level of complexity, the same level of abstraction: something super high level which you can just go ahead and use without really caring about all the details of what's working under the hood. That's why I made this train.py file like this: you set up all of your arguments, then you call train, and you can just forget about all the complexity of using detectron2. That's something I wanted to do for you, because it makes things much, much simpler for training your model on your custom data. And I don't know about you, but in my case, detectron2, yolo V8, or whatever other machine learning framework or algorithm you can think of, for me they are only tools I use to solve my problems; so being able to train an object detector with detectron2 by just calling a function like this, from a very high level, is amazing. If you're anything like me, then you're going to be super happy with this function, and if that's the case, just jump to the next chapter, where I show you how to continue this training process and how to run this training from a Google Colab. But if you do care about the details, about how everything works under the hood, if you want to know exactly how these functions work and exactly how the data is parsed, then just continue watching and I'm going to give you more details.
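To recap the high-level entry point before we dive in: the tail of train.py does something like this; the exact parameter names of util.train are the author's, so take these keywords as illustrative:

    # train.py (tail) -- hand everything to the high-level train function in util.py
    import util

    util.train(
        output_dir=args.output_dir,
        data_dir=args.data_dir,
        class_list_file=args.class_list,
        learning_rate=args.learning_rate,
        batch_size=args.batch_size,
        iterations=args.iterations,
        checkpoint_period=args.checkpoint_period,
        device=args.device,
        model=args.model,
    )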
And now let's move to util.py. You can see we have four different functions, and these are the functions that take care of the entire training process. Let me start with train, the function we are calling over here in order to start the training; this is a very good place to start with this util.py file. You can see we have many different parameters, many different arguments into this function, and for each one of them we have a very short description of what it means. This is important: please take a look at this docstring when you are reviewing this function, because it's going to help you a lot to understand what each parameter and argument means. Now let's continue. On the first line we are calling another function in this util.py file, register_datasets; because of the way detectron2 works, we always need to 'register' the datasets before starting the training process. Let me show you this function: it takes two parameters as input, a root directory and the class list file, and basically the only thing we're doing in it is calling DatasetCatalog.register. Just remember that in this function we are registering all of our data, all of our annotations, into detectron2, and this is a very important step when working with detectron2. You can see we register the training set under the keyword train and the validation set under the keyword val; this is very important, because we are going to refer to these two words (train and val) later on, so please remember it. Then the second argument is this lambda function we have over here, and you can see that it calls another function in this util.py file, get_dicts, with two arguments, which are basically the location of the images and the location of the annotations; we iterate over the training set and the validation set, and for each iteration we register that set. Now let me show you this other function, get_dicts; the documentation of this function is very good, and it says: read the annotations for the dataset in YOLO format and create a list of dictionaries containing information for each image. The arguments are a directory containing images and another directory containing annotations, and the return value is a list of dictionaries with all this information: the file name of every single image, a unique identifier for every image, then the height and the width of the image, and then the annotations: the bounding box and the category ID, the class ID.
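Each entry in that list follows detectron2's standard dataset-dict format, so one element looks roughly like this; the values here are made up for illustration:

    # one element of the list returned by get_dicts (illustrative values)
    from detectron2.structures import BoxMode

    record = {
        'file_name': 'data/train/images/image01.jpg',
        'image_id': 0,
        'height': 720,
        'width': 1280,
        'annotations': [
            {
                'bbox': [425.0, 310.0, 180.0, 145.0],  # x, y of the top-left corner, width, height
                'bbox_mode': BoxMode.XYWH_ABS,         # absolute pixel coordinates
                'category_id': 0,                      # 'alpaca'
            },
        ],
    }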
And if I show you the code, it's very straightforward: the only thing we're doing is iterating over all the files in the annotations directory, and for each one of these files we open the image that belongs to that annotation, take the height and the width of the image, and create this dictionary with all the information: the image file name, the ID, and the height and width. Then we parse all the annotations, getting all the bounding boxes and the class IDs. So basically we are parsing through all of our data and collecting all the information about our images and our annotations. Something that's very important is that, if you remember, our annotations are specified in the YOLO format, which is the class ID, then the X and Y position of the center of the bounding box, then the width and then the height; and here we convert the annotation into another format, which is x y w h in absolute coordinates. This may be confusing, but just remember that we take the annotation, which is in the YOLO format, and convert it into this other format, where x y is the upper-left corner and then come the width and the height of the bounding box. That's basically what we are doing here: converting the annotation from one format into another, then getting the class ID, and that's basically all; at the end of this function we return the list of dictionaries with all this information. So just go through this function and it's going to be super straightforward, and you have this comprehensive docstring telling you exactly how everything works, all the input parameters and the output. That is all for get_dicts.
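Putting register_datasets together, the detectron2 calls underneath look roughly like this; it's a sketch, and the real function in util.py reads the class names from the class list file rather than hard-coding them:

    # the core of dataset registration in detectron2 (sketch)
    from detectron2.data import DatasetCatalog, MetadataCatalog

    for split in ['train', 'val']:
        img_dir = f'data/{split}/images'
        ann_dir = f'data/{split}/anns'
        # detectron2 calls this lambda lazily whenever it needs the dataset;
        # the default arguments bind the loop variables to this iteration
        DatasetCatalog.register(split, lambda i=img_dir, a=ann_dir: get_dicts(i, a))
        # tell detectron2 the human-readable class names, which comes next
        MetadataCatalog.get(split).set(thing_classes=['alpaca'])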
After we register the datasets, the training dataset and the validation dataset, we need to tell detectron2 exactly what the class names are, because so far we have only parsed the class IDs: the annotations only contain the class ID, they don't really have the class name. So it's very important that we tell detectron2 the class names, and that's why we call this function after we register the datasets. That's pretty much all for register_datasets, and now let's continue with the next line, which is get_cfg; this is where we create the entire configuration we are going to use in this training process. The first line is get_cfg, which is a detectron2 built-in function; basically we get something like a default configuration, with many, many default hyperparameters: a very long, very comprehensive default configuration. Then, on the next line, we update this configuration with other values which are specific to the model we are using; in my case that's retinanet R 101. So after getting the default configuration, we update it with the values specific to this model, and then the only thing left is to manually set other values in this config: you can see that each of the following lines is cfg dot a given key, and the value for that key. In this case we are updating the training set, the validation set, the test set, and so on; each of these configuration values is very self-explanatory. For example, here we are telling detectron2 to use the CPU, here we are setting the weights of this model, which are the pre-trained weights we have over here, and then we set the batch size, the checkpoint period (how often we save the checkpoints), the learning rate, and so on. And I would say this is the most important part of this function, by far, because this is where we tell detectron2 where the training data is and where the validation data is: remember we registered the datasets, and we called one of them train and the other one val; this is where we tell detectron2 that the training data is the dataset we registered under the keyword train, and the validation data is the dataset we registered under the keyword val. This is very, very important, and I would say it's the most important part of this function. And that's basically all for get_cfg.
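As a sketch, the configuration part amounts to something like this; the retinanet config path is the standard model zoo name for the baseline mentioned, and the numeric values are placeholders standing in for the script's arguments:

    # building the training configuration (sketch)
    from detectron2.config import get_cfg
    from detectron2 import model_zoo

    cfg = get_cfg()  # the comprehensive default configuration

    # update it with the values specific to the chosen baseline
    cfg.merge_from_file(model_zoo.get_config_file('COCO-Detection/retinanet_R_101_FPN_3x.yaml'))

    # point detectron2 at the datasets we registered
    cfg.DATASETS.TRAIN = ('train',)
    cfg.DATASETS.TEST = ('val',)

    cfg.MODEL.DEVICE = 'cpu'  # or 'cuda' on a GPU
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url('COCO-Detection/retinanet_R_101_FPN_3x.yaml')
    cfg.MODEL.RETINANET.NUM_CLASSES = 1   # just 'alpaca'

    cfg.SOLVER.IMS_PER_BATCH = 4          # batch size (placeholder)
    cfg.SOLVER.BASE_LR = 0.00025          # learning rate (placeholder)
    cfg.SOLVER.CHECKPOINT_PERIOD = 500    # save the weights every 500 iterations
    cfg.SOLVER.MAX_ITER = 6000            # number of iterations (placeholder)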
Now let's continue: you can see the next thing we do is create the output directory, and then we create this object, the trainer, the one that's going to take care of the training process. Then this line, and actually these three lines we have over here, are also very important, because when we train a model using detectron2, during the training process we get a lot of information about the loss function on the training set, but we do not get any information about the loss function on the validation set; that's the way detectron2 works by default. So if we want to add this information, if we want access to the validation loss, this is exactly what we need to do, and this is why I created the class we have over here, ValidationLoss, which is the class defined in loss.py. Long story short, these three lines create this custom output, this custom debugging information about the training process, so we have more insight into how the training is going; this will be super useful once we are validating the model. Then this line is resume_or_load, and this determines whether we resume a previous training or train from scratch; in my case I'm training from scratch, so resume equals false. And then the only thing I do is call trainer.train, and this is pretty much all it takes to start this training process.
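The last few lines of the train function then look roughly like this; DefaultTrainer is detectron2's standard trainer, while ValidationLoss is the custom hook from loss.py, so its constructor here is an assumption:

    # kick off the training (sketch)
    import os
    from detectron2.engine import DefaultTrainer

    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

    trainer = DefaultTrainer(cfg)

    # register the custom hook so the validation loss is also reported
    val_loss = ValidationLoss(cfg)    # class defined in loss.py (signature assumed)
    trainer.register_hooks([val_loss])

    # train from scratch (resume=True would pick up the last checkpoint instead)
    trainer.resume_or_load(resume=False)
    trainer.train()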
Now let's go back to train.py, because now that we have all the code we need for this training, the only thing left is to press play, and that's it. You can see I press play and I get a huge output; I'm just going to stop the training to show you something: this is pretty much all the hyperparameters of the network we are using to train this model, in my case RetinaNet R101. From here, the only thing we would need to do is wait until the training is completed. But I'm not going to train it locally, because that would take a lot of time; instead, I'm going to show you how to run this training from a Google Colab, because that's going to make the process much simpler and much faster than doing it on my local computer. The first thing you need to do is upload your data; this is very important, because otherwise you obviously won't be able to train this model from Google Colab. In my case you can see this is my data, the same data I showed you on my local computer: my train and my val directories. You also need to upload these files: train.py, util.py, loss.py, and also the class.names file. Remember to upload all of these, otherwise nothing is going to work. Now let's move to this Google Colab, this Jupyter notebook, and I'm going to show you exactly how to train this model from here. It's a very straightforward process: the only thing you need to do is execute each one of these cells. The first step is to connect your Google Colab with Google Drive, so you execute this cell and wait a couple of minutes; I click on 'Connect to Google Drive', select my account, scroll all the way down and click 'Allow', and that's pretty much all; after a few seconds the Colab is connected to Google Drive. Okay, everything is completed, so I continue with the next cell, which installs the requirements by running all these pip installs; the only thing you need to do is wait until everything is done... okay, that took a few minutes, but now it's completed, and you can see I have an output over here: 'You must restart the runtime in order to use newly installed versions', and if I scroll up I got a similar message, so remember to restart the runtime if you see something like this. Google Drive is now mounted and the requirements are installed, so let's continue. Next you need to change the working directory of this notebook, so you need to execute this cell, but it's very important that you update this path to wherever you uploaded your data and your files. In my case it's /content/gdrive/MyDrive/computer vision engineer/TrainDetectron2ObjectDetector; if I show you my Google Drive, you can see the path matches. So please remember to update this path to the location of your data and your files in your Google Drive; everything should be in the same directory. Once you have edited this location, you just press Ctrl+Enter, and that's all it takes to change the working directory. Then the only thing left is to execute this cell, which runs the train.py file with these arguments: I'm setting the device to GPU, which is very important because that's pretty much the reason we're using a Google Colab in the first place; I'm also setting the learning rate, and I'm going to train for 6000 iterations. I would say these last two arguments are not strictly needed, you could just use the default values, but in my case, for my data and my problem, I noticed these values worked better: this learning rate, and a shorter training of only 6000 iterations, were just fine. So now I press Ctrl+Enter to execute this cell, and that's going to be pretty much all it takes to run this training.
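The Colab side of this is only a handful of cells; a minimal sketch is below. The Drive path is an example (edit it to match your own Drive), and the argument names of train.py and the learning-rate value are assumptions for illustration, so use whatever the notebook and your data actually call for.

```python
# Cell 1: mount Google Drive so the notebook can see your data and scripts.
from google.colab import drive
drive.mount('/content/gdrive')

# Cell 2: install the requirements (use the notebook's own pip installs;
# restart the runtime if Colab asks you to).
!pip install 'git+https://github.com/facebookresearch/detectron2.git'

# Cell 3: change the working directory to wherever you uploaded everything.
%cd /content/gdrive/MyDrive/computer vision engineer/TrainDetectron2ObjectDetector

# Cell 4: launch the training on the GPU.
# Flag names and the learning-rate value are assumptions for illustration.
!python train.py --device gpu --learning-rate 0.00025 --iterations 6000
```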
You can see how simple this is once everything is inside this train.py file: once I created these functions and put everything in util.py, we can execute everything from a super high level just by calling train.py. And please let me know what you think in the comments below, but I think it's just amazing that we can train Detectron2 from such a high level: the only thing we're doing is calling train.py and passing the arguments exactly like this, without caring about the details or the complexity. So this is pretty much all for the training itself; the only thing left is to wait until everything is completed, and that's going to take some time. How long depends on your data, your annotations, your specific problem; in my case, for my data, the entire training process took something like two hours. We're not going to wait for it here, because I already trained this model when I was preparing this tutorial, so let me just show you what the output looks like. Once you have trained your model, you're going to have a directory called output, exactly like the one I have over here, and this is where all the results of your training process live. You're going to see all of these checkpoints, which are the weights of your model at different steps of the training process, and this is where we see the effect of the checkpoint_period argument: we set the checkpoints to be saved every 500 steps, and if you look at the filenames you can see the numbers are 499, then 999, then 1499 and so on, so all of these files are 500 steps apart. That's what it means to save the checkpoints every 500 steps: during your training process you save the weights like this, so at the end you have many weights files, exactly like I have over here. So these are my weights.
But what I'm going to do now is take this one file, metrics.json, which has all the information of our training process on the training set and on the validation set; this is the file we are going to inspect, the one we are going to analyze to validate the model, so I'm just going to download it. Now let's go to PyCharm, because I want to show you this file, plot_loss.py. I have already downloaded metrics.json and it's in my directory: let me show you, today's tutorial, detectron2 code, and this is the metrics.json file I just downloaded. Now, in plot_loss.py, what we are doing is parsing through this file, through all the information in it. Let me open the file so you can see what the information looks like: it looks pretty crazy, we have a lot of values, a lot of information, and we need a way to parse it and visualize it very quickly. That's why I created plot_loss.py: it's going to help us pull exactly the information we want out of this file and plot everything into a very nice-looking plot, so we can do this validation much more quickly. Let me show you how it looks: I'm just going to press play (I'll tell you in a few minutes why I have commented these two lines). You can see this is the training loss and the validation loss: the blue values are the training loss and the orange values are the validation loss. But obviously this is something we cannot analyze as-is, because it's a lot of very noisy information, so we're going to do something now which will make everything much prettier: we're going to compute a moving average of each one of these curves, applying a function which smooths these values. I already wrote all the code we need, including this moving_average function, so the only thing I need to do is delete these comments, and now we are going to plot the loss
the same loss   values we are getting from this metrics.json file  and then we're also plotting the moving averages   right we are plotting the same functions but  the averages and you can see that this is how   the averages look like right this is something  that's much much prettier, and in order to show you   much better I'm just going to remove these two  plots and we are only going to plot the moving   averages right this is much prettier this is much  much better so now we have in blue the training   loss I'm going to adjust the labels okay now you  can see that everything looks better we have this   values over here in blue we are plotting the  training loss and in orange we are plotting   the validation loss and we can see that both of  these functions are going down and that's a very   good sign but in the case of the training loss  it seems we have reached a plateau over here so   the training process goes super smoothly until  it reached something like 5000 steps and in the   case of the validation loss it seems we also  reach a plateau but much much sooner right so   this is basically where we are going to validate  this model and this is also where we are going to   decide which one of our checkpoints we are going  to choose from this model right because we have   many many many weights we have many checkpoints  and we can just use any of these files in order   to produce our inferences in order to produce  our predictions so this is where we're going   to Define exactly which one of these checkpoints  we are going to use and I would say I would... I like   how everything is going until this point over  here because you can see that the training loss   is going down and the validation loss is kind of  going down as well and this is pretty much where   everything is like a plateau right so if you ask  me I would keep this checkpoint over here in the   3000 epochs so in the... 
This is where you're going to draw your conclusions and make decisions about what to do next. Obviously, another conclusion could be to do the training again; it all depends on what's going on with your data. Ideally, your training loss and your validation loss should be closer together; in this case they are quite far apart, and that's something I don't really like, because it could mean the model is overfitting the training data and won't perform well on unseen data. But never mind: I think that if we take the model at the 3000 steps, everything is going to be just fine, so let's take that checkpoint and see what happens. I'm going back to PyCharm, because now it's time to make our predictions: to take the model we trained, the checkpoint we chose, and make some predictions with it. Let me show you how: I'm going to this file, predict.py, which is the file we are going to use to make our predictions, and you can see that everything is already coded; everything is ready, so I'm just going to explain every single line of this file so you understand exactly how it works. The first few lines are imports: a few functions from Detectron2 which we need in order to make these predictions, and I'm also importing cv2. Then the first line is getting a configuration file: absolutely every single time we use Detectron2 we need a configuration object which contains all the configuration for the specific task. Here we are getting the default configuration with a lot of default values, and then we are updating it with values specific to the model we used: as I trained from this pre-trained baseline, I have to use exactly the same one here. Then, and this is very, very important, I am setting MODEL.WEIGHTS to the location, the path, of the checkpoint we are going to use. If I show you my Google Drive, these are all the checkpoints we generated during training, and as I'm going to use the one from the 3000 steps, that's the one I already downloaded; it's in the directory of my Python project, and the file is model_0002999.pth. So this is exactly the file we are going to use and this is exactly its location. Then, as I'm going to run these predictions on my local CPU, I set the device to CPU, and then I create this predictor object, which is the predictor we are going to use to make our predictions. Then I load an image, which is very important, because we definitely need an image to make predictions on.
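Consolidated, the setup portion of predict.py looks roughly like the sketch below. The config path and checkpoint filename follow what we just walked through; the number of classes and the image path are assumptions for illustration.

```python
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2 import model_zoo

cfg = get_cfg()
# Must be the same baseline we trained from (RetinaNet R101 in this case).
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Detection/retinanet_R_101_FPN_3x.yaml"))

# The checkpoint we chose after inspecting the loss curves (3000 steps).
cfg.MODEL.WEIGHTS = "model_0002999.pth"
cfg.MODEL.RETINANET.NUM_CLASSES = 1  # assumption: one class, 'alpaca'
cfg.MODEL.DEVICE = "cpu"             # run inference on the local CPU

predictor = DefaultPredictor(cfg)

im = cv2.imread("./data/val/imgs/some_image.jpg")  # placeholder path
outputs = predictor(im)
```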
And this is the image I'm loading: let me show you, it's an image of an alpaca, and ideally we should get the location, the bounding box, of this alpaca; let's see what happens, but that's what we should get. This is the path to the image: you can see it's data/val/imgs and then the file name; if I search, you can see this is exactly the image we are going to use. Now let's go back, and then the only thing we need to do is call the predictor with the image we want to predict on, and we get all the outputs, all the results. But let's stop for a minute and let me show you exactly what the output looks like: I'm just going to print outputs and comment everything else (this is maybe the only coding I'm doing in this tutorial), and press play. You can see this is the output we got: these are the predictions for this image. We have several different fields; one of them is pred_boxes, and these are all the bounding boxes of all the objects we are detecting: this is the first one, and then all the other objects, something like 8 different objects in this image. This is very important: these are all the bounding boxes, given as the X and Y coordinates of the top-left corner followed by the X and Y coordinates of the bottom-right corner. You can see we are also getting the scores, the confidence values of each one of these bounding boxes: for example, the first one is 88.6 percent and the last one is 5 percent.
So these are all the different confidence values, and then we also get the classes we are predicting. In my case I'm only using one class, alpaca, and it's encoded as the number zero, but this is where you would see the class IDs of all the objects you are detecting. And please mind that, although my dataset contains only one class ID (because I'm only detecting alpacas), you may notice that some of these objects were detected with a different class ID: I have a 39, a 47, a 56. I have used Detectron2 many times in many projects, and this is an issue I have run into repeatedly: in this case I should be getting only zeros, because I only have one class in my dataset, but I'm also getting other random numbers. So please take a look at the numbers you are getting here and make sure everything makes sense: make sure you are only detecting the class IDs you should be detecting, and if you are getting random numbers like I am right now, just don't use those predictions; do something like an if, and if the class ID you get is not within your classes, skip that prediction. Also, in my case you can see that the random values correspond to detections with very low confidence: for example this one, the fourth one, has something like 8 percent confidence, and then this one 5.8 percent and this one 5.3 percent, so I'd guess this is most likely an issue with objects detected at very low confidence, but you never know, so please make sure the numbers you are getting make sense. Now let's continue: I showed you the output you get from Detectron2, so I'm just going to uncomment everything and keep explaining this file. You can see the next line says threshold equals 0.5, and this is the detection threshold we are defining: we are only going to consider valid the detections with a confidence value higher than 0.5. Now let's continue; you can see that next
we are parsing through the outputs. We take three objects from them: the pred boxes, the scores and the pred classes; the only thing I'm doing is unpacking this information into pred_classes, scores and bounding boxes, and then I iterate over all the bounding boxes. For each one of these boxes I get the confidence score of that specific detection and the class ID, and then, if the confidence value is greater than the threshold, I take the four values, the X and Y position of the top-left corner and of the bottom-right corner, and I draw a rectangle on top of my image. I'm not checking here that I'm getting only zeros, but that's a very good homework for you: I invite you to edit this file and add something like 'if the confidence score is greater than the detection threshold AND the predicted class is within the class IDs of my class.names file', so the bounding box is only drawn if the prediction is a valid number, a number which makes sense. That's a very, very good homework for you. So I'm just going to continue: I'm drawing the bounding box, and then the only thing left is plotting the image, so let's see what happens. I'm going to press play and let's see if we detect this alpaca properly or not... amazing, we are detecting the alpaca perfectly: remember the image we are using, we are detecting the only alpaca in it, and the bounding box is enclosing the alpaca properly, so everything is working just fine.
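For reference, the parsing-and-drawing loop, with the homework's class-ID sanity check added, could look like the sketch below (continuing from the setup sketch above, so im and outputs come from there). Reading class.names as one class name per line is an assumption about that file's format.

```python
# Valid class IDs are 0..N-1, one per line of class.names (assumed format).
with open('class.names') as f:
    class_names = [line.strip() for line in f if line.strip()]
valid_class_ids = set(range(len(class_names)))

threshold = 0.5

# Move the prediction tensors to the CPU before iterating over them.
instances = outputs['instances'].to('cpu')
bboxes = instances.pred_boxes.tensor.numpy()
scores = instances.scores.numpy()
pred_classes = instances.pred_classes.numpy()

for bbox, score, class_id in zip(bboxes, scores, pred_classes):
    # Homework: keep only confident detections whose class ID makes sense.
    if score > threshold and int(class_id) in valid_class_ids:
        x1, y1, x2, y2 = map(int, bbox)
        cv2.rectangle(im, (x1, y1), (x2, y2), (0, 255, 0), 3)

cv2.imshow('prediction', im)
cv2.waitKey(0)
```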
So this is going to be all for this tutorial: this is exactly how you can train your object detector on your own custom data using Detectron2, and this is going to be all for today. Hey, my name is Felipe and welcome to my channel. In this video I'm going to show you how to use Amazon Rekognition as an object detector. Amazon Rekognition is a very interesting and very powerful tool which I have used many times in my projects as a computer vision engineer. Now let me show you super quickly all the different categories, all the different objects, you can detect using AWS Rekognition: you can see this is a very long and very comprehensive list of objects. For example, you can detect dinosaurs, you can also detect diamonds, you can detect driving licenses, e-scooters and so on; if I scroll down you can see there are many, many categories, and in total we have something like 290 different objects, so that's definitely a lot. This is a very interesting tool, because there are many cases, many situations, many projects, in which you need to detect a very specific type of object, and in some cases it may not make a lot of sense to train an entire object detector only to detect that very specific object; it may be more convenient, easier and much quicker to just use something like Amazon Rekognition out of the box and detect any of the objects in this list. For example, if we were working on a project where we needed to detect wheels, we could either train an object detector from scratch to detect wheels, or we could just use Amazon Rekognition out of the box. So this is a very interesting tool, I have used it many times in my projects, and this is exactly what we will be doing today: in this video I'm going to show you how to use Amazon Rekognition to detect zebras. This is a random category, a random object I chose from the list, so this is the object we will use to show you how Amazon Rekognition works. Now let me show you super quickly the video we are going to use as an example: you can see it's a video in which we have many, many zebras, and we are going to use it to detect all the zebras and to show you how to use Amazon Rekognition.
What we're going to do now is go to PyCharm, and I'm going to show you the entire process: how to create a project, how to create all the files we need, how to install the requirements; absolutely every single step, because we're going to build this project from scratch. So, first: I already have PyCharm open, I go to File, New Project, and I'm going to create the project in this folder over here, where it says tutorial AWS reko; this is where you choose the exact directory in which you want to create your project. Then I'm going to create a new environment, this is where my environment will be located, and I'm going to create it using Python 3.8. I click Create, choose this window to open the project, and you can see this is a completely, fully, absolutely empty project: the only thing we have is the virtual environment, which is called venv, and that is it. Now the first thing I'm going to do is install the requirements, the Python libraries we are going to use today. I go to Settings, then Project, then Python Interpreter, I click this plus button over here, and then I type 'opencv-python', which is one of the libraries we're going to use, and click Install Package; then we're also going to use boto3, and that's pretty much all: these are the two libraries we need in this project. Then I go back to PyCharm and create the first file we are going to use: New, Python File, and I'm going to call it main.py. The first thing I'm going to do for now is just write out the entire pipeline, the entire process we will be building today. The first step will be to create an AWS Rekognition client; then we're going to set the target class we are going to be detecting (I already mentioned we're going to detect zebras in this tutorial, so this is where we specify exactly what object, what category, we are detecting); then we're going to load the video; then we're going to read frames from the video; the next step is to convert each frame to jpg, which is a very important step; from this conversion we get a buffer, and we're going to convert this buffer to bytes; then the only thing left is to use Amazon Rekognition to detect objects; and finally we're going to write all the detections to our file system, to our disk, to our local computer. This is exactly the process we are going to work through today.
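Written out as comments in main.py, that skeleton might look like this (just the outline for now; we'll fill in each step as we go):

```python
# main.py -- pipeline outline, filled in step by step below

# 1. create an AWS Rekognition client
# 2. set the target class (zebras in this tutorial)
# 3. load the video
# 4. read frames from the video
# 5. convert each frame to jpg
# 6. convert the jpg buffer to bytes
# 7. detect objects with Amazon Rekognition
# 8. write the detections to our file system
```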
Now let me show you something else: I'm going to create another file, called credentials.py, because for the first step in this process, creating the AWS Rekognition client, we are going to need a couple of keys: an access key, which I'm just going to set to None for now, and a secret key, which I'm also going to set to None for now. We need these two keys in order to continue with this project, because we need them to create a client, an AWS Rekognition client. Now let's go back to my browser and let me show you exactly how to create these two keys. Let's go to my AWS Management Console; but first, obviously, you need an AWS account in order to continue, this is very important, and you also need to log in to your account. Once you have created an account and you are logged in, you are going to see something like this: this is your AWS Management Console, and these are all the services available in AWS, which are a lot. But in today's tutorial we are only going to use one service, IAM, so we type IAM over here and select this option. Then, in your IAM console, you need to select Users, because we are going to create a new user: select Add users, and choose a name for this user; I'm going to use something like 'AWS reko tutorial', this is the user I'm going to create. Then you need to select 'Attach policies directly', search for 'rekognition', and select AmazonRekognitionFullAccess; I click here, then Next, and that's pretty much all, so I just click Create user. The user is now created, so I select the user over here, AWS reko tutorial, and then I go to Security credentials, because this is where we create the two keys we need in our project. We scroll down to this section over here, Access keys, and click Create access key. You can see we have all these different options, and if I'm not mistaken, it's pretty much the same however you create this access key: pretty much all these options are going to create exactly the same keys,
and you can just use them from your project, if I'm not mistaken. But we are going to use this option over here, 'Local code', because this is the description which fits our project best: 'You plan to use this access key to enable application code in a local development environment to access your AWS account.' If I'm not mistaken it's pretty much the same if we use any other option, but let's just pick the one that fits our use case best. Now you can see we have a warning over here: 'Alternative recommended: use an integrated development environment (IDE), which supports the AWS Toolkit, enabling authentication through IAM Identity Center.' This is important, because AWS is telling us there is a better, more secure way to create these keys and access this service. But in this tutorial we are not going to follow this recommendation, because it would involve a solution that's only useful for one very specific IDE: in my case I'm using PyCharm, and if I followed those instructions I would end up with a setup which is only useful for PyCharm. I want to make this tutorial as generic as possible, so it works whatever your IDE is, PyCharm, Visual Studio or anything else. So the only thing I'm going to do is select this checkbox over here, 'I understand the above recommendation and I want to proceed to create an access key', and click Next; I'm not going to type anything here, just Create access key, and these are our access keys. Something very, very important: access keys are personal and you should never disclose them to anyone in any situation. You should never do something like I'm doing right now, making a video with my access keys completely visible to anyone watching this tutorial; in my case it's not really that important, because I'm just going to delete these keys once this tutorial is over, but please be super mindful and super careful about who has access to your access keys, because this is very sensitive information. So the only thing I'm going to do for now is copy these two fields: I start with the access key, copy it, go back to PyCharm, to my credentials.py file, and paste the access key over here; then I go back to this page, copy the secret access key, head back to PyCharm and paste the secret key. That's pretty much all: these are the two keys you need in this project, and now we can continue with the main.py file and start coding the entire pipeline. The first thing I'm going to do is import boto3, and let's import OpenCV as well, so we can focus on everything else afterwards; these are the two libraries we installed for this project. Now let's get started by creating the AWS Rekognition client: I'm going to call it reko_client, and it's boto3.client, where I need to input 'rekognition' and then the two access keys: one of them is aws_access_key_id and the other one is aws_secret_access_key. Then the only thing I need to do is import credentials so I can use these two variables: the first one will be credentials.access_key and the other one credentials.secret_key. And that's pretty much all.
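Consolidated, credentials.py and the client creation look roughly like this; the variable names access_key and secret_key are just the ones we chose for this project, and the region is an assumption (pick whichever region you work in):

```python
# credentials.py -- never commit or share these values
access_key = 'YOUR_ACCESS_KEY_ID'        # placeholder
secret_key = 'YOUR_SECRET_ACCESS_KEY'    # placeholder
```

```python
# main.py
import boto3
import credentials

reko_client = boto3.client(
    'rekognition',
    aws_access_key_id=credentials.access_key,
    aws_secret_access_key=credentials.secret_key,
    region_name='us-east-1',  # assumption: set this to your own region
)
```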
So now let's continue: we're going to set the target class. I'm going to create a variable, target_class, and this is where I define the class we are going to be detecting today; as I already told you, we are going to be detecting zebras. Now it's time to load the video: I go to the directory where I have the video, and I copy and paste it into the directory where I created this PyCharm project, so now the video is located in the project and it's called zebras.mp4. Back in PyCharm, let's load this video: I call cv2.VideoCapture with 'zebras.mp4', and this will be cap. Now let's read frames from the video: I define a variable, ret, initialize it as True, and then while ret I read frames like this: ret, frame = cap.read(). Now let's convert this frame to jpg, and this is how: I call cv2.imencode, if I'm not mistaken, with '.jpg' and the frame; this returns two variables, one of which we are not going to use, so it doesn't matter, and the other one is a buffer. Now let's convert the buffer to bytes: I'll call this image_bytes, and it will be buffer.tobytes(), if I'm not mistaken; I'm not sure about this exact name, so I'm just going to execute this file, for only one frame, to make sure everything's okay, and let's see what happens. Okay, I got an error: 'could not find encoder for the specified extension in function imencode'. Let's see if I have a character missing... I think it should be '.jpg' with the dot. Now I have another error, 'object has no attribute to_bytes', so I'm almost sure it's tobytes without the underscore... and now everything is okay. So I'm just going to remove this break, because now comes the most fun part of this tutorial: it's time to use Amazon Rekognition to detect objects in this video. This is exactly how we're going to do it: I call the client we have just created, reko_client, and I call detect_labels; I input the image we have just created, image_bytes, as Image, opening a dictionary with 'Bytes' and then image_bytes, and that's pretty much all. Then I'm going to set the minimum confidence value for which we are going to detect objects: we are going to set this value at 50 percent.
The parameter is MinConfidence, and I think this is with a capital M; it means we are only going to get objects whose confidence value is greater than 50 percent, and for everything else we are not going to get the object: we are filtering out all the detections with a confidence value lower than 50 percent. The result will be a variable called response. Now I'm just going to iterate over the results: for label in response['Labels']. And this is how I'm going to do it: if the label name equals our target class, so if the object we have detected is a zebra, then we are going to iterate: for instance_nmr in range(len(label['Instances'])), and if I'm not mistaken 'Instances' is with a capital I; so we are iterating over all the zebras we have detected. Now let's get the bounding box we detected for each one of these objects: this will be something like label['Instances'][instance_nmr]['BoundingBox']. Let's execute the code so far to make sure everything is okay, and let's do it for only one frame, so I'm going to break the loop here... right, 'Labels' is with a capital L, most likely... okay, everything's fine, so I delete the break, get back here, and let's continue. Now I'm going to unwrap all the information in the bounding box: x1 is equal to the bounding box 'Left', and I'm going to cast it to int; then y1 is equal to int of the bounding box 'Top'; then the width of the bounding box is 'Width', with a capital W if I'm not mistaken; and the height is int of the bounding box 'Height'. Let's see what happens if we just print these values, x1, y1, width and height; and I'm also going to remove the int for now, I'll add it back in a moment, but let me show you something first, so let's execute this as it is... okay, you can see these are the values we are getting, and this is why I removed the int, why I'm not casting to int: otherwise everything would be a zero or a one, because everything is in relative coordinates. This is very, very important: what we need to do now is multiply these values by the width and the height of the frame we are reading, so I'm going to define two new variables, H and W, which are the height and the width of every frame, from frame.shape.
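Pulled together, this detection step looks roughly like the sketch below, continuing the frame-reading loop. Image, Bytes, MinConfidence, Labels, Name, Instances, BoundingBox and Left/Top/Width/Height are the actual Rekognition request and response fields; the variable names are just the ones we've been using.

```python
# Detect objects in the current frame (only detections above 50% confidence).
response = reko_client.detect_labels(Image={'Bytes': image_bytes},
                                     MinConfidence=50)

h, w, _ = frame.shape  # frame size, to convert relative coords to pixels

for label in response['Labels']:
    if label['Name'] == target_class:  # e.g. target_class = 'Zebra'
        for instance in label['Instances']:
            bounding_box = instance['BoundingBox']
            # Rekognition returns relative coordinates in [0, 1]:
            x1 = int(bounding_box['Left'] * w)
            y1 = int(bounding_box['Top'] * h)
            width = int(bounding_box['Width'] * w)
            height = int(bounding_box['Height'] * h)
```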
So x1 will be the bounding box 'Left' multiplied by the width of the image, y1 will be exactly the same but with 'Top' times H, and then the width is times W and the height is times H; and now I'm going to cast everything to int again. Let's print the values for x1, y1, width and height once more and see what happens... okay, now you can see we are getting integers and everything seems to be okay: we are detecting objects. The next step of this pipeline is to write the detections, but before we do, let's make sure everything is 100 percent correct and working just fine by visualizing some of the frames with all the bounding boxes we are detecting on top. So I'm going to call cv2.rectangle, input the frame, then (x1, y1) and (x1 + width, y1 + height), then the color, if I'm not mistaken, which is going to be green, and then the thickness of the rectangle, which will be three for now. Then let's visualize this frame by calling cv2.imshow('frame', frame) and cv2.waitKey. So we are plotting a bounding box on top of absolutely every single frame, one bounding box for each one of our objects; let's see what happens. I'm just going to execute this file and see if we are detecting all of our zebras... and everything seems to be working just fine: if I press a key you can see we are going through all the frames. This is not running in real time, because obviously we are detecting many, many zebras and plotting a rectangle, a bounding box, for each one of them, but nevertheless this is working just fine.
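For reference, this quick visual sanity check is just a couple of OpenCV calls per frame (green boxes, thickness 3; press a key to advance to the next frame):

```python
# Draw the detected bounding box on the frame (BGR green, thickness 3).
cv2.rectangle(frame, (x1, y1), (x1 + width, y1 + height), (0, 255, 0), 3)

# After drawing all the boxes for this frame, show it and wait for a key.
cv2.imshow('frame', frame)
cv2.waitKey(0)
```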
So the only thing we need to do now is take all these detections and write them to our file system, to our computer. I'm going to remove all the plotting, because we are not going to need it anymore, and now let's write the detections. In order to do so, I'm going to create a new variable with the location of the output directory, which is where we are going to save all these detections: I define this variable as output_dir, and it will point to a directory on my local computer called data, so let's go back to the directory of this PyCharm project and create a new directory called data; I press enter, and that is it. Now, we're going to save all these detections in the YOLO format, so I'm going to create another directory, imgs, and another variable for the images directory, output_dir_imgs, which is os.path.join(output_dir, 'imgs'); I need to import os. Then I create another variable for the annotations, for the detections: I'll call it output_dir_anns, and it's defined the same way. Now I go back to my local computer, to my file system, and within this data directory I create two additional directories: one for the frames, for the images, which I call imgs, exactly as I named the variable over here, and another folder called anns, exactly as I named this other variable. So everything is set, everything is ready: we have created the directories where we are going to save all the data. Now let's get back to the code. The only thing we need to do is something like 'with open(...)'; I'm going to do it here, before we start this iteration, it's much better if we do it here: for every single frame we are going to open a text file, and the path will be os.path.join with the output directory for the annotations and a file name like 'frame_{}.txt'. Into this format I'm going to put the frame number, which we haven't defined yet, as str(frame_number).zfill(6); let me explain this in a couple of seconds, but for now let's just get here: I define a new variable, frame_number, initialize it as -1, and then increment it for every single frame we read over here. So we start at -1, we increment this variable for every frame, and then for absolutely every image we create this file name, which is 'frame_' followed by the number padded with zeros to six digits, so that all the file names have the same length; that's mostly for formatting reasons, it's not 100 percent needed, but it makes everything look much nicer. So now let's continue: I'm going to open this file
in write mode, and then 'as f', and that's pretty much all. Now, for each one of our detections, the only thing we need to do is write the detection: f.write, and we are going to write five numbers, because remember we are doing this in the YOLO format, so we need five numbers. The first of these numbers will be a zero, because we are detecting only one object, zebra, so the class will always be the number zero; then come the X and Y coordinates of the center of the bounding box, which is something like x1 plus the width divided by 2, and then exactly the same for the Y coordinate with the height; and then the width, and then the height. I see there's an issue here, a parenthesis missing... okay, perfect. And since we are saving the annotations in the YOLO format, we don't really need the conversion to integers and the multiplication by the frame size, so I'm just going to delete the int and the multiplication, because remember how the YOLO format works: we need relative coordinates, so the values as Rekognition returns them will be just fine. So we are writing all the detections, and once we have written them all, the only thing we need to do is close the file, and that's pretty much all. And let's save the images as well: let's prepare this dataset as if it were a dataset in the YOLO format, so we could potentially take it and train a model, an object detector, with the data we are saving. In order to do so we need to save the detections, but we also need to save the images, so I'm going to save the images over here, right after we save all the detections: we can call cv2.imwrite, with a file location which is pretty similar to the one for the detections, but changing txt to jpg and changing the directory to imgs; and then we need to input the frame... and that is all.
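Assembled (and including the center-coordinate fix we'll make in a moment, dividing only the width and height by two), the saving step could look like the sketch below. One YOLO line per object: class_id, x_center, y_center, width, height, all relative to the frame size, which is exactly why we skip the multiplication by W and H here. The zebra_boxes collection is an assumption standing in for the boxes gathered for the current frame.

```python
# Save the annotations for this frame in the YOLO format (relative coords).
ann_path = os.path.join(output_dir_anns,
                        'frame_{}.txt'.format(str(frame_number).zfill(6)))
with open(ann_path, 'w') as f:
    for bounding_box in zebra_boxes:  # assumption: boxes for this frame
        left, top = bounding_box['Left'], bounding_box['Top']
        width, height = bounding_box['Width'], bounding_box['Height']
        # class 0 (zebra), then the box center and size, all in [0, 1]
        f.write('0 {} {} {} {}\n'.format(left + width / 2,
                                         top + height / 2,
                                         width,
                                         height))

# Save the frame itself with the same name, but as .jpg in imgs/.
img_path = os.path.join(output_dir_imgs,
                        'frame_{}.jpg'.format(str(frame_number).zfill(6)))
cv2.imwrite(img_path, frame)
```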
Okay, so let's see now if everything is okay: let's just run it for only one frame and see what happens... everything is fine, and if I go to my local directory and open anns, you can see I have a file with many, many detections, which makes sense because we have many zebras; and if I go to the images directory, you can see I have a frame, the first frame from the video. So everything seems to be just fine, and the only thing we need to do now is execute exactly the same process for absolutely all the frames: I remove this break, and let's see what happens... okay, I got an error, because we should be doing everything else only if we have actually read a frame; that's a very small mistake. And while I was waiting for the execution to complete, I realized another mistake: we should be dividing only the width and only the height by two, since these are the X and Y coordinates of the center of the bounding box. Everything should be okay now, and in order to be 100 percent sure, I'm just going to execute this file again. Now the execution has completed and we don't have any errors, so everything is just fine: if I go to the images directory, you can see I have 755 images (the numbering starts from zero, so the last one is 754), and these are the images of our zebras, all the frames from the video; and if I go to the annotations directory, you can see I have all my annotations, also 755 files. So everything is working just fine, and this is exactly how you can use Amazon Rekognition as an object detector, how you can detect objects using Amazon Rekognition; that's going to be all for this tutorial. In this video we're going to work with automatic number plate recognition, and this is exactly what you will be able to do with this tutorial: you can see that we are detecting all the license plates in this video, and we're also reading the text from these license plates. We're using 100 percent Python, we're going to use an object detector based on YOLOv8, we are going to do object tracking, and we are going to read the text from the license plates using EasyOCR, so this will be an amazing tutorial. My name is Felipe, welcome to my channel, and now let's get started. Today we are going to work with automatic number plate recognition, and let me show you a few resources, a few repositories, which are going to be super useful for today's tutorial. The first one is YOLOv8, because we are going to be detecting license plates and then reading the text from them, and in order to detect the license plates we are going to use an object detector based on YOLOv8; so YOLOv8 is going to be super important today, and I'm going to show you more details in a few minutes. For now, let me show you the other repository we are also going to use in this tutorial, which is going to be super important as well: it's an object tracking algorithm called SORT, because today we're going to do object detection and we're also going to do object tracking, and in order to do the tracking we are going to use SORT. Then, once we have detected the license plates, once we have implemented all the object tracking, once we have done everything we need to do,
we are going to read the content of the license plates using EasyOCR, a Python library which is going to be super important in this tutorial. Now let me show you the data we are going to use: the video we are going to use in order to test the automatic license plate recognition software we'll build. You can see this is a video of a highway, with many, many cars going through it, and the important thing about this video is that we have a very frontal view of absolutely all the cars, and most importantly, a very frontal view of all the license plates: for absolutely every license plate we detect in this video we have a very, very frontal view, and this is an ideal point of view for a project like this. So this is exactly the video we are going to use, and now let me show you something else. If I go to Google, search for 'license plate' and go to Images, you can see there is a lot of diversity when it comes to license plates: many different types. We have some license plates comprised only of numbers, like this one, then other license plates which are only letters, like these two, and many, many different examples, many different types and formats. I would say that absolutely every country, every state, every period in history has its own license plate format, its own style, its own system; there are many, many different types of license plates, a lot of diversity, and obviously it's very challenging to build an automatic license plate recognition software that deals with absolutely every single type of license plate.
I'm not going to say it's impossible, it's not impossible, but it is a very, very challenging task. So in order to simplify our problem, we are going to focus on only one very specific type of license plate: we are going to work with the United Kingdom license plate system, the United Kingdom license plate format, which is comprised of seven characters. The first two characters are letters, then we have two numbers, and then three more letters: two letters, two numbers, three letters (I'll show a tiny sketch of how to check this structure in code right after we look at the project files). This is the exact structure of the license plates we are going to be working with today, the exact type we are going to be detecting with the software we build in this tutorial. But I'm going to show you a very generic process, a very generic pipeline, so by making some adjustments to the code we write today, you will be able to apply the same process to other types of license plates. That's something I'll show you better in a few minutes, but for now let's continue. Now let me show you something else: when we were starting this tutorial, I told you we were going to use an object detector based on YOLOv8 to detect license plates; this is exactly the dataset I used in order to train this license plate detector, and I'm going to give you a link to this dataset in the description of this video. If you want to know exactly how I trained this object detector, I invite you to take a look at one of my previous videos, where I show a step-by-step guide on how to train an object detector using YOLOv8; that's exactly the process I followed when I created this license plate detector, so just take a look at the video I'll be posting over there. So now let's continue: I already showed you all the resources we are going to use in this tutorial and the type of license plate we are going to be detecting, and now it's time to go to PyCharm so we can start implementing the code. Let's go to this PyCharm project and let me show you some files I have over here: you can see there are many, many different files, and for now let's just focus on these two: main.py and util.py.
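As promised, here is a tiny sketch of how the two-letters / two-numbers / three-letters structure could be checked in code. This helper is hypothetical, just for illustration; the project's util.py has its own implementation of format checking.

```python
import re

def complies_with_uk_format(text):
    # UK plates in this tutorial: 2 letters, 2 digits, 3 letters (7 chars).
    return re.fullmatch(r'[A-Z]{2}[0-9]{2}[A-Z]{3}', text) is not None

print(complies_with_uk_format('AB12CDE'))  # True
print(complies_with_uk_format('1234567'))  # False
```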
main.py is the file in which we are going to code the entire pipeline of this tutorial: you can see it's a sequence of steps we are going to follow in order to build this automatic license plate recognition software. The first step is loading the models, then loading the video, then we're going to read frames, and so on; this is the entire pipeline, the entire process, we are going to be building today. Then we have this other file, util.py: in this utils file we have five functions, and from all of these functions we are going to focus on these two, read_license_plate and get_car. If I open these functions you can see that they are completely empty: we need to implement them in this video, while the other three functions are already implemented, everything is ready, and we're just going to use them. The idea is to focus on these two functions because they are far more important from a computer vision point of view, so these are the functions we'll spend the most time on; and that's the util.py file. Now, going back to main.py, it's time to start with this pipeline, and in order to do so we begin by importing YOLO: from ultralytics import YOLO. Then we are going to load the models, which is the first step in this process, and the interesting part is that we are going to have two models, because we are going to be detecting license plates but we are also going to be detecting cars; that's going to be a very important part of this process. I'm going to call the first of these two models coco_model, because it's a model which was trained on the COCO dataset, and it's going to be YOLO with a pre-trained YOLOv8 model, YOLOv8 nano ('yolov8n.pt'); this is the model we're going to use in order to detect cars. It's very important that we detect cars: I know we are going to detect and read license plates, but detecting cars is going to be super important, and you're going to see exactly why in a few minutes. Then we also load the license plate detector, which I'll call license_plate_detector, and this is going to be YOLO with the path to this detector, which is located in a directory called models and is called license_plate_detector.pt.
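So the model-loading step is just a couple of lines; a sketch with the names and paths we just described:

```python
from ultralytics import YOLO

# Pre-trained YOLOv8 nano, trained on the COCO dataset: used to detect vehicles.
coco_model = YOLO('yolov8n.pt')

# Our custom license plate detector, trained as shown in the previous video.
license_plate_detector = YOLO('./models/license_plate_detector.pt')
```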
Okay, now it's time to load the video we are going to use today, and in order to do so I import cv2 and call cv2.VideoCapture with the video location, which is the current directory and the file sample.mp4; this is going to be cap. Now we are going to read frames from the video: I define a variable, ret, initialize it as True, and then while ret: ret, frame = cap.read(), and if ret, then I continue. That's pretty much all for now: we are reading frames from the video, and now it's time to detect all the vehicles; we are going to be detecting all the cars, and this is where we use the first model, the one trained on the COCO dataset. So we do something like this: I call coco_model with the frame as input, and this gives us the results; I'm going to call this object 'detections', accessing the first element of the results. In order to move one step at a time, I'm going to print detections and only execute the first 10 frames, otherwise this would take a lot of time, so: 'and frame_number < 10'; obviously I need to define the frame_number variable, so I initialize it as -1 and increment it here, and I don't need the 'pass' anymore. Let's see what happens when I print detections... okay, everything seems to be working just fine: this is a lot of information, these are all of our detections. So what I'm going to do now is iterate: for detection in detections.boxes.data.tolist(), and let's print each detection so we know exactly what it looks like and how to access all the information. Okay, so this is how each one of our detections looks: you can see we have six numbers, and the way this works is: x1, y1, x2, y2, then the score, and then the class ID. Remember, we are using a model which was trained on the COCO dataset, so we are detecting many, many different objects: the class ID is exactly the type of object we detected in each detection; then we have the confidence value, which is how confident our object detector is of this specific detection; and then x1, y1, x2, y2 is the bounding box. And something that's very, very important: we are doing all of this in order to detect vehicles.
As this model trained on the COCO dataset detects many different kinds of objects, we are going to say something like: if int(class_id) in vehicles, then we continue. vehicles is a variable we haven't defined yet, and we are going to define it with the class IDs of all the vehicles in the COCO dataset. This is a list of all the objects we can detect using this model; you can see there are a lot of objects, and some of them are related to vehicles while others are not. For example we have person, bicycle, car, motorbike, airplane, bus, train, truck, and so on. From this very long and comprehensive list we are going to make sure we are detecting a vehicle: if the class ID we detect is either a car, a motorbike, a bus or a truck, we continue; if not, we discard the detection we just got. The indexes we are interested in are 2 for car, 3 for motorbike, 5 for bus and 7 for truck. We don't really have any motorbikes in this video, I know for sure because I already watched it, but nevertheless, in order to make this more generic, I'm going to add motorbike as well. So if our class ID is within our vehicles list we continue, and I'm going to create another variable, detections_, where I'm going to save the bounding boxes of all the vehicles we detect in this video: if we have detected a vehicle, I append the bounding box and the confidence score to this new variable (there is a short sketch of this filter just after this section). Please mind that I'm not saving the class ID: from now on it's not really important, because the only thing we care about is that our detections are vehicles; we don't really need to know exactly what type of vehicle. This is the variable we are going to work with from now on. Now let's continue, because it's time to implement the object tracking. Remember we said we were going to work with object tracking in this tutorial, and now is the moment we add this tracking functionality to the project. Before we do, let me give you a very quick explanation of why exactly we are using tracking. Basically, every time we solve a problem, not only a computer vision problem but any type of problem, you need to use absolutely all the information you have available about that problem. In this case we are going to be detecting license plates on individual frames, but these license plates are objects which are moving through a video; so if we are able to track each license plate through the video we will have more information, and this additional information is going to be super valuable in order to build a more robust solution. That's pretty much the reason why we are going to implement object tracking.
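Here is that vehicle-filter sketch, continuing inside the detection loop from above (class IDs per the standard COCO ordering):

```python
# COCO class IDs for the vehicles we care about:
# 2 = car, 3 = motorbike, 5 = bus, 7 = truck.
vehicles = [2, 3, 5, 7]

detections_ = []
for detection in detections.boxes.data.tolist():
    x1, y1, x2, y2, score, class_id = detection
    if int(class_id) in vehicles:
        # Keep only the box and score; the exact vehicle type no longer matters.
        detections_.append([x1, y1, x2, y2, score])
```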
Now, we're not going to be tracking the license plates themselves; we are going to be tracking the cars, the vehicles, and I'm going to show you exactly why later on. This is what we are going to do: we are going to work with this repository. Remember, I showed you this repository when we were starting the tutorial, and the first thing you should do is clone it into your local drive, into the root directory of your PyCharm project. In my case this is the root directory of my PyCharm project; it's where I have all my Python scripts and all the files related to this project. You can clone this repository in one of two ways. One way is opening a terminal and typing something like git clone plus the repository URL: I click here, copy the repository URL, paste it here, and the only thing left is to press Enter; that's exactly how you clone this repository onto your local computer. But there is another way, and actually a much simpler one, which you may prefer: just download the entire repository as a zip file, and once you have downloaded it, the only thing you need to do is drag and drop the sort-master directory into your local directory. And please mind that this directory is called sort-master, but you will need to rename it to sort. You can see here on my computer that my directory is called sort, and if I open it, these are all the files from the repository. So remember to rename sort-master to sort; that's very important, otherwise you will quite possibly have issues with the next steps in this tutorial. Let's go back to PyCharm: this is the repository you need to clone into your local directory, and remember to call the directory containing it sort. Now, back in PyCharm, what I'm going to do is import sort: from sort.sort import *; we are going to import absolutely everything from this library. Then I'm going to create a new object called mot_tracker, which is going to be equal to Sort(); this is the object tracker we are going to use in order to track all the vehicles in our video. Now let's get back here, and what I'm going to do is call mot_tracker.update with a numpy array of the detections_ list we created.
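Assuming the repository has been cloned and renamed to sort as just described, the tracker setup is only a couple of lines:

```python
import numpy as np
from sort.sort import *

mot_tracker = Sort()

# Inside the frame loop: feed this frame's vehicle detections to the
# tracker; it returns the boxes with an extra column, a persistent ID
# that follows each vehicle across frames.
track_ids = mot_tracker.update(np.asarray(detections_))
```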
This list contains all the vehicles detected in this frame, and the result is going to be something like track_ids. track_ids is going to contain all the bounding boxes of the vehicles we have detected in this frame, but with the tracking information: it adds an additional field, the car ID, the vehicle ID, for each one of the cars we detect, and this ID is going to represent that specific car through the entire video. So now we are tracking all of our cars, and now it's time to detect the license plates. So far the only thing we have detected is the cars in the video; now we detect the license plates, and in order to do so we are going to use this detector over here, license_plate_detector, exactly the same way we detected the cars. I'm just going to copy and paste that sentence and replace coco_model with license_plate_detector, and this way we are going to detect all the license plates. I'm going to call this object license_plates, and then iterate over all the license plates detected within this frame: for license_plate in license_plates.boxes.data.tolist(), and that's pretty much all. Then let's unwrap all the information we got from each license plate, exactly as we did before: x1, y1, x2, y2, score and class_id; this is going to be license_plate. Then we need to assign each license plate to a given car, because we have detected all the cars in every frame and all the license plates in every frame, but so far we don't know which license plate belongs to which car. We know for sure that every single license plate sits on one of our cars, but we don't know which one goes with which one. So this is the step where we assign a car to absolutely every one of our license plates, and in order to do so we are going to use one of the functions in util.py: get_car. This function receives a license plate and the object we have over here, with all the tracking information for all the cars in that specific frame, and it returns a tuple containing the vehicle coordinates and its ID. So we are going to call get_car, and it's going to return the car this license plate belongs to. I'm going to import it, from util import get_car, and now I call get_car, inputting the license plate and track_ids; remember that object contains all the bounding boxes plus all the tracking-related information, and that's very important. The return will be the coordinates of the car this license plate belongs to, something like xcar1, ycar1, xcar2, ycar2, and then the car_id for this car. Remember, every single car in our video will have a unique ID which identifies it through all the frames of the video.
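A sketch of the plate detection and the car assignment just described, still inside the frame loop:

```python
from util import get_car

# Detect license plates on the same frame.
license_plates = license_plate_detector(frame)[0]
for license_plate in license_plates.boxes.data.tolist():
    x1, y1, x2, y2, score, class_id = license_plate

    # Assign the plate to the tracked vehicle whose box contains it.
    xcar1, ycar1, xcar2, ycar2, car_id = get_car(license_plate, track_ids)
```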
Also, please mind that this get_car function is completely empty for now; it only returns some dummy values, and it's exactly what we will need to implement in the next step of this project. Once we have completed this pipeline, at the end of this process, we will go back to util.py and implement this function. So, we have now assigned the license plate to a very specific car, we know which car this license plate belongs to, and we can continue with the next step, which is cropping the license plate. This is how we're going to do it: we are going to index into frame with the license plate coordinates, which is int(y1) to int(y2) and then int(x1) to int(x2). This is the license plate crop, and that's pretty much all we need to do in this step of the process. Now let's continue to the next step, which is processing this license plate: we are going to apply some image processing filters to this crop in order to improve the image, so it's much simpler for the OCR technology, for easyocr, to read the content of the license plate. Specifically, the filters we are going to apply are a grayscale conversion and then a threshold. Let's see how we can do that. I'm going to call cv2.cvtColor, input the license plate crop, and then cv2.COLOR_BGR2GRAY; this is going to be license_plate_crop_gray. Now we have converted the license plate crop into a grayscale image, and the only thing left is to call cv2.threshold: we input the grayscale image, then the threshold, which I'm going to set at 64, and then the value to which we are going to take all the pixels which are lower than the given threshold, which is 255. And I say "the pixels which are lower than the threshold" because we are going to use the inverse threshold.
We are going to use the cv2.THRESH_BINARY_INV type of threshold: it takes all the pixels which are lower than 64 to 255, and all the pixels which are higher than 64 to zero; that's exactly how this threshold works. If you want more details on how this function works, I invite you to take a look at one of my previous videos, an entire OpenCV with Python course, where one of the lessons is exactly about thresholding, exactly about this function. I'm going to be posting a link to that course somewhere in this video, so you're welcome to take a look, and at that lesson in particular, to get more details on how thresholding works. Now let's continue: the first return value goes to a variable we are not going to use in this tutorial, so it doesn't really matter, and the output is license_plate_crop_thresh. This thresholded image is exactly the image we are going to input into our OCR technology, into our easyocr algorithm. To be clearer about the difference between these two images, I'm going to visualize them very quickly so you can see exactly what they look like. I call cv2.imshow and input license_plate_crop; I'm going to call this window 'original crop' so it's clearer that this is the image we are cropping from the frame. Then cv2.imshow again, this time plotting the thresholded variable, and finally cv2.waitKey. Let's take a look at these two images. This is what we got: this is the frame, this is the crop we are making from the frame, so this is the license plate, and this is the thresholded image. You can see that in the thresholded image absolutely every single pixel is either white or black, and this type of image will make it much, much simpler for easyocr, for our OCR technology, to read the content; this is the image we are going to use in order to read the license plate number. That was a very quick way to show you how these two images look.
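Putting the crop and the two preprocessing steps together, a minimal sketch:

```python
# Crop the plate out of the frame (rows first, then columns).
license_plate_crop = frame[int(y1):int(y2), int(x1):int(x2), :]

# Grayscale, then inverse binary threshold: pixels below 64 become 255
# and pixels above become 0, leaving high-contrast characters.
license_plate_crop_gray = cv2.cvtColor(license_plate_crop, cv2.COLOR_BGR2GRAY)
_, license_plate_crop_thresh = cv2.threshold(
    license_plate_crop_gray, 64, 255, cv2.THRESH_BINARY_INV)

# Quick visual sanity check of both images.
cv2.imshow('original crop', license_plate_crop)
cv2.imshow('threshold', license_plate_crop_thresh)
cv2.waitKey(0)
```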
Now let's continue: it's time to read the license plate number. We are almost there, we have almost completed this process, and this is how we're going to do it: we're going to call another function defined in util.py, read_license_plate, and you can see this function is not implemented either; it's completely empty and we are returning some dummy values. This is another function we are going to implement later on, after we are happy with this pipeline; once we are completely happy with it, we will move to util.py and implement this function as well. For now we are just going to use it, so I'm going to import it as well... uh, no, that is not the function name... read_license_plate, something like this. Now let's see how we can use this function. I'm going to call read_license_plate, and it's going to return two values; let's look at the function documentation to see exactly what gets returned: a tuple containing the formatted license plate text and its confidence score. So this is going to be something like license_plate_text and license_plate_text_score; these are the two values we get from here. The input should be the license plate crop, and in our case we are going to input the thresholded version of our crop, and that's pretty much all. Remember, we are just completing the pipeline, the most generic process; then we'll come back here to implement this function and the other one. Now let's continue, because the only thing left is to write the results; we are almost there. Obviously, if we want to visualize these results, or analyze them, or do whatever else with them, we need to write them to our local computer, and in order to do that we are going to use another function defined in util.py, called write_csv. This function is implemented, 100% and fully implemented; you can see this is all the code we have for it, everything is ready, and we can just use it as it is. Remember, in this tutorial, and in basically all my tutorials, we always focus on the computer vision part of the problem; writing a CSV file is not really that important from a computer vision point of view, so that's why we are not implementing this function live in this video: it's already implemented and we're just going to use it. Let's see what this function does. It says "write the results to a CSV file", and it receives two arguments: the results, which is a dictionary containing the results, and the path of the CSV file we are going to produce, the path where this file will be saved. So, if this function takes a dictionary, then we need to produce a dictionary to input into it: we need to take all of our information and put it into a dictionary; that's very important, and that's what we are going to do now, because so far we have only been computing the information, we have not saved it into any kind of dictionary. I'm going to create a new variable called results, which is going to be a dictionary, and this is where I'm going to save all the information. The first key in this dictionary will be the frame number: we are going to have a different key for absolutely every single frame in our video, and for every frame we are going to save all the information related to the cars we detect and, most importantly, to the license plates.
Now I'm going back to the end of the pipeline, and I'm going to make a very quick edit first: going back to the read_license_plate stub, instead of returning None, None I'm going to return two zeros, because we are going to reserve the None, None output for those times when we find an error or any kind of issue reading the license plate. This will be much clearer later on, once we implement the function, but for now just bear with me that it's much more convenient for the dummy values to be different from None. So let's get back here, and this is where we're going to say: if license_plate_text is not None, we save all the information about this license plate in the dictionary we just created. We take this variable over here, results, for that specific frame number, and we create a new entry with all the information for the license plate we have detected. This is how we're going to do it; I'm just going to write it first and explain it once it's done. The next key is the car ID: this is going to be results[frame_nmr][car_id], and for this car I'm going to create a new dictionary with two keys, one of them 'car' and the other one 'license_plate'. For 'car' we are going to have another dictionary with the bounding box, and that's it; and for 'license_plate' we are going to have another dictionary with something like 'bbox'...
...the bounding box, then also the text we have detected, then the confidence value for the bounding box, and then the confidence value for the text. And that's pretty much all, so I'm just going to format this a little more nicely. Now let's see exactly what we need to input into each one of these fields. For the car bounding box we input these values over here, which are the coordinates of the bounding box of this specific car; for the license plate bounding box we input the coordinates of the bounding box of the license plate; for the text we input license_plate_text; for the bounding box score we input the score with which we detected this license plate; and for the text score we input license_plate_text_score. By doing so we don't have any errors and everything is okay. So, for every single frame number we are saving all the information related to each one of our cars, and all the information for each car will be where that car is located plus everything about the license plate we detected on it. And we save all this information only in those cases where we have detected the license plate and successfully read its license plate number, so this object is not None; only in that case do we save everything into this dictionary. Please also notice the structure I have built for this information: every time we detect a license plate it will not be floating around in space, completely isolated; that will never happen. Every license plate we detect sits on a given car, and that car appears in a given frame, and that's exactly why I decided on this structure for the dictionary.
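The resulting structure, as a sketch (setdefault is just a compact way to create the per-frame entry; the video creates it explicitly):

```python
results = {}

# Inside the frame loop, after reading each plate:
if license_plate_text is not None:
    results.setdefault(frame_nmr, {})
    results[frame_nmr][car_id] = {
        'car': {'bbox': [xcar1, ycar1, xcar2, ycar2]},
        'license_plate': {
            'bbox': [x1, y1, x2, y2],
            'text': license_plate_text,
            'bbox_score': score,
            'text_score': license_plate_text_score,
        },
    }
```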
Once we have created all this information, the only thing left is to call write_csv, so I'm going to import this function as well; the name was something like write_csv, so let's import it too. And something is going on, because we are not really using this import we have over here... if I scroll down I see I'm not importing the function itself; now we should be okay. So let's go back here and call write_csv: I need to input the dictionary, so I input results, and also where I want the CSV file to be saved; I'm going to save all this information into a CSV file called test.csv. What I'm going to do now is execute this pipeline as it is, then we'll take a look at the file we create and see if it makes sense, and then we'll continue. I'm just going to press play. Okay, the execution is now completed, and if I go to my local directory, to the directory of this PyCharm project, this is test.csv, the file we have just created. If I open this file you can see all the information we have saved, everything we have extracted from this video. Remember we are still processing only the first 10 frames, so this is all the information we have extracted so far, and also remember we are still computing some dummy values from some of our functions, so this is not really all the information, only what we have computed so far. But other than all of these zeros over here, you can see everything looks pretty good: we are producing an entire CSV file with all the information we have computed from this video. We are almost there; actually, we have completed this pipeline, and the only thing left is to go back to util.py, because we need to implement these two functions, get_car and read_license_plate. Once those functions are implemented we will be producing a real file, with the real license plate number and the real license plate score here and here, and also the car bounding box and the car ID for absolutely every single license plate in absolutely every single frame in which we have a detection. So we are almost there, I am super excited, and now let's continue to the util.py file so we can implement these functions. Let's start with get_car: remember, in the main.py pipeline we were using this function to decide which car each license plate belongs to. We have many cars and many license plates, and for each license plate we want to know the car it belongs to; that is exactly where we were using get_car, and now let's see how we are going to implement it. In order to do so I'm going to show you a few pictures. This is a random frame from our video; you can see we have many cars, and this is only one frame. Once we have detected all the cars we are going to have a situation like this: many, many detections.
At every single frame we are going to have many cars; I don't know how many cars are in this picture, but there are many, something like 20, 30, 50, maybe 60. So for every single frame we are going to have many detections, many bounding boxes for all of our cars, and also all of our license plates; but please mind that we are only going to have maybe one, two or three license plates per frame. So we have many cars but only a few license plates, and the idea is to know which car each license plate belongs to. The way we are going to know that is by looking at all of these car bounding boxes and finding the one which contains the license plate: by finding the bounding box of the car which contains the bounding box of the license plate. That's the way we find the car that goes with each license plate, and that's exactly the idea of what we are going to implement in this function. Now let's see how we can do that. The first thing I'm going to do is unwrap all the information in license_plate; it's exactly the same object as before, so I'm just going to do this. Then I'm going to iterate over all the cars we have over here: let's say for j in len(vehicle_track_ids); we are going to iterate over all the cars we have detected, and remember each element is the entire information, the bounding box plus the car ID. So we unwrap all the information for each one of these cars, something like xcar1, ycar1, xcar2, ycar2 and car_id, and this is vehicle_track_ids[j]. That's pretty much all: we are iterating over absolutely all the bounding boxes of the cars found in this frame, and for each of these bounding boxes we are going to verify whether it contains the license plate. This is how we're going to do it: we check if x1, the upper-left coordinate of the license plate, is greater than xcar1, and y1 is greater than ycar1; we are verifying that this coordinate over here is greater than this other coordinate over here. Then the other condition we need to meet is that this other point, the lower-right corner of the license plate, is less than this other point over here. So: if x1 greater than xcar1, and y1 greater than ycar1, and x2 less than xcar2, and y2 less than ycar2, then we have found the bounding box this license plate belongs to, we have found the car on which this license plate is located; that's what it means when all of these conditions are met. So in this situation we are going to...
I'm going to define a new variable, foundIt, which is going to be False at the beginning and True in this case, and in this case we are also going to break the loop. Then I'm also going to define another variable, car_index, and car_index will be j. Now, if foundIt, we return these values, which are the bounding box of the car and also the car ID; in any other case we just return -1 for everything, so it's much clearer that we have not found the car. That's pretty much it: we have implemented get_car, and now let's see if everything works well; we should now have the right values for all the cars we are detecting. The only thing I'm going to do is execute this script again and see what happens. Okay, I got an error, and I think I know what the problem is: we need to iterate in range(len(vehicle_track_ids)); now everything should be okay, let's try again. Okay, now it's completed, and let's look at the new test.csv file we created: you can see we now have values for the car ID and for the car bounding box, so we are moving one step at a time, but we are making progress. So now let's continue with the util.py file and move to the next function, read_license_plate. It's time to implement this function, and something I'm going to do first is add an if over here: I'm going to continue with this pipeline only if car_id is different from -1. Now let's see how we can implement read_license_plate, and the only thing we need to do is call easyocr. Let me show you some variables I have defined over here; these variables are going to be super important, and you're going to see exactly why. Also, let me show you this reader: I have already initialized this OCR reader, and you can see I'm calling easyocr and then its Reader constructor. So the only thing we need to do now is call reader.readtext, input the license plate crop, and call the result detections. Then I iterate, for detection in detections, because remember we could be detecting many different text objects in this image. For each of these objects we unwrap it first, into something like bounding box, text and score: each detection is the bounding box of the text we detected, the text itself, and the confidence value with which we detected that text. Then we convert the text to uppercase and remove all the white spaces; this will be equal to text.
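To recap, here is a minimal sketch of the get_car implementation we just walked through:

```python
def get_car(license_plate, vehicle_track_ids):
    """Return the tracked vehicle (bbox + ID) whose box contains this plate."""
    x1, y1, x2, y2, score, class_id = license_plate

    foundIt = False
    for j in range(len(vehicle_track_ids)):
        xcar1, ycar1, xcar2, ycar2, car_id = vehicle_track_ids[j]
        # The plate box must lie entirely inside the car box.
        if x1 > xcar1 and y1 > ycar1 and x2 < xcar2 and y2 < ycar2:
            car_indx = j
            foundIt = True
            break

    if foundIt:
        return vehicle_track_ids[car_indx]

    # No containing car found.
    return -1, -1, -1, -1, -1
```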
Now it's time to use the format I mentioned when we were starting this tutorial: I told you we were going to focus on one very specific type of license plate. Each license plate is going to have seven characters: the first two characters are letters, then two numbers, and then three letters. This is the format of absolutely every single license plate we are going to work with in this tutorial, so we are going to make sure every text we detect complies with this format. In order to do so I have already created a function, license_complies_format, which returns a boolean value: pretty much the verification of whether this license plate complies with the format or not. We verify that we have seven characters, that the first two characters are letters, that the third and fourth characters are numbers, and that the last three characters are letters again. That's exactly what this function does, and it's a very important function we are going to use now, so let me show you exactly how. If license_complies_format(text), then and only then do we return the text and the confidence score; we return these two values, text and score, only if the text complies with the format we are requiring of absolutely all the license plates. In any other case we return None. This is very important, and it's going to make our solution way more robust and way better. And something that makes the solution even better is that we are not going to return the text itself: we are going to call another function, format_license, and let me show you exactly what we are going to do with it. I'm going to call format_license(text)...
Let me give you the high-level idea behind this function. Sometimes, when we are using an OCR technology, a library like easyocr, it's very challenging to tell some characters apart; for example, it's very challenging to tell a 5 apart from an S. The letter S and the number 5 look very similar, and it's very hard for an OCR to tell the difference between these two characters. We have exactly the same situation for the letter I and the number 1, or for the letter O and the number 0; those are characters which are very hard to tell apart. So the only thing this function, format_license, does is go through all the characters in the license plate text, and for each character fix whatever issue we may have with this type of confusion. If, for example, we are reading this character over here and easyocr, the OCR technology we are using, says it's the letter S, we know for sure it's not the letter S, because we are expecting a number there; so if we have detected the letter S we convert that value to the number 5. And the same happens the other way around: if we read a character and get the number 5 where we are expecting a letter, we know for a fact it's not the number 5, so we convert the number 5 into the letter S. That's the high-level idea of what this function does: it goes through absolutely all the characters in the license plate and fixes these kinds of issues wherever it finds them. I invite you to take a much closer look at these two functions, format_license and license_complies_format, and to properly understand exactly how they work; that's your homework from this video. So now let's continue: we return format_license(text) and score if our license plate complies with our format, and None in any other case, and we are done, we have completed our process. Now let's see what happens; I'm going to execute this file again, with one very small change: I'm still only going to execute it for 10 frames, but I'm going to do it like this: if ret, then if frame_nmr is greater than 10, I break the loop; this is going to be much better. Now let's see what happens, I'm going to execute main again. Okay, it seems I have a typo over here: this is obviously not remove but replace; I got confused because I was removing the white spaces, but that's obviously not the name of the function we want here. Now let's see what happens. Okay, the execution has completed and we have produced a new test.csv file; if I open it you can see we still have all the information related to the car ID and the car bounding box, and now we also have all the license plate numbers we have read from the license plates, plus the confidence score for each one of them. So we made it, we have completed this process, and everything is ready.
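Here is a sketch of read_license_plate with the format check and the character fixing described above. The two mapping dictionaries are illustrative only; the exact confusion pairs live in the repository's util.py:

```python
import string
import easyocr

reader = easyocr.Reader(['en'], gpu=False)

# Illustrative confusion maps: digits misread as letters and vice versa.
dict_int_to_char = {'0': 'O', '1': 'I', '5': 'S'}
dict_char_to_int = {v: k for k, v in dict_int_to_char.items()}


def license_complies_format(text):
    # Expected format: two letters, two digits, three letters (7 chars).
    if len(text) != 7:
        return False
    letter_slots, digit_slots = (0, 1, 4, 5, 6), (2, 3)
    ok_letter = lambda c: c in string.ascii_uppercase or c in dict_int_to_char
    ok_digit = lambda c: c in '0123456789' or c in dict_char_to_int
    return (all(ok_letter(text[i]) for i in letter_slots)
            and all(ok_digit(text[i]) for i in digit_slots))


def format_license(text):
    # Fix position-dependent confusions, e.g. an 'S' in a digit slot -> '5'.
    maps = {0: dict_int_to_char, 1: dict_int_to_char,
            2: dict_char_to_int, 3: dict_char_to_int,
            4: dict_int_to_char, 5: dict_int_to_char, 6: dict_int_to_char}
    return ''.join(maps[i].get(c, c) for i, c in enumerate(text))


def read_license_plate(license_plate_crop):
    # Returns (formatted_text, confidence), or (None, None) if no valid read.
    for bbox, text, score in reader.readtext(license_plate_crop):
        text = text.upper().replace(' ', '')
        if license_complies_format(text):
            return format_license(text), score
    return None, None
```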
The only thing I'm going to do now is execute this main pipeline for the entire video, so I'm just going to remove this break over here, and that's pretty much all; I'm going to press play again, and then I'm going to show you how to visualize this data so everything looks like the video I showed you in the intro. Let's see what happens. And now let's go back to PyCharm so I can show you exactly how to create a visualization like the one I showed you when we were starting this video. In order to do so we are going to use these two files, visualize.py and add_missing_data.py, and you're going to find both of them in the GitHub repository of today's tutorial, so you can just go ahead and use them in your project. Before using these two files, let me show you something first. If I go back here to the test.csv file we created, let me do something: I'm going to filter by car ID, to show you all the data, all the information we have extracted for only one of our cars; I'm going to select only car ID number 3, which is just a random car ID in our data. You can see that the frame numbers in which we detected this car are not consecutive: we have detected frame number 0, then number 1, then it jumps to number 4, then to number 9, then 12, 13, 14, 15, 16, 17, then 27. So we have many missing frames: for some reason we don't have the information for this car ID in many of the frames in between these two, for example; we don't have the information for frame number 2, frame number 3, or frames 5, 6, 7, 8, 10, 11. There are many missing frames for this car ID, so that's something that's going on. And remember that we are not saving all the information: we only save the information for a license plate when we have found the car the license plate sits on, and when we have read a license plate number that complies with our format.
So we are not saving all the information; there's a lot of information we are not saving into this CSV file. Remember how OCR technologies usually work: they are very, very good, they perform very well, but in some cases they make mistakes, so whenever they don't read a number which complies with this format, we don't save the information for that frame. That's the reason we have some missing frames over here, and that's the first thing I want you to notice. Then there's another thing, which is much more important: look at what happens with the license plate numbers. We have read the license plate numbers in all of these frames, and each reading complies with our format, so everything is okay; but you can see we have many different values. The number we detected in the first frame is different from the one we detected in frame number 4, and if I continue scrolling down you can see we have also detected other values; this one is different, and further down this one is different too, here we have an N, here we have a P. So for every single car ID we are going to have many different values for the license plate, and this is a huge issue, a very important thing we need to solve, because obviously every single car has only one valid value for its license plate. If we have so many values for the license plate, how do we make a decision? How do we know which one is the real one, the most accurate value? What's our criteria? That's a huge problem, and this is exactly where the object tracking comes in: because we are tracking each car through all the different frames of the video, for every single car we have the license plate value we detected in each given frame for that car. So if we want to know the license plate value of a given car across all the frames of the video, the only thing we need to do is select the reading we detected with the highest confidence score. You can see this column is the confidence score with which we detected every single one of these license plate readings, so the only thing we need to do is look at which license plate reading has the highest confidence, and that's it: that's our criteria to decide the license plate number of each car. That's the way we are going to solve this problem, and that's exactly why it's so important to implement an object tracking algorithm in this project: it gives us the criteria to select the license plate number for every single car in the video. So remember we had two problems: that's how we solve the first one.
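As a sketch of that criterion, assuming the CSV has columns named car_id, license_number and license_number_score (these names are hypothetical here; check the header of your test.csv):

```python
import pandas as pd

df = pd.read_csv('./test.csv')

# For each tracked car, keep the reading with the highest text confidence.
best_idx = df.groupby('car_id')['license_number_score'].idxmax()
print(df.loc[best_idx, ['car_id', 'license_number']])
```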
We still have the other problem, the missing frames for every single car, and this one is not really a big problem; the only thing it affects is the visualization. Because now we are going to take all this information and visualize it, the only thing that happens with all these missing frames is that we simply don't draw the license plate, and don't show the license plate value, in those frames. Let me show you what happens if we create a video from the CSV file I just showed you: we get a visualization which looks like this, which is okay, I guess, but it's not an ideal visualization; it's not really pretty, it doesn't really look good. Ideally we would like a visualization which is more stable: for every single license plate we would like to see it at a fixed position through all the frames in which we detect the license plate for that car. That's exactly what we would expect, and this flickering doesn't look good. So in order to fix this problem, which again is not a huge problem and only affects the visualization, we are going to use one of those two scripts, add_missing_data.py. The only thing this script does is interpolate all of those frames in which we have not detected a license plate, or in which we did not extract the license plate information: we interpolate the values of the bounding boxes for the car and the license plate in all of those frames, and that's it. For example, for frame number 41, you can see we have the information for frame number 40 and for frame number 42, but not for frame number 41; so the only thing the add_missing_data.py script does is take the bounding boxes from those two frames and average all the different coordinates, and by taking the average it computes the value of the bounding box in the missing frame. It performs exactly the same process on absolutely all the other missing frames, and that's how we solve the missing-frames problem; remember, this is only a matter of visualization, not a huge problem. Once we have fixed that issue we can just create the video, and that's it. I'm going to give you these two files in the GitHub repository of this tutorial, and now let me show you how this works. The first thing you need to do is execute add_missing_data, changing here the path to the file you are going to interpolate, in our case test.csv, and then specifying the file name of the CSV you are going to create with the interpolated data.
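The interpolation itself boils down to averaging coordinates; a tiny sketch with made-up boxes:

```python
import numpy as np

def interpolate_bbox(bbox_prev, bbox_next, f_prev, f_next, f_missing):
    """Linearly interpolate a bounding box for a missing frame."""
    t = (f_missing - f_prev) / (f_next - f_prev)
    return list((1 - t) * np.asarray(bbox_prev) + t * np.asarray(bbox_next))

# Frame 41 is missing: average the boxes from frames 40 and 42.
box_40 = [100.0, 200.0, 180.0, 240.0]   # made-up coordinates
box_42 = [108.0, 204.0, 188.0, 244.0]
print(interpolate_bbox(box_40, box_42, 40, 42, 41))  # the halfway box
```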
Let me show you very quickly what this interpolated file looks like. I'm going to filter by car ID and select number 3 again, and you can see that in this case we have the values for the bounding boxes in absolutely every single frame: we start at number 0 just as before, and continue without gaps until number 65, which is the last frame in which we detected this car. This is exactly the data we create with add_missing_data.py. Once we have created this new CSV file, we go to visualize.py, input something like test_interpolated.csv, and then specify the file name of the video we are going to create, in this case out.mp4. The only thing left is to execute this file, and after a few minutes we will have a video which looks exactly like this. And this is going to be all for today: my name is Felipe, I'm a computer vision engineer, and these are exactly the type of videos and tutorials I make on this channel. If you enjoyed this video, remember to click the like button and to subscribe to my channel. This is going to be all for today; see you in my next video. In today's tutorial we will be making an object detection web application: we will be detecting tumors on a brain MRI image. Let me show you how it works: I'm going to drag and drop an image from my computer; this is the image we have uploaded, and if I click here on detections you can see that we have detected two objects, two tumors, on this image. This is exactly the project we will be working on today: we are going to make the entire web application using Python and Streamlit, and we're going to detect objects using an object detector trained with detectron2. My name is Felipe, welcome to my channel, and now let's get started. The first thing we need to do is create a new PyCharm project. You can see this is PyCharm, and now let me show you how to create a new project: we click on New Project, I select the directory where I'm going to create this project, which in my case is here, and then I select tutorial; this is the directory in which I'm going to create this PyCharm project. Then I'm going to create a new environment, the interpreter will be Python 3.8, and everything else will be just the default values, so I click Create, and that's pretty much all. The next step is to install all the requirements we are going to use today, so I'm going to create a new file called requirements.txt: I name the file requirements.txt, press Enter, and then paste all the dependencies we need to install in this project, which are all of these packages over here. That's pretty much all; now I go to the terminal and type pip install -r requirements.txt. I press Enter, and that's going to take care of installing all the requirements. You can see that I got an error, and basically this error is because we need to install all of these other dependencies first, and only then install this final dependency...
You can see that this last one is detectron2: we need to install everything else first, and then, at the end, detectron2. So I'm just going to comment out this line and run pip install -r requirements.txt again. Okay, now all the requirements have been installed, and the only thing left is to install detectron2, so I uncomment the line and run pip install -r requirements.txt once more; this will take care of installing all the requirements, but as we have already installed all the other packages, the only one that gets installed now is detectron2, so we just need to wait a few minutes. Okay, and that's pretty much all in order to install detectron2; now we are all set, all of our requirements have been installed, so it's time to continue. Let me show you how to create a new Python file: we select File, New, Python file, and this file will be main.py. This is the file in which we are going to code the entire web application of today's tutorial, and remember, in this tutorial we are going to be detecting tumors on brain MRIs, so we definitely need an object detector that can detect this type of object. Let me show you the data I used to train this object detector: this is a dataset I found on Roboflow, and I'm going to give you a link to it in the GitHub repository of today's tutorial, so you can just go ahead and take a look at this dataset if you want to. And this is an object detector I trained using detectron2. I'm not going to show you the details of how I trained it, because that's something I have already covered in one of my previous videos: there I showed you how to train an object detector using detectron2, the step-by-step guide, the entire process. So if you are curious to know exactly how I trained this object detector, I invite you to take a look at the video I'll be posting over there. Now let's continue: that was the data I used to train this object detector, and now let me show you the entire pipeline we are going to work through today. Back in PyCharm, let me show you exactly all the steps we are going to take in this tutorial. The first step will be setting up the title of the web application; then the next step is setting up the header; the third step will be creating a file upload widget, so the user can upload a brain MRI image on which we can detect objects; then the next step is loading the model, the object detector we are going to use; then we load the image the user has uploaded; then we detect objects; and the last step will be to visualize the detected objects on top of the original image and display this visualization to the user. These are the steps of the entire process, the entire pipeline we are going to build today, and I'm going to show you every single one: you can count one, two, three, four, five, six, seven steps; in only seven steps we will have this web application up and running. So let's get started, and the first step in this process is importing streamlit as st.
Okay. Then, in order to set up the title, I'm going to call st.title, and the title will be something like brain MRI tumor detection; then, to set up the header, I call st.header, and this will be something like please upload an image. Then, in order to create the file upload widget, I call st.file_uploader, and I'm going to input two parameters: the first one is an empty string, and the second is all the file types we support in this widget, something like png, jpg and jpeg. That's pretty much all, and in order to move one step at a time let's see if everything executes just fine. I'm going to execute the code as it is so far, so I go back to the terminal and type streamlit run main.py. This opens my browser, and we can see exactly how our web application looks so far; everything looks just perfect, so we are okay to continue. Let's get back to PyCharm and continue with the next step in this process, which is loading the model, the object detector we are going to use today. Remember, we are going to use an object detector I trained using detectron2, and remember I already showed you how to use detectron2 in one of my previous tutorials. So let's go back to my browser, to the GitHub repository of that previous tutorial, and see exactly how we can use this model I trained with detectron2. I'm going to this file over here, predict.py, which is the file we used to load the model and make predictions with a model trained with detectron2. The only thing I'm going to do in this tutorial is copy some of the code in this file and paste it into the main.py file of our PyCharm project. Remember, in this tutorial we are not going into the details of how to use detectron2, so I strongly recommend you take a look at my previous video, which you will find on my YouTube channel, so you can see exactly how using detectron2 works; that's my strong recommendation for you. The only thing we're going to do now is copy and paste some of these lines, which I'm going to explain very quickly: you can see that we are getting a configuration file, then we are getting the weights for this model, and we are loading these weights from our local drive, so we are specifying a file path, a location on our local drive, for the weights. Then we are creating an object which is our predictor, and this is exactly the model we need in order to continue with this process. That was a very quick explanation of this code we have over here, and now let's continue.
So I'm going to say something like from detectron2.config import get_cfg, and that should be all for this function we have over here. Then from detectron2.engine import DefaultPredictor, and that should fix this issue over here. And now we need from detectron2 import model_zoo, and that should be all in order to fix this last issue. I'm going to delete these comments, and that's pretty much all: everything that's here is everything we need in order to load this model.

But obviously we need a model to load, because this is just the default code we had in our GitHub repository. So let me show you exactly where my model is on my local drive. If I go to my file system, you can see that I have this file over here, which is model.pth, and I have this other file, which is labels.txt. The model we need is model.pth; these are the weights of our model. What I'm going to do is copy this file and paste it into the directory of this PyCharm project. You can see that this is the main.py we are currently working in, this is the requirements.txt file we created a few minutes ago, and this is exactly where I'm going to paste the model. I'm also going to do something else, which is creating a new directory called model, and this is where I'm going to put the model. Everything is okay.

And remember, I showed you we have another file with all the labels we are detecting, but in our case this is a very dummy labels.txt file, because we only have one category; we are only detecting one class, which is tumor. A very quick note: the dataset I used to train this object detector had two classes, which were negative and positive, and I believe these are two different types of tumors, or at least that's my understanding. But what I decided to do when I was training this object detector was to merge these two labels, these two categories, into only one, and I called it tumor. That's exactly why we have only one class over here, although the original dataset had two categories. So that was a very quick note regarding the model I trained. Now let's go back to my file system; we are not going to use this directory anymore. I go to model, and this is the model we are going to be using; these are the model weights.
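At this point the project folder looks roughly like this. This is a sketch: whether labels.txt also gets copied next to the weights is an assumption, since the transcript only places model.pth explicitly:

```text
project/
├── main.py
├── requirements.txt
└── model/
    ├── model.pth    # the detectron2 weights
    └── labels.txt   # a single line: tumor
```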
Remember, this is inside another directory, which is called model. I go back to PyCharm, and the only thing I'm going to say is that the weights path is model and then the file name, model.pth. Okay, and that's pretty much all. In my case I'm going to run this code on my local computer, which uses a CPU, so this is what I need to specify. If your computer, or the computer where you are running this code, has a GPU, then the only thing you need to do is comment this line and everything will run on your GPU. But in my case I'm going to run it locally on my CPU, so I'm just going to leave this line as it is.

Now let's continue; it's time to load the image we are going to use in order to detect all these objects. So this is what I'm going to do: if file... actually, I have to make another edit first. We are uploading a file, and we are calling the object the user has uploaded file. So now, if file, so if the user has uploaded something, we are going to continue, and we are going to call the resulting object image: image will be Image.open(file), and then something like "to RGB". Image is an object we are going to import from pillow: from PIL import Image. Okay, that should be all. Now, in order to move one step at a time, let's go back to my browser and see if everything executes just fine. I'm going to refresh, everything is fine, and now I'm going to select an image. The data I'm going to use is located over here; this is train and val. I'm just going to select a random image, which is this one, and let's see what happens. We have an error, because this method is not called "to", it's called "convert", if I'm not mistaken. Let's see: I'm going to refresh, do the same process again, select the same image and drop it over here, and you can see that now we have another error, because it's not "covert" but "convert"; I had another typo. Okay, now let's see what happens. I'm going to refresh again, let's hope everything is okay now. I'm going to take the image, drop it here, and let's see what happens. We have to wait a couple of seconds; we may be loading the model, so this may take a few seconds...
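With the typos fixed, the model-loading and image-loading code looks roughly like this. It's a sketch: the base config file name and the NUM_CLASSES line are assumptions, since the transcript copies these lines from predict.py without naming them; the rest follows the steps just described:

```python
import streamlit as st
from PIL import Image
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2 import model_zoo

# Build the detectron2 configuration and point it at our trained weights
cfg = get_cfg()
# Assumed base config; the transcript doesn't say which one predict.py merges
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "model/model.pth"  # the weights we copied into model/
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1    # assumed for the single merged class: tumor
cfg.MODEL.DEVICE = "cpu"               # comment this line out to run on a GPU
predictor = DefaultPredictor(cfg)

file = st.file_uploader("", type=["png", "jpg", "jpeg"])  # from the earlier step
if file:
    # Decode the uploaded file into an RGB pillow image
    image = Image.open(file).convert("RGB")
```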
And everything is okay. We are not visualizing the image yet, but if we are not getting any error, that means everything is fine. So let's go back to PyCharm. Everything is okay so far, and now it's time to detect objects. We are moving super quickly, we are almost there; we have almost completed this pipeline, and the only thing we need to do now is detect objects. In order to detect objects with this model, which was trained with detectron2, I am going back to my browser and to this repository, to see exactly how we can make this prediction. The only thing I'm going to do is copy and paste everything from here up to here. We don't really need to draw the rectangle, but let's just copy everything: I'm going to copy, then go to PyCharm, and paste it here. We will need to make a few edits, but most of the code will remain the same. I'm just going to fix this image variable over here, because if I go back to my GitHub repository, you can see that this image is actually a numpy array: we were reading the image using OpenCV, so the format is a numpy array, and that is what we need to input right over here. So I'm going to define a variable, image_array, which will be numpy's array function applied to the image. We will need to import numpy, so I'm going to say something like import numpy as np, and that's pretty much all. Now I'm going to input image_array, and that should be it. So this is pretty much all: we are going to be returning all the objects we have detected with a confidence value greater than 50 percent, and other than that everything is just fine. We don't really need to draw the rectangle, so I'm just going to delete that part. So we have loaded the image, we have detected all the objects on top of this image using our model, and now it's time to continue with the visualization.

Now we are going to take all the detections, all these objects we have detected, and we are going to draw bounding boxes on top of the image the user has uploaded. This is amazing, because we're moving super quickly. Let's see how we can continue with the visualization. It's time to draw bounding boxes on top of our images, and in order to do so we are going to use plotly. Plotly is an amazing Python library which I have used many times in my projects; you can build some very dynamic, very impressive visualizations with it, and it's the library we are going to use now. Something that's very important: in my tutorials we always focus on the computer vision part of the problem, and everything related to the visualization is not really that interesting from a computer vision perspective. So what we are going to do now is just take the code for the visualization, which I have already prepared over here. This is a function called visualize, and it's the function we are going to use to draw the bounding boxes on top of our images. So please pay attention, please focus, because otherwise you may get lost. I'm going to the project, then File, New Python File, and I'm going to create a new Python file called util.py.
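Before moving on to util.py, here is roughly what the detection step just described looks like. It's a sketch: the post-prediction filtering and the channel flip are assumptions (detectron2's DefaultPredictor expects a BGR array by default via cfg.INPUT.FORMAT, something the transcript doesn't mention):

```python
import numpy as np

# The predictor wants a numpy array, not a pillow image.
# RGB -> BGR flip: DefaultPredictor expects BGR by default (an assumption
# the transcript does not spell out)
image_array = np.asarray(image)[:, :, ::-1]
outputs = predictor(image_array)

instances = outputs["instances"].to("cpu")
scores = instances.scores.numpy()
boxes = instances.pred_boxes.tensor.numpy()  # rows of x1, y1, x2, y2
bboxes = boxes[scores > 0.5]                 # keep detections above 50% confidence
```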
Then I'm going back to this file I have over here, and I'm just going to copy the entire file: I press Ctrl C, and then Ctrl V over here. So this is all the code we need to do the visualization. Remember, the visualization is very interesting and very important, but it may not be the most interesting thing from a computer vision perspective, and that's why we are not dwelling on every detail of how to visualize all these bounding boxes on top of the images; we are just going to use this function, and that's pretty much all. I need to do a few imports, otherwise this is not going to work: I'm going to import streamlit as st, and that's pretty much all, if I'm not mistaken.

Now let me show you something about the code I have just copied. You can see that this is the code of two different functions. One of them is called visualize, and this is the function we are going to use in a few minutes in order to draw all the bounding boxes on top of our images. The other function is called set_background, and it only takes care of a very small, purely aesthetic detail at the end of this tutorial, which is changing the background of the web application. This is only a detail, definitely not the most important thing from a computer vision perspective; it just changes the background of the web application in the browser. This is something we are going to do at the end, and it's also in the code I have just copied and pasted into this file.

But now let's focus on this other function, visualize. You can see this function receives two parameters: one of them is the image and the other one is the bounding boxes, and you can see that the image is the input image, while the bounding boxes are a list of all the bounding boxes in the format x1, y1, x2, y2. So now let's go back to main and see exactly how we can call this function. The first thing I'm going to do is from util import visualize; now the function is imported into our main process. Then let's go back here, and this is where we are going to call this function. Remember, we need to input two parameters: one of them is the image we are going to use to draw all the bounding boxes, and we need to input it in the pillow format; the other parameter is the bounding boxes, bboxes. And please, please focus, please pay attention, because we already have a variable called bboxes, but if we go back to the documentation, you can see that this parameter is a list of bounding boxes in the format x1, y1, x2, y2, so it is not the same as this other variable we have over here. Please pay attention, because otherwise it may be a little confusing. So this is what I'm going to do: I'm going to define a new variable, bboxes_, which is going to be a list, and what I'm going to do here is just append the bounding boxes exactly as we need them to be. If I go back to util.py, this is exactly the format we need to input; so we have this object over here, and the only thing I'm going to do is paste it over here.
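The author's util.py is available in the GitHub repository of the tutorial and, as we see in a moment, also offers an "original" / "detections" toggle. As a stand-in, a minimal plotly-based visualize might look like this sketch, which only draws the boxes:

```python
# util.py -- a minimal sketch, not the author's exact code
import numpy as np
import plotly.express as px
import streamlit as st

def visualize(image, bboxes):
    """Draw x1, y1, x2, y2 bounding boxes on a pillow image with plotly."""
    fig = px.imshow(np.asarray(image))
    for x1, y1, x2, y2 in bboxes:
        fig.add_shape(type="rect", x0=x1, y0=y1, x1=x2, y1=y2,
                      line=dict(color="red", width=3))
    fig.update_xaxes(visible=False)
    fig.update_yaxes(visible=False)
    st.plotly_chart(fig)
```

On the main.py side, repacking the detections and calling the function could then look like this, assuming the bboxes array from the detection sketch above:

```python
from util import visualize

# Re-pack the detections into the plain list of [x1, y1, x2, y2]
# that visualize() expects
bboxes_ = []
for x1, y1, x2, y2 in bboxes:
    bboxes_.append([int(x1), int(y1), int(x2), int(y2)])

visualize(image, bboxes_)
```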
I invite you to take a look at this visualize function so you can see exactly how it works; you are going to see that we are using the plotly library, calling a few functions and doing some visualization-related work. This function is going to be available in the GitHub repository of today's tutorial. But now let's continue and see exactly what happens if we refresh this website and upload a new image, and what type of visualization we get with this function. I'm going back to my local computer, to my file system, I'm going to take a random image again and drop it over here, and you can see that this is what we get: exactly the same image I uploaded over here, but now we have these two buttons. One of them is "original", which means this is the original image we have uploaded, and the other one is called "detections", and if I press this button you can see that we are plotting the bounding box exactly on top of the tumor in this brain. I mean, I'm not a doctor, so I have no idea what I'm looking at; I have the impression this is a brain and this is an MRI, and based on the colors I have the feeling that this is the issue, this is a tumor. So it looks like we have detected exactly what we should have detected. But this is the data I used to train the model, this is the training data, so now let's see if we get the same performance with an image from our validation set, which is completely and absolutely unseen data for my model. Let's see what happens if I just take a random image like this one. I'm going back here; this is the image I have just uploaded. Remember, now we are using completely unseen data for my model, and let's see what happens if I move to the other tab, to the other button, which is "detections": we are successfully detecting the bounding box, the object we should be detecting in this image. So everything is working just fine. In order to make it more challenging and more fun, let's see if we can detect an image with two objects; I know there are a few, like this one, which has two objects. I'm just going to drop this image here and see if we can detect both of these objects, both of these issues... and we can see that we detect both of them. So everything seems to be working just fine, and this is pretty much all in order to get this web application up and running: you can see that we are uploading images, detecting all the issues in them, and plotting everything exactly as we should. The only thing I'm going to do now is use the other function we have over here, set_background, to change the background of this web application so we make it a little nicer.
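The set_background helper also lives in util.py, and the transcript doesn't show its body. A common way to implement this kind of helper in Streamlit, sketched here under the assumption that it injects CSS with a base64-encoded image, is:

```python
import base64
import streamlit as st

def set_background(image_file):
    """Set a full-page background image by injecting CSS into the app."""
    with open(image_file, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    st.markdown(
        f"""
        <style>
        .stApp {{
            background-image: url("data:image/png;base64,{encoded}");
            background-size: cover;
        }}
        </style>
        """,
        unsafe_allow_html=True,
    )
```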
This is exactly how I'm going to call this function: I'm going to main.py and I'm going to change the import to from util import visualize, set_background. Then I'm going back to my file system, and this is an image I have prepared in order to change the background. It may not be the perfect background ever, but I think it's going to work; we are going to put this background in our web application. Let's see what happens: I'm going to copy and paste it over here, then I'm going back to PyCharm, and I'm just going to call set_background and input bg.png. Let's see what happens if I refresh... and you can see that now we have a much better looking background; everything looks much, much better now. Now let me open a new image. I'm just going to select, for example, this image over here, so we can see how the entire web application looks with this new background. We have to wait a couple of seconds, and now we are getting the image with all the detections on top. So this is going to be pretty much all for this tutorial; this is exactly how you can create an object detection web application using Python and streamlit, and this is going to be all for today. If you enjoyed this video, I invite you to take a look at another of my previous videos, where I show you how to make an image classification web application; I'm going to be posting a link to that other tutorial over there. Remember, if you enjoyed this video, most likely you will enjoy that video too, because it's exactly the same process and a very similar web application. Congratulations. You have completed my course on object detection. My name is Felipe. I'm a computer vision engineer and this is exactly the type of videos and the type of courses I make in this channel. If you enjoyed this video, I invite you to click the like button. And I also invite you to subscribe to my channel. This is going to be all for today and see you on my next video.