Create YOLO (v5) Dataset for Custom Object Detection using OpenCV, PyTorch and Python Tutorial

Video Statistics and Information

Captions
Hey guys, in this video you're going to learn how to create your own YOLOv5 dataset using the Darknet format. This is part of a series of videos in which we're going to create our own dataset, then fine-tune a YOLOv5 model and deploy it into production. Let's get started.

For the purposes of our tutorial we are going to take a dataset provided on Kaggle called "Clothing Item Detection for E-commerce", submitted by DataTurks. You can see that this one is relatively old; 2018 is the year it was submitted. If you go to the DataTurks website you can see that we have 907 items, some of which were manually labeled, and nine categories in total. As you can probably guess, those categories represent kinds of clothing. We are going to use images like this one to create a dataset that is compatible with YOLOv5.

YOLOv5 was publicly open-sourced about two weeks ago, so this is a very new project. The source code is available on GitHub; it is developed by a company called Ultralytics. The GitHub repo contains a lot of documentation, a lot of examples, and benchmarks for the model. In this video we are going to focus on the tutorial called "Train Custom Data", which I have opened up right here. It walks you through the process of converting a given dataset into the Darknet format, and it's very well explained. In this particular video we're going to focus on creating the dataset itself, so we're going to take some labels and images, and I'm going to show you how you can recreate all of this for your own custom dataset. The dataset available on Kaggle is just a single JSON file, and if you go here you can see that file, so you can go ahead and download it.
But I've done something a bit better: I've already uploaded this file to my Google Drive, so you'll be able to reproduce the steps of this tutorial very easily. Here we have an empty Google Colab notebook, and I'm going to download the JSON file using gdown; I'll paste in the file ID along with the command and run it. This has to connect to the runtime first, and I'm using no hardware accelerator here, because we don't need one just to build our own dataset. You can see that it downloads the file called clothing.json, which is the whole dataset.

Let's start with the imports that we are going to need in order to export this dataset; I'm just going to copy and paste them. Most notably, we are importing urllib, because we are going to download some images, and OpenCV for image transformations, and we are setting up our plotting libraries here. Let me execute this.

Next we are going to read the annotations from the JSON file that we downloaded. For that I'm going to create an array, and we are going to fill it from every line of the JSON file, because every line of this file actually contains a JSON object. We open the file, and for each line we parse the JSON contents with json.loads(line) and append the result to the array we've created. I had an error right here, but once this is complete we can have a look at how many examples we have: 504, which is exactly the number we saw right here, so this is working as expected. Next, let's see what a single annotation looks like; I'm going to look at the first one. Here we have the image height,
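The read-every-line step just described can be sketched as follows; the field names ("content", "annotation", "points", "imageWidth", "imageHeight") are assumed from the DataTurks layout discussed in the video, and a tiny synthetic file stands in for clothing.json:

```python
import json
import tempfile
from pathlib import Path

def load_jsonl(path):
    """Parse a JSON-lines file: one JSON object per line."""
    annotations = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                annotations.append(json.loads(line))
    return annotations

# A tiny synthetic file mimicking the (assumed) DataTurks layout.
sample = Path(tempfile.mkdtemp()) / "sample.json"
sample.write_text(
    '{"content": "http://example.com/0.jpg", "annotation": [{"label": ["Jackets"],'
    ' "points": [{"x": 0.1, "y": 0.2}, {"x": 0.6, "y": 0.9}],'
    ' "imageWidth": 300, "imageHeight": 400}]}\n'
)
clothing = load_jsonl(sample)
print(len(clothing))  # 1
```

The real clothing.json has one such object per line, so `len(clothing)` would be 504.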
the image width, a label, some notes (which is just text), and the points. Those points are a pair of (x, y) coordinates, and you can clearly see that they are not in pixels; they are relative values between 0 and 1, based on the image height and image width. This is going to be a bit of an issue when we pre-process or post-process this data, so we will have to handle it. Finally, we have a link to the image; if you open this one, for example, you can see that it opens an image hosted in S3 for the example we are looking at. So we have a URL to the image, and we have the annotations, or bounding boxes, which we are going to need for object detection.

Another thing we need to know is whether we have multiple annotations per image, because you can see that here we have an array of annotations. For that I'm going to loop over the annotations, and if the length of the annotation array is larger than one, I'm going to display the whole clothing annotation. We have just a single such example, but a single example is enough, so we have to handle it: an image that contains two labels, one jacket and one skirt. There is a jacket somewhere around here... oh yes. So we are going to write our annotation preprocessor to be a bit more robust based on this information.

Next we are going to get a list of all available categories, because we need that to create the Darknet-format annotations. To do that I'm going to create an array called categories, loop over all annotations in the clothing examples, and extend the categories with the labels of each annotation, because as you can see the labels are actually an
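The multiple-annotations check just described is simple enough to sketch in plain Python; the structure of the examples is assumed from the dataset:

```python
# Find examples whose "annotation" list holds more than one entry,
# i.e. images with multiple labelled objects (field names assumed).
clothing = [
    {"content": "a.jpg", "annotation": [{"label": ["Jackets"]}]},
    {"content": "b.jpg", "annotation": [{"label": ["Jackets"]}, {"label": ["Skirts"]}]},
]
multi = [c for c in clothing if len(c["annotation"]) > 1]
print(len(multi))  # 1
```

In the real dataset exactly one image matches, which is why the exporter has to loop over all annotation entries per image rather than assuming a single one.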
array. So we are adding all of those right here, and then I'm going to get the unique values. I had some sort of hiccup right there, so I'm going to pass the categories into a set, convert that set into a list, and finally display all the categories we have: jackets, glasses, tops, shoes, skirts, trousers, and so on. We have all nine categories, and we can check that by getting the length of the list.

Now we are going to split our dataset into a training and a validation set. For that I'm going to use train_test_split from scikit-learn, and I'm going to create the train and validation clothing splits by passing in all of the annotations and reserving only 10% of the data for validation. This is not defined yet, so I'm going up here to add `from sklearn.model_selection import train_test_split`. With that done, I'm going back down and running this again. Now let's have a look at how many examples we have in the training and validation sets: 453 examples in training and roughly 50 in validation. This is pretty good, but note that we have very, very little data, and we are going to see how well YOLOv5 handles this very small dataset in the next video.

Recall that we need to get each image from S3 and save it to our hard drive, at least on this Colab machine. I'm going to grab a single annotation, and if I run this you will see its contents again. Then I'm going to download the image using urllib.request.urlopen, passing in the content URL right here. Next I'm going to use PIL's Image to open the contents of this byte data, and I'm going to convert it into RGB format. Finally, I'm going to save this
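The category collection and the 90/10 split can be sketched like this; the video uses sklearn's train_test_split, so the stdlib shuffle below is a stand-in for it, and the annotation structure is assumed:

```python
import random

# Synthetic stand-in for the parsed annotations (structure assumed).
clothing = [{"annotation": [{"label": ["Jackets" if i % 2 else "Skirts"]}]}
            for i in range(10)]

# Collect the unique category names across all annotation entries.
categories = []
for example in clothing:
    for a in example["annotation"]:
        categories.extend(a["label"])
categories = sorted(set(categories))

# A stdlib sketch of train_test_split(clothing, test_size=0.1).
random.seed(42)
indices = list(range(len(clothing)))
random.shuffle(indices)
n_val = max(1, int(0.1 * len(clothing)))
val_clothing = [clothing[i] for i in indices[:n_val]]
train_clothing = [clothing[i] for i in indices[n_val:]]
print(categories, len(train_clothing), len(val_clothing))
```

With the real 504 examples, a 10% holdout gives the 453/51 split mentioned above.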
demo image in JPEG format, and I'm mixing the casing here. You can see that we have this image right here, and if you open it, this is the image, so this looks very good; we have the image on our local (or Colab) machine.

Next, I'm going to use OpenCV to read demo_image.jpeg, and recall that I need to convert it into RGB format, because OpenCV reads images in BGR order. This image is now a NumPy array; you can see that, and you can get its shape, which gives us the height, the width, and the number of color channels.

Next I'm going to apply the annotations to this image, iterating over all annotations and all labels. Right here I can get the width of the image; let me just print the row. From the annotation I'm going to take the image height and the image width (Google Colab is nice enough to understand the structure of this dictionary). We are going to need those because, again, the point coordinates are normalized to values between 0 and 1. Next I'm going to take the points, the first and the second, because we have an array of points, and from those points I'm going to compute x1 and y1, multiplied by the width and the height, so that we get real pixel coordinates based on the image width and height. So I take p1's x and multiply it by the width, then p1's y and multiply it by the height, to get pixel values for those coordinates. And I'm going to do the same thing for the next point,
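The normalized-to-pixel scaling just described can be sketched as a small helper; the "points", "imageWidth" and "imageHeight" field names are assumed from the dataset:

```python
def to_pixels(annotation):
    """Scale the two normalized corner points of an annotation entry
    to pixel coordinates (field names assumed from the dataset)."""
    w, h = annotation["imageWidth"], annotation["imageHeight"]
    p1, p2 = annotation["points"]
    x1, y1 = p1["x"] * w, p1["y"] * h
    x2, y2 = p2["x"] * w, p2["y"] * h
    return x1, y1, x2, y2

a = {"imageWidth": 200, "imageHeight": 100,
     "points": [{"x": 0.1, "y": 0.2}, {"x": 0.5, "y": 0.8}]}
print(to_pixels(a))  # (20.0, 20.0, 100.0, 80.0)
```

These pixel values are what cv2.rectangle needs (after casting to int), whereas the Darknet label files later keep the original normalized values.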
p2. All right, so now we are ready to draw the annotation rectangle for the label, and I'm going to do that using cv2.rectangle. I pass in the image, then the first point (converting the values to int), then the second point, again as ints. Then I apply a color, which is an RGB triple, one value each for red, green and blue: we specify 0 for red, 255 (the maximum) for green, and 0 for blue. Finally I apply a thickness, which controls the border width of the rectangle. If I run this, I expect it not to give any output, but the result is going to be this image right here. Let me zoom out; you can see that we have the image along with its annotation, which is pretty good. Just to make sure that this is actually doing the work, I'm going to read the image again, skip this cell, run the display, and you can see that we get the image without the annotation, so this is working very well.

But something is missing from this annotation: the bounding box is here, but we don't see the label text. I'm going to add that as well. To do that, I start by getting the size of the text we are going to draw over the annotation; the text is going to be this label right here, which we are currently not using. I'm going to call cv2.getTextSize, which takes a lot of parameters: the text itself (the label in our case), the font face, which is going to be FONT_HERSHEY_PLAIN, the font scale, which is basically how big the text is, and the thickness of the text. This returns a tuple of two values, and the first value is actually another
tuple, and we are interested in that first value, so I'm going to unpack it into label_width and label_height for this text.

Using that, I'm going to start with a background: I'm going to draw a green rectangle somewhere right here, and on top of it I'm going to write the label itself in white letters. Let's start by drawing the background rectangle. I pass in the image, and I'm basically going to take the same values as before, except with some changes: for the thickness I'm going to say that we want this rectangle to be filled. For the x of the second point, I add the label width to the first point, plus some padding of around 5% of the label width. For the y of the second point (let me actually write it out, it might be a bit clearer), I add the label height to the first point, similar to the point above, plus padding again, but this time around 25% of the label height. If I read the image again, run through this, and show the result, you can see that we have this rectangle right here, and it is placed perfectly.

Finally, we need to draw the text itself. To do that I call cv2.putText, applied to the image, writing out the label. The next argument is the point of origin, which is the bottom-left point of the text. Here I pass in x1, and for y I pass y1 plus the label height, because we want to go to the bottom, plus the same padding, so our label sits all the way at the bottom of this box. Now that we have the point at which the
text is going to be written, I specify the font face, which must be the same font we used for getting the text size; this is really important, because otherwise the measured text size won't match the rendered text. I specify the same scale, for the color we go all white, and the thickness is going to be 2. If I run this, you can see that we have the class name, or the label, right here on its background.

Now that we have some intuition about the dataset and the annotations we have so far, we are going to convert, or adapt, this dataset to the YOLO (Darknet) format. So what are the requirements? You can go to the "Train Custom Data" wiki tutorial provided by Ultralytics, and you can clearly see that we are required to create a text file containing the labels for each image. In this text file we have the class, or category, of the object, the x-center and y-center of the bounding box (the bounding box is this thing right here, the one with the green borders), and then the width and the height of the bounding box. You can see that we are required to have normalized values, so the numbers must be between 0 and 1. There is also an example of the folder organization required for the model to run.

I'm going to create a single function that does everything for us, including downloading the images and creating the labels. But first I'm going to paste a summary of what we are going to do; this will of course be available in the notebook that you're going to receive once this tutorial is complete, free of charge. This version is a bit more understandable, at least for me. In each row of the text file (the labels) we need the class index, the bounding box x-center value, and the
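The padding arithmetic for the label background described above does not need OpenCV; here is a plain-Python sketch of it, using the 5%-of-width and 25%-of-height padding factors mentioned in the video (the function name is mine):

```python
def label_box(x1, y1, label_w, label_h):
    """Corners of the filled background rectangle behind the label text,
    padded by ~5% of the label width and ~25% of the label height."""
    p1 = (int(x1), int(y1))
    p2 = (int(x1 + label_w + label_w * 0.05),
          int(y1 + label_h + label_h * 0.25))
    return p1, p2

print(label_box(40, 30, 100, 16))  # ((40, 30), (145, 50))
```

The two corners returned here are what would be passed to cv2.rectangle with a negative thickness (filled), and the text origin is the bottom-left of that box.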
bounding box y-center of the point, then the bounding box width and the bounding box height, and the bounding box coordinates must be normalized between 0 and 1. To do all of this we are going to create a function, appropriately named create_dataset, to which we pass the annotations (the clothing annotations) and the dataset type; remember that we are going to create two datasets, the first for training and the second for validation.

Right here I start by specifying a path for the images of the dataset. I'm going to call this the dataset images path, pointing at clothing/images plus the dataset type, and I'm going to make sure that it exists: this goes ahead and creates all the necessary folders if any of them don't exist, and does nothing if they do. Next I do the same for the labels, so let me rename things into an images path and a labels path, with clothing/labels plus the dataset type for the labels, and I'm going to copy and paste the same invocation so that our directories are available for us.

Next I'm going to iterate over the annotations in the clothing data, using the index as an image ID and getting each annotation into a row variable, and I'm going to wrap this in tqdm to follow the progress of the enumeration. First I create an image name, using the image ID plus ".jpg". Next I download the image, which is really similar to what we already did: I get the content, open the image, and convert it into RGB format. Then I save it into the images path joined with the image name; we are using the magic of pathlib's overloaded operators here, so this is valid Python code. And I'm going to save this
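The directory setup just described can be sketched with pathlib's mkdir; the "clothing" folder name and "train"/"val" dataset types are assumed from the video, and a temporary directory stands in for the Colab working directory:

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stand-in for the Colab working directory

for dataset_type in ("train", "val"):
    # parents=True creates intermediate folders; exist_ok=True makes
    # repeated calls a no-op instead of raising FileExistsError.
    (root / "clothing" / "images" / dataset_type).mkdir(parents=True, exist_ok=True)
    (root / "clothing" / "labels" / dataset_type).mkdir(parents=True, exist_ok=True)

print((root / "clothing" / "images" / "train").is_dir())  # True
```

This produces the images/labels, train/val layout that the "Train Custom Data" tutorial expects.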
using JPEG compression. Next we specify the name of the label file, again using the image ID, this time with ".txt"; in this text file we are going to write out the annotations for this image, the bounding boxes and the classes. Right here I use the labels path and the label name (let's call this label_file, actually), and I open it in write mode.

For each annotation in the annotations, and for each label in it, I take the category index by looking up the index of that category in the categories list we built. Next I take the points, which we already did somewhere above; I copy all of that, paste it right here, fix the indentation, and remove the multiplication by width and height, because Darknet requires the coordinates to stay normalized between 0 and 1. I calculate the bounding box width by subtracting x1 from x2, and for the height I do exactly the same thing using the y values. Finally I write out: the category index; the x-center of the bounding box, which is x1 plus half of the width; the same thing for the y value, using y1 plus half of the height; then the bounding box width and the bounding box height; and finally a newline. Let me just double-check: we have the class or category index, the bounding box x-center, the bounding box y-center, and then the bounding box width and height, all normalized between 0 and 1, because our dataset has already done that for us. All
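The corner-points-to-Darknet-row conversion just described can be sketched as a small function; the function name is mine, and the point structure is assumed from the dataset:

```python
def darknet_row(category_idx, p1, p2):
    """Convert two normalized corner points into one Darknet label row:
    'class x_center y_center width height', all but class in 0..1."""
    x1, y1 = p1["x"], p1["y"]
    x2, y2 = p2["x"], p2["y"]
    bw, bh = x2 - x1, y2 - y1  # bounding box width and height
    return f"{category_idx} {x1 + bw / 2:.6f} {y1 + bh / 2:.6f} {bw:.6f} {bh:.6f}"

row = darknet_row(4, {"x": 0.1, "y": 0.2}, {"x": 0.5, "y": 0.8})
print(row)  # 4 0.300000 0.500000 0.400000 0.600000
```

One such row is written to the label file for each labelled object in the image, followed by a newline.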
right, so let me just continue by calling this function on the training dataset, and we'll do the same thing for the validation dataset; let me open this up and start running. So, "got an unexpected keyword argument 'parent'"; it's 'parents'. If I fix this and run again, you can see that the process has started.

This took about a minute to complete, and you can see that right here we have the clothing directory, with the images and the labels inside, and if you open those you can see train and validation folders in each. Let's continue by exploring the structure using the tree command, which we need to install; it gives us a nice representation of the structure two levels deep. We have clothing, we have the images and the labels, exactly what we've seen, and we have train and validation under each of those.

Finally, let's look at the labels for the first training example. In this example we have a category index of 4, the next values represent the center of the bounding box, and then we have the bounding box width and height; all of these are the normalized values that are required to be between 0 and 1. Now you have a complete Darknet-compatible dataset, and you can go ahead and try it out with the YOLOv5 project, or you can wait for the next video, in which I'm going to show you how to do transfer learning, or fine-tuning, with YOLOv5 using the dataset we've just created. So guys, please like, share and subscribe if you liked this video, and I'll see you in the next one. Bye bye!
Info
Channel: Venelin Valkov
Views: 50,286
Keywords: Machine Learning, Artificial Intelligence, Data Science, Deep Learning, Object Detection, YOLO, YOLOv5, PyTorch, Dataset
Id: NsxDrEJTgRw
Length: 36min 32sec (2192 seconds)
Published: Sun Jun 14 2020