YOLOv5 - Training YOLOv5 (object detection model) on a custom dataset using OpenCV and Python

Captions
Hello, hello, welcome back. I'm Munib Siddiqi, and I'm a computer vision engineer. I did my bachelor's and master's in computer science and engineering with a particular focus on machine learning and deep learning. In this video I'm going to show you how to train YOLOv5 on a custom dataset. We're not going to touch the network architecture, although you can reconfigure that as well; the main focus of this video is how to prepare your data so you can train YOLOv5 on your own dataset. I'm going to use the Google Open Images dataset: I'll download some data and then show you how to prepare it before training YOLOv5. I'm not going into the details of the network architecture, because I want to keep this video short, but I can give you a little background: YOLOv5 is a well-maintained repository on GitHub, and it offers a range of object detection models, from very small to very large. Of course there is a trade-off between a small and a large model in terms of speed and accuracy: the small model has faster inference but lower accuracy, while the heavy model has better accuracy but slower inference, which in practice means a lower FPS.

To train YOLOv5 on your custom dataset, the first thing you need to do is go to the YOLOv5 repo, ultralytics/yolov5. You'll find plenty of information about YOLOv5 there; it's very well documented, and you'll get almost everything you need to train this network on your local machine. First, clone the repository, which I have already done, and then switch into the yolov5 directory. After cloning, create a virtual environment using conda or pip's virtualenv and install all the requirements into it. If you're using PyCharm, make sure your interpreter is actually set to the environment you just created. It's always good practice to keep a separate virtual environment for each project, because it avoids conflicts among dependencies. And if you want to use the pretrained YOLOv5 without any customization, there are just four lines of code for that, but remember that YOLOv5 is trained on the COCO dataset, so you get around 80 classes. That's all for YOLOv5 itself.

A little information on the Open Images dataset, supported and provided by Google: if you type "Open Images dataset" into Google you'll find it, and if you type the name of an object you're interested in, you'll get example images with bounding boxes, segmentation, and localization. So if I click the segmentation tab and choose the category "tomato", I get the segmentations for tomatoes. There is a command-line tool you can use to download the data, whether you want only the images or the images along with their annotation files; that's all up to you. If you have your own dataset instead and want to annotate it with a free tool, please check out another video of mine on LabelMe. The good thing about LabelMe is that while you're annotating, it can save your annotations directly in the YOLOv5 input format, so you won't need the conversion steps I'm about to use here.
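As a rough sketch of those "four lines of code" for the pretrained model, based on the PyTorch Hub example in the Ultralytics README (the model name and the test image URL below are just the stock example, not something from this video):

import torch

# Load the small pretrained model (trained on COCO, ~80 classes)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Run inference on an image (local path or URL) and print the detections
results = model('https://ultralytics.com/images/zidane.jpg')
results.print()   # results.show() would display the annotated image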
So, basically, I downloaded four classes of images from the Open Images dataset, and here I just want to show you a few examples from the binoculars class. First I'm going to delete these cached thumbnail files, because they're not real images and they would create problems later on. So we have the binoculars images, as you can see here; another class, mango; the images of tomato; and finally zebra. These are the four classes I'll be using for this project.

For each image we have a Pascal VOC annotation stored in an XML file, and there is some important information in it that we need in order to train our YOLOv5 model: the image name, the width and height of the image (for normalization), and the bounding box. The annotation file here is in the Pascal VOC format, which gives us the xmin, ymin, xmax, and ymax of the bounding box. However, the YOLOv5 format needs something different: Pascal VOC gives you the min and max of x and y, but YOLO needs the center coordinate of the object's bounding box plus the width and height of the box, all normalized along the x and y axes of the image. In this example we have two objects, a person and a tie. You can see the bounding box for the first person, then another bounding box for the second person, and for each box we have its center coordinate and its width and height; the same goes for the tie. So we need to convert our XML files into files like this one, where each row has the class id, an integer referring to the class name (here class id 0 refers to person and class id 27 refers to tie), then the x center and y center of the bounding box, and then the width and height of the box. As you can see, the numbers are normalized, meaning they are in the range of zero to one. So we're going to normalize the annotations, get our data ready, and then feed it to YOLOv5.

How are we going to convert our data from XML to the YOLOv5 input format? There are scripts available online and different ways to code it; I do it in two steps. First, I extract the information from each XML file in a programmatic fashion, and then I store the extracted information into a separate folder, changing the .xml suffix to .txt.
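To make the coordinate change concrete, here is a minimal sketch of the conversion just described (the function name is mine, purely illustrative):

# Pascal VOC gives absolute corners (xmin, ymin, xmax, ymax); YOLOv5 expects
# (x_center, y_center, width, height), each normalized to the range [0, 1].
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height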
All right, let's do it. To read the XML file, please structure your program like this; the code is available, so you can download it and run it on your machine to check. We're going to use OpenCV, so import the packages, and this package, xml.etree.ElementTree, is the important one here, because it parses the XML files. We also have a window name constant to preview our images, and finally the class-name-to-id mapping I showed you before: each class gets an id, so I'm giving ids 0, 1, 2, and 3 to binoculars, mango, tomato, and zebra respectively.

So as not to confuse you with the number of files here, I'm just going to pick one XML file, do all the reading and conversion on that single file, and later put it into a loop so the program deals with every file. I also have a driver section to check each function I create. Again, you can build your own code however you like, with a class or with separate functions; the point is to solve the problem.

I'm going to define a function to extract the information from the XML, and it takes an XML file as input. I call it root: from ET I call parse, give the XML file to the parser, and call getroot. We're going to extract the information into a dictionary, so I initialize info_dict, and one of its elements will be the bounding boxes, which I initialize as a list, because some images have more than one bounding box: say there are two zebras in one image, then you'll have two bounding boxes for it. Next, I want to loop through the elements of this XML under the root and check each tag. First let's get the filename: if the element's tag equals "filename", I store it into the info dictionary under the key "filename", with element.text as the value; .text returns a string, and that string is the filename, basically this one here. Next, else if the element's tag equals "size", let's get the image width, height, and channel count. I'll store the size into a list where the first index is the width, the second is the height, and the third is the number of channels of the image. So I create an image_size variable and loop through the sub-elements, converting each one to an integer.
Since we want the value of each sub-element, we use .text again, and that's it. Then let's save this into the dictionary we initialized above, under the key "image_size". It's good practice to convert the list to a tuple, since a tuple is a little faster than a list, so I just wrap it in tuple(), nothing more. So far we've got the filename and the image size from this XML file.

The third piece of information we need from this XML file is the object: the class name and the bounding box. So, else if element.tag equals "object" (and the tag is indeed "object"), we create a bounding-box dictionary and add each piece into it separately. We loop through the sub-elements of this element: if the sub-element's tag equals "name", which is the class name, we set bbox["class"] to the sub-element's text. Else if the sub-element's tag equals "bndbox", the bounding box (careful, this comparison has to be a double equals), then we have another loop to get xmin, ymin, xmax, and ymax: for each sub-sub-element we take its tag as the key and its text as the value, and since .text returns a string while the values are integers, we convert them back to int. Then we need to add this bounding-box dictionary into our top-level dictionary, so we set info_dict["bboxes"] to the bounding box. That's all for extracting this XML file into a programmatically readable format; finally we return the info dictionary.

Let's check it out so you get a better understanding of what it looks like and why we did this: I print extract_info_from_xml with the XML file path and run it. All right, this is the information we needed from the XML file: we have a dictionary whose first entry holds the class name along with the xmin, xmax, ymin, and ymax; then we have the filename, which we need so we can save each converted file under the name of its image; and then we have the image size, the width, height, and depth of the image. So this is step one, which I'm calling "extract XML info". In step two we're going to convert the extracted info to the YOLOv5 format and save it to disk.
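Here is a sketch of that extraction function as walked through above, assuming Pascal VOC files with <filename>, <size>, and <object> elements. Note it already appends each box to the list, which is the fix applied later in the video when debugging:

import xml.etree.ElementTree as ET

def extract_info_from_xml(xml_file):
    root = ET.parse(xml_file).getroot()
    info_dict = {'bboxes': []}
    for element in root:
        if element.tag == 'filename':
            info_dict['filename'] = element.text
        elif element.tag == 'size':
            # VOC order: width, height, depth
            image_size = [int(sub.text) for sub in element]
            info_dict['image_size'] = tuple(image_size)
        elif element.tag == 'object':
            bbox = {}
            for sub in element:
                if sub.tag == 'name':
                    bbox['class'] = sub.text
                elif sub.tag == 'bndbox':
                    for subsub in sub:   # xmin, ymin, xmax, ymax
                        bbox[subsub.tag] = int(subsub.text)
            info_dict['bboxes'].append(bbox)   # append, not assign
    return info_dict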
Let's define our next function, convert_to_yolov5. What it takes as input is the info dictionary we just extracted, so I'm giving the parameter that same name. It's also good practice to add type hints, for example saying that a parameter is a dictionary or a string, or what the function returns; in future videos I'll use them to make the Python a little more readable from a programmer's perspective.

Now let's assume we've got our info dictionary, which looks like this. First we loop through each of the bounding boxes: for bbox in info_dict["bboxes"]. Then we want to convert the class names into their corresponding ids. I wrap it in a try block, which also acts as a sanity check to make sure we only see classes we've already defined: class_id = class_name_to_id_mapping[bbox["class"]]. Every try needs an exception handler, and here that's a KeyError; if it happens, we print "Invalid class. Must be one from class_name_to_id_mapping.keys()". So we've got our class id.

The second step is to transform the bounding box from the xmin, xmax, ymin, ymax format into x center, y center, width, and height, the YOLOv5 format. I'll call the variables b_x_center and b_y_center, which is a bit more intuitive, plus b_width and b_height. For the x center, we add up xmin and xmax and divide by two; the y center is pretty similar, so I just copy-paste and change the x's to y's. To find the width, it has to be xmax minus xmin, and to find the height, ymax minus ymin. That's it: we converted our bounding boxes from xmin, ymin, xmax, ymax into the format required by YOLOv5.

The next step is to normalize. Let's normalize the coordinates along the width and height of the image; it's easy. All we need are the image width and image height (we don't need the channel count, so I assign it to an underscore), and we get those from the info dictionary's "image_size" entry. Then we normalize the bounding-box x center by dividing it by the image width, duplicate that line for the y center divided by the image height, and do the same for the box width and height.
With that, the normalization is done as well. Coming back to the target format before we finish: we got the class id, we converted the annotation from xmin, ymin, xmax, ymax into the bounding-box center plus the box width and height, and we normalized everything into the range of zero to one along the image width and height.

The next step is to save this to disk. Later I'll create another function that does this for all the images and annotation files; for now I'll keep things simple. All you need to do is create a folder called "annotations", and inside it the four class folders: binoculars, mango, tomato, and zebra (plain directories, not Python packages). I'm going to build the file contents with a print buffer, so I initialize print_buffer as a list and then append a formatted string containing the five pieces of information: the class id, the bounding-box x center, the y center, the width, and the height. The order is very important, by the way, so let's double-check it: class id, x center, y center, width, height. And let's format the numbers so we store only up to three decimal places.

Then we save it into one file per image. I'll call the variable save_file_name, and it's os.path.join of the "annotations" folder and the class name, which I take from the info dictionary: from its bounding boxes I grab the first element and read its "class" key (and since we're using single quotes inside the expression, the outer quotes have to be double quotes, or the expression would be treated as a string). What this does is take us into annotations/<class name>; I'm structuring it this way so we can reuse it automatically in a loop later on, rather than typing the paths by hand. The second part of the join is the filename from the info dictionary.
On that filename I call replace. Why replace? Because the filename here is actually a .jpg file, and I'm going to replace "jpg" with "txt" and use that as the save name; I'll rely on this jpg/txt naming again later, for matching annotations to images. That's it; now we save. I join the lines from the print buffer with newlines, open the save_file_name we just built for writing, write it out, and also print a message, let's say "save successful". I'm keeping it simple.

So, to summarize what we did here: we take the extracted info dictionary and give it to a function called convert_to_yolov5; we look up the class id, we transform the bounding boxes into the YOLOv5 format, which is the center of the box plus its width and height, we normalize every value along the x and y axes of the image, and then we write this information into a file, save the file to disk, and show a success message. In the driver I'm no longer going to print the extraction result; I store it in a variable and pass it to convert_to_yolov5. Let's run it and hopefully we don't get any errors... and we get "string indices must be integers". Let's see how to solve that. I turn on the debugger and check what we're getting from the bounding boxes: the bbox variable equals just the class string, which is not what we need. The problem is caused by the extract function: we created the bboxes list there, but then we assigned the single bounding box to that list, which is not correct. We have to append each bounding box into the list. Hopefully it works now; let me rerun the debugger one more time... okay, now we get the class name, binoculars in this case, plus the xmin and ymin from the bounding box, so it should work. Rerun it and... "save successful". If we come here, this is the text file we got: the class id, x center, y center, and the width and height of the bounding box. That's it.

The next step would be to do this for all the data we have, but before we generate everything and convert it into the respective annotation files, let's do a sanity check: make sure that each annotation we just converted corresponds to the correct bounding box on its image. For that I create another function called draw_bounding_box, which takes the image and the annotation text file we just converted. I open the annotation text file as a readable file, read its contents, and split them into lines.
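For reference, here is a sketch of the whole convert-and-save step as just described. The capitalized class names and the annotations/<class>/ folder layout are my assumptions about how the video's data is organized:

import os

# Assumed class names from the downloaded Open Images data
class_name_to_id_mapping = {'Binoculars': 0, 'Mango': 1, 'Tomato': 2, 'Zebra': 3}

def convert_to_yolov5(info_dict):
    print_buffer = []
    image_w, image_h, _ = info_dict['image_size']
    for b in info_dict['bboxes']:
        try:
            class_id = class_name_to_id_mapping[b['class']]
        except KeyError:
            print('Invalid class. Must be one from', class_name_to_id_mapping.keys())
            continue
        # Transform to center/width/height, then normalize by the image size
        b_x = (b['xmin'] + b['xmax']) / 2 / image_w
        b_y = (b['ymin'] + b['ymax']) / 2 / image_h
        b_w = (b['xmax'] - b['xmin']) / image_w
        b_h = (b['ymax'] - b['ymin']) / image_h
        # Order matters: class id, x center, y center, width, height
        print_buffer.append(f'{class_id} {b_x:.3f} {b_y:.3f} {b_w:.3f} {b_h:.3f}')
    # One .txt file per image, named after the image, one row per box
    save_file_name = os.path.join('annotations', info_dict['bboxes'][0]['class'],
                                  info_dict['filename'].replace('jpg', 'txt'))
    with open(save_file_name, 'w') as f:
        f.write('\n'.join(print_buffer))
    print('save successful')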
If there is more than one line, meaning more than one bounding box on the same image, we want all of them, so we split on the line breaks and store the result; and since we're working with a single file here, let me just call the variable annotation. The second step is to split each line on the spaces: for x in annotation, x.split(). Then, with another comprehension, for y in x, we convert every annotation value into a float; we loop through the lines, and for each value in a line we make sure everything is in float format, including the class id, which is fine. Next we convert the annotations to a NumPy array, so each row again has the form class id, x center, y center, bounding-box width, bounding-box height. If you'd like to see it, you can print it, but I'll be quick.

Let me get the image width, height, and channels from image.shape so that we can denormalize our annotation values. I'll print them so you understand better: "annotation values before denormalization", followed by the annotations. Then we denormalize, which basically means recovering the actual x, y, width, and height pixel coordinates of the bounding box on the actual image; we're undoing the normalization we applied during conversion. To do that, let me keep a copy first: annotations_copy = np.copy(annotations). Then, across all the rows, we take the x center and the width of the bounding box and multiply them by the image width, and likewise for the vertical values. We have a small problem here with the column indices: it has to be columns one and three multiplied by the width, and columns two and four multiplied by the image height. So here we denormalize the x center and the box width, and there the y center and the box height. Denormalization done; you can print "annotation values after denormalization" with the copied array if you're interested. Now, to make these values suitable for OpenCV, we need to convert them back to min/max corner values, like the original annotation boxes.
So we convert back to xmin, ymin, xmax, and ymax in order to draw with cv2.rectangle. To do that, for the minimum we take the first column of annotations_copy minus the third column divided by two: the x center minus half the box width gives the minimum x. That's the idea: take the center, subtract half the size to get the minimum, and add half the size to get the maximum, for both x and y. I copy this line four times and fix the column indices accordingly; I'm doing it quickly, but the calculation is easy if you work through it. What we're doing is the opposite of the earlier transform: converting the bounding boxes back into the OpenCV corner format.

Then we draw each annotation box: for each single annotation in annotations_copy we unpack the object class and x0, y0, x1, y1, which are basically xmin, ymin, xmax, ymax. Then cv2.rectangle takes the image, point one as (int(x0), int(y0)), point two as (int(x1), int(y1)), then the color; I'll give it (255, 0, 0), which will be blue, since OpenCV works with BGR images, and a thickness of 2. Then cv2.imshow with the window-name constant we created earlier and the image, plus a waitKey so it pauses until we press a key, and once we press a key it destroys all the windows it created.

Let's run it. The file is saved, and under zebra we have this annotation file; I'm going to read it back purely as a sanity check, to make sure the normalized annotation file we just created in the YOLOv5 format works correctly. Run it and... as you can see, we have a problem: the bounding box is not drawn properly on the image. Coming back, I think the min/max column indices are off; these have to be columns two, three, and four. Let me stop this and rerun... yes, now the bounding box is drawn correctly on the image, so it's working fine.
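Here is a consolidated sketch of that sanity check, assuming one label row per line in the converted .txt file (function and window names are illustrative):

import cv2
import numpy as np

def draw_bounding_box(image_path, annotation_path, window_name='preview'):
    image = cv2.imread(image_path)
    h, w, _ = image.shape
    with open(annotation_path) as f:
        rows = [list(map(float, line.split())) for line in f.read().splitlines()]
    ann = np.array(rows)          # columns: class_id, xc, yc, bw, bh (normalized)
    ann[:, [1, 3]] *= w           # denormalize x center and box width
    ann[:, [2, 4]] *= h           # denormalize y center and box height
    for _, xc, yc, bw, bh in ann:
        x0, y0 = int(xc - bw / 2), int(yc - bh / 2)   # back to min corner
        x1, y1 = int(xc + bw / 2), int(yc + bh / 2)   # back to max corner
        cv2.rectangle(image, (x0, y0), (x1, y1), (255, 0, 0), 2)
    cv2.imshow(window_name, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()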
Now that we've verified the data is properly converted, the next step is to create a loop so that every file in the image dataset gets converted into its YOLOv5 annotation file. I'll create another function called data_generator and call the functions we just built from inside it. data_generator takes a root path for the data, typed as a string. I initialize annotation_dirs as a dictionary and loop: for classes_dir in os.listdir(root_path). In our case the root path is the image_data folder, which contains all the class folders. There is a hidden file called .DS_Store in here; if I list the folder from the command line you can see it, so if the directory name matches .DS_Store we just ignore it and continue the loop. Otherwise we fill in the annotation_dirs dictionary: for each class folder we store the path to its annotation directory, which is os.path.join of the root path, the class directory, and the "pascal" subfolder, since each class has a pascal folder holding the XML files (and that string needs to be in quotes). It should work now; if you want to see the paths, you can print the dictionary's keys, and we get the annotation directory for each of these classes.

The next thing is to iterate per class: for class_name in annotation_dirs.keys(), the dictionary we just printed. Then, for each XML file, I want a progress bar to show that we're reading file by file, so: for xml_file_name in tqdm(os.listdir(annotation_dirs[class_name])). Remember that I imported tqdm above; if you do "import tqdm" you have to call tqdm.tqdm, otherwise with "from tqdm import tqdm" plain tqdm is fine. Then the file path: xml_file_path = os.path.join(annotation_dirs[class_name], xml_file_name). That's how we read each XML file from each of these classes. Next, if os.path.exists of the annotations output folder, meaning the per-class folder under "annotations" already exists, we call the extraction function (which I named extract_info_from_xml), pass it the XML file path, take the output as extracted_info, and pass that into convert_to_yolov5.
If you recall, there was a print inside, which I'm going to remove; I don't want a lot of logs in here. And if the output folder does not exist, we should create it, so let me do it the cleaner way: if not os.path.exists, then os.makedirs creates the directory named annotations/<class name>. We do in fact already have the directory, but this covers the case where we don't. Then we extract the info and convert it. At the very end, the function just prints "YOLOv5 xml-to-txt conversion successful". Then we call this function in our driver program: data_generator with the root path, which in this case is image_data. We already created a few annotation files by hand earlier, so I'm going to delete those; it doesn't matter.

Let's run it and hopefully it works. Right, we get "xml to YOLOv5 conversion successful", though the class-name prints come out interleaved with the tqdm bars instead of at the end, which looks a little odd; nothing weird beyond that. But if we come back to the annotations folder, the files are not there. The problem is my if/else structure: the extract-and-convert calls only ran inside one branch. Let me fix it: if the class folder does not exist, create it with makedirs, and then, in either case, extract the info, convert it to YOLOv5, and save it; I cut the calls out of the branch, bring them here, and remove the else. Let's try it again... okay, check the annotations folder and you'll see all the annotation files, then mango, and similarly for the others, tomato and zebra. So we successfully converted our annotations from XML to the YOLOv5 format.

The last step is to split this data into train, validation, and test sets. For that I'll first create a utility function to move files to another folder, called move_files_to_folder, which takes a list of files and a destination folder. For each f in the list of files we try shutil.copy; for safety I use copy for now while I check that everything works properly, but afterwards you can change the keyword to move. In the except branch we print f and assert False. That's it for the utility function. Now I'm coming to the very last part of the conversion: we need to create three folders, for train, validation, and test, so we can move our files there, and we also need to make sure the images are randomly shuffled while each annotation file stays matched to its image.
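Here is a sketch of that generator loop, assuming the image_data/<class>/pascal/ layout used in the video; it reuses the extract_info_from_xml and convert_to_yolov5 functions sketched earlier:

import os
from tqdm import tqdm

def data_generator(root_path: str):
    annotation_dirs = {}
    for classes_dir in os.listdir(root_path):
        if classes_dir == '.DS_Store':   # skip the macOS hidden file
            continue
        annotation_dirs[classes_dir] = os.path.join(root_path, classes_dir, 'pascal')
    for class_name in annotation_dirs:
        out_dir = os.path.join('annotations', class_name)
        if not os.path.exists(out_dir):  # create the output folder if missing
            os.makedirs(out_dir)
        for xml_file_name in tqdm(os.listdir(annotation_dirs[class_name])):
            xml_file_path = os.path.join(annotation_dirs[class_name], xml_file_name)
            extracted_info = extract_info_from_xml(xml_file_path)
            convert_to_yolov5(extracted_info)
    print('YOLOv5 xml-to-txt conversion successful')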
To do that, we come back to our driver program; we don't need to run the conversion again, so I just comment that out. Before the split itself, we need to collect our image and annotation file paths. I initialize two lists, path_to_annotation_files and path_to_image_files, inside a data dictionary. Then I loop over every class id in os.listdir(annotation_root), where annotation_root points at our annotations folder. For each class we build annotation_class_dir as os.path.join(annotation_root, class_id), and image_class_dir as os.path.join(root_path, class_id, "images"), since if you look at the root path of the images, each class folder has an "images" subfolder.

Then another loop goes through all the annotation files: for annotation_file in os.listdir(annotation_class_dir). We get the annotation path for each file with os.path.join(annotation_class_dir, annotation_file); I'll stick with the singular name annotation_file so the wording doesn't get confusing. The matching image path is os.path.join of image_class_dir and the same annotation file name with "txt" replaced by "jpg". Then we append both: path_to_annotation_files.append(annotation_path) and path_to_image_files.append(image_path). So we store the paths of all our annotation files and all our images into the two lists.

Then we need to split them: split the image data into train, val, and test sets, using the train_test_split function of scikit-learn. We get train_images and val_images, plus train_annotations and val_annotations, from train_test_split, passing in path_to_image_files and path_to_annotation_files, and the last argument specifies the split size. In this case I'm going to use 80/10/10: 80 percent for training, 10 percent for validation, and 10 percent for testing.
We give it a test_size of 0.2, which is really the validation-plus-test portion, and random_state=1. That's it. Then from that validation part we carve out the actual test set: val_images, test_images, val_annotations, and test_annotations come from the same train_test_split function, except this time we pass in the val_images and val_annotations that we defined above. To end up with 10 percent validation and 10 percent test overall, we split this 20 percent fifty-fifty, so the test_size here is 0.5, again with random_state=1. (There was also a typo in "image" to fix.)

Before we move the files, there is one thing I want to point out: in YOLOv5, your images and labels must be stored in folders literally named "images" and "labels", because the YOLOv5 loader takes each image path and replaces the "images" part with "labels" to find the corresponding label file. That's why we have to be specific with these names and can't use arbitrary ones. If you want to change or adjust YOLOv5's settings, you can certainly do that, but we're not going to complicate things; I'm keeping it simple. So, with the move_files_to_folder utility we created, I put the train annotations into the labels/train folder and the train images into images/train, and likewise for validation and test. Again, I want to emphasize that YOLOv5 will look at the image path and swap "images" for "labels" to find the label for the corresponding image.

Let's run this and check the result, as consolidated in the sketch below. If you look into each of these folders, these are our images; since I chose a random split, each folder contains images from different classes, and the same goes for validation and test. And in labels/train we have all the text files; here's one image with two bounding boxes, so its file shows two rows of data. That's it. The next step is to feed this data into the network. The first step was data preparation, which we've done; the second step is to train the network on this data; and the third step will be to test the trained network.
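A consolidated sketch of the collect, split, and move steps just described; the folder names follow the video's layout, and the destination folders are assumed to exist already:

import os
import shutil
from sklearn.model_selection import train_test_split

annotation_root, image_root = 'annotations', 'image_data'
path_to_annotation_files, path_to_image_files = [], []
for class_id in os.listdir(annotation_root):
    for annotation_file in os.listdir(os.path.join(annotation_root, class_id)):
        path_to_annotation_files.append(
            os.path.join(annotation_root, class_id, annotation_file))
        path_to_image_files.append(
            os.path.join(image_root, class_id, 'images',
                         annotation_file.replace('txt', 'jpg')))

def move_files_to_folder(list_of_files, destination_folder):
    for f in list_of_files:
        shutil.copy(f, destination_folder)   # swap to shutil.move once verified

# 80% train; then split the remaining 20% in half: 10% val, 10% test
train_images, val_images, train_annotations, val_annotations = train_test_split(
    path_to_image_files, path_to_annotation_files, test_size=0.2, random_state=1)
val_images, test_images, val_annotations, test_annotations = train_test_split(
    val_images, val_annotations, test_size=0.5, random_state=1)

move_files_to_folder(train_images, 'images/train')
move_files_to_folder(val_images, 'images/val')
move_files_to_folder(test_images, 'images/test')
move_files_to_folder(train_annotations, 'labels/train')
move_files_to_folder(val_annotations, 'labels/val')
move_files_to_folder(test_annotations, 'labels/test')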
To do that, we need to create a YAML file and place it into the data folder of the yolov5 repo. I come here, right-click, choose new file, and name it custom_dataset.yaml (let me close and delete my first attempt and create it again properly inside the data folder). In this YAML file you need to specify the paths of your data splits: we say train:, pointing at where the train data lives, which is images/train; then val:, going one directory back and into images/val; and then the test data, images/test. It's important again: YOLOv5 finds the annotations, the labels of each split, by changing "images" into "labels" in these paths. Next we specify the number of classes, nc: 4, since we have four classes, and then the class names. If you come back to our mapping, the names have to be in the order of ids 0 through 3, so we keep that order: I copy the mapping over, keep the keys, and remove the values; and since this is a YAML file, it has to be colons, not equals signs. So the file is the path to the dataset splits (train, val, and test), the class count, and the names.

We're now ready to train the model, but before that, let me walk through some of the files here. In data we have some sample images, and we have the hyperparameter files you can use for fine-tuning: for example, in the "low" file we have the initial learning rate, the final one-cycle learning rate, momentum, weight decay, warmup epochs, warmup bias learning rate, the box loss gain, and so on. You can tune or play with these parameters if you want, but it's recommended to use the defaults unless you change the structure of the network, in which case you'd need to specify your own learning rate or optimization settings. As I mentioned at the beginning of the video, YOLOv5 commonly comes in three sizes, small, medium, and large, and the low, medium, and high hyperparameter files correspond to those models. Here you also have the COCO dataset YAML with its number of classes and names; the reason my custom dataset file looks the way it does is that I copied this format, and instead of pointing at COCO, we point at the actual custom dataset we created. There are more dataset YAMLs, like SKU and VisDrone, that you can train on. Coming to the models folder, yolov5l.yaml shows the structure of the large model: the number of classes, 80 for COCO, the depth multiple, and of course you can change these; if you want to customize the backbone of the network you can do it here, and this section is the head. The same goes for the nano model, the small model, and so on, and in the hub folder we have all the other variants. There are also some scripts that fetch the YAML files and hand them to the network for training or inference. What we need to do now is get the pretrained weights and then train the network.
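For reference, a small script that writes the custom_dataset.yaml described above; the relative paths are my assumption about where the split folders sit relative to the yolov5 repo, so adjust them to your own layout:

yaml_content = """\
train: ../images/train
val: ../images/val
test: ../images/test

nc: 4
names: ['Binoculars', 'Mango', 'Tomato', 'Zebra']
"""

# Written into the repo's data folder, next to the other dataset YAMLs
with open('data/custom_dataset.yaml', 'w') as f:
    f.write(yaml_content)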
To train the network, all we need to do is type python train.py and specify the arguments. First the image size, which in my case is fixed: you give it 640, for 640 by 640. Then the configuration file: with --cfg we point at the model definition, and let me check that we have the model files... yes, they're under models, so I'll use models/yolov5m.yaml, the medium model. Then the hyperparameters: the hyperparameter files are under data/hyps, and since we're training with the from-scratch defaults, we pass hyp.scratch. The next parameters to specify are the batch size, the epochs, and the data: let's say a batch of 32 and 100 epochs, and for the data we pass data/custom_dataset.yaml. This part is very important: this is where we give our own custom data to the model.

And finally the weights. Unfortunately we don't have the pretrained weights downloaded yet, so let's check how to get them: there is a script under data/scripts for downloading weights. I'll open a new terminal, since I don't want to mess up the current one, conda activate the yolov5 environment, cd to the yolov5 folder on the desktop, and run the script directly. These are the pretrained weights for all the YOLOv5 models; if we come back to the project and wait for indexing, we can see new files being added, which are the YOLOv5 weights, so let's wait a while for them to download. We're going to use yolov5m, so in the command I specify --weights yolov5m.pt. The next argument is the number of workers; I'll choose 24. And then the folder the run will create to save the best model for inference: we specify that with --name, and I'll just call it custom_log. Press enter and let's see.

It's going to take time to train, and of course I'm not going to train it on my laptop here, but I want to show you how it has to work. Okay, we got an error: for the hyperparameter file, the acceptable suffix is .yaml, not .yml, so I fix the hyp argument to point at hyp.scratch.yaml. As I said, I didn't train this on my laptop; I trained it on a GPU and got the results back, which I'm going to show you now. Training it on your own computer would honestly take two or three days, but on a GPU it takes well under an hour. While it's training, get some coffee, do some work, and come back; we'll be back right after training.
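Putting the command together in one place: a sketch of the training invocation assembled piecewise above, expressed here as a Python subprocess call (equivalent to running train.py from inside the yolov5 repo on the command line; the hyp file path varies between repo versions, so treat it as an assumption):

import subprocess

subprocess.run([
    'python', 'train.py',
    '--img', '640',                          # fixed 640x640 input size
    '--cfg', 'models/yolov5m.yaml',          # medium model architecture
    '--hyp', 'data/hyps/hyp.scratch.yaml',   # from-scratch default hyperparameters
    '--batch', '32',
    '--epochs', '100',
    '--data', 'data/custom_dataset.yaml',    # the custom dataset file created above
    '--weights', 'yolov5m.pt',               # pretrained medium weights
    '--workers', '24',
    '--name', 'custom_log',                  # run folder under runs/train/
])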
Okay, so during training we get the epochs, the box loss, the labels, the image size, all the information, and also the mean average precision. So we trained the network, and after training a new folder is generated automatically: go to that folder, which is called runs, then train, and you'll have the custom_log folder we named during the training setup. In there you'll have the weights of the model; the best weights from this training run are saved there, and we use them for inference when testing the network on the test dataset.

You also get a confusion matrix, which basically shows you where the model got confused between classes. The confusion matrix really deserves its own detailed explanation, but I just want to give you a high-level understanding: look at the diagonal to see the per-class accuracy. For binoculars we have 0.50, for mango 0.42, for tomato 0.44, and for zebra 0.69. What does that mean? Our model performs quite well on zebra, somewhat worse on tomato, quite badly on mango at 42 percent, and 50 percent on binoculars. The off-diagonal cells show which classes the model couldn't classify properly and what it confused them with, so the matrix gives us detailed true-positive and false-positive information for each class. We also have the F1 curve; please look it up online, or if you need an explanation of these theoretical concepts, write it in the comment section below this video and I'll prepare another video describing these scores. We also have the hyperparameters this model used, and the labels and their distributions: for binoculars we have about 50 instances, for mango a little under 200, for tomato a lot of images, around 350, and for zebra around 200.
And this is the correlogram for the labels, along with the options that were used: the hyperparameters, number of epochs, batch size, image size, the rect flag in case we had used it (we didn't), resume, bucket, and so on. Then there are the P and PR curves; again, these need detailed explanations that are out of the scope of this video, but in short, the PR curve is used a lot when you want to thoroughly understand why the model is not performing well and on which class, which is very helpful when you want to improve the model's performance. For example, for the precision and recall of mango we get a precision of about 0.28 and a very bad recall; this kind of interplay between precision and recall is what gives you the PR curve, which tells you in some detail which classes the model does well on and which it doesn't. The same goes for the R curve; if you'd like me to make another video on these rather theoretical concepts, please write it below in the comment section, and if it gets enough votes I'll make a video about it.

Then we have the results for training and validation, and a train batch as an example so you can get an understanding of how the images were labeled: different images with their corresponding class ids, another batch of training, and another. Finally there's the validation batch, where we can see how well the model was performing during validation. For example, here is a zebra; is it really classifying the zebra as a zebra? You can see that it's doing well here; these are binoculars, and here we have a tomato, where the model is not doing quite well, while over here it's doing pretty well, and so on; one image shows the labels and the other the actual predictions. And this is the training screenshot I got after training, so you can see what the final output looks like: the model summary reports 367 layers and 46,124,433 parameters, which took 107 GFLOPs, then the classes we used, binoculars, mango, tomato, and zebra, with the number of images and labels and the precision and recall for each.

Well, that's the training. You can now use these weights to make inference on the test set. We trained the model and got the best and last weights; what we need to do now is the last step, which is inference. Basically, inference means we take the model, give it just a test image, and see the outcome. We don't have labels here; for test purposes we usually don't need labels, and we're not going to give the model any. We want to see how the model actually works in the real world. To do that, all you need to do is come to the terminal and run python detect.py, specifying the source, which is the folder of test images, and the weights, which are located at runs/train/custom_log/weights/best.pt, and then give the run a name again; custom_log or detection_log, it doesn't matter, so let me just call it detection_log.
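As an alternative to detect.py, the trained checkpoint can also be loaded through PyTorch Hub, a minimal sketch assuming the run folder from this training (the test image path below is hypothetical):

import torch

# Load the custom-trained weights produced by the run above
model = torch.hub.load('ultralytics/yolov5', 'custom',
                       path='runs/train/custom_log/weights/best.pt')

results = model('images/test/some_test_image.jpg')   # hypothetical test image
results.print()
results.show()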
Then run it. It's going to build the layers again, load the trained weights, give each image to the model, and write the results back. Let's see how the model really does. We dedicated 57 images to the test set, and we can inspect the outputs while it's running; coming back to PyCharm, we have our images under runs/detect/detection_log. If I open one, you can see it detects the zebra with 0.74 confidence, which is great; look at that. Well, it doesn't do anything good here, or in this case either, but here it can detect a tomato with 34 percent confidence, and another tomato at 74 percent; it fails to catch this tomato, but it does detect these tomatoes, and another one. I'm not sure why we're getting all the tomato images first, but we just have to wait a while. You can see it detecting these tomatoes as well, but it makes a mistake when it comes to this one: the model thinks that, since the color here is kind of yellow, it's a mango. I'd guess that's why the model is making the mistake, and honestly, even as human beings we might make a mistake on an image like that.

Anyway, we can improve the model. Keep in mind that we used a very small dataset and the medium-size model; if you use the large model with more data, and give it more context so it learns that this kind of situation is a tomato, the model will certainly perform better. Those kinds of improvements are out of the scope of this video, but I will certainly add a video about optimizing the model and improving its accuracy. Let me check this again: are we getting more images? Yes, we have mango, zebra, and a lot more; if I check this one, the model fails on it again, and here we have a mango, and so on.

That's it for this video. Thank you for watching; I hope it's helpful and that you like it. I made it with heart. Please subscribe to my channel and write down any comments, objections, or suggestions you have, and please give the video a thumbs up; I'd appreciate that. Thank you so much for watching.
Info
Channel: OVision
Views: 12,386
Id: vXYMJFBLzfc
Length: 126min 47sec (7607 seconds)
Published: Tue Nov 02 2021