Yolov5 with C++ and OpenCV | Converting PyTorch into ONNX format

Captions

[Music] Hi everyone, welcome back. In this video we are going to learn how to use YOLOv5 with C++. Since YOLOv5 is based on Python and the model format is PyTorch, we also need to convert the model into the ONNX format, which is compatible with C++ and OpenCV. This conversion is the important step for using YOLOv5 from C++.

First of all, we can clone the repo into our folder: open a terminal, type git clone, and paste the URL. While it is downloading, we can talk about the bug. If your torch versions are not compatible, you are going to get the error already mentioned in this issue. I got it too: my versions were newer, but there were some package conflicts, so in the end I had to install these specific versions. I switched torch, torchvision, torchaudio and the related packages to these exact versions; otherwise I always got the same error.

Okay, this is done, and I have my yolov5 folder. I can also show you my pip list, so if you get an error you can copy my versions. The most important ones are torch and torchvision; I don't think the others really matter, but you can check. Why is this important? If you try different versions, I think you will get the same error that is mentioned here.

After this is done, the second step is the export command; I just copy it and paste it here. Sorry, I need to enter the related folder first, then paste. This command downloads the PyTorch model into the folder (here it is after it's done) and converts it to ONNX. But since I switched versions, they do not support that ONNX opset version, so I need to open export.py, where there is a 17, change it to 12, which is the last supported one, and run it again.
Now, hopefully, it gives me the ONNX file. These are the two errors I already ran into, so I'm telling you how to get past them easily. This is the ONNX file I need in the code. In the code I also need the coco.names file, which holds the standard labels, the objects the model is able to detect. So basically the code uses two files: the class names text file and the ONNX model.

For the code part, you can find many examples on the internet that just load the ONNX model and give you the result. I found a repo here and I'm using it; I already copied it into my Qt project, and I'm going to explain simply what it does. But first let's see the result. You see, hi, here it is detecting me, and I also have a cup. Sometimes it says banana, but mostly it says cup. We can play with the configuration, but it works fine and it runs fast; as you know, being fast is the characteristic property of YOLO.

I said I'm using CUDA, but actually I was using the CPU, sorry; now I'm going to switch to the GPU. There's not much difference, because this is a light model and YOLO is fast, so we don't clearly notice whether it runs on the CPU or the GPU. So if you don't have a GPU, or you didn't build OpenCV with CUDA support, you can still use your CPU.

Now let's talk a little about the code so we understand what it does. In this function I insert all the labels, all the object names, into the class list. Here I open my camera. After that I define a network with OpenCV's DNN module: first I load the ONNX model into the network, and then I decide which device, which backend, I'm going to use: this one for GPU and this one for CPU.
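The first helper described above just reads the label file into the class list, so that a class ID (0 to 79 for COCO) can be used directly as an index into it. A minimal standard-library sketch of that step follows; `parseClassList` and `loadClassList` are hypothetical names, not taken from the repo used in the video. In OpenCV itself the model is loaded with `cv::dnn::readNet`, and the CPU/GPU choice is usually made with `setPreferableBackend` and `setPreferableTarget` (for example `DNN_BACKEND_CUDA` with `DNN_TARGET_CUDA`, or `DNN_BACKEND_OPENCV` with `DNN_TARGET_CPU`).

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the contents of a class-names file (one label per
// line, e.g. a coco.names-style file) into a vector so that index == class ID.
// Blank lines are skipped and a trailing '\r' (Windows line endings) is removed.
std::vector<std::string> parseClassList(const std::string& text) {
    std::vector<std::string> classes;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) classes.push_back(line);
    }
    return classes;
}

// Hypothetical helper: read the whole file and parse it. Returns an empty
// vector if the file cannot be opened, so the caller can detect a bad path.
std::vector<std::string> loadClassList(const std::string& path) {
    std::ifstream file(path);
    if (!file) return {};
    std::stringstream buffer;
    buffer << file.rdbuf();
    return parseClassList(buffer.str());
}
```

Returning an empty vector on a bad path makes a wrong file location easy to catch before any class ID is used as an index.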
These two functions just load the class list and the network. Then, where is my main, here: after that I enter a while loop and read each frame. The most important function is this one. I give it my frame (a Mat), the network, and an output vector of detections. Each detection includes the class ID, a number from 0 to 79; the confidence; and the box where the object was detected. At the end the function fills this vector of detection structs. I also give it my class list.

Let's go into the function. It does the standard network steps with blobFromImage, but I need to talk about this part first. It converts your input image into a square image. Why a square image? Because YOLO was trained with square images, 640 by 640, so if you give it a square input, the detection accuracy increases. You could give your image directly without preprocessing, and plainly resizing (stretching) it is not recommended either. Instead, this code creates a black square image whose side is the bigger of the width and height, and then copies the input image into it. Part of the square stays black; that costs a little extra time, but not much, so we can ignore it. So basically it converts the input image into a square for more accurate results.

After that, as I said, it creates the blob, sets it as the network input, and gets the outputs with the forward function. This is where the important and complicated part starts, but I'm going to try to explain it clearly.
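The square-padding step described above can be sketched without OpenCV, assuming an interleaved 8-bit image stored in a plain byte vector. `Image` and `padToSquare` are hypothetical names; in OpenCV the same idea is typically a zero-filled `cv::Mat` of side max(width, height) with the frame copied into its top-left region, after which `cv::dnn::blobFromImage` scales the square to 640×640.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an interleaved 8-bit image (e.g. BGR, 3 channels);
// in the real code this would be a cv::Mat.
struct Image {
    int width = 0;
    int height = 0;
    int channels = 3;
    std::vector<std::uint8_t> data;  // row-major, width * height * channels bytes
};

// Pad to a square of side max(width, height) without resizing: start from a
// black (all-zero) square and copy the source rows into its top-left corner,
// so the aspect ratio is preserved and the leftover area stays black.
Image padToSquare(const Image& src) {
    const int side = std::max(src.width, src.height);
    Image dst;
    dst.width = side;
    dst.height = side;
    dst.channels = src.channels;
    dst.data.assign(static_cast<std::size_t>(side) * side * src.channels, std::uint8_t{0});

    const std::ptrdiff_t srcRow = static_cast<std::ptrdiff_t>(src.width) * src.channels;
    const std::ptrdiff_t dstRow = static_cast<std::ptrdiff_t>(side) * src.channels;
    for (int y = 0; y < src.height; ++y) {
        // Copy one full source row to the start of the matching destination row.
        std::copy_n(src.data.begin() + y * srcRow, srcRow, dst.data.begin() + y * dstRow);
    }
    return dst;
}
```

Because the frame sits in the top-left corner, padding only appears on the right or bottom, so box coordinates measured on the square map back to the original frame without any offset.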
We can talk about these X and Y factors later. First it takes the output data. This output data contains 25,200 rows. How do I know that? If I go back here, export.py already told me the shape of the output, and every result is given as 85 values. What do these 85 values mean? In each group of 85 elements, the elements at indices 0 to 3 are the center X of the detected rectangle, the center Y, the width and the height. The fifth element, at index 4, is a confidence, but not for a specific label; it only says that something is there. If it is bigger than a threshold, which I currently set to 40 percent, we know something was detected in the image. We can call this the general confidence.

The following 80 numbers, after those first five, are the scores of each label, each object class. For example, the first one is person, the second one is bicycle, and so on through the 80 classes. These are all float numbers, and they give the confidence of each label. So first you check the general confidence, and if it passes this step, you then get the score for each label. That's where 85 comes from. And 25,200 means it gives you many results, many candidate rectangles. You need to process this data to get what you want, which is why this part is a little complicated, but I think I'm clear, so let's continue.

First of all, it checks the general confidence. Currently I defined it as, I think... let's see where that is... yes, 0.4, which is 40 percent.
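The 85-value row layout and the 25,200 count can be checked with a small sketch. It assumes the standard YOLOv5 head (3 anchors per grid cell on grids with strides 8, 16 and 32) and a 640×640 input, which gives 3 × (80² + 40² + 20²) = 3 × 8,400 = 25,200 rows. `expectedRows`, `PredictionRow` and `rowAt` are illustrative names, not part of OpenCV or the repo.

```cpp
#include <cstddef>
#include <initializer_list>

// Values per prediction row for a COCO-trained YOLOv5 model:
// 4 box values (cx, cy, w, h) + 1 general confidence + 80 class scores = 85.
constexpr int kBoxValues = 4;
constexpr int kNumClasses = 80;
constexpr int kRowSize = kBoxValues + 1 + kNumClasses;

// Expected row count for a square input, assuming 3 anchors per cell on the
// stride-8, stride-16 and stride-32 grids of the standard YOLOv5 head.
int expectedRows(int inputSize) {
    int cells = 0;
    for (int stride : {8, 16, 32}) {
        const int g = inputSize / stride;  // grid side length at this stride
        cells += g * g;
    }
    return 3 * cells;
}

// Read-only view of one row inside the flat output buffer.
struct PredictionRow {
    const float* p;
    float cx() const { return p[0]; }
    float cy() const { return p[1]; }
    float w() const { return p[2]; }
    float h() const { return p[3]; }
    float objectness() const { return p[4]; }                  // general confidence
    const float* classScores() const { return p + 5; }         // kNumClasses values
};

// Row i starts at data + i * kRowSize in the flat output.
PredictionRow rowAt(const float* data, std::size_t i) {
    return PredictionRow{data + i * kRowSize};
}
```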
If the general confidence is bigger than that, it reads from data + 5, and since the class list size is 80, it takes the following 80 numbers. It prepares them so that it can use minMaxLoc (you can check my video about that function; I believe I already made one). Here it gets the scores, the maximum class score, and the class ID. So now it knows that this class ID was detected with this score. The score then goes through the second threshold, which you see here is set to 0.2, which is 20 percent. That's small, which is why, when we tried the code at the beginning, my cup was sometimes called banana and so on, so we need to increase it; let's say 0.4.

After the second check passes, it gets the X, Y, width and height. But, as you know, these are not given directly as a top-left corner and size: they are the center X, center Y, width and height as float values in the coordinate space of the 640×640 network input. You can also check my YOLO training video, where I explain the standard X, Y, width and height format used for training. So you need to convert those float numbers into meaningful numbers for the original frame, for example left, top, width and height. That's why there is logic here with the X and Y factors.

Then, at the end, everything is easy: it pushes the boxes, fills this detection struct, and it's done. Back in main, the last step is drawing a rectangle around each object. So this is the only complicated part, and it's where people get confused. You can find code like this anywhere on the internet, or you can copy this guy's repo as I did. You don't need to know the details, but sometimes you do, because this is how the output comes to us.
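The per-row filtering and the X/Y-factor conversion described above can be sketched as a single function. This is a hedged reconstruction, not the repo's code: it uses `std::max_element` in place of `cv::minMaxLoc` to find the best class, reports the general confidence as the detection's confidence, and assumes `xFactor` and `yFactor` are the padded square's width and height divided by 640. `Box`, `Detection` and `decodeRow` are hypothetical names.

```cpp
#include <algorithm>

// Hypothetical result types mirroring the "detection struct" in the video:
// class ID, confidence, and a box in original-frame pixels.
struct Box { int left, top, width, height; };
struct Detection { int classId; float confidence; Box box; };

// Decode one YOLOv5 row laid out as (cx, cy, w, h, general confidence, class scores...).
// Returns false if either threshold rejects the row; otherwise fills `out`.
bool decodeRow(const float* row, int numClasses,
               float confThreshold, float scoreThreshold,
               float xFactor, float yFactor, Detection& out) {
    const float objectness = row[4];
    if (objectness < confThreshold) return false;   // first threshold (0.4 in the video)

    // Best class score and its index; plays the role of cv::minMaxLoc here.
    const float* scores = row + 5;
    const float* best = std::max_element(scores, scores + numClasses);
    if (*best < scoreThreshold) return false;       // second threshold (0.2, then 0.4)

    // Center-based box in 640x640 input space -> top-left box in frame pixels.
    const float cx = row[0], cy = row[1], w = row[2], h = row[3];
    out.classId = static_cast<int>(best - scores);
    out.confidence = objectness;
    out.box.left = static_cast<int>((cx - 0.5f * w) * xFactor);
    out.box.top = static_cast<int>((cy - 0.5f * h) * yFactor);
    out.box.width = static_cast<int>(w * xFactor);
    out.box.height = static_cast<int>(h * yFactor);
    return true;
}
```

A full pipeline would call this for each of the 25,200 rows and collect the accepted detections; raising `scoreThreshold` from 0.2 to 0.4, as done in the video, simply rejects more rows whose best class score is weak.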
Let's run the code one more time. We just changed the score threshold, so I believe my cup shouldn't be called banana anymore. Sometimes it still says remote, but we can keep playing with these confidence thresholds to suit our needs.

Basically, I made this video because I need YOLOv5 for some projects, and I'm planning to try other versions too. Since we are using C++, I needed this setup, but as always there were some bugs, errors and conflicts, so I wanted to share how I got past them so you can continue easily: just the package-version bug, and the 17 that should be 12. Maybe later we can also train YOLOv5, as we did before with YOLOv4. Thank you for watching, see you in the next videos.

Info

Channel: Computer Vision Lab

Views: 222

Keywords: machine learning, computer vision, deep learning, what is machine learning, machine learning tutorial, machine learning tutorial for beginners, YOLOv5, OpenCV, ONNX, OpenCV C++ Usage, Object Detection, Deep Learning, Computer Vision, ONNX Model, Machine Learning, Neural Networks, AI Programming, Tech Tutorial, Image Processing, YOLO Algorithm, Real-Time Detection, Model Inference, Computer Science, Coding Tips, YOLOv5 Usage, OpenCV C++ Tricks, Object Detection Guide

Id: 1JGNWwFj4VA

Length: 13min 44sec (824 seconds)

Published: Mon Feb 19 2024